Tag Archives: NCBI


Video Tip of the Week: ProSplign in NCBI’s Genome Workbench


There are many tools at NCBI, with a huge range of functions. Literature, sequence data, variations, protein structure, chemicals and bioassays, and more. It’s hard to keep track of what’s available. Their video tutorials are helping me to be aware of new tools, and new features within existing tools. For this week’s Tip of the Week, we’ll look at their recent video for ProSplign. It’s a tool that will help you align protein information to genomic sequences.

Although the Genome Workbench itself has been around for a while (we first featured it as a tip in 2013), it is constantly under development, and new features are available regularly. And although this tip focuses on how to use the ProSplign piece, if you haven’t used the Workbench much it will help you to understand how a number of tools within it can be accessed. You can also see that Splign is available in the tool list; it is another NCBI tool for a similar type of process, but with mRNA sequences as the focus.

If you want to have a text-based type of walk-through instead, there is a page that will take you through the features (see the quick links below).  And there are other videos that will help you to explore the Genome Workbench features as well–there’s a handy special playlist of just those videos. Subscribe to their YouTube channel for notices of their new items.

Quick links:

NCBI’s Genome Workbench: http://www.ncbi.nlm.nih.gov/tools/gbench/

Text-based tutorial page: https://www.ncbi.nlm.nih.gov/tools/gbench/tutorial13/


Kapustin, Y., Souvorov, A., Tatusova, T., & Lipman, D. (2008). Splign: algorithms for computing spliced alignments with identification of paralogs Biology Direct, 3 (1) DOI: 10.1186/1745-6150-3-20

NCBI staff (2016). Database resources of the National Center for Biotechnology Information Nucleic Acids Research, 44 (D1) DOI: 10.1093/nar/gkv1290


Video Tip of the Week: Viral Resources at NCBI (including new #ZikaResearch module)

It would be hard to escape hearing and reading about the recent drama about the Zika virus. But in terms of viral genomics, that only serves to highlight how important it is to deliver quality sequence information for researchers. So in a timely coincidence, a recent lecture for the CDC on the Viral Resources at NCBI was just made available.

The lecture by Rodney Brister provides an overview of the features of the site and explores ways to access the data with some tools they’ve had in place for a long time. He shows ways to quickly get to family members. Then he describes a newer direction that they are taking, which includes a separate feature accessed from the same homepage called the Virus Variation Resource (starts ~11min). If you click through you’ll see that there are modules for various viruses, and a Zika module just became available last week. So you can find sequences collected there, and also links to related health resources like the relevant WHO and CDC pages. (That’s not shown in the webinar, which was recorded in December 2015.) He shows a dengue example that’s got some additional features too.

There’s also a Retrovirus Resource available (~25min in), which offers “specialized tools for the analysis of retroviral proteins and genomes”. And that’s the place to go if you want the HIV-1 specific module; it’s not with the others that I mentioned before.

It’s longer than our typical tips, but it’s worth exploring if you want to refine your skills at quickly getting to virus data and tools. Just a quick word about a question they had towards the end, about bacteriophage resources. There’s a resource I particularly like for phages, which you can find here: http://www.phisite.org

Special note: if you intend to use the AmazonAWS services for any viral genome analysis, you should be aware of this.

Quick links:

Viral Resources at NCBI:  http://www.ncbi.nlm.nih.gov/genome/viruses/

Virus Variation Resource: http://www.ncbi.nlm.nih.gov/genome/viruses/variation/

Zika module: http://www.ncbi.nlm.nih.gov/genome/viruses/variation/Zika/


Brister, J., Ako-adjei, D., Bao, Y., & Blinkova, O. (2014). NCBI Viral Genomes Resource Nucleic Acids Research, 43 (D1) DOI: 10.1093/nar/gku1207


Friday SNPpets

This week’s SNPpets include a range of things, from Pardis Sabeti’s recovery from a serious accident to tardigrade genome drama. There are new databases and tools such as the GMO sequence tracker in the EU, to new uses of tools such as Docker, to explore. Reports of a serious BLAST bug. A look at common spreadsheet formatting mistakes and some solutions. It’s not a gene-editing moratorium. And more.

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…


Video Tip of the Week: Explore Gene Pages at NCBI with Variation and Expression Information

NCBI has produced some of the most in-depth and reliable bioinformatic tools, in large part because they’ve been building them since the earliest days of the genomics era. I once noted that I remember the oldest web interface, because it was one of the first places that I went for computational tools back in the day. Check out my post here with some of their older interfaces. I remember all of them.

But they don’t rest on their laurels. NCBI teams are always adding new tools, new features, and new data. Sometimes, though, I think that people take them for granted. Or they only keep re-visiting things they know. So recently they asked me to do a walk-through video about how I use the tools, so that I can show people ways they can go further than they might realize. This week’s Video Tip of the Week shows how I can add and explore a lot of data on a Gene page, to examine additional features: variations and expression data, right in the sequence viewer.  A lot of people may not even be aware these tracks exist.

This video provides a walk-through of how to explore variation data and expression data from Gene pages, using the sequence viewer that’s embedded right on the page. Enhance your understanding of your genes of interest quickly using these additional track options.

I hope this demonstrates how you can add more information to your genes of interest from gene pages, which you might already use. But now you can use them better. You can take advantage of the great depth at NCBI, while staying up-to-date on the tools.

Speaking of staying up-to-date, which is a big need in this field: you should definitely keep an eye out for new features from NCBI. My favorite way is the announcement mailing list, because it has details of new stuff, upcoming webinars, data releases, etc. But you can also watch their blog and twitter for new information all the time. They’ve been doing a lot of outreach–webinars, quick videos, and more. You should sign up to be notified, so you can stay current with the best data and tools. Check out their learning/webinars pages to see what I mean.

We are planning another video, and we’d love any feedback you have on this one.

Quick links:

NCBI Gene (subject of the video): https://www.ncbi.nlm.nih.gov/gene/

NCBI-announce mailing list: http://www.ncbi.nlm.nih.gov/mailman/listinfo/ncbi-announce

NCBI Insights blog: http://ncbiinsights.ncbi.nlm.nih.gov

NCBI twitter: @NCBI

Learn (see webinars link): https://www.ncbi.nlm.nih.gov/home/learn.shtml

NCBI YouTube channel: https://www.youtube.com/channel/UCvJHVo5xGSKejBbBj0A5AyQ


NCBI Resource Coordinators (2014). Database resources of the National Center for Biotechnology Information Nucleic Acids Research, 43 (D1) DOI: 10.1093/nar/gku1130

Disclosure: This video was sponsored and completed under contract to NCBI. 

Friday SNPpets

This week’s SNPpets include The Economist and C&EN covering gene editing in different but useful ways, new software for metagenomics, stranded RNA-Seq, the new CDC Public Health Genomics Knowledge Base, an IGB paper, Docker getting attention in biology, FDA precision medicine workshops, a nice popular press version of the ExAC (Exome Aggregation Consortium) story, and the heresy of Keith Robison, who doesn’t hate Excel enough.

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…


NCBI to hold two-day genomics hackathon in January

Because this came to my email on the Wednesday before the holiday, it seemed to me that some people who might like to attend could miss it. So I just wanted to boost the signal a bit by re-posting it. It came from the NCBI Announcement mailing list, if you want to see the whole thing; I’m excerpting just some of it here. Note that it has an application piece.

From January 5th to 7th, NCBI will host a genomics hackathon focusing on advanced bioinformatics analysis of next generation sequencing data. This event is for students, postdocs and investigators already engaged in the use of pipelines for genomic analyses from next generation sequencing data. Working groups of 5-6 individuals will be formed for DNA-Seq/multiomics, RNA-Seq, metagenomics and Epigenomics. These groups will build pipelines to analyze large datasets within a cloud infrastructure.

After a basic organizational session, teams will spend 2.5 days analyzing a challenging set of scientific problems related to a group of datasets. Students will analyze and combine datasets in order to work on these problems. This course will take place on the NIH main campus in Bethesda, Maryland.

Datasets will come from the public repositories housed at NCBI. During the course, students will have an opportunity to include other datasets and tools for analysis. Please note, if you use your own data during the course, we ask that you submit it to a public database within six months of the end of the event.

All pipelines and other scripts, software and programs generated in this course will be added to a public GitHub repository designed for that purpose. A manuscript outlining the design of the hackathon and describing participant processes, products and scientific outcomes will be submitted to an appropriate journal.

To apply, complete the form linked below (approximately 10-15 minutes to complete). Applications are due December 1st by 5pm EST.

Participants will be selected from a pool of applicants; prior students will be given priority in the event of a tie. Accepted applicants will be notified on December 10th by 9am EST, and have until December 12th at noon to confirm their participation. Please include a monitored email address, in case there are follow-up questions.

[some stuff removed here, with requirements, pre-reqs, and some other details on the actual event stuff. See full version here.]

* Genomics hackathon application form: https://docs.google.com/forms/d/1isJT0Ns-5MHX8mH4xQnDEFbhlu4HombXspQQaADQoec/viewform

Hack away.

Video Tip of the Week: NCBI Variation Viewer

The folks at NCBI recently hosted a webinar that covered a number of resources: GTR, ClinVar, and MedGen. It was a nice introduction to these resources using a case study of exploring information about a 9-year-old child who needed to get clearance for participation in sports. So they follow the course of some details about this kid across the different resources at NCBI to show what you could learn at the different sites.

I was hoping that the recording would become available so that it could be a triple-tip of the week, but I haven’t seen any announcements of it; I’ll keep an eye out and highlight it in the future if it does. Below I have also referenced a paper that covers some of the same ground as that webinar. But in the meantime they also recently added a new short video about the Variation Viewer that I found handy as well. So that will be this week’s video tip.

I particularly liked the way you can easily select an exon to focus on, with the little bubbles near the top. That wasn’t obvious to me at first.  People are often asking me for handy ways to focus in on the specifics of a single exon.

In addition to this video, I will also offer a screen-cap of one of the slides from the longer webinar that linked to related resources around NCBI. If you haven’t checked out these associated tools you will want to look at them as well. There are a lot of terrific tools available and they are always adding new useful features. Follow them on Twitter for announcements about their tools and trainings–that’s how I stay on top of the new items.

Quick links:

Variation Viewer: http://www.ncbi.nlm.nih.gov/variation/view/

MedGen: http://www.ncbi.nlm.nih.gov/medgen

GTR: http://www.ncbi.nlm.nih.gov/gtr/

ClinVar: http://www.ncbi.nlm.nih.gov/clinvar/


Landrum M.J., G. R. Riley, W. Jang, W. S. Rubinstein, D. M. Church & D. R. Maglott (2014). ClinVar: public archive of relationships among sequence variation and human phenotype, Nucleic Acids Research, 42 (D1) D980-D985. DOI: http://dx.doi.org/10.1093/nar/gkt1113

Video Tip of the Week: NCBI Sequence Viewer PDF export

A couple of weeks back we did a workshop on the UCSC Genome Browser, and I was asked a question we see pretty frequently: Is there a way to export the browser view that you selected with specific tracks, filters, regions, etc? People may want to have a record of their customized view in a lab notebook, or use it for teaching, or in a seminar perhaps–or of course to publish your awesome observations in journals.

Most of the time I just take screen shots of what I need with a screen capture tool (my personal favorite is Snag-It from TechSmith). But there may be times you want something a bit heavier-duty. If you are going to do a poster, or submit it for publication, for example, you might want a nice PostScript version you can work with and edit further. At UCSC, the way to do that is with the “View” menu option here for PDF/PS:

Export the browser image to a file for further editing or use.


Once you have the file, you can download it and edit it with Adobe graphics tools if you have them, or with a free open-source one like Inkscape. You can change the colors, delete stuff, add more annotations, etc.

So when I saw that there was a similar function with the NCBI‘s Sequence Viewer tool, I thought I should mention that as well. They have a nice and clear video that illustrates how to accomplish getting the image out of the Viewer and into a file.

To try this yourself on the sample file they showed, you can go to NC_000022.10 in the Nucleotide database. From that page, click the “Graphics” link as shown here:

Click the "Graphics" link on the page to open the Sequence Viewer.


After you get to the Sequence Viewer, follow the instructions just as they play out in the YouTube video. It’s pretty straightforward; just be careful to click the right menu for PDFs.
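If you would rather jump straight to the record than click through, here is a minimal Python sketch that builds the links for an accession. The function name `nuccore_url` is made up for illustration, and the `report=graph` query parameter is my assumption about how the “Graphics” view is addressed; check it against what your browser shows after clicking the link.

```python
from urllib.parse import quote, urlencode

# Base address of the NCBI Nucleotide database record pages.
NUCCORE_BASE = "https://www.ncbi.nlm.nih.gov/nuccore/"


def nuccore_url(accession: str, graphics: bool = False) -> str:
    """Return the Nucleotide record URL for an accession (hypothetical helper).

    With graphics=True, append report=graph, which is assumed here to open
    the same view as the "Graphics" link on the record page.
    """
    url = NUCCORE_BASE + quote(accession, safe="")
    if graphics:
        url += "?" + urlencode({"report": "graph"})
    return url


if __name__ == "__main__":
    # The sample record used in the NCBI video.
    print(nuccore_url("NC_000022.10"))
    print(nuccore_url("NC_000022.10", graphics=True))
```

This only constructs the addresses; the PDF export itself still happens in the viewer’s own menu, as shown in the video.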

If you haven’t used the NCBI Sequence Viewer much, you should definitely check it out. There are some other helpful videos for more features as well. And another neat feature is that you can embed the Sequence Viewer in your own web pages.

All of the genome browsers have different features and functions, and it’s nice to know that there are various strategies to accomplish tasks you might need to get done.

Quick links:

Sequence Viewer homepage http://www.ncbi.nlm.nih.gov/tools/sviewer/

Videos: http://www.ncbi.nlm.nih.gov/tools/sviewer/videos/


Karolchik D., Barber G.P., Casper J., Clawson H., Cline M.S., Diekhans M., Dreszer T.R., Fujita P.A., Guruvadoo L., Haeussler M., et al. (2013). The UCSC Genome Browser database: 2014 update, Nucleic Acids Research, 42 (D1) D764-D770. DOI:

Acland A., Agarwala R., Barrett T., Beck J., Benson D.A., Bollin C., Bolton E., Bryant S.H., Canese K., Church D.M., et al. (2013). Database resources of the National Center for Biotechnology Information, Nucleic Acids Research, 42 (D1) D7-D17. DOI:

Antique (or maybe “classic”) UCSC Genome Browsers retired

Everyone is pretty comfortable with the concept of the non-standard but commonly used dog-years as a way to compare life spans to humans. Car enthusiasts have a taxonomy for the ages of vehicles. But I’ve been sitting here wondering what the genome-browser-years scale should be. I’ve been thinking about it because of the recent announcement over the UCSC list the other day:

Change in visualization access for old assemblies

Hello everyone!

Over the past 12 years we have made efforts to maintain visualization of many old, archived assemblies on our Archive server at http://genome-archive.cse.ucsc.edu/, in addition to providing download access to the associated data sets. Unfortunately, this visualization is no longer sustainable for very old assemblies due to the many changes in the Genome Browser software as it has matured. We are therefore reducing access to certain old assemblies to data downloads only, and are announcing the shutdown of our Archive server. We will continue to provide Genome Browser access for the 4 most current human assemblies, and at least the 2 most current assemblies for all other organisms with some exceptions. We will discontinue our visualization support for all other old assemblies, but will continue to make these data sets available on our download servers. The assemblies currently on our archive server for which we have discontinued visualization support include early human assembly drafts, hg4, hg5, hg6, hg7, hg8, hg10, hg11, hg12, hg13, hg15, rn1, rn2, mm1, mm2, mm3, mm4, mm5, rheMac1, bosTau1, ce1, danRer1, and danRer2. Links to the data and annotations associated with these assemblies have been added in the appropriate places on our Downloads page at http://hgdownload.soe.ucsc.edu/. Please contact us if you have difficulty locating a data set of interest.

Matthew Speir
UCSC Genome Bioinformatics Group

A view of the UCSC Genome Browser in the early days, ~2004.


I’m sure the browser versions aren’t used very much anymore, but it was something I needed to be aware of. In our workshops I mention that the older versions have been available from the “archives” navigation on the landing page, but that will be gone now. Occasionally there are old papers that reference a genomic span that you want to revisit–but that’s becoming less common for those really old assemblies at this point. The data will persist for downloading, but the browser visuals will be gone. But it made me want to go back and look through my old materials to see what the early browsers looked like (click on the image to embiggen). A lot of the foundational structure is the same, but if you look at one of these old assemblies you might be surprised.  A lot fewer tracks, that’s for sure. In my shot, there are only a few species in the Conservation track (human, chimp, mouse, rat, chicken). We just didn’t have that much data available–not just across species, but other types of techniques and tools. Fewer tracks. Fewer functionality buttons.
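If you have scripts or old notes that still point at these assemblies, it may help to keep the announcement’s list as a simple lookup. This is only a sketch: the names `VISUALIZATION_RETIRED` and `visualization_retired` are mine, the set contains only the assemblies the announcement names explicitly, and the “early human assembly drafts” it mentions are left out because they are not listed individually.

```python
# Assemblies named in the UCSC announcement as losing browser visualization.
# The "early human assembly drafts" it also mentions are not named one by one,
# so they are not represented here.
VISUALIZATION_RETIRED = frozenset({
    "hg4", "hg5", "hg6", "hg7", "hg8", "hg10", "hg11", "hg12", "hg13", "hg15",
    "rn1", "rn2",
    "mm1", "mm2", "mm3", "mm4", "mm5",
    "rheMac1", "bosTau1", "ce1", "danRer1", "danRer2",
})


def visualization_retired(assembly: str) -> bool:
    """Return True if the assembly is on the announcement's list.

    Matching is exact (case-sensitive) after trimming whitespace, because
    UCSC assembly names such as rheMac1 are written in mixed case.
    """
    return assembly.strip() in VISUALIZATION_RETIRED


if __name__ == "__main__":
    for name in ("hg10", "mm5", "hg19"):
        status = "downloads only" if visualization_retired(name) else "not on the list"
        print(f"{name}: {status}")
```

A name that is not on the list is not guaranteed to have a browser view; the announcement only promises the most current assemblies, with some exceptions.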

This was almost as much fun as looking back at the old NCBI interfaces that I remember from way back. Those of you who have been in this rodeo for a while may remember those. Some of you will even remember the key “Pedro’s Tools” from back in the day. Sometimes it’s worth looking back at where we came from to realize how much further we are than we realized. I know there’s a lot of grousing about not having cured cancer and changed the pharmaceutical industry with the human genome sequence yet–but we really haven’t had the data that long in browser-years. Or maybe it’s more like Mars years–longer than you realize, despite the fairly comparable span of a single day. The arc of science is long, but it bends toward answers.

Edit: I realized I should look at the earliest paper and add that reference below. Check out Figure 1 for an even older view of the data.

Quick link:

UCSC Genome Browser: http://genome.ucsc.edu


Kent W.J., Sugnet C.W., Furey T.S., Roskin K.M., Pringle T.H., Zahler A.M. & Haussler D. (2002). The Human Genome Browser at UCSC, Genome Res., 12 996-1006. DOI:

Tip of the Week: NCBI Genome Workbench

Today’s tip is from NCBI. Specifically, NCBI’s Genome Workbench. The workbench is

…an integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publically available sequence databases at NCBI, and mix this data with your own private data.

It’s a useful program and they have a great set of videos to introduce you to the workbench’s functions and features. The video embedded here is the introduction, but they also have several additional videos including how to load a genome into the workbench, phylogenies and others. Check it out.

(Forgive the delay of this week’s tip. Snow canceled work, and knocked out Internet access!)