Tag Archives: NCBI


Video Tip of the Week: Explore Gene Pages at NCBI with Variation and Expression Information

NCBI has produced some of the most in-depth and reliable bioinformatic tools, in large part because they’ve been building them since the earliest days of the genomics era. I once noted that I remember the oldest web interface, because it was one of the first places that I went for computational tools back in the day. Check out my post here with some of their older interfaces. I remember all of them.

But they don’t rest on their laurels. NCBI teams are always adding new tools, new features, and new data. Sometimes, though, I think that people take them for granted. Or they only keep re-visiting things they know. So recently they asked me to do a walk-through video about how I use the tools, so that I can show people ways they can go further than they might realize. This week’s Video Tip of the Week shows how I can add and explore a lot of data on a Gene page, to examine additional features: variations and expression data, right in the sequence viewer.  A lot of people may not even be aware these tracks exist.

This video provides a walk-through of how to explore variation data and expression data from Gene pages, using the sequence viewer that’s embedded right on the page. Enhance your understanding of your genes of interest quickly using these additional track options.

I hope this demonstrates how you can add more information to your genes of interest from gene pages, which you might already use. But now you can use them better. You can take advantage of the great depth at NCBI, while staying up-to-date on the tools.

Speaking of staying up-to-date, which is a big need in this field: you should definitely keep an eye out for new features from NCBI. My favorite way is the announcement mailing list, because it has details of new stuff, upcoming webinars, data releases, etc. But you can also watch their blog and Twitter for new information all the time. They’ve been doing a lot of outreach–webinars, quick videos, and more. You should sign up to be notified, so you can stay current with the best data and tools. Check out their learning/webinars pages to see what I mean.

We are planning another video, and we’d love any feedback you have on this one.

Quick links:

NCBI Gene (subject of the video): https://www.ncbi.nlm.nih.gov/gene/

NCBI-announce mailing list: http://www.ncbi.nlm.nih.gov/mailman/listinfo/ncbi-announce

NCBI Insights blog: http://ncbiinsights.ncbi.nlm.nih.gov

NCBI Twitter: @NCBI

Learn (see webinars link): https://www.ncbi.nlm.nih.gov/home/learn.shtml

NCBI YouTube channel: https://www.youtube.com/channel/UCvJHVo5xGSKejBbBj0A5AyQ


NCBI Resource Coordinators (2014). Database resources of the National Center for Biotechnology Information, Nucleic Acids Research, 43 (D1) DOI: 10.1093/nar/gku1130

Disclosure: This video was sponsored and completed under contract to NCBI. 

Friday SNPpets

This week’s SNPpets include The Economist and C&EN covering gene editing in different but useful ways, new software for metagenomics, stranded RNA-Seq, the new CDC Public Health Genomics Knowledge Base, an IGB paper, Docker getting attention in biology, FDA precision medicine workshops, a nice popular press version of the ExAC (Exome Aggregation Consortium) story, and the heresy of Keith Robison, who doesn’t hate Excel enough.

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

NCBI to hold two-day genomics hackathon in January

Because this came to my email on the Wednesday before the holiday, it seemed to me that some people who might like to attend could miss it. So I just wanted to boost the signal a bit by re-posting it. It came from the NCBI Announcement mailing list, if you want to see the whole thing; I’m excerpting just some of it here. Note that there is an application step.

From January 5th to 7th, NCBI will host a genomics hackathon focusing on advanced bioinformatics analysis of next generation sequencing data. This event is for students, postdocs and investigators already engaged in the use of pipelines for genomic analyses from next generation sequencing data. Working groups of 5-6 individuals will be formed for DNA-Seq/multiomics, RNA-Seq, metagenomics and Epigenomics. These groups will build pipelines to analyze large datasets within a cloud infrastructure.

After a basic organizational session, teams will spend 2.5 days analyzing a challenging set of scientific problems related to a group of datasets. Students will analyze and combine datasets in order to work on these problems. This course will take place on the NIH main campus in Bethesda, Maryland.

Datasets will come from the public repositories housed at NCBI. During the course, students will have an opportunity to include other datasets and tools for analysis. Please note, if you use your own data during the course, we ask that you submit it to a public database within six months of the end of the event.

All pipelines and other scripts, software and programs generated in this course will be added to a public GitHub repository designed for that purpose. A manuscript outlining the design of the hackathon and descripting participant processes, products and scientific outcomes will be submitted to an appropriate journal.

To apply, complete the form linked below (approximately 10-15 minutes to complete). Applications are due December 1st by 5pm EST.

Participants will be selected from a pool of applicants; prior students will be given priority in the event of a tie. Accepted applicants will be notified on December 10th by 9am EST, and have until December 12th at noon to confirm their participation. Please include a monitored email address, in case there are follow-up questions.

[some stuff removed here, with requirements, pre-reqs, and some other details on the actual event stuff. See full version here.]

* Genomics hackathon application form: https://docs.google.com/forms/d/1isJT0Ns-5MHX8mH4xQnDEFbhlu4HombXspQQaADQoec/viewform

Hack away.

Video Tip of the Week: NCBI Variation Viewer

The folks at NCBI recently hosted a webinar that covered a number of resources: GTR, ClinVar, and MedGen. It was a nice introduction to these resources using a case study of exploring information about a 9-year-old child who needed to get clearance for participation in sports. So they follow the course of some details about this kid across the different resources at NCBI to show what you could learn at the different sites.

I was hoping that recording would become available so that could be a triple-tip of the week, but I haven’t seen any announcements of it; I’ll keep an eye out and highlight it in the future if it does. Below I have also referenced a paper that covers some of the same ground as that webinar. But in the meantime they also recently added a new short video about the Variation Viewer that I found handy as well. So that will be this week’s video tip.

I particularly liked the way you can easily select an exon to focus on, with the little bubbles near the top. That wasn’t obvious to me at first.  People are often asking me for handy ways to focus in on the specifics of a single exon.

In addition to this video, I will also offer a screen-cap of one of the slides from the longer webinar that linked to related resources around NCBI. If you haven’t checked out these associated tools you will want to look at them as well. There are a lot of terrific tools available and they are always adding new useful features. Follow them on Twitter for announcements about their tools and trainings–that’s how I stay on top of the new items.

Quick links:

Variation Viewer: http://www.ncbi.nlm.nih.gov/variation/view/

MedGen: http://www.ncbi.nlm.nih.gov/medgen

GTR: http://www.ncbi.nlm.nih.gov/gtr/

ClinVar: http://www.ncbi.nlm.nih.gov/clinvar/


Landrum M.J., G. R. Riley, W. Jang, W. S. Rubinstein, D. M. Church & D. R. Maglott (2014). ClinVar: public archive of relationships among sequence variation and human phenotype, Nucleic Acids Research, 42 (D1) D980-D985. DOI: http://dx.doi.org/10.1093/nar/gkt1113

Video Tip of the Week: NCBI Sequence Viewer PDF export

A couple of weeks back we did a workshop on the UCSC Genome Browser, and I was asked a question we see pretty frequently: Is there a way to export the browser view that you selected with specific tracks, filters, regions, etc? People may want to have a record of their customized view in a lab notebook, or use it for teaching, or in a seminar perhaps–or of course to publish your awesome observations in journals.

Most of the time I just take screen shots of what I need with a screen capture tool (my personal favorite is Snag-It from TechSmith). But there may be times you want something a bit heavier-duty. If you are going to do a poster, or submit it for publication, for example, you might want a nice PostScript version you can work with and edit further. At UCSC, the way to do that is with the “View” menu option here for PDF/PS:

Export the browser image to a file for further editing or use.


When you get a file, you can take it down and use Adobe graphics tools if you have them, or a free, open-source one like Inkscape. You can change the colors, delete stuff, add more annotations, etc.

So when I saw that there was a similar function in NCBI’s Sequence Viewer tool, I thought I should mention that as well. They have a nice and clear video that illustrates how to get the image out of the Viewer and into a file.

To try this yourself on the sample file they showed, you can go to NC_000022.10 in the Nucleotide database. From that page, click the “Graphics” link as shown here:

Click the "Graphics" link on the page to open the Sequence Viewer.


After you get to the Sequence Viewer, follow the instructions just as they play out in the YouTube video. It’s pretty straightforward–just be careful to click the right menu for PDFs.
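If you would rather jump straight to the record than click through, the links can be built by hand. Here is a minimal Python sketch, not anything taken from NCBI’s video: the helper names are mine, and the `report=graph` query parameter is my assumption about how the “Graphics” link is addressed, so check it against the link on the page before relying on it.

```python
from urllib.parse import urlencode

# Base address of the Nucleotide database record pages.
NUCCORE_BASE = "https://www.ncbi.nlm.nih.gov/nuccore/"


def record_url(accession):
    """Return the Nucleotide record page for an accession such as NC_000022.10."""
    return NUCCORE_BASE + accession


def graphics_url(accession):
    """Return the assumed address of the record's "Graphics" (Sequence Viewer) view.

    Assumption: the Graphics link is the record page plus ?report=graph.
    """
    return record_url(accession) + "?" + urlencode({"report": "graph"})


if __name__ == "__main__":
    # Print both links for the sample record used in the video.
    print(record_url("NC_000022.10"))
    print(graphics_url("NC_000022.10"))
```

This only builds the addresses; the PDF export itself still happens in the viewer’s menus, as shown in the video.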

If you haven’t used the NCBI Sequence Viewer much, you should definitely check it out. There are some other helpful videos for more features as well. And another neat feature is that you can embed the Sequence Viewer in your own web pages.

All of the genome browsers have different features and functions, and it’s nice to know that there are various strategies to accomplish tasks you might need to get done.

Quick links:

Sequence Viewer homepage http://www.ncbi.nlm.nih.gov/tools/sviewer/

Videos: http://www.ncbi.nlm.nih.gov/tools/sviewer/videos/


Karolchik D., Barber G.P., Casper J., Clawson H., Cline M.S., Diekhans M., Dreszer T.R., Fujita P.A., Guruvadoo L., Haeussler M. et al. (2013). The UCSC Genome Browser database: 2014 update, Nucleic Acids Research, 42 (D1) D764-D770. DOI:

Acland A., Agarwala R., Barrett T., Beck J., Benson D.A., Bollin C., Bolton E., Bryant S.H., Canese K., Church D.M. et al. (2013). Database resources of the National Center for Biotechnology Information, Nucleic Acids Research, 42 (D1) D7-D17. DOI:

Antique (or maybe “classic”) UCSC Genome Browsers retired

Everyone is pretty comfortable with the concept of the non-standard but commonly used dog-years as a way to compare life spans to humans. Car enthusiasts have a taxonomy for the ages of vehicles. But I’ve been sitting here wondering what the genome-browser-years scale should be. I’ve been thinking about it because of the recent announcement over the UCSC list the other day:

Change in visualization access for old assemblies

Hello everyone!

Over the past 12 years we have made efforts to maintain visualization of many old, archived assemblies on our Archive server at http://genome-archive.cse.ucsc.edu/, in addition to providing download access to the associated data sets. Unfortunately, this visualization is no longer sustainable for very old assemblies due to the many changes in the Genome Browser software as it has matured. We are therefore reducing access to certain old assemblies to data downloads only, and are announcing the shutdown of our Archive server. We will continue to provide Genome Browser access for the 4 most current human assemblies, and at least the 2 most current assemblies for all other organisms with some exceptions. We will discontinue our visualization support for all other old assemblies, but will continue to make these data sets available on our download servers. The assemblies currently on our archive server for which we have discontinued visualization support include early human assembly drafts, hg4, hg5, hg6, hg7, hg8, hg10, hg11, hg12, hg13, hg15, rn1, rn2, mm1, mm2, mm3, mm4, mm5, rheMac1, bosTau1, ce1, danRer1, and danRer2. Links to the data and annotations associated with these assemblies have been added in the appropriate places on our Downloads page at http://hgdownload.soe.ucsc.edu/. Please contact us if you have difficulty locating a data set of interest.

Matthew Speir
UCSC Genome Bioinformatics Group

A view of the UCSC Genome Browser in the early days, ~2004.


I’m sure the browser versions aren’t used very much anymore, but it was something I needed to be aware of. In our workshops I mention that the older versions have been available from the “archives” navigation on the landing page, but that will be gone now. Occasionally there are old papers that reference a genomic span that you want to revisit–but that’s becoming less common for those really old assemblies at this point. The data will persist for downloading, but the browser visuals will be gone. But it made me want to go back and look through my old materials to see what the early browsers looked like (click on the image to embiggen). A lot of the foundational structure is the same, but if you look at one of these old assemblies you might be surprised.  A lot fewer tracks, that’s for sure. In my shot, there are only a few species in the Conservation track (human, chimp, mouse, rat, chicken). We just didn’t have that much data available–not just across species, but other types of techniques and tools. Fewer tracks. Fewer functionality buttons.

This was almost as much fun as looking back at the old NCBI interfaces that I remember from way back. Those of you who have been in this rodeo for a while may remember those. Some of you will even remember the key “Pedro’s Tools” from back in the day. Sometimes it’s worth looking back at where we came from to realize how much further we are than we realized. I know there’s a lot of grousing about not having cured cancer and changed the pharmaceutical industry with the human genome sequence yet–but we really haven’t had the data that long in browser-years. Or maybe it’s more like Mars years–longer than you realize, despite the fairly comparable span of a single day. The arc of science is long, but it bends toward answers.

Edit: I realized I should look at the earliest paper and add that reference below. Check out Figure 1 for an even older view of the data.

Quick link:

UCSC Genome Browser: http://genome.ucsc.edu


Kent W.J., Sugnet C.W., Furey T.S., Roskin K.M., Pringle T.H., Zahler A.M. & Haussler D. (2002). The Human Genome Browser at UCSC, Genome Res., 12 996-1006. DOI:

Tip of the Week: NCBI Genome Workbench

Today’s tip is from NCBI. Specifically, NCBI’s Genome Workbench. The workbench is

…an integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publically available sequence databases at NCBI, and mix this data with your own private data.

It’s a useful program and they have a great set of videos to introduce you to the workbench’s functions and features. The video embedded here is the introduction, but they also have several additional videos including how to load a genome into the workbench, phylogenies and others. Check it out.

(Forgive the delay of this week’s tip. Snow canceled work and knocked out Internet access!)

Video Tip of the Week: 1000 Genomes Dataset Browser from NCBI

A recent NCBI Newsletter announced the release of a new resource named the 1000 Genomes Dataset Browser, and that is the resource that I will be featuring in this tip. It is one of the tools available through the new NCBI Variation resources page, which also features resources such as dbSNP, dbVar, dbGaP and ClinVar (many of which OpenHelix has tutorials for) as well as other variation tools – Variation Reporter (pre-release version), Clinical Remap (beta version) and the Phenotype-Genotype Integrator.

Before I discuss NCBI’s 1000 Genomes Dataset Browser, I’d like to spend a bit of time on the 1000 Genomes project, in order to distinguish what is from NCBI and what is from the project itself. From the 1000 Genomes Pilot paper:

“The aim of the 1000 Genomes Project is to discover, genotype and provide accurate haplotype information on all forms of human DNA polymorphism in multiple human populations. Specifically, the goal is to characterize over 95% of variants that are in genomic regions accessible to current high-throughput sequencing technologies and that have allele frequency of 1% or higher (the classical definition of polymorphism) in each of five major population groups (populations in or with ancestry from Europe, East Asia, South Asia, West Africa and the Americas).”

You can access the full paper from the link below. The project has now moved past the pilot phase and is releasing new data all the time. You can see announcements and project details, or access that data, through the official 1000 Genomes project site, or through the official 1000 Genomes version of the Ensembl Browser. As you might imagine for a “big data” project such as this, data has been added to a variety of NCBI databases, including dbSNP, the Sequence Read Archive (SRA) and BioSample. Although you could search for this data through the universal Entrez search system, previously to view the data you would have to view individual results at each separate database. The 1000 Genomes Browser at NCBI has been created as a powerful interface for comprehensively searching for, and viewing, 1000 Genomes data contained in NCBI resources on a single page.
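To make the 1% threshold in the pilot paper’s quote concrete, here is a small Python sketch of the arithmetic. It is only an illustration: the function names and the genotype counts are made up for the example, it assumes diploid genotypes at a single site with two alleles, and it is not how the project or NCBI computes frequencies.

```python
def allele_frequency(hom_ref, het, hom_alt):
    """Frequency of the variant allele, given counts of diploid genotypes.

    hom_ref: people with two reference copies
    het:     people with one variant copy
    hom_alt: people with two variant copies
    """
    total_alleles = 2 * (hom_ref + het + hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    return (het + 2 * hom_alt) / total_alleles


def is_polymorphism(freq, threshold=0.01):
    """Apply the classical definition: allele frequency of 1% or higher."""
    return freq >= threshold


if __name__ == "__main__":
    # Hypothetical counts, chosen only to land on either side of the threshold.
    common = allele_frequency(hom_ref=97, het=3, hom_alt=0)
    rare = allele_frequency(hom_ref=199, het=1, hom_alt=0)
    print(common, is_polymorphism(common))
    print(rare, is_polymorphism(rare))
```

In the first case 100 people carry 200 alleles, so 3 variant copies come to 1.5% and clear the threshold; in the second, 1 copy among 400 alleles is 0.25% and does not.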

In the video tip I will familiarize you with the various areas of the page - the browser is built from a series of widgets, each with its own function. I will not be able to cover all of the features, or demonstrate how users can upload their own variation data to the browser – I’ll leave you the fun of exploring those on your own. Because the tool is so young, bug reports and suggestions/comments are still being actively requested – if you find something, check out the FAQs (which discuss bugs at various stages of being fixed) and then email the team.

Quick Links:
NCBI Newsletter announcement July 20, 2012: http://1.usa.gov/RQu5dR

NCBI Variation page: http://www.ncbi.nlm.nih.gov/variation/

NCBI 1000 Genomes Browser page:

1000 Genomes Project site: http://www.1000genomes.org/home

The 1000 genomes project specific version of the Ensembl Browser:

The 1000 Genomes Project Consortium (2010). A map of human genome variation from population-scale sequencing, Nature, 467, 1061-1073 DOI: 10.1038/nature09534

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Video Tip of the Week: the New PubMed Filters Sidebar

In today’s tip I am linking to a YouTube video from NCBI that briefly explains the new Filters Sidebar feature that has been added to PubMed. We first saw a tweet that the change was coming back on May 2nd, just as I was completing a total update to our full PubMed tutorial*.

I struggled with whether to hold our production team for the new sidebar, or to produce our tutorial with the plan to update it in the near future. It is always a struggle to know which is the best option, because resource changes can occur at the speed of light or on geological time scales (OK, that’s an exaggeration, but it feels that way when you want to release a wonderful, up-to-date project & something holds you up and delays publication of the tutorial materials). With PubMed I was lucky: I saw a tweet that the sidebar feature would be added “in the next week”. I asked our voice professional to put the script on hold & I paced around PubMed waiting to see what (& when) things would occur.

True to their word, the sidebar feature showed up on PubMed results on May 10th, exactly one week since I had seen the “in the next week” announcement – my THANKS to the NCBI & PubMed Teams! :) Not only did they push out their updates in a timely manner, they made a YouTube video explaining the changes & discussing where future changes are slated to go. The video is clear, and quick, so I am using it as my tip this week. I’m not sure the feature is 100% stable, as I show in the image below, and describe later in the post, but I think the change might accomplish NCBI’s goal – for more people to notice & utilize filters for their searches.

In the video the narrator states that the filters area is gone & the two default filters are permanently selected, as indicated by the check marks that can’t be “unclicked”. I’m not seeing those check marks on either the “Free full text available” link (shown) or the “Review” link, which is not in view in my image. I also see a difference as to whether I get the right filtered subsets depending on whether I am logged into My NCBI (the upper window shown in the back of the image), or not (the lower, front window). In my hands IE 9.0 & Firefox 12.0 both function similarly in these aspects.

The NCBI video doesn’t really show how results look after filters are added, but from playing with it, it looks to me like all of your filters are applied to your search & you only get one set of results, not links to various subsets. Although it is now easier to add filters to searches, if that’s how filters are going to work going forward, I think I will miss the old filters – I kind of like being able to switch between various subcategories of results without having to change my filters or rerun searches. Be sure to share your thoughts & preferences with NCBI so that they can create the best resource for their users’ needs!

* OpenHelix tutorial for this resource available for individual purchase or through a subscription.

Quick links:

OpenHelix Introductory Tutorial on using PubMed (soon to be updated): http://www.openhelix.com/cgi/tutorialInfo.cgi?id=70

PubMed Resource: http://www.pubmed.gov/

PubMed Reference:
Sayers, E.W., Barrett, T., Benson, D.A., Bolton, E., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M., DiCuccio, M., Federhen, S. et al. (2011). Database resources of the National Center for Biotechnology Information, Nucleic Acids Research, 40 (D1) D25. DOI: 10.1093/nar/gkr1184