The last tip of the week I did was Genome Variation Tour I, where we started our journey following one SNP in an individual’s genome through various databases to see what we can find out about that variation. In that tip we started out by looking at a SNP in the CYP4F2 gene in the UCSC Genome Browser and followed it to dbSNP. Today’s tip will continue our journey to OMIM to see what information we can find there. We’ll find this variation is clinically associated with warfarin dosage effects, and specifically that this individual’s C/T heterozygosity indicates an intermediate dosage for effectiveness, if indeed he ever needed this drug. In some ways, your guess is as good as mine as to what we will find and what avenues we will take in the next few tips. I am discovering information as I go along too. I can tell you, though, that the next installment of the genome variation tour will take us to PubMed, perhaps to a few gem databases that are not particularly well known, and probably back to the UCSC Genome Browser to expand our look at the interactions of several variations in this individual’s genome.
Comprehensive tutorials on the publicly available NCBI resources enable researchers to quickly and effectively use these invaluable resources.
The National Center for Biotechnology Information, NCBI, is home to many of the most commonly used publicly available databases and tools in molecular biology today. They house such popular and widely used databases as GenBank, PubMed, GEO, Entrez Gene, Entrez Protein, and more. NCBI also produces, maintains and updates a variety of tools, like the large family of BLAST sequence similarity searching tools and the Entrez search and retrieval tools. In addition, they provide an extensive variety of services for education, news dissemination and different types of data submission. This tutorial presents a broad overview of NCBI’s databases, tools, educational resources and data submission protocols. In addition to an update on this overview, OpenHelix has updated both its PubMed and GEO tutorials. PubMed is the premier search engine for biomedical literature. More than 18 million citations from life science journals can be searched through this free service. The Gene Expression Omnibus, or GEO, is a valuable resource designed to store high-throughput gene expression and molecular abundance data. These three tutorials, in conjunction with the many other up-to-date OpenHelix tutorials on NCBI resources such as BLAST, Entrez, dbSNP, MMDB, Viral resources, MapViewer and others, will give you a set of training resources to help you be efficient and effective at accessing and analyzing genome data.
The tutorial suites, available through an annual OpenHelix subscription, contain an online, narrated, multimedia tutorial, which runs in just about any browser connected to the web, along with slides with full script, handouts and exercises. With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. The scripts, handouts and other materials can also be used as a reference or for training others.
These tutorials will teach users:
*to understand the basic structure of NCBI and its different types of resources
*to navigate NCBI to find the databases and analysis tools you need
*what types of educational resources are available at NCBI
*basic data submission procedures and background information
*how to search the entire NCBI site, as well as just the subset of Entrez databases
*basic, advanced, and Boolean search methods
*additional searching methods like the Entrez Global query and the MeSH query
*tips to understand the visual cues and displays
*to use My NCBI to customize your results and save searches which can be run and emailed regularly
Gene Expression Omnibus (GEO)
*efficient ways to query GEO for specific genes or experimental designs
*how to navigate through GEO output displays to find the specific information you want
*how to navigate GEO’s complex data architecture to search GEO by specific record types
OpenHelix, LLC, (www.openhelix.com) provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web. The OpenHelix Search portal searches hundreds of resources, tutorial suites and other material to direct researchers to the most relevant resources and OpenHelix training materials for their needs. Researchers and institutions can save time, budget and staff resources by leveraging a subscription to nearly 100 online tutorial suites available through the portal. More efficient use of the most relevant resources means quicker and more effective research.
Today’s tip of the week is actually the first in a series of tips I will be doing over the next couple of months. A recent paper in The Lancet presented a clinical assessment of an individual genome. In doing so, the researchers used various genomic resources to ascertain and interpret the data. We have a free tutorial on NIEHS SNPs that walks through some of these resources, but I thought it might be useful to follow one specific nucleotide variation through a lot more genomic databases to show the user what data is available and how to access it. Each tip I do over the next couple of months (not every week, I do tips every 2-3 weeks) will follow a specific SNP through the databases. In this case, that SNP is rs2108622 in the CYP4F2 gene (cytochrome P450, family 4). These tips aren’t for the genome jockeys and SNP surfers among us; they are more an introductory tour of what’s out there. They will be useful for those just starting to look at genomic variations, medical practitioners, clinicians or those just curious about what is available. Today’s tip will start with the UCSC Genome Browser, find the variation and follow it through to dbSNP. The next tip will look closer at the dbSNP information and then follow the trail to OMIM and GeneTests. In later tips we’ll take the variation to another 4-6 databases and genomic variation resources, from HapMap and others. In the posts themselves I’ll link to still other variation databases. There is a plethora of them.
The Lancet paper, Clinical assessment incorporating a personal genome, has held my fascination this weekend (yes, I read it at the beach). Mary posted Friday and again Saturday on the paper and related NPR segment. It feels to me to be a seminal paper, though I do agree with Daniel at Genetic Future: there is a lot there we still don’t know. A large portion of the variation is in non-coding regions, and thus predictions and propensities are hard to come by with the available analysis. In fact, as he pointed out, many of the coding region variations have little information as to their effect on disease. I would add also that even if we get to that holy grail of $1,000 to sequence a personal genome, this kind of extensive analysis would still be time- and cost-prohibitive for the vast majority of sequenced genomes.
Yet, as with all early steps in science and medicine, there are missing pieces, large gaps and huge efforts (think “space travel,” “computers,” “microwave ovens,” “internet”) that over time become inexpensive and commonplace (ok, so the first isn’t necessarily “inexpensive”). Sequencing genomes will become inexpensive before the analysis does, but both will come. And I think this paper is pointing to that future.
The other hurdle to large-scale personal genomics I see (of course) is the understanding and use of the genomics and data resources. The authors use a large (and excellent, in my opinion) suite of genomics resources to obtain data and do their analysis. I’ll list them here with links in alphabetical order:
All of these resources have a wealth of data, but each tool still requires a lot of analysis and familiarization. Each tool does have documentation and tutorials, and of course OpenHelix has tutorials on many of the ones mentioned (those with linked “T”s after the name). Still, this one analysis took a large number of tools and a lot of familiarization with them.
The paper does have a pretty good figure (figure 1) outlining the analysis process. For example, they SIFTed the genome to find gene-associated, non-synonymous, rare, novel and disease-associated variations and then used dbSNP, HGMD, OMIM and PubMed to analyze something like HFE2, which might have an association with haemochromatosis. One of my quibbles with the paper, as is often the case with these papers, is that there isn’t a good methods ‘walk-through’ using something like Galaxy or Taverna, in a history or workflow, that would help reproduce the analysis.
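To make that kind of triage a little more concrete, here is a minimal Python sketch of filtering a list of variants down to the ones worth looking up by hand. To be clear, this is my own illustration and not the authors' pipeline: the `Variant` record, its field names, the 1% frequency cutoff and the placeholder records are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    """Hypothetical record for one called variant (field names are assumptions)."""
    rsid: Optional[str]            # None = not found in dbSNP, treated here as "novel"
    gene: Optional[str]            # None = not associated with any gene
    nonsynonymous: bool            # True if the variant changes an amino acid
    allele_freq: Optional[float]   # population frequency, None if unknown


RARE_CUTOFF = 0.01  # assumed threshold for "rare"; not taken from the paper


def is_candidate(v: Variant) -> bool:
    """Keep gene-associated, non-synonymous variants that are rare or novel."""
    if v.gene is None or not v.nonsynonymous:
        return False
    novel = v.rsid is None
    rare = v.allele_freq is not None and v.allele_freq < RARE_CUTOFF
    return novel or rare


def triage(variants: List[Variant]) -> List[Variant]:
    """Return only the variants that pass the candidate filter."""
    return [v for v in variants if is_candidate(v)]


# Placeholder records, invented purely to exercise the filter.
example_variants = [
    Variant("rs_example_1", "GENE_A", True, 0.30),   # common: dropped
    Variant(None, "GENE_B", True, None),             # novel: kept
    Variant("rs_example_2", None, False, 0.001),     # no gene: dropped
    Variant("rs_example_3", "GENE_C", True, 0.004),  # rare: kept
]
kept_genes = [v.gene for v in triage(example_variants)]
```

Whatever survives a filter like this would still need to be followed through dbSNP, HGMD, OMIM and PubMed by a person, which is where most of the time goes.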
We also have a tutorial I’d like to point you to, one that walks through a similar process and teaches users the basics of that process. You can find this tutorial here; it’s free and publicly available. The tutorial walks the user through the analysis of a gene variation, in this case in CYP2C9, that affects an individual’s response to warfarin. There is a similar variation (different gene, affects the same drug response) in the paper. The tutorial uses the NIEHS SNPs site to get an overview of the variation including SIFT and PolyPhen predictions, then goes to the UCSC Genome Browser to find an overview of the region, walks through the dbSNP information and does a quick tag SNP analysis using GVS. That tutorial is only one very small step in what will have to be an immense education into genomics and genomics resources.
That is all to point out that the paper is a fascinating first step, and as a first step suggests the gaping holes we will have in bringing personal genomics to medicine.
Ashley, E., Butte, A., Wheeler, M., Chen, R., Klein, T., Dewey, F., Dudley, J., Ormond, K., Pavlovic, A., & Morgan, A. (2010). Clinical assessment incorporating a personal genome. The Lancet, 375 (9725), 1525-1535. DOI: 10.1016/S0140-6736(10)60452-7
There are a lot of databases to search for SNP data: HapMap, dbSNP, SeattleSNPs, Genome Variation Server and many more. I’m going to add one more to your data mining arsenal, F-SNP. F-SNP (described more fully here in the 2008 NAR Database issue),
provides integrated information about the functional effects of SNPs obtained from 16 bioinformatics tools and databases. The functional effects are predicted and indicated at the splicing, transcriptional, translational, and post-translational level. As such, the F-SNP database helps identify and focus on SNPs with potential pathological effect to human health.
…as they say in the introduction. It looks to be a good first stop to find SNPs of functional relevance. The databases they pull from to get their information include several I’ve mentioned above and also the UCSC Genome database, Ensembl, SIFT and PolyPhen predictions and more. I’ve given a quick intro in the tip this week on how to get functional SNP information from F-SNP.
dbSNP is the largest polymorphism database available, including SNPs from many different organisms. dbSNP now has a new search mechanism that allows the researcher to search using HGVS nomenclature for human variation. Not only this, but the feature allows you to annotate the dbSNP rs record that you found, or if you haven’t found one, add the new information to the database. To find out more about HGVS nomenclature, you should check out these recommendations. And, as a side note to this tip, you might want to check out their list of human variation databases and resources.
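If you haven't seen HGVS names before, the simplest ones follow a "reference sequence : coordinate type . position change" pattern. Here is a small Python sketch that pulls apart that simplest case, a single-base substitution on coding (`c.`) or genomic (`g.`) coordinates. It is only an illustration of the naming pattern under my own assumptions: the function name is made up, the accession in the usage note is a placeholder rather than a real record, and the real nomenclature (and whatever parsing dbSNP does) covers many more cases, such as deletions, insertions and intronic offsets, that this ignores.

```python
import re

# Matches only simple substitutions such as "<accession>:c.<position><ref>><alt>".
# Anything more elaborate (indels, intronic offsets, protein-level names) is
# deliberately out of scope for this sketch.
_HGVS_SUBSTITUTION = re.compile(
    r"^(?P<reference>[A-Z]{2}_\d+(?:\.\d+)?)"   # accession, optional version
    r":(?P<coordinate_type>[cg])\."             # c = coding DNA, g = genomic
    r"(?P<position>\d+)"                        # 1-based position
    r"(?P<ref_base>[ACGT])>(?P<alt_base>[ACGT])$"
)


def parse_hgvs_substitution(name):
    """Split a simple HGVS substitution into its parts, or return None."""
    match = _HGVS_SUBSTITUTION.match(name.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["position"] = int(fields["position"])
    return fields
```

For example, calling `parse_hgvs_substitution("NM_000000.0:c.100C>T")` (a made-up name) gives back the accession, the `c` coordinate type, position 100 and the C to T change as separate fields, while something that isn't in this form, like a bare rs number, comes back as `None`.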
As we go through the thousands of resources and databases available online in our search for tutorial subjects, we find many that are great resources but that, for one or more reasons, we don’t or can’t do a tutorial for. So we occasionally do a “Tip of the Week” on some, but even those are not enough to touch on all the great resources out there, so from time to time we are going to give a quick “shout out” to some of them.
So today it’s F-SNP.
Mary pointed out (and I’ve Tivo’d for my daughter, the great lover of rats) that the History channel had a special on rats recently. Well, not to be outdone, the May issue of Nature Genetics is all about rats. There are some great articles in that issue about rat genetics and rat genetics as a model for human disease genetics. And if you are at all interested in rat genomics, the article “What everybody should know about the rat genome and its online resources” is a very useful read (and freely accessible, I believe). The article will walk you through some of the rat genome data and tools available at Ensembl, NCBI, UCSC Genome Browser and RGD *. Definitely worth checking out.
For funding reasons, NCBI (home of PubMed, BLAST, dbSNP, OMIM and more) has cut their outreach staff and canceled all onsite training seminars, and this has to mean decreased support for online help, documentation and tutorials.
When we wrote our NIH grant, one of the models of success in the bioinformatics training area that we highlighted was the NCBI Field Guide program. For those who may be unfamiliar with it, it is a set of training modules delivered by the outreach team at NCBI. They would come to your site, cover many NCBI tools and do hands-on workshops. Another course (Enhanced Field Guide) drew science librarians and other trainers together to train them, and those folks could go back to their institutions and offer more-and-better searches and training for their constituents. We thought the Guides were a terrific group of people who were interested in people getting their hands on the myriad tools at NCBI and using them effectively. It wasn’t really a competitive situation: their remit was only for NCBI tools, and there were plenty of others out there for us to do. In fact, many people who contacted us for training did so because their local users enjoyed the NCBI training and they wanted similar engagements for other tools.
Recently, though, the calls changed. We found we were getting calls from people who said they weren’t going to be able to get any more Field Guide trainings. NCBI is discontinuing the outreach program. Quite frankly, we were surprised. A sample of the notifications people were getting: http://www.library.uiuc.edu/blog/bicnews/archives/2008/02/ncbi_field_cour.html
Unfortunately, that tremendous training opportunity will NOT occur. Yesterday NCBI Field Guide coordinator, Peter Cooper, sent the following email:
Because of budgetary constraints, NCBI has made reductions in some of its programs, and the education programs are affected. In fact, all outreach education programs (Field Guide, Mini-courses, Structures, PubChem) are terminated effective immediately. At this point we cannot reschedule this course or accept requests for future courses of any kind. This was as much a surprise to me as it is to you. Feel free to contact me if you have questions.
The Field Course, as well as the Mini-Courses and the Structure course, has been tremendously popular and useful (see list of sites where the Field Course has been offered recently), but the NCBI budget situation will not allow NCBI to continue to travel and offer these courses for the foreseeable future.
Here’s a link to a similar letter at another location: http://www.twu.edu/as/bio/NCBI/FieldGuide/
We’ve confirmed this with a number of people directly involved; they have laid off nearly all of the outreach team. Some got reassigned. There can hardly be anyone there to even answer emails to the helpdesk anymore—and they get lots of emails every day.
I’ve been through layoffs before, a few times. It actually feels like a punch to the gut when I hear about it anywhere else—especially among people I know. I expect layoffs at companies, though. But if there was any group that was solidly in place, going to be around for a long time, I would have thought it was the NCBI outreach team. I’m quite sorry to hear that it has been dissolved.
In this time of so many resources and so much need for increased understanding, outreach has become an integral part of a resource’s success. Fewer instructional resources are an unfortunate consequence of decreased funding for science.
A lot of researchers use dbSNP, but some don’t know that you can BLAST the dbSNP database with a query sequence of interest to find out if there is a polymorphism reported for a homologous sequence in the database. You can, and it’s simple: use BLAST SNP. It’s not a prominent link, so you might have overlooked it. This quick tip will show you where it is and, briefly, how to use it.
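Once you have a hit, the question is usually "which allele does my sequence carry at the polymorphic position?" Here is a toy Python sketch of that last step. It is not BLAST and does not talk to NCBI. It assumes you already have your query and the matching stretch of the database sequence lined up at equal length with no gaps, and that the database sequence marks a two-allele SNP with an IUPAC ambiguity code (for example `Y` for C/T). The function name and the sequences in the usage note are invented for the example.

```python
# IUPAC ambiguity codes for two-allele positions.
IUPAC_TWO_ALLELE = {
    "R": {"A", "G"},
    "Y": {"C", "T"},
    "S": {"C", "G"},
    "W": {"A", "T"},
    "K": {"G", "T"},
    "M": {"A", "C"},
}


def alleles_in_query(query, subject):
    """Report the query base at each ambiguity-coded position of the subject.

    Both sequences must already be aligned, ungapped and of equal length.
    Returns a list of (index, query_base, is_known_allele) tuples, where
    index is 0-based and is_known_allele says whether the query base is one
    of the alleles encoded by the subject's ambiguity code.
    """
    if len(query) != len(subject):
        raise ValueError("sequences must be the same length (ungapped alignment)")
    calls = []
    for index, (q_base, s_base) in enumerate(zip(query.upper(), subject.upper())):
        if s_base in IUPAC_TWO_ALLELE:
            calls.append((index, q_base, q_base in IUPAC_TWO_ALLELE[s_base]))
    return calls
```

With the made-up sequences `alleles_in_query("ACGTTGCA", "ACGYTGCA")`, the function reports that the query carries a T at the position coded `Y`, one of the two reported alleles. A real BLAST SNP result also involves gaps, strand and partial matches, so treat this only as a way to picture what the alignment is telling you.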