Before I discuss NCBI’s 1000 Genomes Dataset Browser, I’d like to spend a bit of time on the 1000 Genomes project, in order to distinguish what is from NCBI and what is from the project itself. From the 1000 Genomes Pilot paper:
“The aim of the 1000 Genomes Project is to discover, genotype and provide accurate haplotype information on all forms of human DNA polymorphism in multiple human populations. Specifically, the goal is to characterize over 95% of variants that are in genomic regions accessible to current high-throughput sequencing technologies and that have allele frequency of 1% or higher (the classical definition of polymorphism) in each of five major population groups (populations in or with ancestry from Europe, East Asia, South Asia, West Africa and the Americas).”
You can access the full paper from the link below. The project has now moved past the pilot phase and is releasing new data all the time. You can see announcements and project details, or access that data, through the official 1000 Genomes project site, or through the official 1000 Genomes version of the Ensembl Browser. As you might imagine for a “big data” project such as this, data has been added to a variety of NCBI databases, including dbSNP, the Sequence Read Archive (SRA) and BioSample. Although you could search for this data through the universal Entrez search system, previously to view the data you would have to view individual results at each separate database. The 1000 Genomes Browser at NCBI has been created as a powerful interface for comprehensively searching for, and viewing, 1000 Genomes data contained in NCBI resources on a single page.
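For a sense of what searching each database separately involved, here is a minimal sketch that builds one Entrez E-utilities `esearch` request per database. It is only an illustration and not how the 1000 Genomes Browser itself works: the search term "1000 genomes" and the helper names are my own assumptions, and the script only builds the URLs; it does not send any requests.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Entrez database names for dbSNP, the Sequence Read Archive and BioSample.
DATABASES = ("snp", "sra", "biosample")


def build_esearch_url(db, term, retmax=5):
    """Build (but do not send) an ESearch URL for one Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def build_all_esearch_urls(term):
    """Return one ESearch URL per database: one separate search each."""
    return {db: build_esearch_url(db, term) for db in DATABASES}


if __name__ == "__main__":
    # "1000 genomes" is an illustrative query, not an official search string.
    for db, url in build_all_esearch_urls("1000 genomes").items():
        print(db, url)
```

Each URL returns its own result list, which is exactly the database-by-database viewing that the single-page browser is meant to replace.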
In the video tip I will familiarize you with the various areas of the page - the browser is built from a series of widgets, each with its own function. I will not be able to cover all of the features, or demonstrate how users can upload their own variation data to the browser – I’ll leave you the fun of exploring those on your own. Because the tool is so young, bug reports and suggestions/comments are still being actively requested – if you find something, check out the FAQs (which discuss bugs at various stages of being fixed) and then email the team.
In looking through the 2012 Web Server Issue of the journal Nucleic Acids Research (NAR), I couldn’t help noticing resource names that revealed a bit about the developers’ sense of humor, such as “TaxMan” and “XXmotif”. There were others on the list (“MAGNET”, “GENIES” and “VIGOR”, for example) whose names made me cringe imagining someone trying to find them with the average search engine. [Our family's favorite such resource is iHOP, or Information Hyperlinked Over Proteins - I gotta think that the developers aimed at that name in honor of the other IHOP and breakfasts everywhere.]
I scrolled through many such names until I found a resource to feature in today’s tip. I wanted something dealing with a current topic – they all pretty much fit that criterion – and one that I was interested in, but that was outside my “normal area of expertise”. I decided on “MetaboAnalyst 2.0”. It is described in the article “MetaboAnalyst 2.0—a comprehensive server for metabolomic data analysis” as follows:
“MetaboAnalyst is a web-based suite for high-throughput metabolomic data analysis. It was originally released in 2009… MetaboAnalyst 2.0 now includes a variety of new modules for data processing, data QC and data normalization. It also has new tools to assist in data interpretation, new functions to support multi-group data analysis, as well as new capabilities in correlation analysis, time-series analysis and two-factor analysis. We have also updated and upgraded the graphical output to support the generation of high resolution, publication quality images.”
As I often do, I began “exploring” MetaboAnalyst 2.0 by reading their NAR article. It is well written and describes how the goal of the interface is to be user friendly and intuitive, so I headed over to MetaboAnalyst 2.0 to “kick some tires”, so to speak. I found that the interface is quite easy & intuitive to use. And to really help users understand the resource before launching into uploading their own data, the developers provide a wide range of example data sets that users can play with, as well as step-by-step guides (pdf, PowerPoint, & two articles that require journal subscriptions, no videos yet). In my video I use one of their datasets & show a quick example of some analysis steps. Of course there isn’t time to fully cover MetaboAnalyst 2.0, but hopefully I show you enough to tempt you to try it out on your own.
*Please note that the developers suggest that you download results immediately: all user data is treated as private and confidential by MetaboAnalyst 2.0 and will remain on the server for only 72 hours before being automatically deleted.
References: Jianguo Xia, Rupasri Manda, Igor V. Sinelnikov, David Broadhurst, & David S. Wishart (2012). MetaboAnalyst 2.0—a comprehensive server for metabolomic data analysis Nucleic Acids Research, 40 (W1) DOI: 10.1093/nar/gks374
Jianguo Xia, Nick Psychogios, Nelson Young, & David S. Wishart (2009). MetaboAnalyst: a web server for metabolomic data analysis and interpretation Nucleic Acids Research, 37 (suppl 2), W652-W660 DOI: 10.1093/nar/gkp356
“[The GTR] provides a central location for voluntary submission of genetic test information by providers. The scope includes the test’s purpose, methodology, validity, evidence of the test’s usefulness, and laboratory contacts and credentials. The overarching goal of the GTR is to advance the public health and research into the genetic basis of health and disease.”
I’m always interested in checking out new resources from NCBI, especially when it is my turn to do a weekly tip. Initially I figured that I would check out the GTR and post a video on how to use it – but the NCBI beat me to that. You can see their YouTube tips (there are two) by clicking the link on their homepage & learn some search tips, etc. [Note, the two videos continued to loop for me & I needed to stop them after viewing them once].
But the question that I came up with is, “What will the GTR provide me with that I am not already getting from other clinical resources that I use, and that OpenHelix trains on?” I try to address that question in my video by doing the same search, for “Cystic fibrosis”, at five different clinically-related resources, and discussing what each offers and specializes in doing. Of course, in a five minute video I can’t be comprehensive – either for resources or what they cover – but I think it will give you enough of a taste for you to appreciate what the GTR offers you, or to continue the comparison on your own.
The resources that I visit in the tip movie are: the GTR, GeneTests, the Genetics Home Reference (GHR), OMIM, and Orphanet. At each resource I do a basic search for the disease “Cystic fibrosis” and show the initial results display. I don’t have time to compare the detailed reports available at each, but lower on the post I link to a reference on the resource (if available), as well as the landing page for OpenHelix training materials on the resource – since we have a tutorial on many of these resources. I also include direct links to each resource.
I’d suggest that you read the NIH News article on the GTR release for some background on the GTR. I won’t cover everything here, but there are a couple of paragraphs that I want to draw your attention to. The first explains the relationship between GeneTests and GTR, and says:
“GTR is built upon data pulled from the laboratory directory of GeneTests, a pioneering NIH-funded resource that will be phased out over the coming year. GTR is designed to contain more detailed information than its predecessor, as well as to encompass a much broader range of testing approaches, such as complex tests for genetic variations associated with common diseases and with differing responses to drugs. GeneReviews, which is the section of GeneTests that contains peer-reviewed, clinical descriptions of more than 500 conditions, is also now available through GTR.”
It seems to be another case where it was deemed easier to start a new resource (GTR) than to try and revamp an old resource (GeneTests) to handle the amazing influx of new data. Often resources aren’t retired as soon as expected, due to user feedback, but it is important to note that GTR seems to be in place to eventually replace GeneTests. I assume that GeneReviews will still be edited and copyrighted by the University of Washington, Seattle, but I don’t have a reference for that. A similar transition occurred for OMIM, which was hosted at NCBI for years but now has a new URL at Johns Hopkins (watch for our new tutorial on OMIM, which is currently in the works).
The second paragraph that I found particularly interesting was the one on what the GTR contains, and will contain. It states:
“In addition to basic facts, GTR will offer detailed information on analytic validity, which assesses how accurately and reliably the test measures the genetic target; clinical validity, which assesses how consistently and accurately the test detects or predicts the outcome of interest; and information relating to the test’s clinical utility, or how likely the test is to improve patient outcomes.”
I didn’t immediately find mention of who will provide the validity or utility information in the GTR documentation, which is currently under construction. It is clear that much of the content of the database will be “voluntarily submitted by test providers”, and it is stated that “NIH does not independently verify information submitted to the GTR; it relies on submitters to provide information that is accurate and not misleading.”, but I also saw that experts will provide input on GTR’s content regularly, as can be read here. The GTR team is also very interested in receiving input on the resource, which can be submitted through the GTR feedback form.
*OpenHelix tutorials for these resources are available for individual purchase or through a subscription
For GeneTests (free from PMC) – Pagon RA (2006). GeneTests: an online genetic information resource for health care providers. Journal of the Medical Library Association : JMLA, 94 (3), 343-8 PMID: 16888670
For GHR (free from PMC) – Mitchell JA, Fomous C, & Fun J (2006). Challenges and strategies of the Genetics Home Reference. Journal of the Medical Library Association : JMLA, 94 (3), 336-42 PMID: 16888669
For OMIM (open access article) – Amberger, J., Bocchini, C., & Hamosh, A. (2011). A new face and new challenges for Online Mendelian Inheritance in Man (OMIM®) Human Mutation, 32 (5), 564-567 DOI: 10.1002/humu.21466
For Orphanet (full access requires subscription) - Aymé, S., & Schmidtke, J. (2007). Networking for rare diseases: a necessity for Europe Bundesgesundheitsblatt – Gesundheitsforschung – Gesundheitsschutz, 50 (12), 1477-1483 DOI: 10.1007/s00103-007-0381-9
The UCSC Bioinformatics Group announces two free webinars on the UCSC Genome Browser (http://genome.ucsc.edu/). The webinars will be conducted by OpenHelix, the provider of training on 100s of free, publicly accessible bioinformatics and genomics resources.
The hour and 15 minute long webinars will cover the topics needed to effectively use this powerful, free, publicly-accessible tool. The first webinar, held Tuesday, May 24, 11:00-12:15 PT/2:00-3:15 ET (EDIT to add: this is 18:00 UTC/GMT), will be an introduction to the genome browser, designed for new users of the UCSC Genome Browser, and those who want to improve their skills at basic navigation and display.
In this webinar, you’ll learn to:
• perform basic text searches
• explore and understand the features displayed in a genomic region of interest
• customize displays to fit your needs
• use filters to highlight data you are interested in, such as displaying non-synonymous SNPs in red to stand out
• set up a view the way you want, and then save that as a “Session” to share with others
The second webinar, held Thursday, May 26, 1:00-2:15 PT/4:00-5:15 ET (EDIT to add: this is 20:00 UTC/GMT), will cover advanced topics including creating Custom Tracks and using the Table Browser.
In this webinar, you’ll learn to:
• perform advanced searches of the UCSC genome databases
• export and download large quantities of targeted data
• create custom tracks resulting from your advanced searches
• create custom annotation tracks of your data to share with others
Seminar Summary: What: “Introduction to the UCSC Genome Browser” and “UCSC Genome Browser: Custom Tracks and Table Browser” free webinars sponsored by UCSC Bioinformatics Group and presented by OpenHelix, LLC.
When: [note that the times are different]
Introduction: Tuesday, May 24, 11:00-12:15 PT/2:00-3:15 ET (EDIT to add: this is 18:00 UTC/GMT)
Custom Tracks and Table Browser: Thursday, May 26, 1:00-2:15 PT/4:00-5:15 ET (EDIT to add: this is 20:00 UTC/GMT)
Who: Anyone interested in learning how to use the UCSC Genome Browser. Requires knowledge of genomic/biological concepts. No programming skills required.
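If you are joining from another time zone, the conversions listed above can be double-checked with a short script. This is a rough sketch and not part of the announcement: it assumes Pacific and Eastern daylight time (UTC-7 and UTC-4, which apply in late May), and the year 2011 is my assumption, chosen only because May 24 falls on a Tuesday that year.

```python
from datetime import datetime, timedelta, timezone

# Fixed daylight-time offsets (assumption: both zones are on daylight time in late May).
PDT = timezone(timedelta(hours=-7), "PDT")
EDT = timezone(timedelta(hours=-4), "EDT")

# The announcement does not state the year; 2011 is assumed for illustration.
ASSUMED_YEAR = 2011

# Webinar start times as given in Pacific time.
WEBINARS = {
    "Introduction": datetime(ASSUMED_YEAR, 5, 24, 11, 0, tzinfo=PDT),
    "Custom Tracks and Table Browser": datetime(ASSUMED_YEAR, 5, 26, 13, 0, tzinfo=PDT),
}


def start_times(start_pt):
    """Return the same start time formatted for PT, ET and UTC."""
    zones = (PDT, EDT, timezone.utc)
    return [start_pt.astimezone(zone).strftime("%H:%M %Z") for zone in zones]


if __name__ == "__main__":
    for title, start in WEBINARS.items():
        print(title, start_times(start))
```

The hour conversions do not depend on the assumed year, as long as daylight time is in effect on the day of the webinar.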
About UCSC Bioinformatics Group
The UCSC Bioinformatics Group is part of the Center for Biomolecular Science and Engineering (CBSE) at the University of California, Santa Cruz. Director and HHMI investigator David Haussler leads a team of scientists, engineers and students in the study and comparative analysis of mammalian and model organism genomes. Research Scientist Jim Kent heads up the engineering team that develops and maintains the UCSC Genome Browser (http://genome.ucsc.edu), a research tool that integrates the work of hundreds of scientists worldwide into a graphical display of genome sequences and aligned annotations. The Genome Browser — originally developed to assist in the initial assembly of the human genome — now features a rich set of annotations on a multitude of mammalian and model organism genomes. The UCSC Bioinformatics Group continues to uphold its original mission to provide free, unrestricted public access to genome data on the Web.
About OpenHelix, LLC.
OpenHelix, LLC, provides the genomics knowledge you need when you need it. OpenHelix provides online self-run tutorials, web seminars, and on-site training for institutions and companies on the most powerful and popular free, web based, publicly accessible bioinformatics resources. In addition, OpenHelix is contracted by resource providers to provide comprehensive, long-term training and outreach programs.
OpenHelix has its headquarters in Seattle, with offices in San Francisco and Boston. Further information can be found on www.openhelix.com or by calling 1-888-861-5051.
Brought to you by OpenHelix and BioMed Central :D. We really like the feature and idea (of course) and thought we’d pass it on.
BioMed Central (BMC) is an open access publisher. BMC, along with OpenHelix, recently launched a new feature to give readers of BMC journals timely access to relevant genomic resource tutorials. When reading a research article at BMC, researchers are now provided links to online tutorials of many of the genomics resources and tools used or cited in the article. The link takes the reader directly to the training landing page on the OpenHelix site. BMC has a large selection of open access, high quality, peer-reviewed research journals, and much of the research reported today uses and cites many of the resources OpenHelix trains on. Researchers can now quickly find training on the databases and tools used in the research. For example, this recent article, Genomewide Characterization of non-polyadenylated RNAs, in BMC’s Genome Biology cites several tools used in the research, including GEO, MEME and others. The new feature finds these citations in the article and lists links to the OpenHelix tutorials on those resources, as seen in the image.
It can be hard to find a quick link to a relevant resource in papers–the citations are sometimes incomplete, or not linked to the site.
We have plans to expand this feature in several ways to make training on relevant and important genomics resources simpler and quicker for researchers.
We’ve already gotten some great feedback on this–Great idea!
We usually don’t blog specifically about OpenHelix tutorial purchasing (we do that with press releases), since it’s not the purpose of the blog, but I really wanted to give a quick heads up. Many of our tutorials are free to the end-user because the resource provider has funded the training and outreach. UCSC, ENCODE, PDB, VISTA, SBKB and Galaxy are just some examples. The bulk of our tutorials (check out the catalog: reaching 100!) are behind a subscription wall. For trainers, professors teaching genomics, power learners, groups and institutions, subscriptions make a lot of sense. Sometimes though, individual users need to train on one or two resources and their need is fulfilled. We’ve just added an individual purchase function to our tutorials for those users.
If you are not subscribed, you’ll notice new green “purchase” and “subscribe” buttons (if you are subscribed, those buttons won’t appear of course, since the tutorials are unlimited access). Click on the “purchase” button and you can get access to that specific tutorial immediately after a $28.50 purchase (through Google checkout, which requires a free Google account; if you have a Google email, that will be all you need). You’ll have immediate access that will last for 3 days after purchase. That will give unlimited access to the Flash movie for three days and the ability to download the slides, handouts and exercises.
Again, you can see the tutorials in our list here, or search for the resources on our home and search page. Just type the resource (or general topic) you are interested in into the search box. If we have a tutorial on the resource, there will be a ‘puzzle’ icon to the left of the search result. Green means it’s sponsored and free, red means you can view it with a subscription, or individual purchase. Just click the icon :D.
OpenHelix has had a newsletter for the last couple years. We send it to subscribers and others interested. It’s a bit of an open secret :). Well, I (Trey) took over putting it together this year and am in the midst of doing our latest one, so I thought I’d mention it on the blog.
The newsletter comes out 3-4 times a year. We highlight a few of our more popular or interesting Tips of the Week, write a short article and update subscribers on the new tutorials we’ve added or recently updated ones. Regular readers of this blog will probably have already gotten much of this if they keep up on the tips or the OH on the right hand column, but it might be a good way to remind yourself of some of our tips, or to get in one place all the tutorials we’ve updated or added in the past 3-4 months.
This isn’t particularly genomics related, but interesting and related to our work here at OpenHelix. We are a semi-virtual company. Our scientists, including myself, work full time at home. We do have a physical office in Bellevue, WA where the CEO and support staff work, but the rest of us (including Mary, Jennifer and me here on the blog) work from home.
It was an adjustment for me learning how to telecommute (find a physically separate place to work, make rigid routines, and for me use a ‘get things done’ system, instill in family and friends… 1,000 times over.. that you are WORKING and it’s not your free time just because you are in the building, and technical aspects.. yay for Skype!). Yet I love working from home. It gives me the flexibility to raise a child, cook dinners from scratch, enjoy hobbies and it saves me money in gas and car upkeep. In fact, if it wasn’t for the 2 mile drive to our daughter’s school, we wouldn’t even need a car. The drawbacks (little social time with colleagues, etc) are minor and easily fixed.
I am a solid advocate of telecommuting. Done right it’s great for the employee, their families and the environment.
It turns out that not all work hours are the same. The BYU researchers calculated a “break point,” that is, the point where 25 percent of workers reported that work was interfering with family life. Among people who have to log all their hours in an office during certain times, this break point happened at 38 hours. Since many full-time workers log 40-45 hours per week, this means a lot of people are feeling conflict.
If you give employees some flexibility about their schedules, though, and give them the option to work some of the time from home, the break point doesn’t hit until 57 hours. That’s 19 more hours per week — 50 percent more than the office-only workers, and the equivalent of 2.5 full days.
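The arithmetic in that comparison is easy to check in a few lines. This is only a sketch: the variable names are mine, and the five-day work week used to turn hours into “full days” is my assumption, not something the study summary states.

```python
# Break points reported for the two groups, in hours per week.
office_break_point = 38    # fixed office hours
flexible_break_point = 57  # flexible schedule with some work from home

# Assumption: the office-only week is spread over five working days.
WORK_DAYS_PER_WEEK = 5

extra_hours = flexible_break_point - office_break_point
percent_more = 100 * extra_hours / office_break_point
extra_days = extra_hours * WORK_DAYS_PER_WEEK / office_break_point

print(extra_hours)   # 19
print(percent_more)  # 50.0
print(extra_days)    # 2.5
```

Note that the 2.5 days figure only works out if a “full day” is one fifth of the 38-hour week (7.6 hours); with an 8-hour day, 19 hours is a little under 2.4 days.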
We’ve found that it saves us money. Smaller physical footprint, no office rental, etc. But, according to that study linked above, it also gets more actual _work_ from employees. I think I can vouch for that. I personally think I hit my “too much work” point much later than if I were working from an office I’d have to commute to.
For one thing, I don’t have the 30-90 minute or more commute to travel to work every day. There’s a good solid hour or two of free time or work time. And because I can be flexible, for example picking up our daughter from school at 3:00 to take her to some activity, it gives me a lot of leeway in how to use my time, making work less stressful. That, in turn, makes more work easier to do. I often find myself working after the kid is in bed, or when I have a free hour on the weekend. I know that at-the-office workers also often have to do that, but for me… I’m doing it more often when I want to, and those “have to” moments are less frequent.
I’m sold on telecommuting. I think, for many companies, employees and families… and the environment, it’s a win-win-win-win :D.
There are also webinars. Our research suggests that webinars are not particularly popular, so I’m curious how these turn out. There are also ‘how-to’ guides, documentation, community, teacher resources. It’s quite a nice site with lots of things to check out.
Our logo’s stylized DNA is left-handed. We know. We’ve known for a while. We saw it and we’ve periodically been told it. I’m saying it here now.
Of course I can blame it on the business/graphics guy who did it, but hey, I guess you could say that we biologists should have caught it right away.
But I’m going to go with the story that we meant it to be that way. Left-handed or ‘Z-DNA’ is naturally occurring, if rare, in vivo. In fact, it appears to have some function in regulating brain activity (Z-DNA, an active element in the genome). So you could say that we deliberately chose left-handed DNA because we train scientists’ brains on genomics and we felt it was a perfect symbol of what we do.
I guess you could say that, not that I am. Just that we could ;-).