In looking through the 2012 Web Server Issue of Nucleic Acids Research (NAR), I couldn’t help but notice resource names that revealed a bit about the developers’ sense of humor, such as “TaxMan” and “XXmotif”. There were others on the list (“MAGNET”, “GENIES” and “VIGOR”, for example) whose names made me cringe imagining someone trying to find them with the average search engine. [Our family’s favorite such resource is iHOP, or Information Hyperlinked Over Proteins - I gotta think that the developers aimed at that name in honor of the other IHOP and breakfasts everywhere.]
I scrolled through many such names until I found a resource to feature in today’s tip. I wanted something dealing with a current topic – they all pretty much fit that criterion – and one that I was interested in, but that was outside my “normal area of expertise”. I decided on “MetaboAnalyst 2.0”. It is described in the article “MetaboAnalyst 2.0—a comprehensive server for metabolomic data analysis” as follows:
“MetaboAnalyst is a web-based suite for high-throughput metabolomic data analysis. It was originally released in 2009… MetaboAnalyst 2.0 now includes a variety of new modules for data processing, data QC and data normalization. It also has new tools to assist in data interpretation, new functions to support multi-group data analysis, as well as new capabilities in correlation analysis, time-series analysis and two-factor analysis. We have also updated and upgraded the graphical output to support the generation of high resolution, publication quality images.”
As I often do, I began “exploring” MetaboAnalyst 2.0 by reading their NAR article. It is well written and describes how the goal of the interface is to be user friendly and intuitive, so I headed over to MetaboAnalyst 2.0 to “kick some tires”, so to speak. I found that the interface is quite easy & intuitive to use. And to really help users understand the resource before launching into uploading their own data, the developers provide a wide range of example data sets that users can play with, as well as step-by-step guides (pdf, PowerPoint, & two articles that require journal subscriptions, no videos yet). In my video I use one of their datasets & show a quick example of some analysis steps. Of course there isn’t time to fully cover MetaboAnalyst 2.0, but hopefully I show you enough to tempt you to try it out on your own.
*Please note that the developers suggest that you download results immediately: all user data is treated as private and confidential by MetaboAnalyst 2.0, and will remain on the server for only 72 hours before being automatically deleted.
References: Jianguo Xia, Rupasri Manda, Igor V. Sinelnikov, David Broadhurst, & David S. Wishart (2012). MetaboAnalyst 2.0—a comprehensive server for metabolomic data analysis Nucleic Acids Research, 40 (W1) DOI: 10.1093/nar/gks374
Jianguo Xia, Nick Psychogios, Nelson Young, & David S. Wishart (2009). MetaboAnalyst: a web server for metabolomic data analysis and interpretation Nucleic Acids Research, 37 (suppl 2), W652-W660 DOI: 10.1093/nar/gkp356
” provides a central location for voluntary submission of genetic test information by providers. The scope includes the test’s purpose, methodology, validity, evidence of the test’s usefulness, and laboratory contacts and credentials. The overarching goal of the GTR is to advance the public health and research into the genetic basis of health and disease.”
I’m always interested in checking out new resources from NCBI, especially when it is my turn to do a weekly tip. Initially I figured that I would check out the GTR and post a video on how to use it – but the NCBI beat me to that. You can see their YouTube tips (there are two) by clicking the link on their homepage & learn some search tips, etc. [Note, the two videos continued to loop for me & I needed to stop them after viewing them once].
But the question that I came up with is, “What will the GTR provide me with that I am not already getting from other clinical resources that I use, and that OpenHelix trains on?” I try to address that question in my video by doing the same search, for “Cystic fibrosis”, at five different clinically-related resources, and discussing what each offers and specializes in doing. Of course, in a five minute video I can’t be comprehensive – either for resources or what they cover – but I think it will give you enough of a taste for you to appreciate what the GTR offers you, or to continue the comparison on your own.
The resources that I visit in the tip movie are: the GTR, GeneTests, the Genetics Home Reference (GHR), OMIM, and Orphanet. At each resource I do a basic search for the disease “Cystic fibrosis” and show the initial results display. I don’t have time to compare the detailed reports available at each, but lower on the post I link to a reference on the resource (if available), as well as the landing page for OpenHelix training materials on the resource – since we have a tutorial on many of these resources. I also include direct links to each resource.
I’d suggest that you read the NIH News article on the GTR release for some background on the GTR. I won’t cover everything here, but there are a couple of paragraphs that I want to point your attention to. The first explains the relationship between GeneTests and GTR, and says:
“GTR is built upon data pulled from the laboratory directory of GeneTests, a pioneering NIH-funded resource that will be phased out over the coming year. GTR is designed to contain more detailed information than its predecessor, as well as to encompass a much broader range of testing approaches, such as complex tests for genetic variations associated with common diseases and with differing responses to drugs. GeneReviews, which is the section of GeneTests that contains peer-reviewed, clinical descriptions of more than 500 conditions, is also now available through GTR.”
It seems to be another case where it was deemed easier to start a new resource (GTR) than to try and revamp an old resource (GeneTests) to handle the amazing influx of new data. Often resources aren’t retired as soon as expected, due to user feedback, but it is important to note that GTR seems to be in place to eventually replace GeneTests. I assume that GeneReviews will still be edited & copyrighted by the University of Washington, Seattle, but I don’t have a reference for that. A similar transition occurred for OMIM, which was hosted at NCBI for years but now has a new URL at Johns Hopkins (watch for our new tutorial on OMIM, which is currently in the works).
The second paragraph that I found particularly interesting was the one on what the GTR contains, and will contain. It states:
“In addition to basic facts, GTR will offer detailed information on analytic validity, which assesses how accurately and reliably the test measures the genetic target; clinical validity, which assesses how consistently and accurately the test detects or predicts the outcome of interest; and information relating to the test’s clinical utility, or how likely the test is to improve patient outcomes.”
I didn’t immediately find mention of who will provide the validity or utility information in the GTR documentation, which is currently under construction. It is clear that much of the content of the database will be “voluntarily submitted by test providers”, and it is stated that “NIH does not independently verify information submitted to the GTR; it relies on submitters to provide information that is accurate and not misleading.” I also saw that experts will provide input on GTR’s content regularly, as can be read here. The GTR team is also very interested in receiving input on the resource, which can be submitted through the GTR feedback form.
*OpenHelix tutorials for these resources available for individual purchase or through a subscription
For GeneTests (free from PMC) – Pagon RA (2006). GeneTests: an online genetic information resource for health care providers. Journal of the Medical Library Association : JMLA, 94 (3), 343-8 PMID: 16888670
For GHR (free from PMC) – Mitchell JA, Fomous C, & Fun J (2006). Challenges and strategies of the Genetics Home Reference. Journal of the Medical Library Association : JMLA, 94 (3), 336-42 PMID: 16888669
For OMIM (open access article) – Amberger, J., Bocchini, C., & Hamosh, A. (2011). A new face and new challenges for Online Mendelian Inheritance in Man (OMIM®) Human Mutation, 32 (5), 564-567 DOI: 10.1002/humu.21466
For Orphanet (full access requires subscription) - Aymé, S., & Schmidtke, J. (2007). Networking for rare diseases: a necessity for Europe Bundesgesundheitsblatt – Gesundheitsforschung – Gesundheitsschutz, 50 (12), 1477-1483 DOI: 10.1007/s00103-007-0381-9
As many of you know, OpenHelix specializes in helping people access and utilize the gold mine of public bioscience data in order to further research. One of the ways that we do this is by creating materials to train people – researchers, clinicians, librarians, and anyone interested in science - on where to find data they are interested in, and how to access data at particular public databases and data repositories. We’ve got over 100 such tutorials on everything from PubMed to the Functional Glycomics Gateway (more on that later).
In addition to creating these tutorials, we also spend a lot of time keeping them accurate and up-to-date. This can be a challenge, especially when lots of databases or resources all have major releases around the same time. Our team continually assesses and updates our materials, and in this post I am happy to announce recently released updates to three of our tutorials: UniProt, World Tour, and Overview of Genome Browsers.
Our Introductory UniProt tutorial shows users how to: perform text searches at UniProt for relevant protein information, search with sequences as a starting point, understand the different types of UniProt records, and create multi-sequence alignments from protein records using Clustal.
In today’s tip I will briefly introduce you to the beta version of the updated DGV resource. The Database of Genomic Variants, or DGV, was created in 2004 at a time early in the understanding of human structural variation, or SV, which is defined by DGV as genomic variation larger than 50bp. DGV has historically provided public access to SV data in humans who are non-diseased. In the past it both accepted direct data submissions on SV and also provided high quality curation and analysis of the data such that it was appropriate for use in biomedical studies.
We’ve had an introductory tutorial on using DGV for years, and we’ve posted on changes at DGV in the past, so we were quite interested to read in their recent newsletter that there is a newly updated beta version of the DGV resource. The increase in SV data being generated by many large-scale sequencing projects as well as individual labs has made it difficult for the DGV to continue to collect SV data, to provide a stable and comprehensive data archive AND to manually curate it at the level they have in the past. Therefore the DGV team is now partnering with DGVa at EBI and dbVar at NCBI. DGVa and dbVar will accept SV data submissions, and will function as public data archives (PDA) and, according to the publication cited below, DGVa and dbVar will:
“...provide stable and traceable identifiers and allow for a single point of access for data collections, facilitating download and meta-analysis across studies.“
DGV will no longer accept data submissions, but will instead use accessioned SV data from the archives and focus on providing the scientific community and public at-large with a subset of the data. Again quoting from the paper referenced below:
“The main role of DGV going forward will be to curate and visualize selected studies to facilitate interpretation of SV data, including implementing the highest-level quality standards required by the clinical and diagnostic communities.“
The original DGV resource is still available while comments are collected on the updated beta site. For more information on the updated DGV I suggest you check out this documentation from the DGV team: From their FAQ – “What is the data model used for DGV2?” and from a link in their top navigation area – “DGV Beta User Tutorial“. Be sure to check out the new displays & data that’s available, and most importantly to send your comments & suggestions to the group so that they can design a resource best suited for your needs.
Reference: Church, D., Lappalainen, I., Sneddon, T., Hinton, J., Maguire, M., Lopez, J., Garner, J., Paschall, J., DiCuccio, M., Yaschenko, E., Scherer, S., Feuk, L., & Flicek, P. (2010). Public data archives for genomic structural variation Nature Genetics, 42 (10), 813-814 DOI: 10.1038/ng1010-813
(Free access from PubMed Central here)
Edit, March 5, 2012 – I wanted to add a clarification that we received through our contact link. I am pasting it in full, with permission from Margie:
We at TCAG think you did a great job on your video blog of the New Database of Genomic Variants.
I wanted to make a correction to one of your statements: “The increase in SV data (…) at the level they have in the past.”
We, the DGV team, have built a system that CAN handle the new volumes and types of SV data now being published, and we are able to curate all of these data. The reason we partnered with DGVa and dbVar was primarily to provide stable, “universal” accessions for SV data. We also work with DGVa and dbVar to define standard terminology, data types, and data exchange formats.
I just wanted to make sure it was clear that we are fully capable to handle the SV data being published now. Our reason for partnership was to foster standardized data and open data sharing across systems.
Thanks again for your blog post!
I did this tip about two years ago. That was in our old system and I wanted to update it to our Scivee system. In addition, we did this tip using Galaxy, and Galaxy has had a lot of changes since then. In this week’s tip I am going to walk you through a quick task: getting the flanking sequence for a list of chromosomal locations. In Galaxy, this is relatively simple, as you will see from the tip. There is a lot more you can do once you’ve obtained the sequence, such as manipulating the text to obtain the columns of data you need. You might want to check out our tutorial on Galaxy or the Galaxy screencasts to learn more.
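If you are curious what the coordinate arithmetic behind a "flanking region" looks like outside of Galaxy, here is a minimal Python sketch. It is not how Galaxy implements the step shown in the tip; it is just an illustration under a few assumptions of mine: locations are BED-style (0-based start, end not included), the flank size is the same on both sides, and the names `flanks` and `flanks_for_all` are hypothetical helpers invented for this example. It only computes the flanking coordinates; fetching the actual sequence for those coordinates is a separate step.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end); assumed BED-style: 0-based start, end not included
Interval = Tuple[str, int, int]


def flanks(region: Interval, size: int,
           chrom_len: Optional[int] = None) -> Tuple[Interval, Interval]:
    """Return the (upstream, downstream) flanking intervals of a region."""
    chrom, start, end = region
    # Do not run off the start of the chromosome.
    up_start = max(0, start - size)
    down_end = end + size
    # If the chromosome length is known, do not run off its end either.
    if chrom_len is not None:
        down_end = min(chrom_len, down_end)
    return (chrom, up_start, start), (chrom, end, down_end)


def flanks_for_all(regions: List[Interval],
                   size: int) -> List[Tuple[Interval, Interval]]:
    """Compute flanks for every location in a list."""
    return [flanks(region, size) for region in regions]


if __name__ == "__main__":
    # Made-up example locations, not real data.
    example = [("chr1", 1000, 1500), ("chr2", 20, 80)]
    for upstream, downstream in flanks_for_all(example, 100):
        print(upstream, downstream)
```

Note that the second made-up location sits close to the start of its chromosome, so its upstream flank is shorter than the requested 100 bases. That is the kind of edge case worth checking in whatever tool you use.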
Online tutorial suite teaches how to access ENCODE (ENCyclopedia of DNA Elements) data in the UCSC Genome Browser.
The ENCODE project and data are crucial to ongoing genomics research and have already changed our understanding of the organization and function of the genome.
Bellevue, WA (PRWEB) November 9, 2010
An online tutorial suite is now available that teaches users how to access the ENCODE data in the UCSC Genome Browser. The online tutorial, created by OpenHelix in conjunction with the UCSC Bioinformatics Group, can be viewed for free at http://www.openhelix.com/encode
The ENCODE Project, (ENCyclopedia of DNA Elements), is an international consortium of researchers who are moving beyond the basic information of the reference genome sequence. Researchers are using the newest sequencing technologies and numerous strategies to generate data to learn as much as possible about variations, genes, non-coding transcripts, regulatory elements, and genome structure and more, in extensive detail across the entire genome.
The ENCODE project is coordinated by the NHGRI. The UCSC Genome Browser is the designated Data Coordination Center (DCC) for the ENCODE project, and the official ENCODE data repository.
“The ENCODE project and data are crucial to ongoing genomics research and have already changed our understanding of the organization and function of the genome,” said Kate Rosenbloom, the ENCODE technical project manager at UCSC. “New data are continually submitted to the Data Coordination Center before appearing in the literature. To maximize the impact on the broader biomedical community it is important to bring people up to speed quickly and efficiently on how to navigate the data. The OpenHelix tutorial suite will contribute greatly to our outreach and usability efforts for ENCODE.”
The online narrated tutorial, which runs in just about any browser, can be viewed from beginning to end or navigated using chapters and forward and backward sliders. The approximately 60 minute tutorial examines aspects of the ENCODE project and data types, and explores ways for you to access and learn about the ENCODE data available within the UCSC Genome Browser. Combined with the OpenHelix tutorials on the search and display features of the UCSC Genome Browser, the ENCODE data will enable researchers to access cutting-edge data, including pre-publication information.
The tutorial can be used by new users to introduce them to ENCODE, for previous users to view new features and functionality, or simply as a reference tool to understand specific features.
In addition to the tutorial, users can also access useful training materials including the animated PowerPoint slides used as a basis for the tutorial, suggested script for the slides, slide handouts, and exercises. This can save a tremendous amount of time and effort for teachers and professors creating classroom content.
In addition to the ENCODE tutorial suite, OpenHelix offers over 90 tutorial suites on some of the most powerful and popular bioinformatics and genomics tools available on the web. Some of the tutorial suites are freely available through support from the resource providers. The whole catalog of tutorial suites is available through a subscription. Users can view the tutorials and download the free materials at openhelix.com.
About UCSC Bioinformatics Group
The UCSC Bioinformatics Group is part of the Center for Biomolecular Science and Engineering (CBSE) at the University of California, Santa Cruz. Director and HHMI investigator David Haussler leads a team of scientists, engineers and students in the study and comparative analysis of mammalian and model organism genomes. Research Scientist Jim Kent heads up the engineering team that develops and maintains the UCSC Genome Browser (http://genome.ucsc.edu). The UCSC Bioinformatics Group continues to uphold its original mission to provide free, unrestricted public access to genome data on the Web.
OpenHelix, LLC, (http://www.openhelix.com) provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web. The OpenHelix Search portal searches hundreds of resources, tutorial suites and other material to direct researchers to the most relevant resources and OpenHelix training materials for their needs. Researchers and institutions can save time, budget and staff resources by leveraging a subscription to nearly 100 online tutorial suites available through the portal. More efficient use of the most relevant resources means quicker and more effective research.
Hey, just wanted to let you know that our updated tutorial on the RCSB PDB is now free, as announced by the RCSB PDB on Tuesday:
“Comprehensive training materials to introduce users to the features and functionality of the RCSB PDB are now freely available at openhelix.com.
The new training tools include an online narrated tutorial that demonstrates basic and advanced searches, how to generate reports, the different options for exploring individual structures, and many of the resources and tools available at the RCSB PDB for research and education. The full tutorial runs for about an hour, and can be navigated by specific chapters.
The animated PowerPoint slides used as a basis for the tutorial can be downloaded, along with slide handouts, and exercises. These materials are freely available for teachers and professors to create classroom content…”
I’m sure that I am biased, but I think this tutorial is great & will help anyone use the RCSB PDB more efficiently. Check it out & be sure to let us know what you think of it, when you get a chance!
Comprehensive tutorials on the ASTD, Entrez Protein, and MMDB databases enable researchers to quickly and effectively use these invaluable variation resources.
Seattle, WA September 24, 2008 — OpenHelix today announced the availability of new tutorial suites on the Alternative Splicing and Transcript Diversity (ASTD) database, Entrez Protein and the Molecular Modeling Database (MMDB). ASTD is a European Bioinformatics Institute (EBI) resource for alternative splice events and transcripts for the human, mouse, and rat systems. Entrez Protein is a comprehensive database of protein information brought to you by the National Center for Biotechnology Information (NCBI). MMDB is another NCBI resource which contains an extensive collection of three-dimensional protein structures with detailed annotation that can be used to learn about the structure and function of many proteins. Together these three tutorials give the researcher an excellent set of resources to carry their research from transcript to 3D protein structure.
The tutorial suites, available for single purchase or through a low-priced yearly subscription to all OpenHelix tutorials, contain a narrated, self-run, online tutorial, slides with full script, handouts and exercises. With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. These tutorials will teach users:
to perform Quick and Advanced searches
to navigate gene and transcript report pages
to predict intron/exon boundaries and likely regulatory protein binding sites
to search manually curated data regarding alternate splicing
to perform basic and advanced searches utilizing the many available tools and options
to understand the protein records and exploit the many internal and external links you are provided with
to explore some of the resources provided by the NCBI network of databases, such as “My NCBI”
to search MMDB using both basic and advanced query techniques
to understand the detailed results you obtain
to visualize and manipulate structures using NCBI’s Cn3D structural viewer
OpenHelix, LLC, provides the genomics knowledge you need when you need it. OpenHelix currently provides online self-run tutorials and on-site training for institutions and companies on the most powerful and popular free, web based, publicly accessible bioinformatics resources. In addition, OpenHelix is contracted by resource providers to provide comprehensive, long-term training and outreach programs.
UCSC announced the Archaeal Genome Browser created by the Lowe Lab at UCSC last week. The browser has been accessible for a while, but this is the public ‘unveiling’ and announcement. The interface and use is very similar to the UCSC Genome Browser (free tutorial), though of course modified and geared to the analysis of Archaeal genomes. So add another resource and database to your toolbox, it looks like another good and useful one. As the announcement says:
Currently there are more than 50 completed archaeal genomes, the least studied domain of life. Although archaea and bacteria are both prokaryotes, often co-existing in the same environments, many aspects of archaeal cell biology such as DNA replication, repair, transcription, and translation are homologous to those found in eukaryotes. Some members of archaea are also notable for inhabiting extreme environments, including boiling terrestrial hot springs, black smoker vents at the bottom of the ocean, the ultra briny water of the Dead Sea, and highly acidic drainage water from ore mines, to name a few.
Comprehensive Tutorial on the SMART Bioinformatics Resource enables researchers to quickly and effectively use invaluable resource.
January 11, 2008 (Seattle WA) OpenHelix today announced the availability of a tutorial suite on the SMART bioinformatics resource (http://smart.embl-heidelberg.de/). SMART, created by the Bork Lab at the European Molecular Biology Laboratory (EMBL), contains information on hundreds of domains, and provides extensive annotations, phylogenetic analyses, and links to relevant resources about the domains.
The tutorial suite, available for single purchase or through a low-priced yearly subscription to all OpenHelix tutorials, contains a narrated, self-run, online tutorial, slides, handouts and exercises. With the tutorial, researchers can quickly learn to effectively use this resource that helps them understand the evolution of function within multi-domain proteins. SMART brings together several functions that a researcher would otherwise have to perform separately.
The tutorial will teach users to:
perform various text and sequence searches
find proteins based on their domain architecture
browse for key information about the evolutionary history and relationships of domains of interest to your research
To find out more about this and other tutorial suites visit www.openhelix.com.
OpenHelix, LLC, (www.openhelix.com) provides the genomics knowledge you need when you need it. OpenHelix currently provides online self-run tutorials and on-site training for institutions and companies on the most powerful and popular free, web based, publicly accessible bioinformatics resources. In addition, OpenHelix is contracted by resource providers to provide comprehensive, long-term training and outreach programs.