Do you RTFM? C’mon, tell the truth. When you approach some new software, do you read the freakin’ manual first? Yeah. Thought so. Not to worry, it’s a common phenotype. In fact, we’re pretty sure that the scientists at OpenHelix are among a tiny fraction of people with a rare allele for software manual reading. But the good news is we’ve found a way to help those of you without the rare allele.
(RTFM: “Read the ‘forlorn’ manual”… the F usually stands for something else, but this is a family-friendly blog after all.)
We absolutely know that an hour or a few of structured training on a biological database or analysis tool will save a researcher days, sometimes months, of work. Sometimes it will mean the difference between making an amazing discovery… or hitting a dead end.
Well, it was nice to read today, in a comment in Nature (David Piston, p. 440, behind a subscription wall), a perfect example of something we evangelize about:
As head of Vanderbilt University’s core microscopy labs, I recently met a colleague and his student to discuss their confusing results from an experiment studying protein interactions in cells. After applying a treatment that should have disrupted the interaction of two particular proteins inside mitochondria, they still saw the proteins interacting. The student said that to measure the interaction he had used a commercial automated image-analysis system. He didn’t understand how it worked, so he just used a colleague’s settings from a different experiment. But, without him realizing, this had masked all of the cell except for the mitochondria. If he had modified the settings to leave the entire cell unmasked, he would have seen that the proteins were now present within the mitochondria in relatively small amounts compared with the rest of the cell, and so their interaction had been disrupted – the treatment was, in fact, working.
In this case, it wasn’t inspiration that was lacking — it was instruction.
He goes on to discuss how automation in research has been a boon, yet at the same time a bane.
It’s made research easier to get done. This is going to sound like an ‘uphill both ways… in the snow’ story, but a short (yes, I say... short) 20 years ago, when I started sequencing retrotransposable elements in Drosophila, I used the Sanger method and filled lanes and lanes of polyacrylamide gels. A few thousand base pairs of sequence took a long time, as in days (or weeks, when I made a stupid mistake and cracked the huge glass setup our brilliant lab technician had put together). Today, it’s automated and blazing fast: hundreds of millions of base pairs in the same amount of time.
But, as Dr. Piston explains in the comment, this also means researchers often don’t fully understand how a tool works or which parameters are best to use:
they waste time by using a technique improperly or, equally tragically, miss something exciting when they assume that a strange result means that they did something wrong and they never follow it up.
He speaks mainly about experimental tools at the bench and the need for more education and instruction. But it goes doubly for databases and computational analysis tools, which pretty much all biological researchers now need to access and use regularly… or should.
We have so many of our own anecdotes along these lines. There was the mycology researcher who was ecstatic when we showed them a couple of databases of mycological information. My best anecdote is about a prominent researcher (won’t say who, when or where :D) who took a workshop with us on the UCSC Genome and Table Browsers. Afterwards, this researcher came up to me, showed me the research being done in the lab, and described the month they had spent at a dead end trying to analyze the data. By the end of the workshop they had figured out their answer, and they told us that the 4-hour course had probably just saved them 6 months of work.
We have a lot of anecdotes like that.
Read the comment; Dr. Piston makes some very valid and important points and suggestions.
NCBI was created in 1988 and has maintained the GenBank database for years. They also provide many computational resources and data retrieval systems for many types of biological data. As such they know all too well how quickly the data that biologists collect has changed and expanded. As uses for various data types have been developed, it has become obvious that new types of information (such as expanded metadata) need to be collected, and new ways of handling data are required.
NCBI has been adapting to such needs over the years, and has recently been updating its genome resources. Today’s tip is based on some of those changes. My video will focus on the “completely redesigned Genome site”, which was recently rolled out and announced in the most recent NCBI newsletter. I haven’t found a publication describing the changes, but the newsletter goes into some detail, and the announcement at the top of the Genome site (which I point out in the video) has very helpful details about the changes.
As you will see in the announcement, Genome is not the only related resource to have changed recently: the Genome Project resource has been redesigned into the BioProject resource, and a new BioSample resource has been created. I won’t have time to go into detail about those two resources, but at the end of my post I link to two recent NCBI publications that came out in Nucleic Acids Research this month; these are good resources to read for more information on BioProject, BioSample, and NCBI as a whole. For a historical perspective, I also link to the original Genome reference, which is in Bioinformatics and currently free to access.
Some of the changes are very interesting, including that “Single genome records now represent an organism and not a genome for one isolate.” The NCBI newsletter states that “Major improvements include a more natural organization at the level of the organism for prokaryotic, eukaryotic, and viral genomes. Reports include information about the availability of nuclear or prokaryotic primary genomes as well as organelles and plasmids.” There’s also a note that “Because of the reorganization to a natural classification system, older genome identifiers are no longer valid. Typically these genome identifiers were not exposed in the previous system and were used mainly for programmatic access.” That makes me wonder what changes this will require in other NCBI resources, as well as in external resources. I haven’t seen any announcements on that yet, so I’ll just have to stay tuned & check around often.
Enjoy the tip & let us, or NCBI, know what you think of their changes!
Historic Entrez Genome reference: Tatusova, T., Karsch-Mizrachi, I., & Ostell, J. (1999). Complete genomes in WWW Entrez: data representation and analysis. Bioinformatics, 15(7), 536-543. DOI: 10.1093/bioinformatics/15.7.536
Barrett, T., Clark, K., Gevorgyan, R., Gorelenkov, V., Gribov, E., Karsch-Mizrachi, I., Kimelman, M., Pruitt, K., Resenchuk, S., Tatusova, T., Yaschenko, E., & Ostell, J. (2011). BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata. Nucleic Acids Research. DOI: 10.1093/nar/gkr1163
Sayers, E., Barrett, T., Benson, D., Bolton, E., Bryant, S., Canese, K., Chetvernin, V., Church, D., DiCuccio, M., Federhen, S., Feolo, M., Fingerman, I., Geer, L., Helmberg, W., Kapustin, Y., Krasnov, S., Landsman, D., Lipman, D., Lu, Z., Madden, T., Madej, T., Maglott, D., Marchler-Bauer, A., Miller, V., Karsch-Mizrachi, I., Ostell, J., Panchenko, A., Phan, L., Pruitt, K., Schuler, G., Sequeira, E., Sherry, S., Shumway, M., Sirotkin, K., Slotta, D., Souvorov, A., Starchenko, G., Tatusova, T., Wagner, L., Wang, Y., Wilbur, W., Yaschenko, E., & Ye, J. (2011). Database resources of the National Center for Biotechnology Information. Nucleic Acids Research. DOI: 10.1093/nar/gkr1184
If you go over to the OpenHelix home page, you’ll start to notice some differences. Our tutorial landing pages have a new feature: a section on each landing page that shows the most recent research in the BioMed Central journals. For example, the screenshot here is for our GeneMANIA tutorial suite. At the bottom left corner you’ll see the 5 most recent research articles from the BioMed Central catalog of research journals that have used GeneMANIA. This should help users get an idea of what kind of data a resource has and what research is done with it. It will also help users find other resources that might be useful (a related resource is often mentioned alongside the resource of interest).
This is the beginning of a series of new features we hope to add to the landing pages that will enhance the ability of researchers to learn all they can about the data and tools available to them. We will be adding a “recent video tips” section and more.
And, if you are a publisher of research literature and would like a ‘recent research’ list like this on our landing pages for your journals (we will need access to the full text of the articles), please contact us. We’d love to work with you for the benefit of your journals, the resources, the researchers and, yes, OpenHelix :).
In our continuing effort to maintain and expand our search database and engine, our tutorials, our blog and more, we have also added advertising to our site: a top banner ad, a side skyscraper ad and a small ad on tutorial landing pages. Please consider visiting these sponsors if they interest you. Ads will not appear on sponsored tutorials, such as those for the UCSC Genome Browser and PDB, among others, nor will they be visible to subscribers to the catalog of tutorials.
If you would like to sponsor a tutorial so that it is publicly available, whether you are the developer of the tool or a company, please contact us about this opportunity to provide training and outreach to a large number of researchers.
Bioinformatics analysis is a powerful technique applicable to a wide variety of fields, and the subject of many a blog post here at OpenHelix. I’ve had two particular bioinformatics articles on my desk for a couple of months now, waiting for me to be able to articulate my thoughts on them. They both offer great information about their particular area of interest – predicting either SNV impacts or protein identities – and sage bioinformatics advice.
The second article is an open access article from BioTechniques entitled “Mistaken identities in proteomics”. It offers a romp through the history of mass spectrometry (MS) technology and the rising standards for documenting the techniques used for protein identification in journals. The article also concludes with sage bioinformatics advice, including this quote:
Proteomic researchers should be able to answer key questions, according to Giddings. “What are you actually getting out of a search engine?” she says. “When can you believe it? When do you need to validate?”
Both papers suggest that researchers who wish to use bioinformatics resources in their research should investigate the theoretical underpinnings and assumptions of each tool before deciding which one to use, and should then approach every analysis with some disbelief in the tool’s results. That sounds like common sense, and it is good advice in theory.
HOWEVER, the level of investigation required to truly know each tool and algorithm is prohibitively huge. So my “practical” suggestion for researchers is a bit of a “filtering shortcut”. Before diving into all the publications on all possible tools, just spend a few minutes with some documentation (the resource’s FAQ, or an intro tutorial; we’ve got a few we can offer you) to get an idea of what the tool is about & what you might be able to get from it. Once you’ve got a general idea of how to approach the resource, begin “banging” on it lightly.
An initial kick-the-tires test of an algorithm, database, or other resource can be as easy as keeping a “test set” on hand at all times & running it through any new tool you want to use. Make sure that the set includes a partial list of some very well known proteins/pathways/SNPs/etc. (whatever you work on & will be interested in analyzing), and that it includes some of your field’s ‘flukes’. Think about what you expect to get back from your set. Then run your test set through any new tool you are considering using in your research, and look at your results: are they what you know they should be? Can the tool handle the flukes, or does it break?
As an example, when I approach a new protein interaction resource, I’ll use a partial parts list for some aspect of the yeast cell cycle, and include one or two of the hyphenated gene names. If the tool is good, I get a completed list with no bogging on the “weird” names. If it bogs, I know the resource may not be 100% worked out for yeast & may have issues with other species as well. If the full list of interactors comes back with a bunch of space-junk proteins, I begin investigating what data is included in the resource and whether settings can be tweaked to get better answers. Then, if things still look promising with the tool, I am likely to dig deep into the literature, etc., for the tool, just to be sure, because the authors of these articles are absolutely right: chasing false leads is expensive, frustrating & time consuming. It is amazing how many lemons & jalopies you can weed out with a 5-minute bioinformatics tire kick!
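The tire-kick routine described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a tool OpenHelix provides: `run_tool` stands in for whatever resource you are testing, `mock_interaction_tool` is a hypothetical resource that (by design) chokes on hyphenated names, its gene associations are made up for the demo rather than curated data, and `YXX123W-A` is a placeholder playing the role of a hyphenated “fluke” name.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class TireKickReport:
    """What a quick test-set run tells you about a new tool."""
    unresolved: set = field(default_factory=set)  # query names the tool could not handle (the flukes)
    missing: set = field(default_factory=set)     # hits you expected but did not get back
    unexpected: set = field(default_factory=set)  # hits you did not expect: possible "space junk" to investigate

    @property
    def passed(self) -> bool:
        # Unexpected hits are not a failure by themselves; they are worth a closer look.
        return not self.unresolved and not self.missing


def kick_the_tires(query: Iterable[str], expected: Iterable[str],
                   run_tool: Callable[[list], tuple]) -> TireKickReport:
    """Run a known test set through a tool and compare its output with what you expect.

    run_tool is a hypothetical callable: it takes a list of gene names and returns
    (names_it_recognized, hits_it_returned) as two sets.
    """
    query = list(query)
    expected = set(expected)
    recognized, hits = run_tool(query)
    return TireKickReport(
        unresolved=set(query) - recognized,
        missing=expected - hits,
        unexpected=hits - expected - set(query),
    )


def mock_interaction_tool(genes):
    """Hypothetical resource for the demo: it silently drops hyphenated names."""
    partners = {  # made-up association table, for illustration only
        "CDC28": {"CKS1"},
        "CLN2": {"CDC28"},
        "CLB2": {"CDC28", "SIC1"},
    }
    recognized = {g for g in genes if "-" not in g}
    hits = set()
    for gene in recognized:
        hits |= partners.get(gene, set())
    return recognized, hits


if __name__ == "__main__":
    test_set = ["CDC28", "CLN2", "CLB2", "YXX123W-A"]  # last name is a placeholder fluke
    report = kick_the_tires(test_set, {"CDC28", "CKS1", "SIC1"}, mock_interaction_tool)
    print(report)
    print("passed:", report.passed)
```

With the mock tool, the hyphenated placeholder lands in `unresolved`, so `passed` is False: the “it bogs on the weird names” signal. Unexpected hits are reported separately rather than failing the run, since they call for a look at the resource’s data and settings rather than an automatic rejection.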
I also don’t think the responsibility should rest solely on each end user – the resource developer does have some responsibility for making their tool rigorous and for accurately representing its capabilities in publications and documentation. Calls for open source code can help improve some bioinformatics tools; so can education & outreach – but that discussion will have to wait for another day…
Cline, M., & Karchin, R. (2010). Using bioinformatics to predict the functional impact of SNVs. Bioinformatics, 27(4), 441-448. DOI: 10.1093/bioinformatics/btq695
Brought to you by OpenHelix and BioMed Central :D. We really like the feature and idea (of course) and thought we’d pass it on.
BioMed Central (BMC) is an open access publisher. BMC and OpenHelix recently launched a new feature that gives readers of BMC journals timely access to relevant genomic resource tutorials. When reading a research article at BMC, researchers are now provided with links to online tutorials for many of the genomics resources and tools used or cited in the article. Each link takes the reader directly to the training landing page on the OpenHelix site. BMC has a large selection of open access, high-quality, peer-reviewed research journals, and much of the research reported today uses and cites many of the resources OpenHelix trains on. Researchers can now quickly find training on the databases and tools used in the research. For example, this recent article, Genomewide Characterization of non-polyadenylated RNAs, in BMC’s Genome Biology cites several tools used in the research, including GEO, MEME and others. The new feature finds these citations in the article and lists links to the OpenHelix tutorials on those resources, as seen in the image.
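The post doesn’t say how the feature spots resources in an article, so the sketch below is only one simple way such a lookup could work, not BMC’s or OpenHelix’s actual implementation: scan the article text for known resource names as whole words and return the matching tutorial links. The `TUTORIALS` table and `find_tutorial_links` function are hypothetical; only the GeneMANIA URL comes from this blog, and the GEO and MEME URLs are placeholders.

```python
import re

# Hypothetical lookup table: resource name -> tutorial landing page.
# Only the GeneMANIA URL comes from this blog; the others are placeholders.
TUTORIALS = {
    "GeneMANIA": "http://www.openhelix.com/genemania",
    "GEO": "https://example.org/tutorials/geo",
    "MEME": "https://example.org/tutorials/meme",
}


def find_tutorial_links(article_text: str, tutorials: dict = TUTORIALS) -> dict:
    """Return {resource_name: tutorial_url} for each resource mentioned in the text.

    Matching is case-sensitive and on whole words only, so "GEO" matches
    "from GEO," but not "geography" or "GEOS".
    """
    found = {}
    for name, url in tutorials.items():
        pattern = rf"\b{re.escape(name)}\b"  # word boundaries on both sides of the name
        if re.search(pattern, article_text):
            found[name] = url
    return found


if __name__ == "__main__":
    text = ("Expression data were retrieved from GEO, and enriched motifs "
            "were identified with MEME.")
    for name, url in find_tutorial_links(text).items():
        print(f"{name}: {url}")
```

Whole-word, case-sensitive matching keeps “GEO” from firing on “geography”, but short or ambiguous resource names can still produce false hits, so a real system would probably need curated synonyms and some review.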
It can be hard to find a quick link to a relevant resource in papers: the citations are sometimes incomplete, or not linked to the site.
We have plans to expand this feature in several ways to make training on relevant and important genomics resources simpler and quicker for researchers.
We’ve already gotten some great feedback on this–Great idea!
We usually don’t blog specifically about purchasing OpenHelix tutorials (we do that with press releases; it’s not the purpose of the blog), but I really wanted to give a quick heads-up. Many of our tutorials are free to the end user because the resource provider has funded the training and outreach. UCSC, ENCODE, PDB, VISTA, SBKB and Galaxy are just some examples. The bulk of our tutorials (check out the catalog: reaching 100!) are behind a subscription wall. For trainers, professors teaching genomics, power learners, groups and institutions, subscriptions make a lot of sense. Sometimes, though, individual users need to train on only one or two resources. We’ve just added an individual purchase option to our tutorials for those users.
If you are not subscribed, you’ll notice new green “purchase” and “subscribe” buttons (if you are subscribed, those buttons won’t appear, of course; you already have unlimited access to the tutorials). Click the “purchase” button and you can get access to that specific tutorial immediately after a $28.50 purchase (through Google Checkout, which requires a free Google account; if you have a Google email address, that is all you need). Access starts immediately and lasts for 3 days after purchase, giving you unlimited viewing of the Flash movie during that time and the ability to download the slides, handouts and exercises.
Again, you can see the tutorials in our list here, or search for the resources on our home and search pages. Just type the resource (or general topic) you are interested in into the search box. If we have a tutorial on the resource, there will be a ‘puzzle’ icon to the left of the search result. Green means it’s sponsored and free; red means you can view it with a subscription or an individual purchase. Just click the icon :D.
I’ve mentioned this before, but as I am getting this week’s tip ready, I thought I’d remind our readers that we have a community over at SciVee (YouTube for science :): Genomics Resource Training. We now post all our tips there, and we also add videos from other users that teach about genomics resources. We have about 2-3 dozen videos in our community now. Come on over and join!
GeneMANIA is a free public resource that offers a simple, intuitive web interface that shows the relationships between genes in a list and analyzes and extends the list to include other related genes. The web interface is backed by powerful analysis software and a large data warehouse containing extensive amounts of existing functional genomics data, and also includes Cytoscape Web, a web-based advanced visualization tool that enables browsing of query results and creation of publication-ready figures.
“GeneMANIA will soon be updated to include significantly increased functionality,” according to Gary Bader and Quaid Morris, assistant professors in the Donnelly Centre (http://tdccbr.med.utoronto.ca/) and co-principal investigators for GeneMANIA. “OpenHelix based their tutorial on our development site, and even provided user feedback on our new features that resulted in improvements to our system. OpenHelix had very strong understanding of the GeneMANIA interface, which then translated into a powerful learning resource. The OpenHelix tutorial suite is sure to help current and new users to get up to speed on our site and its new features, and therefore get their results more quickly to support their research.”
The new training initiatives include a free online tutorial suite on GeneMANIA. The online narrated tutorial (http://www.openhelix.com/genemania), which runs in just about any browser, can be viewed from beginning to end or navigated using chapters and forward and backward sliders. The approximately 60-minute tutorial highlights and explains the features and functionality needed to start using GeneMANIA effectively. The tutorial can introduce new users to GeneMANIA, show previous users the new features and functionality, or simply serve as a reference for understanding specific features.
In addition to the tutorial, GeneMANIA users can also access useful training materials, including the animated PowerPoint slides used as a basis for the tutorial, a suggested script for the slides, slide handouts, and exercises. This can save teachers and professors a tremendous amount of time and effort in creating classroom content.
“GeneMANIA is an innovative, hypothesis generating tool that can be used to extend a given gene list to find related genes sharing similar functions,” said OpenHelix founder and President Dr. Mary Mangan. “OpenHelix is excited to contribute to furthering the field of gene function prediction by assisting researchers in effectively and efficiently using such a critical tool.”
In addition to the GeneMANIA tutorial suite, OpenHelix offers over 90 tutorial suites on some of the most powerful and popular bioinformatics and genomics tools available on the web. Some of the tutorial suites are freely available through support from the resource providers. The whole catalog of tutorial suites is available through a subscription. Users can view the tutorials and download the free materials at www.openhelix.com.
GeneMANIA (www.genemania.org) is a free web-based prediction tool that finds other genes that are related to a set of input genes, using a very large set of functional association data. Association data include protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity. You can use GeneMANIA to find new members of a pathway or complex, find additional genes you may have missed in your screen or find new genes with a specific function, such as protein kinases. Your question is defined by the set of genes you input. If members of your gene list make up a protein complex, GeneMANIA will return more potential members of the protein complex. If you enter a gene list, GeneMANIA will return connections between your genes, within the selected datasets.
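To make “find related genes from a set of input genes using functional association data” a bit more concrete, here is a deliberately simplified guilt-by-association sketch. It is not GeneMANIA’s algorithm, which is more sophisticated; it only illustrates the general idea: combine several weighted association networks into one, then rank each gene outside the query by the total weight of its links to the query genes. The network names, edge scores, weights and gene labels (`G1` to `G5`) are all made up for illustration.

```python
from collections import defaultdict


def combine_networks(networks, weights):
    """Merge association networks into one weighted, undirected network.

    networks: {network_name: {(gene_a, gene_b): score}}
    weights:  {network_name: weight}; chosen by hand here (an assumption),
              and any network without a weight contributes nothing.
    """
    combined = defaultdict(float)
    for name, edges in networks.items():
        w = weights.get(name, 0.0)
        for (a, b), score in edges.items():
            combined[tuple(sorted((a, b)))] += w * score  # treat edges as undirected
    return dict(combined)


def extend_gene_list(query, combined, top_n=5):
    """Rank genes outside the query by the summed weight of their edges to query genes."""
    query = set(query)
    scores = defaultdict(float)
    for (a, b), weight in combined.items():
        if a in query and b not in query:
            scores[b] += weight
        elif b in query and a not in query:
            scores[a] += weight
    # Highest score first; ties broken alphabetically so the output is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    networks = {
        "physical_interaction": {("G1", "G2"): 1.0, ("G2", "G3"): 1.0, ("G1", "G4"): 0.5},
        "co_expression": {("G1", "G3"): 0.8, ("G4", "G5"): 0.9},
    }
    weights = {"physical_interaction": 0.6, "co_expression": 0.4}
    combined = combine_networks(networks, weights)
    for gene, score in extend_gene_list(["G1", "G2"], combined):
        print(f"{gene}\t{score:.2f}")
```

With this toy data and the query G1, G2, G3 ranks first (0.92, since it is linked to both query genes through two kinds of evidence) and G4 second (0.30). G5 is not returned at all because it has no direct link to the query, which is one way a one-step sketch like this is cruder than methods that spread scores further through the network.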
GeneMANIA is actively developed at the University of Toronto, in the Donnelly Centre for Cellular and Biomolecular Research, in the labs of Gary Bader and Quaid Morris, with input from an independent scientific advisory board. GeneMANIA development is funded by Genome Canada, through the Ontario Genomics Institute (2007-OGI-TD-05).
OpenHelix, LLC, (www.openhelix.com) provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web. The OpenHelix Search portal searches hundreds of resources, tutorial suites and other material to direct researchers to the most relevant resources and OpenHelix training materials for their needs. Researchers and institutions can save time, budget and staff resources by leveraging a subscription to nearly 100 online tutorial suites available through the portal. More efficient use of the most relevant resources means quicker and more effective research.