…an integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publicly available sequence databases at NCBI, and mix this data with your own private data.
It’s a useful program and they have a great set of videos to introduce you to the workbench’s functions and features. The video embedded here is the introduction, but they also have several additional videos including how to load a genome into the workbench, phylogenies and others. Check it out.
(Forgive the delay of this week’s tip. Snow canceled work and knocked out Internet access!)
PATRIC is an integration portal (as the name implies) of data concerning disease-causing infectious bacteria. Or to put it in their words:
PATRIC is the Bacterial Bioinformatics Resource Center, an information system designed to support the biomedical research community’s work on bacterial infectious diseases via integration of vital pathogen information with rich data and analysis tools.
We mentioned PATRIC at the beginning of the year in a SNPpets post. Also, recently I was speaking with a threat abatement specialist who was lamenting the lack of coordinated data on infectious bacteria genomes. I was sure there was such a site, so we checked our blog here and voila, sure enough, there was exactly what they needed.
PATRIC indeed coordinates many different types of data from disease-causing infectious bacteria, including data from all NIAID biodefense A/B/C pathogens and hundreds of genomes from many isolation sources. For example, as of this writing there are nearly 500 genomes of Escherichia, 57 of them complete. In addition to genomic data, there are many other types of data, including phylogenetic, host-pathogen protein-protein interaction, protein, pathway and more. One interesting feature, of many, is the disease map (for Mycobacterium only right now) that shows local outbreaks and alerts. There are many tools to access and analyze this data, from specialized searches to browsers.
Gillespie, J., Wattam, A., Cammer, S., Gabbard, J., Shukla, M., Dalay, O., Driscoll, T., Hix, D., Mane, S., Mao, C., Nordberg, E., Scott, M., Schulman, J., Snyder, E., Sullivan, D., Wang, C., Warren, A., Williams, K., Xue, T., Seung Yoo, H., Zhang, C., Zhang, Y., Will, R., Kenyon, R., & Sobral, B. (2011). PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species. Infection and Immunity, 79 (11), 4286-4298. DOI: 10.1128/IAI.00207-11
Well, not that kind of galaxy (though visualizing those is quite nice), this kind of Galaxy. Galaxy is an excellent tool to analyze, reproduce and share genomics data, and the Galaxy folks are always updating, improving and adding features to the tool. We have a tutorial for Galaxy to help you get started using this tool. As you might have guessed from the previous sentence, Galaxy is a moving target. The basics (and that’s what the tutorial is for) are the same, but the tutorial is in the process of being updated to reflect some of those changes. That update should be out sooner rather than later, but that said, we just can’t fit everything into the tutorial. The relatively new visualization tool is something that will not be in the tutorial. As there are no tutorials on visualization at the Galaxy site that I can find (if you know of any, link them here!), I’ve included a quick intro to visualizations using Galaxy in this tip of the week.
There are other ways to visualize the data analyzed at Galaxy. Galaxy datasets can often be viewed directly at UCSC Genome Browser, Ensembl, RViewer or in GeneTrack within Galaxy. Those are all excellent tools and powerful ways to view and explore your analysis in depth. In addition, the Galaxy visualization tool is a way to quickly visualize your data to help discovery, direct further analysis and share what you’ve found. It is obviously not a full-fledged browser, but it is very useful for doing a simple visualization of your data from within Galaxy. Today’s tip gives a quick introduction to Galaxy visualization.
P.S. You might hear some bird song in the background. I am in, and working from, Hawaii for the next month (yeah, it’s tough work but someone has got to do it). No way to get those birds (or the frogs at night) to be silent for a bit.
Recently, the Broad Institute announced a new tool: GenomeSpace. When I first looked at it, admittedly a very cursory look, I wasn’t sure how it would be much different from an integrator of tools like Galaxy or GenePattern. Obviously that first impression was wrong, since both Galaxy and GenePattern are in their list of supported tools. So what is GenomeSpace? Well, you can read the answer here at their “What is GenomeSpace” page :). Basically, GenomeSpace has several functions. As described here, “GenomeSpace supports several bioinformatics tools, all integrated to allow easy accessibility, easy conversion, and frictionless sharing.” It is a space (in that ever-expanding Amazon cloud) that allows you to store your data files and, importantly, GenomeSpace allows you to seamlessly move those files between the tools to complete complex, or simple, analyses. It achieves this by automatically converting file formats and by allowing users to attach their accounts at the tools to their account at GenomeSpace, thus alleviating the need to log in several times when using more than one tool.
“GenomeSpace is an integration of integrators,” Nekrutenko said. “The benefit to the user is that this brings together distinctive collections of functionalities offered by individual tools.”
The site is new, and only in beta. They only recently opened up registration from their invite-only stage. As such, there are some bugs and some features that aren’t quite at full capacity. For example, the Galaxy and UCSC Table Browser integration is with the test versions of those tools during beta. Thus, for example, your account at Galaxy will not be recognized when trying to link that account with GenomeSpace. I had to create a new one on the test site. And, if you go to the public version of the Table Browser, it will look different (no link to GenomeSpace as there is on the test site). Currently there are seven tools, with more to come.
All that aside, it’s definitely a tool to get acquainted with. And with that in mind, take a quick introductory spin with me in this week’s video tip to get an idea of what you might be able to do.
A couple years ago (yes, we’ve been doing tips for almost 3 years now!) I did a tip on CTD, introducing the database to new users. As I stated then, CTD:
…is an excellent database to find information on chemical-gene-disease interactions. It is a manually curated database of chemical-gene interactions, chemical-disease and gene-disease associations.
Well, the developers of CTD have been busy. CTD has had a few interface changes that will make it a bit simpler to navigate. The tip linked above still stands as a pretty good introduction to what CTD is and does, but they’ve also added a few very nice features for comparisons and more. You can read a lot more about it in their recent NAR database issue article. Reading that article, I found that there are lots of new tools and features. Can’t do them all :), so in today’s tip I present you with a very brief re-introduction to CTD and an introduction to one of the several new features at CTD, VennViewer. VennViewer creates a Venn diagram “to compare associated data sets for up to three chemicals, diseases or genes.” It’s a useful way to compare interactions.
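For readers who like to see the idea in code, here is a minimal sketch of the kind of comparison a three-way Venn diagram expresses. It is not CTD’s implementation and does not call CTD in any way; the function name `venn_regions` and the gene and chemical labels are made-up placeholders for illustration only.

```python
from itertools import combinations


def venn_regions(sets):
    """Split up to three named sets into the exclusive regions of a Venn diagram.

    Returns a dict mapping each combination of set names to the items found
    in exactly those sets and in no others.
    """
    names = list(sets)
    if not 1 <= len(names) <= 3:
        raise ValueError("expected between one and three named sets")

    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Items shared by every set in this combination...
            inside = set.intersection(*(set(sets[n]) for n in combo))
            # ...minus anything that also appears in the remaining sets.
            outside = set().union(*(set(sets[n]) for n in names if n not in combo))
            regions[combo] = inside - outside
    return regions


# Hypothetical placeholder data: genes associated with three chemicals.
example = {
    "chemical_A": {"GENE1", "GENE2", "GENE3"},
    "chemical_B": {"GENE2", "GENE3", "GENE4"},
    "chemical_C": {"GENE3", "GENE5"},
}

regions = venn_regions(example)
for combo, items in regions.items():
    print(" & ".join(combo), "->", sorted(items))
```

Each key in the result corresponds to one region of the diagram, so the entry for all three names holds what the three lists have in common, while the single-name entries hold what is unique to each.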
Read the paper and explore the database to find more!
Davis AP, King BL, Mockus S, Murphy CG, Saraceni-Richards C, Rosenstein M, Wiegers T, & Mattingly CJ (2010). The Comparative Toxicogenomics Database: update 2011. Nucleic Acids Research. PMID: 20864448
I’ve mentioned this before, but as I am trying to get this week’s tip ready, I thought I’d remind our readers that we have a community over at SciVee (YouTube for science :): Genomics Resource Training. We post all our tips there now, and we add videos from other users that train users about genomics resources. We have about 2-3 dozen videos in our community now. Come on over and join!
Today’s tip of the week is actually the first in a series of tips I will be doing over the next couple of months. The recent paper in Lancet did a clinical assessment of an individual genome. In doing so, the researchers used various genomic resources to ascertain and interpret the data. We have a free tutorial on NIEHS SNPs that walks through some of these resources, but I thought it might be useful to follow one specific nucleotide variation through a lot more genomic databases to show the user what data is available and how to access it. Each tip I do over the next couple of months (not every week, I do tips every 2-3 weeks) will follow a specific SNP through the databases. In this case, rs108622 in the CYP4F2 gene (cytochrome P450, family 4). These tips aren’t for the genome jockeys and SNP surfers among us; they are more an introductory tour of what’s out there. They will be useful for those just starting to look at genomic variations, medical practitioners, clinicians or those just curious about what is available. Today’s tip will start with the UCSC Genome Browser, find the variation and follow it through to dbSNP. The next tip will look closer at the dbSNP information and then follow the trail to OMIM and GeneTests. In later tips we’ll take the variation to another 4-6 different databases and genomic variation resources, from HapMap and others. In the posts themselves I’ll link to still other variation databases. There is a plethora of them.
Today’s Tip of the Week is a short introduction to WAVe, or the Web Analysis of the Variome. The tool was recently introduced to us, and I’ve found it a welcome addition to the tools available to the researcher to analyze human variation. This is apropos considering the recent paper we’ve been discussing on the clinical assessment of a personal genome (here, here and here) and that paper’s implications for personalized medicine and the use of online variation resources. WAVe also has introduced me to some additional tools I’ve either not been aware of, or haven’t used, which might be of use, such as LOVD (Leiden Open Variation Database), QuExT (Query Expansion Tool, from the same developers as WAVe), and others. Of course there is also information pulled in from Ensembl, Reactome, KEGG, InterPro, PDB, UniProt, NCBI and many others. Take some time to check it out.
Several ortholog databases are now available online. Most of them, however, consider orthology from the aspect of protein sequences individually, including HomoloGene, EnsemblCompara, Inparanoid, Roundup, OrthoMCL and OrthoDB. There exist ambiguous and incomplete ortholog assignments because of the interference mediated by alternative splicing (AS). Isoforms of one gene might be assigned to different orthologous clusters.
They also use GO terms in depth to help determine functions and annotations. The tip of the week this week is a quick introduction to this new database.
If you’ve been following this blog, you’ll know that every Wednesday, come rain, shine or travel, we post a tip of the week. 98% of the time that’s a short 3-5 minute screencast of some genomics database or resource we like, or an aspect of one we think you might find useful*. We’ve been doing this for just over 2 years now, with over 100 tips. That’s a lot of screencasting! At the end of last year we recapped that year’s tips in two posts, the first half of the year and the second half. We are going to do that this year too, over these last two weeks of the year. Today I’ll list the tips we posted during the first half of the year; next week Mary will list the second half of the year’s worth. The list for January through June is below the fold: