Tip of the Week: Comparing Microbial Databases
A few weeks ago a commenter asked me to compare IMG (Integrated Microbial Genomes) to the UCSC Microbial Genome browser. I’ve been exploring & thinking about it since then, so in today’s tip I’ll give a very brief comparison of those two resources, and here in the text of this post I’ll expand the comparison to other resources.
Each of these resources could have (and in many cases does have) an hour-long tutorial devoted to it, so I can only give the briefest of overviews in this 5-minute tip, but I think (hope) it will be enough to get you thinking and exploring. The way I see things, comparing these two resources is kind of like comparing apples and aardvarks: they start with the same thing, namely microbial genome information from NCBI’s RefSeq database, but after that they are very different organisms.
* The UCSC Microbial Genome browser includes archaea, bacteria and archaeal virus genomes, and is based on a slightly modified version of the UCSC Genome Browser system, which is an amazingly powerful browser that we know and love here at OpenHelix. On their homepage they describe the resource as:
The UCSC Microbial Genome Browser is a window on the biology of more than 300 microbial species from Bacteria and Archaea domains. Basic gene annotation is derived from NCBI GenBank/RefSeq entries, with overlays of sequence conservation across multiple species, nucleotide and protein motifs, non-coding RNA predictions, operon predictions, and other types of bioinformatic analyses. In addition, we display available gene expression data, and soon, high-throughput RNA sequencing. Direct contributions of functional genomic data or bioinformatic analyses are welcome.
Information is presented as ‘tracks’ aligned along the sequence of the genome. These tracks can be hidden, expanded and customized to display exactly the information a researcher is interested in, in the exact format that the researcher would like it. The database contains hundreds of microbial genomes and information from these genomes can be retrieved and analyzed by intersecting datasets using UCSC’s powerful Table Browser resource. We have multiple free tutorials on the general UCSC Genome Browser which would be applicable to using the UCSC Microbial Browser.
* The UCSC Archaeal Genome Browser is another microbial resource based on the general UCSC Genome Browser & displays information very similarly to that of the UCSC Microbial Genome Browser, but the two offer differences in the information tracks and species available. We just created a full tutorial on this resource in July of this year & the homepage has been updated significantly since then, so this group must really be active! (Watch for an announcement of our updated tutorial in the near future…)
* The Integrated Microbial Genomes resource is from the Joint Genome Institute, and also obtains its sequences from NCBI’s RefSeq database, as well as from their own sequencing efforts. It contains sequence data on archaea, bacteria, eukaryotes (for comparative purposes), viruses and plasmids. To quote the IMG homepage:
The Integrated Microbial Genomes (IMG) system ( Nucleic Acids Research, 2010, Vol. 38) serves as a community resource for comparative analysis and annotation of all publicly available genomes from three domains of life in a uniquely integrated context…
… The IMG user interface (see User Interface Map) allows navigating the microbial genome data space along its three key dimensions (genes, genomes, and functions), and groups together the main comparative analysis tools.
However, rather than presenting vast amounts of information aligned along a sequence, this resource aims to provide researchers with state-of-the-art tools for microbial comparative analyses. It provides an array of tools for finding precise sets of genomes, genes or functions according to various characteristics. The researcher can then use abundance profilers, functional alignment tools, analysis carts and more to compare the items within a set to one another. Visualizations present the comparative information beautifully and clearly, but they are not as flexible as UCSC browser displays, which can be customized with the track controls.
* The Integrated Microbial Genomes with Microbiome samples (IMG/M) resource is a Data Management & Analysis System related to IMG (as I bet you guessed) that specializes in the unique issues surrounding metagenomic sequences. It currently contains metagenome data on 133 microbiomes, specialized tools for metagenomic analyses, and all of the IMG data for reference in comparisons.
* The Comprehensive Microbial Resource, or CMR, from the J. Craig Venter Institute is another general microbial resource, with archaeal, bacterial, and viral genomes. Its genome browser and comparative functions put it somewhere between the UCSC Microbial Browser & IMG, but with a real emphasis on making it easy to download the lists of genes, evidence or genomic elements, as well as the sequences, that result from your analyses.
* The Complete Microbial Genomes database is from NCBI and offers an extensive collection of data, resources and tools for prokaryotic genomic analysis. Data and tools are organized into three major tables: Organism info, Complete genomes, and Genomes in progress. Sequence information is available for over 1000 archaeal and bacterial genomes, and NCBI’s broader resources supply many linkout options to additional information. Complete Microbial Genomes is one of NCBI’s many Entrez Genome Projects.
These resources are just a few of the many general microbial resources publicly available to researchers. There are also species-specific resources such as EcoliHub, subject-specific resources such as MiST (Microbial Signal Transduction Database), and resources associated with specific projects such as the Human Microbiome Project (HMP): the Data Analysis and Coordination Center (DACC) for the HMP, the NIH HMP Roadmap Project, and the HMP NIH Intramural Skin Microbiome Consortium (NISMC) data at dbGaP, which Mary heard about at a recent meeting. I think I’ll leave exploration of these specialized projects to another day, though.
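For anyone new to the idea of "intersecting datasets" mentioned in the UCSC Microbial Genome Browser description, here’s a minimal Python sketch of the concept: keep the features in one track that overlap at least one feature in another track. This is only an illustration under stated assumptions, NOT how the UCSC Table Browser is implemented; the function names, gene names and coordinates are all hypothetical, and it assumes BED-style 0-based, half-open coordinates.

```python
# Conceptual sketch of a track "intersection" (hypothetical names and data;
# not the UCSC Table Browser's actual implementation).
from typing import List, Tuple

# Each feature: (sequence name, start, end, feature name).
# Assumes BED-style 0-based, half-open coordinates.
Feature = Tuple[str, int, int, str]


def overlaps(a: Feature, b: Feature) -> bool:
    """True if two half-open intervals on the same sequence share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(track_a: List[Feature], track_b: List[Feature]) -> List[Feature]:
    """Return the features of track_a that overlap at least one feature of track_b."""
    return [a for a in track_a if any(overlaps(a, b) for b in track_b)]


# Hypothetical example data: a gene track and a predicted ncRNA track.
genes = [
    ("chr", 100, 500, "geneA"),
    ("chr", 800, 1200, "geneB"),
    ("chr", 2000, 2600, "geneC"),
]
ncrna_predictions = [
    ("chr", 450, 520, "ncRNA1"),
    ("chr", 2600, 2700, "ncRNA2"),  # starts exactly where geneC ends: no overlap
]

if __name__ == "__main__":
    for gene in intersect(genes, ncrna_predictions):
        print(gene[3])  # prints geneA
```

The real Table Browser offers many more options (for example, requiring a minimum percentage of overlap), so treat this as a way to picture the operation rather than a substitute for it.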
UCSC Microbial Genome Browser and UCSC Archaeal Browser:
Schneider, K. (2006). The UCSC Archaeal Genome Browser. Nucleic Acids Research, 34 (Database issue). DOI: 10.1093/nar/gkj134
Markowitz, V., Chen, I., Palaniappan, K., Chu, K., Szeto, E., Grechkin, Y., Ratner, A., Anderson, I., Lykidis, A., Mavromatis, K., Ivanova, N., & Kyrpides, N. (2009). The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Research, 38 (Database issue). DOI: 10.1093/nar/gkp887
Markowitz, V., Ivanova, N., Szeto, E., Palaniappan, K., Chu, K., Dalevi, D., Chen, I., Grechkin, Y., Dubchak, I., Anderson, I., Lykidis, A., Mavromatis, K., Hugenholtz, P., & Kyrpides, N. (2007). IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Research, 36 (Database issue). DOI: 10.1093/nar/gkm869
Davidsen, T., Beck, E., Ganapathy, A., Montgomery, R., Zafar, N., Yang, Q., Madupu, R., Goetz, P., Galinsky, K., White, O., & Sutton, G. (2009). The comprehensive microbial resource. Nucleic Acids Research, 38 (Database issue). DOI: 10.1093/nar/gkp912
Complete Microbial Genomes:
Sayers, E., Barrett, T., Benson, D., Bolton, E., Bryant, S., Canese, K., Chetvernin, V., Church, D., DiCuccio, M., Federhen, S., Feolo, M., Geer, L., Helmberg, W., Kapustin, Y., Landsman, D., Lipman, D., Lu, Z., Madden, T., Madej, T., Maglott, D., Marchler-Bauer, A., Miller, V., Mizrachi, I., Ostell, J., Panchenko, A., Pruitt, K., Schuler, G., Sequeira, E., Sherry, S., Shumway, M., Sirotkin, K., Slotta, D., Souvorov, A., Starchenko, G., Tatusova, T., Wagner, L., Wang, Y., Wilbur, W. J., Yaschenko, E., & Ye, J. (2009). Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 38 (Database issue). DOI: 10.1093/nar/gkp967