Tag Archives: comparative genomics

Tip of the Week: Chromhome, for karyotype level comparative genomics

Usually when we think about comparative genomics data, we are thinking about genomes that are pretty well sequenced, and we want to look at that data with a variety of tools and algorithms.  But this past week we saw a question about less-well-sequenced genomes, and we thought it was an interesting inquiry.  The question was: is there a web site that displays comparative karyotype data?  So we went looking. And we found Chromhome.

Chromhome has a very straightforward interface.  You choose a target species.  You choose the probe species.  You click paint, and you get a look at the chromosome-level homology.  When the painting was performed with actual probes and reported in the literature, that data is provided. At the time the paper was published this consisted of more than 100 data sets.

There is also the opportunity to see inferred painting.  I’ll let the Chromhome paper authors describe that strategy:

If species A and species B are mapped on species N, then it is possible to deduce some of the chromosomal arrangements of A on B or B on A with respect to the arrangements of N chromosomes. Many of the species in Chromhome have been mapped on human chromosomes using chromosome painting. It is therefore possible to infer homologies between two species each of which have been hybridized with human probes.
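To make that inference concrete, here is a minimal sketch in Python. It is not Chromhome's code: the function name `infer_homology`, the chromosome labels, and the data are all made up for illustration, and it assumes each species' painting results are reduced to "which human chromosomes paint this chromosome". Two chromosomes that share a human probe are only candidates for homology, since they may carry different segments of that human chromosome.

```python
def infer_homology(a_on_n, b_on_n):
    """Infer candidate homologies between species A and B via a reference N.

    Each argument maps a chromosome name to the set of reference
    (here, human) chromosomes whose paint probes hybridize to it.
    Returns {(a_chrom, b_chrom): shared reference chromosomes}.
    """
    inferred = {}
    for a_chrom, a_refs in a_on_n.items():
        for b_chrom, b_refs in b_on_n.items():
            shared = a_refs & b_refs  # reference chromosomes painting both
            if shared:
                inferred[(a_chrom, b_chrom)] = shared
    return inferred


# Hypothetical painting results (not real data).
species_a = {"A1": {"HSA1", "HSA2"}, "A2": {"HSA3"}}
species_b = {"B1": {"HSA2"}, "B2": {"HSA3", "HSA4"}, "B3": {"HSA5"}}

result = infer_homology(species_a, species_b)
for (a_chrom, b_chrom), shared in sorted(result.items()):
    print(a_chrom, b_chrom, sorted(shared))
```

In this toy data, A1 and B1 are linked through human chromosome 2 and A2 and B2 through human chromosome 3, while B3 has no inferred partner because no chromosome of species A was painted by the chromosome 5 probe.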

So if this type of comparative genomics may be of interest to you, check out Chromhome.


Nagarajan, S., Rens, W., Stalker, J., Cox, T., & Ferguson-Smith, M. (2008). Chromhome: A rich internet application for accessing comparative chromosome homology maps BMC Bioinformatics, 9 (1) DOI: 10.1186/1471-2105-9-168

We’re all Neanderthal now, and I can analyze that…

If you haven’t been reading the news, the draft sequence of the Neandertal genome (or is it Neanderthal? Spell check won’t take the former) was released and published in Science today. There is a lot of fascinating stuff over there. Still reading it. Of course the big news, the stuff that’s flying through the news, is that non-African genomes are 1-4% Neanderthal. This seems to conclusively settle the question: yes, we are a little bit Neanderthal, and we didn’t replace them, we absorbed them with some interbreeding. Perhaps not so completely as that, but there was definitely some admixture going on. As Razib of Gene Expression points out, it’s fascinating to watch how quickly, in the face of data, the paradigm has shifted. (Great post and discussion; you should read it.)

As Razib points out, and as you can read in the announcement at UCSC, the UCSC Genome Browser now has the draft data up in the hg18 genome assembly, including the coding region allelic differences data, selective sweep data, etc. With the Neanderthal data now in the UCSC Genome Browser and other data sources, we can pull apart that data and analyze it. (And you know I’ll be putting my personal genome in a comparative track whenever I get it. Just curious, ya know.)

(By the way, there is an interesting photo, copyrighted, so I won’t post it here, that you might want to check out. There’s an interesting story there: how our illustrations of Neanderthals have evolved over the years to be more ‘humanizing’ as we learn that they made tools, had culture and now… are part of our ancestry.)

I am itching to go play there and see what I can see, as I am sure many scientists are. It’s also fascinating to be in this world of huge amounts of data coming quickly. I think a lot of paradigms will be shifting for a while.

UCSC Main site: http://genome.ucsc.edu and click Neandertal from left navigation button.

Guest Post: CoGe, The Suite for Comparative Genomics – Eric Lyons

This next post in our continuing semi-regular Guest Post series is from Eric Lyons, of CoGe at the University of California, Berkeley. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.

Thanks both for the prior CoGe post (editor’s note: a tip of the week on CoGe) and the invitation to write a bit about CoGe.  Since most people are probably not familiar with CoGe, let me begin with how it is designed:

CoGe’s architecture and philosophy:  Solve a problem once

CoGe is a web-based platform for comparative genomics and consists of many interconnected web-based tools.  The entire system is hooked up to a database that can store any version of any genome in any state of assembly from any organism (currently ~9000 genomes from ~8000 organisms). Each of CoGe’s tools is designed to do one task (e.g. search and display information about a genome, compare two genomes and generate syntenic dotplots, search any number of genomes for similar sequence, manage a list of genes, etc.), and the tools are linked to one another. This means that there is no predefined analysis workflow. Instead, people can begin exploring a genome of interest, compare it to what they want, find something interesting, explore that, find something else, explore that, and so on. People anywhere in the world can perform computationally intense analyses by clicking a few buttons on a web page and letting our servers crunch away on whatever genomes we have currently loaded in our system.  Since each tool is web-based, links are used to move from tool to tool, which creates an easy way to save an analysis for future work or to send it to a colleague. This also has the benefit that as we develop new tools to solve a specific problem, we can generalize the solution, plug it into CoGe’s database, and connect it to the pre-existing tool set. Overall, this gives us an easy way to expand CoGe’s functionality.

Lawrence Berkeley National Laboratory and OpenHelix™ Announce an Updated Free Tutorial Suite for VISTA: Tools for Comparative Genomics

Free Tutorial Suite available from OpenHelix on the VISTA bioinformatics resource.

Bainbridge Island, WA (PRWEB) March 1, 2010 - The Lawrence Berkeley National Laboratory (LBNL) announces an updated, free OpenHelix™ tutorial suite on the VISTA bioinformatics resources (http://genome.lbl.gov/vista).

VISTA’s focus is comparative genomics, providing a comprehensive suite of programs and databases for biomedical researchers. VISTA includes excellent tools to examine genomic sequences—coding, non-coding, and important regulatory regions like transcription factor binding sites.

VISTA recently added VISTA Point which combines capabilities of the three tools currently available at the site – VISTA Gateway, VISTA Browser, and Text Browser. VISTA Point makes analyzing multiple and pairwise genome alignments and extracting relevant numerical data much more straightforward.

“As we continually update and improve the VISTA suite of tools, it is critical we give users the training they need to use the tools efficiently and effectively,” said Inna Dubchak, Principal Investigator for the VISTA project, “Our history with OpenHelix has proven that their tutorial suite is an excellent and cost-effective method for us to provide that training.”

The online narrated tutorial (www.openhelix.com/vista), which runs in just about any browser, can be viewed from beginning to end or navigated using chapters and forward and backward sliders. The approximately 60 minute tutorial highlights and explains the features and functionality needed to start using VISTA effectively. The tutorial can be used by new users to introduce them to VISTA, or by previous users to view new features and functionality or simply as a reference tool to understand specific features.

In addition to the tutorial, VISTA users can also access useful training materials including the animated PowerPoint slides used as a basis for the tutorial, suggested script for the slides, slide handouts, and exercises. This can save teachers and professors a tremendous amount of time and effort in creating classroom content.

In addition to the VISTA tutorial, OpenHelix offers nearly 90 tutorial suites on some of the most powerful and popular bioinformatics and genomics tools available on the web. Some of the tutorial suites are freely available through support from the resource providers. The whole catalog of tutorial suites is available through a subscription. Users can view the tutorials and download the free materials at www.openhelix.com.

About VISTA and LBNL
The VISTA family of tools has been developed and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. This project was originally supported by the Programs for Genomic Applications grant from the NHLBI/NIH and is currently supported by the Office of Biological and Environmental Research, Office of Science, US Department of Energy.

Lawrence Berkeley National Laboratory (Berkeley Lab) has been a leader in science and engineering research for more than 70 years. Located on a 200 acre site in the hills above the University of California’s Berkeley campus, adjacent to the San Francisco Bay, Berkeley Lab holds the distinction of being the oldest of the U.S. Department of Energy’s National Laboratories. The Lab is managed by the University of California, operating with an annual budget of more than $500 million and a staff of about 3,800 employees, including more than 500 students.

About OpenHelix
OpenHelix, LLC, (www.openhelix.com) provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web. The OpenHelix Search portal searches hundreds of resources, tutorial suites and other material to direct researchers to the most relevant resources and OpenHelix training materials for their needs. Researchers and institutions can save time, budget and staff resources by leveraging a subscription to nearly 100 online tutorial suites available through the portal. More efficient use of the most relevant resources means quicker and more effective research.

Guest Post: New at VISTA- Inna Dubchak

Our first guest post in our new semi-regular Guest Post series is from Inna Dubchak, principal investigator at the LBNL/JGI group, developers of the VISTA comparative genomics resource (which sponsors a tutorial, free to the users). If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.

I would like to give you a heads up on some new VISTA updates and ongoing development!

Updates: As you probably know from this blog, a new, still free VISTA tutorial is available now. We have introduced a lot of updates to these tools - built new programs, improved the existing ones, and entirely changed the design of the site to make it more up-to-date and convenient.

The main addition to the site, VISTA Point, combines the capabilities of the three tools currently available at the site (VISTA Gateway, VISTA Browser, and Text Browser), which are usually used step by step. VISTA Point makes analyzing multiple and pairwise genome alignments and extracting relevant numerical data much more straightforward, and it is easy to update, expand, and extend with new programs.

Soon: We are actively working on visualizing synteny at scales ranging from whole-genome alignment to the conservation of individual genes, with seamless navigation across different levels of resolution. In our upcoming VISTA-Dot tool we use the concept of two-dimensional “dot-plots”, historically employed in the analysis of local alignment, and an interactive Google-map-like interface to visualize whole-genome alignments. You will be able to display and analyze large-scale duplications in plants in one click! It can also be useful in genome assembly and finishing. Another addition coming in the near future, VISTA Synteny Viewer, presents a novel interface as three cross-navigable panels representing different scales of the alignment.
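For readers who have not met dot-plots before, the idea can be sketched in a few lines of Python. This is a toy illustration of the general concept only, not how VISTA-Dot is implemented: the function names `dot_plot` and `render`, the exact k-mer matching rule, and the example sequences are all assumptions made for the sketch. A dot at (i, j) means the k-mer starting at position i of one sequence matches the k-mer starting at position j of the other.

```python
def dot_plot(seq_x, seq_y, k=3):
    """Return (i, j) pairs where the k-mer at seq_x[i:] equals the k-mer at seq_y[j:]."""
    # Index every k-mer of seq_y by its start position.
    index = {}
    for j in range(len(seq_y) - k + 1):
        index.setdefault(seq_y[j:j + k], []).append(j)
    # Look up each k-mer of seq_x in that index.
    hits = []
    for i in range(len(seq_x) - k + 1):
        for j in index.get(seq_x[i:i + k], []):
            hits.append((i, j))
    return hits


def render(hits, width, height):
    """Draw the hits as text: columns are seq_x positions, rows are seq_y positions."""
    grid = [["." for _ in range(width)] for _ in range(height)]
    for i, j in hits:
        grid[j][i] = "*"
    return "\n".join("".join(row) for row in grid)


# Made-up sequences: seq_x contains two copies of seq_y.
seq_x, seq_y, k = "ACGTACGT", "ACGT", 4
hits = dot_plot(seq_x, seq_y, k)
print(render(hits, len(seq_x) - k + 1, len(seq_y) - k + 1))
```

The duplicated segment shows up as two dots in the same row, which is the small-scale version of how a duplicated region appears as repeated diagonals in a whole-genome dot-plot.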

Attention: do not forget to use our whole-genome capabilities – Whole-genome VISTA to align sequence of any quality, from draft to finished, up to 10MB long, and Whole Genome rVISTA to evaluate which transcription factor binding sites (TFBS) are over-represented in upstream regions in a group of genes.

-Inna Dubchak

VISTA, genome comparison resource

The VISTA comparative genome analysis resource updated their interface a few months ago. Additionally, they’ve added VISTA-Point (which replaces and greatly extends VISTA text browser) which, as the site says, allows the user to:

Access complete data and visual presentation of pairwise and multiple alignments of whole genome assemblies.

The homepage has undergone a very nice redesign. Much of the functionality and use of the underlying VISTA browser and other tools is similar (though updated of course). We understand also that there will be upcoming updates to some tools and the addition of others. Look for that here :D.

Also, we’ve updated our tutorial to reflect the new site and functions. As before, this tutorial is free to users and sponsored by VISTA. Check it out.

Corn: 85% not corn, and missing big pieces

So I’m all excited about the genome festival that I’m seeing, related to the publication of the new sequence version of corn. You can access the main paper in Science, and there’s a very neat diagram in figure 1 that is like looking across time at the sequence data and into the corn nebula.  But the thing that cracked me up was this line from the abstract:

Nearly 85% of the genome is composed of hundreds of families of transposable elements, dispersed nonuniformly across the genome.

That means 85% of corn isn’t corn!!  And what business do those elements have messing with the genomes??  I am told all the time that messing with plant genomes is wrong and unnatural.  Heh.

For full coverage of the big news today I’ll point you to James and the Giant Corn (appropriately enough) who seems to be the CNN (Corn News Network) of 24-hour coverage of many aspects of the work.

I spent my morning looking over the PLoS Maize Special Collection papers, including the intriguing appetizer:  10 Reasons to be Tantalized by the B73 Maize Genome.  But I spent longer looking at the CNVs and PAVs paper.  I’ve been thinking about CNVs a lot  lately, and was interested to see this covered in a non-mammalian species.

Figure 1 is a nice example of how to use VISTA for effective displays in comparative genomics.  (If you haven’t used VISTA before you might check out our sponsored free tutorial on that–we are currently working with the VISTA team to update that with their new features too.)

There’s a really striking segment of chromosome 6 that appears to be present in one of the strains they examine and absent in the other (illustrated in figure 4).  And it looks like it has genes that are expressed and active in the B73 strain.  The ongoing investigation of that is pretty intriguing as well.

The structural variations are not evenly distributed across the genomes.  Some places have large occurrences, and some are untouched.  It’s clear that just in these two strains there’s a lot more structural diversity than in other species that have been examined:

In the human, rat, dog, mouse, macaque and chimpanzee genomes the average number of CNVs between two individuals is between 15 and 75 [43]–[48]. A high resolution study of eight human genomes [49] revealed only several hundred insertions and deletions, including CNV and PAV sequences, in the comparison of any two human genomes. In contrast, even after very stringent filtering we identified >3,700 CNV or PAV sequences that represent at least 2,000 events between these two maize genomes.

Emphasis mine.  Plants are so much more flexible, apparently….

This is going to lead to some neat clues on heterosis (or hybrid vigor) as the research proceeds with these new tools.  What a great time to be a plant scientist.  There are some very exciting projects coming along with the tools of genomics.

What I couldn’t locate was any reference to a CNV database (like DGV or CHOP CNV) where you can examine the whole set.  I’ll dig through the supplemental data to see if I can find out more on that.  But I wanted to get this post out to celebrate the very nice work and collection of papers on this project. Congrats to the teams involved!


Springer, N., Ying, K., Fu, Y., Ji, T., Yeh, C., Jia, Y., Wu, W., Richmond, T., Kitzman, J., Rosenbaum, H., Iniguez, A., Barbazuk, W., Jeddeloh, J., Nettleton, D., & Schnable, P. (2009). Maize Inbreds Exhibit High Levels of Copy Number Variation (CNV) and Presence/Absence Variation (PAV) in Genome Content PLoS Genetics, 5 (11) DOI: 10.1371/journal.pgen.1000734

Schnable, P., Ware, D., Fulton, R., Stein, J., Wei, F., Pasternak, S., Liang, C., Zhang, J., Fulton, L., Graves, T., Minx, P., Reily, A., Courtney, L., Kruchowski, S., Tomlinson, C., Strong, C., Delehaunty, K., Fronick, C., Courtney, B., Rock, S., Belter, E., Du, F., Kim, K., Abbott, R., Cotton, M., Levy, A., Marchetto, P., Ochoa, K., Jackson, S., Gillam, B., Chen, W., Yan, L., Higginbotham, J., Cardenas, M., Waligorski, J., Applebaum, E., Phelps, L., Falcone, J., Kanchi, K., Thane, T., Scimone, A., Thane, N., Henke, J., Wang, T., Ruppert, J., Shah, N., Rotter, K., Hodges, J., Ingenthron, E., Cordes, M., Kohlberg, S., Sgro, J., Delgado, B., Mead, K., Chinwalla, A., Leonard, S., Crouse, K., Collura, K., Kudrna, D., Currie, J., He, R., Angelova, A., Rajasekar, S., Mueller, T., Lomeli, R., Scara, G., Ko, A., Delaney, K., Wissotski, M., Lopez, G., Campos, D., Braidotti, M., Ashley, E., Golser, W., Kim, H., Lee, S., Lin, J., Dujmic, Z., Kim, W., Talag, J., Zuccolo, A., Fan, C., Sebastian, A., Kramer, M., Spiegel, L., Nascimento, L., Zutavern, T., Miller, B., Ambroise, C., Muller, S., Spooner, W., Narechania, A., Ren, L., Wei, S., Kumari, S., Faga, B., Levy, M., McMahan, L., Van Buren, P., Vaughn, M., Ying, K., Yeh, C., Emrich, S., Jia, Y., Kalyanaraman, A., Hsia, A., Barbazuk, W., Baucom, R., Brutnell, T., Carpita, N., Chaparro, C., Chia, J., Deragon, J., Estill, J., Fu, Y., Jeddeloh, J., Han, Y., Lee, H., Li, P., Lisch, D., Liu, S., Liu, Z., Nagel, D., McCann, M., SanMiguel, P., Myers, A., Nettleton, D., Nguyen, J., Penning, B., Ponnala, L., Schneider, K., Schwartz, D., Sharma, A., Soderlund, C., Springer, N., Sun, Q., Wang, H., Waterman, M., Westerman, R., Wolfgruber, T., Yang, L., Yu, Y., Zhang, L., Zhou, S., Zhu, Q., Bennetzen, J., Dawe, R., Jiang, J., Jiang, N., Presting, G., Wessler, S., Aluru, S., Martienssen, R., Clifton, S., McCombie, W., Wing, R., & Wilson, R. (2009). 
The B73 Maize Genome: Complexity, Diversity, and Dynamics Science, 326 (5956), 1112-1115 DOI: 10.1126/science.1178534

Tip of the Week: GeVo and Genome Comparison

Today’s tip of the week introduces a new (to us) tool for genomic comparisons. We came across this tool reading a blog post at James and the Giant Corn (great blog) about a figure from his research proposal. See, there are reasons to read blogs :D. The tool he uses to create this figure and analysis is GeVo at CoGe, which has several useful tools in addition to GeVo. In today’s tip of the week, we’ll take a quick look at James’ figure at GeVo and introduce CoGe. Check them out, they look like quite useful tools. (And while you’re at it, check out James’ blog. Tidbits like this and interesting discussions make it well worth it.)

Should you care about the Platypus?


To answer that question you might want to check out the NHGRI webinar tomorrow: Using Evolution to Decode the Human Genome. The webinar is on comparative genomics:

It almost seems like a genome sequence now exists for nearly every living thing. Whether it’s a fruit fly, hedgehog, or the duck-billed platypus, the genomics research world has produced enormous amounts of DNA sequence. How do we make sense of all of these data? The key is in comparisons… In this webinar, Eric Green will present an overview of the utility of comparative sequence analyses and show how these comparisons shed light on how genomes work and how these studies are relevant to human health.

It’s tomorrow, March 12, 2009 at 1 p.m. (Eastern US, 10 a.m. Pacific, 17:00 GMT).

Oh, and the platypus genome is interesting even in and of itself :D.

Tip of the Week: Viewing data across databases

Today’s tip looks at one example of how to view the same genomic data across several databases simply by browsing. You can download the data from analysis tools and databases in several formats and use that in others, and someday we’ll do a tip on that. But today’s tip shows you that many databases link out to one another, allowing you to view data in one context and then another simply by clicking a link. We are going to start by looking at comparative genomic data in VISTA (there’s a much more in-depth tutorial on VISTA here, free), then link out to the UCSC Genome Browser (free tutorial) to view the data there, and then go off to Ensembl (tutorial, subscription).