You may have been hearing about the 1000 Genomes project. It’s one of the ongoing “big data” projects that is going to yield a great deal of variation information about the human genome. The goal is to sequence well over 1000 genomes to identify “most genetic variants that have frequencies of at least 1% in the populations studied”. They are doing this by sequencing large numbers of samples at 4x coverage. You can read more about their strategy on the About page of their web site, which also lists the anticipated sample populations.
In this week’s Tip of the Week I’m going to take a quick spin through their browser. (You can also download all the data, but I’ll be focusing on the browser.) They have begun to release data now, and there are 6 individual sequences available at this time. These are part of their “pilot” studies. You can get some details on the pilot from their about page, which links to this PDF about the samples.
They are using the Ensembl framework to display their data, so if you are familiar with Ensembl you’ll have some facility moving around this browser. One thing that isn’t apparent right away from the site is that you can click the Resembl link on the display to turn on a track that puts the read/coverage data on the viewer. I also liked the alignment display of all 6 genomes, though I’m sure that’s going to get challenging to view later as more and more genomes are added.
In an exchange with their very helpful help desk yesterday, I got this quick summary of the samples you’ll see:
For the high coverage populations, NA12891, NA12892 and NA12878 are the CEU trio; NA19238, NA19239 and NA19240 are the YRI trio. Both are listed as father, mother, child respectively, and both children were daughters.
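If you end up working with the downloaded data rather than the browser, it can help to keep that summary in a small lookup table. Here is a minimal sketch in Python: the sample IDs and the father, mother, daughter order come straight from the help desk note above, while the dictionary layout and the `describe` helper are just my own illustrative choices, not anything the project provides.

```python
# Sample IDs and their order (father, mother, daughter) are taken from the
# help desk summary quoted above. The names TRIOS, ROLES and describe() are
# hypothetical, chosen only for this illustration.
TRIOS = {
    "CEU": ("NA12891", "NA12892", "NA12878"),
    "YRI": ("NA19238", "NA19239", "NA19240"),
}
ROLES = ("father", "mother", "daughter")


def describe(sample_id):
    """Return (population, role) for one of the six pilot samples."""
    for population, members in TRIOS.items():
        if sample_id in members:
            return population, ROLES[members.index(sample_id)]
    raise KeyError(f"{sample_id} is not one of the six high coverage samples")


if __name__ == "__main__":
    for sample in ("NA12878", "NA19238"):
        print(sample, describe(sample))
```

Calling `describe` with any ID outside the two trios raises a `KeyError`, which seemed safer to me than quietly returning nothing.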
If you have questions about their data, be sure to go ask them for help. They were very speedy with answers for me.
Some of the project data has also been picked up by UCSC, and you can access the same sequences in the UCSC Genome Browser in the Genome Variants track on the March 2006 human assembly. (You’ll also see Venter, Watson, and some other individual genomes there.)
The Project: http://www.1000genomes.org/
The Browser: http://browser.1000genomes.org/
An article in Science with some background: A Plan to Capture Human Diversity in 1000 Genomes