The next 1000 Genomes data is out [with video]

NHGRI released this video to accompany the new publication about the 1000 Genomes project analysis. It’s a nice overview of the framework of the project, the types of things they evaluated, and the major features that came out of the analysis.

The basic goal is a catalog of human variation around the world, and that catalog now offers more detail about both common and rare variants. Some of the rare ones are more serious than researchers expected, yet they appear in apparently healthy people. The geographic clustering of some of the rare variants provided interesting details as well, and the advice is that taking into account a person’s “geographic ancestry” will be important for clinical assessments of the variations. It might also affect the outcomes of studies that draw individuals from various geographical locations, so it should be considered in the design of those studies.

As the summary video and other stories note, there are 40 million variants in this data set. Are these in the paper? No. As I keep pointing out, the data is not in the papers anymore. You’ll have to access it computationally. In the video they make the point that the data from this project would require over 38,000 DVDs to store. Are you getting that with your journal subscription? Nope. Just have a look at the volume of information in Figure 2 of this paper. And that’s just one region of one chromosome with a couple of genes. Of course there is also a 113-page supplement with more information about the methods, technologies, and processes.
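For a sense of scale, here is a back-of-the-envelope conversion of that DVD figure into terabytes. It is only a sketch: the video does not say which kind of DVD it means, so the 4.7 GB single-layer capacity is an assumption, and the function name is made up for illustration. The genome count is the 1,092 from the paper's title.

```python
# Rough conversion of the "38,000 DVDs" figure into storage units.
DVD_CAPACITY_GB = 4.7  # assumption: single-layer DVD, decimal gigabytes
DVD_COUNT = 38_000     # figure quoted in the video
GENOMES = 1_092        # number of genomes in the paper's title


def dvds_to_terabytes(dvd_count, capacity_gb=DVD_CAPACITY_GB):
    """Convert a number of DVDs to decimal terabytes (hypothetical helper)."""
    return dvd_count * capacity_gb / 1000


total_tb = dvds_to_terabytes(DVD_COUNT)
per_genome_gb = total_tb * 1000 / GENOMES

print(f"{total_tb:.1f} TB total, about {per_genome_gb:.0f} GB per genome")
```

Under that assumption the total comes to a little under 180 TB. The per-genome number is just the total divided evenly across the samples, so it says nothing about what kinds of data are included in the count.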

After the fairly light but still informative video, if you want more details about the study, check out this terrific post at MassGenomics: Human genetic variation mapped in 1,000 genomes. It provides a lot of helpful context for this new data.

In the paper supplement they offer this information about ways to learn more about how to access and use the data at the 1000 Genomes project main site:

Tutorials explaining recommended methods for accessing and using the data
have been made available at:

The 1000 Genomes Project Consortium (2012). An integrated map of genetic variation from 1,092 human genomes Nature DOI: 10.1038/nature11632

3 thoughts on “The next 1000 Genomes data is out [with video]”

  1. gsgs

    these 38000 DVDs, that’s uncompressed ?
    How much is it compressed, listing only differences
    to another already listed genome or such ?

  2. Mary Post author

    You’ll have to check out their site for those details, I haven’t gone into the raw data at all.

  3. gsgs

    also, why 180000 GB ? 1092 genomes is just 3000 GB uncompressed,
    fits on some hundred DVDs. They mean the whole amount of data that
    was handled during the process ?
    That’s a bit misleading, IMO. Reporters will remember the 30000 DVDs
    and report it without properly addressing what it is.
