Looking at the NHGRI News feed recently, I noticed the story below about a new genomic data collection, and it intrigued me. It led me to a new resource that I wanted to share as this week’s Tip of the Week. This ~4 minute movie covers my path to the Human Genome Structural Variation resource and takes a quick look at some of the data. But the paper was so influential on my thinking about the genome that I wanted to cover it in more detail in text form as well. So for a quick hit, watch the movie. For more detail, check out the text and links below. Quick trip to the database: http://hgsv.washington.edu
….Other recently created maps, such as the HapMap, have catalogued the patterns of small-scale variations in the genome that involve single DNA letters, or bases. However, the scientific community has been eagerly awaiting the creation of additional types of maps in light of findings that larger scale differences account for a great deal of the common genetic variation among individuals and between populations, and may account for a significant fraction of disease. While previous work has identified structural variation in the human genome, a sequence-based map provides much finer resolution and location information….
I spend a lot of time thinking about the official or “reference” human genome sequence. This sequence–the one that was released to all that fanfare a few years back–is a composite of several people. Rather like a “generic” genome.
Whose DNA was sequenced for the Human Genome Project?
This is intentionally not known to protect the volunteers who provided DNA samples for sequencing. The sequence is derived from the DNA of several volunteers….
Of course, projects to investigate small variations (simple or single nucleotide polymorphisms, or SNPs) such as the large HapMap effort and many other projects have revealed a lot of important single A, T, C, and Gs that vary among us. If you think of those as typos, you’ll have a sense of the scale of that variation. However, it is clear that there are other genomic variations that may be important for our understanding of any one individual’s genome. Whole pages or even whole chapters might be missing or added, not just typos. Recently I got intrigued by the discovery of a possible deletion or duplication that might be associated with autism. So this wider look at these types of structural changes–deletions, insertions, inversions–got me really thinking about how to interpret data differently than I usually do when I consider the reference genome. It isn’t just the SNPs I need to keep in mind.
On to the paper itself:
Mapping and sequencing of structural variation from eight human genomes, by Kidd et al.
The paper in Nature is remarkable for many reasons, including the sheer volume of data it provides. This team examined the individual genomes of 8 different people. These same 8 people were examined in the HapMap project–so we can consider their genomes in the context of that variation project as well. Further, 5 of them are also part of the ENCODE project–so we will be able to look at those genomes with that set of flashlights too. I swear, those 5 people are going to be the most closely examined humans ever.
The researchers used several techniques to examine and confirm these structural variations in the genomes. They created whole genomic libraries of clones for each person–with about a million clones each. They sequenced each end to be able to map them to the reference genome. Think about how much that means in both physical and electronic reagents….And how much you can put in a figure on a regular page in a journal….
They mapped these clones and found over 75,000 that differed in length or orientation. Some larger, some smaller, some flipped, some missing ends, etc. They whittled these down to around 3,000 copy number variant candidates to pursue in more detail, and ended up with over 1,000 “non-redundant sites of copy-number variation” identified by restriction enzyme digestions. They validated these with microarrays that they had designed, and rescued another ~200 sites of variation with that strategy. Many of these could be additionally confirmed using SNP analyses. Inversions are more challenging to confirm, but they were able to find some with sequencing strategies and in situ hybridizations. End result: 1,695 sites of structural variation unearthed: 747 deletions, 724 insertions, 224 inversions. They discuss the occurrence of these in the context of the reference genome sequence, with some intriguing conclusions. You should go read the whole thing.
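To make the "differed in length or orientation" idea a bit more concrete, here is a minimal Python sketch of how a pair of mapped clone ends could be sorted into categories. This is my own illustration, not the authors' pipeline: the `EndPair` record, the `classify` function, and the insert length and tolerance values are all assumptions chosen for the example. The logic is simply that if the two ends land farther apart on the reference than the clone is long, the person is probably missing sequence that the reference has; if they land closer together, the person probably has extra sequence; and if the ends do not face each other, something may be flipped.

```python
from dataclasses import dataclass

# Illustrative values only; these are assumptions, not numbers from the paper.
EXPECTED_INSERT = 40_000  # assumed typical clone insert length, in bases
TOLERANCE = 8_000         # assumed deviation allowed before a clone is flagged


@dataclass
class EndPair:
    """Hypothetical record for one clone whose two ends mapped to one chromosome."""
    clone_id: str
    left_pos: int      # reference position of the leftmost end
    left_strand: str   # "+" or "-"
    right_pos: int     # reference position of the rightmost end
    right_strand: str  # "+" or "-"


def classify(pair: EndPair) -> str:
    """Label a clone by comparing its mapped ends with what is expected."""
    # Ends of a normal clone point toward each other: "+" on the left, "-" on the right.
    if (pair.left_strand, pair.right_strand) != ("+", "-"):
        return "inversion_candidate"
    span = pair.right_pos - pair.left_pos
    if span > EXPECTED_INSERT + TOLERANCE:
        # The reference has more sequence between the ends than the clone does.
        return "deletion_candidate"
    if span < EXPECTED_INSERT - TOLERANCE:
        # The clone carries sequence that the reference lacks.
        return "insertion_candidate"
    return "concordant"


if __name__ == "__main__":
    pairs = [
        EndPair("clone_a", 100_000, "+", 140_000, "-"),
        EndPair("clone_b", 100_000, "+", 200_000, "-"),
        EndPair("clone_c", 100_000, "+", 110_000, "-"),
        EndPair("clone_d", 100_000, "+", 140_000, "+"),
    ]
    for p in pairs:
        print(p.clone_id, classify(p))
```

A real analysis would also need to handle ends that map to different chromosomes or to repeats, and would look for several independent clones supporting the same site before calling it; the sketch leaves all of that out.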
Of course, there is much more in the paper. Novel human sequences inserted in the genomes. Specific genes often affected by these changes. Examinations around human diversity. Very compelling stuff.
But there is just so much information in this paper–all those clones, all those individual implications….it simply can’t be contained in just a few pages. There is an extensive section of supplements for this paper. But even that isn’t enough. You have to check out the web site where you can see and interact with all of this clone data: http://hgsv.washington.edu/
The Human Genome Structural Variation Project site is a mirror of the UCSC Genome Browser that we know and love (and provide training for). But they overlay all of that clone data on top of the reference genome sequence information, and present it with the other genomic context that you can find in the UCSC Genome Browser. So you can have it all in your view–and in your head–at once. It is a great way to go deeper with this data, and explore any genomic regions you might be interested in.
A traditional paper publication simply cannot contain all the data in a situation like this. This is an excellent example of one way to use the UCSC Genome Browser framework to present custom data for a project.
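If you want a feel for what presenting your own data in the Genome Browser framework can look like, here is a small Python sketch that writes a list of sites as a BED-style custom track (a `track` header line followed by tab-separated chromosome, start, end, and name). The `bed_track` helper, the track name, and especially the coordinates are made up for illustration; they are not sites from the paper, and this is not how the HGSV mirror itself was built.

```python
def bed_track(name, description, sites):
    """Format sites as a BED-style custom track.

    Each site is a (chrom, start, end, label) tuple. BED coordinates are
    0-based for the start and exclusive for the end.
    """
    header = f'track name="{name}" description="{description}"'
    rows = [f"{chrom}\t{start}\t{end}\t{label}" for chrom, start, end, label in sites]
    return "\n".join([header, *rows])


# Hypothetical coordinates, used only to show the layout of the file.
example_sites = [
    ("chr1", 1_000_000, 1_012_000, "deletion_candidate"),
    ("chr7", 5_500_000, 5_540_000, "inversion_candidate"),
]

if __name__ == "__main__":
    print(bed_track("my_sv_sites", "Example structural variation sites", example_sites))
```

The resulting text can be pasted into the browser's custom track form, where it would show up alongside the other annotation tracks for the same region.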
But this paper is such a huge advance in our understanding of genome-wide variations in humans, beyond the individual SNPs that have been such a major focus lately. We need to be thinking about the possible differences in each one of the pair of chromosomes that we carry. We need to be thinking about entire chunks of DNA that can be there, not there, twice there, or completely turned around. And if you think a personal genome scan for SNPs is going to provide all the information you may need to understand your biology, you may be missing some really crucial bits. Just keep that in mind as we move forward in the personal genome era. SNPs ain’t all, folks.
Kidd, J.M., Cooper, G.M., Donahue, W.F., Hayden, H.S., Sampas, N., Graves, T., Hansen, N., Teague, B., Alkan, C., Antonacci, F., Haugen, E., Zerr, T., Yamada, N.A., Tsang, P., Newman, T.L., Tüzün, E., Cheng, Z., Ebling, H.M., Tusneem, N., David, R., Gillett, W., Phelps, K.A., Weaver, M., Saranga, D., Brand, A., Tao, W., Gustafson, E., McKernan, K., Chen, L., Malig, M., Smith, J.D., Korn, J.M., McCarroll, S.A., Altshuler, D.A., Peiffer, D.A., Dorschner, M., Stamatoyannopoulos, J., Schwartz, D., Nickerson, D.A., Mullikin, J.C., Wilson, R.K., Bruhn, L., Olson, M.V., Kaul, R., Smith, D.R., Eichler, E.E. (2008). Mapping and sequencing of structural variation from eight human genomes. Nature, 453(7191), 56-64. DOI: 10.1038/nature06862