Creating the reference genome

In our workshops around the world on the UCSC Genome Browser, we start by describing the framework that organizes the data in the graphical display. The reference genome (the officially released genome for a species) provides the genome coordinates, or positions, that allow the rest of the data to be placed at the correct spots in the viewer. We don’t spend time on how a reference genome comes into existence. We just accept that it’s there for the purposes of the tutorials and focus on how to work with it using the site and the software.
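To make the idea of coordinates as a framework concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Feature` class, the `features_in_window` function, the feature names, and the positions are all made up, and the intervals are treated as BED-style (zero-based start, exclusive end). It is not how the Genome Browser itself is implemented; it only shows how shared coordinates let you decide which pieces of data belong in a given view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A hypothetical annotation placed on reference coordinates."""
    name: str
    chrom: str
    start: int  # zero-based, inclusive (BED-style assumption)
    end: int    # exclusive


def features_in_window(features, chrom, win_start, win_end):
    """Return the features on `chrom` that overlap [win_start, win_end)."""
    return [
        f for f in features
        if f.chrom == chrom and f.start < win_end and f.end > win_start
    ]


# Made-up features and positions, for illustration only.
features = [
    Feature("gene_A", "chr17", 1_000, 5_000),
    Feature("gene_B", "chr17", 8_000, 9_500),
    Feature("gene_C", "chr1", 2_000, 3_000),
]

visible = features_in_window(features, "chr17", 4_000, 8_500)
print([f.name for f in visible])  # ['gene_A', 'gene_B']
```

Because every feature is described against the same reference coordinates, data from different sources can be lined up in one view without the sources knowing anything about each other.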

If you go over and look at the UCSC Genome Browser Gateway page, you’ll see that the current default assembly (or version) of the reference genome has several nicknames. One of them is GRCh37, and you can see how that’s a change from the previous nicknames. The Genome Reference Consortium (GRC) for the human genome was assembled (get it?? har har) after the end of the Human Genome Project. Some people may only realize the change from the menu at the UCSC Genome Browser:

[Image: UCSC Genome Browser menu]

We don’t spend a lot of time on the process of getting the reference genome. But if you’ve ever wondered who is responsible for creating the human reference genome, along with a bit about how that’s done and some of the complexities, you should read this interview with Deanna Church in Bio-IT World:

Deanna Church on the Reference Genome Past, Present and Future

You should also check out the paper they link about the variations on human chromosome 17. It’s a fascinating case study of the challenges of creating a reference from a section that has evolutionary and medical consequences. It might make you think differently about what the reference genome really means. It’s the official one that we all agree to use to provide the map coordinates, but it isn’t necessarily the one that everyone is walking around with. The piece also mentions some of the other places where complicated structural features could have real medical implications.

As we see more personal genomes come along, that will affect our understanding of genome structure in other important ways too. The article touches on that as well.

Also, you can get a heads-up on when the next assembly is expected. Eventually you’ll see the UCSC team offer another menu choice, and the new coordinates will drive what you see in their viewer. It doesn’t happen right away; it takes some time to recreate the mappings. And some annotation tracks take longer to come along from their providers, because other groups also have to re-map to the new assembly. But that interview will help you understand why new assemblies are still coming along, and how that happens.
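If you’re wondering why re-mapping takes effort, a toy example may help. The sketch below is only an illustration and is not the procedure that UCSC or the annotation providers actually use. The `BLOCKS` table and the `remap_position` function are hypothetical, and the numbers are invented. The idea is that stretches of the old assembly line up with stretches of the new one, usually at shifted positions, and anything that falls outside those stretches has no automatic answer.

```python
from typing import Optional

# Hypothetical aligned blocks on one chromosome:
# (old_start, old_end, new_start), with old_end exclusive.
BLOCKS = [
    (0, 1_000, 0),          # unchanged region
    (1_000, 2_000, 1_250),  # shifted by sequence added upstream
    (2_500, 4_000, 2_750),  # old positions 2,000-2,499 have no counterpart
]


def remap_position(pos: int, blocks=BLOCKS) -> Optional[int]:
    """Map a position on the old assembly to the new one, or None if unmapped."""
    for old_start, old_end, new_start in blocks:
        if old_start <= pos < old_end:
            # Keep the same offset within the aligned block.
            return new_start + (pos - old_start)
    return None  # falls in a region that did not carry over


print(remap_position(500))    # 500
print(remap_position(1_500))  # 1750
print(remap_position(2_200))  # None
```

Positions that come back as `None` are the ones somebody has to look at by hand or re-derive from the original data, which is part of why some tracks lag behind a new assembly.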

Quick link:

Read Deanna Church on the Reference Genome Past, Present and Future at Bio-IT World.

Reference:

Zody, M., Jiang, Z., Fung, H., Antonacci, F., Hillier, L., Cardone, M., Graves, T., Kidd, J., Cheng, Z., Abouelleil, A., Chen, L., Wallis, J., Glasscock, J., Wilson, R., Reily, A., Duckworth, J., Ventura, M., Hardy, J., Warren, W., & Eichler, E. (2008). Evolutionary toggling of the MAPT 17q21.31 inversion region. Nature Genetics, 40(9), 1076-1083. DOI: 10.1038/ng.193