Tip of the Week: Introduction to ENCODE

ENCODE, which stands for Encyclopedia of DNA Elements, is the name of a project you’ll be hearing more and more about soon (ENCODE site at NHGRI). That is partly because we are going to do a series of “tips” about the ENCODE project :). But you’ll be hearing about it not just from us: a whole consortium of researchers is working on this project to generate tremendous volumes of information about functional elements in the human genome. These data will begin to permeate the literature and databases near you quite rapidly! This ~4-minute movie introduces you to the project and directs you to the UCSC Genome Browser’s ENCODE portal to access the data.

For more detail about the project in text form, read below….

There was a pilot ENCODE project, which aimed to find and analyze the functional elements in 1% of the human genome. The target was a specified 30 megabases, and researchers applied a wide range of genomics techniques to these regions to learn as much as they could about them. From one of the early consortium papers we learn about the structure and the goals of this pilot phase of the project (Science 306:636-640):

The ENCyclopedia Of DNA Elements (ENCODE) Project is predicated on the belief that a comprehensive catalog of the structural and functional components encoded in the human genome sequence will be critical for understanding human biology well enough to address those fundamental aims of biomedical research. Such a complete catalog, or “parts list,” would include protein-coding genes, non–protein-coding genes, transcriptional regulatory elements, and sequences that mediate chromosome structure and dynamics; undoubtedly, additional, yet-to-be-defined types of functional sequences will also need to be included. 1

A complete catalog of all identifiable elements is important for us to really understand the functions encoded in our DNA. And we don’t just mean genes; we need to know about many other types of regulatory and structural features as well. Existing techniques were thrown at the problem, and new techniques were developed as well.

The results of the pilot phase were published just about a year ago. There were some interesting revelations about genes, non-gene regions, pseudogenes, chromatin signatures, aspects of DNA replication, evolutionary constraints, variations, and more. The summary “group” paper from the ENCODE Consortium was published in Nature. 2 There was also a companion issue of Genome Research with 28 papers covering the details! (I will not be citing each of those here…please go have a look.)

So, a deep look at this 1% of the genome was instructive–dozens of publications demonstrated that. It also provided the proof-of-concept that continuing to examine the genome in this way would enlighten us and compel us. The project now proceeds with an expanded scope:

With the success of the initial phases of the ENCODE Project, NHGRI funded new awards in September 2007 to scale the ENCODE Project to a production phase on the entire genome along with additional pilot-scale studies. {emphasis mine}

The UCSC Genome Bioinformatics group manages the ENCODE consortium’s official repository for sequence data. They provide a portal to this data: ENCODE Project within the UCSC Genome Browser framework. But when you are using the regular UCSC Genome Browser you will find the ENCODE data available directly from the Tracks area as well. This includes the 1% that has already been studied, and they are ramping up now with new software and strategies to expand the scope genome-wide as well.

[Image: encode_logo.gif]

This “Tip of the Week” will introduce you to the project and the portal. Subsequent tips will talk about using the data, special software strategies to examine the data, various data types, and many other aspects of this important project. We hope you’ll come back to learn more as we progress through the other 99% of the genome!

Logo: obtained from NHGRI’s official ENCODE page.

1 The ENCODE Project Consortium (2004). The ENCODE (ENCyclopedia Of DNA Elements) Project. Science, 306(5696), 636-640. DOI: 10.1126/science.1105136

2 Birney, E., et al. (2007). Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, 447(7146), 799-816. DOI: 10.1038/nature05874