The ENCODE project began many years ago with a pilot phase that examined just 1% of the human genome. That initial exploration helped the consortium participants iron out some of the directions for later stages, including the focus on specific cell lines, techniques, and technologies in Phase 2. Consortium members have published a number of papers, and in addition to the participants’ papers, a lot of other folks have mined this data for various investigations as well. There’s still plenty of opportunity for discovery. Some people may not realize that there’s also an ENCODE phase 3 underway.
When we had a contract with the folks at UCSC Genome Browser for outreach on ENCODE, we developed materials to help people explore the data. We hadn’t delved into it much since phase 3 began, though. Then the other day I got a note from my NHGRI YouTube subscription (GenomeTV) that a whole workshop of ENCODE phase 3 information had been made available. So I wanted to have a look.
There is a series of video segments that correspond to this agenda from the ENCODE workshop. I’ll be highlighting one of them here, the one that introduces the features of the Phase 3 Data Coordination Center, now at Stanford. But there may be others that you want to examine for your research goals as well. Another way to work through the different segments is available from the NHGRI page (http://www.genome.gov/27561910), which offers the slides, handouts, and exercises too.
The video is longer than our typical tips, but it’s worth seeing for the context and framework details. There’s also a section on searching and filtering, which explains how to locate precisely the things you want to find. There’s a helpful and funny analogy to searching for shoes as you would at Zappos. I’ve used the Zappos tool exactly that way, and I also like it very much. If you want more details on how their ontology structure helps them to accomplish this, check out the paper linked below. Also in the video, there’s a piece about how the metadata is structured, and what you can expect to find there.
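If you would rather script that kind of filtering than click through it, here is a minimal sketch of the idea in Python. It only builds a search URL, with one query parameter per filter, and it rests on assumptions that are mine rather than the video's: that the portal accepts a `/search/` path with `type` and `format=json` parameters, and that `assay_title` is a valid field name. The helper `build_search_url` is a made-up name for illustration, so check the portal's own help pages before relying on any of it.

```python
from urllib.parse import urlencode

# Portal address as listed in the links at the end of this post.
PORTAL = "https://www.encodeproject.org"


def build_search_url(facets, base=PORTAL):
    """Return a search URL that applies each (field, value) pair as a filter.

    `facets` is a list of (field, value) tuples. The field names are
    assumptions for illustration, not confirmed by the workshop video.
    """
    # Assumed: "type" restricts the kind of object being searched.
    params = [("type", "Experiment")]
    params.extend(facets)
    # Assumed: "format=json" asks for machine-readable results.
    params.append(("format", "json"))
    return f"{base}/search/?{urlencode(params)}"


if __name__ == "__main__":
    # One filter, much like ticking a single box in the shoe-shopping sidebar.
    print(build_search_url([("assay_title", "ChIP-seq")]))
```

Each extra tuple narrows the results further, which is the same mental model as stacking size, color, and brand filters when you shop for shoes.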
There’s also a part about how to visualize the things you find. You end up loading them as a UCSC Genome Browser track hub, which is integrated with all the other data at UCSC. There’s another video with Pauline Fujita on the hubs, which I’ll address separately later.
The playlist for the whole meeting is here. I won’t be highlighting all of them, but I may select more of them for future tips.
ENCODE portal: https://www.encodeproject.org/
Malladi, V., Erickson, D., Podduturi, N., Rowe, L., Chan, E., Davidson, J., Hitz, B., Ho, M., Lee, B., Miyasato, S., Roe, G., Simison, M., Sloan, C., Strattan, J., Tanaka, F., Kent, W., Cherry, J., & Hong, E. (2015). Ontology application and use at the ENCODE DCC. Database, 2015. DOI: 10.1093/database/bav010
ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489 (7414), 57-74. DOI: 10.1038/nature11247
ENCODE Project Consortium (2011). A User’s Guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biology, 9 (4). DOI: 10.1371/journal.pbio.1001046
ENCODE Project Consortium (2004). The ENCODE (ENCyclopedia Of DNA Elements) Project. Science, 306 (5696), 636-640. DOI: 10.1126/science.1105136