ENCODE Chromatin state data offers nice insights. Take this and run with it.

mmmmm….another “big data” paper illustrates a point I’ve been hammering on: there is terrific data coming out of these projects–but it’s not in the publications. So I’m going to talk about this paper in this post (1), but I’ll direct you to the database where the information is really available for your perusal–and to a tutorial that explains how to access it. Off we go….

The ENCODE project is one of the “big data” consortium-type efforts that are so common these days. ENCODE stands for ENCyclopedia Of DNA Elements, and it was begun to take us beyond the basic framework that the reference genome sequencing provided–to find out more about the control and structural organizational features encoded in the genome. I won’t go into the entire background here–you can find that at NHGRI and in the previous publications from the ENCODE team members. Some of you will recall that there was a pilot phase to explore the strategies and technologies that might be used to accomplish the project’s goals; it was completed and published in 2007 (2). But that covered only 1% of the genome. After the success of the pilot phase, ENCODE was rolled out in a scale-up or production phase–which is now genome-wide.

The data from the production phase has been flowing into the UCSC Genome Browser since 2009 (3). New strategies for storing, displaying, and visualizing related data have been developed, so even if you aren’t specifically interested in the data types that ENCODE is providing, you may want to be aware of how this next-gen type of data is being managed and displayed effectively.

One of the data types that has now been published is about the chromatin state dynamics that were determined using ChIP-Seq technologies, various antibodies for immuno-precipitation, and a number of different cell lines. The researchers explored the various combinations of signals that could be derived from the binding of different histones, insulator and transcription machinery proteins to different DNA segments, and developed a way to characterize the resulting states as weak or strong promoters, enhancers, insulators, transcription regulators and more.

I won’t go further into the biological meaning here–a nice take on that was offered yesterday by Joe at Genomes Unzipped in his post “How do variants outside genes influence disease risk?” So go there for some insights into the larger meaning of this data. I want to focus on ways you can access and explore this data yourself, and to get you to go mine it for regions that you are interested in.

The paper lays out the strategy and shows the typical compelling examples that all big data papers offer. But this is a mere fraction of what’s available to you–rarely is your favorite gene/region/whatever going to be singled out and mentioned in these kinds of papers. And because of that, it won’t be curated by traditional editors/curators, as was done in the good (?) old days of one-gene-one-paper, for you to find later in some repository. It just can’t happen that way anymore.

You need to look at this kind of data in your favorite regions yourself. All of this data has been deposited in the UCSC Data Coordination Center (DCC). It’s all available for you to peruse in the UCSC Genome Browser.

Figure 1c in the paper gives you a color-coded way to evaluate the data that you might see in your regions of interest. Use that as a guide to consider what the data shows when you visualize it. Figure 4 illustrates one example: a putative strong enhancer with a Gata1 binding site. This should help you to recognize this kind of pattern in the display when you find it over at UCSC.

Now–finding the data at UCSC: the first thing I would encourage is to watch the tutorial that is freely available, sponsored by the UCSC DCC group. It provides an overview, a guide to recognizing ENCODE data, and ways to interact with the data. Then go over to the UCSC Genome Browser and look for the appropriate tracks: some of the data is on the 2009 assembly (hg19), and a lot of ENCODE data is on the 2006 assembly (hg18). Keep in mind: not all of the ENCODE data has been mapped to the 2009 assembly–so it’s wise to explore both. Find the Regulation track section, and look for ENCODE transcription-factor binding tracks, Enhancer and Promoter tracks, and the super-track data collections as well–these bring several related data types together for visualization (the ENCODE Regulation track on Mar 2006 is one example of a super-track). Some of the data may also be found on the Preview server.
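If you find yourself hopping between the two assemblies a lot, a tiny helper that builds the browser links can save some clicking. This is only a sketch: it relies on the browser's `db` and `position` URL parameters, and the coordinates in the example are made-up placeholders, not a region from the paper.

```python
from urllib.parse import urlencode

# Base address of the UCSC Genome Browser track display.
UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"

# UCSC database names for the two human assemblies mentioned above.
ASSEMBLIES = {"2006": "hg18", "2009": "hg19"}


def browser_url(assembly_year, chrom, start, end):
    """Return a Genome Browser link for one region on one assembly."""
    if assembly_year not in ASSEMBLIES:
        raise ValueError(f"unknown assembly year: {assembly_year!r}")
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    query = urlencode(
        {"db": ASSEMBLIES[assembly_year], "position": f"{chrom}:{start}-{end}"},
        safe=":",  # keep the chrom:start-end form readable in the link
    )
    return f"{UCSC_HGTRACKS}?{query}"


# Placeholder coordinates, purely for illustration.
print(browser_url("2009", "chr11", 5246000, 5250000))
```

One caution: coordinates are assembly-specific, so the same numbers will not point at the same piece of DNA on the 2006 and 2009 assemblies. Look up (or convert) the coordinates for your region separately on each one.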

Well, that should keep you busy for a while. Go and mine this stuff for regions that you are interested in. I know you want to put some of the leads from this data into your next grant application…. :)  If you are going to publish the results of your mining, be sure to review the ENCODE data use and release policy, and ensure the data is past the publication embargo window or meets the other criteria that might apply. But it’s all there for the lookin’ right now.
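For those who would rather mine with a script than by eye, here is a minimal sketch of the idea. It assumes you have exported chromatin state segments as BED-style text (chromosome, start, end, state label, tab-separated). The sample lines and state labels below are invented for illustration, so check the actual track files for the real labels and columns.

```python
from collections import Counter

# Invented sample data standing in for an exported BED file.
SAMPLE_BED = (
    "chr1\t1000\t1600\t1_Active_Promoter\n"
    "chr1\t1600\t2400\t4_Strong_Enhancer\n"
    "chr1\t2400\t5000\t11_Weak_Txn\n"
    "chr2\t100\t900\t8_Insulator\n"
)


def parse_bed(text):
    """Yield (chrom, start, end, state) from BED text, skipping header lines."""
    for line in text.splitlines():
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, state = line.split("\t")[:4]
        yield chrom, int(start), int(end), state


def bases_per_state(segments, chrom, start, end):
    """Count how many bases of [start, end) on chrom fall in each state."""
    totals = Counter()
    for seg_chrom, seg_start, seg_end, state in segments:
        if seg_chrom != chrom:
            continue
        overlap = min(seg_end, end) - max(seg_start, start)
        if overlap > 0:
            totals[state] += overlap
    return totals


# Summarize a made-up region of interest, largest state first.
summary = bases_per_state(parse_bed(SAMPLE_BED), "chr1", 1500, 3000)
for state, n_bases in summary.most_common():
    print(state, n_bases)
```

To use it on real data, read your downloaded file into a string (or adapt `parse_bed` to read line by line) and pass in the coordinates of your own region, on the same assembly the file was mapped to.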

Quick links:

ENCODE background at NHGRI: http://www.genome.gov/10005107

ENCODE portal at UCSC: http://encodeproject.org

ENCODE tutorial at OpenHelix, freely available because it is sponsored by UCSC: http://openhelix.com/ENCODE

References:

(1) Ernst, J., Kheradpour, P., Mikkelsen, T., Shoresh, N., Ward, L., Epstein, C., Zhang, X., Wang, L., Issner, R., Coyne, M., Ku, M., Durham, T., Kellis, M., & Bernstein, B. (2011). Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. DOI: 10.1038/nature09906

And PeeEss: Don’t forget to look at the 19 figures in the supplemental information….

(2) The ENCODE Project Consortium: Birney, E., Stamatoyannopoulos, J., Dutta, A., Guigó, R., Gingeras, T., Margulies, E., Weng, Z., Snyder, M., Dermitzakis, E., et al. (2007). Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, 447 (7146), 799-816. DOI: 10.1038/nature05874

(3) Rosenbloom, K., Dreszer, T., Pheasant, M., Barber, G., Meyer, L., Pohl, A., Raney, B., Wang, T., Hinrichs, A., Zweig, A., Fujita, P., Learned, K., Rhead, B., Smith, K., Kuhn, R., Karolchik, D., Haussler, D., & Kent, W. (2009). ENCODE whole-genome data in the UCSC Genome Browser. Nucleic Acids Research, 38 (Database). DOI: 10.1093/nar/gkp961
