
Video Tip of the Week: ENCODE ChIP-Seq Significance Tool

We’ve been doing training and workshops on the UCSC Genome Browser for 10 years now. It’s a tremendous tool that has to be a foundational item in your genomics toolkit. But there may be times when you want to examine some of the data you find there in another way, with a different focus or emphasis. It might be possible to craft some clever Table Browser queries that get you what you want. Sometimes, though, someone else has already built a way to query the underlying data for a particular topic, and that can be useful too. Today’s tip of the week is exactly this kind of tool: a web interface for querying the ENCODE data that resides in the UCSC Genome Browser, with a focus on finding transcription factors with enriched binding in the regions you are interested in exploring. Today’s video tip is for the ENCODE ChIP-Seq Significance Tool.

There’s a ton of great data that flowed into the UCSC Genome Browser as part of the ENCODE project. It’s going to provide years of mining for biologists. What would be great is for biomedical researchers who have interest in specific genes–or sets of genes–to take a look at the ENCODE data to see if they can unearth some useful insights about the regulation of these genes or lists of genes. You can use the ChIP-Seq Significance tool to sift through the data.
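If you’d like a feel for what “enriched binding” can mean statistically, here is a minimal sketch. It assumes a one-sided hypergeometric test, which is one common way to ask whether a factor binds near more of your genes than chance would predict. I’m not claiming this is exactly what the tool computes; the paper linked below is the place to check that. The function name and every number in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(background_genes, background_bound, list_genes, list_bound):
    """One-sided hypergeometric p-value: P(X >= list_bound).

    background_genes: total genes in the background set
    background_bound: background genes with a binding peak for the factor
    list_genes:       genes in your list of interest
    list_bound:       genes in your list with a binding peak for the factor
    """
    unbound = background_genes - background_bound
    most_possible = min(list_genes, background_bound)
    # Count the ways to draw k bound genes and (list_genes - k) unbound genes,
    # for every k at least as large as what was observed.
    tail = sum(
        comb(background_bound, k) * comb(unbound, list_genes - k)
        for k in range(list_bound, most_possible + 1)
    )
    return tail / comb(background_genes, list_genes)


# Hypothetical numbers: 20,000 background genes, 1,500 of them bound by a
# factor; a list of 50 genes of interest, 12 of which are bound.
p = enrichment_p_value(20000, 1500, 50, 12)
print(f"p = {p:.3g}")
```

A real analysis would run this once per transcription factor and then correct for multiple testing, since many factors are tested against the same gene list.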

The video that the Butte lab team made is very nice. It gives very specific guidance on how to use their tool: what to choose for the menu options, what the choices are, and what to expect from the results.

Of course you should read their paper about this tool for the background you need (linked below), and the references that will also help you to understand what this tool offers. You should also read up on the associated ENCODE data. The supplement with the paper is also nicely written in clear language to help you to understand the features.

One of the things I was curious about was whether this might be extended to the mouse data too. One thing that people grouse to me about is that ENCODE is cell line data, and tissue data would really be great. But I saw discussion at Stephen Turner’s blog (read the comments) about the focus on human for now. There was also discussion of the CScan tool, though, which does cover the mouse data. So if this is a tool you are interested in, you might want to explore CScan too.

Hat tip to Stephen Turner for the awareness.

Quick links:

ENCODE ChIP-Seq Significance Tool: http://encodeqt.stanford.edu/

CScan: http://www.beaconlab.it/cscan


Auerbach, R., Chen, B., & Butte, A. (2013). Relating Genes to Function: Identifying Enriched Transcription Factors using the ENCODE ChIP-Seq Significance Tool. Bioinformatics. DOI: 10.1093/bioinformatics/btt316

ENCODE Chromatin state data offers nice insights. Take this and run with it.

mmmmm….another “big data” paper illustrates a point I’ve been hammering on: there is terrific data coming out of these projects–but it’s not in the publications. So I’m going to talk about this paper in this post (1), but I’ll direct you to the database where the information is really available for your perusal–and to a tutorial that explains how to access it. Off we go….

The ENCODE project is one of the “big data” consortium-type efforts that are so common these days. ENCODE stands for ENCyclopedia Of DNA Elements, and it was begun to take us beyond the basic framework that the reference genome sequencing provided, to find out more about the control and structural organizational features encoded in the genome. I won’t go into the entire background here; you can find that at NHGRI and in the previous publications from the ENCODE team members. Some of you will recall that there was a pilot phase to explore the strategies and technologies that might be used to accomplish their goals, which was completed and published in 2007 (2). But that covered only 1% of the genome. After the success of the pilot phase, ENCODE was rolled out in a scale-up or production phase, which is now genome-wide.

The data from the production phase has been flowing into the UCSC Genome Browser since 2009 (3). New strategies for storing, displaying, and visualizing related data have been developed–so even if you aren’t specifically interested in the data types that ENCODE is providing–you may want to be aware of how this next-gen type of data is being managed and displayed effectively.

One of the data types that has now been published is about the chromatin state dynamics that were determined using ChIP-Seq technologies, various antibodies for immuno-precipitation, and a number of different cell lines. The researchers explored the various combinations of signals that could be derived from the binding of different histones, insulator and transcription machinery proteins to different DNA segments, and developed a way to characterize the resulting states as weak or strong promoters, enhancers, insulators, transcription regulators and more.

I won’t go into the details of the biological meaning more here–a nice take on that was offered yesterday by Joe at Genomes Unzipped in his post: How do variants outside genes influence disease risk? So go there for some insights into the larger meaning of this data. I wanted to focus on ways you can access and explore this data yourself, and to get you to go mine this for regions that you are interested in.

The paper lays out the strategy and shows the typical compelling examples that all big data papers offer. But this is a mere fraction of what’s available to you. Rarely is your favorite gene/region/whatever going to be singled out and mentioned in these kinds of papers. And because of that, it won’t be curated by traditional editors/curators, as was done in the good (?) old days of one-gene-one-paper, for you to find later in some repository. It just can’t happen that way anymore.

You need to look at this kind of data in your favorite regions yourself. All of this data has been deposited in the UCSC Data Coordination Center (DCC). It’s all available for you to peruse in the UCSC Genome Browser.

Figure 1c in the paper gives you a color-coded way to evaluate the data that you might see in your regions of interest. Use that as a guide to consider what the data shows when you visualize it. Figure 4 illustrates one specific example: a putative strong enhancer with a Gata1 binding site. This should help you to recognize this kind of pattern in the display when you find it over at UCSC.

Now, finding the data at UCSC: the first thing I would encourage is to watch the tutorial that is freely available, sponsored by the UCSC DCC group. This provides an overview, a guide to recognizing ENCODE data, and ways to interact with the data. Then go over to the UCSC Genome Browser and look for the appropriate tracks on the 2009 assembly, which has some of the data, and on the 2006 assembly, which has a lot of the ENCODE data. Keep in mind: not all of the ENCODE data has been mapped to the 2009 assembly, so it’s wise to explore both. Find the Regulation track section, and look for the ENCODE transcription-factor binding tracks, the Enhancer and Promoter tracks, and the super-track data collections as well; these bring several related data types together for visualization (the ENCODE Regulation track on Mar 2006 is one example of a super-track). Some of the data may also be found on the Preview server.
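Once you have found a track you like, you may want to pull the segments out and summarize them for your own region rather than eyeballing them. Here is a minimal sketch, assuming you have exported chromatin-state segments as plain tab-separated BED (chrom, start, end, name) from something like the Table Browser. The function name, the coordinates, and the state labels in the sample are made up for illustration; they are not real ENCODE calls.

```python
from collections import defaultdict

# Hypothetical BED rows: chrom, start, end, state name (tab-separated).
SAMPLE_BED = """\
chr1\t1000\t1200\t1_Active_Promoter
chr1\t1200\t1800\t4_Strong_Enhancer
chr1\t1800\t2600\t11_Weak_Txn
chr2\t1000\t2000\t4_Strong_Enhancer
"""


def overlapping_states(bed_lines, chrom, start, end):
    """Return {state_name: bases} for segments overlapping chrom:start-end.

    Coordinates are treated as BED-style: 0-based start, exclusive end.
    """
    covered = defaultdict(int)
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines, comments, and track/browser header lines.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        seg_chrom, seg_start, seg_end, name = (
            fields[0], int(fields[1]), int(fields[2]), fields[3]
        )
        if seg_chrom != chrom:
            continue
        # Length of the intersection of the two half-open intervals.
        overlap = min(end, seg_end) - max(start, seg_start)
        if overlap > 0:
            covered[name] += overlap
    return dict(covered)


# Which states fall in a (hypothetical) region of interest, and how many bases?
for state, bases in overlapping_states(SAMPLE_BED.splitlines(), "chr1", 1100, 2000).items():
    print(state, bases)
```

The linear scan is fine for one region and one cell type. If you are checking many regions against many tracks, a dedicated interval tool would be the better choice.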

Well, that should keep you busy for a while. Go and mine this stuff for regions that you are interested in. I know you want to put some of the leads from this data into your next grant application…. :)  If you are going to publish the results of your mining be sure to review the ENCODE data use and release policy, and ensure it’s past the publication embargo window or meets the other criteria that might apply. But it’s all there for the lookin’ right now.

Quick links:

ENCODE background at NHGRI: http://www.genome.gov/10005107

ENCODE portal at UCSC: http://encodeproject.org

ENCODE tutorial at OpenHelix, freely available because it is sponsored by UCSC: http://openhelix.com/ENCODE


(1) Ernst, J., Kheradpour, P., Mikkelsen, T., Shoresh, N., Ward, L., Epstein, C., Zhang, X., Wang, L., Issner, R., Coyne, M., Ku, M., Durham, T., Kellis, M., & Bernstein, B. (2011). Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. DOI: 10.1038/nature09906

And PeeEss: Don’t forget to look at the 19 figures in the supplemental information….

(2) The ENCODE Project Consortium: Birney, E., Stamatoyannopoulos, J., Dutta, A., Guigó, R., Gingeras, T., Margulies, E., Weng, Z., Snyder, M., Dermitzakis, E., et al. (2007). Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, 447 (7146), 799-816. DOI: 10.1038/nature05874

(3) Rosenbloom, K., Dreszer, T., Pheasant, M., Barber, G., Meyer, L., Pohl, A., Raney, B., Wang, T., Hinrichs, A., Zweig, A., Fujita, P., Learned, K., Rhead, B., Smith, K., Kuhn, R., Karolchik, D., Haussler, D., & Kent, W. (2009). ENCODE whole-genome data in the UCSC Genome Browser. Nucleic Acids Research, 38 (Database). DOI: 10.1093/nar/gkp961