Tag Archives: chromatin

Video Tip of the Week: ChromoHub, annotated trees of chromatin-mediated signaling

Today’s tip of the week is a quick introduction to ChromoHub. ChromoHub is an annotated phylogeny of chromatin-mediated signaling genes. As the ChromoHub site says, these are “genes involved in writing, reading and erasing the histone code.” These are epigenetic modifications that are emerging as target classes for future drug therapies.

ChromoHub maps annotated information about these genes onto a phylogeny of the genes, where the researcher can find a wealth of information. The information one can find ranges from cancer data, SNPs, protein structure, and protein-protein interactions to PubMed and funding information. There is a lot to view.

Today’s tip introduces you to the tool and shows how to add and view the annotations. There is a lot more at ChromoHub. You can suggest data that the developers have missed, and you can download the information, alignment files, images and more.

ChromoHub was developed by SGC, the Structural Genomics Consortium. This is a private-public partnership that supports discovery of new medicines through open access research. ChromoHub is just one of the tools and resources developed by the consortium.

To find out more about the resource, check out the links and reference below.

Quick Links:

UCSC Genome Browser
Structural Genomics Consortium  (SGC)

ChromoHub Reference:

Liu L, Zhen S, Denton E, Marsden B, Schapira M. (2012). ChromoHub: a data hub for navigators of chromatin-mediated signalling. Bioinformatics DOI: 10.1093/bioinformatics/bts340 (open access)

ENCODE Chromatin state data offers nice insights. Take this and run with it.

mmmmm….another “big data” paper illustrates a point I’ve been hammering on: there is terrific data coming out of these projects–but it’s not in the publications. So I’m going to talk about this paper in this post (1), but I’ll direct you to the database where the information is really available for your perusal–and to a tutorial that explains how to access it. Off we go….

The ENCODE project is one of the “big data” consortium-type efforts that are so common these days. ENCODE stands for ENCyclopedia Of DNA Elements, and it was begun to take us beyond the basic framework that the reference genome sequencing provided–to find out more about the control and structural organizational features encoded in the genome. I won’t go into the entire background here–you can find that at NHGRI and in the previous publications from the ENCODE team members. Some of you will recall that there was a pilot phase to explore the strategies and technologies that might be used to accomplish their goals, which was completed and published in 2007 (2). But that covered only 1% of the genome. After the success of the pilot phase, ENCODE was rolled out in a scale-up or production phase–which is now genome-wide.

The data from the production phase has been flowing into the UCSC Genome Browser since 2009 (3). New strategies for storing, displaying, and visualizing related data have been developed–so even if you aren’t specifically interested in the data types that ENCODE is providing–you may want to be aware of how this next-gen type of data is being managed and displayed effectively.

One of the data types that has now been published is about the chromatin state dynamics that were determined using ChIP-Seq technologies, various antibodies for immuno-precipitation, and a number of different cell lines. The researchers explored the various combinations of signals that could be derived from the binding of different histones, insulator and transcription machinery proteins to different DNA segments, and developed a way to characterize the resulting states as weak or strong promoters, enhancers, insulators, transcription regulators and more.

I won’t go further into the biological meaning here–a nice take on that was offered yesterday by Joe at Genomes Unzipped in his post: How do variants outside genes influence disease risk? So go there for some insights into the larger meaning of this data. I wanted to focus on ways you can access and explore this data yourself, and to get you to go mine it for regions that you are interested in.

The paper lays out the strategy and shows the typical compelling examples that all big data papers offer. But this is a mere fraction of what’s available to you–rarely is your favorite gene/region/whatever going to be singled out and mentioned in these kinds of papers. And because of that, it won’t be curated by traditional editors/curators, as was done in the good (?) old days of one-gene-one-paper, for you to find later in some repository. It just can’t happen that way anymore.

You need to look at this kind of data in your favorite regions yourself. All of this data has been deposited in the UCSC Data Coordination Center (DCC). It’s all available for you to peruse in the UCSC Genome Browser.

Figure 1c in the paper gives you a color-coded way to evaluate the data that you might see in your regions of interest. Use that as a guide to consider what the data shows when you visualize it. Figure 4 illustrates one putative strong enhancer with a Gata1 binding site. This should help you to recognize this kind of pattern in the display when you find it over at UCSC.
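If you download state calls rather than only eyeballing them, a short script can summarize which states fall in your region. The sketch below assumes BED-style rows (chromosome, start, end, state label, with 0-based half-open coordinates); the function name, the rows and the labels are all made up for illustration and are not real ENCODE calls.

```python
from collections import Counter

def states_in_region(bed_lines, chrom, start, end):
    """Count bases per state label that overlap chrom:start-end.

    Assumes BED-style rows: chrom, start, end, label (0-based, half-open).
    """
    covered = Counter()
    for line in bed_lines:
        # Skip blank lines and the header lines some BED files carry.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        row_chrom, label = fields[0], fields[3]
        row_start, row_end = int(fields[1]), int(fields[2])
        if row_chrom != chrom:
            continue
        overlap = min(end, row_end) - max(start, row_start)
        if overlap > 0:
            covered[label] += overlap
    return covered

# Made-up rows for illustration only; the labels are placeholders.
example = [
    "chr1\t1000\t1400\tstrong_enhancer",
    "chr1\t1400\t2200\tweak_transcribed",
    "chr1\t2200\t2600\tactive_promoter",
    "chr2\t1000\t2000\tinsulator",
]

for label, bases in states_in_region(example, "chr1", 1200, 2400).most_common():
    print(label, bases)
```

Counting overlapping bases rather than overlapping rows keeps one long segment from looking the same as one tiny one.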

Now–finding the data at UCSC: the first thing I would encourage is to watch the tutorial that is freely available, sponsored by the UCSC DCC group. This provides an overview, a guide to recognizing ENCODE data, and ways to interact with the data. Then go over to the UCSC Genome Browser and look for the appropriate tracks on the 2009 assembly (for some data) and on the 2006 assembly, which holds a lot of the ENCODE data. Keep in mind: not all of the ENCODE data has been mapped to the 2009 assembly–so it’s wise to explore both. Find the Regulation track section, and look for ENCODE transcription-factor binding tracks, and Enhancer and Promoter tracks, and the super-track data collections as well–these bring several related data types together for visualization (ENCODE Regulation track on Mar 2006 is one example of a super-track). Some of the data may also be found on the Preview server.
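If you check the same genes on both assemblies often, it can help to generate the browser links rather than click through each time. This is a minimal sketch, assuming the usual hgTracks link form with `db` and `position` parameters, where hg18 is the Mar 2006 human assembly and hg19 is the Feb 2009 one. The function name and the example region are hypothetical.

```python
from urllib.parse import urlencode

# UCSC database names for the two human assemblies discussed above.
ASSEMBLIES = {"2006": "hg18", "2009": "hg19"}

def browser_url(assembly_year, chrom, start, end):
    """Return a Genome Browser link for one region on one assembly."""
    query = urlencode(
        {
            "db": ASSEMBLIES[assembly_year],
            "position": f"{chrom}:{start}-{end}",
        },
        safe=":",  # keep the colon readable in the position string
    )
    return f"https://genome.ucsc.edu/cgi-bin/hgTracks?{query}"

# Hypothetical region, chosen only to show the link format.
# Coordinates differ between assemblies, so the same numbers do not
# point at the same sequence on hg18 and hg19.
for year in ("2006", "2009"):
    print(browser_url(year, "chr1", 1000000, 1010000))
```

Because coordinates shift between assemblies, a region of interest needs to be converted (or looked up by gene name) before the two views are compared.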

Well, that should keep you busy for a while. Go and mine this stuff for regions that you are interested in. I know you want to put some of the leads from this data into your next grant application…. :)  If you are going to publish the results of your mining be sure to review the ENCODE data use and release policy, and ensure it’s past the publication embargo window or meets the other criteria that might apply. But it’s all there for the lookin’ right now.

Quick links:

ENCODE background at NHGRI: http://www.genome.gov/10005107

ENCODE portal at UCSC: http://encodeproject.org

ENCODE tutorial at OpenHelix, freely available because it is sponsored by UCSC: http://openhelix.com/ENCODE


(1) Ernst, J., Kheradpour, P., Mikkelsen, T., Shoresh, N., Ward, L., Epstein, C., Zhang, X., Wang, L., Issner, R., Coyne, M., Ku, M., Durham, T., Kellis, M., & Bernstein, B. (2011). Mapping and analysis of chromatin state dynamics in nine human cell types Nature DOI: 10.1038/nature09906

And PeeEss: Don’t forget to look at the 19 figures in the supplemental information….

(2) The ENCODE Project Consortium: Birney, E., Stamatoyannopoulos, J., Dutta, A., Guigó, R., Gingeras, T., Margulies, E., Weng, Z., Snyder, M., Dermitzakis, E., et al. (2007). Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project Nature, 447 (7146), 799-816 DOI: 10.1038/nature05874

(3) Rosenbloom, K., Dreszer, T., Pheasant, M., Barber, G., Meyer, L., Pohl, A., Raney, B., Wang, T., Hinrichs, A., Zweig, A., Fujita, P., Learned, K., Rhead, B., Smith, K., Kuhn, R., Karolchik, D., Haussler, D., & Kent, W. (2009). ENCODE whole-genome data in the UCSC Genome Browser Nucleic Acids Research, 38 (Database) DOI: 10.1093/nar/gkp961

Tip of the Week: DAnCER for disease-annotated epigenetics data

Epigenetics and epigenomics are becoming more exciting areas of investigation, and we are seeing more requests for database resources to support them, and for the sources of data from these types of experiments. If you aren’t aware of these investigations at this point, check out their entries in the Talking Glossary of Genetic Terms:

Epigenetics: Epigenetics is an emerging field of science that studies heritable changes caused by the activation and deactivation of genes without any change in the underlying DNA sequence of the organism. The word epigenetics is of Greek origin and literally means over and above (epi) the genome.

Epigenome: The term epigenome is derived from the Greek word epi which literally means “above” the genome. The epigenome consists of chemical compounds that modify, or mark, the genome in a way that tells it what to do, where to do it, and when to do it. Different cells have different epigenetic marks. These epigenetic marks, which are not part of the DNA itself, can be passed on from cell to cell as cells divide, and from one generation to the next.

And for the talking part–you can hear Dr. Laura Elnitski talk about these in more detail–have a listen at each entry. And just today an article providing an epigenetics primer appeared in my inbox: Epigenetics: A Primer.

The intriguing–and sometimes puzzling–chromatin modification (CM) signals and leads now being unveiled in many labs and projects are becoming more widely available in different databases. For this week’s tip of the week I’ll introduce DAnCER: Disease-Annotated Chromatin Epigenetics Resource, one of the tools that is organizing this type of data and enabling additional explorations. You can find DAnCER here: http://wodaklab.org/dancer/

In the associated publication, the DAnCER team describes other useful resources that provide epigenetics data. These include ChromDB, ChromatinDB (for yeast), and the Human Histone Modification Database (HHMD), among others. I’m also aware of other sources. A few months back I introduced the NCBI Epigenomics resource as my tip-of-the-week. (At that time I promised that when the publication became available I’d mention it–that’s now at the bottom in the references section below.) There’s also quite a bit of this data flowing in to the UCSC Genome Browser ENCODE DCC. Including–may I add–some data from the very cool Elnitski bi-directional promoter studies.  You can find similar data types via the modENCODE project as well.

So, there are lots of resources out there. Each provider has different projects, species, goals, displays, etc. But the group that developed DAnCER wanted to fill a niche they didn’t see available already: linking these epigenetic changes to possible disease association data. Here’s how they describe their work:

Our research effort therefore strives to explore CM-related genes in the context of their protein-interaction network, their partnership in multi-protein complexes and cellular pathways, as well as their gene expression profiles….

They are well-suited to linking this kind of information. You may remember our previous explorations and discussions of iRefWeb. The kind of network and interaction data that they assemble in that context can be brought to the chromatin-modification arena. The point is that you can take steps beyond the modifications you know about, to explore their neighborhood of interactions, and potentially unearth important disease relationships from that.
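To make the "neighborhood" idea concrete, here is a toy sketch of walking outward from one chromatin-modification gene through an interaction list and checking which nearby genes carry a disease note. This is not DAnCER's method or data format; the function, gene names and annotation are placeholders I made up.

```python
from collections import deque

def neighborhood(interactions, seed, max_steps):
    """Breadth-first walk: map each gene within max_steps of seed to its distance."""
    graph = {}
    for a, b in interactions:
        # Treat interactions as undirected pairs.
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        gene = queue.popleft()
        if distance[gene] == max_steps:
            continue  # do not expand past the step limit
        for partner in graph.get(gene, ()):
            if partner not in distance:
                distance[partner] = distance[gene] + 1
                queue.append(partner)
    return distance

# Placeholder interaction pairs and annotation, for illustration only.
interactions = [
    ("CM_GENE", "GENE_A"),
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("CM_GENE", "GENE_D"),
]
disease_notes = {"GENE_B": "placeholder disease annotation"}

reach = neighborhood(interactions, "CM_GENE", 2)
for gene in sorted(reach):
    if gene in disease_notes:
        print(gene, reach[gene], disease_notes[gene])
```

The step limit matters: every extra step pulls in more of the network, so the hits get less specific the further out you go.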

The data includes several species, so evolutionary conservation can also be explored.

So if you find that you are interested in exploring chromatin modifications, and want to take that data further, check out DAnCER, and the other tools and projects that are providing this type of information. If you have used the iRefWeb interface, you’ll see some similarities in structure. Search options with many filters are available. Color-coded and sortable results are provided. Links to gene details within the Wodak lab tools and external links are offered. On the gene pages at DAnCER you’ll have many types of annotations, including Gene Ontology descriptions, evidence type and references, neighbors, and protein domain information as well. And besides the texty-table based stuff, you can choose to load up the interactive network/interaction graphic, just like with the iRefWeb tool.

There’s a lot of opportunity to learn things from this tool. Try it out.

Quick Links and References:

DAnCER http://wodaklab.org/dancer/

Turinsky, A., Turner, B., Borja, R., Gleeson, J., Heath, M., Pu, S., Switzer, T., Dong, D., Gong, Y., On, T., Xiong, X., Emili, A., Greenblatt, J., Parkinson, J., Zhang, Z., & Wodak, S. (2010). DAnCER: Disease-Annotated Chromatin Epigenetics Resource Nucleic Acids Research, 39 (Database) DOI: 10.1093/nar/gkq857

Fingerman, I., McDaniel, L., Zhang, X., Ratzat, W., Hassan, T., Jiang, Z., Cohen, R., & Schuler, G. (2010). NCBI Epigenomics: a new public resource for exploring epigenomic data sets Nucleic Acids Research, 39 (Database) DOI: 10.1093/nar/gkq1146