Today’s tip of the week is a quick introduction to ChromoHub. ChromoHub is an annotated phylogeny of chromatin-mediated signaling genes. As the ChromoHub site says, these are “genes involved in writing, reading and erasing the histone code.” These are epigenetic modifications that are emerging as target classes for future drug therapies.
ChromoHub maps annotated information about these genes onto a phylogeny of the genes, where the researcher can find a wealth of information. The information one can find ranges from cancer data, SNPs, protein structure and protein-protein interactions to PubMed and funding information. There is a lot of information to view.
Today’s tip introduces you to the tool and how to add and view the annotations. There is a lot more at ChromoHub. You can suggest data that the developers have missed, and download the information, alignment files, images and more.
ChromoHub was developed by SGC, the Structural Genomics Consortium. This is a private-public partnership that supports discovery of new medicines through open access research. ChromoHub is just one of the tools and resources developed by the consortium.
To find out more about the resource, check out the links and reference below.
Dr. Elnitski frames the talk by indicating that we’ve been focusing on the roughly 2% of the genome that consists of protein-coding genes, but that there’s a lot more going on outside of that, and much more to learn about other aspects of genome regulation. One of the papers she uses to illustrate that makes it clear how much of the variation we are aware of lies outside of protein-coding regions (Hindorff et al. 2009; around 11 minutes). That paper described the NIH GWAS catalog, which analyzed disease/trait-associated SNPs (TAS) and found that “88% of TASs were intronic (45%) or intergenic (43%)”. And if that’s the case, you need to think about ways to evaluate the effects of these differently than if a protein variation had resulted.
Due to this fact–that it’s not just proteins we need to be looking at–Elnitski says, “So throughout this talk we’ll take a look at functional categories of the genome, to further explain the steps you might consider to ascertain function at these GWAS sites.”
One way to evaluate a region that contains a non-coding variant is to consider its evolutionary relationships. How conserved is this tidbit in other species? Laura describes how phastCons and GERP can help you to analyze that (around 21 minutes). These tools use different approaches to find constrained elements. You can also use knowledge of regions that have accelerated rates of change to suss out interesting features (she used the opposable thumb and the foot/ankle region among bipedals as interesting examples of that sort of change; around 26 minutes).
Another type of landscape feature described was enhancer signatures. She offered a nice diagrammatic view of what these look like around a region to convey possible enhancer function (around 32 minutes). The look at the representation of the histone code could probably help people who are trying to use the ENCODE data tracks at UCSC to visualize that–and in slide 63 she looks at what the pattern of codes in an active promoter might look like, and then after that at key differences of enhancers and what repressed regions look like (around 1hr). I found that really helpful.
One point she stressed though–epigenetic patterns are very cell-type specific–be sure to look at various cell types, and tread carefully with conclusions if your cell type of interest has not been evaluated yet (around 36 minutes). [As a side note, I worry about this particularly as a misuse by cranks of the features of epigenetics--they are already going out and telling people they can fix everything wrong with their health by affecting their epigenetics. Now, let's say you claim to treat diabetes or autism with your detox epi-fix--what is the impact on other cell types exactly??]
She also goes on to explain how these features rely on the 3D structure of the nucleus, looping interactions, and the packing of the chromosomes, with some nice guidance on how to think about that and the types of techniques to assess that. And just after I watched this, a paper came out describing more of this topology with the Hi-C strategy that she referenced.
It’s also important to consider that splicing defects can have consequences that wouldn’t be obvious just from looking at coding sequence per se. Although a substitution might be synonymous and not change an amino acid, it could still affect splicing. The SKIPPY tool that was developed by her group (and that Jennifer highlighted as a Tip of the Week) was suggested as a way to explore this (around 47 minutes).
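To make the "synonymous" part of that concrete, here is a minimal Python sketch that checks whether two codons encode the same amino acid. It is only an illustration, not how SKIPPY works: the `is_synonymous` helper and the deliberately partial `CODON_TABLE` are hypothetical names made up for this example, and a real analysis would use the full genetic code.

```python
# Partial codon table, for illustration only (assumption: a real analysis
# would use all 64 codons of the standard genetic code).
CODON_TABLE = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GAT": "Asp", "GAC": "Asp",
    "GAA": "Glu", "GAG": "Glu",
}


def is_synonymous(ref_codon, alt_codon):
    """Return True if both codons encode the same amino acid.

    Hypothetical helper: it only compares the encoded amino acids and says
    nothing about whether the change disrupts splicing.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return ref_aa == alt_aa


print(is_synonymous("GGT", "GGC"))  # Gly -> Gly: True
print(is_synonymous("GAT", "GAA"))  # Asp -> Glu: False
```

A check like this would call the first change harmless, which is exactly why the talk points to splicing-aware tools: the protein-level view alone can miss an effect on exon inclusion.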
This talk was a useful guide to thinking about non-coding genomic features to consider for your research. There were helpful graphics and tools provided. Have a look–it’s worth your time.
Woolfe, A., Mullikin, J., & Elnitski, L. (2010). Genomic features defining exonic variants that modulate splicing. Genome Biology, 11 (2). DOI: 10.1186/gb-2010-11-2-r20
Hindorff, L., Sethupathy, P., Junkins, H., Ramos, E., Mehta, J., Collins, F., & Manolio, T. (2009). Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences, 106 (23), 9362-9367. DOI: 10.1073/pnas.0903103106
As you may know, we’ve been doing these video tips-of-the-week for FOUR years now. We have completed around 200 little tidbit introductions to various resources. At the end of the year we’ve established a sort of holiday tradition: we are doing a summary post to collect them all. If you have missed any of them it’s a great way to have a quick look at what might be useful to your work.
Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…
I LOVE the idea of the obituary section for NAR! RT @LabSpaces: Jerm Looks at the Annual NAR Web Server Issue http://bit.ly/pImJma @jermdemo with a great post on bioinformatics and new stuff in the field [Mary]
More and more we are seeing questions about ways to access epigenomics data in the workshops we do. This often comes up in the workshop that focuses on the ENCODE data, because ENCODE is providing several epigenomics data sets that researchers are interested in. [That workshop is based on the materials you can find in the UCSC-sponsored, freely available ENCODE tutorial.] But there are other browsers and data collections, and researchers want to be sure they are finding as much as possible.
I’m still recovering from my vacation this week, so I’m going to point you to a very nice SciVee example I found on another resource, the Human Epigenomics Visualization Hub from WashU, which can be accessed at this link: http://vizhub.wustl.edu/ It’s longer than our usual tips–about 20 minutes. But if you want to begin to explore the data available on that browser, it’s worth your time.
Based on a UCSC Genome Browser framework, this resource focuses on epigenomics data. Familiarity with basic vocabulary and functional features of the UCSC Genome Browser is something I’d recommend. Check out our freely-available UCSC sponsored tutorials for this which cover features of tracks and displays, including things like bigWig, Sessions, etc.
The project is associated with the Roadmap Epigenomics Project, which you can explore in detail at the project site, and learn more about in the reference below.
Epigenetics and epigenomics are becoming more exciting areas of investigation, and we are seeing more requests for database resources to support them, and for the sources of data from these types of experiments. If you aren’t aware of these investigations at this point, check out their entries in the Talking Glossary of Genetic Terms:
Epigenetics: Epigenetics is an emerging field of science that studies heritable changes caused by the activation and deactivation of genes without any change in the underlying DNA sequence of the organism. The word epigenetics is of Greek origin and literally means over and above (epi) the genome.
Epigenome: The term epigenome is derived from the Greek word epi which literally means “above” the genome. The epigenome consists of chemical compounds that modify, or mark, the genome in a way that tells it what to do, where to do it, and when to do it. Different cells have different epigenetic marks. These epigenetic marks, which are not part of the DNA itself, can be passed on from cell to cell as cells divide, and from one generation to the next.
And for the talking part–you can hear Dr. Laura Elnitski talk about these in more detail–have a listen at each entry. And just today an article providing an epigenetics primer appeared in my inbox: Epigenetics: A Primer.
These intriguing–and sometimes puzzling–chromatin modification (CM) signals and leads that are being unveiled in many labs and projects now are becoming more widely available in different databases. For this week’s tip of the week I’ll introduce DAnCER: Disease-Annotated Chromatin Epigenetics Resource, one of the tools that is organizing this type of data and enabling additional explorations. You can find DAnCER here: http://wodaklab.org/dancer/
In the associated publication, the DAnCER team describes other useful resources that provide epigenetics data. These include ChromDB, ChromatinDB (for yeast), and the Human Histone Modification Database (HHMD), among others. I’m also aware of other sources. A few months back I introduced the NCBI Epigenomics resource as my tip-of-the-week. (At that time I promised that when the publication became available I’d mention it–that’s now at the bottom in the references section below.) There’s also quite a bit of this data flowing into the UCSC Genome Browser ENCODE DCC, including–may I add–some data from the very cool Elnitski bi-directional promoter studies. You can find similar data types via the modENCODE project as well.
So, there are lots of resources out there. Each provider has different projects, species, goals, displays, etc. But the group that developed DAnCER wanted to fill a niche they didn’t see available already: linking these epigenetic changes to possible disease association data. Here’s how they describe their work:
Our research effort therefore strives to explore CM-related genes in the context of their protein-interaction network, their partnership in multi-protein complexes and cellular pathways, as well as their gene expression profiles….
They are well-suited to linking this kind of information. You may remember our previous explorations and discussions of iRefWeb. The kind of network and interaction data that they assemble in that context can be brought to the chromatin-modification arena. The point is that you can take steps beyond the modifications you know about, to explore their neighborhood of interactions, and potentially unearth important disease relationships from that.
The data includes several species, and because of that, evolutionary conservation can also be explored.
So if you find that you are interested in exploring chromatin modifications, and want to take that data further, check out DAnCER, and the other tools and projects that are providing this type of information. If you have used the iRefWeb interface, you’ll see some similarities in structure. Search options with many filters are available. Color-coded and sortable results are provided. Links to gene details within the Wodak lab tools and external links are offered. On the gene pages at DAnCER you’ll have many types of annotations, including Gene Ontology descriptions, evidence type and references, neighbors, and protein domain information as well. And besides the texty-table based stuff, you can choose to load up the interactive network/interaction graphic, just like with the iRefWeb tool.
There’s a lot of opportunity to learn things from this tool. Try it out.
Turinsky, A., Turner, B., Borja, R., Gleeson, J., Heath, M., Pu, S., Switzer, T., Dong, D., Gong, Y., On, T., Xiong, X., Emili, A., Greenblatt, J., Parkinson, J., Zhang, Z., & Wodak, S. (2010). DAnCER: Disease-Annotated Chromatin Epigenetics Resource. Nucleic Acids Research, 39 (Database). DOI: 10.1093/nar/gkq857
Fingerman, I., McDaniel, L., Zhang, X., Ratzat, W., Hassan, T., Jiang, Z., Cohen, R., & Schuler, G. (2010). NCBI Epigenomics: a new public resource for exploring epigenomic data sets. Nucleic Acids Research, 39 (Database). DOI: 10.1093/nar/gkq1146
As you may know, we’ve been doing tips-of-the-week for three years now. We have completed around 150 little tidbit introductions to various resources. At the end of the year we’ve established a sort of holiday tradition: we are doing a summary post to collect them all. If you have missed any of them it’s a great way to have a quick look at what might be useful to your work.
Welcome to our Friday feature link dump: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…
The Institute of Cancer Research (ICR) is organizing a meeting to celebrate the launch of ICR’s Integrative Network of Biology Initiative from May 14th to 15th, 2010 in Chelsea, UK. The theme will be approaches to study signalling networks, and how network models can be applied to the study of complex diseases such as cancer. Find out more here: http://www.lindinglab.org/events. [Trey]
Jax® mice in space: I don’t know why it made me laugh, but to see the correct nomenclature on these animals–paired with the momentary wondering how they handled the lack of gravity and imagining them floating around the cages….[Mary]
As this Nature editorial says, as the human genome (and a few hundred others) were completed, the amount of data had become daunting (we know that well here at OpenHelix; we deal with it every day and daily make that more accessible to scientists through training :). But also, importantly, even with all the data, it’s been found that we need more. As the editorial states:
By 2004, large-scale genome projects were already indicating that genome sequences, within and across species, were too similar to be able to explain the diversity of life. It was instead clear that epigenetics — those changes to gene expression caused by chemical modification of DNA and its associated proteins — could explain much about how these similar genetic codes are expressed uniquely in different cells, in different environmental conditions and at different times.