Tag Archives: epigenetics

Video Tip of the Week: ChromoHub, annotated trees of chromatin-mediated signaling

Today’s tip of the week is a quick introduction to ChromoHub, an annotated phylogeny of chromatin-mediated signaling genes. As the ChromoHub site says, these are “genes involved in writing, reading and erasing the histone code.” These writers, readers and erasers of epigenetic modifications are emerging as target classes for future drug therapies.

ChromoHub maps annotations for these genes onto a phylogeny of the genes, where the researcher can find a wealth of information. The annotations range from cancer data, SNPs, protein structures and protein-protein interactions to PubMed and funding information. There is a lot to explore.

Today’s tip introduces you to the tool and shows how to add and view the annotations. There is a lot more at ChromoHub. You can suggest data that the developers have missed, and download the data, alignment files, images and more.

ChromoHub was developed by SGC, the Structural Genomics Consortium. This is a private-public partnership that supports discovery of new medicines through open access research. ChromoHub is just one of the tools and resources developed by the consortium.

To find out more about the resource, check out the links and reference below.

Quick Links:

UCSC Genome Browser
Structural Genomics Consortium (SGC)

ChromoHub Reference:

Liu L, Zhen S, Denton E, Marsden B, Schapira M. (2012). ChromoHub: a data hub for navigators of chromatin-mediated signalling. Bioinformatics DOI: 10.1093/bioinformatics/bts340 (open access)

“Regulatory and Epigenetic Landscapes of Mammalian Genomes”

In the series of talks from the Current Topics in Genome Analysis course from NHGRI, Laura Elnitski spoke on regulation and epigenetics. I’ll include some of my notes below, but be sure to check out the whole talk when you have a chance–and the slides are available for download from the CTGA page.

Dr. Elnitski frames the talk by noting that we’ve been focusing on the roughly 2% of the genome that consists of protein-coding genes, but that a lot more is going on outside of that, and there is much more to learn about other aspects of genome regulation. One of the papers she uses to illustrate this shows how much of the variation we know about lies outside protein-coding regions (Hindorff et al. 2009; around 11 minutes). That paper described the NIH GWAS catalog and analyzed the disease/trait-associated SNPs (TASs) in it, finding that “88% of TASs were intronic (45%) or intergenic (43%)”. If that’s the case, you need to evaluate the effects of these variants differently than you would a variant that changes a protein.
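To make those categories concrete, here is a minimal Python sketch of how SNP positions might be binned as exonic, intronic or intergenic against gene models and then tallied as percentages. The gene coordinates and SNP positions are made up for illustration, and the function names are hypothetical; a real analysis would use curated annotation and finer categories (for example, UTRs or splice sites).

```python
from collections import Counter

# Hypothetical gene models on a single chromosome (made-up coordinates, 1-based, inclusive):
# (gene_start, gene_end, [(exon_start, exon_end), ...])
GENES = [
    (1_000, 5_000, [(1_000, 1_200), (3_000, 3_300), (4_800, 5_000)]),
    (10_000, 12_000, [(10_000, 10_500), (11_500, 12_000)]),
]


def classify_position(pos, genes=GENES):
    """Return 'exonic', 'intronic' or 'intergenic' for one SNP position."""
    for gene_start, gene_end, exons in genes:
        if gene_start <= pos <= gene_end:
            # Inside a gene: exonic if it falls in any exon, otherwise intronic.
            if any(start <= pos <= end for start, end in exons):
                return "exonic"
            return "intronic"
    return "intergenic"


def category_percentages(positions, genes=GENES):
    """Count each category and express it as a percentage of all positions."""
    counts = Counter(classify_position(p, genes) for p in positions)
    total = len(positions)
    return {category: 100.0 * n / total for category, n in counts.items()}


snps = [1_100, 2_000, 7_500, 11_000, 20_000]
print(category_percentages(snps))  # {'exonic': 20.0, 'intronic': 40.0, 'intergenic': 40.0}
```

With real annotation, the non-coding bins are the ones that call for the different kinds of evaluation discussed in the rest of the talk.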

Because it’s not just proteins we need to be looking at, Elnitski says, “So throughout this talk we’ll take a look at functional categories of the genome, to further explain the steps you might consider to ascertain function at these GWAS sites.”

One way to evaluate a region that contains a non-coding variant is to consider its evolutionary conservation: how conserved is this stretch of sequence in other species? Laura describes how phastCons and GERP can help you analyze that (around 21 minutes). These tools use different approaches to find constrained elements. You can also use regions with accelerated rates of change to suss out interesting features (she used the opposable thumb and the foot/ankle region in bipeds as examples of that sort of change; around 26 minutes).
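phastCons (a phylogenetic hidden Markov model) and GERP (which scores “rejected substitutions”, the number of substitutions expected under neutrality minus the number observed) compute per-base conservation in different ways. As a simplified illustration only, this Python sketch takes per-base scores that are assumed to be already computed and reports contiguous runs at or above a threshold. It is not how either tool calls its constrained elements; the threshold, minimum length, scores and function name are all hypothetical.

```python
def constrained_runs(scores, threshold, min_length):
    """Return (start, end) pairs (0-based, end exclusive) for runs in which every
    per-base score is >= threshold and the run spans at least min_length bases."""
    runs = []
    start = None  # start index of the run currently being extended, if any
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run just ended; keep it only if it is long enough.
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        runs.append((start, len(scores)))
    return runs


# Made-up per-base scores, e.g. GERP-style values computed elsewhere.
scores = [0.1, 2.5, 3.0, 2.8, 0.2, 4.1, 0.0, 3.3, 3.9, 2.2, 2.9]
print(constrained_runs(scores, threshold=2.0, min_length=3))  # [(1, 4), (7, 11)]
```

The single high-scoring base at index 5 is dropped because it is shorter than the minimum length, which is the basic reason element callers look for stretches rather than isolated conserved bases.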

Another type of landscape feature she described was enhancer signatures. She offered a nice diagram of what these look like around a region with possible enhancer function (around 32 minutes). Her representation of the histone code could help people who are trying to visualize the ENCODE data tracks at UCSC. In slide 63 she shows what the pattern of marks at an active promoter might look like, and after that the key differences at enhancers and what repressed regions look like (around 1 hour). I found that really helpful.
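Here is a rough Python sketch of the kind of rule-of-thumb logic behind reading these histone patterns. The rules are common simplifications from the broader literature (H3K4me3 at promoters, H3K4me1 without H3K4me3 at enhancers, H3K27ac for activity, H3K27me3 or H3K9me3 for repression), not rules taken from her slides, and real chromatin-state tools use statistical models over many marks at once. The function name, region names and state labels are hypothetical.

```python
def chromatin_state(marks):
    """Assign a coarse chromatin state from the histone marks detected in one region.

    Simplified rules of thumb (an illustrative assumption, not a published caller):
    promoters carry H3K4me3, enhancers carry H3K4me1 without H3K4me3,
    H3K27ac indicates activity, and H3K27me3 or H3K9me3 indicate repression.
    """
    marks = set(marks)
    active = "H3K27ac" in marks
    if "H3K4me3" in marks:
        if active:
            return "active promoter"
        if "H3K27me3" in marks:
            return "bivalent promoter"
        return "promoter (activity unclear)"
    if "H3K4me1" in marks:
        if active:
            return "active enhancer"
        if "H3K27me3" in marks:
            return "poised enhancer"
        return "enhancer (activity unclear)"
    if marks & {"H3K27me3", "H3K9me3"}:
        return "repressed"
    return "unclassified"


# Hypothetical regions and the marks observed in them in one cell type.
regions = {
    "region_1": {"H3K4me3", "H3K27ac"},
    "region_2": {"H3K4me1", "H3K27ac"},
    "region_3": {"H3K4me1", "H3K27me3"},
    "region_4": {"H3K27me3"},
}
for name, marks in regions.items():
    print(name, chromatin_state(marks))
```

Because these patterns are cell-type specific, you would apply something like this separately to the marks measured in each cell type rather than pooling them.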

One point she stressed, though: epigenetic patterns are very cell-type specific, so be sure to look at various cell types, and tread carefully with conclusions if your cell type of interest has not been evaluated yet (around 36 minutes). [As a side note, I worry about this particularly as a misuse of epigenetics by cranks, who are already telling people they can fix everything wrong with their health by changing their epigenetics. Say you claim to treat diabetes or autism with your detox epi-fix: what exactly is the impact on other cell types?]

She also goes on to explain how these features rely on the 3D structure of the nucleus, looping interactions, and the packing of the chromosomes, with some nice guidance on how to think about that and the types of techniques to assess that. And just after I watched this, a paper came out describing more of this topology with the Hi-C strategy that she referenced.

It’s also important to consider that splicing defects can have consequences that wouldn’t be obvious just from looking at coding sequence per se. Although a substitution might be synonymous and not change an amino acid, it could still affect splicing. The SKIPPY tool that was developed by her group (and that Jennifer highlighted as a Tip of the Week) was suggested as a way to explore this (around 47 minutes).
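To see why “synonymous” isn’t the whole story, here’s a minimal Python sketch that classifies a single-base substitution in an in-frame coding sequence as synonymous, missense or nonsense using the standard genetic code. Note what it does not look at: splicing signals. A variant this labels synonymous could still disrupt splicing, which is the gap tools like SKIPPY aim to help with. The function name and example sequence are hypothetical, and this is not SKIPPY’s method.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' is a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def substitution_effect(cds, pos, alt):
    """Classify a single-base substitution at 0-based position `pos` of an in-frame
    coding sequence by its effect on the encoded amino acid only."""
    cds, alt = cds.upper(), alt.upper()
    if not 0 <= pos < len(cds) - len(cds) % 3:
        raise ValueError("position is outside the complete codons of the sequence")
    offset = pos % 3
    ref_codon = cds[pos - offset:pos - offset + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid, but splicing could still be affected
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


cds = "ATGGCTAAA"  # hypothetical: Met-Ala-Lys
print(substitution_effect(cds, 5, "C"))  # GCT -> GCC (Ala -> Ala): synonymous
print(substitution_effect(cds, 3, "A"))  # GCT -> ACT (Ala -> Thr): missense
print(substitution_effect(cds, 6, "T"))  # AAA -> TAA (Lys -> stop): nonsense
```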

This talk was a useful guide to thinking about non-coding genomic features to consider for your research. There were helpful graphics and tools provided. Have a look–it’s worth your time.


Woolfe, A., Mullikin, J., & Elnitski, L. (2010). Genomic features defining exonic variants that modulate splicing. Genome Biology, 11 (2) DOI: 10.1186/gb-2010-11-2-r20

Hindorff, L., Sethupathy, P., Junkins, H., Ramos, E., Mehta, J., Collins, F., & Manolio, T. (2009). Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences, 106 (23), 9362-9367 DOI: 10.1073/pnas.0903103106

Video Tips of the Week: Annual Review IV (first half of 2011)

As you may know, we’ve been doing these video tips-of-the-week for FOUR years now. We have completed around 200 little tidbit introductions to various resources. At the end of the year we’ve established a sort of holiday tradition: we are doing a summary post to collect them all. If you have missed any of them it’s a great way to have a quick look at what might be useful to your work.

You can see past years’ tips here: 2008 I, 2008 II, 2009 I, 2009 II, 2010 I, 2010 II. The summary of the second half of 2011 will be available next week here.

January 2011

January 5: SKIPPY, for predicting variants with splicing effects

January 12: Twitter in Bioinformatics. This one was much more popular than I expected!

January 19: PolyPhen, for predicting the possible effects of mutations in genes

January 26: iRefWeb + protein interaction curation

February 2011

February 2: RCSB PDB Data Distribution Summaries

February 9: SIFT (Sorting Intolerant From Tolerant), another tool for predicting the impact of mutations in genes.

February 16: Melina II for promoter analysis

February 23: SNPTips and viewing personal genome data. This tip is one of the most-watched ones we’ve had. Thousands of views on SciVee!

March 2011

March 2: DAnCER for disease-annotated epigenetics data

March 9: World Tour of Genomics Resources

March 16: Encyclopedia of Life

March 23: ORegAnno for regulatory annotation

March 30: MetaPhoOrs, orthology and paralogy predictions

April 2011

April 6: The Taverna Project for workflows

April 13: VirusMINT, the branch of the Molecular Interaction database for viral interactions

April 20: LAMHDI for animal models

April 27: Dot Plots, Synteny at VISTA

May 2011

May 4: MycoCosm

May 11: InterMine for mining “big data”

May 18: Allen Institute’s Brain Explorer

May 25: SciVee, the YouTube of science

June 2011

June 1: New and Improved OMIM®

June 8: Converting Genome Coordinates

June 15: MutaDATABASE, a centralized and standardized DNA variation database

June 22: Update to NCBI’s Cn3D Viewer

June 29: Orphanet for Rare Disease information

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Tip of the Week: Human Epigenomics Visualization Hub

More and more, we are seeing questions about ways to access epigenomics data in the workshops we do. This often comes up in the workshop we do that focuses on the ENCODE data, because ENCODE is providing several epigenomics data sets that researchers are interested in. [The workshop we do is based on the materials you can find from the UCSC-sponsored freely available ENCODE tutorial.] But there are other browsers and data collections, and researchers want to be sure they are finding as much as possible.

Besides the data that is flowing into the UCSC Genome Browser ENCODE portal, we’ve talked in the past about the NCBI Epigenomics resource, including this tip. We have also explored DAnCER in the past as a tip-of-the-week.

I’m still recovering from my vacation this week, so I’m going to point you to a very nice SciVee example I found on another resource, the Human Epigenomics Visualization Hub from WashU, which can be accessed at this link: http://vizhub.wustl.edu/ It’s longer than our usual tips–about 20 minutes. But if you want to begin to explore the data available on that browser, it’s worth your time.

Based on a UCSC Genome Browser framework, this resource focuses on epigenomics data. Familiarity with basic vocabulary and functional features of the UCSC Genome Browser is something I’d recommend. Check out our freely-available UCSC sponsored tutorials for this which cover features of tracks and displays, including things like bigWig, Sessions, etc.

The project is associated with the Roadmap Epigenomics Project, which you can explore in detail at the project site, and learn more about in the reference below.

In case the embed isn’t working, here is the link to the SciVee page: http://www.scivee.tv/node/31122

Quick link to Vizhub: http://vizhub.wustl.edu/

Follow them on Twitter for news and announcements: @WashUGBrowser

Bernstein, B., Stamatoyannopoulos, J., Costello, J., Ren, B., Milosavljevic, A., Meissner, A., Kellis, M., Marra, M., Beaudet, A., Ecker, J., Farnham, P., Hirst, M., Lander, E., Mikkelsen, T., & Thomson, J. (2010). The NIH Roadmap Epigenomics Mapping Consortium. Nature Biotechnology, 28 (10), 1045-1048 DOI: 10.1038/nbt1010-1045

Tip of the Week: DAnCER for disease-annotated epigenetics data

Epigenetics and epigenomics are becoming more exciting areas of investigation, and we are seeing more requests for database resources to support them, and for the sources of data from these types of experiments. If you aren’t aware of these investigations at this point, check out their entries in the Talking Glossary of Genetic Terms:

Epigenetics: Epigenetics is an emerging field of science that studies heritable changes caused by the activation and deactivation of genes without any change in the underlying DNA sequence of the organism. The word epigenetics is of Greek origin and literally means over and above (epi) the genome.

Epigenome: The term epigenome is derived from the Greek word epi which literally means “above” the genome. The epigenome consists of chemical compounds that modify, or mark, the genome in a way that tells it what to do, where to do it, and when to do it. Different cells have different epigenetic marks. These epigenetic marks, which are not part of the DNA itself, can be passed on from cell to cell as cells divide, and from one generation to the next.

As for the “talking” part, you can hear Dr. Laura Elnitski discuss these terms in more detail; have a listen at each entry. And just today an article providing an epigenetics primer appeared in my inbox: Epigenetics: A Primer.

These intriguing, and sometimes puzzling, chromatin modification (CM) signals, now being uncovered in many labs and projects, are becoming more widely available in different databases. For this week’s tip of the week I’ll introduce DAnCER: Disease-Annotated Chromatin Epigenetics Resource, one of the tools that is organizing this type of data and enabling additional explorations. You can find DAnCER here: http://wodaklab.org/dancer/

In the associated publication, the DAnCER team describes other useful resources that provide epigenetics data. These include ChromDB, ChromatinDB (for yeast), and the Human Histone Modification Database (HHMD), among others. I’m also aware of other sources. A few months back I introduced the NCBI Epigenomics resource as my tip-of-the-week. (At that time I promised that when the publication became available I’d mention it; that’s now at the bottom in the references section below.) There’s also quite a bit of this data flowing into the UCSC Genome Browser ENCODE DCC, including, I might add, some data from the very cool Elnitski bidirectional promoter studies. You can find similar data types via the modENCODE project as well.

So, there are lots of resources out there. Each provider has different projects, species, goals, displays, etc. But the group that developed DAnCER wanted to fill a niche they didn’t see available already: linking these epigenetic changes to possible disease association data. Here’s how they describe their work:

Our research effort therefore strives to explore CM-related genes in the context of their protein-interaction network, their partnership in multi-protein complexes and cellular pathways, as well as their gene expression profiles….

The DAnCER team is well suited to linking this kind of information. You may remember our previous explorations and discussions of iRefWeb; the kind of network and interaction data assembled in that context can be brought to the chromatin-modification arena. The point is that you can go beyond the modifications you know about, explore their neighborhood of interactions, and potentially unearth important disease relationships from that.
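As a minimal sketch of what “exploring the neighborhood” means, this Python function does a breadth-first search over an interaction list and returns every protein within a given number of hops of a seed protein, along with its distance. The protein names and interactions are placeholders, not DAnCER or iRefWeb data, and the function name is hypothetical.

```python
from collections import deque


def interaction_neighborhood(edges, seeds, max_hops):
    """Breadth-first search over an undirected interaction list.

    Returns {protein: hops} for every protein within `max_hops` of any seed protein.
    """
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    hops = {seed: 0 for seed in seeds}
    queue = deque(hops)
    while queue:
        protein = queue.popleft()
        if hops[protein] == max_hops:
            continue  # don't expand beyond the requested radius
        for partner in graph.get(protein, ()):
            if partner not in hops:
                hops[partner] = hops[protein] + 1
                queue.append(partner)
    return hops


# Placeholder interactions: "CM1" stands in for a chromatin-modifying protein.
edges = [("CM1", "P1"), ("P1", "P2"), ("P2", "P3"), ("CM1", "P4")]
print(interaction_neighborhood(edges, ["CM1"], max_hops=2))
```

Here P3 is left out because it sits three hops away. You could then cross-check the returned proteins against disease annotations to flag candidates worth a closer look.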

The data cover several species, so evolutionary conservation can also be explored.

So if you find that you are interested in exploring chromatin modifications, and want to take that data further, check out DAnCER, and the other tools and projects that are providing this type of information. If you have used the iRefWeb interface, you’ll see some similarities in structure. Search options with many filters are available. Color-coded and sortable results are provided. Links to gene details within the Wodak lab tools and external links are offered. On the gene pages at DAnCER you’ll have many types of annotations, including Gene Ontology descriptions, evidence type and references, neighbors, and protein domain information as well. And besides the texty-table based stuff, you can choose to load up the interactive network/interaction graphic, just like with the iRefWeb tool.

There’s a lot of opportunity to learn things from this tool. Try it out.

Quick Links and References:

DAnCER http://wodaklab.org/dancer/

Turinsky, A., Turner, B., Borja, R., Gleeson, J., Heath, M., Pu, S., Switzer, T., Dong, D., Gong, Y., On, T., Xiong, X., Emili, A., Greenblatt, J., Parkinson, J., Zhang, Z., & Wodak, S. (2010). DAnCER: Disease-Annotated Chromatin Epigenetics Resource. Nucleic Acids Research, 39 (Database) DOI: 10.1093/nar/gkq857

Fingerman, I., McDaniel, L., Zhang, X., Ratzat, W., Hassan, T., Jiang, Z., Cohen, R., & Schuler, G. (2010). NCBI Epigenomics: a new public resource for exploring epigenomic data sets. Nucleic Acids Research, 39 (Database) DOI: 10.1093/nar/gkq1146

Tip of the Week: A year in tips III (last half of 2010)

As you may know, we’ve been doing tips-of-the-week for three years now. We have completed around 150 little tidbit introductions to various resources. At the end of the year we’ve established a sort of holiday tradition: we are doing a summary post to collect them all. If you have missed any of them it’s a great way to have a quick look at what might be useful to your work.

Here are the tips from the first half of the year, and below you will find the tips from the last half of 2010 (you can see past years’ tips here: 2008 I, 2008 II, 2009 I, 2009 II):


July 7: MINT for Protein Interactions, an introduction to MINT to study protein-protein interactions
July 14: Introduction to Changes to NCBI’s Protein Database, as it states :D
July 21: 1000 Genomes Project Browser. The 1000 Genomes Project has pilot data out, and this is the browser.
July 28: R Genetics at Galaxy, the Galaxy analysis and workflow tool added R genetics analysis tools.


August 4: YeastMine, SGD adds an InterMine capability to their database search.
August 11: Gaggle Genome Browser, a tool to allow for the visualization of genomic data, part of the “gaggle components”
August 18: BRENDA, comprehensive enzyme information.
August 25: Mouse Genomic Pathology, unlike other tips, this is not a video but rather a detailed introduction to a new website.


September 1: Galaxy Pages, an introduction to the new community documentation and sharing capability at Galaxy.
September 8: Varitas, a Plaid database that integrates human variation data such as SNPs and CNVs.
September 15: CircuitsDB for TF/miRNA/gene regulation networks.
September 21: Pathcase for pathway data.
September 29: Comparative Toxicogenomics Database (CTD) VennViewer, a new tool to create Venn diagrams comparing associated datasets for genes, diseases or chemicals.


October 6: BioExtract Server, a server that allows researchers to store data, analyze data and create workflows of data.
October 13: NCBI Epigenomics, “Beyond the Genome” NCBI’s site for information and data on epigenetics.
October 20: Comparing Microbial Databases, including IMG, UCSC Microbial and Archaeal browsers, CMR and others.
October 27: iTOL, interactive tree of life


November 3: VISTA Enhancer Browser, to explore possible regulatory elements with comparative genomics
November 10: Getting canonical gene info from the UCSC Browser. Need one gene version to ‘rule them all’?
November 17: ENCODE Data in the UCSC Genome Browser, an entire 35-minute tutorial on the ENCODE project.
November 24: FLink. A tool that links items in one NCBI database to another in a meaningful and weighted manner.


December 1: PhylomeDB. A database of gene phylogenies of many species.
December 8: BioGPS for expression data and more.
December 15: RepTar, a database of miRNA target sites.

Friday SNPpets

Welcome to our Friday feature link dump: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Heads Up, More Data, Epigenome

As this Nature editorial says, as the human genome (and a few hundred others) were completed, the amount of data became daunting. We know that well here at OpenHelix: we deal with it every day and work daily to make it more accessible to scientists through training :) But, importantly, even with all that data, it has become clear that we need more. As the editorial states:

By 2004, large-scale genome projects were already indicating that genome sequences, within and across species, were too similar to be able to explain the diversity of life. It was instead clear that epigenetics — those changes to gene expression caused by chemical modification of DNA and its associated proteins — could explain much about how these similar genetic codes are expressed uniquely in different cells, in different environmental conditions and at different times.

Thus is born the Human Epigenome Consortium (Nature paper, subscription required, here). You can find some of the data from the pilot project at the Sanger Institute site.

These are the beginning stages, but I believe it will prove to be quite a treasure trove of data (as if we don’t already have a huge unmined dataset). It was this last comment in the editorial that stood out:

.., given that epigenetic coding will be orders of magnitude more complex than genetic coding, its requirement for data crunching may be similar…

Get ready for a lot more resources and tools of greater complexity :).