We spend a lot of time talking about sequence data: where to find it, how to analyze it, and so on. But increasingly we are seeing data that comes from epigenomics projects. Recently a tweet from NCBI got me to look at their Epigenomics site again. http://www.ncbi.nlm.nih.gov/epigenomics
Their definition of epigenetics is:
What is Epigenetics?
Interest in epigenetics has exploded in recent years, but the central question it aims to answer has been with us for decades: how do the many cell types of the body maintain drastically different gene expression patterns while sharing exactly the same DNA?
Epigenetics refers to a gene activity state that may be stable over long periods of time, persist through many cell divisions, or even be inherited through several generations, all without any change to the primary DNA sequence (Roloff and Nuber 2005, Ng and Gurdon 2008, Probst, et al. 2009).
This is a nice site that offers a lot of helpful background, project information about the NIH Roadmap for Epigenomics, and of course access to the data itself. They have separate guidance on each type of data you will find here: About DNA Methylation, About Histone Modification, and About Chromatin Structure. So if you are ready to go “Beyond the Genome”, as their tagline puts it, you can learn about the data types and find the data too.
This tip of the week focuses on accessing the data. I’ll show what happens when you use the Sample Browser as a starting point for browsing some of the data. You can build more complex, custom queries with the Advanced Query form, which looks like other query-building tools at NCBI. I won’t have time to cover that here, but I wanted you to know it was available.
For my example I just chose the top sample in the list at the time I made this tip, and that turned out to be fortuitous for a couple of reasons. First, it was exactly the kind of paper I was talking about in my recent post, “The data isn’t in the papers anymore, you know.” This paper (referenced below) has a huge volume of data: it examines 39 types of histone modifications genome-wide. There’s no way to publish all of that as figures in the paper. There are summary figures, but not individual figures for each dataset in the collection, so you’d have to visualize the data yourself elsewhere. The second reason it was cool is that the data perfectly validates some of the data I’ve been using to develop the ENCODE project tutorial we’ve just created with the UCSC ENCODE team.
Anyway, check out the NCBI Epigenomics resource for a great way to visualize data on this topic: data that you will not find in the papers.
NCBI Epigenomics (the tip site): http://www.ncbi.nlm.nih.gov/epigenomics
Epigenome Browser: http://www.epigenomebrowser.org/
By the way, I also asked the hive mind at BioStar what tools they are using for epigenomics or epigenetics, and you can go and see that question over there. People told me about the Epigenome Atlas and EPIGRAPH. And while researching this tip I came across a Roadmap Epigenomics site that offers a link to a browser. It’s a UCSC Genome Browser framework focused on this kind of data, the Epigenome Browser. Note that it is a different installation from the main UCSC Genome Browser that I illustrate in this tip.
Reference for data used and shown in the tip:
Wang, Z., Zang, C., Rosenfeld, J., Schones, D., Barski, A., Cuddapah, S., Cui, K., Roh, T., Peng, W., Zhang, M., & Zhao, K. (2008). Combinatorial patterns of histone acetylations and methylations in the human genome. Nature Genetics, 40(7), 897-903. DOI: 10.1038/ng.154
Currently there isn’t a reference for NCBI Epigenomics. I contacted the Help Desk to be sure, and they told me a paper has been submitted but isn’t out yet. I’ll update this post when that reference becomes available.