Tag Archives: ENCODE

Video Tip of the Week: UCSC features for ENCODE data utilization

As noted in last week’s tip about the ENCODE DCC at Stanford, there was a workshop recently for the ENCODE project. There were a lot of folks speaking and a big room full of attendees. You should check out the full agenda and the playlist at the NHGRI site for all the videos, slides, and handouts: ENCODE 2015: Research Applications and Users Meeting.

This week I’m highlighting another video from this event. In this one, Pauline Fujita from the UCSC Genome Browser covers ways to work with ENCODE data in their browser.

Some of the talk includes intro stuff for brand new users, because there were certainly some in this workshop. If you are new to the tools, too, you can also see our free tutorial suites (below). Pauline also briefly highlights their Genome Browser in a Box virtual machine option for folks who have privacy-sensitive or protected data. If you want some more info on that, check out our Tip of the Week on GBIB.

But soon she covers more detail on features like track hubs and how to use those (if you want to jump to that part, it begins around 20min). That extra search for items in the Track Hub is really good to know about. Also, there’s some guidance here on the types of file formats that you may want to use to structure your data, and on why you would want BED vs Wiggle, for example. For the part that addresses these formats, jump to about 33min.
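To make the BED vs Wiggle distinction a bit more concrete, here is a minimal Python sketch (not from the talk). BED rows describe discrete features with 0-based, half-open coordinates, while a fixedStep wiggle block describes a signal sampled at regular steps, with a 1-based start. The function name `fixed_step_to_intervals` and the example values are made up for illustration.

```python
from typing import List, Tuple

def fixed_step_to_intervals(wig_text: str) -> List[Tuple[str, int, int, float]]:
    """Convert fixedStep wiggle text into BED-style (chrom, start, end, value) rows."""
    rows = []
    chrom, pos, step, span = None, 0, 1, 1
    for line in wig_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and browser/track display lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("fixedStep"):
            # Declaration line, e.g. "fixedStep chrom=chr21 start=1001 step=100 span=50"
            params = dict(field.split("=", 1) for field in line.split()[1:])
            chrom = params["chrom"]
            pos = int(params["start"]) - 1  # wiggle start is 1-based, BED is 0-based
            step = int(params.get("step", 1))
            span = int(params.get("span", 1))
            continue
        if chrom is None:
            raise ValueError("data line found before a fixedStep declaration")
        # Each data line is one value covering `span` bases from the current position.
        rows.append((chrom, pos, pos + span, float(line)))
        pos += step
    return rows

# Made-up example data, for illustration only.
example_wig = """fixedStep chrom=chr21 start=1001 step=100 span=50
1.5
0.0
7.25
"""
intervals = fixed_step_to_intervals(example_wig)
for row in intervals:
    print(*row, sep="\t")
```

The point of the sketch is only that a wiggle file carries a value at every step, whereas a BED file would just list the regions themselves; which one you want depends on whether your data is a signal or a set of features.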

Towards the end there’s coverage of the Data Integrator. The idea with this feature is that maybe you’ve got some information on a region and you have this structured as a BED file–or a number of regions–and you want to find out what else is going on in those regions. The Data Integrator can help you with that by finding overlaps among different tracks of data (around 45min). The Variant Annotation Integrator does kind of a similar thing, but for VCF files with variation information (~48min). A smidge more guidance on track hubs comes in at 50min.
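If the overlap idea is new to you, here is a rough Python sketch of what "finding overlaps" between your regions and a track means. This is only an illustration of the concept under simple assumptions, not how the Data Integrator is implemented; the helper names (`parse_bed`, `find_overlaps`) and all of the regions and peaks are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int, str]  # chrom, start, end, name (BED: 0-based, half-open)

def parse_bed(text: str) -> List[Interval]:
    """Parse the first four BED columns from whitespace-separated text."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        intervals.append((chrom, start, end, name))
    return intervals

def find_overlaps(regions: List[Interval], track: List[Interval]) -> List[Tuple[str, str]]:
    """Return (region name, track item name) pairs that share at least one base."""
    hits = []
    for r_chrom, r_start, r_end, r_name in regions:
        for t_chrom, t_start, t_end, t_name in track:
            # Half-open intervals overlap when each one starts before the other ends.
            if r_chrom == t_chrom and r_start < t_end and t_start < r_end:
                hits.append((r_name, t_name))
    return hits

# Hypothetical regions of interest and hypothetical track items.
my_regions = parse_bed("""chr21 1000 2000 regionA
chr21 5000 6000 regionB
""")
some_track = parse_bed("""chr21 1500 1600 peak1
chr21 3000 3100 peak2
chr21 5999 6500 peak3
chr22 1500 1600 peak4
""")
overlaps = find_overlaps(my_regions, some_track)
for region_name, item_name in overlaps:
    print(region_name, "overlaps", item_name)
```

The nested loop is fine for a handful of regions; for genome-scale data you would want sorted intervals or an interval tree, which is part of why a ready-made tool is handy.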

In our paper for Current Protocols (which is now in PubMedCentral), we talk a bit about the hubs structure too. So if it runs too quickly at the end, our paper shows some of that detail pretty much the same way. That might help you to think about how to structure them if the concept is new to you. But if you are ready to dive in, there’s a paper specifically about hubs. There’s also more background on the browser’s tools in the NAR database issue papers. There’s a lot of ENCODE data available to mine, and I really hope more folks can use the tools to find new insights into genomic regions they are interested in.
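For a feel of how a hub is structured, here is a small Python sketch that builds the text of the three files a minimal hub typically uses: hub.txt, genomes.txt, and a trackDb.txt for one assembly. The hub name, labels, email address, and bigWig file name are all placeholders I made up, and the helper `build_hub_files` is hypothetical; check the UCSC hub documentation for the full set of settings.

```python
from typing import Dict

def build_hub_files(hub_name: str, genome: str, track_name: str,
                    big_data_url: str, email: str) -> Dict[str, str]:
    """Return a mapping of relative file path -> file text for a minimal track hub."""
    hub_txt = "\n".join([
        f"hub {hub_name}",
        "shortLabel Example Hub",
        "longLabel Example hub with one signal track",
        "genomesFile genomes.txt",   # points at the file listing assemblies
        f"email {email}",
    ]) + "\n"
    genomes_txt = "\n".join([
        f"genome {genome}",
        f"trackDb {genome}/trackDb.txt",  # one trackDb file per assembly
    ]) + "\n"
    track_db_txt = "\n".join([
        f"track {track_name}",
        f"bigDataUrl {big_data_url}",  # path or URL of the binary data file
        "shortLabel Example signal",
        "longLabel Example signal track in bigWig format",
        "type bigWig",
    ]) + "\n"
    return {
        "hub.txt": hub_txt,
        "genomes.txt": genomes_txt,
        f"{genome}/trackDb.txt": track_db_txt,
    }

# All of these values are placeholders for illustration.
hub_files = build_hub_files(
    hub_name="myExampleHub",
    genome="hg19",
    track_name="exampleSignal",
    big_data_url="exampleSignal.bw",
    email="someone@example.org",
)
for path, text in hub_files.items():
    print(f"--- {path} ---")
    print(text)
```

The chain is the thing to notice: hub.txt names the genomes file, the genomes file names a trackDb file for each assembly, and each trackDb stanza points at a data file.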

Quick links:

Track hubs: http://genome.ucsc.edu/cgi-bin/hgHubConnect

Data Integrator: http://genome.ucsc.edu/cgi-bin/hgIntegrator

Variant Annotation Integrator: http://genome.ucsc.edu/cgi-bin/hgVai

ENCODE features at UCSC: http://genome.ucsc.edu/ENCODE

UCSC tutorial suites:

UCSC Intro Tutorial suites (video, with our free slides + exercises): http://www.openhelix.com/ucscintro

UCSC Advanced Tutorial suites (video, slides, exercises): http://www.openhelix.com/ucscadv


Mangan ME, Williams JM, Kuhn RM, & Lathe WC (2014). The UCSC Genome Browser: What Every Molecular Biologist Should Know. Current Protocols in Molecular Biology, 107 (19.9), 199-199. DOI: 10.1002/0471142727.mb1909s107

Rosenbloom, K., Armstrong, J., Barber, G., Casper, J., Clawson, H., Diekhans, M., Dreszer, T., Fujita, P., Guruvadoo, L., Haeussler, M., Harte, R., Heitner, S., Hickey, G., Hinrichs, A., Hubley, R., Karolchik, D., Learned, K., Lee, B., Li, C., Miga, K., Nguyen, N., Paten, B., Raney, B., Smit, A., Speir, M., Zweig, A., Haussler, D., Kuhn, R., & Kent, W. (2014). The UCSC Genome Browser database: 2015 update. Nucleic Acids Research, 43 (D1). DOI: 10.1093/nar/gku1177

Raney, B., Dreszer, T., Barber, G., Clawson, H., Fujita, P., Wang, T., Nguyen, N., Paten, B., Zweig, A., Karolchik, D., & Kent, W. (2013). Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics, 30 (7), 1003-1005. DOI: 10.1093/bioinformatics/btt637

Disclosure: UCSC Genome Browser tutorials are freely available because UCSC sponsors us to do training and outreach on the UCSC Genome Browser.


Video Tip of the Week: ENCODE Data Coordination Center, phase 3


Image via: A User’s Guide to the Encyclopedia of DNA Elements (ENCODE). doi:10.1371/journal.pbio.1001046.g001

The ENCODE project began many years ago with a pilot phase that examined just 1% of the human genome. But this initial exploration helped the consortium participants to iron out some of the directions for later stages–including focusing on specific cell lines, techniques, and technologies in Phase 2. There have been a number of publications from consortium members, but in addition to the participants’ papers, a lot of other folks have mined this data for various investigations as well. There’s still plenty of opportunity for discovery. Some people may not realize that there’s also an ENCODE phase 3 underway.

When we had a contract with the folks at UCSC Genome Browser for outreach on ENCODE, we developed materials to help people explore the data, but we hadn’t delved into it much since phase 3 began. The other day, though, I got a note from my NHGRI YouTube subscription (GenomeTV) that a whole workshop of ENCODE phase 3 information had been made available. So I wanted to have a look.

There is a series of video segments that correspond to this agenda from the ENCODE workshop. I’ll be highlighting one of them here, the one that introduces the features of the Phase 3 Data Coordination Center, now at Stanford. But there may be others that you want to examine for your research goals as well. Another way to work through the different segments is available from the NHGRI page here: http://www.genome.gov/27561910. That page offers the slides, handouts, and exercises too.

The video is longer than our typical tips, but it’s worth seeing for the context and framework details. There’s also a section on searching and filtering, which explains how to locate precisely the things you want to find. There’s a helpful and funny analogy to searching for shoes as you would at Zappos. I’ve used the Zappos tool exactly that way, and I also like it very much. If you want more details on how their ontology structure helps them to accomplish this, check out the paper linked below. Also in the video, there’s a piece about how the metadata is structured, and what you can expect to find there.

There’s also a part about how to visualize the things you find. You end up loading them as a UCSC Genome Browser track hub, which is integrated with all the other data at UCSC. There’s another video with Pauline Fujita on the hubs, which I’ll address separately later.

The playlist for the whole meeting is here. I won’t be highlighting all of them, but I may select more of them for future tips.

Quick link:

ENCODE portal: https://www.encodeproject.org/


Malladi, V., Erickson, D., Podduturi, N., Rowe, L., Chan, E., Davidson, J., Hitz, B., Ho, M., Lee, B., Miyasato, S., Roe, G., Simison, M., Sloan, C., Strattan, J., Tanaka, F., Kent, W., Cherry, J., & Hong, E. (2015). Ontology application and use at the ENCODE DCC. Database, 2015. DOI: 10.1093/database/bav010

ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489 (7414), 57-74. DOI: 10.1038/nature11247

ENCODE Project Consortium (2011). A User’s Guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biology, 9 (4). DOI: 10.1371/journal.pbio.1001046

ENCODE Project Consortium (2004). The ENCODE (ENCyclopedia Of DNA Elements) Project. Science, 306 (5696), 636-640. DOI: 10.1126/science.1105136

Friday SNPpets

This week’s SNPpets include definition confusion in “epigenetics”, two HIPPIES, a new mouse ENCODE browser, living figures (new ways to interact with published data), and new features at the Drug-Gene Interaction database (DGIdb). Oh–and the woolly mammoth genome.

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…


Note: Because of the way Twitter has re-vamped their retweet software, it’s harder to get just the text versions of tweets. But embedded tweets are huge. We are going to try out this new format, but are not sure it will work for searching and indexing the way we like. We may revisit the old format after testing this out a bit.


Video Tip of the Week: New UCSC “stacked” wiggle track view

This week’s video tip shows you a new way to look at the multiWig track data at the UCSC Genome Browser. A new option has recently been released (see 06 May 2014), a “stacked” view, and it’s a handy way to look at the data with a new strategy. But I’ll admit it took me a little while of working with it to understand the details. So in this tip I hope you’ll see what the new visualization offers.

I won’t go into the background on the many types of annotation tracks available–if you need to be introduced to the idea of the basic track views, start out with our introduction tutorial that touches on the different types of graphical representations. Custom tracks are touched on in the advanced tutorial. For guidance specifically on how to create the different track types, see the UCSC documentation. The type of track I’m illustrating in the video today, a MultiWig track, has its own section over there too. Basically, if you are completely new to this, the “wiggle” style is a way to show a histogram display across a region. MultiWig lets you overlay several of these histograms in one space. In the example I’ll show here, the results of looking at 7 different cell lines are shown for some histone mark signals (Layered H3K27Ac track).

Annotation track cell lines

When I saw the announcement, I thought this was a good way to show all of the data simultaneously. When we do basic workshops, we don’t always have time to go into the details of this view, although we do explore it in the ENCODE material, because the track I’m using is one of the ENCODE data sets. I’ll use the same track in the same region as the announcement, which is shown here:

But when I first looked at this, I wasn’t sure if the peak–focus on the pink peak that represents the NHLF cell line–was meant to cover the whole area underneath or not. What I was trying to figure out is essentially this (a graphical representation of my thought process follows):


By trying out the various styles I was pretty sure I had the idea of what was really being shown, but I confirmed that with one of the track developers. The value is only the pink band segment, not the whole area below it. And Matthew also noted to me that they are sorting the tracks in reverse alphabetical order (so NHLF is the highest in the stack). That was an aspect I hadn’t realized yet. They are not sorting based on the values at that spot. This makes sense, of course, but it wasn’t obvious to me at first.

I like this option very much–but I figured if I had to do some noodling on what it actually meant others might have the same questions.
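Here is the same idea as a tiny Python sketch, in case seeing numbers helps. It assumes the bands are simply added on top of one another at each position, with the names ordered so that the last one alphabetically (NHLF here) sits on top. The function `stacked_bands`, the other cell line names used, and every signal value are made up for illustration; this is my reading of the display, not the browser's drawing code.

```python
from typing import Dict, List, Tuple

def stacked_bands(values: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Return (name, bottom, top) for each band, listed from the top of the stack down."""
    bands = []
    running_total = 0.0
    # Build the stack from the bottom up in alphabetical order, so the last
    # name alphabetically ends up on top (reverse alphabetical, top to bottom).
    for name in sorted(values):
        bottom = running_total
        running_total += values[name]
        bands.append((name, bottom, running_total))
    return list(reversed(bands))

# Hypothetical signal values for three cell lines at a single position.
signal_at_one_position = {"GM12878": 10.0, "K562": 5.0, "NHLF": 20.0}
bands = stacked_bands(signal_at_one_position)
for name, bottom, top in bands:
    # The value for a cell line is the thickness of its own band (top - bottom),
    # not the height at which the band happens to sit.
    print(f"{name}: band from {bottom} to {top}, value = {top - bottom}")
```

In this toy case the NHLF band reaches a height of 35, but its own value is only the thickness of the band, and its position in the stack comes from its name rather than from how big its signal is.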

In the video I’ll show you how this segment looks with the different “Overlay method” settings on that track page. I’ll be looking at the SOD1 area, like the announcement example.  I tweaked a couple of the other settings from the defaults so it would be easier to see on the video (see arrowheads for my changes). But I hope this conveys the options you have now to look at this type of track data effectively.

So here is the video with the SOD1 5′ region in the center, using the 4 different choices of overlay method, illustrating the histone mark data in the 7 cell lines. I’m not going into the details of the data here, but I’ll point you to a reference associated with this work for more on how it’s done–see the Bernstein lab paper below. I wanted to just demonstrate this new type of viewing option that will be available on wiggle tracks. Some tracks will have too much data for one type or another, or will be clearer with one or another style. But now you have an additional way to consider it.

Quick links:

UCSC Genome Browser: genome.ucsc.edu

UCSC Intro tutorial: http://openhelix.com/ucscintro

UCSC Advanced tutorial: http://openhelix.com/ucscadv

These tutorials are freely available because UCSC sponsors us to do training and outreach on the UCSC Genome Browser.


Kent W.J., Zweig A.S., Barber G., Hinrichs A.S. & Karolchik D. (2010). BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics (Oxford, England).

Karolchik D., Barber G.P., Casper J., Clawson H., Cline M.S., Diekhans M., Dreszer T.R., Fujita P.A., Guruvadoo L., Haeussler M. et al. (2013). The UCSC Genome Browser database: 2014 update. Nucleic Acids Research.

Ram O., Goren A., Amit I., Shoresh N., Yosef N., Ernst J., Kellis M., Gymrek M., Issner R., Coyne M. et al. Combinatorial patterning of chromatin regulators uncovered by genome-wide location analysis in human cells. Cell.

The ENCODE Project Consortium, Bernstein B.E., Birney E., Dunham I., Green E.D., Gunter C., Snyder M. et al. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature, 489.

Also see the Nature special issue on ENCODE data, especially the chromatin accessibility and histone modification subset (section 02): http://www.nature.com/encode/

Video Tips of the Week, Annual Review 2013 (part 1)

As you may know, we’ve been doing these video tips-of-the-week for six years now. We have completed or collected around 300 little tidbit introductions to various resources through this past year, 2013. At first we had to do all of our own video intros, but as the movie technology became more accessible and more teams made their own, we were able to find a lot more that were done by the resource providers themselves. So we began to collect those as well. At the end of the year we’ve established a sort of holiday tradition: we are doing a summary post to collect them all. If you have missed any of them it’s a great way to have a quick look at what might be useful to your work.

You can see past years’ tips here: 2008 I, 2008 II, 2009 I, 2009 II, 2010 I, 2010 II, 2011 I, 2011 II, 2012 I, 2012 II, 2013 II (next week).

Annual Review VI:

January 2013:
January 2: Annual Review V part deux
January 9: The New and Improved OMIM®
January 16: InSilico DB
January 23: ZooBank and species nomenclature
January 30: ScienceGameCenter #edtech

February 2013:
February 6: MotifLab workbench for TFBS analysis
February 13: UCSC Genome Browser restriction enzyme display
February 20: ENCODE Data at UCSC (reminder)
February 27: NetGestalt

March 2013:
March 6: NCBI Genomics Workbench
March 13: FlyBase
March 20: figshare + GenoCAD = outreach
March 27: Enzyme Portal and User-Centered Design

April 2013:
April 3: Phytozome and the Peach Genome
April 10: Introductory Cheminformatics
April 17: Sharing H7N9 data at GISAID.org with EpiFlu™
April 24: Cancer Atlas roadmap

May 2013:
May 1: My Cancer Genome
May 8: Transfac (and HGMD, Proteome, etc)
May 15: Influenza Research Database (IRD)
May 22: Canary Database for sentinels of human health
May 29: QIIME for Quantitative Insights Into Microbial Ecology

June 2013:
June 5: Prezi and other nonlinear presentation methods
June 12: TrioVis for family genome data sets
June 19: ENCODE ChIP-Seq Significance Tool
June 26: InnateDB, Systems Biology of the Innate Immune Response

Video Tip of the Week: ENCODE @ Ensembl

We have a lot of tutorials (2 in fact, ENCODE Foundations & ENCODE @ UCSC), tips and information about ENCODE. We also have a lot of tutorials (again 2, Ensembl and Ensembl Legacy, on the older versions), tips and information about Ensembl, the database and browser at EBI.

Now here is a tip of the week on both Ensembl AND ENCODE. This is one of the more recent additions to Ensembl’s video tutorials. This video looks at how to identify sequences that may be involved in gene regulation. Most of this data at Ensembl is based on ENCODE data. The video uses the “Matrix,” a way to select the regulation data you need based on cell types and TFs. At the end of the 8-minute video they discuss a bit more about how to get all ENCODE data.

So, now you have a wealth of information here at OpenHelix through our tutorials and our blog about ENCODE and Ensembl.

Quick links:

ENCODE: http://encodeproject.org/ENCODE/
ENCODE @ UCSC: http://genome.ucsc.edu/ENCODE/
Ensembl: http://www.ensembl.org
ENCODE Tutorials: http://openhelix.com/encode
Ensembl Tutorials: http://openhelix.com/cgi/tutorialInfo.cgi?id=95

Video Tip of the Week: ENCODE ChIP-Seq Significance Tool

We’ve been doing training and workshops on the UCSC Genome Browser for 10 years now. It’s a tremendous tool that has to be a foundational item in your toolkit in genomics. But–there may be times when you want to examine some of the data that you can find there in another way, with a different focus or emphasis. It might be possible to craft some clever Table Browser queries that get you what you want. Sometimes, though, someone else has created a way for you to query the underlying data for a topic that could be useful too. And today’s tip of the week is exactly this kind of tool: a web interface to query the ENCODE data that resides in the UCSC Genome Browser, with a focus on finding transcription factors with enriched binding in a region that you might be interested in exploring. Today’s video tip is for the ENCODE ChIP-Seq Significance Tool.

There’s a ton of great data that flowed into the UCSC Genome Browser as part of the ENCODE project. It’s going to provide years of mining for biologists. What would be great is for biomedical researchers who have interest in specific genes–or sets of genes–to take a look at the ENCODE data to see if they can unearth some useful insights about the regulation of these genes or lists of genes. You can use the ChIP-Seq Significance tool to sift through the data.

The video that the Butte lab team did is very nice. Very specific guidance on how to use their tool–what to choose for the menu options, what the choices are, and what to expect from the results. Here’s their video:

Of course you should read their paper about this tool for the background you need (linked below), and the references that will also help you to understand what this tool offers. You should also read up on the associated ENCODE data. The supplement with the paper is also nicely written in clear language to help you to understand the features.

One of the things I was curious about was whether this might be extended to the mouse data too. One thing that people grouse to me about is that ENCODE is cell line data, and tissue data would really be great. But I saw discussion at Stephen Turner’s blog (read the comments) about the focus on human for now. There was also discussion of the CScan tool, though, which does cover the mouse data. So if this is a tool you are interested in, you might want to explore CScan too.

Hat tip to Stephen Turner for the awareness.

Quick links:

ENCODE ChIP-Seq Significance Tool: http://encodeqt.stanford.edu/

CScan: http://www.beaconlab.it/cscan


Auerbach, R., Chen, B., & Butte, A. (2013). Relating Genes to Function: Identifying Enriched Transcription Factors using the ENCODE ChIP-Seq Significance Tool. Bioinformatics. DOI: 10.1093/bioinformatics/btt316
