UCSC Genome Bioinformatics

Video Tip of the Week: UCSC features for ENCODE data utilization

As noted in last week’s tip about the ENCODE DCC at Stanford, there was a workshop recently for the ENCODE project. There were a lot of folks speaking and a big room full of attendees. You should check out the full agenda and the playlist at the NHGRI site for all the videos, slides, and handouts: ENCODE 2015: Research Applications and Users Meeting.

This week I’m highlighting another video from this event. In this one, Pauline Fujita from the UCSC Genome Browser covers ways to work with ENCODE data in their browser.

Some of the talk includes intro material for brand new users, because there were certainly some in this workshop. If you are new to the tools too, you can see our free tutorial suites (below). Pauline also briefly highlights their Genome Browser in a Box virtual machine option for folks who have privacy-sensitive or protected data. If you want more info on that, check out our Tip of the Week on GBIB.

But soon she covers more detail on features like track hubs and how to use them (if you want to jump to that part, it begins around 20min). The extra search for items within a track hub is really good to know about. There’s also some guidance on the types of file formats that you may want to use to structure your data, and on why you would choose BED vs Wiggle, for example. For the part that addresses these formats, jump to about 33min.
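To make the BED vs Wiggle distinction a bit more concrete, here is a minimal Python sketch (not taken from the talk) that reads one line of each format. BED describes discrete items with 0-based, half-open coordinates, while a fixedStep Wiggle block describes a run of numeric values with a 1-based start. The function names and the sample lines in the usage note are hypothetical, and only the basic fields are handled.

```python
def parse_bed_line(line):
    """Parse the three required BED fields plus an optional name.

    BED coordinates are 0-based and half-open: [start, end).
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else None
    return {"chrom": chrom, "start": start, "end": end, "name": name}


def parse_fixed_step(lines):
    """Yield (chrom, start, end, value) from fixedStep Wiggle lines.

    Wiggle positions are 1-based, so they are converted here to the
    0-based, half-open convention that BED uses.
    """
    chrom = pos = step = span = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "#")):
            continue  # skip blank lines, track lines, and comments
        if line.startswith("fixedStep"):
            params = dict(p.split("=", 1) for p in line.split()[1:])
            chrom = params["chrom"]
            pos = int(params["start"])
            step = int(params["step"])
            span = int(params.get("span", 1))
            continue
        if chrom is None:
            raise ValueError("data value found before a fixedStep declaration")
        yield (chrom, pos - 1, pos - 1 + span, float(line))
        pos += step
```

For example, `parse_bed_line("chr1\t100\t200\tpeakA")` gives one named interval, while `list(parse_fixed_step(["fixedStep chrom=chr1 start=101 step=10 span=5", "1.5", "2.0"]))` gives two scored windows. Roughly speaking, BED suits "things at places" and Wiggle suits "a number at every position."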

Towards the end there’s coverage of the Data Integrator. The idea with this feature is that maybe you’ve got information on a region (or a number of regions) structured as a BED file, and you want to find out what else is going on in those regions. The Data Integrator can help you with that by finding overlaps among different tracks of data (around 45min). The Variant Annotation Integrator does a similar kind of thing, but for VCF files with variation information (~48min). A smidge more guidance on track hubs comes in at 50min.
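If the idea of "finding overlaps" between your regions and another track is new, the following Python sketch shows the basic logic on toy data. It is only an illustration of interval overlap under the BED convention (0-based, half-open coordinates), not a description of how the Data Integrator is implemented; the function name and the sample regions in the usage note are made up.

```python
from collections import defaultdict


def find_overlaps(regions, features):
    """Return, for each query region, the names of overlapping features.

    regions:  iterable of (chrom, start, end)
    features: iterable of (chrom, start, end, name)
    Coordinates are 0-based and half-open, as in BED.
    """
    # Group features by chromosome so each region is only compared
    # against features on the same chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, name in features:
        by_chrom[chrom].append((start, end, name))

    results = {}
    for chrom, start, end in regions:
        # Two half-open intervals overlap when each starts before
        # the other ends.
        results[(chrom, start, end)] = [
            name
            for f_start, f_end, name in by_chrom[chrom]
            if f_start < end and start < f_end
        ]
    return results
```

Calling it with a region `("chr1", 100, 200)` and features `("chr1", 150, 250, "geneA")` and `("chr1", 200, 300, "geneB")` reports only `geneA`, because a feature that starts exactly where the region ends does not overlap it in half-open coordinates.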

In our paper for Current Protocols (which is now in PubMed Central), we talk a bit about the hub structure too. So if the talk runs too quickly at the end, our paper shows some of that detail in pretty much the same way. That might help you to think about how to structure hubs if the concept is new to you. But if you are ready to dive in, there’s a paper specifically about hubs. And there’s also more background on the browser’s tools in the NAR database issue papers. There’s a lot of ENCODE data available to mine, and I really hope more folks can use the tools to find new insights into genomic regions they are interested in.
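As a rough picture of the hub structure mentioned above, a track hub is a small set of text files: a hub.txt that points to a genomes.txt, which in turn points to a trackDb.txt for each assembly. The Python sketch below just builds the text of those three files for a single bigBed track. The hub name, labels, email address, and file names are placeholders invented for illustration, and real hubs usually carry more settings than shown here.

```python
def build_minimal_hub(assembly="hg19"):
    """Return {relative_path: file_text} for a one-track hub.

    All names, labels, and URLs below are hypothetical placeholders.
    """
    hub_txt = "\n".join([
        "hub myExampleHub",
        "shortLabel Example Hub",
        "longLabel Example hub with a single bigBed track",
        "genomesFile genomes.txt",
        "email someone@example.org",
    ]) + "\n"

    # genomes.txt lists each assembly and where its trackDb file lives.
    genomes_txt = "\n".join([
        f"genome {assembly}",
        f"trackDb {assembly}/trackDb.txt",
    ]) + "\n"

    # trackDb.txt describes each track; bigDataUrl points to the data file.
    track_db_txt = "\n".join([
        "track myRegions",
        "bigDataUrl myRegions.bb",
        "shortLabel My regions",
        "longLabel Regions of interest from my experiment",
        "type bigBed",
    ]) + "\n"

    return {
        "hub.txt": hub_txt,
        "genomes.txt": genomes_txt,
        f"{assembly}/trackDb.txt": track_db_txt,
    }
```

Writing these strings to a web-accessible directory, next to the data file itself, and pasting the hub.txt URL into the track hubs page is the general pattern; the hubs paper and the UCSC documentation cover the full list of settings.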

Quick links:

Track hubs: http://genome.ucsc.edu/cgi-bin/hgHubConnect

Data Integrator: http://genome.ucsc.edu/cgi-bin/hgIntegrator

Variant Annotation Integrator: http://genome.ucsc.edu/cgi-bin/hgVai

ENCODE features at UCSC: http://genome.ucsc.edu/ENCODE

UCSC tutorial suites:

UCSC Intro Tutorial suites (video, with our free slides + exercises): http://www.openhelix.com/ucscintro

UCSC Advanced Tutorial suites (video, slides, exercises): http://www.openhelix.com/ucscadv

References:

Mangan ME, Williams JM, Kuhn RM, & Lathe WC (2014). The UCSC Genome Browser: What Every Molecular Biologist Should Know. Current Protocols in Molecular Biology, 107 (19.9), 199-199 DOI: 10.1002/0471142727.mb1909s107

Rosenbloom, K., Armstrong, J., Barber, G., Casper, J., Clawson, H., Diekhans, M., Dreszer, T., Fujita, P., Guruvadoo, L., Haeussler, M., Harte, R., Heitner, S., Hickey, G., Hinrichs, A., Hubley, R., Karolchik, D., Learned, K., Lee, B., Li, C., Miga, K., Nguyen, N., Paten, B., Raney, B., Smit, A., Speir, M., Zweig, A., Haussler, D., Kuhn, R., & Kent, W. (2014). The UCSC Genome Browser database: 2015 update. Nucleic Acids Research, 43 (D1) DOI: 10.1093/nar/gku1177

Raney, B., Dreszer, T., Barber, G., Clawson, H., Fujita, P., Wang, T., Nguyen, N., Paten, B., Zweig, A., Karolchik, D., & Kent, W. (2013). Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics, 30 (7), 1003-1005 DOI: 10.1093/bioinformatics/btt637

Disclosure: UCSC Genome Browser tutorials are freely available because UCSC sponsors us to do training and outreach on the UCSC Genome Browser.