If you come across this at a later date, though, I’m sure the recordings will be available.
Edit after the sessions were done: I really enjoyed this. Having all these wicked smaht folks discussing ways to get to the future was really useful. I’ll post an additional note when I see that the videos are up.
Yeah, I know you know. There’s a lot of genomics and proteomics data coming out every day, some of it through the traditional publication route and some of it not, and it’s only getting harder to wrangle the useful information and separate the signal from the noise. I can remember when merely looking through the (er, paper-based) tables of contents of Cell and Nature would get me up to speed for a week. But increasingly, the data I need isn’t even coming through the papers.
Like everyone else, I have a variety of strategies for staying notified about the different things I need to see. I use MyNCBI stored searches to keep me posted on things that come through the NCBI system. I signed up for OMIM’s new “MIM-Match” service as well. But there’s still a lot of room for new ways to collect and filter new data and information. Today’s tip focuses on a service that does that: Nowomics. This is a freely available tool to help you keep track of important new data. Here’s a quick video overview of what’s going on with Nowomics.
The goal of Nowomics is to offer you an actively updated feed of relevant information on genes or topics of interest, using text mining and ontology term harvesting from a range of sources. What sets it apart from MyNCBI or OMIM is the range and types of data sources it uses. You set up some genes or Gene Ontology terms to “follow”, and the software regularly checks the source sites for changes. You can go in and look at your feed, filter it for different types of data, and see what’s new (“latest”) or what’s being hotly chattered about (“popular”) using Altmetric-style attention data. For example, here’s a paper that people seemed to find worth talking about, based on the tweets and the Mendeley readership.
This tool is in the early stages of development, so if there are features you’d like to see or other sources you think would be useful, the Nowomics team is eager for feedback. You can find a contact link on their site, or find them on Facebook and Twitter. You can also learn more from their blog, and read about the philosophy and foundations of Nowomics in their slide presentation below.
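To make the “follow, filter, sort” idea a little more concrete, here is a minimal Python sketch of how such a feed could be assembled. It is purely illustrative: the `FeedItem` record, its fields, and the `build_feed` function are hypothetical and are not the Nowomics data model or API, and `attention` is just a stand-in for an altmetric-style score.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FeedItem:
    """Hypothetical feed entry (not the Nowomics data model)."""
    title: str
    genes: set          # gene symbols the item mentions
    kind: str           # e.g. "paper", "dataset"
    published: date
    attention: float    # stand-in for an altmetric-style attention score


def build_feed(items, followed, kinds=None, mode="latest"):
    """Keep items that mention a followed gene (optionally of chosen kinds),
    sorted newest first ("latest") or by attention score ("popular")."""
    hits = [it for it in items
            if it.genes & followed and (kinds is None or it.kind in kinds)]
    sort_keys = {
        "latest": lambda it: it.published,
        "popular": lambda it: it.attention,
    }
    if mode not in sort_keys:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(hits, key=sort_keys[mode], reverse=True)


# Hypothetical usage: follow SOD1 and list matching papers, newest first.
feed = [FeedItem("SOD1 paper", {"SOD1"}, "paper", date(2014, 7, 1), 12.0)]
print([it.title for it in build_feed(feed, {"SOD1"}, kinds={"paper"})])
```

The two modes differ only in the sort key, which mirrors the “latest” versus “popular” views described above.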
Acland A., Barrett T., Beck J., Benson D.A., Bollin C., Bolton E., Bryant S.H., Canese K., Church D.M., Clark K. et al. (2014). Database resources of the National Center for Biotechnology Information, Nucleic Acids Research, 42 (D1) D7-D17. DOI: http://dx.doi.org/10.1093/nar/gkt1146
Online Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD), July 22 2014. World Wide Web URL: http://omim.org/
Pathway databases, like Reactome, are uniquely suited for interpreting the results of high-throughput functional genomics data sets such as microarray-based expression profiles, protein interaction sets, and chromatin IP. In response to user feedback and new feature requests, we have released a new Reactome Pathway Browser with an integrated suite of tools for pathway analysis. Using these improved features, you can map protein lists to Reactome pathways, perform pathway overrepresentation analysis for a set of genes, colourize pathway diagrams with gene expression data, and compare model organism and human pathways. To support third-party tool integration, the Reactome Pathway Analysis Portal is also available via RESTful web services. Further details about the new pathway analysis tool can be found in our User Guide.…[see more details and contact info at the mailing list page]
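Overrepresentation analysis, one of the features listed above, usually boils down to asking whether a pathway contains more of your genes than chance would predict, and a common way to score that is a one-sided hypergeometric test. The sketch below shows that arithmetic in plain Python. It is not Reactome’s implementation (a real analysis would also correct for testing many pathways at once, e.g. with a false discovery rate), and the toy gene and pathway names are made up.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway genes, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def overrepresentation(gene_list, pathways, universe_size):
    """Score each pathway: {name: (overlap, p-value)}, smallest p-value first.

    `pathways` maps a pathway name to a set of member genes; `universe_size`
    is the number of genes that could have appeared in the list.
    """
    genes = set(gene_list)
    n = len(genes)
    scores = {}
    for name, members in pathways.items():
        k = len(genes & members)
        scores[name] = (k, hypergeom_sf(k, universe_size, len(members), n))
    return dict(sorted(scores.items(), key=lambda item: item[1][1]))


# Hypothetical toy data: 3 genes of interest against 2 made-up pathways.
toy = {"pathway_1": {"A", "B", "X"}, "pathway_2": {"Y", "Z"}}
print(overrepresentation(["A", "B", "C"], toy, universe_size=100))
```

A pathway with no overlap always gets p = 1, and the smaller the p-value, the less likely the overlap is to be a coincidence.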
Mapping gene and protein lists to pathways is a frequently requested feature in pretty much every workshop we give, so have a look and see if it would help you to manage lists and do some discovery on them.
This week’s video tip shows you a new way to look at multiWig track data at the UCSC Genome Browser. A new “stacked” view option was recently released (see 06 May 2014), and it’s a handy new way to look at the data. But I’ll admit it took me a while of working with it to understand the details, so in this tip I hope you’ll see what the new visualization offers.
I won’t go into the background on the many types of annotation tracks available. If you need an introduction to the basic track views, start with our introduction tutorial, which touches on the different types of graphical representations. Custom tracks are covered in the advanced tutorial. For guidance specifically on how to create the different track types, see the UCSC documentation; the type of track I’m illustrating in the video today, a multiWig track, has its own section there too. Basically, if you are completely new to this, the “wiggle” style is a way to show a histogram display across a region, and multiWig lets you overlay several of these histograms in one space. In the example I’ll show here, histone mark signals from 7 different cell lines are shown (the Layered H3K27Ac track).
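For anyone curious about what sits underneath a wiggle track, here is a small sketch that formats evenly spaced values in the wiggle `fixedStep` layout UCSC documents for custom tracks. The function name, track name, and coordinates are hypothetical; check the UCSC wiggle format documentation before relying on details such as the 1-based `start`.

```python
def to_fixed_step_wig(chrom, start, step, values, name="example"):
    """Format evenly spaced values as a fixedStep wiggle custom track.

    `start` is 1-based, as in the wiggle format, and each value is drawn
    over `step` bases (span == step), giving a histogram-like display.
    """
    lines = [
        f'track type=wiggle_0 name="{name}"',
        f"fixedStep chrom={chrom} start={start} step={step} span={step}",
    ]
    lines += [f"{v:g}" for v in values]
    return "\n".join(lines) + "\n"


# Hypothetical values for five 100-bp bins on chr21.
print(to_fixed_step_wig("chr21", 1000001, 100, [0, 2.5, 7, 12, 3]))
```

Each value becomes one bar of the histogram; a multiWig track simply draws several such series in the same space.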
Annotation track cell lines
When I saw the announcement, I thought this was a good way to show all of the data simultaneously. In our basic workshops we don’t always have time to go into the details of this view, although we do explore it in the ENCODE material, since the track I’m using is one of the ENCODE data sets. I’ll use the same track in the same region as the announcement, which is shown here:
But when I first looked at this, I wasn’t sure if the peak–focus on the pink peak that represents the NHLF cell line–was meant to cover the whole area underneath or not. What I was trying to figure out is essentially this (a graphical representation of my thought process follows):
By trying out the various styles I was pretty sure I understood what was really being shown, but I confirmed it with one of the track developers. The value is only the pink band segment, not the whole area below it. Matthew also noted to me that the tracks are sorted in reverse alphabetical order (so NHLF is the highest in the stack), not by their values at that spot. I hadn’t realized that yet. It makes sense, of course, but it wasn’t obvious to me at first.
I like this option very much, but I figured that if I had to do some noodling on what it actually meant, others might have the same questions.
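Here is a small sketch of the arithmetic behind that answer. It assumes, as described above, that the stack order follows the track names (alphabetical from the bottom, so reading top-down gives reverse alphabetical order with NHLF on top) rather than the values, and that the signal values are non-negative. The function and the numbers are hypothetical, not the browser’s drawing code.

```python
def stacked_bands(values):
    """Return (track, bottom, top) bands for one position in a stacked view.

    Tracks are stacked bottom-to-top in alphabetical order, so the stack read
    top-down is reverse alphabetical. Each track's value is the height of its
    own band (top - bottom), not the total height of the stack at that point.
    Assumes non-negative values.
    """
    bands = []
    base = 0.0
    for track in sorted(values):
        top = base + values[track]
        bands.append((track, base, top))
        base = top
    return bands


# Hypothetical signal values at one position, printed top of the stack first.
signal = {"GM12878": 5.0, "K562": 3.0, "NHLF": 10.0}
for track, bottom, top in reversed(stacked_bands(signal)):
    print(f"{track:8} band {bottom:5.1f}-{top:5.1f}  value {top - bottom:.1f}")
```

In this toy case the NHLF band reaches 18 on the y-axis, but its value is only the 10 units of its own band.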
In the video I’ll show you how this segment looks with the different “Overlay method” settings on that track page. I’ll be looking at the SOD1 area, like the announcement example. I tweaked a couple of the other settings from the defaults so it would be easier to see on the video (see arrowheads for my changes). But I hope this conveys the options you have now to look at this type of track data effectively.
So here is the video with the SOD1 5′ region in the center, using the 4 different overlay method choices and illustrating the histone mark data in the 7 cell lines. I’m not going into the details of the data here, but for more on how it was generated, see the Bernstein lab paper below. I just wanted to demonstrate this new viewing option for wiggle tracks. Some tracks will have too much data for one style, or will be clearer in another. But now you have an additional way to look at them.
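If you build your own multiWig track in a hub, my understanding is that the “Overlay method” choices correspond to the trackDb `aggregate` setting (`none`, `transparentOverlay`, `solidOverlay`, `stacked`). The sketch below writes such a container stanza. The track names, URLs, and colors are hypothetical, so check the UCSC trackDb documentation for the authoritative list of settings.

```python
AGGREGATE_METHODS = ("none", "transparentOverlay", "solidOverlay", "stacked")


def multiwig_stanza(name, subtracks, aggregate="stacked"):
    """Return trackDb text for a multiWig container of bigWig subtracks.

    `subtracks` is a list of (track_name, bigwig_url, "r,g,b") tuples.
    """
    if aggregate not in AGGREGATE_METHODS:
        raise ValueError(f"unsupported aggregate method: {aggregate}")
    lines = [
        f"track {name}",
        "container multiWig",
        f"aggregate {aggregate}",
        "showSubtrackColorOnUi on",
        "type bigWig",
        f"shortLabel {name}",
        f"longLabel {name} signal overlay",
        "visibility full",
        "",
    ]
    for sub, url, color in subtracks:
        # Each subtrack points back to the container via `parent`;
        # the indentation is only for readability.
        lines += [
            f"    track {sub}",
            f"    parent {name}",
            f"    bigDataUrl {url}",
            f"    shortLabel {sub}",
            f"    longLabel {sub} signal",
            "    type bigWig",
            f"    color {color}",
            "",
        ]
    return "\n".join(lines)


print(multiwig_stanza("myLayered", [
    ("myNHLF", "https://example.org/nhlf.bw", "255,128,200"),
    ("myK562", "https://example.org/k562.bw", "128,128,255"),
]))
```

Switching between the four views is then a one-word change to the `aggregate` line.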
These tutorials are freely available because UCSC sponsors us to do training and outreach on the UCSC Genome Browser.
Kent W.J., Zweig A.S., Barber G., Hinrichs A.S. & Karolchik D. (2010). BigWig and BigBed: enabling browsing of large distributed datasets, Bioinformatics (Oxford, England), PMID: 20639541
Karolchik D., Barber G.P., Casper J., Clawson H., Cline M.S., Diekhans M., Dreszer T.R., Fujita P.A., Guruvadoo L., Haeussler M. et al. (2013). The UCSC Genome Browser database: 2014 update, Nucleic Acids Research, PMID: 24270787
Ram O., Goren A., Amit I., Shoresh N., Yosef N., Ernst J., Kellis M., Gymrek M., Issner R., Coyne M. et al. (2011). Combinatorial patterning of chromatin regulators uncovered by genome-wide location analysis in human cells, Cell, PMID: 22196736
The ENCODE Project Consortium, Bernstein B.E., Birney E., Dunham I., Green E.D., Gunter C. & Snyder M. et al. (2012). An integrated encyclopedia of DNA elements in the human genome, Nature, 489, PMID: 22955616
Also see the Nature special issue on ENCODE data, especially the chromatin accessibility and histone modification subset (section 02): http://www.nature.com/encode/
Just got the news via the mailing list; I haven’t had a chance to kick the tires yet:
We are pleased to announce the release of BioMart version 0.9.
The latest version of BioMart includes support for data analysis and visualisation tools. The first of the BioMart tools has already been implemented and is accessible from www.biomart.org. This tool enables enrichment analysis of genes in all Ensembl species and a broad range of gene identifiers for each species are also available. Furthermore, the tool supports cross-species analysis using Ensembl homology data. Finally, the enrichment tool facilitates analysis of BED files containing genomic features such as Copy Number Variations (CNVs) or Differentially Methylated Regions (DMRs).
The latest BioMart release comes with the new version of the REST and SOAP APIs. These APIs are available for testing at central.biomart.org. Third party developers who are currently using REST or SOAP version 0.7 are encouraged to start testing and transitioning to 0.9. The two servers providing access to BioMart data through REST and SOAP (version 0.7 and version 0.9) will be running in parallel to provide support for easy transition. The Enrichment tool is also accessible programmatically through 0.9 REST/SOAP interface.
Finally, the BioMart website has been completely redesigned to cater for a better user experience. The re-organised layout, incorporation of new functionality, such as the “quick tool access” and the use of subtle animation makes for clearer navigation and greater site interactivity.
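For anyone planning the REST transition, here is a sketch that builds the kind of XML query BioMart’s `martservice` REST interface has traditionally accepted (the 0.7-style format, which Ensembl’s BioMart also uses). Whether the 0.9 API takes exactly this shape is an assumption on my part, so test against central.biomart.org first. The dataset, attribute, and filter names are examples, and the base URL is a placeholder for whatever endpoint you target.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def biomart_query_xml(dataset, attributes, filters=None):
    """Build a BioMart XML query in the 0.7-style martservice format."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="1", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for fname, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", name=fname, value=value)
    for aname in attributes:
        ET.SubElement(ds, "Attribute", name=aname)
    return ET.tostring(query, encoding="unicode")


def martservice_url(base_url, xml):
    """Attach the XML as the URL-encoded `query` parameter of a GET request."""
    return f"{base_url}?{urlencode({'query': xml})}"


# Example names (Ensembl-style); the base URL is a placeholder.
xml = biomart_query_xml("hsapiens_gene_ensembl",
                        ["ensembl_gene_id", "start_position"],
                        {"chromosome_name": "21"})
print(martservice_url("https://example.org/biomart/martservice", xml))
```

Nothing here touches the network; it only prepares the request, which makes it easy to compare what the 0.7 and 0.9 servers return for the same query.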
Most people who read our blog or access the free materials haven’t had to register anyway, so there wasn’t an issue with those. But we have checked with our development team to see whether the security flaw affected our registered users.
We are told that our registration-accessible pages are unaffected by this vulnerability. So although it is always wise to change passwords from time to time, we won’t be requiring that for our registered users. Feel free to do so if you want to, though. Let us know if you have any problems with that.
A fix was implemented for the Google Wallet checkout feature that some people might have used, and it’s already in place.
Tool and database providers that rely on genome assemblies will be updating to the new one over time. It can take a while for all of the features you want to be mapped to the new assembly, and this will vary by project. At the end of last week, though, we were notified on the UCSC Announcement mailing list that a preliminary browser is available with the hg38 assembly. Here’s a quick look at it, with a couple of the key features highlighted:
Note that the name hg38 is a big jump from hg19, the assembly we had been on: the UCSC numbering now matches that of the GRC (Genome Reference Consortium), so hg38 corresponds to GRCh38. And as this is a preliminary browser, you’ll see that there aren’t many annotation tracks available yet, so for many things you’ll still want to use the hg19 assembly. The annotation tracks you need will be added as soon as possible. As the announcement notes:
There’s much more to come! This initial release of the hg38 Genome Browser provides a rudimentary set of annotations. Many of our annotations rely on data sets from external contributors (such as our popular SNPs tracks) or require massive computational effort (our comparative genomics tracks). In the upcoming months/years, we will release many more annotation tracks as they become available. To stay abreast of new datasets, join our genome-announce mailing list or follow us on twitter [@GenomeBrowser].
There are a number of other important changes too, which aren’t obvious from the interface. You should have a look at the full announcement email text to understand their impact. They involve not only the naming convention but also alternate sequences, centromere representation, the mitochondrial genome sequence, sequence updates that fix previously erroneous bases and misassembled regions, and other aspects that could affect your work. Then go kick the tires!
You may also want to have a look at the publication in the NAR Database issue, which describes other features that may have been updated since the last time you dove into a new assembly. There are more species (alligators?!) and more types of tracks than you might be aware of if you just rely on the same stuff most of the time. There are also the cool hub tools that now provide new ways to load up your own project data. Go forth and discover.
Karolchik D., Barber G.P., Casper J., Clawson H., Cline M.S., Diekhans M., Dreszer T.R., Fujita P.A., Guruvadoo L., Haeussler M. et al. (2013). The UCSC Genome Browser database: 2014 update, Nucleic Acids Research, 42 (D1) D764-D770. DOI: 10.1093/nar/gkt1168
All the DNA on the internet now at your fingertips!
We’re pleased to announce the release of the Web Sequences track on the UCSC Genome Browser. This track, produced in collaboration with Microsoft Research, contains the results of a 30-day scan for DNA sequences from over 40 billion different webpages. The sequences were then mapped with Blat to the human genome (hg19) and numerous other species including mouse (mm9), rat (rn4), and zebrafish (danRer7). The data were extracted from a variety of sources including patents, online textbooks, help forums, and any other webpages that contain DNA sequence. In essence, this track displays the Blat alignments of nearly every DNA sequence on the internet! The Web Sequences track description page contains more details on how the track was generated.
We would like to acknowledge Max Haeussler and Matt Speir from the UCSC Genome Browser staff and Bob Davidson from Microsoft Research for their hard work in creating this track.
UCSC Genome Bioinformatics Group
If you are looking for the track, it’s in the Phenotype and Literature section in human. I took a quick look and it’s definitely a mixed bag: patents and homework sites, journals and such. But I think it will be interesting to see what turns up.
Edit: some other finds–lots of non-English pages, so I can’t tell what they are. I have seen Japanese, Chinese, and Korean so far. Saw a link to Fark.com (heh). Slideshare. Some pages are borked and don’t load. Some require logins (medscape). Could be a good source of PDFs that you can’t get elsewhere (*cough*).
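The announcement above leaves the extraction details to the track description page, but the basic idea of pulling candidate DNA out of ordinary web text can be sketched with a regular expression. This is only an illustration: the minimum length of 20 and the rule that a run must not touch other letters are my assumptions, not the UCSC/Microsoft pipeline, and any hits would still need to be aligned with a tool like Blat.

```python
import re


def find_dna_runs(text, min_len=20):
    """Return candidate DNA sequences (uppercased) found in free text.

    A candidate is a run of at least `min_len` A/C/G/T/N letters (either case)
    that is not part of a longer word.
    """
    pattern = re.compile(rf"(?<![A-Za-z])[ACGTNacgtn]{{{min_len},}}(?![A-Za-z])")
    return [m.group(0).upper() for m in pattern.finditer(text)]


page = "Forward primer: ACGTACGTACGTACGTACGTAA (see Fig. 2); CATTAG is too short."
print(find_dna_runs(page))
```

The length cutoff is the main knob: too short and ordinary words slip through, too long and genuine primers are missed.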
Over the years we’ve seen some real shifts in the needs of the trainees in our UCSC Genome Browser workshops. At first, people just needed access to the reference genome and the data that was available (and boy, has that changed over the years: time travel back with this post!). But as researchers around the world gained access to more and bigger data sets that they were generating themselves, they kept asking for ways to load their own data into the browser framework, so they could explore it in context and query it with existing tools like the Table Browser.
We relayed this back to the UCSC team, and we know they were hearing it from other sources too. Beyond the basic custom tracks that were available at first, new options appeared for loading bigger data sets (the “big” formats such as bigBed and bigWig) and related tracks (super-tracks). The biggest change, though, came with Track Hubs. Now you aren’t just loading up a couple of tracks: you can load up complex collections with the hub framework. Check out the existing ones by clicking the track hubs button on the Gateway page:
Locate the hubs from the Gateway page.
You can use the public hubs to get a sense of what they can do–adding a lot of new info to a genome with new source data, new technologies, various special projects, etc. You can create your own Track Hubs to explore and query within existing assemblies. Or you can create entire new assemblies that UCSC doesn’t already host with Assembly Hubs. I touch on the basics of this in this week’s video tip.
But this was just a taste–click through to the documentation and wiki pages to learn more about the specifics of using these tools.
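To give a sense of how little is needed to start a Track Hub, here is a sketch that writes the three basic text files (`hub.txt`, `genomes.txt`, and a `trackDb.txt`) for one assembly. The hub name, email, and track list are hypothetical, and the hub documentation covers many more settings. The files, and the big-format data they point to, have to be served from a web server the browser can reach.

```python
def make_track_hub(hub_name, email, genome, tracks):
    """Return {path: contents} for a minimal single-genome track hub.

    `tracks` is a list of (track_name, big_data_url, track_type) tuples,
    e.g. track_type "bigBed" or "bigWig".
    """
    hub_txt = "\n".join([
        f"hub {hub_name}",
        f"shortLabel {hub_name}",
        f"longLabel {hub_name} track hub",
        "genomesFile genomes.txt",
        f"email {email}",
    ]) + "\n"
    genomes_txt = f"genome {genome}\ntrackDb {genome}/trackDb.txt\n"
    stanzas = []
    for name, url, kind in tracks:
        stanzas.append("\n".join([
            f"track {name}",
            f"bigDataUrl {url}",
            f"shortLabel {name}",
            f"longLabel {name}",
            f"type {kind}",
        ]))
    trackdb_txt = "\n\n".join(stanzas) + "\n"  # blank line between stanzas
    return {
        "hub.txt": hub_txt,
        "genomes.txt": genomes_txt,
        f"{genome}/trackDb.txt": trackdb_txt,
    }


files = make_track_hub("myHub", "me@example.org", "hg19",
                       [("myPeaks", "myPeaks.bb", "bigBed")])
for path, text in files.items():
    print(f"== {path}\n{text}")
```

Once the files are online, the URL of `hub.txt` is what you give the browser to load the hub.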
Hubs have been available for a little while and some early adopters have tried them out (like Pierre Lindenbaum did here), but they recently became the subject of much praise and chatter with the publication of a paper that gives more detail for people new to the hub functions. A separate paper gave more detail on Comparative Assembly Hubs, which let you load up entirely new comparative genome data. You can create “snake tracks” that help you visualize comparisons, inversions, duplications, etc. And you don’t have to create a local mirror for these things.
Now, there may be times when a local mirror is still your best bet. Many of the researchers we talked to work in hospital research settings. If you have patient data that shouldn’t go outside your firewall, for example, you may still want everything in-house. But for a lot of projects, Track Hubs can take care of the overhead for you instead.
You can use hubs for your own work. Or, if you have a collection of data that you want to offer to the wider community, you can send a request to the UCSC folks and possibly get it onto the public hubs page.
Check out the publications below for a grasp of the foundations–and also see the new directions that are being explored. Hubs are actually more flexible than just layering on new annotations, and you can read more about that too.
If you aren’t yet familiar with the UCSC Genome Browser basics you need to really understand the annotation track foundations, be sure to see the freely available materials that UCSC sponsors us to provide: the Introduction and Advanced tutorials.
References: Raney B.J., Dreszer T.R., Barber G.P., Clawson H., Fujita P.A., Wang T., Nguyen N., Paten B., Zweig A.S. & Karolchik D. (2013). Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser, Bioinformatics, DOI: 10.1093/bioinformatics/btt637
Karolchik D., Barber G.P., Casper J., Clawson H., Cline M.S., Diekhans M., Dreszer T.R., Fujita P.A., Guruvadoo L., Haeussler M. et al. (2013). The UCSC Genome Browser database: 2014 update, Nucleic Acids Research, DOI: 10.1093/nar/gkt1168
Nguyen N., Hickey G., Raney B.J., Armstrong J., Clawson H., Zweig A., Kent J., Haussler D. & Paten B. (2013). Comparative Assembly Hubs: Web Accessible Browsers for Comparative Genomics. arXiv:1311.1241v1 [q-bio.GN]