Category Archives: Genomics Resource News

Video Tip of the Week: Nowomics, set up alert feeds for new data

Yeah, I know you know. There’s a lot of genomics and proteomics data coming out every day. Some of it arrives by the traditional publication route and some of it doesn’t, and it’s only getting harder to separate the signal from the noise. I can remember when merely looking through the (er, paper-based) table of contents of Cell and Nature would get me up to speed for a week. But increasingly, the data I need isn’t even coming through the papers.

Like everyone else, I have a variety of strategies to keep notified of different things I need to see. I use the MyNCBI stored searches to keep me posted on things that come through the NCBI system. I signed up for OMIM’s new “MIM-Match” service as well. But there’s still a lot of room for new ways to collect and filter new data and information. Today’s tip focuses on a service to do that: Nowomics. This is a freely available tool to help you keep track of important new data. Here’s a quick video overview of how to see what’s going on with Nowomics.

The goal of Nowomics is to offer you an actively updated feed of relevant information on genes or topics of interest, using text mining and ontology term harvesting from a range of sources. What makes it different from MyNCBI or OMIM is the range and types of data sources it uses. The user sets up some genes or Gene Ontology terms to “follow”, and the software regularly checks for changes in the source sites. You can go in and look at your feed, you can filter it for different types of data, and you can see what’s new (“latest”) or what’s being hotly chattered about (“popular”) using Altmetric strategies. For example, here’s a paper that people seemed to find worth talking about, based on the tweets and the Mendeley occurrences.

[Image: example_paper]

This tool is in the early stages of development. If there are features you’d like to see, or other sources you think would be useful, the Nowomics team is eager for feedback. You can find a link to contact them over at their site, or locate them on Facebook and Twitter. You can learn more from their blog, and more about the philosophy and foundations of Nowomics from their slide presentation below.

 

Quick links:

Nowomics: http://nowomics.com/

Example gene feed: http://nowomics.com/gene/human/BRCA2

References:

Acland A., T. Barrett, J. Beck, D. A. Benson, C. Bollin, E. Bolton, S. H. Bryant, K. Canese, D. M. Church, K. Clark et al. (2014). Database resources of the National Center for Biotechnology Information, Nucleic Acids Research, 42 (D1) D7-D17. DOI: http://dx.doi.org/10.1093/nar/gkt1146

Online Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD), July 22 2014. World Wide Web URL: http://omim.org/

New tools at Reactome–check ‘em out

Just got this from the Reactome announcement mailing list:

Pathway databases, like Reactome, are uniquely suited for interpreting the results of high-throughput functional genomics data sets such as microarray-based expression profiles, protein interaction sets, and chromatin IP. In response to user feedback and new feature requests, we have released a new Reactome Pathway Browser with an integrated suite of tools for pathway analysis. Using these improved features, you can map protein lists to Reactome pathways, perform pathway overrepresentation analysis for a set of genes, colourize pathway diagrams with gene expression data, and compare model organism and human pathways. To support third-party tool integration, the Reactome Pathway Analysis Portal is also available via RESTful web services. Further details about the new pathway analysis tool can be found in our User Guide.…[see more details and contact info at the mailing list page]

Mapping gene and protein lists to pathways is a frequently-requested feature in pretty much every workshop we give–so have a look and see if it would help you to manage lists and do some discovery on them.
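If you’re wondering what “pathway overrepresentation analysis” boils down to, here is a minimal sketch of one common way to do it, a one-sided hypergeometric test. The announcement doesn’t say which statistic Reactome uses, so treat this as an illustration of the general idea rather than their method. The pathway names, memberships, and universe size are invented toy values.

```python
from math import comb


def hypergeom_pvalue(universe_size, pathway_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a universe in which pathway_size genes belong to the pathway."""
    max_overlap = min(pathway_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def overrepresentation(gene_list, pathways, universe_size):
    """Return (pathway name, overlap count, p-value) tuples, smallest p first."""
    genes = set(gene_list)
    results = []
    for name, members in pathways.items():
        overlap = len(genes & members)
        p = hypergeom_pvalue(universe_size, len(members), len(genes), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda row: row[2])


# Hypothetical toy data: these pathway definitions are made up for the example.
toy_pathways = {
    "DNA repair (toy)": {"BRCA1", "BRCA2", "RAD51", "ATM"},
    "Oxidative stress (toy)": {"SOD1", "SOD2", "CAT", "GPX1"},
}
my_genes = ["BRCA2", "RAD51", "ATM", "SOD1"]

for name, overlap, p in overrepresentation(my_genes, toy_pathways, universe_size=100):
    print(f"{name}: overlap={overlap}, p={p:.4g}")
```

With real data you would also want a multiple-testing correction across pathways, which this sketch leaves out.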

Quick link: http://www.reactome.org/PathwayBrowser/

Video Tip of the Week: New UCSC “stacked” wiggle track view

This week’s video tip shows you a new way to look at the multiWig track data at the UCSC Genome Browser. A new option has recently been released (see 06 May 2014), a “stacked” view, and it’s a handy way to look at the data with a new strategy. But I’ll admit it took me a little while of working with it to understand the details. So in this tip I hope you’ll see what the new visualization offers.

I won’t go into the background on the many types of annotation tracks available. If you need to be introduced to the idea of the basic track views, start out with our introduction tutorial that touches on the different types of graphical representations. Custom tracks are touched on in the advanced tutorial. For guidance specifically on how to create the different track types, see the UCSC documentation. The type of track I’m illustrating in the video today, a MultiWig track, has its own section over there too. Basically, if you are completely new to this, the “wiggle” style is a way to show a histogram display across a region. MultiWig lets you overlay several of these histograms in one space. In the example I’ll show here, the results of looking at 7 different cell lines are shown for some histone mark signals (Layered H3K27Ac track).

Annotation track cell lines

When I saw the announcement, I thought this was a good way to show all of the data simultaneously. When we do basic workshops, we don’t always have time to go into the details of this view, although we do explore it in the ENCODE material, because the track I’m using is one of the ENCODE data sets. I’ll use the same track in the same region as the announcement, which is shown here:

[Image: stack announcement]

But when I first looked at this, I wasn’t sure if the peak (focus on the pink peak that represents the NHLF cell line) was meant to cover the whole area underneath or not. What I was trying to figure out is essentially this (a graphical representation of my thought process follows):

[Image: stackedMultiWig_screenshot_v2]

By trying out the various styles I was pretty sure I had the idea of what was really being shown, but I confirmed that with one of the track developers. The value is only the pink band segment, not the whole area below it. And Matthew also noted to me that they are sorting the tracks in reverse alphabetical order (so NHLF is the highest in the stack). That was an aspect I hadn’t realized yet. They are not sorting based on the values at that spot. This makes sense, of course, but it wasn’t obvious to me at first.
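To make that concrete, here is a small sketch of how I read the stacked display: each cell line’s value is the height of its own band, and the band simply sits on top of the ones below it. The signal numbers are invented, the GM12878 and K562 names are only there as example companions to NHLF, and stacking from the bottom in alphabetical order (so the stack reads reverse alphabetical from the top down) is my interpretation of the ordering described above, not something taken from the browser code.

```python
def stack_bands(values):
    """Map each track name to the (bottom, top) of its band in a stacked display.

    Tracks are stacked from the bottom in alphabetical order, so reading from
    the top down gives reverse alphabetical order (an assumed interpretation).
    """
    bands = {}
    bottom = 0.0
    for name in sorted(values):
        top = bottom + values[name]
        bands[name] = (bottom, top)
        bottom = top
    return bands


# Hypothetical signal values at a single position; the numbers are invented.
signal = {"GM12878": 12.0, "K562": 5.0, "NHLF": 30.0}

bands = stack_bands(signal)
for name in sorted(bands, reverse=True):  # print from the top of the stack down
    bottom, top = bands[name]
    print(f"{name}: band {bottom:g} to {top:g}, value {top - bottom:g}")
```

In this toy case the NHLF band tops out at 47 on the axis, but its value is only 30: the band itself, not everything underneath it.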

I like this option very much–but I figured if I had to do some noodling on what it actually meant others might have the same questions.

In the video I’ll show you how this segment looks with the different “Overlay method” settings on that track page. I’ll be looking at the SOD1 area, like the announcement example.  I tweaked a couple of the other settings from the defaults so it would be easier to see on the video (see arrowheads for my changes). But I hope this conveys the options you have now to look at this type of track data effectively.

[Image: Track settings for video]

So here is the video with the SOD1 5′ region in the center, using the 4 different choices of overlay method, illustrating the histone mark data in the 7 cell lines. I’m not going into the details of the data here, but I’ll point you to a reference associated with this work for more on how it’s done (see the Bernstein lab paper below). I just wanted to demonstrate this new type of viewing option that will be available on wiggle tracks. Some tracks will have too much data for one type or another, or will be clearer with one or another style. But now you have an additional way to consider it.

Quick links:

UCSC Genome Browser: genome.ucsc.edu

UCSC Intro tutorial: http://openhelix.com/ucscintro

UCSC Advanced tutorial: http://openhelix.com/ucscadv

These tutorials are freely available because UCSC sponsors us to do training and outreach on the UCSC Genome Browser.

References:

Kent W.J., Zweig A.S., Barber G., Hinrichs A.S. & Karolchik D. (2010). BigWig and BigBed: enabling browsing of large distributed datasets., Bioinformatics (Oxford, England), PMID:

Karolchik D., Barber G.P., Casper J., Clawson H., Cline M.S., Diekhans M., Dreszer T.R., Fujita P.A., Guruvadoo L., Haeussler M. et al. (2013). The UCSC Genome Browser database: 2014 update., Nucleic acids research, PMID:

Ram O., Goren A., Amit I., Shoresh N., Yosef N., Ernst J., Kellis M., Gymrek M., Issner R., Coyne M. et al. Combinatorial patterning of chromatin regulators uncovered by genome-wide location analysis in human cells., Cell, PMID:

The ENCODE Project Consortium, Bernstein B.E., Birney E., Dunham I., Green E.D., Gunter C. & Snyder M. et al. (2012). An integrated encyclopedia of DNA elements in the human genome., Nature, 489 PMID:

Also see the Nature special issue on ENCODE data, especially the chromatin accessibility and histone modification subset (section 02): http://www.nature.com/encode/

BioMart news, and a shiny new look

Just got the news via the mailing list; I haven’t had a chance to kick the tires yet:

We are pleased to announce the release of BioMart version 0.9.

The latest version of BioMart includes support for data analysis and visualisation tools. The first of the BioMart tools has already been implemented and is accessible from www.biomart.org. This tool enables enrichment analysis of genes in all Ensembl species and a broad range of gene identifiers for each species are also available. Furthermore, the tool supports cross-species analysis using Ensembl homology data. Finally, the enrichment tool facilitates analysis of BED files containing genomic features such as Copy Number Variations (CNVs) or Differentially Methylated Regions (DMRs).

The latest BioMart release comes with the new version of the REST and SOAP APIs. These APIs are available for testing at central.biomart.org. Third party developers who are currently using REST or SOAP version 0.7 are encouraged to start testing and transitioning to 0.9. The two servers providing access to BioMart data through REST and SOAP (version 0.7 and version 0.9) will be running in parallel to provide support for easy transition. The Enrichment tool is also accessible programmatically through 0.9 REST/SOAP interface.

Finally, the BioMart website has been completely redesigned to cater for a better user experience. The re-organised layout, incorporation of new functionality, such as the “quick tool access” and the use of subtle animation makes for clearer navigation and greater site interactivity.

Your feedback is welcome and appreciated.

On behalf of the BioMart developers

Arek

Check it out: www.biomart.org

Heartbleed security issues, we’re ok

We’ve been tracking the concerns about the Heartbleed security issues, as has everyone with an internet login anywhere. And the actual depth of the issue continues to be discussed and disputed. Also XKCD:

Most people who read our blog, or access the free materials, haven’t had to register anyway so there wasn’t an issue with those. But we have checked with our development team to see if we were affected by the security flaw for our registered users.

We are told that we are unaffected by this vulnerability on our registration-accessible pages. So although it is always wise to change passwords from time to time, we won’t be requiring that for our registered users. Feel free to do so, though, if you want to. Let us know if you have any problems with that.

A fix was implemented for the Google Wallet checkout feature that some people might have used, and it’s already in place.

Safe travels around the ‘tubz.

New UCSC Genome Browser for the newest human genome assembly

Most folks who read this blog will be aware that a new human genome assembly has been completed, released, and is available for anyone to obtain. One of my favorite overviews of that new version can be found in this readable piece at Bio-IT World: Deanna Church on the Reference Genome Past, Present and Future. That should give you an idea of some of the context and the changes that you might encounter when you begin to work with the new version.

The folks who use genome assemblies in their software will be updating over time. It can take a while for all of the features you want to be mapped to the new assembly, and this will vary by project. At the end of last week, though, we were notified on the UCSC Announcement mailing list that there is a preliminary browser available with the hg38 assembly. Here’s a quick look at that, with a couple of the key features highlighted:

[Image: ucsc_newHG38_annotated_sm]

Note that calling it hg38 is a big change–we had been on hg19–but now to coordinate with the system of the GRC (Genome Reference Consortium) those numbers will match. And as this is a preliminary browser, you’ll see that there aren’t many annotation tracks available yet. For many things you’ll still want to use the hg19 assembly. The annotation tracks you need will be added as soon as possible. As the announcement notes:

There’s much more to come! This initial release of the hg38 Genome Browser provides a rudimentary set of annotations. Many of our annotations rely on data sets from external contributors (such as our popular SNPs tracks) or require massive computational effort (our comparative genomics tracks). In the upcoming months/years, we will release many more annotation tracks as they become available. To stay abreast of new datasets, join our genome-announce mailing list or follow us on twitter [@GenomeBrowser].

There are a number of other important changes too, which aren’t obvious from the interface. You should have a look at the full announcement email text to understand the impacts. There are aspects of not only the naming convention, but alternate sequences, centromere representation, mitochondrial genome sequence, sequence updates to fix previous erroneous bases and misassembled regions, and other aspects that could affect your work. Then go kick the tires!

You may also want to have a look at the publication in the NAR Database issue that describes other features that may have been updated since the last time you were diving into a new assembly. There are more species–alligators?!–and more types of tracks than you might be aware of if you just rely on the same stuff most of the time. There’s also the cool hub tools now that provide new ways to load up your own project data. Go forth and discover.

Reference:

Karolchik D., Barber G.P., Casper J., Clawson H., Cline M.S., Diekhans M., Dreszer T.R., Fujita P.A., Guruvadoo L., Haeussler M. et al. (2013). The UCSC Genome Browser database: 2014 update, Nucleic Acids Research, 42 (D1) D764-D770. DOI:

“We BLATted the Internet!”

Best sentence I’ve seen today. Heh.

[Image: blat]

I’d be interested in the answer to Laura’s question too!

Here’s more detail from the “announcement” mailing list:

All the DNA on the internet now at your fingertips!

Hello everyone!

We’re pleased to announce the release of the Web Sequences track on the UCSC Genome Browser. This track, produced in collaboration with Microsoft Research, contains the results of a 30-day scan for DNA sequences from over 40 billion different webpages. The sequences were then mapped with Blat to the human genome (hg19) and numerous other species including mouse (mm9), rat (rn4), and zebrafish (danRer7). The data were extracted from a variety of sources including patents, online textbooks, help forums, and any other webpages that contain DNA sequence. In essence, this track displays the Blat alignments of nearly every DNA sequence on the internet! The Web Sequences track description page contains more details on how the track was generated.

We would like to acknowledge Max Haeussler and Matt Speir from the UCSC Genome Browser staff and Bob Davidson from Microsoft Research for their hard work in creating this track.


Matthew Speir
UCSC Genome Bioinformatics Group

If you are looking for the track, it’s in the Phenotype and Literature section in human:

[Image: web_seqs_note]

I took a quick look and it’s definitely a mixed bag: patents and homework sites, and journals and such. But I think it will be interesting to see what turns up.

Edit: some other finds–lots of non-English pages, so I can’t tell what they are. I have seen Japanese, Chinese, and Korean so far. Saw a link to Fark.com (heh). Slideshare. Some pages are borked and don’t load. Some require logins (medscape). Could be a good source of PDFs that you can’t get elsewhere (*cough*).

Video Tip of the Week: UCSC Track Hubs

Over the years we’ve seen some real shifts in the needs of the trainees in our UCSC Genome Browser workshops. At first, people just needed access to the reference genome and the data that was available (and boy, has that changed over the years–time travel back with this post!). But as researchers around the world had access to more and bigger data sets that they were generating themselves, they kept asking for ways to load up their own data into the browser framework to explore their data with more context and with existing tools like the Table Browser for more queries.

We relayed this back to the UCSC team, and we know they were hearing it from other sources too. And besides the initial basic custom tracks that had been available, more ways to load up bigger data sets (the big-formats) and related tracks (super-tracks) became options. The biggest change, though, came with the Track Hubs. Now you aren’t just loading up a couple of tracks–you can load up complex collections with the hub framework. Check out the existing ones by going to the track hubs button from the Gateway:

Locate the hubs from the Gateway page.

You can use the public hubs to get a sense of what they can do–adding a lot of new info to a genome with new source data, new technologies, various special projects, etc. You can create your own Track Hubs to explore and query within existing assemblies. Or you can create entire new assemblies that UCSC doesn’t already host with Assembly Hubs. I touch on the basics of this in this week’s video tip.

But this was just a taste–click through to the documentation and wiki pages to learn more about the specifics of using these tools.

Hubs have been available for a little while and some early adopters have tried them out–like Pierre Lindenbaum did here–but they were the subject of much praise and chatter recently with the publication of a paper with more detail for people new to the hub functions. And a separate paper gave more detail on extending Assembly Hubs–that let you load up entirely new comparative genome data. You can create “snake tracks” that help you visualize comparisons, inversions, duplications, etc. And you don’t have to create a local mirror for these things.

Now, there may be some times where a local mirror is still your best bet. A lot of the places we talked to researchers were hospital research settings. If you have patient data that shouldn’t go outside of your firewall, for example, you may still want everything in-house. But for a lot of projects the Track Hubs can take care of the overhead for you instead.

You can use the hubs for your own work. Or if you have a collection of data that you want to offer to the wider community, you can submit a request to the UCSC folks and possibly get it onto the public hubs page.
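For a sense of how little is needed to get started, here is a rough sketch that writes the three plain-text files a simple hub is built from (hub.txt, genomes.txt, and a trackDb.txt). The hub name, labels, email, and data URL are all placeholders I made up, and the settings shown are only a small subset, so check the Track Data Hubs documentation for the current required fields before relying on this.

```python
from pathlib import Path
import tempfile


def stanza(settings):
    """Format one stanza as 'key value' lines, one setting per line."""
    return "".join(f"{key} {value}\n" for key, value in settings.items())


def write_hub(root, genome="hg19"):
    """Write a minimal three-file hub layout under root and return the paths."""
    root = Path(root)
    (root / genome).mkdir(parents=True, exist_ok=True)

    files = {
        root / "hub.txt": stanza({
            "hub": "myExampleHub",                 # hypothetical hub name
            "shortLabel": "Example Hub",
            "longLabel": "Example hub with one signal track",
            "genomesFile": "genomes.txt",
            "email": "someone@example.org",        # placeholder address
        }),
        root / "genomes.txt": stanza({
            "genome": genome,
            "trackDb": f"{genome}/trackDb.txt",
        }),
        root / genome / "trackDb.txt": stanza({
            "track": "exampleSignal",              # hypothetical track name
            "bigDataUrl": "http://example.org/data/exampleSignal.bw",  # placeholder URL
            "shortLabel": "Example signal",
            "longLabel": "Example bigWig signal track",
            "type": "bigWig",
        }),
    }
    for path, text in files.items():
        path.write_text(text)
    return sorted(files)


if __name__ == "__main__":
    # Write into a throwaway directory and show what was produced.
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_hub(tmp):
            print(path.relative_to(tmp))
            print(path.read_text())
```

The files then need to sit somewhere web-accessible, along with the bigWig file itself, so the browser can fetch them by URL.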

Check out the publications below for a grasp of the foundations–and also see the new directions that are being explored. Hubs are actually more flexible than just layering on new annotations, and you can read more about that too.

If you aren’t familiar with the UCSC Genome Browser basics you need to really understand how annotation tracks work, be sure to see the freely available materials that UCSC sponsors us to provide: Introduction and Advanced tutorials.

Quick links:

Track Data Hubs documentation: http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html

Assembly Hubs wiki page: http://genomewiki.ucsc.edu/index.php/Assembly_Hubs

References:
Raney B.J., Dreszer T.R., Barber G.P., Clawson H., Fujita P.A., Wang T., Nguyen N., Paten B., Zweig A.S. & Karolchik D.  (2013). Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser, Bioinformatics, DOI:

Karolchik D., Barber G.P., Casper J., Clawson H., Cline M.S., Diekhans M., Dreszer T.R., Fujita P.A., Guruvadoo L. & Haeussler M.  (2013). The UCSC Genome Browser database: 2014 update, Nucleic Acids Research, DOI:

Nguyen N., Hickey G., Raney B.J., Armstrong J., Clawson H., Zweig A., Kent J., Haussler D., Paten B. (2013). Comparative Assembly Hubs: Web Accessible Browsers for Comparative Genomics. arXiv:1311.1241v1 [q-bio.GN]

Antique (or maybe “classic”) UCSC Genome Browsers retired

Everyone is pretty comfortable with the concept of the non-standard but commonly used dog-years as a way to compare life spans to humans. Car enthusiasts have a taxonomy for the ages of vehicles. But I’ve been sitting here wondering what the genome-browser-years scale should be. I’ve been thinking about it because of the recent announcement over the UCSC list the other day:

Change in visualization access for old assemblies

Hello everyone!

Over the past 12 years we have made efforts to maintain visualization of many old, archived assemblies on our Archive server at http://genome-archive.cse.ucsc.edu/, in addition to providing download access to the associated data sets. Unfortunately, this visualization is no longer sustainable for very old assemblies due to the many changes in the Genome Browser software as it has matured. We are therefore reducing access to certain old assemblies to data downloads only, and are announcing the shutdown of our Archive server. We will continue to provide Genome Browser access for the 4 most current human assemblies, and at least the 2 most current assemblies for all other organisms with some exceptions. We will discontinue our visualization support for all other old assemblies, but will continue to make these data sets available on our download servers. The assemblies currently on our archive server for which we have discontinued visualization support include early human assembly drafts, hg4, hg5, hg6, hg7, hg8, hg10, hg11, hg12, hg13, hg15, rn1, rn2, mm1, mm2, mm3, mm4, mm5, rheMac1, bosTau1, ce1, danRer1, and danRer2. Links to the data and annotations associated with these assemblies have been added in the appropriate places on our Downloads page at http://hgdownload.soe.ucsc.edu/. Please contact us if you have difficulty locating a data set of interest.

Matthew Speir
UCSC Genome Bioinformatics Group

A view of the UCSC Genome Browser in the early days, ~2004.

I’m sure the browser versions aren’t used very much anymore, but it was something I needed to be aware of. In our workshops I mention that the older versions have been available from the “archives” navigation on the landing page, but that will be gone now. Occasionally there are old papers that reference a genomic span that you want to revisit–but that’s becoming less common for those really old assemblies at this point. The data will persist for downloading, but the browser visuals will be gone. But it made me want to go back and look through my old materials to see what the early browsers looked like (click on the image to embiggen). A lot of the foundational structure is the same, but if you look at one of these old assemblies you might be surprised.  A lot fewer tracks, that’s for sure. In my shot, there are only a few species in the Conservation track (human, chimp, mouse, rat, chicken). We just didn’t have that much data available–not just across species, but other types of techniques and tools. Fewer tracks. Fewer functionality buttons.

This was almost as much fun as looking back at the old NCBI interfaces that I remember from way back. Those of you who have been in this rodeo for a while may remember those. Some of you will even remember the key “Pedro’s Tools” from back in the day. Sometimes it’s worth looking back at where we came from to realize how much further we are than we realized. I know there’s a lot of grousing about not having cured cancer and changed the pharmaceutical industry with the human genome sequence yet–but we really haven’t had the data that long in browser-years. Or maybe it’s more like Mars years–longer than you realize, despite the fairly comparable span of a single day. The arc of science is long, but it bends toward answers.

Edit: I realized I should look at the earliest paper and add that reference below. Check out Figure 1 for an even older view of the data.

Reference:

Kent W.J., Sugnet C.W., Furey T.S., Roskin K.M., Pringle T.H., Zahler A.M. & Haussler D. (2002). The Human Genome Browser at UCSC, Genome Res., 12 996-1006. DOI: