Category Archives: Genomics Resource News

Heartbleed security issues, we’re ok

We’ve been tracking the concerns about the Heartbleed security issue, as has everyone with an internet login anywhere. The actual depth of the issue continues to be discussed and disputed.

Most people who read our blog, or access the free materials, haven’t had to register anyway so there wasn’t an issue with those. But we have checked with our development team to see if we were affected by the security flaw for our registered users.

We are told that we are unaffected by this vulnerability on our registration-accessible pages. So although it is always wise to change passwords from time to time, we won’t be requiring that for our registered users. Feel free to do so if you want to, though, and let us know if you have any problems with that.

A fix was implemented for the Google Wallet checkout feature that some people might have used, and it’s already in place.

Safe travels around the ‘tubz.

New UCSC Genome Browser for the newest human genome assembly

Most folks who read this blog will be aware that a new human genome assembly has been completed, released, and is available for anyone to obtain. One of my favorite overviews of that new version can be found in this readable piece at Bio-IT World: Deanna Church on the Reference Genome Past, Present and Future. That should give you an idea of some of the context and the changes that you might encounter when you begin to work with the new version.

The folks who use genome assemblies in their software will be updating over time. It can take a while for all of the features you want to be mapped to the new assembly, and this will vary by project. At the end of last week, though, we were notified on the UCSC Announcement mailing list that there is a preliminary browser available with the hg38 assembly. Here’s a quick look at that, with a couple of the key features highlighted:

[Image: annotated screenshot of the preliminary hg38 Genome Browser]

Note that calling it hg38 is a big change. We had been on hg19, but UCSC skipped ahead so that its numbering now matches that of the GRC (Genome Reference Consortium), whose name for this assembly is GRCh38. And as this is a preliminary browser, you’ll see that there aren’t many annotation tracks available yet. For many things you’ll still want to use the hg19 assembly. The annotation tracks you need will be added as soon as possible. As the announcement notes:

There’s much more to come! This initial release of the hg38 Genome Browser provides a rudimentary set of annotations. Many of our annotations rely on data sets from external contributors (such as our popular SNPs tracks) or require massive computational effort (our comparative genomics tracks). In the upcoming months/years, we will release many more annotation tracks as they become available. To stay abreast of new datasets, join our genome-announce mailing list or follow us on twitter [@GenomeBrowser].

There are a number of other important changes too, which aren’t obvious from the interface. You should have a look at the full announcement email text to understand the impacts. There are aspects of not only the naming convention, but alternate sequences, centromere representation, mitochondrial genome sequence, sequence updates to fix previous erroneous bases and misassembled regions, and other aspects that could affect your work. Then go kick the tires!

You may also want to have a look at the publication in the NAR Database issue that describes other features that may have been updated since the last time you were diving into a new assembly. There are more species–alligators?!–and more types of tracks than you might be aware of if you just rely on the same stuff most of the time. There’s also the cool hub tools now that provide new ways to load up your own project data. Go forth and discover.

Reference:

Karolchik D., Barber G.P., Casper J., Clawson H., Cline M.S., Diekhans M., Dreszer T.R., Fujita P.A., Guruvadoo L. & Haeussler M. (2013). The UCSC Genome Browser database: 2014 update, Nucleic Acids Research, 42 (D1) D764-D770.

“We BLATted the Internet!”

Best sentence I’ve seen today. Heh.

[Image: blat]

I’d be interested in the answer to Laura’s question too!

Here’s more detail from the “announcement” mailing list:

All the DNA on the internet now at your fingertips!

Hello everyone!

We’re pleased to announce the release of the Web Sequences track on the UCSC Genome Browser. This track, produced in collaboration with Microsoft Research, contains the results of a 30-day scan for DNA sequences from over 40 billion different webpages. The sequences were then mapped with Blat to the human genome (hg19) and numerous other species including mouse (mm9), rat (rn4), and zebrafish (danRer7). The data were extracted from a variety of sources including patents, online textbooks, help forums, and any other webpages that contain DNA sequence. In essence, this track displays the Blat alignments of nearly every DNA sequence on the internet! The Web Sequences track description page contains more details on how the track was generated.

We would like to acknowledge Max Haeussler and Matt Speir from the UCSC Genome Browser staff and Bob Davidson from Microsoft Research for their hard work in creating this track.


Matthew Speir
UCSC Genome Bioinformatics Group
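
To make the idea a bit more concrete, here is a toy sketch of pulling DNA-like strings out of page text. It is not the pipeline that UCSC and Microsoft Research used (the track description page covers that). The 20-base minimum and the function name are my own assumptions, chosen only for illustration.

```python
import re

# Runs of A/C/G/T (either case) at least 20 bases long. The cutoff is an
# arbitrary illustrative choice, not a value taken from the announcement.
DNA_RUN = re.compile(r"[ACGTacgt]{20,}")


def find_dna_like(text):
    """Return upper-cased DNA-like runs found in a blob of page text."""
    return [match.group(0).upper() for match in DNA_RUN.finditer(text)]
```

Anything this finds would still need to be aligned (with Blat, in the real track) before it means much; short primers and ordinary words fall below the cutoff.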

If you are looking for the track, it’s in the Phenotype and Literature section in human:

[Image: web_seqs_note]

I took a quick look and it’s definitely a mixed bag: patents and homework sites, and journals and such. But I think it will be interesting to see what turns up.

Edit: some other finds–lots of non-English pages, so I can’t tell what they are. I have seen Japanese, Chinese, and Korean so far. Saw a link to Fark.com (heh). Slideshare. Some pages are borked and don’t load. Some require logins (medscape). Could be a good source of PDFs that you can’t get elsewhere (*cough*).

Video Tip of the Week: UCSC Track Hubs

Over the years we’ve seen some real shifts in the needs of the trainees in our UCSC Genome Browser workshops. At first, people just needed access to the reference genome and the data that was available (and boy, has that changed over the years–time travel back with this post!). But as researchers around the world had access to more and bigger data sets that they were generating themselves, they kept asking for ways to load up their own data into the browser framework to explore their data with more context and with existing tools like the Table Browser for more queries.

We relayed this back to the UCSC team, and we know they were hearing it from other sources too. And besides the initial basic custom tracks that had been available, more ways to load up bigger data sets (the big-formats) and related tracks (super-tracks) became options. The biggest change, though, came with the Track Hubs. Now you aren’t just loading up a couple of tracks–you can load up complex collections with the hub framework. Check out the existing ones by going to the track hubs button from the Gateway:

Locate the hubs from the Gateway page.


You can use the public hubs to get a sense of what they can do–adding a lot of new info to a genome with new source data, new technologies, various special projects, etc. You can create your own Track Hubs to explore and query within existing assemblies. Or you can create entire new assemblies that UCSC doesn’t already host with Assembly Hubs. I touch on the basics of this in this week’s video tip.

But this was just a taste–click through to the documentation and wiki pages to learn more about the specifics of using these tools.
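
As a rough sketch of what a minimal hub involves: as I understand the Track Data Hubs documentation, a hub is a few small text files (hub.txt, genomes.txt, and a trackDb.txt per assembly) sitting on a web server alongside your big-format data files. The Python below just writes those files. The hub name, labels, email address, and bigWig URL are all placeholders I made up, so check the field names against the documentation before relying on this.

```python
from pathlib import Path


def write_hub(hub_dir, genome="hg19"):
    """Write a minimal three-file track hub layout under hub_dir.

    All names, labels, and URLs below are hypothetical placeholders.
    """
    hub_dir = Path(hub_dir)
    genome_dir = hub_dir / genome
    genome_dir.mkdir(parents=True, exist_ok=True)

    # hub.txt: describes the hub and points at the genomes file
    (hub_dir / "hub.txt").write_text(
        "hub myExampleHub\n"
        "shortLabel Example Hub\n"
        "longLabel Example hub with a single signal track\n"
        "genomesFile genomes.txt\n"
        "email someone@example.org\n"
    )

    # genomes.txt: one stanza per assembly, pointing at its trackDb file
    (hub_dir / "genomes.txt").write_text(
        f"genome {genome}\n"
        f"trackDb {genome}/trackDb.txt\n"
    )

    # trackDb.txt: one stanza per track; bigDataUrl says where the data lives
    (genome_dir / "trackDb.txt").write_text(
        "track exampleSignal\n"
        "bigDataUrl http://example.org/data/exampleSignal.bw\n"
        "shortLabel Example signal\n"
        "longLabel Example bigWig signal track\n"
        "type bigWig\n"
    )
    return hub_dir
```

Calling `write_hub("myHub")` gives you a directory you could copy to a web server; the browser would then be pointed at the URL of hub.txt.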

Hubs have been available for a little while and some early adopters have tried them out–like Pierre Lindenbaum did here–but they were the subject of much praise and chatter recently with the publication of a paper with more detail for people new to the hub functions. And a separate paper gave more detail on extending Assembly Hubs–that let you load up entirely new comparative genome data. You can create “snake tracks” that help you visualize comparisons, inversions, duplications, etc. And you don’t have to create a local mirror for these things.

Now, there may be some times where a local mirror is still your best bet. A lot of the places we talked to researchers were hospital research settings. If you have patient data that shouldn’t go outside of your firewall, for example, you may still want everything in-house. But for a lot of projects the Track Hubs can take care of the overhead for you instead.

You can use the hubs for your own work. Or if you have a collection of data that you want to offer to the wider community, you can ask the UCSC folks about getting it listed on the public hubs page.

Check out the publications below for a grasp of the foundations–and also see the new directions that are being explored. Hubs are actually more flexible than just layering on new annotations, and you can read more about that too.

If you aren’t familiar with the UCSC Genome Browser basics that you need to really understand how annotation tracks work, be sure to see the freely available materials that UCSC sponsors us to provide: Introduction and Advanced tutorials.

Quick links:

Track Data Hubs documentation: http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html

Assembly Hubs wiki page: http://genomewiki.ucsc.edu/index.php/Assembly_Hubs

References:
Raney B.J., Dreszer T.R., Barber G.P., Clawson H., Fujita P.A., Wang T., Nguyen N., Paten B., Zweig A.S. & Karolchik D. (2013). Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser, Bioinformatics.

Karolchik D., Barber G.P., Casper J., Clawson H., Cline M.S., Diekhans M., Dreszer T.R., Fujita P.A., Guruvadoo L. & Haeussler M. (2013). The UCSC Genome Browser database: 2014 update, Nucleic Acids Research.

Nguyen N., Hickey G., Raney B.J., Armstrong J., Clawson H., Zweig A., Kent J., Haussler D., Paten B. (2013). Comparative Assembly Hubs: Web Accessible Browsers for Comparative Genomics. arXiv:1311.1241v1 [q-bio.GN]

Antique (or maybe “classic”) UCSC Genome Browsers retired

Everyone is pretty comfortable with the concept of the non-standard but commonly used dog-years as a way to compare life spans to humans. Car enthusiasts have a taxonomy for the ages of vehicles. But I’ve been sitting here wondering what the genome-browser-years scale should be. I’ve been thinking about it because of the recent announcement over the UCSC list the other day:

Change in visualization access for old assemblies

Hello everyone!

Over the past 12 years we have made efforts to maintain visualization of many old, archived assemblies on our Archive server at http://genome-archive.cse.ucsc.edu/, in addition to providing download access to the associated data sets. Unfortunately, this visualization is no longer sustainable for very old assemblies due to the many changes in the Genome Browser software as it has matured. We are therefore reducing access to certain old assemblies to data downloads only, and are announcing the shutdown of our Archive server. We will continue to provide Genome Browser access for the 4 most current human assemblies, and at least the 2 most current assemblies for all other organisms with some exceptions. We will discontinue our visualization support for all other old assemblies, but will continue to make these data sets available on our download servers. The assemblies currently on our archive server for which we have discontinued visualization support include early human assembly drafts, hg4, hg5, hg6, hg7, hg8, hg10, hg11, hg12, hg13, hg15, rn1, rn2, mm1, mm2, mm3, mm4, mm5, rheMac1, bosTau1, ce1, danRer1, and danRer2. Links to the data and annotations associated with these assemblies have been added in the appropriate places on our Downloads page at http://hgdownload.soe.ucsc.edu/. Please contact us if you have difficulty locating a data set of interest.

Matthew Speir
UCSC Genome Bioinformatics Group

A view of the UCSC Genome Browser in the early days, ~2004.


I’m sure the browser versions aren’t used very much anymore, but it was something I needed to be aware of. In our workshops I mention that the older versions have been available from the “archives” navigation on the landing page, but that will be gone now. Occasionally there are old papers that reference a genomic span that you want to revisit–but that’s becoming less common for those really old assemblies at this point. The data will persist for downloading, but the browser visuals will be gone. But it made me want to go back and look through my old materials to see what the early browsers looked like (click on the image to embiggen). A lot of the foundational structure is the same, but if you look at one of these old assemblies you might be surprised.  A lot fewer tracks, that’s for sure. In my shot, there are only a few species in the Conservation track (human, chimp, mouse, rat, chicken). We just didn’t have that much data available–not just across species, but other types of techniques and tools. Fewer tracks. Fewer functionality buttons.

This was almost as much fun as looking back at the old NCBI interfaces that I remember from way back. Those of you who have been in this rodeo for a while may remember those. Some of you will even remember the key “Pedro’s Tools” from back in the day. Sometimes it’s worth looking back at where we came from to realize how much further we are than we realized. I know there’s a lot of grousing about not having cured cancer and changed the pharmaceutical industry with the human genome sequence yet–but we really haven’t had the data that long in browser-years. Or maybe it’s more like Mars years–longer than you realize, despite the fairly comparable span of a single day. The arc of science is long, but it bends toward answers.

Edit: I realized I should look at the earliest paper and add that reference below. Check out Figure 1 for an even older view of the data.

Reference:

Kent W.J., Sugnet C.W., Furey T.S., Roskin K.M., Pringle T.H., Zahler A.M. & Haussler D. (2002). The Human Genome Browser at UCSC, Genome Res., 12 996-1006.

UCSC Genome Browser: updated slides + exercises

Just a quick note to let you know that there are new slides up on the website.

The Introduction to the UCSC Genome Browser tutorial suite has new documents to match the current site. This summer there was an update to the UCSC Genome Browser’s known gene tracks, which altered the UCSC ID values, and the example item that we use for the workshop changed (from .2 to .3, so it wasn’t exactly a huge deal).

But I have also updated the shots on the slides to reflect the current default tracks.

The new slides + exercises documents are now available on the UCSC Introduction landing page. Click the buttons for access to the new documents. Version 22 is the new one.

Freely available because they are sponsored by the UCSC Genome Bioinformatics group, you can download these and use the slides in classrooms, workshops, or seminars to train others.

Quick links:

UCSC Intro tutorial suite landing page: http://www.openhelix.com/ucscintro

UCSC Genome Browser: http://genome.ucsc.edu

Video Tip of the Week: Mobile-device enabled tutorial suites

For a decade now we’ve been offering our video tutorial suites to help people learn how to use bioinformatics resources. We’ve used a couple of delivery platforms, and we’ve changed the website a few times. But we also know that people like consistency with software, and if there are going to be major changes to the behavior of something, there had better be a good reason.

We have a good reason. With the rise of mobile devices and the increasing use of them by students, our subscribers wanted us to make watching the tutorials on iPads and Androids and Surfaces more friendly. So we’re doing it.

This week’s video tip demonstrates the change to our tutorial movies that we’re rolling out. The basics are the same: each video offers details about how to use the software features at some database or tool site. We explain the display features and the search mechanisms. We offer the video along with the slides and some exercises. The only thing we’ve changed is the menu and controller options. The YouTube video here illustrates that.

Soon, when you launch a tutorial video, you will have to swipe over the edges to access the menus and the slider. You can still click individual chapters, or move ahead with the controller. But those items move out of the way when you aren’t using them.

Everything else is the same. The landing pages for each tutorial suite will still have the launch buttons for all the items you need to access everything.

For subscribers, all of the suites will have this new functionality. If your site doesn’t have a subscription, you can still try it out on our sponsored training suites, such as: GenoCAD, OMIM, UCSC Genome Browser, or anything else from the “free” tutorials page (http://openhelix.com/free).

To learn more about our philosophy of training materials, you can check out our paper (below). Regular readers may already understand what we do, but if you are accessing these for the first time it might help you to know more about what we offer and how we do it.

Let us know if you have any issue with the new interface and we’ll take a look right away.

Quick link:

Free tutorials to try out: http://openhelix.com/free

Reference:

Williams J.M., Mangan M.E., Perreault-Micale C., Lathe S., Sirohi N. & Lathe W.C. (2010). OpenHelix: bioinformatics education outside of a different box, Briefings in Bioinformatics, 11 (6) 598-609.

UCSC’s new Variant Annotation Integrator

In case you aren’t on the UCSC announcement mailing list, and you don’t go to the site via their homepage with the posted news–you should know about this new tool at the UCSC Genome Browser. It will take variations that you are exploring and make a prediction about whether the variant is associated with a function, and potentially if it is damaging to a protein. It’s under active development, so try it out. And if there are features you could use, suggest them. See the VAI page for more.

Here are the details via their email, but sign up for the “announce” mailing list if you’d like to get news like this in your inbox too:

[Link to the original at the mailing list site]

Hello all,

In order to assist researchers in annotating and prioritizing thousands
of variant calls from sequencing projects, we have developed the Variant
Annotation Integrator (VAI). Given a set of variants uploaded as a
custom track (in either pgSnp or VCF format), the VAI will return the
predicted functional effect (e.g., synonymous, missense, frameshift,
intronic) for each variant. The VAI can optionally add several other
types of relevant information, including: the dbSNP identifier if the
variant is found in dbSNP, protein damage scores for missense variants
from the Database of Non-synonymous Functional Predictions (dbNSFP), and
conservation scores computed from multi-species alignments. The VAI also
offers filters to help narrow down results to the most interesting variants.

Future releases of the VAI will include more input/upload options,
output formats, and annotation options, and a way to add information
from any track in the Genome Browser, including custom tracks.

There are two ways to navigate to the VAI: (1) From the “Tools” menu,
follow the “Variant Annotation Integrator” link. (2) After uploading a
custom track, hit the “go to variant annotation integrator” button. The
user’s guide is at the bottom of the page, under “Using the Variant
Annotation Integrator.”

As always, we welcome questions and feedback on our public mailing list:
genome@soe.ucsc.edu.


Brooke Rhead
UCSC Genome Bioinformatics Group
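
If you haven’t built a variant custom track before, here is a small sketch of formatting rows in pgSnp, one of the two input formats the announcement names. The column order (chrom, start, end, alleles joined by "/", allele count, per-allele frequencies, per-allele scores) is my understanding of the UCSC pgSnp format description, so verify it there before uploading. The helper names and the track line settings are illustrative assumptions.

```python
def pgsnp_line(chrom, start, end, alleles, counts, scores):
    """Format one tab-separated pgSnp row (0-based start, exclusive end)."""
    if not (len(alleles) == len(counts) == len(scores)):
        raise ValueError("alleles, counts, and scores must be the same length")
    return "\t".join([
        chrom,
        str(start),
        str(end),
        "/".join(alleles),                   # e.g. T/G
        str(len(alleles)),                   # number of alleles
        ",".join(str(c) for c in counts),    # one frequency/count per allele
        ",".join(str(s) for s in scores),    # one quality score per allele
    ])


def pgsnp_track(rows, name="my variants", db="hg19"):
    """Build custom track text: a track line followed by pgSnp rows."""
    header = f'track type=pgSnp name="{name}" db={db}'
    return "\n".join([header] + [pgsnp_line(*row) for row in rows]) + "\n"
```

The text returned by `pgsnp_track(...)` is what you would paste or upload on the custom tracks page before heading to the VAI.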

 

UCSC Genome Browser: New Euro server (affects custom tracks)

So there’s some news from the UCSC Genome Browser team. It’s great news for a better user experience–sharing the load on different continents. But the first thing I wondered about when I heard that was: what about sessions and custom tracks?

We tell people in trainings that they can share their sessions with colleagues around the world by sending a link. But now that link behaves a bit differently depending on where you are–I’m told that if I got a custom track session link from a European server user, I’d get directed to that one. So keep that in mind, and note that you might have to tweak things just a bit.

Whole announcement email from the UCSC announcement list:

The UCSC Genome Browser is pleased to announce the introduction
of a new mirror site to serve our users in Europe.  Genome-euro is an
official European mirror site of the UCSC Genome Browser, at
http://genome-euro.ucsc.edu. The server is physically located at the
Universität Bielefeld Center for Biotechnology in Bielefeld, Germany,
and is administered by UCSC.   Genome-euro is meant to be an alternate,
faster access point for those Browser users who are geographically
closer to central Europe than to the western United States.

All functionality will be the same as on the US server, although Custom
Tracks will not be transferred. Saved Sessions containing Custom Tracks
will require some manual intervention.

When European users navigate to the US server home page and click
the “Genomes” menu item, they will receive a notification that they have
been redirected to the more geographically-appropriate server. They will
have the option to remain on the US server, as  described in our
documentation.

The backup mirror in Aarhus, Denmark will continue to serve as an
emergency site in the event of the official sites in California and Germany
malfunctioning.

Thanks to Steve Heitner, Brooke Rhead, Galt Barber, Hiram Clawson,
Jorge Garcia and the rest of the Genome Browser staff for engineering
and testing.

We wish to express our special thanks to our colleagues at the
Universität Bielefeld Bioinformatics, especially Jens Stoye, for making
this possible.

Regards,

–b0b kuhn
ucsc genome bioinformatics group
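
One small tweak you can script is choosing which server a browser link opens on. This sketch only swaps the host name between the two addresses mentioned in the announcement; the function name is mine. It does not move custom tracks, which per the announcement are not transferred, so a session that includes them still needs the manual steps.

```python
from urllib.parse import urlsplit, urlunsplit

US_HOST = "genome.ucsc.edu"
EURO_HOST = "genome-euro.ucsc.edu"


def on_server(link, host):
    """Return link with its host replaced, leaving path and query untouched."""
    parts = urlsplit(link)
    return urlunsplit(parts._replace(netloc=host))
```

For example, `on_server(link, US_HOST)` turns a genome-euro link from a European colleague into the equivalent address on the US server.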

If you don’t know what I’m talking about with sessions and custom tracks, be sure to see our introductory and advanced tutorials that cover those aspects.

UCSC Genome Browser outage Wednesday June 26 (brief)

UCSC Announcements list mailing:

This Wednesday, June 26 at 4pm PDT (UTC-7), we will be performing a hardware upgrade on the Genome Browser. The Genome Browser itself, the Download server, and the BLAT servers will all be offline for no more than one hour. Thanks in advance for your patience while we work to make the Genome Browser more responsive to your needs.

Just a public service announcement. Every time it goes down (for either planned maintenance, or earthquakes) we see a big spike in searches for mirror sites.

So I’m warning you now, and you may want to have a mirror bookmarked for the future too:

Regular page that lists their mirrors: http://www.genome.ucsc.edu/mirror.html

http://genome.hmgc.mcw.edu/

http://genome-mirror.bscb.cornell.edu/

http://moma.ki.au.dk/genome-mirror/
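
If you would rather not hunt for a mirror by hand during an outage, something like the sketch below could pick the first address that responds. The list is simply the main site plus the mirrors above, which may well have changed since this was written. The function names, the 5-second timeout, and the "any non-error response counts as up" rule are my own assumptions.

```python
import urllib.request

MIRRORS = [
    "http://genome.ucsc.edu/",
    "http://genome.hmgc.mcw.edu/",
    "http://genome-mirror.bscb.cornell.edu/",
    "http://moma.ki.au.dk/genome-mirror/",
]


def responds(url, timeout=5):
    """True if the URL answers an HTTP request without an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses
        return False


def first_available(urls=MIRRORS, probe=responds):
    """Return the first URL for which probe(url) is true, else None."""
    for url in urls:
        if probe(url):
            return url
    return None
```

The `probe` argument is there so the selection logic can be tried out without touching the network.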