There’s great stuff in genome databases. And there’s great stuff in the literature. Sometimes there are clear links between the two–awesome curators work hard to source quality information, and sometimes automated processes can help. Of course, there’s also tons of stuff flowing into databases that’s not made it into the literature yet and may not ever in full, and that’s a whole ‘nother issue. But what if there was a way to put them together…. [Americans of a certain age will crack up at this]
A lot of people are working on mining the literature for useful information and making it more accessible in other ways, including adding context to a genomic region. The project I’m highlighting today does exactly that.
Last year Elsevier started a program that gives database and software providers, among others, access to its text corpus so they can build “apps” that add value to the literature. We talked about it here in the context of NCBI’s app. Using this mechanism, you can also add value to genome databases by linking to the literature directly. That’s what the new Publications track in the UCSC Genome Browser does.
You can learn more about the track details from the link to a Publications landing page. But there was also an announcement from the track’s lead developer, Max Haeussler, that came over the Biocurator mailing list and described some features. I’ll quote that here:
Look for it in the group “Mapping and Sequencing”, name “publications” on the UCSC genome browser for human and major model organisms (mouse, fly, zebrafish, etc). It currently contains data mined from around 3 million research articles, with sequences found in around 200k papers.
So millions of Elsevier papers and PubMed Central articles (with more sources likely to come) have been mined to find sequences. These sequences were “blatted”, that is, aligned with BLAT, against genome sequences, and the matches are shown on the UCSC Genome Browser. You can get more details about the strategy from the paper I’ve linked to below, and on the Publications track page.
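To make the "mine the text for sequences" step a bit more concrete, here is a minimal sketch in Python. It is not the track's actual pipeline (the paper linked below describes that); it simply assumes that a "sequence" is an unbroken run of at least 20 A/C/G/T letters. The 20-letter cutoff and the function names `extract_sequences` and `to_fasta` are my own choices for illustration.

```python
import re

# Assumption for illustration: a "sequence" is an uninterrupted run of
# A/C/G/T letters at least MIN_LEN long. The real pipeline's rules differ
# and are described in the paper.
MIN_LEN = 20
DNA_RUN = re.compile(r"[ACGTacgt]{%d,}" % MIN_LEN)


def extract_sequences(text):
    """Return the DNA-like strings found in text, uppercased."""
    return [m.group(0).upper() for m in DNA_RUN.finditer(text)]


def to_fasta(seqs, prefix="seq"):
    """Format sequences as FASTA records, the usual input for an aligner."""
    return "".join(
        ">%s%d\n%s\n" % (prefix, i, s) for i, s in enumerate(seqs, 1)
    )


# Made-up example text: one 25-letter primer and one short motif.
article = (
    "The forward primer was 5'-ATGCTGACCGTTAGCTAGGCTAACG-3' "
    "and the tag was GATTACA."
)
seqs = extract_sequences(article)
print(to_fasta(seqs))
```

The short `GATTACA` motif is skipped because it falls under the length cutoff. The FASTA text that comes out is the kind of input you would then hand to an aligner such as BLAT.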
What’s so cool about this is that you can look at your genomic regions of interest, and now see if others have come across this region. Some of them will be papers you know, of course–but there might be other papers you didn’t know that could bring new insights about that region.
Also in the Biocurator letter Max offered a sample region to look at–here’s the link for that. Click it to load up the example and look around:
Here is a link to the genome browser with the track activated and zoomed to the EGF gene:
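If you'd rather build a link like that yourself, here is a rough sketch. It relies on the browser's URL parameters `db`, `position`, and a track name set to a visibility mode. The assembly `hg19` is my assumption, the track name `pubs` is taken from the `g=pubs` parameter in the track details URL listed at the end of this post, and `browser_link` is just a hypothetical helper name. I'm also assuming that a gene symbol is accepted as a `position`; it may take you to a search results page first.

```python
from urllib.parse import urlencode

# Base address of the browser's main track display.
BASE = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_link(db, position, track, visibility="pack"):
    """Build a browser URL that opens `position` with `track` switched on.

    `track` and `visibility` end up as a `track=visibility` URL parameter.
    """
    params = {"db": db, "position": position, track: visibility}
    return BASE + "?" + urlencode(params)


# Assumed values: hg19 assembly, the EGF gene symbol, the "pubs" track.
print(browser_link("hg19", "EGF", "pubs"))
```

Swapping `"pack"` for `"dense"` or `"full"` changes how much room the track takes up on the page.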
In my video tip I’ll show how to get to that and the track details as well. If you need to learn more about the basic functions of the UCSC Genome Browser you can see the freely available sponsored training materials on that here: http://openhelix.com/ucsc
The other thing you can do is add the “app” at Elsevier to your personal set of apps if you have access to SciVerse. Then, when you find yourself in an article with sequence data, you’ll be able to click from there to go to UCSC. Jennifer described how to do that for our app, and the process would be similar if you wanted to add this UCSC app as well. You can find and add the UCSC app here, and while you’re at it you can add the OpenHelix app here. That one will mine the text for the databases and software that authors mention, and will link you to training so you can learn how to use resources like the UCSC Genome Browser.
Special note: Max and his team are eager for feedback on this new track, whether there’s something not quite right or there are other aspects you might want to see. He’s great on bug reports (I know, I sent some): takes them seriously and roots them out! And if you have any constructive thoughts I’m sure he’ll come by for a look. He has contact info on the Publications Track page as well. And he can correct anything I haven’t got quite right here in return for my bug hunting.
Editing note October 25, 2013: At the time of this video, the Publications track was located elsewhere, but it can now be found in its own “Literature” track group. Look for it between “Genes and Gene Predictions” and “mRNA and ESTs”.
Publications Track details page for human on UCSC Genome Browser: http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=278356441&c=chr4&g=pubs
UCSC Genome Matches app at SciVerse (add it to supplement your literature browsing): http://bit.ly/Netpka
UCSC Genome Browser Intro training: http://openhelix.com/ucsc
OpenHelix SciVerse App Description: http://bit.ly/xtGcco
Haeussler, M., Gerner, M., & Bergman, C. (2011). Annotating genes and genomes with DNA sequences extracted from biomedical articles. Bioinformatics, 27 (7), 980-986. DOI: 10.1093/bioinformatics/btr043