Video Tip of the Week: COSMIC, Catalogue Of Somatic Mutations In Cancer

When we do workshops at medical centers, one of the most common questions I get is about locating good resources for cancer data. We’ve talked about some of the large projects, like the ICGC. We’ve also talked about ways to stratify data sets, and one example of this was in cancer, using data from The Cancer Genome Atlas. Going forward, the ability to rapidly sequence normal vs. tumor pairs should help us understand and target tumors even more quickly, and in some cases it will turn up entirely new leads.

But one of the really solid tools that I like to be sure to highlight for people is the COSMIC collection. It’s not new; it’s been around for a decade now. But it’s one of those core data resources that people really need to know about. Their long experience, their high-quality curation, and their adaptation to growing data volumes and new data types make COSMIC a really valuable source of information.

Reading their update paper in the 2015 NAR Database issue, I wanted to go back, refresh my memory of the features I knew, and explore some of the newer ones too. There really is some serious depth over there, and I can’t touch on every aspect in a blog post like this. But I also discovered that they’ve recently provided a number of videos to help people learn about the various tools and options.

For this week’s Video Tip of the Week, I’ll include their “overview” piece. But you should check out their Tutorials page for additional topics as well.

One feature I hadn’t realized they offer is a genome browser built on the JBrowse framework. There’s a separate video with some guidance on how to use it.

The future directions section of their paper makes it clear that they are preparing to handle the incoming data on this topic, and they are evaluating new tools and analyses that may be appropriate. But they commit to maintaining their strong emphasis on curation, which is music to my ears. I think quality hand curation is undervalued by end users (and sadly by funders), yet it is entirely critical to handling all the big data that’s coming. So get familiar with COSMIC for cancer genomics data. It will be worth your time.

Quick link:

COSMIC: http://cancer.sanger.ac.uk/


Forbes S.A., D. Beare, P. Gunasekaran, K. Leung, N. Bindal, H. Boutselakis, M. Ding, S. Bamford, C. Cole, S. Ward, C. Y. Kok et al. (2014). COSMIC: exploring the world’s knowledge of somatic mutations in human cancer, Nucleic Acids Research, 43 (D1) D805-D811. DOI: http://dx.doi.org/10.1093/nar/gku1075

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

  • Nice video on RNA interference by Nature Reviews Genetics. You can access all of the featured RNAi multimedia links from this page, or go straight to the video on this page. [Jennifer]
  • Interesting: the Repertoire 10K (R10K) Project. RT @deannachurch: CG: go to http://t.co/rekf2Gkd for more information on joining the project! #AGBT [Mary]
  • And it’s not in the papers anymore… RT @genome_gov: Pachter: “My worst nightmare: the curse of deep sequencing” aka too much data. #AGBT [Mary]
  • Read a Nature Outlook on allergies from Nov. 2011; it covers a lot of new philosophies and theories that I wasn’t aware of. Free full access to the Nature Allergy Outlook is currently available. [Jennifer]
  • RT @andrewsu: Word cloud of NAR 2012 Database issue abstracts via http://t.co/TMtefZ0k http://t.co/2zRzZkEG [Mary]
  • Cool new option for PDB submissions: Volunteer Structures For Foldit [Jennifer]
  • RT @LouWoodley NYC tweeps – the next Science Online NYC is on March 20th on keeping the research record straight http://bit.ly/xwziUb #sonyc [Jennifer]
  • RT @GeneSherpas: “@GeneticsUpdate: Can You Be Fired for Your Genes?  http://t.co/fDaOriU” Hopefully our future doesn’t come down to this!” [Mary]
  • Ha! That was unexpected… RT @edyong209: Bizarre SNP study on genetics of choral singing. Abstract takes surprising turn in final lines. http://t.co/VkSU4fd0 [Mary]
  • RT @jacksonlab: Facing a rare #genetic disease together, the Wentzell family doesn’t let anything slow them down. #raredisease http://t.co/bscdUXoN [Mary]

NAR database issue (always a treasure trove)

The advance access release of most of the NAR Database issue articles is out. As usual, this database issue includes a wealth of new and updated data repositories and analysis tools. We’ll be writing more extensive blog posts on it and doing some tips of the week over the next couple of months, but I thought I’d highlight the issue and some of the reports:

There are updates to many of the databases we know and love (links go to the full-text articles): UCSC Genome Browser, Ensembl, UniProt, MINT, SMART, WormBase, Gene Ontology, ENCODE, KEGG, UCSC Archaeal Browser, IMG/M, DBTSS, InterPro and others (we have tutorials on all of those listed here).

And, as an indication of the explosion of data available (itself a subject of a database issue article, SRA), there are a lot of new(ish) databases on specific datatypes such as MINAS, a database of metal ions in nucleic acids (nice name :D); doRiNA, a database of RNA interactions in post-transcriptional regulation; BitterDB, a database of bitter compounds and well over 100 more.

And, because I can, I’ll give a special shout-out to my former PI at EMBL: Peer Bork’s group has four databases in the issue (eggNOG, SMART, STITCH and OGEE), and he and a couple of his group members are also on the InterPro paper.

This is going to be a wealth of information to wade through!

NAR Web Server issue is out

I’m packing for my trip to the NIH to do some workshops, but I wanted to make sure our regular readers catch this: the NAR Web Server issue is out. It’s always a nice look at tools that may be new (or new to you) and at updates to existing ones. Hat tip to Francis Ouellette for all the tweets (@bffo; he’s great to follow, by the way, and worth having in your list):

RT @bffo: New in NAR: The Annual Nucleic Acids Research Web Server Issue link to Gary Benson’s editorial:  http://bit.ly/lKPpRq #bioinformatics

RT @bffo: NAR Web server issue: Table of Content: http://bit.ly/mPbx7J #bioinformatics

RT @bffo: New in NAR: our  paper: The 2011 bioinformatics links directory update: > resources, tools, & databases …  http://bit.ly/l1FlD4

I haven’t had time to go through it yet myself, but I will soon. And it’s usually a great source of tips-of-the-week!

New databases and resources from NAR database issue

If you haven’t noticed, articles in the Database issue of Nucleic Acids Research have been going to Advance Access over the last week. As always, there is a wealth of new resources and databases in this issue. I’ll be going through these over the coming month or so and will post more in-depth reviews of them then, but I thought I’d list some that were released Friday; go to the link above for even more from earlier in the week:

AmoebaDB (article)
MicrosporidiaDB (article)
BriX (article)
WebGeSTer DB: a transcription terminator database (article)
SCLD: Stem cell lineage database (article)
OrthoDB: Hierarchical Catalog of Eukaryotic Orthologs (article)
CaSNP: a database for interrogating copy number alterations of cancer genome from SNP array data (article)


Quick Reference Cards for teaching and outreach

We know there are a number of different ways that scientists and students become familiar with genomics software. Some of it comes from traditional publication routes, like the very handy NAR Database issue or the Current Protocols papers we’ve done recently. We also have our online tutorials, which people use in various ways: some teach themselves by watching the video and working through the exercises, and some download the matching slide sets and run local workshops (in our catalog, green icons mark the free, sponsored tutorials and red icons mark those that require a subscription). In some cases, librarians are using them to become “embedded” in courses.

A less well known type of material we have is the Quick Reference Card. These are printed cards with URLs, hints, tips, definitions and shortcuts for things you may want a quick reminder of, such as where a feature is located or how to use it. People who run local workshops will sometimes write to us to get a set for their courses. The cards are also great to give out at conferences to raise awareness of the software.

We have these cards for several resources for which we also have free, sponsored training videos, slides and exercises: the UCSC Genome Browser (two cards, an intro and the Table Browser), Galaxy, and our newest, RCSB PDB and SGKB. You can order them using this form, and we’ll send them out.

I bring this up today because we just received word from Ensembl that they have created a card we can distribute as a PDF. You can print it out and put it on the wall near your computer as a handy reminder of some features and tools at Ensembl. Download the PDF using the link below.


Order OpenHelix printed cards for resources: http://www.openhelix.com/cgi/qrcOrder.cgi

Ensembl PDF card download: Ensembl_card_march2010.pdf

Tip of the Week: GOOD

This tip of the week comes to you by way of the recent NAR Database issue article and a tweet from Francis Ouellette. The Gene-Oriented Orthology Database (GOOD) is a relatively new database of orthologous regions in four genomes (human, mouse, chimpanzee and cow) that uses gene regions (hence the name) instead of proteins to determine orthology. As the authors state in the paper:

Several ortholog databases are now available online. Most of them, however, consider orthology from the aspect of protein sequences individually, including HomoloGene, EnsemblCompara, Inparanoid, Roundup, OrthoMCL and OrthoDB. There exist ambiguous and incomplete ortholog assignments because of the interference mediated by alternative splicing (AS). Isoforms of one gene might be assigned to different orthologous clusters.

They also make extensive use of GO terms to help determine functions and annotations. This week’s tip is a quick introduction to this new database.
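To make the isoform problem in the quote above a bit more concrete, here is a minimal Python sketch with made-up data. The gene names, isoform IDs and cluster labels are all hypothetical, and this is not GOOD’s actual pipeline. It only shows how protein-level clustering can scatter the isoforms of one gene across different ortholog clusters, and how collapsing assignments to the gene level exposes that ambiguity.

```python
from collections import defaultdict

# Hypothetical isoform-level ortholog assignments, the kind a protein-based
# method might produce: isoform ID -> (gene ID, ortholog cluster label).
isoform_clusters = {
    "GENE_A-iso1": ("GENE_A", "cluster_1"),
    "GENE_A-iso2": ("GENE_A", "cluster_2"),  # alternative splicing pushes this isoform elsewhere
    "GENE_B-iso1": ("GENE_B", "cluster_3"),
    "GENE_B-iso2": ("GENE_B", "cluster_3"),
}


def clusters_per_gene(assignments):
    """Collapse isoform-level assignments to the set of clusters each gene lands in."""
    per_gene = defaultdict(set)
    for gene, cluster in assignments.values():
        per_gene[gene].add(cluster)
    return dict(per_gene)


def ambiguous_genes(assignments):
    """Return, sorted, the genes whose isoforms were split across more than one cluster."""
    return sorted(
        gene
        for gene, clusters in clusters_per_gene(assignments).items()
        if len(clusters) > 1
    )


if __name__ == "__main__":
    print(ambiguous_genes(isoform_clusters))  # ['GENE_A']
```

A gene-oriented approach sidesteps this by treating the gene region, rather than each protein isoform, as the unit of orthology. The details of how GOOD does that are in the paper.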

TE insertions in genomes

In the recent database issue of NAR, there are two reports of transposable element (TE) databases. I already discussed one in an earlier post. That one is a database that includes Gypsy elements (LTR retrotransposons) and retroviruses and aims to be a database “devoted to the non-redundant analysis and evolutionary-based classification of mobile genetic elements.” Hopefully it will develop into a database of all TEs. The other paper, by Levy et al. (“TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates”)1, which I’ll discuss briefly here, introduces a somewhat different database of transposable elements.

As the paper discusses, TEs have been implicated in a large number of effects on the vertebrate genome: affecting expression and “contributing to genetic diversity, genomic expansion, genomic content and genomic rearrangements.” Whether these immense changes are the raison d’être for transposable elements or the byproduct of a parasitic DNA element, I’ll leave for discussion (though I would side with the latter), but they are unarguably major factors in genome evolution and don’t fit the definition of “junk DNA.” :D

Transposon Database

I started my Ph.D. studies on the evolution of non-LTR retrotransposable elements in 1990 and found the world of mobile genetic elements, or transposons (long ago dubbed “jumping genes”), to be varied, complicated and fascinating. In 1993 I discovered the web. Since then I’ve hoped for, and searched for, a database of these “Mobile genetic elements [that] are self-contained genomic units capable of proliferating within their host genomes”1, with little satisfaction. Some such databases exist, such as the Mouse Transposon Insertion Database, dbRIP (retrotransposon insertion polymorphisms in humans) and several others. But these are almost entirely organism-specific databases. That helps the study of the organism (mobile elements do have an effect on the genome), but does little for the study of transposons as a class.

So I was happy to see the latest article in the NAR database issue on GyDB.

