Tag Archives: cancer


Friday SNPpets

This week’s SNPpets offer a rather eclectic collection. Visualization with PanViz, LepBase for lepidopterans, a simulated data generator, and a new collection of community-curated phylogenetic estimates. But the big noise was the cancer “moonshot” data commons and the clinical trial for NCI-MATCH and precision medicine. Also newborn genome sequencing. Funniest thing: passive-aggressive bioinformatics. Coolest thing: Paul Simon and CRISPR (scroll to the bottom of the list).

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Tip of the Week: The Cancer Genome Atlas Clinical Explorer

Accessing TCGA cancer data has been approached in a variety of ways. This week’s tip of the week highlights a web-based portal for improved access to the data in different ways. The Stanford Cancer Genome Atlas Clinical Explorer is aimed at helping identify clinically relevant genes in the cancer data sets.

They note that the data is available in other places, including tools we’ve talked about before such as cBioPortal, the UCSC Cancer Genomics Browser, and StratomeX. But this portal helps people to quickly focus on clinical parameters in ways that aren’t as straightforward in the other tools.

You can learn more about the project from the Overview on their site, and you can see their publication about it (below). The paper also covers some issues they had with the downloaded data that might be worth noting. They also supplemented their analysis with COSMIC and TARGET (tumor alterations relevant for genomics-driven therapy) data.

One query type from: genomeportal.stanford.edu/pan-tcga/

The interface offers several quick ways to dive into the data.

There are 3 main query types: genes associated with certain clinical parameters; query directly by gene/protein/miR; and a two-hit hypothesis test. The first query is the image I’ve shown here. When you get to the results, you can explore them in more detail with sortable tabular outputs, and the gene pages have tabs for copy number changes, mutations, and RNA-seq values.

They give you some “example queries” that you can use as a way to get started and see what’s available underneath. And although we usually like to highlight a video, the tutorial that they provide is a slide embed.

So have a look at this interface if you’d like to explore TCGA data with a handy and quick query strategy. It might offer some hunting license on genes you are interested in, or some ideas for other investigations in tumor types you study.


Quick link:

Stanford-TCGA-CE: http://genomeportal.stanford.edu/pan-tcga

Lee, H., Palm, J., Grimes, S., & Ji, H. (2015). The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical–genomic driver associations Genome Medicine, 7 (1) DOI: 10.1186/s13073-015-0226-3

Video Tip of the Week: UCSC Xena System for functional and cancer genomics

When we go out and do workshops, we get a lot of requests from researchers who would like some guidance on cancer genomics tools. Our particular mission has been to aim more broadly at tools that are of wide interest and not to focus on a particular disease or condition area. But the cancer genomics arena certainly offers a great deal of opportunity for bioinformatics-based outcomes in the near term. So I keep an eye out for tools researchers may want to explore.

When the “genomics” twitter column in my Tweetdeck dropped this new mini-review of cancer genomics tools on my desktop, I went to look right away: Data mining The Cancer Genome Atlas in the era of precision cancer medicine. TCGA is the focus of the data source they are talking about, but the tools included may have more data sets and wider utility, of course. Most of the tools described were familiar to me (cBioPortal, GDAC Firehose, UCSC Cancer Genomics Browser, canEvolve), but a couple of them were new. I had never explored the ProGeneV2 tools before. And the UZH Cancer Browser was also new to me.

Comparison of cancer genomics tools, via: Swiss Med Wkly. 2015;145:w14183

One thing that’s very helpful to me is the kind of table they provided as Table 2. It’s a comparison of the main tools they are discussing, with different features of each compared. That’s handy for choosing the tool to spend time on, depending on your own research needs.

But they also referred to another tool that was new to me, Xena. “The UCSC cancer browser will be updated in the future, with the new Xena platform for visualisation and integration with Galaxy“. I can never resist new genomics visualization tools, and as a giant fan of Galaxy, I certainly need to know more about this.

So I went to look around for some information on it, and their introductory video is this week’s Tip of the Week.

So Xena is designed to let you combine your own data with data from large public resource collections, without your data leaving your firewall and without the onerous task of pulling down all the public data and managing it locally. You can explore functional genomics data and related phenotype and clinical data. It uses the “hubs” strategy that is increasingly being adopted as a way to integrate across data collections. We were just talking about hubs in another recent tip if these are new to you. It supports a wide range of data types to examine and visualize. If you want to go deeper, there’s a lot more information over at the Xena homepage. They have documentation, presentation slides, and a step-by-step demo available from a recent workshop.

Certainly one of the key features appears to be that you can integrate your own research data–which might be subject to strict privacy regulations–on your own computer with all the other key information from public data providers. Increasingly researchers I talk to at workshops need this aspect very much.

So try out Xena, and explore the other tools in the cancer genomics space, to see what’s right for your research.

Hat tip to Oscar:

And you can follow Xena on twitter for news and updates: https://twitter.com/UCSCXena

Quick links:

Xena: http://xena.ucsc.edu/


Cline, M., Craft, B., Swatloski, T., Goldman, M., Ma, S., Haussler, D., & Zhu, J. (2013). Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser Scientific Reports, 3 DOI: 10.1038/srep02652

Cheng, P.F., Dummer, R., & Levesque, M.P. (2015). Data mining The Cancer Genome Atlas in the era of precision cancer medicine Swiss Med Wkly, 145 DOI: 10.4414/smw.2015.14183


What’s the Answer? (cancer data visualization tools)

This week’s highlighted question was less of a question than a notice about a new tool. And because I’m always interested in exploring new visualization tools, I was interested to have a look. In addition, we are frequently asked about tools specific for cancer genomics, and I like to be able to tell people about what I’ve found.

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Thursdays we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

So here’s this week’s highlighted question (well, the question mark was edited out later, but this is what it said when I grabbed it):

Tool: A new tool for cancer researcher developed by UCSC?

Another tool XENA that comes to the world of Bioinforamatics designed by UCSC. I am not sure if people are aware of it. It is developed recently and I got the notification last evening. Seems to be having a lot of potential for both data visualization and also producing quality images for publication. The paper is not yet out but the mention of the tool was done last year when the update paper of the UCSC Cancer Genomics browser was made for 2015.

The tool has got data from both TCGA and ICGC and is a powerful resource not only for public data comparing and viewing but also one can upload its own data or download the tool locally to a desktop app version and visualize it. The tool is available at the below link


The technical doc is here . Am sure it will be a great resource for the researchers and bioinformaticians across the globe. For analysis it also integrates the galaxy as well and if you format your data in a version as mentioned in help docs one can view their data as well. Enjoy and appreciate the work. Hope people would like it.


I am not involved in the work. I liked the tool a lot so thought of informing it to the community


It also generated a bit of discussion about the challenges of developing visualizations. Go have a look.

Video Tip of the Week: COSMIC, Catalogue Of Somatic Mutations In Cancer

When we do workshops at medical centers, one of the most common questions I get is about locating good resources for cancer data. And we’ve talked about some of the large projects, like the ICGC. We’ve talked about ways to stratify data sets, and one example of this was in cancer, using data from The Cancer Genome Atlas.  Going forward, the ability to rapidly sequence normal vs tumor pairs should help us to even more rapidly understand and target tumors. And this will lead to other cases of entirely new leads in some situations.

But one of the really solid tools that I like to be sure to highlight for people is the COSMIC collection. It’s not new–it’s been around for a decade now. But it’s one of those types of core data resources that people really need to know about. Their long experience, their high quality curation, and their adaptations to new influxes of data volumes and data types, make them a really valuable source of information.

Reading their update paper in the 2015 NAR Database issue, I wanted to go over and refresh my memory of the features I knew, and explore some of the newer features too. There really is some serious depth over there, and I can’t touch on all of the aspects that they have in a blog post like this. But I also discovered that they’ve recently provided a number of videos to help people learn about the various tools and options.

For this week’s Video Tip of the Week, I’ll include their “overview” piece. But you should check out their Tutorials page for additional topics as well.

One feature that I hadn’t realized they offer is a Genome Browser using the JBrowse framework. There’s a separate video with some guidance on how to use that.

Their future directions section in the paper makes it clear they are preparing to be able to handle the incoming data on this topic. And they are evaluating new tools and analyses that may be appropriate. But they commit to maintaining their strong emphasis on curation, which is music to my ears. I think quality hand curation is simultaneously undervalued by end users (and sadly by funders), while being entirely critical to handling all the big data that’s coming. So get familiar with COSMIC for cancer genomics data. It will be worth your time.

Quick link:

COSMIC: http://cancer.sanger.ac.uk/


Forbes S.A., D. Beare, P. Gunasekaran, K. Leung, N. Bindal, H. Boutselakis, M. Ding, S. Bamford, C. Cole, S. Ward, C. Y. Kok, et al. (2014). COSMIC: exploring the world’s knowledge of somatic mutations in human cancer, Nucleic Acids Research, 43 (D1) D805-D811. DOI: http://dx.doi.org/10.1093/nar/gku1075

What’s the Answer? (cancer data discrepancies)

BioStar is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s question highlights the cancer data issues, and since I did a cancer database as “Tip of the Week” that got some decent interest, I thought I’d keep with the theme. Maybe some researchers who are more familiar with the cancer data sets will have some insights.

Question: cBio portal vs. Oncomine: Difference in samples and expression data

I was looking for expression levels for two genes involved in serous ovarian cancer data from TCGA via CBio portal. Based on a Z-score threshold of 2.0, I found the following percentage of samples (cases) have expression levels affected (UP or Down)

Case Set: Tumors with mRNA data (Agilent microarray): All samples with mRNA expression data (489 samples) CCNE1 – 11% CDK12 – 7%

For same dataset in Oncomine (I am using the free version):

TCGA Ovarian (517 samples <- this number is higher than what is reported in TCGA ovarian cancer publication) expression data is provided as log-2 median intensity and the in Oncomine shows that higher expression level of CCNE1 and CDK12 expression level is correlated with different grades – for example Grade 3 tumor (Grade 3 (431 samples) have higher expression level of both genes.

I have also noticed that the dataset 517 samples were assigned as No Associated Paper 2011/03/24. I am wondering if the data is reffering to this paper on TCGA ovarian cancer dataset. http://www.nature.com/nature/journal/v474/n7353/full/nature10166.html

I am wondering why such a discrepancy or am I missing something here.

PS. I have posted this question on both Oncomine and cBio list, but did not receive any responses yet. I am wondering if anyone here with experience on one of the platform could provide insight to this.
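For readers who haven’t worked with this kind of cutoff, here is a minimal sketch of what a “Z-score threshold of 2.0” means in practice. It assumes the z-scores are computed against the mean and standard deviation of whatever sample set you feed in, which may not match how either portal does it. The function names and the toy values are made up for illustration and are not real TCGA data.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize expression values against the mean and SD of the same sample set."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("expression values have zero variance")
    return [(v - mu) / sd for v in values]


def fraction_altered(values, threshold=2.0):
    """Fraction of samples whose |z-score| meets or exceeds the threshold (up or down)."""
    hits = sum(1 for score in zscores(values) if abs(score) >= threshold)
    return hits / len(values)


# Hypothetical log2 expression values for one gene across 10 samples.
toy_expression = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 9.0]
print(f"{fraction_altered(toy_expression):.0%} of samples altered")  # prints "10% of samples altered"
```

Because the mean and standard deviation come from the sample set itself, the same tumor can fall above or below the cutoff depending on which samples are included (489 versus 517 in the question) and on how the values were normalized beforehand. That alone could plausibly account for part of a discrepancy like this one.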

The other issue that interested me was the support problem. This is a skilled super-user trying hard to do it right–contacting the support teams of the sites, and getting no response. I think that’s one of the most frustrating things about this arena. Some projects are well resourced for user support. Some are not. But if smart users can’t figure out what’s going on with your site’s data, your resource isn’t as useful as you think it is. I wish support was valued more. But if you know what’s up–go over and offer an answer.

Video Tip of the Week: My Cancer Genome

There are a lot of cancer database resources out there. Most of the ones we’ve focused on have been the data repository types: TCGA, ICGC, caBIG, COSMIC, Cancer Genome Workbench, UCSC Cancer Genomics Browser, and of course big repositories like GEO. Researchers will need these sources of data to locate key alterations in cancer cells and tissues, and to evaluate changes with treatment conditions. But these are possibly not the most useful places for clinicians faced with a specific sample, or for patients trying to understand their situations. As more and more tumor sampling data becomes available, direct and specific access to actionable pieces of information will be crucial.

The MyCancerGenome site aims to serve that actionable end of the data spectrum. It has been developing for a while, but the recent story in the New York Times reminded me of it: Variations on a Gene, and Tools to Find Them. So for this week’s Video Tip of the Week, I bring you a look at the My Cancer Genome resources. They have a nice intro video that I will include here. It highlights features that I wouldn’t have been able to access–the part that links patient records + mutations + the curated detailed pages about the mutations and relevant studies. The public has access to that last part, but you wouldn’t be able to see the electronic health record part from the public side.

Papers are coming out that describe the deposition of information into the MyCancerGenome site. You can learn more about the philosophy and strategy for cataloging the somatic mutations that are clinically relevant in the recent paper about the DIRECT (DNA mutation Inventory to Refine and Enhance Cancer Treatment) project. A tab at that site shows you the initial data associated with that, from non-small cell lung cancer (NSCLC) mutations in the Epidermal Growth Factor Receptor (EGFR). And as more of this data comes along we’ll see it grow, of course. Seems a good step in translational medicine. So have a look at the useful and evidence-based information about specific cancer-related variations they are collecting.

Another feature is a search option to find clinical trials–by disease or by gene. I don’t think I’ve seen a gene-specific search for this kind of information before. This could be useful for people who need access to new treatment options if they have specific mutation data about their own tumors.

Have a look at My Cancer Genome, and think about where we are going with this data. I hope that the new cancer genomics data will really help drive appropriate and effective treatment strategies.

Quick link:

My Cancer Genome site: http://www.mycancergenome.org/

NYT article: Variations on a Gene, and Tools to Find Them


Swanton, C. (2012). My Cancer Genome: a unified genomics and clinical trial portal The Lancet Oncology, 13 (7), 668-669 DOI: 10.1016/S1470-2045(12)70312-1

Yeh, P., Chen, H., Andrews, J., Naser, R., Pao, W., & Horn, L. (2013). DNA-Mutation Inventory to Refine and Enhance Cancer Treatment (DIRECT): A Catalog of Clinically Relevant Cancer Mutations to Enable Genome-Directed Anticancer Therapy Clinical Cancer Research, 19 (7), 1894-1901 DOI: 10.1158/1078-0432.CCR-12-1894

My Cancer Genome. 2013. http://www.mycancergenome.org (Accessed 4/30/2013).

A new look for the UCSC Cancer Genomics Browser

From the UCSC Genome Browser announcement mailing list:

The UCSC Cancer Genomics group has recently remodeled the interface of
their Cancer Genomics Browser (https://genome-cancer.ucsc.edu/) to make
it easier to navigate and more intuitive to display, investigate, and
analyze cancer genomics data and associated clinical information. This
tool provides access to many types of information -- biological
pathways, collections of genes, genomic and clinical information -- that
can be used to sort, aggregate, and perform statistical tests on a group
of samples. The Cancer Browser currently displays 473 datasets of 25
cancers from The Cancer Genome Atlas (TCGA) as well as data from the
Cancer Cell Line Encyclopedia (CCLE) and Stand Up To Cancer.

You can find more information about how to use this tool in the online
tutorial, user’s guide and FAQ. Any questions or comments should be
directed to genome-cancer@soe.ucsc.edu.

Donna Karolchik
UCSC Genome Browser Senior Engineering Manager


Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

  • Chromothripsis – new model for some cancers? From GenomeWeb Daily News. I’m interested in seeing follow up studies on this. [Jennifer]
  • A new data source added to the BioMart Central portal: “EMAGE, a database of in situ gene expression data in the mouse embryo, has been added to BioMart Central Portal. The EMAGE website can be found at http://www.emouseatlas.org/emage/ and the EMAGE BioMart server can be found at http://biomart.emouseatlas.org/” (via the Mart-dev mailing list) [Mary]
  • Another potential outlet for scientists wanting to get involved: the Global Knowledge Initiative, whose goal is: [Jennifer]

    We build global knowledge partnerships between individuals and institutions of higher education and research. We help partners access the global knowledge, technology, and human resources needed to sustain growth and achieve prosperity for all.

  • From GenomeWeb – an announcement about MoDEL the ‘World’s Largest Protein Video Database’ – it is free for academic, not-for-profit use. I haven’t tried it at all, but it sounds like it might be cool. Let us know if you check it out! [Jennifer]
  • Announcement from the International Cancer Genome Consortium (where you can access the data using the cutting-edge BioMart build). Hat tip to @bffo: Update on ICGC website with a simplified application process for controlled access data #bioinformatics #cancer #genomics http://icgc.org/ [Mary]
  • Another resource for protein-protein and drug-protein interactions: PROMISCUOUS [Jennifer]
  • There’s a new Announcement mailing list for BioMart, as it gets migrated from the former EBI location.  Announce and Users lists are available–if you were on them you probably got automatically migrated. If you want to sign up, see this note:  [mart-announce] New BioMart announce and users mailing lists.  Hmm, that’s not entirely helpful as it hides the addresses you need. They are: mart-dev@ebi.ac.uk becomes users@biomart.org and mart-announce@ebi.ac.uk becomes announce@biomart.org [Mary]
  • REViGO – a resource for reducing and visualizing Gene Ontology trees, described in this paper: Supek F et al. PLoS Genet 6(6): e1001004. [Jennifer]