Category Archives: Tip of the Week

Tip of the Week: The Cancer Genome Atlas Clinical Explorer

Accessing TCGA cancer data has been approached in a variety of ways. This week’s Tip of the Week highlights a web-based portal that offers several different routes into the data. The Stanford Cancer Genome Atlas Clinical Explorer is aimed at helping identify clinically relevant genes in the cancer data sets.

They note that the data is also available in other places, including tools we’ve talked about before such as cBioPortal, the UCSC Cancer Genomics Browser, and the StratomeX features. But this portal helps people to quickly focus on clinical parameters in ways that aren’t as straightforward in the other tools.

You can learn more about the project from the Overview on their site, and you can see their publication about it (below). The paper also covers some issues they had with the downloaded data that might be worth noting. They also supplemented their analysis with COSMIC and TARGET (tumor alterations relevant for genomics-driven therapy) data.

One query type from: genomeportal.stanford.edu/pan-tcga/

The interface offers several quick ways to dive into the data.

There are 3 main query types: genes associated with certain clinical parameters; a direct query by gene/protein/miR; and a two-hit hypothesis test. The first query type is the one in the image I’ve shown here. When you get to the results, you can explore them in more detail with sortable tabular outputs and, on the gene pages, with tabs for copy number changes, mutations, and RNA-seq values.

They give you some “example queries” that you can use as a way to get started and see what’s available underneath. And although we usually like to highlight a video, the tutorial that they provide is a slide embed.

So have a look at this interface if you’d like to explore TCGA data with a handy and quick query strategy. It might offer some hunting license on genes you are interested in, or some ideas for other investigations in tumor types you study.

 

Quick link:

Stanford-TCGA-CE: http://genomeportal.stanford.edu/pan-tcga

Reference: 
Lee, H., Palm, J., Grimes, S., & Ji, H. (2015). The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical–genomic driver associations Genome Medicine, 7 (1) DOI: 10.1186/s13073-015-0226-3

Video Tip of the Week: SGD’s Variant Viewer

Variant viewers are very popular. As we get more and more sequence data, the challenge of looking across many samples only gets more and more important. So I always like to see how different groups are doing this. I’m still waiting for the killer app on this–the pan-genome graphs with all the paths along different genomes/regions displayed. But there are many good examples out there now for seeing variations in different species, strains, or different individuals.

This week’s Video Tip of the Week looks at the yeast variation viewer from SGD. It shares some features with other visualization tools (lollipops are hip, lately). But it has a very quick way of switching back and forth from DNA to protein that isn’t always available on variation viewers that I’ve tried before.

So this week’s video tip shows you their quick tour of their recently added variant viewer tool.

I won’t go into any more detail, since they have a whole paper in the 2016 NAR database issue that describes the development and the features (below). Go over and try it out; it’s speedy and easy to use.

Quick link:

SGD: http://www.yeastgenome.org/

References:

Cherry, J., Hong, E., Amundsen, C., Balakrishnan, R., Binkley, G., Chan, E., Christie, K., Costanzo, M., Dwight, S., Engel, S., Fisk, D., Hirschman, J., Hitz, B., Karra, K., Krieger, C., Miyasato, S., Nash, R., Park, J., Skrzypek, M., Simison, M., Weng, S., & Wong, E. (2011). Saccharomyces Genome Database: the genomics resource of budding yeast Nucleic Acids Research, 40 (D1) DOI: 10.1093/nar/gkr1029

Sheppard, T., Hitz, B., Engel, S., Song, G., Balakrishnan, R., Binkley, G., Costanzo, M., Dalusag, K., Demeter, J., Hellerstedt, S., Karra, K., Nash, R., Paskov, K., Skrzypek, M., Weng, S., Wong, E., & Cherry, J. (2016). The Saccharomyces Genome Database Variant Viewer. Nucleic Acids Research, 44 (D1) DOI: 10.1093/nar/gkv1250

Video Tip of the Week: Complex Portal with Super Curators

From the same team that brings us the IntAct database for protein interaction data, the Complex Portal offers insights into additional levels of protein interactions that are organized around a biological function. Although there are various resources that store individual interactions between proteins, collecting additional interactions in units of a complex is especially challenging for various reasons, both technical and computational.

The introductory part of this recent webinar from the Complex Portal team establishes definitions for PPI (protein-protein interactions) and PPIN (protein-protein interaction networks) and explores the difficulty of assessing and capturing information about which ones are part of complexes. In the Complex Portal database, an entry may combine numerous individual PPI studies into one set. At about 6 min they define complexes as “A stable set of (2 or more) interacting protein molecules which can be co-purified and have been shown to exist as a function unit in vivo.” Small molecules or nucleic acids might also be included if they are integral to the complex. Their complex calls are either based on experimental evidence or inferred by a curator.

They have a lot of source databases and curators from different groups, besides the IntAct data. Most of them are familiar names such as MINT, SGD, Reactome, UniProt, etc. And most of their complex entries end up being the result of many papers. At ~5min, they describe their curation process. Their diagram includes my new favorite hero, “Super curator”. (We have always been huge fans of curators, the most undervalued resource in bioinformatics. Well, maybe trainers are too. Sometimes they are the same people.)

The species of focus are human, mouse, yeast, and E. coli, but they may have data on others as well. In the video they also talk about future plans if you want to know about other things they are doing. And they want to hear from you if there are complexes you think they should look at.

Additional training materials, and their paper (below), can help you to understand how to get the most out of their resources. I found the demo piece at the end helpful to locate specific examples of the features that were covered quickly in the slides, so be sure to hang on to the end of this video for that.

Quick links:

Complex Portal: http://www.ebi.ac.uk/intact/complex/

IntAct: http://www.ebi.ac.uk/intact/

Reference: 

Meldal, B., Forner-Martinez, O., Costanzo, M., Dana, J., Demeter, J., Dumousseau, M., Dwight, S., Gaulton, A., Licata, L., Melidoni, A., Ricard-Blum, S., Roechert, B., Skyzypek, M., Tiwari, M., Velankar, S., Wong, E., Hermjakob, H., & Orchard, S. (2014). The complex portal – an encyclopaedia of macromolecular complexes Nucleic Acids Research, 43 (D1) DOI: 10.1093/nar/gku975

Video Tip of the Week: LilBUBome sequencing project

Attribution: Iamlilbub, via Wikipedia.

Ok, the phrase “Lil Bub is an American celebrity cat….” is not the way I start a lot of blog posts. I enjoy cats on the internet as much as anyone–but their relevance to science is not one of the reasons, usually. But the Lil Bub genome project changes that. A popular, crowd-funded, genome sequencing project of a beloved “celebrity cat” might be a nice gateway for people who are new to the technology.

As a supporter of the initial funding effort, I get the update emails when they have something new to share. I was very pleased to get the new video, where one of the project team members talks about the sequencing and the FASTQ files. Uschi Symmons gives a bit of backstory on the sequencing, shows the files they have, and talks about the quality scores. And I like that the face of this technical information is a woman scientist, too. As a young girl interested in science, I would have noticed that sort of aspect.

They offer some additional detail and links on their blog, in the post “Reading Bub’s DNA Sequence: The Raw Data,” if people have questions.
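
For readers who have not met FASTQ quality scores before, here is a minimal Python sketch of how they are usually decoded. It assumes the common Phred+33 (Sanger) encoding, which I have not confirmed for the LilBUBome files, and the record in the example is made up for illustration rather than taken from the project data.

```python
def phred33_scores(quality_line):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding: each character's ASCII code minus 33.
    """
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(q):
    """Convert a Phred score Q into the estimated base-call error probability."""
    return 10 ** (-q / 10)


def parse_fastq(lines):
    """Yield (read_id, sequence, scores) from an iterable of FASTQ lines.

    Assumes the simple four-lines-per-record layout:
    '@id', sequence, '+', quality string.
    """
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        sequence = next(it).strip()
        next(it)  # the '+' separator line carries no data we need here
        quality = next(it).strip()
        yield header[1:], sequence, phred33_scores(quality)


# Hypothetical record, invented for this example.
example_record = ["@read1", "GATTACA", "+", "IIII5!#"]
records = list(parse_fastq(example_record))
```

A score of 20 corresponds to an estimated 1 in 100 chance that the base call is wrong, and a score of 40 to 1 in 10,000, which is why the quality line matters as much as the sequence line.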

It seems to me that this effort is a nice and gentle way to get people used to the ideas of genome sequencing. This could be kids and families facing a sequencing project with the hope of a diagnosis, or it could be youngsters in science classes who could be engaged with the ideas and the technology in out-of-the-textbook ways. It’s already been popular with the press, not just in the techie/nerd media, but more widely as well: “Lil Bub, famous Internet cat, is getting her genome sequenced.”

There is evidence that watching cat videos has health benefits. I think that watching cat genome sequencing may have educational benefits as well. Follow along, and spread the word about this project as it develops. Simply becoming more familiar with the process of the science here, even if the outcome is unexpected, would still have teachable moments. It may catch the eye of some youngster in ways that typical classroom material won’t. And I’m interested in what the science tells us as well. I’ll be watching for more.

Quick links: 

Lil BUB: http://lilbub.com/

LilBUBome blog: https://lilbubome.wordpress.com/

LilBubome YouTube channel: https://www.youtube.com/channel/UCYzr7zLk-LhRGrCM92Nj1Gg

Project twitter account: https://twitter.com/LilBUBome

Reference:
Myrick, J. (2015). Emotion regulation, procrastination, and watching cat videos online: Who watches Internet cats, why, and to what effect? Computers in Human Behavior, 52, 168-176 DOI: 10.1016/j.chb.2015.06.001

Video Tip of the Week: Digital Expression Explorer for RNA-seq

Coming over my digital transom in a couple of different ways recently, the Digital Expression Explorer caught my attention. I have a soft spot for expression. It’s the business end of what’s going on, not just the archives of genomic info stored up in the nucleus, you know? Heh.

Anyway–I saw the poster via email from ResearchGate (yeah, I know, but a couple of interesting things have come to me from there). It was authored by someone that I’ve been on a paper with, so it notified me. And the volume of expression data and challenges for users definitely hits on a problem that I think about a lot. There’s such great detail underneath the surface in so many of these repositories we have, but few people have the tools to mine them. They specifically note the GEO and SRA in this case. Digital Expression Explorer (DEE) sits on top of them and provides a new way in, after standardizing the data. I think that’s worthwhile.

There’s no need for me to provide more details here, because the lead author Mark Ziemann has done a great job in this blog post: Introducing “Digital Expression Explorer”. He provides the background to the problem of re-using the data, and what they’ve done to solve it. And you can see the poster online. But better still, check out this quick intro video from Mark that is this week’s Video Tip of the Week:

So you can see how to pull out some data and begin to explore it with DEE. Then there’s a piece that illustrates taking this newly mined set and evaluating it with Degust. This quickly shows what you can get and what you can do next. It will give you an idea of the benefits (and the speed).

And, may I say, nice job on outreach! Poster, tweets, blog, and video. Many ways to reach potential users. Moar like this, plz, for all you folks with software tools.

Try it out, and see if it saves you time and delivers you more than you might have been able to mine out of the repositories so quickly in the past.

Quick link:

Digital Expression Explorer: http://dee.bakeridi.edu.au/

Reference:

Ziemann, M., Kaspi, A., Lazarus, R., & El-Osta, A. (2015). Digital Expression Explorer: A user-friendly repository of uniformly processed RNA-seq data. ComBio2015 DOI: 10.13140/RG.2.1.1707.5926

Video Tip of the Week: UniProt updates, now including portable BED files

UniProt is one of the core resources that provides tremendously important curated information about proteins. You will find links to UniProt in lots of other tools and databases as well, but we’ve always championed going directly there for the full look at the wide range of information they offer. Their foundation remains solid, but they also continue to add new and useful features over time. Recently they had a webinar to describe some of the new things, and the recording of that webinar will be this week’s Video Tip of the Week.

The video starts with an overview of the whole UniProt site. The core of their great resource is the same, of course. UniProtKB, UniRef, and UniParc are there for various ways to look across the data. The handy Proteomes collection of the proteins in a given species is available, and they also have reference proteomes from that access point. There’s a short section in the video that’s a guide to the basic search functions.

About 9 minutes in they introduce the UniRule annotation features. When certain conditions are met, an annotation gets applied to a protein, which you can trace from the protein pages by clicking on the UniRule link for that annotation. Their software offers a very cool way to see how and when conditions are applied. It loads a decision flow path and highlights which logic rules were used in that particular case, so you can trace it and understand how a protein got a given annotation. That’s what I illustrate in the screen shot here.
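
To make the idea of a traceable, condition-based annotation concrete, here is a toy Python sketch. It is not UniProt’s implementation or the real UniRule format. Every name in it (the rule ID, the conditions, the protein fields) is hypothetical and invented for illustration. It only shows the general pattern: check each condition, record the outcome, and apply the annotation when all conditions hold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Condition:
    description: str                      # human-readable text shown in the trace
    test: Callable[[Dict[str, object]], bool]


@dataclass
class Rule:
    rule_id: str                          # hypothetical identifier, not a real UniRule ID
    annotation: str
    conditions: List[Condition]


def apply_rule(rule: Rule, protein: Dict[str, object]) -> Tuple[Optional[str], List[Tuple[str, bool]]]:
    """Return (annotation or None, trace of each condition and whether it passed)."""
    trace = [(c.description, bool(c.test(protein))) for c in rule.conditions]
    annotation = rule.annotation if all(passed for _, passed in trace) else None
    return annotation, trace


# Hypothetical rule and protein records, invented for this example.
toy_rule = Rule(
    rule_id="TOY-RULE-1",
    annotation="Function: example enzyme activity",
    conditions=[
        Condition("taxon is Bacteria", lambda p: p.get("taxon") == "Bacteria"),
        Condition("has signature SIG0001", lambda p: "SIG0001" in p.get("signatures", [])),
    ],
)

matching = {"taxon": "Bacteria", "signatures": ["SIG0001", "SIG0002"]}
non_matching = {"taxon": "Eukaryota", "signatures": ["SIG0001"]}

annotation_a, trace_a = apply_rule(toy_rule, matching)
annotation_b, trace_b = apply_rule(toy_rule, non_matching)
```

Keeping the trace alongside the result is the part that mirrors what the viewer shows: you can see not only that an annotation was applied, but which conditions led to it.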

At about 14 min, the topic changes to the new Genome Annotation Tracks. They now offer you a way to take their annotations for a UniProtKB entry and use them with a separate genome browser. They hand you BED or BigBed files for different features. You can also load the whole thing as a Hub file to see all the sequence feature data at once. The tracks are species-specific and started with human, but others are coming. You can access them from the “Downloads” area of the homepage. The video also describes a bit about the structure there. So you could take these files to Ensembl or the UCSC Genome Browser and load them, and then compare all the UniProt features to the existing genomic context at those browsers. They illustrate how you can look at the “active site” annotations, but you can also look at post-translational modification sites, domains, etc. This was a feature that was new to me, and looks like a terrific idea.
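
If you have not worked with BED files directly, this small Python sketch shows the basic layout. It reads only the first four standard columns (chromosome, start, end, name) and assumes tab-separated fields. The sample line is made up for illustration and is not taken from a UniProt track. The one detail worth remembering is that BED starts are 0-based and ends are exclusive, while the position strings you type into a browser are 1-based and inclusive.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class BedFeature(NamedTuple):
    chrom: str
    start: int                 # 0-based, inclusive
    end: int                   # exclusive
    name: Optional[str] = None

    @property
    def length(self) -> int:
        """Number of bases covered; simple because the end is exclusive."""
        return self.end - self.start


def parse_bed_line(line: str) -> BedFeature:
    """Parse one tab-separated BED data line (first four columns only)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start, and end")
    name = fields[3] if len(fields) > 3 else None
    return BedFeature(fields[0], int(fields[1]), int(fields[2]), name)


def read_bed(lines: Iterable[str]) -> Iterator[BedFeature]:
    """Yield features, skipping blank lines, comments, and track/browser header lines."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        yield parse_bed_line(line)


def to_browser_position(feature: BedFeature) -> str:
    """Convert BED coordinates to a 1-based, inclusive position string."""
    return f"{feature.chrom}:{feature.start + 1}-{feature.end}"


# Hypothetical track content, invented for this example.
sample = ["track name=example", "chr1\t999\t1002\tactive_site\n"]
features = list(read_bed(sample))
```

Here the single feature covers three bases, and `to_browser_position(features[0])` gives the string you would paste into a browser’s position box.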

So even if you think you know UniProt, check out these new options for additional ways to interact with the high-quality information they provide. Good stuff.

Quick links:

UniProt: http://www.uniprot.org/

Reference:

The UniProt Consortium (2014). UniProt: a hub for protein information Nucleic Acids Research, 43 (D1) DOI: 10.1093/nar/gku989

Video Tips of the Week, Annual Review 2015. Part II

As you may know, we’ve been doing these video tips-of-the-week for eight years now. We have completed or collected around 400 little tidbit introductions to various resources through this past year, 2015. At first we had to do all of our own video intros, but as the movie technology became more accessible and more teams made their own, we were able to find a lot more that were done by the resource providers themselves. So we began to collect those as well. At the end of the year we’ve established a sort of holiday tradition: we are doing a summary post to collect them all. If you have missed any of them it’s a great way to have a quick look at what might be useful to your work.

You can see past years’ tips here: 2008 I, 2008 II, 2009 I, 2009 II, 2010 I, 2010 II, 2011 I, 2011 II, 2012 I, 2012 II, 2013 I, 2013 II, 2014 I, 2014 II, 2015 I.

July
July 1: MorphoGraphX, morphogenesis in 4D
July 8: PhenomeCentral
July 15: Introduction to the UCSC Genome Browser
July 22: PathWhiz for graphical appeal and computational readability
July 29: PathWhiz for Pathways, Part II

August
August 5: Araport, Arabidopsis Portal
August 12: World Tour of Genomics Resources, part II
August 19: gene.iobio for genome and variation browsing
August 26: Human Metabolome Database, HMDB

September
September 2: ENCODE Data Coordination Center, phase 3
September 9: UCSC features for ENCODE data utilization
September 16: BANDAGE for visualization of de novo assembly graphs
September 23: UCSC Xena System for functional and cancer genomics
September 30: Global Biotic Interactions database, GloBI

October
October 7: Weave, Web-based Analysis and Visualization Environment
October 14: 100,000 Genomes Project
October 21: PanelApp, from the 100000 Genomes Project
October 28: New Reactome Pathway Portal 3.0

November
November 4: RNACentral, wrangling non-coding RNA for simplifying access
November 11: UCSC Table Browser and Custom Tracks
November 18: Explore Gene Pages at NCBI with Variation and Expression Information
November 25: iDigBio for access to historical specimens and more

December
December 2: KBase, DOE’s Systems Biology Knowledgebase
December 9: Send UCSC Genome Browser sequence to external tools
December 16: Plant Reactome at Gramene
December 23: Video Tips of the Week, Annual Review 2015 (part 1)
December 30: [this post]

Video Tips of the Week, Annual Review 2015. Part 1

As you may know, we’ve been doing these video tips-of-the-week for eight years now. We have completed or collected around 400 little tidbit introductions to various resources through this past year, 2015. At first we had to do all of our own video intros, but as the movie technology became more accessible and more teams made their own, we were able to find a lot more that were done by the resource providers themselves. So we began to collect those as well. At the end of the year we’ve established a sort of holiday tradition: we are doing a summary post to collect them all. If you have missed any of them it’s a great way to have a quick look at what might be useful to your work.

You can see past years’ tips here: 2008 I, 2008 II, 2009 I, 2009 II, 2010 I, 2010 II, 2011 I, 2011 II, 2012 I, 2012 II, 2013 I, 2013 II, 2014 I, 2014 II.

January 2015:
January 7: PhosphoSitePlus, protein post-translational modifications
January 14: Genome assemblers and #Docker
January 21: GWATCH, for flying over chromosomes
January 28: Helium plant pedigree software, because “Plants are weird.”

February 2015:
February 4: COSMIC, Catalogue Of Somatic Mutations In Cancer
February 11: IntOGen, for Integrative OncoGenomics
February 18: RStudio as an interface for using R
February 25: CRISPRdirect for editing tools and off-target information

March 2015:
March 4: Beacon, to locate genome variants of potential clinical significance
March 11: Aquaria, streamlined access to protein structures for biologists
March 18: Designing proteins, using Rosetta
March 25: Protein structure information for public outreach. Really.

April 2015:
April 1: The New OpenHelix Interface
April 8: Jalview for multiple sequence alignment editing and visualization
April 15: Viewing Amino Acid info in the UCSC Genome Browser
April 22: TargetMine, Data Warehouse for Drug Discovery
April 29: Proband for pedigrees with your iPad

May 2015:
May 6: Human Phenotype Ontology, HPO
May 13: PhenogramViz for evaluating phenotypes and CNVs
May 20: NCBI Tree Viewer
May 27: PANDA (Pathway AND Annotation) Explorer for lists of genes

June 2015:
June 3: ClinGen, The Clinical Genome Resource
June 10: GenomeConnect, the ClinGen piece for patients
June 17: ZBrowse for GWAS viewing and exploration
June 24: Handy way to make citations quickly

More next week…. 

Video Tip of the Week: Plant Reactome at Gramene

Reactome is one of our favorite tools to explore pathways. And they cover a wide range of species, which is helpful for many researchers. But there are times when having a topic-specific tool can be even more useful for a research community. Sometimes these types of tools are disease-specific, or in the case of today’s Video Tip of the Week, plant-centric. If you haven’t seen it before, this introduction to Plant Reactome is worth exploring.

In this overview of Plant Reactome, Justin Preece describes the data model to help you understand the foundation of Reactome, and then quickly moves into how you can access this tool from the Gramene portal. There’s guidance on the organization of the interface, and how to access pathways you are interested in. Then there’s an explanation of their process of projecting pathways from well-characterized rice data onto other species that might be less well characterized. They have over 200 curated rice pathways and ongoing curation as well. There are dozens of curated Arabidopsis pathways too. There’s a bit about the move to the new Reactome platform, and the analysis tools available with that. An example with time course data illustrates how a series of data can be displayed. The talk also touches a little bit on the new main Reactome interface that we highlighted earlier.

Try it out.

Quick links:

Plant Reactome directly: http://plantreactome.gramene.org/

Gramene main site: http://www.gramene.org/

Reactome: http://www.reactome.org/

References:

Tello-Ruiz, M., Stein, J., Wei, S., Preece, J., Olson, A., Naithani, S., Amarasinghe, V., Dharmawardhana, P., Jiao, Y., Mulvaney, J., Kumari, S., Chougule, K., Elser, J., Wang, B., Thomason, J., Bolser, D., Kerhornou, A., Walts, B., Fonseca, N., Huerta, L., Keays, M., Tang, Y., Parkinson, H., Fabregat, A., McKay, S., Weiser, J., D’Eustachio, P., Stein, L., Petryszak, R., Kersey, P., Jaiswal, P., & Ware, D. (2015). Gramene 2016: comparative plant genomics and pathway resources Nucleic Acids Research DOI: 10.1093/nar/gkv1179

Fabregat, A., Sidiropoulos, K., Garapati, P., Gillespie, M., Hausmann, K., Haw, R., Jassal, B., Jupe, S., Korninger, F., McKay, S., Matthews, L., May, B., Milacic, M., Rothfels, K., Shamovsky, V., Webber, M., Weiser, J., Williams, M., Wu, G., Stein, L., Hermjakob, H., & D’Eustachio, P. (2015). The Reactome pathway Knowledgebase Nucleic Acids Research DOI: 10.1093/nar/gkv1351

Video Tip of the Week: Send UCSC Genome Browser sequence to external tools

The folks at the UCSC Genome Browser are always adding new features, new data, and new genomes to their site. And although they use the genome-announce mailing list to get the word out, even I can miss a notice. There was news recently of a new feature associated with the graphical genome browser that I’ve been waiting for (as I had tested it prior to roll-out), but they sent the email out when I was prepping for Thanksgiving, and I didn’t catch it for a few days.

But now I’ve had a chance to see it and kick the tires more, and I want to show you how it works. The basic feature is this: you can take the genomic sequence from the window you are viewing and, with a couple of clicks, deliver it right to another tool for more helpful data or analysis. This is really handy from the graphical viewer. Before, you were limited to obtaining the sequence with the “Get DNA” option. You could get the sequence, copy/paste, and do whatever. Which is fine, and might still be right for some other tools. But now you can skip that copy/paste part and explore features of your sequence directly. If you don’t have time for the video, here’s the short version: use the View -> In External Tools menu to get there.

View External Tools

There is a range of options. You can still jump right to NCBI Map Viewer or Ensembl to look at the same region in their browsers. But now you can select primers, see restriction sites, look at RNA characteristics, find protein domains, or examine CRISPR tools. You can look for transcription start sites or some TF binding motifs. Here’s this week’s Video Tip that shows the process.

You can still “get DNA” if you want to. And you can still use the Table Browser to send data from your custom queries to even more tools (like Galaxy and others). But we know from our workshops that most people we train spend their time in the graphic browser and these tools should help them accomplish more. You can see all of these options in our freely available training suites (linked below).

To see if there are other things you missed over the past year, check out the new NAR database issue paper from the UCSC team (linked below). It has updates about many of their accomplishments and recent features, so have a look and see if there are other useful aspects for assisting your research.

Quick links:

UCSC Genome Browser: http://genome.ucsc.edu/

Intro full training suite: http://openhelix.com/ucsc

Advanced topics training suite: http://openhelix.com/ucscadv

Reference:

Speir, M., Zweig, A., Rosenbloom, K., Raney, B., Paten, B., Nejad, P., Lee, B., Learned, K., Karolchik, D., Hinrichs, A., Heitner, S., Harte, R., Haeussler, M., Guruvadoo, L., Fujita, P., Eisenhart, C., Diekhans, M., Clawson, H., Casper, J., Barber, G., Haussler, D., Kuhn, R., & Kent, W. (2015). The UCSC Genome Browser database: 2016 update Nucleic Acids Research DOI: 10.1093/nar/gkv1275

Disclosure: UCSC Genome Browser tutorials are freely available because UCSC sponsors us to do training and outreach on the UCSC Genome Browser.