What’s the Answer? (what is a canonical transcript in UCSC?)

What is the rationale behind UCSC canonical transcripts?

Several good answers were given. The short of it, quoted from the UCSC mailing list: “The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA.”
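
To make that rule concrete, here is a minimal sketch of the selection logic in Python. It is not UCSC's code: the `Transcript` record, its field names, and the tie-breaking behavior (first one wins) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    # Hypothetical record; these field names are not from UCSC.
    name: str
    cdna_length: int  # full transcript length in bases
    cds_length: int   # coding sequence length; 0 means not translated


def pick_canonical(transcripts):
    """Apply the quoted rule: longest CDS if any transcript is translated,
    otherwise longest cDNA. Ties go to the first transcript (an assumption)."""
    if not transcripts:
        raise ValueError("need at least one transcript")
    coding = [t for t in transcripts if t.cds_length > 0]
    if coding:
        return max(coding, key=lambda t: t.cds_length)
    return max(transcripts, key=lambda t: t.cdna_length)


# Made-up transcripts for one gene.
gene = [
    Transcript("tx_a", cdna_length=3000, cds_length=900),
    Transcript("tx_b", cdna_length=2000, cds_length=1200),
    Transcript("tx_c", cdna_length=5000, cds_length=0),
]
print(pick_canonical(gene).name)
```

Note that the longest transcript overall (`tx_c`) loses here because it is not translated; the rule only falls back to cDNA length when no transcript has a CDS.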

If you want to retrieve the canonical transcripts, you can check out our tip from a little over a year ago, still valid :D.

Friday SNPpets

Tip of the Week: list of genes and DAVID

This week I’m returning to the exercise wherein I look at tools that analyze lists of genes. As before, I’m taking that list of genes I created some time ago. It was generated as a list of “disease” genes from UniProt. Today I’m taking that list to another resource: DAVID. DAVID is an unfortunate name to Google for this, but it stands for: Database for Annotation, Visualization and Integrated Discovery.

I have to say I was really impressed with the speed, ease, and results of this effort. The list uploaded easily, DAVID automatically detected the species options, and it was quick to set human as the focus. It offered 3 handy viewing option buttons right away, and provided informative output that would be really useful in further exploring my list. I had only chosen one of the possible options, with default settings. There’s a lot more you can do with DAVID and we cover more of that in our full tutorial. But this quick start movie shows you something of the process and the outcome.

DAVID: Database for Annotation, Visualization, and Integrated Discovery

Reviewed by Glynn Dennis, Jr, Brad T Sherman, Douglas A Hosack, Jun Yang, Wei Gao, H Clifford Lane, and Richard A Lempicki. Genome Biol. 2003; 4(9): R60.


Tip of the Week: GO EAST for GO terms

Well, actually, GOEAST young scientist. This week’s tip of the week builds on the data from my previous tip. I had generated a list of genes and I wanted to use that list at a variety of sites to analyze the features of my list. So this week I have tried that at GOEAST.

I used the first 1999 items in my UniProt disease list and uploaded them to GOEAST. The movie shows you the process and a quick look at the outcome.

This view is just a quick example of a basic list upload using their Batch tool. They say the Batch tool algorithm is somewhat different from the pre-loaded microarray gene analysis, because of the way the background is calculated. It outputs the GO terms and groups the genes that fit each GO term. There are a number of other features of GOEAST that look intriguing and helpful. It looks like they handle various microarray platforms easily. They have a variety of outputs (web, text file, graphical). I couldn’t always get the graphical output to work (probably because my list is too large). I also would have liked to view the list the other way–to start from my genes and see the terms associated with each one. So far I haven’t figured out whether there is a way to do that.
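
If you can export the term-by-term results as text, flipping them around yourself is not hard. Here is a rough sketch in Python, assuming you have already parsed the output into a mapping from each GO term to its genes. The function name and the placeholder IDs are made up, and this says nothing about GOEAST's actual file format.

```python
from collections import defaultdict


def invert_term_map(term_to_genes):
    """Turn {term: [genes]} into {gene: [terms]}, sorted for stable output."""
    gene_to_terms = defaultdict(set)
    for term, genes in term_to_genes.items():
        for gene in genes:
            gene_to_terms[gene].add(term)
    return {gene: sorted(terms) for gene, terms in sorted(gene_to_terms.items())}


# Placeholder terms and genes, not real annotations.
by_term = {
    "TERM_1": ["GENE_A", "GENE_B"],
    "TERM_2": ["GENE_B", "GENE_C"],
}
for gene, terms in invert_term_map(by_term).items():
    print(gene, terms)
```

Using a set per gene means a gene listed twice under the same term still shows that term only once.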

I’m not going to draw many conclusions from this yet–I want to try a variety of tools and think about the features and the quality of the results. But this tool seemed to effectively group my genes into buckets with GO terms that could be helpful for an analysis.

Zheng Q, Wang XJ. Nucleic Acids Res. 2008 May 16. GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. PMID: 18487275

Tip of the Week: List of disease genes

One of the most common questions we get when we are out doing software training is: what do I do with a list of genes? People generate lists from all sorts of biomedical research forays: microarray results, database searches, literature searches, library screens, etc. The source doesn’t matter much–in the end people have this list that they need to analyze, assess, categorize, group, filter, and manage.

We’ve been looking into some tools to accomplish this. We’ve already demonstrated a few of them (Reactome SkyPainter, Gene Ontology Term Enrichment, MatchMiner…). But there are more that I want to explore. What I decided to do was to create a standard list that I’m going to use to explore and evaluate different tools. Today the tip is where I got this list and how I created it. I want to be able to refer back to this list in the upcoming “list” tips, and thought that if I explained that first it would help.

So today’s tip is obtaining a list of disease genes from UniProt. Now, you could just go to UniProt yourself and get this handy list. But I show you how to get there starting from the UniProt homepage, and what I did to filter this list to a set of unique gene symbols for disease genes in Excel. I end up with ~2500 unique symbols for disease genes that will be the input for upcoming tips.
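
I did the filtering in Excel, but the same de-duplication step is easy to script. Here is a minimal sketch in Python, assuming you have the gene symbol column as a plain list of strings. The function name is made up, and treating symbols as case-insensitive duplicates is my assumption, not something UniProt dictates.

```python
def unique_symbols(raw_entries):
    """Drop blanks and duplicates, keeping the first-seen spelling and order."""
    seen = set()
    unique = []
    for entry in raw_entries:
        symbol = entry.strip()
        key = symbol.upper()  # compare case-insensitively (an assumption)
        if symbol and key not in seen:
            seen.add(key)
            unique.append(symbol)
    return unique


# A small made-up column with repeats, stray whitespace, and an empty cell.
column = ["BRCA1", "TP53", " BRCA1", "", "tp53", "EGFR"]
print(unique_symbols(column))
```

Keeping first-seen order makes it easy to compare the result against the original spreadsheet.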


Tip of the week: list of genes–>pathways

In this week’s tip I wanted to talk about a tool that offers a handy way to visualize the items in a list of genes that you might have on pathway diagrams. Reactome offers a tool called SkyPainter that allows you to enter a list of genes which is then analyzed statistically for genes in certain pathways. But then–and here’s the cool part–you also get a diagram of the pathways with the over-represented genes painted on a map of their pathway universe. See–SkyPainter. Anyway, it is a tool I have liked for years and I’ve been thinking a lot more about lists of genes and pathway representations. So I wanted to share that with you. This ~4 minute movie shows you how to access SkyPainter at Reactome and get started using it. Have fun!

Tip of the Week: Lists of genes–>Term Enrichment with GO

The question we probably hear the most from researchers is…what can I do with a giant list of genes to figure out what’s going on in there? And about once every 6 months this question comes across the Gene Ontology mailing list. This is followed by a flurry of developers who offer their cool tools for analysis purposes. There are actually quite a few different tools with different strategies out there, and they are designed for different purposes. In this tip I’m going to use the Gene Ontology consortium’s GO Term Enrichment tool as a primary example, but I’ll also point you to a list of other tools to try out.
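
To give a feel for what "enrichment" means numerically, here is a small sketch of a one-sided hypergeometric test, which is one common way to ask whether a term shows up in your list more often than chance would predict. I am not claiming this is what the GO Term Enrichment tool (or any other tool mentioned here) computes; the function name and the numbers are invented for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement from
    `population` genes, of which `annotated` carry the term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Invented numbers: 20000 background genes, 200 annotated to the term,
# a list of 500 genes, 15 of which carry the term.
p = enrichment_p_value(population=20000, annotated=200, sample=500, hits=15)
print(f"p = {p:.3g}")
```

The choice of background (`population`) matters a lot, and real tools also correct for testing many terms at once, which this sketch leaves out.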

Tip of the Week: Translating Gene IDs with MatchMiner

As we were creating a tutorial on the GoMiner resource from the Genomics and Bioinformatics group at the National Cancer Institute (NCI), we found another handy NCI tool named MatchMiner.

MatchMiner translates one type of gene ID into another type – essentially the genetic equivalent of Swahili-to-German translating software. In this tip I’ll show you how to do a translation on a list of genes, or a ‘Batch Lookup’.
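
Conceptually, a batch lookup is just a table lookup applied to every ID in your list, with the misses set aside. Here is a toy sketch in Python. It does not call MatchMiner or reproduce its behavior; the function name and the tiny symbol-to-Entrez-Gene-ID table are assumptions for illustration only.

```python
def batch_lookup(ids, table):
    """Translate each ID via `table`; return (translated, unmatched)."""
    translated = {}
    unmatched = []
    for gene_id in ids:
        key = gene_id.strip()
        if key in table:
            translated[key] = table[key]
        else:
            unmatched.append(key)
    return translated, unmatched


# Toy translation table (gene symbol -> Entrez Gene ID), hand-built for the demo.
symbol_to_entrez = {"TP53": "7157", "BRCA1": "672", "EGFR": "1956"}

found, missing = batch_lookup(["TP53", "BRCA1 ", "NOT_A_GENE"], symbol_to_entrez)
print(found)
print(missing)
```

Reporting the unmatched IDs separately is worth the extra few lines, since outdated or misspelled identifiers are common in real lists.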