What’s the Answer? (Gene ID conversion)

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s highlighted question:

What is a good “gene ID conversion tool”?

This is an older question, from 2 years ago, but it is still relevant, and the answers are still quite helpful and full of resources such as DAVID, BioDBnet, BioMart and others.

Check it out. You might also want to check out the third exercise of our UCSC Advanced Tutorial. The exercise:

“From a list of UCSC genes, add gene symbols and GO IDs for additional information about the gene set. Bonus step: add GO terms.”

It walks through how you might do this with the UCSC Table Browser with some simple modifications.

Tip of the Week: Prioritizing Genes

Many types of experiments today return large lists of genes: association studies, expression arrays, linkage analysis and more. The researcher needs to determine which of those genes are the most interesting and promising, so the next step in the analysis is to prioritize the list and to find a method for doing so.

There are a lot of methods and tools to prioritize a list of genes, and getting a handle on which tool to use can be a bit of a daunting task. The Gene Prioritization Portal is an excellent resource to find the right tool. It’s a bit more than just a database of databases or tools. It’s a regularly updated list with detailed information about the tools (there are 25 at the moment), stats about what the data sources of the tools are, the outputs and references. There is also a nice search tool to find the tool that best fits your needs.

Today’s tip will introduce the site and perform a quick search. Future tips might highlight some of these tools.

Gene ID converters compared

From my HUM-MOLGEN mailing list newsletter today I spotted an interesting comparison. We get a lot of questions about how to convert IDs or how to best move from one data source to another. We’ve done some explorations of that in the past (MatchMiner is one example). This is not the sort of sexy thing that gets published in the literature in general, but it is a really nice thing for the informal literature system of the newsletter/blogosphere/etc world.

Diego Forero, an editor of HUM-MOLGEN, has assembled a comparison of several tools: Babelomics, Clone/ID converter, DAVID, g:Profiler, MatchMiner. He started with a list of 100 Ensembl IDs and tested them on each of the tools to get the HUGO official nomenclature. (He does note that plenty of other conversions are also possible among Ensembl, HGNC, EntrezGene, RefSeq and UniGene, but Ensembl to HGNC was the test performed.) There was a second test on Affymetrix IDs to HUGO symbols too. The references for the tools are also provided.
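A comparison like this boils down to counting, for each tool, how many of the input IDs came back with a symbol. Here is a minimal Python sketch of that tally. It is not Forero’s actual procedure: the tool names (tool_a, tool_b), the conversion_rate helper and the sample results are all hypothetical, and ENSG00000000000 is a made-up placeholder standing in for an ID that no tool recognizes.

```python
def conversion_rate(input_ids, mapping):
    """Fraction of input IDs for which a tool returned a non-empty symbol."""
    converted = [gene_id for gene_id in input_ids if mapping.get(gene_id)]
    return len(converted) / len(input_ids)


# Input list; the last entry is a made-up placeholder ID.
input_ids = [
    "ENSG00000141510",
    "ENSG00000012048",
    "ENSG00000139618",
    "ENSG00000000000",
]

# Hypothetical converter outputs (input ID -> symbol), for illustration only.
results = {
    "tool_a": {
        "ENSG00000141510": "TP53",
        "ENSG00000012048": "BRCA1",
        "ENSG00000139618": "BRCA2",
    },
    # An empty string means the tool returned the ID without a symbol.
    "tool_b": {"ENSG00000141510": "TP53", "ENSG00000012048": ""},
}

rates = {tool: conversion_rate(input_ids, m) for tool, m in results.items()}

# Print the tools from highest to lowest conversion rate.
for tool, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tool}: {rate:.0%}")
```

Note that this only measures how many IDs were converted, not whether the symbols returned are the right ones.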

The data is available on Scribd and you can download it yourself.  You can access the IDs and test other tools too.  Here is a sample of the outcome:


Babelomics did the best in this test. Now, I have a separate question: are they right? Just because a program provides an ID doesn’t mean it gave the right one. This is a problem I’ve seen over and over in this field. In my experience most stuff needs to be checked by humans. I remember one meeting where someone was describing a new tool that represented splice variants. We were all impressed, it sounded great, and then I raised my hand to ask: “But are they right?” and the tool developer said, “I don’t know.”

Still, it is a useful exercise to compare these tools.  And it is a great list to bookmark.  But keep that in mind.
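One cheap first pass at the “are they right?” question is to look at where the tools disagree with each other, since those are the IDs most worth a human look. The Python sketch below does that under stated assumptions: find_disagreements is a hypothetical helper, and the tool names, IDs and symbols are placeholders rather than real results. Agreement between tools does not prove a conversion is correct, so this narrows the manual checking rather than replacing it.

```python
from collections import defaultdict


def find_disagreements(results):
    """Return {input_id: sorted symbols} for IDs where tools gave different symbols."""
    symbols_by_id = defaultdict(set)
    for mapping in results.values():
        for gene_id, symbol in mapping.items():
            if symbol:  # skip IDs a tool failed to convert
                # Compare case-insensitively so "sym_a" and "SYM_A" count as equal.
                symbols_by_id[gene_id].add(symbol.upper())
    return {gid: sorted(syms) for gid, syms in symbols_by_id.items() if len(syms) > 1}


# Hypothetical outputs; IDs and symbols are placeholders, not real results.
results = {
    "tool_a": {"ID_1": "SYM_A", "ID_2": "SYM_B", "ID_3": "SYM_D"},
    "tool_b": {"ID_1": "sym_a", "ID_2": "SYM_C", "ID_3": ""},
}

to_check = find_disagreements(results)
print(to_check)  # only ID_2 is flagged for manual review
```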

Forero’s ID converter tool comparison direct link: http://hum-molgen.org/NewsGen/08-2009/000020.html

Tip of the Week: list of genes and DAVID

This week I’m returning to the exercise wherein I look at tools that analyze lists of genes. As before, I’m taking that list of genes I created some time ago. It was generated as a list of “disease” genes from UniProt. Today I’m taking that list to another resource: DAVID. DAVID is an unfortunate name to Google for this, but it stands for: Database for Annotation, Visualization and Integrated Discovery.

I have to say I was really impressed with the speed, ease, and results of this effort. The list uploaded easily, DAVID automatically detected the species options, it was quick to set human as the focus, it offered 3 handy viewing option buttons really quickly, and it provided informative output that would be really useful in further exploring my list. I had only chosen one of the possible options with default settings. There’s a lot more you can do with DAVID and we cover more of that in our full tutorial. But this quick start movie shows you something of the process and the outcome.

The citation for DAVID is:

DAVID: Database for Annotation, Visualization, and Integrated Discovery

