In today’s HUM-MOLGEN mailing list newsletter I spotted an interesting comparison. We get a lot of questions about how to convert IDs or how best to move from one data source to another. We’ve done some explorations of that in the past (MatchMiner is one example). This is not the sort of sexy thing that generally gets published in the literature, but it is a really nice fit for the informal literature system of the newsletter/blogosphere world.
Diego Forero, an editor of HUM-MOLGEN, has assembled a comparison of several tools: Babelomics, Clone/ID converter, DAVID, g:Profiler, and MatchMiner. He started with a list of 100 Ensembl IDs and ran them through each of the tools to get the official HUGO nomenclature. (He does note that plenty of other conversions are possible among Ensembl, HGNC, EntrezGene, RefSeq, and UniGene, but Ensembl to HGNC was the test performed.) There was a second test converting Affymetrix IDs to HUGO symbols too. The references for the tools are also provided.
The data is available on Scribd and you can download it yourself. You can access the IDs and test other tools too. Here is a sample of the outcome:
In this experiment Babelomics did the best. Now, I have a separate question: are they right? Just because a program provides an ID doesn’t mean it gave the right one. This is a problem I’ve seen over and over in this field. In my experience most stuff needs to be checked by humans. I remember being in a meeting where someone was describing a new tool that represented splice variants. We were all impressed, it sounded great, and then I raised my hand to ask: “But are they right?” and the tool developer said, “I don’t know.”
Still, it is a useful exercise to compare these tools. And it is a great list to bookmark. But keep in mind that getting an answer from a tool is not the same as getting the right answer.
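If you download the IDs and run them through a few converters yourself, one way to act on that caveat is to line the outputs up side by side and see where the tools disagree. Here is a minimal Python sketch of that idea. Everything in it is made up for illustration: the function name compare_conversions, the tool names tool_a and tool_b, and the placeholder IDs and symbols are not from Forero's data or from any of the tools above. Note that agreement between tools does not prove a symbol is correct; this only narrows down which IDs a human should look at first.

```python
def compare_conversions(input_ids, results):
    """Summarize converter output.

    input_ids: list of source IDs that were submitted to every tool.
    results:   dict of tool name -> dict of source ID -> returned symbol
               (an ID is simply absent if the tool returned nothing).
    Returns (coverage, needs_review).
    """
    # Fraction of input IDs for which each tool returned some symbol.
    coverage = {
        tool: sum(1 for i in input_ids if mapping.get(i)) / len(input_ids)
        for tool, mapping in results.items()
    }

    # IDs where the tools that answered did not all give the same symbol,
    # or where no tool answered at all. These go to a human for checking.
    needs_review = {}
    for i in input_ids:
        answers = {tool: mapping.get(i) for tool, mapping in results.items()}
        distinct = {symbol for symbol in answers.values() if symbol}
        if len(distinct) != 1:
            needs_review[i] = answers
    return coverage, needs_review


# Placeholder data, invented purely to show the shape of the inputs.
input_ids = ["ID1", "ID2", "ID3"]
results = {
    "tool_a": {"ID1": "GENE_A", "ID2": "GENE_B", "ID3": "GENE_C"},
    "tool_b": {"ID1": "GENE_A", "ID2": "GENE_X"},
}

coverage, needs_review = compare_conversions(input_ids, results)
for tool, fraction in coverage.items():
    print(f"{tool}: converted {fraction:.0%} of IDs")
for source_id, answers in needs_review.items():
    print(f"check by hand: {source_id} -> {answers}")
```

In this toy data, ID2 gets flagged because the two tools return different symbols, while ID3 is not flagged even though only one tool answered. That is a design choice: missing answers show up in the coverage number, and outright conflicts show up in the review list.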
Forero’s ID converter tool comparison direct link: http://hum-molgen.org/NewsGen/08-2009/000020.html