In my continuing quest to be prepared for the upcoming ScienceOnline2011 conference, I happened upon a great post by Antony Williams at ChemConnector. Apparently he has started a project with a number of colleagues to compare and evaluate a variety of cheminformatics resources. Several people followed up with posts, tweets or comments on his use of the word “trust”, but this is not the aspect of the post that I found most interesting. Instead I was interested in 1) his project to compare resources, 2) his foresight to explain his initial assumptions, and 3) his actual (albeit fairly incomplete) findings. I also liked the list of resources tested because, like many others that he cites, there were some that I hadn’t run across before. (How wonderful, new sites to play with & learn… but that will be for another post.) His description of how his assumptions about which resources were “trustworthy” sometimes conflicted with his concrete data on their “trustworthiness” runs parallel to some of my own experiences. As we (at OpenHelix) create trainings on resources, we often find “unexpected gems” and occasionally find “limitation surprises” at well-known & respected resources.
He offers a chart of his findings from the first 10 “test” chemicals that he searched at each site. If you look across & up & down, you will notice that there are no “perfect” (check marks only) resources or chemicals. Looking in multiple resources for the same chemical by & large results in a greater likelihood of correct & “complete” coverage, and mistakes can be found in even the best of resources. This reminds me of an NCBI talk given by Jim Ostell for the 25th anniversary of GenBank on the process of integrating data across 3 national databases (GenBank, EMBL and DDBJ): by comparing their data, each database found errors that it had not previously been aware of & which it was then able to fix.
These chart results also display something that we here at OpenHelix try to stress as we teach researchers to use public resources: every resource has its own set of benefits and limitations. By understanding your specific research needs and having a basic understanding of what several different resources in your field offer, you can be the most efficient and effective in selecting & using the best resource, & thereby in accomplishing your research. This is one reason that at OpenHelix we don’t just offer training on one variation database, but on over 10. We don’t just offer materials on one genome browser, but on several. We create categories of tutorials as well as overview comparisons of one type or another. There are SO many amazingly useful databases and resources being created by expert, hard-working, well-meaning curators and developers. Always using the same “go to” resource is limiting & narrow-minded…
Oops, apparently I’ve wandered onto my soapbox again. I’ll step down (for today) and just suggest that you check out Antony’s post (and, presumably, future posts with additional comparison data) and the resources that he mentions, especially if you have an interest in cheminformatics, chemicals or chemistry – and who doesn’t? :)