I wish I had time to go into this paper in more detail, but I wanted to let you know that it is out there now. It came in my recent print issue of Nature Methods, and if I weren’t crazy busy on a very cool project that we hope to launch this week I’d go deeper…
The paper is: Literature-curated protein interaction datasets by Cusick et al. Nature Methods 6, 39–46 (2009) | doi:10.1038/nmeth.1284
I knew from the abstract that it was going to cause some conflama, and I was right. Soon after, an article in BioInform addressed some of the issues. It requires a subscription, but here’s the title and the link if you do have one: Study Finding Erroneous Protein-Protein Interactions in Curated Databases Stirs Debate, by Vivien Marx.
This paper gets at a question that people ask us all the time: how do I know which database to use for X purpose? So if your question is which database to use for protein interactions, you should read this paper and consider the points they make. They don’t compare all protein interaction databases, of course, but for those they do examine (IntAct, DIP, MINT) they provide informative comparisons that you should consider for any database. What does it contain? What is it missing? They have some nice Venn diagrams to illustrate the content. The one I used here is just a representation of that and does not attempt to be accurately proportional; go to the paper to see the real ones.
Our position is that you should use all of them, of course :) Project goals and funding issues, species specialties, scope…all of this affects what will be in a database. (In fact, please go to MINT and support their funding by signing their protest against the funding cuts.)
One point embedded in the paper caught my attention, though. One major curation issue was that the species designation of the proteins in the interactions was not clear. I know sometimes this is a problem with the original source paper. Sometimes it is a curation issue. But this worries me because of the concern I raised with Wikipedia gene entries. I made the point that there was no way to distinguish between human genes and mouse genes of the same name (MEF2/Mef2). This could be true of similar genes in other species too, where the gene might not even be the same gene, just a naming coincidence. I can see the issue has arisen again here. If we expect to rely more and more on Wikification projects like Gene Wiki, I think that will need to be addressed.
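To make the naming worry concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not how Gene Wiki or any of the databases actually store entries: the records and notes are made-up placeholders, and the helper function names are hypothetical. The only real-world values are the NCBI Taxonomy IDs for human (9606) and mouse (10090).

```python
from collections import defaultdict

# NCBI Taxonomy IDs: 9606 = Homo sapiens, 10090 = Mus musculus
HUMAN, MOUSE = 9606, 10090

# Hypothetical placeholder records: (taxon_id, gene_symbol, note)
records = [
    (HUMAN, "MEF2", "human entry (placeholder)"),
    (MOUSE, "Mef2", "mouse entry (placeholder)"),
]

def index_by_symbol(records):
    """Key on the case-folded symbol alone: species information is lost."""
    index = defaultdict(list)
    for _taxon_id, symbol, note in records:
        index[symbol.lower()].append(note)
    return dict(index)

def index_by_species_and_symbol(records):
    """Key on (taxon_id, symbol): each species keeps its own entry."""
    index = {}
    for taxon_id, symbol, note in records:
        index[(taxon_id, symbol.lower())] = note
    return index

by_symbol = index_by_symbol(records)
by_species = index_by_species_and_symbol(records)

# Symbol alone lumps both entries together under one key.
print(by_symbol["mef2"])
# Adding the species to the key keeps them apart.
print(by_species[(HUMAN, "mef2")])
print(by_species[(MOUSE, "mef2")])
```

The point is only that a name by itself is not a safe key; some explicit species identifier has to travel with it.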