Recently when I was adding videos to our SciVee collection, I noticed that there was a set of new videos about BindingDB. This database has been around for a long time, and I was surprised to realize that we hadn’t covered it yet. And it certainly only grows more important to understand proteins and their binding partners–whether they are other proteins or chemical compounds that can be important effectors of health and disease.
For a decade now this database has been curated and maintained to provide access to information from publications that is often not easily accessible. As their homepage says today:
BindingDB contains 832,773 binding data, for 5,765 protein targets and 362,123 small molecules.
That’s a lot of collected information for you to investigate. You can start with a protein of interest, or a compound, or a paper, and find related information from any of those points. There are various other tools and entry points as well.
In addition, it is integrated with many other key resources, including PDB and UniProt, MMDB and KEGG, and more. ChEMBL links offer a handy route to compounds.
You can see from their “News” that they are actively maintaining this site, and are developing new tools to offer users ways to interact with the data. But the newest feature seems to be their videos–I’ll let them show you more about how to use their site.
They offer several other quick tips on ways to interact–starting with an article and obtaining the data, and more. You can access them from the end of the video in the “Related” links, or explore their SciVee set. They are also found on the homepage of BindingDB right now. So check them out if you need protein binding data. They may have what you seek.
Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…
Wow: the little skate genome. 49 chromosomes, 59x coverage. Still tough to assemble: RT @Biomedical101: Assembling the Little Skate Genome: This past week I was visiting the University of Delaware to attend the 3rd S… http://bit.ly/kxC8BK [Mary]
RT @biogrid: BioGRID version 3.1.77 released with 2,890 new physical and genetic interactions. http://bit.ly/lsTeFY #bioinformatics #biology #biogrid [Mary]
Initiative to get scientists into schools to help teachers and improve science education, explained here: Those who can [Jennifer]
Does look cool: RT @hurstej: Awesome! I love LigerCat – this is a cool resource for genomic literature visualized from Medline http://bit.ly/flI6nF #bmispring2011 [Mary]
BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.
Protein interactions: If you are interested primarily in protein interactions, look first at the IMEx consortium; all of the databases that you mention are a part of it. Their interactions are available through PSICQUIC web services, described here: http://code.google.com/p/psicquic/
Some websites combine data from multiple molecular interaction databases, e.g., Pathway Commons and IRefWeb. I know you can download a combined dataset of interactions from Pathway Commons but I haven’t tried doing this with IRefWeb.
Genetic interactions: I think that BioGRID is the only major database that currently curates genetic interactions.
For this week’s tip of the week I’m going to introduce iRefWeb, a resource that provides thousands of data points on protein-protein interactions. If you follow this blog regularly, you may remember that we had a guest post from the iRefWeb team not too long ago. It was a nice overview of many of the important aspects of this tool, and I won’t go into those again here–you should check that out. Andrei knows those details quite well!
And at the time we also mentioned their webinar was coming up. We were unable to attend that, though, because we were doing workshops at The Stowers Institute. I was delighted to find that their webcast is now available to watch in full. It’s about 40 minutes long and covers much more than my 5-minute appetizer could do. It details many practical aspects of how to use iRefWeb effectively.
Because they’ve done all the prep work for me, I don’t need to spend much time on the structural and functional features here. What I would like to do is draw your attention to a different aspect of their work. Their project draws together protein interaction data from a variety of source databases–including some of our favorites such as MINT and IntAct (for which we have training suites available for purchase). They then used the iRefWeb processes and projects to evaluate and consider the issues around curation of protein-protein interaction data, and recently published those results. That’s what I’ll be focusing on in the post.
Every so often a database flame-war erupts in the bioinformatics community. Generally it involves someone writing a review of databases and/or their content. These evaluations are sometimes critical, sometimes not–but often what happens is that the database providers feel that their site is either misrepresented, or unfairly chastised, or at a minimum incompletely described in its mission and methods. I remember one flambé developed not too long ago around a paper by our old friend from our Proteome days–Mike Cusick–and his colleagues (and we talked about that here). As the OpenHelix team has been involved in plenty of software and curation teams, we know how these play out. And we have sympathy for both the authors and the database providers in these situations.
Anyway–it turns out not to need protective gear, in my opinion. Because their project brings data from several interaction database sources, they are well-positioned to collect information about the data to compare the data sets. They clearly explain their stringent criteria, and then look at the data from different papers as it is collected across different databases.
A key point is this:
On average, two databases curating the same publication agree on 42% of their interactions. The discrepancies between the sets of proteins annotated from the same publication are typically less pronounced, with the average agreement of 62%, but the overall trend is similar.
So although there is overlap, different databases have different data stored. This won’t be a surprise to most of us in bioinformatics. But I think it is something that end users need to understand. The iRefWeb team acknowledges that there are many sources of difference among data curation teams. Some curate only certain species. Some include all data from high-throughput studies, others take only high-confidence subsets of that data. And it’s fine for different teams to slice the data how they want. Users just need to be aware of this.
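To make those agreement figures a bit more concrete, here is a minimal sketch of how one might score the overlap between two databases that curated the same paper. It is an illustration only: the Dice-style overlap score, the function names, and the toy protein identifiers are all my assumptions, not the iRefWeb team's actual pipeline or data.

```python
# Illustrative sketch only: scores how well two curated sets agree.
# The Dice-style score and all names/data below are assumptions,
# not the method or data from the iRefWeb paper.

def dice_agreement(a, b):
    """Return 2*|A & B| / (|A| + |B|); 1.0 when both sets are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def interaction_agreement(pairs_a, pairs_b):
    """Agreement on interactions, treating (X, Y) and (Y, X) as the same."""
    norm_a = {frozenset(p) for p in pairs_a}
    norm_b = {frozenset(p) for p in pairs_b}
    return dice_agreement(norm_a, norm_b)


def protein_agreement(pairs_a, pairs_b):
    """Agreement on the proteins mentioned, ignoring who interacts with whom."""
    proteins_a = {protein for pair in pairs_a for protein in pair}
    proteins_b = {protein for pair in pairs_b for protein in pair}
    return dice_agreement(proteins_a, proteins_b)


# Hypothetical records from two databases curating the same publication.
db_a = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
db_b = [("P2", "P1"), ("P3", "P6")]

print(f"interaction agreement: {interaction_agreement(db_a, db_b):.2f}")
print(f"protein agreement:     {protein_agreement(db_a, db_b):.2f}")
```

In this toy case the two sources share only one interaction but three proteins, so the protein-level score comes out higher than the interaction-level one, which is the same direction as the trend in the quote.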
It seems that in general there’s more agreement between curators on non-vertebrate model organism data sets than there is for vertebrates. Isoform complexity is a major problem among the hairy organisms, it turns out–and this affects how the iRefWeb team scored the data sets. And as always when curation is evaluated–the authors of papers are sometimes found to be at fault for providing some vagueness to their data sets.
The iRefWeb tools offer you a way to assess what’s available from a given paper in a straightforward manner. In their webinar, you can hear them describe that ~30 minutes in. If you use protein-protein interaction data, you should check that out.
Caveat emptor for protein-protein interaction data (well, and all data in databases, really). But iRefWeb provides an indication of what is available and what the sources are–all of it traceable to the original papers.
The paper offers a nice overview of the issues, not specific criticism of any of the sources. They note the importance of the curation standards encouraged by the Proteomics Standards Initiative–Molecular Interaction (PSI-MI) ontologies and efforts. And they use their paper to raise awareness of where there may be dragons. It seems that dragons are quite an issue for multi-protein complex data.
Your mileage may vary. If you are a data provider, you may want to have protective gear for this paper. But as someone not connected directly to any of the projects, I thought it was reasonable. And something to keep in mind as a user of data–especially as more “big data” proteomics projects start rolling out more and more data.
Turinsky, A., Razick, S., Turner, B., Donaldson, I., & Wodak, S. (2010). Literature curation of protein interactions: measuring agreement across major public databases Database, 2010 DOI: 10.1093/database/baq026
Cusick, M., Yu, H., Smolyar, A., Venkatesan, K., Carvunis, A., Simonis, N., Rual, J., Borick, H., Braun, P., Dreze, M., Vandenhaute, J., Galli, M., Yazaki, J., Hill, D., Ecker, J., Roth, F., & Vidal, M. (2009). Literature-curated protein interaction datasets Nature Methods, 6 (1), 39-46 DOI: 10.1038/nmeth.1284
Our “What’s Your Problem” post will be transitioning to a “What’s the Answer” post this week and going forward. BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every week we will be highlighting one of those questions and answers here in this thread. You can still ask questions in this thread, or you can always join in at BioStar.
BioStar Question of the Week:
…What is a good ontology for experimental results … If i want to publish experimental results, preferably via RDFa using a standardized ontology what would be a good source to use. I am thinking of a triple such as:
Protein X — Interacts with — Protein Y
Where the ontology would spell out “Interacts with”.
I would recommend formatting your data using the IMEx (International Molecular Exchange Consortium) curation guidelines. This will allow you to submit your data easily to any of the participant databases (DIP, MINT, INTACT, etc). IMEx uses The PSI (Proteomics Standards Initiative) Molecular Interactions controlled vocabulary. There is a PSI-MI XML/CV validator here.
Check out the other answers, or provide one if you have insights into the problem.
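For readers who have not met triples before, here is a rough sketch of what the "Protein X, Interacts with, Protein Y" statement from the question could look like when written out as a single N-Triples line. Everything in it is a placeholder I made up for illustration: the example.org base URI, the identifiers, and the `interactsWith` predicate are not real PSI-MI terms, and a real submission would use the controlled vocabulary the answer recommends.

```python
# Illustrative sketch only: builds one N-Triples statement as plain text.
# BASE, the identifiers, and the predicate name are hypothetical placeholders,
# not terms from PSI-MI or any real ontology.

BASE = "http://example.org/"


def to_ntriple(subject, predicate, obj):
    """Format a subject-predicate-object statement as an N-Triples line."""
    return f"<{BASE}{subject}> <{BASE}{predicate}> <{BASE}{obj}> ."


line = to_ntriple("ProteinX", "interactsWith", "ProteinY")
print(line)
```

The point of a shared ontology is that the middle term stops being a label I invented and becomes an identifier that every database reads the same way.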
We’ve long been fans of the tools developed by the team responsible for MINT: Molecular INTeraction database. MINT is a curated resource full of experimentally verified protein-protein interactions, with some great visualization options. In addition to the main MINT interface, there are other aspects to the site that bring other types of visualization as well. We have done a tip on MINT in the past, but we wanted to re-visit this for our SciVee collection, and also mention a handy tool called Connect. Connect can be used to enter a list of up to ~100 proteins and generate the connection map between them.
HomoMINT: this tool extends the experimentally-verified interaction collection to include inferred interactions for human, based on data from model organisms. So this is homologous interactions, hence the name….
Domino: a look at the domains that are involved in the protein-protein interactions.
VirusMINT: this aspect of MINT explores viral proteins, including how they interact with host proteins to disrupt host physiology.
For this week’s tip I’ll focus mainly on the experimentally-verified portion of MINT and that interface, and introduce the others. You’ll see how to do a quick search, explore protein details, and then load up the network in the visualization tool. We have a full tutorial on MINT available to subscribers who want to go deeper into the functionality–we can only barely touch on the features within our screencast movie time limit.
The MINT or Molecular Interaction database for examination of protein interaction networks has long been a favorite tool of mine. The regular “flavor” of MINT includes over 100,000 interactions with a focus on experimentally verified protein interaction data. But recently I became aware of the VirusMINT data that is now available as well.
The VirusMINT paper describes the initial emphasis on medically relevant viruses for their curation efforts, and how the work differs from efforts like this PLoS Pathogens paper and the individual virus sites like NCBI’s HIV Interactions collection and the PIG (Pathogen Interaction Gateway) site.
Manual curation of data is labor-intensive, but I really appreciate the quality of that data. Some of the data they curated themselves, and some was downloaded from existing curated sites. Once at the site for VirusMINT, it is really simple to load up a virus network by simply clicking on a virus button, and then the proteins load and generate a network interaction group. The proteins are clickable and you can find out more about the proteins and their sources, and domain information if that is available. You can also click on the numbers between the interactions to find out which paper provided the interaction data and link quickly to PubMed from there. And not only can you interact with the data using the MINT software framework, but you can download the data and use it in other tools as well.
This brief Tip-of-the-Week introduces a few of the basic features of VirusMINT. We have additional details about how to interact with the software in our full MINT tutorial.
Chatr-aryamontri, A., Ceol, A., Peluso, D., Nardozza, A., Panni, S., Sacco, F., Tinti, M., Smolyar, A., Castagnoli, L., Vidal, M., Cusick, M., & Cesareni, G. (2009). VirusMINT: a viral protein interaction database Nucleic Acids Research, 37 (Database) DOI: 10.1093/nar/gkn739
Catching up on some reading I came across a new database topic this week–MatrixDB. The goal of MatrixDB is to capture information about interactions involving extracellular molecules. I have always been a fan of extracellular matrix, but thought it was a pretty murky topic. I like the idea of a database devoted to this type of data.
There are entries for a given gene with details, of course. But there are also options to launch Cytoscape for building your own interactomes with the data.
We are in the process of developing a suite of tutorials on interactions software, so we’ll have a lot more to say about that over the next few months. But today I wanted to touch on an interactions tool embedded in a model organism browser. When we were developing the WormBase tutorial we found this fun tool–N-Browse–linked right from the gene pages. (N-Browse is also a stand-alone tool, but here I’ll just show the WormBase version). In this 3 minute movie I introduce you to where to find N-Browse in WormBase and show you how you can learn about interactions with your gene of interest right from the gene pages.