Guest Post: iRefWeb — Andrei Turinsky
This next post in our continuing semi-regular Guest Post series is from Andrei Turinsky, one of the developers of iRefWeb. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com or the contact form (write ‘guest post’ as subject heading). We welcome introductions to your resource, information on updates, highlights of little known gems or opinion pieces on the state of genomic research and databases.
What is iRefWeb?
Protein-protein interactions (PPI) have become an important tool in biomedical research. Yet the PPI data for a specific organism tend to be distributed over a number of different databases. Comparison and integration of PPI information across databases remains a challenging task.
iRefWeb (Turner et al. (2010) Database, Vol. 2010, Article ID baq023) is a web interface to a broad integrated landscape of protein-protein interactions (PPIs). For a given gene or protein, you can access all PPI records and protein complexes, consolidated non-redundantly from ten major public databases: BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPact, MPPI and OPHID. iRefWeb also presents various supporting evidence, helping you to gauge the reliability of an interaction. Versatile search filters allow you to retrieve the PPIs with a given level of support. Other features facilitate the analysis of possible inconsistencies across PPI data and the examination of PPI statistics. The data consolidation procedure combines redundant records using the iRefIndex process (Razick et al. (2008) BMC Bioinformatics 9, 405).
Figure 1: The iRefIndex process aggregated 916,059 original PPI records from source databases, 75% of which were redundant. The consolidation merged the redundant PPIs, reducing their number four-fold (orange). Only 232,612 PPIs were non-redundant (blue).
Figure 2: The size of the source databases and the overlap between them. Diagonal table cells show the number of curated PPI records at each database, whereas non-diagonal cells show the overlap in interactions between different pairs of databases (according to the color-coded scale shown). Large databases typically share tens of thousands of interactions, which require consolidation. Grey cells indicate no overlap in PPIs.
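To make the layout of Figure 2 concrete, here is a minimal Python sketch of how such a size-and-overlap table could be computed. It is not the iRefWeb implementation: the database names `DB1` to `DB3`, the protein identifiers and the helper names `pair` and `overlap_table` are all made up for illustration, and each interaction is simplified to an unordered pair of proteins.

```python
from itertools import combinations


def pair(a, b):
    """Represent an interaction as an unordered pair, so (A, B) equals (B, A)."""
    return frozenset((a, b))


# Toy data (hypothetical): each database maps to the set of PPIs it records.
records = {
    "DB1": {pair("P1", "P2"), pair("P1", "P3"), pair("P2", "P4")},
    "DB2": {pair("P2", "P1"), pair("P2", "P4"), pair("P5", "P6")},
    "DB3": {pair("P7", "P8")},
}


def overlap_table(records):
    """Diagonal cells hold database sizes; off-diagonal cells hold shared PPIs."""
    names = sorted(records)
    table = {(name, name): len(records[name]) for name in names}
    for a, b in combinations(names, 2):
        shared = len(records[a] & records[b])
        table[(a, b)] = table[(b, a)] = shared
    return table


table = overlap_table(records)
for (row, col), count in sorted(table.items()):
    print(row, col, count)
```

In this toy example, a zero in an off-diagonal cell corresponds to a grey cell in the figure.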
Basic features: retrieval and visualization of interactions
What interactions are known for my genes or proteins? You are often interested in a specific gene and its protein products, trying to understand what information is already available and what is novel. iRefWeb allows you to search the available interactions for any protein of interest, using its UniProt ID or UniProt accession, or one of its other aliases, or its corresponding Entrez Gene ID (or a list of IDs for batch search). The search returns a color-coded summary table, showing the protein’s interacting partners and their organisms of origin, the number of publications that support each of the interactions, the number of databases that recorded it, and whether the interaction was detected in any low-throughput or high-throughput studies.
What type of interaction is this and how was it detected? Each individual iRefWeb interaction record contains further details, such as the interaction type (e.g. “physical association” or “genetic interaction”) and the detection method used in each original experiment (e.g. “affinity chromatography technology”). These terms were provided by the original database curators and are standardized according to the PSI-MI ontology. They reveal the precise nature of the interaction, and whether it was detectable under different experimental conditions, which may help assess its reliability. For additional supporting evidence, iRefWeb includes web links to the PubMed publications from which the PPI was curated, as well as to the original curation records in the PPI databases.
Can I visualize interactions? For each protein, iRefWeb provides a graph of its interaction neighborhood, including other interactors and multi-protein complexes. The graph is web-based, interactive and customizable, and also displays disease associations (for human proteins) and protein domain composition where available. It is implemented using Cytoscape Web.
Can I download interactions? iRefWeb search results may be downloaded as a tab-delimited table in several formats, including the PSI-MITAB format.
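A downloaded tab-delimited file can be read with a few lines of Python. The sketch below assumes the 15 columns of the standard PSI-MITAB 2.5 layout; the file exported by iRefWeb may contain additional columns, which this sketch would simply ignore. The column keys, the function name `read_mitab` and the sample record (identifiers, author, PubMed IDs) are invented for illustration.

```python
import csv
import io

# The 15 columns of the standard PSI-MITAB 2.5 layout, in order.
MITAB_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_method", "first_author", "publication_ids",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_ids", "confidence",
]


def read_mitab(handle):
    """Parse MITAB lines into dictionaries keyed by column name."""
    # QUOTE_NONE keeps the double quotes that MITAB uses inside fields.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        row = dict(zip(MITAB_COLUMNS, fields))
        # Multiple values within one field are separated by "|".
        row["publication_ids"] = row["publication_ids"].split("|")
        rows.append(row)
    return rows


# A made-up record; "-" marks an empty field.
sample_fields = [
    "uniprotkb:P00001", "uniprotkb:P00002", "-", "-", "-", "-",
    'psi-mi:"MI:0004"(affinity chromatography technology)',
    "Doe J (2005)", "pubmed:00000001|pubmed:00000002",
    "taxid:4932(Saccharomyces cerevisiae)",
    "taxid:4932(Saccharomyces cerevisiae)",
    'psi-mi:"MI:0915"(physical association)',
    'psi-mi:"MI:0469"(IntAct)', "intact:EBI-0000000", "-",
]
sample_text = "#header line\n" + "\t".join(sample_fields) + "\n"

rows = read_mitab(io.StringIO(sample_text))
print(rows[0]["id_a"], rows[0]["id_b"], rows[0]["publication_ids"])
```

To read a real download, replace the `io.StringIO` object with an open file handle.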
Data filters and supporting evidence
The iRefWeb Search provides a range of filtering options, so you can retrieve only the interactions of a specific type or with specific supporting evidence. These capabilities can be readily used to extract organism-specific interactomes from the consolidated data, subject to specified constraints.
For example, you might wish to search for interactions in budding yeast (S. cerevisiae) that are physical interactions (i.e. excluding genetic interactions); are supported by at least two publications; involve no other organisms besides budding yeast (e.g. no fission yeast S. pombe); and were detected in at least one low-throughput study. The latter category may be defined as publications reporting no more than 10 interactions. These options can be further combined with the selection of PPIs detected by specific methods (e.g. tandem affinity purification or affinity chromatography), or with a large variety of other filter combinations.
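If you prefer to apply the same kind of filter to downloaded data, the yeast example above could be expressed roughly as follows. This is only a sketch under stated assumptions: the `Interaction` record, its field names and the toy data are hypothetical and do not reflect how iRefWeb stores or filters its records. The taxonomy IDs are the NCBI identifiers for S. cerevisiae (4932) and S. pombe (4896).

```python
from dataclasses import dataclass

S_CEREVISIAE = 4932       # NCBI Taxonomy ID, budding yeast
S_POMBE = 4896            # NCBI Taxonomy ID, fission yeast
LOW_THROUGHPUT_MAX = 10   # a publication with at most 10 PPIs is low-throughput


@dataclass
class Interaction:
    """Hypothetical record layout, used only for this illustration."""
    proteins: tuple         # interactor identifiers
    taxids: tuple           # organism of each interactor
    interaction_type: str   # e.g. "physical association"
    pub_sizes: dict         # PubMed ID -> number of PPIs that publication reports


def keep(ix):
    """Apply the four criteria from the budding yeast example."""
    is_physical = ix.interaction_type != "genetic interaction"
    enough_publications = len(ix.pub_sizes) >= 2
    yeast_only = all(taxid == S_CEREVISIAE for taxid in ix.taxids)
    has_low_throughput = any(
        size <= LOW_THROUGHPUT_MAX for size in ix.pub_sizes.values()
    )
    return is_physical and enough_publications and yeast_only and has_low_throughput


# Made-up records: only the first one satisfies all four criteria.
interactions = [
    Interaction(("A", "B"), (S_CEREVISIAE, S_CEREVISIAE),
                "physical association", {"pm1": 3, "pm2": 5000}),
    Interaction(("A", "C"), (S_CEREVISIAE, S_CEREVISIAE),
                "genetic interaction", {"pm1": 3, "pm3": 8}),
    Interaction(("A", "D"), (S_CEREVISIAE, S_POMBE),
                "physical association", {"pm1": 3, "pm4": 2}),
    Interaction(("B", "C"), (S_CEREVISIAE, S_CEREVISIAE),
                "physical association", {"pm2": 5000, "pm5": 1200}),
    Interaction(("B", "D"), (S_CEREVISIAE, S_CEREVISIAE),
                "physical association", {"pm6": 4}),
]

selected = [ix for ix in interactions if keep(ix)]
print([ix.proteins for ix in selected])
```

Each criterion is a separate boolean, so further constraints (for instance a specific detection method) could be added as one more condition.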
Advanced features: annotation statistics and discrepancies
If you are interested in the origins and the consistency of the iRefWeb data, our resource provides several unique options.
Which databases contributed to the PPI landscape? The iRefWeb Statistics page presents various summaries of the data across the source databases. These include the number of organism-specific interactions, proteins, and cited publications; the number of shared vs. unique interactions, proteins and publications; the overlap statistics; and more.
Was my interaction annotated consistently across databases? Whenever two databases curate the same publication, they often record the resulting PPIs differently. iRefWeb’s PubMed Report and PubMed Details pages present a visual comparison of information extracted from each curated publication by different databases. In this way you can determine which database annotated your interaction and how.
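The idea behind such a per-publication comparison can be sketched in a few lines of Python. This is an illustration only, not the code behind the PubMed Report pages: the tuple layout, the database names `DB1` and `DB2`, the identifiers and the function `compare_publication` are all made up.

```python
from collections import defaultdict

# Made-up curation records: (publication, database, protein pair, interaction type).
curated = [
    ("PMID:1", "DB1", ("P1", "P2"), "physical association"),
    ("PMID:1", "DB2", ("P2", "P1"), "direct interaction"),
    ("PMID:1", "DB2", ("P1", "P3"), "physical association"),
    ("PMID:2", "DB1", ("P4", "P5"), "physical association"),
]


def compare_publication(curated, pubmed_id):
    """For one publication, map each protein pair to {database: interaction type}."""
    by_pair = defaultdict(dict)
    for pmid, database, proteins, interaction_type in curated:
        if pmid == pubmed_id:
            # frozenset makes (P1, P2) and (P2, P1) the same interaction
            by_pair[frozenset(proteins)][database] = interaction_type
    return by_pair


report = compare_publication(curated, "PMID:1")
for proteins, annotations in report.items():
    print(sorted(proteins), annotations)
```

In this toy example, the pair P1/P2 was curated by both databases but with different interaction types, while P1/P3 was recorded by only one of them.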
How were the interactions from multiple databases consolidated? Different databases use different protein representations to record the same PPI. For each interaction record, iRefWeb provides full disclosure of the various protein IDs and aliases used by different curators, as well as the process by which they were mapped into a non-redundant canonical-isoform representation. Such information makes transparent the data consolidation steps used to establish the identity of the interaction partners and to harmonize them across databases.
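The general principle of this kind of consolidation can be shown with a simplified Python sketch. It is not the iRefIndex procedure itself, which is described in Razick et al. (2008): here the mapping table `CANONICAL`, all identifiers and the function `consolidate` are hypothetical, and the mapping to canonical proteins is assumed to be already available.

```python
# Hypothetical alias table: source identifier -> canonical protein ID.
CANONICAL = {
    "uniprotkb:P1": "CANON_1",
    "refseq:NP_1": "CANON_1",
    "uniprotkb:P1-2": "CANON_1",   # an isoform mapped to its canonical form
    "uniprotkb:P2": "CANON_2",
    "entrezgene:22": "CANON_2",
    "uniprotkb:P3": "CANON_3",
}

# Made-up source records: (database, interactor A, interactor B).
source_records = [
    ("DB1", "uniprotkb:P1", "uniprotkb:P2"),
    ("DB2", "entrezgene:22", "refseq:NP_1"),
    ("DB3", "uniprotkb:P1-2", "uniprotkb:P2"),
    ("DB1", "uniprotkb:P1", "uniprotkb:P3"),
]


def consolidate(records, canonical):
    """Merge records that describe the same pair of canonical proteins."""
    merged = {}
    for database, a, b in records:
        # Sorting makes the key independent of interactor order.
        key = tuple(sorted((canonical[a], canonical[b])))
        entry = merged.setdefault(key, {"sources": set(), "original_ids": set()})
        entry["sources"].add(database)
        entry["original_ids"].update((a, b))  # keep provenance of the merge
    return merged


merged = consolidate(source_records, CANONICAL)
for key, entry in merged.items():
    print(key, sorted(entry["sources"]), sorted(entry["original_ids"]))
```

Keeping the original identifiers next to each merged record is what allows the consolidation to remain transparent: you can always see which source IDs were treated as the same protein.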
The rich graphical and data-filtering features of iRefWeb provide a valuable snapshot of the known PPI landscape across different organisms and databases, and facilitate the generation of meaningful organism-specific interactomes. The resource is freely available at http://wodaklab.org/iRefWeb/.
On Thursday, December 9, 2010, at 2 PM EST, the Ontario Genomics Institute (OGI) and The Centre for Applied Genomics (TCAG) are hosting a one hour web conference/webinar about iRefWeb. To register go here: https://ogi.factorial.ca/Forms/fm_forms.jsp?token=HwkGSRoGZl5aSxdR
By Dr. Andrei Turinsky, researcher at the Molecular Structure and Function Program, Research Institute of the Hospital for Sick Children, Toronto, Canada.
Turner, B., Razick, S., Turinsky, A. L., Vlasblom, J., Crowdy, E. K., Cho, E., Morrison, K., Donaldson, I. M., & Wodak, S. J. (2010). iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database, 2010. DOI: 10.1093/database/baq023
Razick, S., Magklaras, G., & Donaldson, I. (2008). iRefIndex: A consolidated protein interaction database with provenance. BMC Bioinformatics, 9 (1). DOI: 10.1186/1471-2105-9-405