Tip of the Week: MetaPhOrs, orthology and paralogy predictions

The researchers and developers at PhylomeDB haven’t rested on their laurels. I did a tip of the week on PhylomeDB three months ago, and when I checked back there recently I found the team had created another useful database and analysis tool, MetaPhOrs. What is MetaPhOrs? To quote from the homepage:

MetaPhOrs is a public repository of phylogeny-based orthology and paralogy predictions that were computed using resources available in seven popular homology prediction services (PhylomeDB, EnsemblCompara, EggNOG, OrthoMCL, COG, Fungal Orthogroups, and TreeFam).

The research article on their methodology, published in NAR (online 12/10), will give you a better understanding of how these orthology and paralogy predictions are made. Basically, MetaPhOrs draws on phylogeny-based orthology and paralogy predictions from several sources. These sources overlap:

Since many of these repositories overlap, partially, in terms of genomes covered, it is often the case that phylogenetic information regarding a pair of proteins can be found in several databases.

Moreover, these phylogenies are built with different protein sets, parameters and methodologies.

Such level of information redundancy can be exploited to assess the robustness of a given orthology or paralogy prediction to changes in the phylogenetic settings…. Intuitively, a prediction that is not affected by such settings will be considered more reliable.

MetaPhOrs uses this information to predict orthologs and paralogs for protein pairs, giving each prediction a consistency score (CS, “the fraction of trees predicting an orthology relationship over the total of trees considered”) and an evidence level (EL, “how many independent sources have been used for the prediction”). CS for orthologs ranges from 0 (all trees predict paralogy) to 1 (all trees predict orthology). Take a look at the paper for more information on this methodology and results.
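To make the two scores concrete, here is a minimal sketch in Python that follows the quoted definitions literally. It is not MetaPhOrs code: the `TreePrediction` class, the function names, and the example data are all hypothetical, and counting distinct source names as the evidence level is an assumption. The paper describes the actual procedure, which may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreePrediction:
    """One tree's verdict for a single protein pair (hypothetical structure)."""
    source: str      # repository the tree came from, e.g. "PhylomeDB"
    orthologs: bool  # True if the tree predicts orthology, False for paralogy


def consistency_score(predictions):
    """Fraction of trees predicting orthology over all trees considered."""
    if not predictions:
        raise ValueError("at least one tree is needed to compute a score")
    return sum(p.orthologs for p in predictions) / len(predictions)


def evidence_level(predictions):
    """Number of distinct sources behind the prediction (assumed reading of EL)."""
    return len({p.source for p in predictions})


# Invented example: four trees from three repositories for one protein pair.
example = [
    TreePrediction("PhylomeDB", True),
    TreePrediction("PhylomeDB", True),
    TreePrediction("TreeFam", True),
    TreePrediction("EggNOG", False),
]

cs = consistency_score(example)
el = evidence_level(example)
print(f"CS = {cs:.2f}, EL = {el}")  # prints "CS = 0.75, EL = 3"
```

In this invented case three of the four trees agree on orthology, so the pair would be called orthologous with a CS of 0.75, backed by three sources.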

To date, the database uses over 700,000 phylogenies from several sources to make predictions for over 300 million homologous protein pairs from over 800 fully sequenced genomes. The team plans to update it regularly and add more phylogenetic and protein data.

Today’s tip spends 5 minutes going over the database and showing you how to access these predictions.

Pryszcz, L., Huerta-Cepas, J., & Gabaldon, T. (2010). MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score. Nucleic Acids Research, 39 (5). DOI: 10.1093/nar/gkq953