What’s the Answer? (aligning isoforms)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s highlight from Biostars is a tool that was new to me, and it seems usefully different from other tools in this area. I put it in my drafts folder a while ago and forgot to get back to it at the time. Anyway, check out PALO for isoform matching in multiple sequence alignments.

Forum: Palo: The Importance (And Impact) Of Aligning Matching Isoforms In Multiple Sequence Alignments

Protein ALignment Optimizer (PALO) is an algorithm for the selection of the best combination of protein isoforms among orthologous genes in the construction of a multiple alignment. You can easily upload your files from Ensembl and this tool will tell you which is the most suitable combination for you to align.

Large-scale evolutionary studies often require the automated construction of alignments of a large number of homologous gene families. The majority of eukaryotic genes can produce different transcripts due to alternative splicing or transcription initiation, and many such transcripts encode different protein isoforms. As analyses tend to be gene centered, one single-protein isoform per gene is selected for the alignment, with the de facto approach being to use the longest protein isoform per gene (Longest), presumably to avoid including partial sequences and to maximize sequence information. Here, we show that this approach is problematic because it increases the number of indels in the alignments due to the inclusion of nonhomologous regions, such as those derived from species-specific exons, increasing the number of misaligned positions. With the aim of ameliorating this problem, we have developed a novel heuristic, Protein ALignment Optimizer (PALO), which, for each gene family, selects the combination of protein isoforms that are most similar in length.
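To make the idea in the abstract concrete, here is a minimal Python sketch of choosing one isoform per gene so that the chosen proteins are as close in length as possible, next to the "Longest" baseline. It is an illustration only, not PALO's actual code: the function names, the input layout (a dict of gene to isoform lengths), the scoring function (sum of absolute pairwise length differences), and the exhaustive search are all assumptions made for this example, and the gene and isoform names and lengths are invented.

```python
from itertools import combinations, product


def length_spread(lengths):
    """Assumed score: sum of absolute pairwise length differences."""
    return sum(abs(a - b) for a, b in combinations(lengths, 2))


def pick_similar_length_isoforms(family):
    """Pick one isoform per gene so the chosen lengths are most similar.

    family maps gene -> {isoform_id: protein_length}.
    Uses exhaustive search, which is only practical for small families.
    """
    genes = sorted(family)
    best_combo, best_score = None, None
    # Try every way of taking exactly one isoform from each gene.
    for combo in product(*(sorted(family[g].items()) for g in genes)):
        score = length_spread([length for _, length in combo])
        if best_score is None or score < best_score:
            best_combo, best_score = combo, score
    return {g: iso_id for g, (iso_id, _) in zip(genes, best_combo)}


def pick_longest_isoforms(family):
    """Baseline from the abstract: the longest isoform of each gene."""
    return {g: max(sorted(isos), key=isos.get) for g, isos in family.items()}


# Hypothetical gene family; names and lengths are made up.
family = {
    "human_gene": {"h1": 520, "h2": 410},
    "mouse_gene": {"m1": 405, "m2": 300},
    "chicken_gene": {"c1": 415},
}

similar = pick_similar_length_isoforms(family)
longest = pick_longest_isoforms(family)
print("similar length:", similar)
print("longest:", longest)
```

In this toy family the longest human isoform (520 residues) is about 100 residues longer than anything available in the other two species, so the Longest rule picks it while the length-matching rule prefers the 410-residue isoform. The number of combinations grows multiplicatively with the number of isoforms per gene, which is presumably why the real tool is described as a heuristic.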

Take a look at the Tutorial section. You can either use this online version (section Run) or download the raw code (python-github) and run it on your local machine.


Quick link to PALO: http://evolutionarygenomics.imim.es/palo

And their paper has more details as well.

Villanueva-Cañas J.L., Laurie S. & Albà M.M. (2013). Improving genome-wide scans of positive selection by using protein isoforms of similar length. Genome Biology and Evolution, 5 (2), 457-467. DOI: