Tip of the Week: Converting Genome Coordinates

I did this tip over two years ago and am revisiting it today with a bit more information, up to date and hosted on SciVee so it’s shareable. I’ve been updating our Galaxy tutorial, and that tip has been one of the most tweeted, shared and visited tips we’ve done (not the most, just one of them), so I thought now would be a good time to revisit it. This tip goes through the Galaxy tool for “lifting over” genome coordinates between assemblies and genomes. You might also wish to visit a few other tools and places where you can convert genome coordinates between assemblies, such as the UCSC Genome Browser LiftOver utility (access it from the “utilities” menu on the front page; it uses chain conversion files), FlyBase (for the D. melanogaster genome), Maker (an annotation tool from GMOD that includes an assembly conversion tool), the Ensembl Assembly Converter, and I’m sure there are others. Have any to report? As the comment below informs us, there is also NCBI’s new remapping service, which maps between assemblies (within a species) and between RefSeq sequences and assemblies.

A word about methodology: as mentioned in the first paragraph, the UCSC Genome Browser’s LiftOver tool uses chain conversion files. I am unsure of the methodology used at Galaxy, though I’m assuming it’s similar. I have an inquiry in and will update this page when I know the answer.

Indeed it is. I received a nice answer from the Galaxy support team:

The liftOver program and the underlying mapping file come from UCSC and are based on their “Chain/Net” comparative genome algorithms.

The data represents the syntenic genome regions for the two reference genomes involved. Genes with similar annotation, between closely related species, found within these syntenic regions have a good likelihood of being orthologs, but gene function is not considered by the algorithm and would have to be evaluated independently to confirm orthology.

This mailing list discussion at the UCSC Genome Browser project would be a good place to learn about the details:
Contacting the team directly at genome@ucsc.edu is also an option if you have a specific question about the algorithm.
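
To make the idea of a chain conversion file a little more concrete, here is a minimal Python sketch of how a position can be carried from one coordinate system to another through the aligned blocks of a single chain. It is only an illustration under stated assumptions, not the actual liftOver implementation: the toy chain data is invented, the helper names parse_chain and lift_position are hypothetical, and the sketch handles only one chain with both strands set to “+”. The real tool works with whole chain files, reverse-strand chains and intervals rather than single positions.

```python
from typing import List, Optional, Tuple

# Invented toy chain in the UCSC chain text format (not real genome data).
# Header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
# Body:   "size dt dq" per aligned block, with the last line holding only "size".
TOY_CHAIN = """\
chain 1000 chr1 1000 + 100 400 chr1 1200 + 150 450 1
100 10 30
50 20 0
120
"""

Block = Tuple[int, int, int]  # (start in first assembly, start in second assembly, size)


def parse_chain(text: str) -> Tuple[str, str, List[Block]]:
    """Parse one plus-strand chain into a list of ungapped aligned blocks."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header = lines[0]
    if header[0] != "chain":
        raise ValueError("expected a chain header line")
    t_name, t_strand, t_pos = header[2], header[4], int(header[5])
    q_name, q_strand, q_pos = header[7], header[9], int(header[10])
    if t_strand != "+" or q_strand != "+":
        raise ValueError("this sketch only handles plus-strand chains")

    blocks: List[Block] = []
    for fields in lines[1:]:
        size = int(fields[0])
        blocks.append((t_pos, q_pos, size))
        t_pos += size
        q_pos += size
        if len(fields) == 3:
            # Skip the unaligned gap on each side before the next block.
            t_pos += int(fields[1])
            q_pos += int(fields[2])
    return t_name, q_name, blocks


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a 0-based position through the blocks; None if it falls in a gap."""
    for t_start, q_start, size in blocks:
        if t_start <= pos < t_start + size:
            return q_start + (pos - t_start)
    return None


if __name__ == "__main__":
    _, _, toy_blocks = parse_chain(TOY_CHAIN)
    for position in (150, 205, 220):
        print(position, "->", lift_position(toy_blocks, position))
```

Positions that fall inside an aligned block are shifted by that block’s offset, while positions that fall in a gap between blocks have no counterpart and cannot be converted, which is one reason some coordinates fail to lift over.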

