What’s The Answer? (ALT_LOCI-aware tools)

BioStar is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s highlighted question is pretty thoughtful and timely. Most people in the field are probably aware of the new human genome reference sequence that’s been generated and will be percolating to the tools we use very soon. We talked about it back in April in this post. But one aspect of this new version is that there will be more segments that are ALT_LOCI, or variations from the primary assembly sequence that are important to capture. This new reference assembly puts more emphasis on these alternative sequences, and they will be made available with the reference sequence.

But how will different tools we use handle those? Although this question was aimed at aligners, it is important to think about how any tools we are using that display, compare, or navigate the reference genome will be affected.
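As a first sanity check on whatever reference file a tool has been handed, it can help to see whether ALT sequences are in it at all. Here is a minimal sketch, not any particular tool's method. It assumes UCSC-style sequence names, where ALT scaffolds end in `_alt`; files from other sources may name them differently, so treat the suffix as an assumption to adjust. The function names and the tiny example input are made up for illustration.

```python
def fasta_sequence_names(lines):
    """Yield the sequence name (first token) from each FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def split_primary_and_alt(sequence_names, alt_suffix="_alt"):
    """Partition sequence names into (primary, alt) lists.

    Assumption: ALT scaffolds are marked by a trailing `_alt`,
    as in UCSC-style naming. Change alt_suffix for other conventions.
    """
    primary, alt = [], []
    for name in sequence_names:
        if name.endswith(alt_suffix):
            alt.append(name)
        else:
            primary.append(name)
    return primary, alt


# Tiny illustrative input; real use would iterate over an open FASTA file.
example_lines = [
    ">chr6 example primary sequence",
    "ACGTACGT",
    ">chr6_GL000250v2_alt example ALT scaffold",
    "ACGTTCGT",
]
primary, alt = split_primary_and_alt(fasta_sequence_names(example_lines))
```

If the `alt` list comes back empty, the tool is only seeing the primary assembly, and any question about how it chooses between primary and ALT placements does not arise yet.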

Question: GRCh38 and reference alignment

I came across a slide share about the new GRCh38 assembly and its ALT_LOCI assemblies. (http://www.slideshare.net/vaschn/agbt2014schneider)

  1. Are there any “ALT Loci aware” aligners out there?

  2. If so, how are they accommodating these ALT LOCI while aligning reads to the reference genome? How do they choose between the primary assembly and the Alternate Loci?

Some references: This Biostar thread (Applying patches to GRCh assembly) talks about patches/Alts during alignment as well as about the aligner – srprism (ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/srprism)

Nice overview of ALT LOCI in GRCh38 (Will GRC38/HG20 be a multiple sequence reference genome?)

Are the tools ready? One answer provides some guidance on aligners that may be ALT-aware (or “multi-allelic reference” aware, as Deanna described them), but they aren’t published yet. If anyone else knows of other ALT-aware tools, please bring ‘em over. Or if you know of any other helpful details or issues around this, it would be great for the discussion.

I think this is important to think through as we get more and more individuals sequenced and see more sequence diversity in the population too. And it’s important to be aware that the primary assembly is an excellent snapshot, but it doesn’t show you everything. You should be ALT-Loci aware too.