What’s the Answer? (off-by-one)

BioStar is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Questions and answers that are germane to our readers (end users of genomics resources) often come up at BioStar. Every Thursday we will highlight one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s question is one that comes up often in the trainings we do on the UCSC Genome Browser. The problem is nicknamed “off by one,” so when I saw this tweet go by, I knew exactly what had happened.

This week’s highlighted question:

Question: 200bp long query returning 201bp

In Ensembl when I use this link to return the sequence on the sense strand of X chromosome between 200000 and 200200 I get the following output:

>X dna:chromosome chromosome:GRCh37:X:200000:200200:1
CCAAACCCCAGGCAGGAGACCAGCCCGTGTTATACGGTGCCTGGAGGAGGCGTGACTCAT
TTGCATAGCGCTGAGGGGATTGGTCTGACCAGGCCTGTCATTCACGTAGCCCGCGAAAAA
CCTGGCCCGCCCACCCCAGTTCCGTAATATGCAAATGTAGGGCGCCATGATGTTCCACAC
GCCTGAGGGTAGTGGGGGCGG

This contains 201 nucleotides, but from my query I was expecting 200. Where has this extra nucleotide come from? Which position is it at? Is my query wrong?
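The count is easy to check. The short Python sketch below joins the sequence lines from the FASTA output above and compares the result with two ways of doing the arithmetic on the requested coordinates. Plain subtraction gives the distance between the endpoints. Counting both endpoints adds one. The variable names are only illustrative.

```python
# Sequence lines copied from the Ensembl FASTA output in the question.
fasta_lines = [
    "CCAAACCCCAGGCAGGAGACCAGCCCGTGTTATACGGTGCCTGGAGGAGGCGTGACTCAT",
    "TTGCATAGCGCTGAGGGGATTGGTCTGACCAGGCCTGTCATTCACGTAGCCCGCGAAAAA",
    "CCTGGCCCGCCCACCCCAGTTCCGTAATATGCAAATGTAGGGCGCCATGATGTTCCACAC",
    "GCCTGAGGGTAGTGGGGGCGG",
]
sequence = "".join(fasta_lines)

start, end = 200000, 200200

# Subtraction measures the distance between the two endpoints.
naive_length = end - start
# A closed interval includes both endpoints, so the base count is one more.
inclusive_length = end - start + 1

print(len(sequence), naive_length, inclusive_length)  # 201 200 201
```

The returned sequence matches the inclusive count, not the naive one.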

yesitsjess

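The short version: Ensembl coordinates are 1-based and fully closed. A request for 200000 to 200200 therefore includes both endpoints and returns 200200 - 200000 + 1 = 201 bases. To get exactly 200 bases, ask for 200000 to 200199. The comments below the answer explain the issue in more detail, and the links that Istvan offers in the answers go further. I’ll also add a link to the UCSC Genome Browser help page that explains why the coordinates shown in the browser can differ from the ones you get back from other kinds of queries. The browser displays 1-based coordinates, but its underlying database tables store start positions 0-based.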
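If you move coordinates between tools, it helps to convert them explicitly rather than adjust them by hand. The Python sketch below assumes just two conventions: 1-based, fully closed (what you type into Ensembl or see in the UCSC browser display), and 0-based, half-open (UCSC database tables, BED files, and Python string slicing). The helper names are hypothetical and are not part of any Ensembl or UCSC API. Check each tool's documentation to confirm which convention it uses.

```python
def closed_1based_to_halfopen_0based(start: int, end: int) -> tuple[int, int]:
    # Only the start shifts. "Last base included (1-based)" and
    # "first base excluded (0-based)" name the same boundary, so end is unchanged.
    return start - 1, end


def halfopen_0based_to_closed_1based(start: int, end: int) -> tuple[int, int]:
    return start + 1, end


def length_closed(start: int, end: int) -> int:
    # Both endpoints are included, so add one.
    return end - start + 1


def length_halfopen(start: int, end: int) -> int:
    # The end is excluded, so plain subtraction is already the length.
    return end - start


# The region from the question, as typed into Ensembl (1-based, closed).
region = (200000, 200200)
converted = closed_1based_to_halfopen_0based(*region)
print(converted)                    # (199999, 200200)
print(length_closed(*region))       # 201
print(length_halfopen(*converted))  # 201

# Exactly 200 bases starting at 200000, in 1-based, closed terms.
print(length_closed(200000, 200199))  # 200

# Python slicing is 0-based and half-open, so a converted pair slices directly.
toy_chrom = "ACGTACGTAC"
s, e = closed_1based_to_halfopen_0based(3, 5)  # 1-based bases 3, 4 and 5
print(toy_chrom[s:e])  # GTA
```

Both conventions describe the same 201 bases. Only the numbers written down differ, and that difference is where the off-by-one confusion comes from.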