BioStar is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at BioStar.
This week’s question is one that comes up a lot in trainings we do on the UCSC Genome Browser. The nickname for this problem is “off by one” and so when I saw this tweet go by, I knew exactly what had happened.
— Neil Saunders (@neilfws) March 26, 2013
This week’s highlighted question:
In Ensembl when I use this link to return the sequence on the sense strand of X chromosome between 200000 and 200200 I get the following output:
>X dna:chromosome chromosome:GRCh37:X:200000:200200:1
CCAAACCCCAGGCAGGAGACCAGCCCGTGTTATACGGTGCCTGGAGGAGGCGTGACTCAT
TTGCATAGCGCTGAGGGGATTGGTCTGACCAGGCCTGTCATTCACGTAGCCCGCGAAAAA
CCTGGCCCGCCCACCCCAGTTCCGTAATATGCAAATGTAGGGCGCCATGATGTTCCACAC
GCCTGAGGGTAGTGGGGGCGG
This contains 201 nucleotides, but from my query I was expecting 200. Where has this extra nucleotide come from? Which position is it at? Is my query wrong?
The comments below the answer help explain the issue, and the links that Istvan offers in the answers go further. I’ll also add a link to the UCSC Genome Browser help page, which explains why the coordinates you see in the browser can differ from the ones you get back from another kind of query.
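For readers who like to see the arithmetic, here is a minimal sketch of where the extra nucleotide comes from. It assumes the query coordinates are 1-based and inclusive at both ends (so both position 200000 and position 200200 are returned), and it contrasts that with the 0-based, half-open convention used by formats such as BED. The function names are ours, made up for illustration; they are not part of any Ensembl or UCSC tool.

```python
def inclusive_length(start, end):
    """Length of a 1-based interval that includes both endpoints."""
    return end - start + 1


def half_open_length(start, end):
    """Length of a 0-based interval that includes start but excludes end."""
    return end - start


def to_zero_based_half_open(start, end):
    """Convert 1-based inclusive coordinates to 0-based half-open ones.

    Only the start shifts down by one; the end keeps the same number.
    """
    return start - 1, end


# The query from the question: both endpoints are counted.
print(inclusive_length(200000, 200200))          # 201

# The same region in 0-based, half-open form has the same length.
bed_start, bed_end = to_zero_based_half_open(200000, 200200)
print(bed_start, bed_end)                        # 199999 200200
print(half_open_length(bed_start, bed_end))      # 201

# To get exactly 200 bases with inclusive coordinates, shorten by one.
print(inclusive_length(200001, 200200))          # 200
```

Under these assumptions the query is not wrong; it simply asks for both endpoints, so a request for exactly 200 bases would run from 200001 to 200200 (or from 200000 to 200199).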