What’s the answer? (1000Genomes SNPs issues)

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Questions and answers often arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s question:

Why are there more non-synonymous SNPs than synonymous SNPs in the 1000 genomes data?

I have downloaded SNP data from the 1000 genomes project through Biomart and UCSC genome browser. These SNP data are annotated as being synonymous or non-synonymous (missense). In all textbooks it is said that the number of synonymous mutations should be much higher than non-synonymous mutations. Then why is it that I consistently observe a higher number of non-synonymous SNPs for the human genome? Do you think there might be a mistake in annotating these SNPs or is there something else that I am missing?


This question generated a lot of discussion. One of the key points is that you have to pay close attention to how a database provides its annotation features. Have a look at the chatter over there about the various aspects of SNP annotation.
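
To make the point about annotation layout concrete, here is a minimal sketch of one way it could matter. It assumes, purely for illustration, a table with one row per SNP per transcript, so a SNP overlapping several transcripts appears several times. The rows, SNP names and transcript names are invented toy values, and this is not a claim about what the BioStar discussion concluded or about how Biomart or UCSC actually lay out their data.

```python
from collections import Counter

# Hypothetical toy annotation rows: (snp_id, transcript_id, consequence).
# Assumption: the table has one row per SNP per overlapping transcript.
rows = [
    ("rsA", "tx1", "missense"),
    ("rsA", "tx2", "missense"),
    ("rsA", "tx3", "missense"),
    ("rsB", "tx1", "synonymous"),
    ("rsC", "tx4", "synonymous"),
]

def count_rows(rows):
    """Count annotation rows per consequence (repeats a SNP per transcript)."""
    return Counter(consequence for _, _, consequence in rows)

def count_unique_snps(rows):
    """Count each (SNP, consequence) pair once, however many transcripts it hits."""
    unique_pairs = {(snp, consequence) for snp, _, consequence in rows}
    return Counter(consequence for _, consequence in unique_pairs)

row_counts = count_rows(rows)
snp_counts = count_unique_snps(rows)

print("per row:", dict(row_counts))
print("per SNP:", dict(snp_counts))
```

In this toy table the per-row tally shows more missense than synonymous entries, while the per-SNP tally shows the opposite, so it is worth checking what a single row represents before comparing counts.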