Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.
When I touched on the variation tools at NCBI for this week’s tip, I didn’t go into detail on how the specific variations are designated. But while looking through the Biostars questions for this week’s highlight, I noticed that someone was not familiar with how the ClinVar mutations are denoted, so I thought others might find that information useful as well.
I was looking into ClinVar data for getting mutation lists. There were mutations which were in the form GENE:c.*** representing they are CDS mutations and GENE:p.*** representing the amino acid changes.
What do the following forms represent?
TBC1D24:c.1143-6C>T – CDS mutation
NP_002760.1:p.Cys139Ser – Protein mutation
Have a look at the answers at Biostars. Zhaorong’s answer is correct. This nomenclature is certainly a bit cryptic if you aren’t familiar with the Human Genome Variation Society (HGVS) system. It’s worth looking into the background and framework if this is data you are likely to be working with. The history of this strategy goes back quite a way, as you can see from their publication list. Below I’ll add a reference that I think helps in understanding the structure if you are new to it.
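If it helps to see the structure spelled out, here is a minimal Python sketch that splits the two example strings from the question into their parts. It is only an illustration under stated assumptions: the names `CODING_SUB`, `PROTEIN_SUB`, and `parse_variant` are made up for this post, and the patterns handle only simple substitutions, not the full HGVS grammar (no deletions, insertions, duplications, or other reference types).

```python
import re

# Hypothetical, simplified patterns: they cover only single-nucleotide and
# single-amino-acid substitutions, not the full HGVS grammar.
CODING_SUB = re.compile(
    r"^(?P<ref>[^:]+):c\.(?P<pos>\d+)(?P<offset>[+-]\d+)?"
    r"(?P<old>[ACGT])>(?P<new>[ACGT])$"
)
PROTEIN_SUB = re.compile(
    r"^(?P<ref>[^:]+):p\.(?P<old>[A-Z][a-z]{2})(?P<pos>\d+)(?P<new>[A-Z][a-z]{2})$"
)


def parse_variant(text):
    """Split a simple HGVS-style substitution into its parts, or return None."""
    m = CODING_SUB.match(text)
    if m:
        return {
            "reference": m["ref"],
            "level": "coding DNA",
            "position": int(m["pos"]),
            # A signed offset counts into the intron from the nearest exon base.
            "intron_offset": int(m["offset"]) if m["offset"] else 0,
            "old": m["old"],
            "new": m["new"],
        }
    m = PROTEIN_SUB.match(text)
    if m:
        return {
            "reference": m["ref"],
            "level": "protein",
            "position": int(m["pos"]),
            "old": m["old"],
            "new": m["new"],
        }
    # Anything outside the two simplified forms is left unparsed.
    return None


if __name__ == "__main__":
    for example in ("TBC1D24:c.1143-6C>T", "NP_002760.1:p.Cys139Ser"):
        print(example, "->", parse_variant(example))
```

Read this way, the first example is a C to T change with an offset of -6, meaning six bases into the intron before coding position 1143, and the second is a cysteine to serine change at residue 139 of the protein NP_002760.1. For real data you would want a proper HGVS parser rather than a pair of regular expressions.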
For even more help in understanding why getting nomenclature right is so crucial, check out the paper below that came out recently on naming just the TP53 variations. This is a gene that has clinical relevance, and if you are aiming treatments at mutated TP53 you have to be sure you are getting the right one. Understanding how to define mutations is not just a trivial nuisance: it can matter at the clinic, and this will only become more important as we get sequence from more tumors and other clinical situations. I think this paper makes the point about the complexity and the need for standardization.
Laros J.F.J., den Dunnen J.T. & Taschner P.E.M. (2011). A formalized description of the standard human variant nomenclature in Extended Backus-Naur Form, BMC Bioinformatics, 12 (Suppl 4) S5. DOI: http://dx.doi.org/10.1186/1471-2105-12-s4-s5
Soussi T. & Taschner P.E.M. (2014). Recommendations for Analyzing and Reporting TP53 Gene Variants in the High-Throughput Sequencing Era, Human Mutation, 35 (6) 766-778. DOI: http://dx.doi.org/10.1002/humu.22561