One of the most frequent questions we hear when we do workshops is: how do I find out if this SNP has an effect on my favorite protein? Well, that’s assuming it is a coding SNP. Of course, promoter SNPs and splicing SNPs and other features would be great to assess as well. Right now, though, the most mature tools are those that look at the effects of variation on the amino acids encoded in proteins.
We’ve talked before about some tools for this, including PolyPhen2 and SIFT. Each of them offers different algorithms and options that might help you to explore your SNPs. But another tool is available that you should check out as well: SNPeffect 4.0.
SNPeffect isn’t new; this team has been developing it for a while. But their recent paper that describes new features in the 4.0 version spurred me to have a new look at it. There are some foundational things that are important to know about the data collection in their database. It’s not just a re-hash of dbSNP. It actually relies on another source of variation data: they use the UniProt collection of human proteins as the starting point. If you haven’t used UniProt much, you might not be aware of how much protein variation they catalog and store (we cover this in our tutorial*). The SNPeffect team takes those variations and evaluates the impact they have on a protein with a variety of algorithms. Some of the variations will correspond to dbSNP entries, but not all of them do, so you may find things here that you won’t find in dbSNP. I would say it’s worth exploring your proteins of interest here as well.
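If you keep your own list of variants for a protein, a quick way to see the "not everything is in dbSNP" point is to split the list by whether each variant carries an rs identifier. The sketch below is only an illustration: the `Variant` record, the `split_by_dbsnp` helper, and the example entries (accession, changes, and rs numbers) are all hypothetical placeholders, not real UniProt or SNPeffect data or any API from either resource.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Variant:
    """A protein-level variant (hypothetical record layout)."""
    accession: str            # UniProt-style accession (placeholder value below)
    change: str               # amino acid substitution, e.g. "R123C"
    dbsnp_id: Optional[str]   # rs identifier, or None if there is no dbSNP entry


def split_by_dbsnp(variants: List[Variant]) -> Tuple[List[Variant], List[Variant]]:
    """Return (variants with a dbSNP id, variants without one)."""
    with_rs = [v for v in variants if v.dbsnp_id]
    without_rs = [v for v in variants if not v.dbsnp_id]
    return with_rs, without_rs


# Made-up entries for illustration only; none of these are real records.
EXAMPLE_VARIANTS = [
    Variant("X00000", "R123C", "rs0000001"),
    Variant("X00000", "G45D", None),
    Variant("X00000", "L200P", "rs0000002"),
]

with_rs, without_rs = split_by_dbsnp(EXAMPLE_VARIANTS)
print(f"{len(with_rs)} variants have a dbSNP id, {len(without_rs)} do not")
```

The second group is the interesting one here: those are the variants you would miss if you only searched dbSNP.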
The algorithms they use provide information on a number of features of the protein. TANGO and WALTZ assess protein aggregation and amyloid formation. LIMBO evaluates chaperone binding. Structural stability is predicted by FoldX (if a suitable structure is available). They also use SMART* and Pfam* to see if the variation occurs within domains of the protein. There are some other tools with more protein features examined as well. Check out the paper for more details.
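To make the domain check concrete: asking whether a variation occurs within a domain comes down to asking whether the variant's residue position falls inside a domain's start and end coordinates. Here is a minimal sketch of that idea, not SNPeffect's actual implementation. The function names, the simple one-letter substitution format, and the domain names and coordinates are assumptions made up for the example; real coordinates would come from SMART or Pfam annotations for your protein.

```python
import re
from typing import List, Tuple

# (domain name, first residue, last residue), 1-based and inclusive.
Domain = Tuple[str, int, int]

# Matches simple one-letter substitutions such as "R123C".
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(change: str) -> Tuple[str, int, str]:
    """Split a substitution like 'R123C' into (wild type, position, mutant)."""
    match = _SUBSTITUTION.match(change)
    if match is None:
        raise ValueError(f"not a simple substitution: {change!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def domains_containing(position: int, domains: List[Domain]) -> List[str]:
    """Return the names of all domains whose range includes the position."""
    return [name for name, start, end in domains if start <= position <= end]


# Hypothetical domain layout for illustration only.
EXAMPLE_DOMAINS = [("kinase_like", 50, 300), ("SH2_like", 320, 410)]

_, pos, _ = parse_substitution("R123C")
print(domains_containing(pos, EXAMPLE_DOMAINS))
```

A variant that lands in the linker between the two example domains (say, position 310) returns an empty list, which is itself useful to know when you are deciding which variants to look at first.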
You can also submit proteins of interest to their analysis suite from the “Submit a new SNPeffect job” links.
A new feature highlighted in their paper is the opportunity to do a Meta-analysis on groups of variations. You can explore the features of sets of variants in this way, using the different algorithms they offer.
This short video examines the pipeline, the basic interface, and a couple of sample pages. But you’ll want to go over and try a lot more to learn about your favorite proteins. There’s a lot of information that can come out of this that you might not have known before. Check it out.
*OpenHelix tutorials for these resources available for individual purchase or through a subscription
Quick links to resources discussed:
SNPeffect 4.0 http://snpeffect.switchlab.org/
PolyPhen 2 http://genetics.bwh.harvard.edu/pph2/
De Baets, G., Van Durme, J., Reumers, J., Maurer-Stroh, S., Vanhee, P., Dopazo, J., Schymkowitz, J., & Rousseau, F. (2011). SNPeffect 4.0: on-line prediction of molecular and structural effects of protein-coding variants. Nucleic Acids Research, 40 (D1). DOI: 10.1093/nar/gkr996