Tag Archives: Polyphen

What’s the Answer? disease causing SNPs

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s highlighted question is….

..which is the best database choice from where i can extract a data set of causative variants and a data set of benign variants (OMIM ,GWAS)…

A perennial favorite of a question. The accepted answer gives a good rundown of how to go about choosing a database. Another answer points to an earlier discussion with a wealth of databases.

Tip of the Week: SIFT, Sorting (SNPs) Intolerant From Tolerant

The last tip I did was on PolyPhen, an algorithm that helps predict the phenotypic result of a non-synonymous SNP. There are other such algorithms available, including MAPP (Multivariate Analysis of Protein Polymorphism), SNPs3D and SIFT (Sorting Intolerant From Tolerant). It’s the last of these that today’s tip will briefly cover.

SIFT, like PolyPhen, has a web interface, but is also used in various other tools to give an indication of an amino acid substitution’s effect on protein function. If you want to learn more about how the predictions are made (and some comparisons of the different methodologies), you might want to check out their paper (warning, pdf file).
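For readers who pull SIFT scores out of one of those other tools, here is a minimal sketch of how a score is usually turned into a call. It assumes the common convention that scores run from 0 to 1 and that a substitution scoring at or below a cutoff of about 0.05 is called deleterious; check the cutoff used by whichever tool you get scores from. The function name and the example substitutions are hypothetical, made up for illustration.

```python
def classify_sift(score: float, cutoff: float = 0.05) -> str:
    """Label a SIFT-style score; lower scores mean a less tolerated substitution.

    The 0.05 cutoff is an assumed convention, not something taken from this post.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT-style scores are expected to lie between 0 and 1")
    return "deleterious" if score <= cutoff else "tolerated"


# Hypothetical substitutions and scores, for illustration only.
example_scores = {"R123W": 0.01, "A45G": 0.62}
labels = {sub: classify_sift(score) for sub, score in example_scores.items()}
print(labels)
```

A prediction like this is only a starting point; it says nothing on its own about whether a variant actually causes disease.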

Tip of the Week: PolyPhen

There are several methods that can be used to predict if a particular non-synonymous SNP is deleterious, SIFT and PolyPhen among others. Which one to use will be up to the individual researcher and the strengths and weaknesses of the predictors, though the two mentioned do a pretty good job. Today’s tip will be on the web interface of PolyPhen 2 hosted at the Sunyaev lab*. Many tools and databases use PolyPhen to help predict the functional effect of a non-synonymous SNP, including PolyDoms, F-SNP (which I’ve done a tip on before), NIEHS SNPs and SeattleSNPs (which we have free tutorials on), SeattleSeq and more. Today’s tip will focus on simply using the web interface, but you can always download the program and integrate it as you see fit or use one of the databases. Along with SIFT, it’s arguably one of the most used predictors out there.

From an earlier help section describing PolyPhen:

PolyPhen (=Polymorphism Phenotyping) is an automatic tool for prediction of possible impact of an amino acid substitution on the structure and function of a human protein. This prediction is based on straightforward empirical rules which are applied to the sequence, phylogenetic and structural information characterizing the substitution

To learn more about how PolyPhen works, you can view that page, or you can read some of the references. Next tip I do (early February) will be on SIFT.

Personal Genomics, clinical assessment and online resources

The Lancet paper, Clinical assessment incorporating a personal genome, has held my fascination this weekend (yes, I read it at the beach). Mary posted Friday and again Saturday on the paper and related NPR segment. It feels to me to be a seminal paper, though I do agree with Daniel at Genetic Future that there is a lot there we still don’t know. A large portion of the variation is in non-coding regions, and thus predictions and propensities are hard to come by with the available analysis. In fact, as he pointed out, many of the coding region variations have little information as to their effect on disease. I would add also that even if we get to that holy grail of $1,000 to sequence a personal genome, this kind of extensive analysis would still be time- and cost-prohibitive for the vast majority of sequenced genomes.

Yet, as with all early steps in science and medicine, there are missing pieces, large gaps and huge efforts (think “space travel,” “computers,” “microwave ovens,” “internet”) that over time become inexpensive and commonplace (ok, so the first isn’t necessarily “inexpensive”). Sequencing genomes will become inexpensive before the analysis does, but both will come. And I think this paper is pointing to that future.

The other hurdle to large-scale personal genomics I see (of course) is the understanding and use of the genomics and data resources. The authors use a large (and excellent, in my opinion) suite of genomics resources to obtain data and do their analysis. I’ll list them here with links in alphabetical order:

dbSNP (T)
HapMap (T)
PubMed (T)
UniProt (T)

All of these resources hold a wealth of data, but each one still takes a lot of analysis and familiarization to use well. Each tool does have documentation and tutorials, and of course OpenHelix has tutorials on many of the ones mentioned (those with linked “T”s after the name). Still, this one analysis took a large number of tools and a lot of familiarization.

The paper does have a pretty good figure (figure 1) outlining the analysis process. For example, they SIFTed the genome to find gene-associated, non-synonymous, rare and novel, and disease-associated variations, and then used dbSNP, HGMD, OMIM and PubMed to analyze something like HFE2, which might have an association with haemochromatosis. One of my quibbles with the paper, as is often the case with these papers, is that there isn’t a good methods ‘walk-through’ using something like Galaxy or Taverna in a history or workflow that would help reproduce the analysis.
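To make that filtering step a little more concrete, here is a minimal sketch of the kind of sieve described above: keep variants that sit in a gene, are non-synonymous, and are either rare or novel. This is not the authors’ pipeline. The record fields, the 1% frequency cutoff and the example variants are all hypothetical, chosen only to show the shape of the logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    gene: Optional[str]                # None when the variant is not in a gene
    consequence: str                   # e.g. "non-synonymous", "synonymous"
    allele_frequency: Optional[float]  # None when no frequency is known
    dbsnp_id: Optional[str]            # None when absent from dbSNP (novel)


def is_candidate(v: Variant, rare_cutoff: float = 0.01) -> bool:
    """Keep gene-associated, non-synonymous variants that are rare or novel.

    rare_cutoff is an assumed threshold, not a value from the paper.
    """
    if v.gene is None or v.consequence != "non-synonymous":
        return False
    novel = v.dbsnp_id is None
    rare = v.allele_frequency is not None and v.allele_frequency < rare_cutoff
    return novel or rare


# Made-up records for illustration only.
variants: List[Variant] = [
    Variant("GENE_A", "non-synonymous", 0.002, "rs0000001"),  # rare: kept
    Variant("GENE_B", "non-synonymous", None, None),          # novel: kept
    Variant("GENE_C", "synonymous", 0.001, "rs0000002"),      # synonymous: dropped
    Variant(None, "non-synonymous", 0.003, "rs0000003"),      # intergenic: dropped
    Variant("GENE_D", "non-synonymous", 0.30, "rs0000004"),   # common: dropped
]

candidates = [v for v in variants if is_candidate(v)]
print([v.gene for v in candidates])
```

The variants that survive a filter like this are the ones you would then look up by hand in the disease and literature databases, which is where most of the time goes.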

We also have a tutorial I’d like to point you to, one that walks through a similar process and teaches users the basics of it. You can find this tutorial here; it’s free and publicly available. The tutorial walks the user through the analysis of a gene variation, in this case in CYP2C9, that affects an individual’s response to warfarin. There is a similar variation (different gene, affects same drug response) in the paper. The tutorial uses the NIEHS SNPs site to get an overview of the variation including SIFT and PolyPhen predictions, then goes to the UCSC Genome Browser to find an overview of the region, walks through the dbSNP information and does a quick tag SNP analysis using GVS. That tutorial is only one very small step in what will have to be an immense education into genomics and genomics resources.

That is all to point out that the paper is a fascinating first step, and as a first step it suggests the gaping holes we will have in bringing personal genomics to medicine.

Ashley, E., Butte, A., Wheeler, M., Chen, R., Klein, T., Dewey, F., Dudley, J., Ormond, K., Pavlovic, A., & Morgan, A. (2010). Clinical assessment incorporating a personal genome. The Lancet, 375 (9725), 1525-1535. DOI: 10.1016/S0140-6736(10)60452-7

Tool you might not know: F-SNP

As we go through the thousands of resources and databases available online in our search for tutorial subjects, we find many that are great resources but that, for one or more reasons, we don’t or can’t do a tutorial for. We do a “Tip of the Week” on some of them, but even those are not enough to touch on all the great resources out there, so occasionally we are going to give a quick “shout out” to some of these resources.

So today it’s F-SNP.
