Tag Archives: snps

UniSNP database

There have been a bunch of tweets lately about the UniSNP database, so I thought I’d do a quick post to raise awareness of it. The mission of UniSNP stated on their homepage at NHGRI is:

UniSNP is a database of uniquely mapped SNPs from dbSNP (build 129) and HapMap (release 27), where differences in SNP positions and names have been resolved, insofar as possible. In addition, SNPs are annotated with various functional characteristics, based on overlap with tracks from the UCSC browser. For details, see [PUB CITATION].

Well, I went looking for a [PUB CITATION] in PubMed for this. I entered the text UniSNP. I got a bunch of results. But that’s because….

Your search for unisnp retrieved no results. However, a search for unison retrieved the following items.

Unison? Um. Ok.

Anyway: the bioinformatics folks seem interested in this resource. So maybe others will be as well. It does offer you the opportunity to look for unique SNPs, using the UCSC assembly hg18/NCBI36. You can search by regions, or by starting with a list of SNPs. It gives you a dozen ways to filter the SNPs for things that might be of interest to you (RefSeq transcript characteristics, HapMap-ness, VISTA enhancer regions, etc).

I would probably accomplish this with a UCSC Table Browser query myself. But if you haven’t had a chance to get familiar with how to use that yet, this form would be a quick way to get similar answers.
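For readers who think in code, here is a minimal sketch of the kind of query described above: restrict a set of SNPs to a region, then apply an optional filter. It is not UniSNP's or the Table Browser's implementation; the `Snp` record, the `snps_in_region` function, and the toy data are all invented for illustration.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    """Hypothetical SNP record; the field names are assumptions."""
    name: str
    chrom: str
    pos: int          # 1-based position on the assembly
    in_hapmap: bool   # stand-in for one of the available filters


def snps_in_region(snps, chrom, start, end, hapmap_only=False):
    """Return SNPs on `chrom` with start <= pos <= end (inclusive)."""
    return [
        s for s in snps
        if s.chrom == chrom
        and start <= s.pos <= end
        and (s.in_hapmap or not hapmap_only)
    ]


# Toy records, invented for illustration only (not real SNPs).
toy = [
    Snp("snp_a", "chr1", 1500, True),
    Snp("snp_b", "chr1", 2500, False),
    Snp("snp_c", "chr2", 1800, True),
]

region_hits = snps_in_region(toy, "chr1", 1000, 3000)
hapmap_hits = snps_in_region(toy, "chr1", 1000, 3000, hapmap_only=True)
```

Each extra filter is just one more condition in the list comprehension, which is roughly what ticking another box on a web form amounts to.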

Quick links

UniSNP: http://research.nhgri.nih.gov/tools/unisnp/

UCSC Table Browser tutorial: http://openhelix.com//cgi/tutorialInfo.cgi?id=28

The Table Browser tutorial is freely available to everyone because UCSC sponsors it. It’s the same material that we use in our live workshops, with the slides, handouts, and exercises available for anyone to use.

Here’s the tweet that’s going around if you’d like to re-tweet; hat tip to Khader:

@kshameer: UniSNP: uniquely mapped SNPs from dbSNP (build 129) and HapMap (release 27) http://1.usa.gov/gE3Ou0 #genomics #bioinformatics

Tip of the Week: SIFT, Sorting (SNPs) Intolerant From Tolerant

The last tip I did was on PolyPhen, an algorithm that helps predict the phenotypic result of a non-synonymous SNP. There are other such algorithms available, including MAPP (Multivariate Analysis of Protein Polymorphism), SNPs3D and SIFT (Sorting Intolerant From Tolerant). It’s the latter that today’s tip will briefly cover.

SIFT, like PolyPhen, has a web interface, but is also used in various other tools to give an indication of an amino acid substitution’s effect on protein function. If you want to learn more about how the predictions are made (and some comparisons of the different methodologies), you might want to check out their paper (warning, pdf file).

Tip of the Week: PolyPhen

There are several methods that can be used to predict whether a particular non-synonymous SNP is deleterious, SIFT and PolyPhen among others. Which one to use will be up to the individual researcher and the strengths and weaknesses of the predictors, though the two mentioned do a pretty good job. Today’s tip will be on the web interface of PolyPhen 2 hosted at the Sunyaev lab*. Many tools and databases use PolyPhen to help predict the functional effect of a nonsynonymous SNP, including PolyDoms, F-SNP (which I’ve done a tip on before), NIEHS SNPs and SeattleSNPs (which we have free tutorials on), SeattleSeq and more. Today’s tip will focus on simply using the web interface, but you can always download the program and integrate it as you see fit or use one of the databases. Along with SIFT, it’s arguably one of the most used predictors out there.

From an earlier help section describing PolyPhen:

PolyPhen (=Polymorphism Phenotyping) is an automatic tool for prediction of possible impact of an amino acid substitution on the structure and function of a human protein. This prediction is based on straightforward empirical rules which are applied to the sequence, phylogenetic and structural information characterizing the substitution

To learn more about how PolyPhen works, you can view that page, or you can read some of the references. Next tip I do (early February) will be on SIFT.

Tip of the Week: RGenetics at Galaxy

About 6-7 months ago, Mary mentioned that R-Genetics analysis was coming to Galaxy. Well, it has arrived and is now available at the public Galaxy site. The old Rgenetics site links to the new one and to information about using Galaxy as a wrap-around interface for the Rgenetics project tools. Today’s tip just points you to the tool and gives you a quick overview of what is there. You’ll need to do some exploring to learn to use it! Of course, we have our publicly available Galaxy tutorial to get you started.

(oh, and I point you to this tutorial on analyzing Desmond Tutu’s SNPs using Galaxy that I thought was interesting)

Guest Post: SNAP — Andrew Johnson

This next post in our continuing semi-regular Guest Post series is from Andrew Johnson, one of the developers and the concept designer of SNAP, SNP Annotation and Proxy Search which is hosted at the Broad Institute. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com or the contact form (write ‘guest post’ as subject heading). We welcome introductions to your resource, information on updates, highlights of little known gems or opinion pieces on the state of genomic research and databases.

SNAP (http://www.broadinstitute.org/mpg/snap/, Johnson et al. (2008) Bioinformatics 24(24): 2938), “SNP Annotation and Proxy search”, is a flexible, web-based tool that allows anyone in the world to quickly accomplish a range of SNP-related genetics and bioinformatics tasks. This post highlights some common questions and features of SNAP, some more obscure uses, and recent and planned developments.

How did SNAP come about?

The idea for SNAP was originally sparked by GWAS analysts within a large collaborative group (the Framingham Heart Study SHARe project). This was in the pre-imputation era, when GWAS investigators from different groups using different SNP arrays often wanted to find the best proxy SNPs based on HapMap for comparison when they didn’t have common genotyped SNPs across groups. We initially implemented local programs to look up HapMap LD and also consider the presence of query and proxy SNPs on different commercial genotyping arrays. We quickly realized this was a community-wide problem as we received requests from outside collaborators, so we decided it was worth developing a public tool and approached investigators at the Broad Institute. Through collaboration with Paul de Bakker, Bob Handsaker and others at the Broad Institute we were able to add more features like plotting and build a nice, quick and accessible interface. Many people have contributed ideas, testing and improvements to SNAP, and Bob Handsaker and Pei Lin in particular continue to maintain and update SNAP.

What do you use SNAP for the most?

The two most widely used features of SNAP are 1) SNP LD queries, and 2) plotting of LD and association data. There are a number of flexible options for these functions. Beyond these, as a SNP bioinformatics specialist, I often use SNAP to rapidly retrieve information about a list of SNPs for other uses (see specialized queries below).
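To make the idea of an LD-based proxy query concrete, here is a minimal sketch that computes pairwise r² from phased haplotypes and keeps the SNPs above a cutoff. This is not SNAP's code: the function names, the 0/1 allele coding, the 0.8 threshold, and the toy haplotypes are all assumptions made for illustration.

```python
def ld_r2(haps_a, haps_b):
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1."""
    if not haps_a or len(haps_a) != len(haps_b):
        raise ValueError("need two non-empty haplotype lists of equal length")
    n = len(haps_a)
    p_a = sum(haps_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(haps_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(haps_a, haps_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP: r^2 is undefined, treated here as 0
    return d * d / denom


def proxies(query, panel, threshold=0.8):
    """Names of SNPs in `panel` whose r^2 with `query` meets the threshold."""
    return sorted(
        name for name, haps in panel.items()
        if name != query and ld_r2(panel[query], haps) >= threshold
    )


# Toy phased haplotypes, invented for illustration only.
panel = {
    "query":   [0, 0, 1, 1, 0, 1, 0, 1],
    "perfect": [1, 1, 0, 0, 1, 0, 1, 0],   # alleles flipped, still r^2 = 1
    "weak":    [0, 1, 0, 1, 0, 1, 0, 1],
}
proxy_names = proxies("query", panel)
```

Note that the "perfect" SNP has every allele flipped relative to the query yet still qualifies, because r² measures correlation regardless of which allele is labeled 1.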

What are some commonly asked questions from users of SNAP?

Continue reading

Tip of the Week: Genome Variation Tour II

The last tip of the week I did was Genome Variation Tour I, where we started our journey following one SNP in an individual’s genome through various databases to see what we can find out about that variation. In that tip we started out by looking at a SNP in the CYP4F2 gene in the UCSC Genome Browser and followed it to dbSNP. Today’s tip will continue our journey to OMIM to see what information we can find there. We’ll find this variation is clinically associated with warfarin dosage effects, and specifically this individual’s C/T heterozygosity indicates an intermediate dosage for effectiveness, if indeed he ever needed this drug. In some ways, your guess is as good as mine as to what we will find and what avenues we will be taking in the next few tips I’ll be doing. I’m discovering information as I go along too. I can tell you, though, that the next installment of the genome variation tour will take us to PubMed, perhaps to a few databases that are not particularly well known but are gems, and probably back to the UCSC Genome Browser to expand our look at the interactions of several variations in this individual’s genome.

Guest Post: WAVe – Pedro Lopes

This next post in our continuing semi-regular Guest Post series is from Pedro Lopes, developer of WAVe at the University of Aveiro Bioinformatics Group in Aveiro, Portugal. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com or the contact form (write ‘guest post’ as subject heading). We welcome introductions to your resource, information on updates, highlights of little known gems or opinion pieces on the state of genomic research and databases.

I would like to start by thanking Trey Lathe for the opportunity to promote WAVe on this great blog. After his short tip of the week post, I’ll now try to give a more detailed overview of this new application.

What is WAVe?

WAVe stands for Web Analysis of the Variome and is a simple application focused on centralizing access to distributed and heterogeneous locus-specific databases (LSDBs). LSDBs are an emerging type of bioinformatics application, aiming to provide gene-centric information about discovered genomic variants. In WAVe, we offer access both to LSDBs and to their variants. Moreover, we also provide access to a comprehensive list of carefully selected external resources. With this, users have, in a single lightweight and easy-to-use web application, access to gene and variation information enriched with a multitude of gene-related resources.

What are WAVe’s key features?

At this early stage, WAVe’s publicly available features are related to data access. Users can easily browse through available genes, search for genes, view gene info and access each gene’s RSS feed. On WAVe’s entry page, users simply need to start typing a gene’s HGNC-approved symbol and several suggestions will appear: accepting one of them leads directly to the gene view page. Following the view all link, users can browse all available genes or check, for each gene, how many LSDBs and variants are available.

To access the application data, users just need to navigate the gene tree. Each tree node represents a distinct data type and the various leaves provide access to external applications: by clicking a leaf, the destination page is loaded in the main content area. Repeating this process, users can navigate the dozens of listed links for each gene.

WAVe also offers its core data to other developers. To obtain the gene tree and its links, users just need to add the rss tag to the end of the gene address. This will output an RSS 2.0 feed that can be easily parsed by any application or added to a feed reader.
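As a rough idea of what parsing such a feed could look like, here is a minimal sketch using only the Python standard library. The sample feed below is invented to show the generic RSS 2.0 shape (channel, item, title, link); it is not actual WAVe output, and the `feed_links` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Invented sample in generic RSS 2.0 form; not real WAVe output.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example gene feed</title>
    <item>
      <title>Example LSDB</title>
      <link>http://example.org/lsdb</link>
    </item>
    <item>
      <title>Example protein page</title>
      <link>http://example.org/protein</link>
    </item>
  </channel>
</rss>"""


def feed_links(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


links = feed_links(SAMPLE_FEED)
for title, link in links:
    print(f"{title}: {link}")
```

In practice the feed text would be downloaded from the gene address first; that step is left out here because the exact address format is not given above.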

How was WAVe born?

The European GEN2PHEN project is an initiative to link, as deeply as possible, data from genotype features to its phenotype counterparts. The first step consisted of an attempt to improve various genomic variation resource scenarios. This implied normalizing LSDBs (the “LSDB-in-a-box” approach, LOVD) and defining novel data models and formats for data exchanges from and to LSDBs.

From a long-term perspective, applying the GEN2PHEN-approved data models will enhance the creation of new services and applications to integrate and interact with the exponentially growing dataset of genomic variation data.

With WAVe we tried a different approach based on three questions: why wait for everyone to adopt these new formats? What will happen to legacy LSDBs that won’t adopt the new formats? How can we have an immediate solution? We have created a lightweight integration architecture based on links to applications, and adopted a simple (yet familiar) tree-based navigation interaction, to deploy a new application that can be used right now and will easily scale to integrate the foreseen data exchange formats. Technical details aside, based on a manually curated LSDB list, we can connect and integrate any kind of LSDB application, whether it is a modern LOVD application or a simple text-based legacy LSDB.

How is it relevant?

To demo WAVe’s efficiency let’s just try to perform a simple search in our lab: Are there any LSDBs for the COL3A1 gene in the human species? And known variants? And what are the associated proteins and pathways?

In a WAVe-free scenario, to find COL3A1 LSDBs (if any), researchers need to google it (the main COL3A1 LSDB does not appear in the first result page) or, if they are used to it, go to the HGVS site, go to the “Databases & Tools” section, select “Locus-specific Mutation Databases” and then search for the gene in the search box. Now for the variants researchers just need to browse the last page they’ve just entered. How many clicks (and how much time!) does it take?

For protein information, researchers go to UniProt and search for COL3A1: that gives about 29 results. Add a filter for the human species and there are 5 results, good enough to go directly to P02461 (SwissProt reviewed). However, there is now a new window/tab open. Now for pathway information, a KEGG quick search for COL3A1 lists 14 results. In the end, researchers have about 3 windows/tabs open and have made some 20 mouse clicks to obtain the desired information.

Using WAVe, researchers simply need to access WAVe, start typing the gene’s HGNC symbol, select COL3A1 from the suggestions and access the COL3A1 page. Once on the page, it’s as easy as browsing the tree… Variations? Check the variation node, they’re even grouped according to the change type. UniProt information? Check the protein node where you have direct access to SwissProt, TrEMBL, PDB, Expasy and InterPro. And I guess you get the picture. In the end, one window/tab and about 6 or 7 mouse clicks.

Other UA.PT Bioinformatics tools

At the University of Aveiro’s Bioinformatics research group we are mainly young and enthusiastic computer science experts, simply trying to make biology easier (at least in terms of computer applications!). Our more relevant web-based tools include MIND (a microarray analysis tool), GeneBrowser (a gene expression tool, useful for processing data gathered from systems like MIND) and QuExT (a comprehensive MEDLINE mining application).

-Pedro Lopes

Tip of the Week: WAVe, Web Analysis of the Variome

Today’s Tip of the Week is a short introduction to WAVe, or the Web Analysis of the Variome. The tool was recently introduced to us, and I’ve found it a welcome addition to the tools available to the researcher to analyze human variation. This is apropos considering the recent paper we’ve been discussing on the clinical assessment of a personal genome (here, here and here) and that paper’s implications for personalized medicine and the use of online variation resources. WAVe has also introduced me to some additional tools I’ve either not been aware of or haven’t used, which might be of use, such as LOVD (Leiden Open Variation Database), QuExT (Query Expansion Tool, from the same developers as WAVe), and others. Of course there is also database information pulled in from Ensembl, Reactome, KEGG, InterPro, PDB, UniProt, NCBI and many others. Take some time to check it out.

Tip of the Week: HapMap data in Haploview

HapMap has had a few minor updates to their browser and, importantly, new phase III data was released early last year (drafts of that data were released in 2008). Haploview, the downloadable software that allows the user to perform in-depth LD and haplotype analysis, has recently been updated from version 4.1 to version 4.2. Haploview can be used with user data or data downloaded from the HapMap project. Version 4.1, however, did not work with phase III HapMap data, so users of that version were limited to phase I and II data. Version 4.2 removes that restriction.

That’s a lot of versions and phases :). The short of it is, if you use Haploview 4.2, you can view and analyze data from any phase of the HapMap project.

Today’s tip briefly shows you how to download data from the HapMap project and view it in Haploview.

Top SNPs of the year

Interesting post from the SNPedia blog (we mentioned being able to view SNPedia SNPs in HapMap in a post last year) on the top 10 SNPs of the year.

Of course, as they mention, it’s very subjective.

Because they have chosen SNPs with serious health interest, I’ll semi-frivolously (because hey, no knowledge is necessarily “frivolous” :) nominate either:

The “ear wax” SNP which determines whether you have ‘wet or dry’ earwax, only because (yes, TMI) I have both, one in each ear so now I’m curious as to why.


The “Perfect Musical Pitch” SNP, only because my daughter and I seem to have that particular variation, and we know a few people who don’t ;-).