Guest Post: SNAP — Andrew Johnson

This next post in our continuing semi-regular Guest Post series is from Andrew Johnson, one of the developers and the concept designer of SNAP (SNP Annotation and Proxy Search), which is hosted at the Broad Institute. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users through our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com or through the contact form (write ‘guest post’ as the subject heading). We welcome introductions to your resource, information on updates, highlights of little-known gems, or opinion pieces on the state of genomic research and databases.

SNAP (http://www.broadinstitute.org/mpg/snap/, Johnson et al. (2008) Bioinformatics 24(24): 2938), “SNP Annotation and Proxy search”, is a flexible, web-based tool that allows anyone in the world to quickly accomplish a range of SNP-related genetics and bioinformatics tasks. This post highlights some common questions and features of SNAP, some more obscure uses, and recent and planned developments.

How did SNAP come about?

The idea for SNAP was originally sparked by GWAS analysts within a large collaborative group (the Framingham Heart Study SHARe project). This was in the pre-imputation era, when GWAS investigators from different groups using different SNP arrays often wanted to find the best HapMap-based proxy SNPs for comparison when they had no genotyped SNPs in common across groups. We initially implemented local programs to look up HapMap LD and to check whether query and proxy SNPs were present on different commercial genotyping arrays. As requests arrived from outside collaborators, we quickly realized this was a community-wide problem, so we decided it was worth developing a public tool and approached investigators at the Broad Institute. Through collaboration with Paul de Bakker, Bob Handsaker and others at the Broad Institute, we were able to add more features, such as plotting, and build a nice, quick and accessible interface. Many people have contributed ideas, testing and improvements to SNAP, and Bob Handsaker and Pei Lin in particular continue to maintain and update SNAP.

What do you use SNAP for the most?

The two most widely used features of SNAP are 1) SNP LD queries and 2) plotting of LD and association data. There are a number of flexible options for each. Beyond these, as a SNP bioinformatics specialist, I often use SNAP to rapidly retrieve information about a list of SNPs for other uses (see specialized queries below).

What are some commonly asked questions from users of SNAP?

Many of the common questions are covered in detail in the FAQ and/or Documentation available on the website. Here are some questions I commonly receive.

How do I return all LD proxy relationships within 500kb of my query SNPs? Change the r^2 threshold to “No Limit” and leave the distance limit at “500”.
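To make the interaction between the two filters concrete, here is a minimal Python sketch of a proxy search over a precomputed table of LD pairs. It is an illustration only, not SNAP's implementation: the LDPair record, the find_proxies function and the 0.8 default r^2 threshold are assumptions chosen for this example, and passing r2_min=None plays the role of the “No Limit” setting.

```python
from typing import List, NamedTuple, Optional


class LDPair(NamedTuple):
    """One pairwise LD record (hypothetical layout, not SNAP's output format)."""
    query: str
    proxy: str
    r2: float
    distance_bp: int  # distance between the two SNPs, in base pairs


def find_proxies(pairs: List[LDPair], query: str,
                 r2_min: Optional[float] = 0.8,
                 max_distance_kb: int = 500) -> List[LDPair]:
    """Return proxies of `query` that pass both filters, strongest LD first.

    r2_min=None mimics the "No Limit" r^2 setting: only distance is filtered.
    """
    limit_bp = max_distance_kb * 1000
    hits = []
    for pair in pairs:
        if pair.query != query:
            continue
        if abs(pair.distance_bp) > limit_bp:  # distance filter
            continue
        if r2_min is not None and pair.r2 < r2_min:  # r^2 filter
            continue
        hits.append(pair)
    return sorted(hits, key=lambda p: p.r2, reverse=True)


# Toy records with made-up IDs and values.
pairs = [
    LDPair("query1", "proxyA", 0.95, 10_000),
    LDPair("query1", "proxyB", 0.30, 200_000),
    LDPair("query1", "proxyC", 0.99, 600_000),
]
print([p.proxy for p in find_proxies(pairs, "query1")])               # ['proxyA']
print([p.proxy for p in find_proxies(pairs, "query1", r2_min=None)])  # ['proxyA', 'proxyB']
```

With the r^2 filter switched off, every pair inside the distance limit comes back (proxyB included), while proxyC stays excluded because it lies beyond 500kb.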

Why doesn’t my favorite SNP appear when I make a query? Occasionally query and/or expected proxy SNPs will not be found. This could be the result of an error in how you entered the query SNPid. For a missing proxy SNP, the most likely explanation is that the filters you’ve selected (r^2 and distance limits) have excluded it. Another likely explanation is that the expected SNP is more than 500kb from the query SNP (see below). Rarely, SNAP may return an alias SNPid for a proxy rather than the one you expect, since SNAP takes aliases into account in queries (see below). Finally, the SNP(s) may not be included in the HapMap release you are querying. HapMap releases 21 and 22 differ at a small number of SNPs, and HapMap release 3 differs from prior releases at a greater number of SNPs. If a particular SNP is important, try querying different HapMap releases to find a release that includes it. Alternatively, you can obtain genotype data separately and load it into a program such as Haploview to calculate LD metrics.
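The checklist above can be expressed as a small diagnostic function. This is a hypothetical Python sketch: why_missing, normalize_snp_id and the release_snps set are illustrative names rather than part of SNAP, the ID check assumes rs-style identifiers, and alias handling is deliberately left out.

```python
import re

RSID_PATTERN = re.compile(r"^rs\d+$")  # assumes rs-style IDs only


def normalize_snp_id(raw: str) -> str:
    """Tidy a pasted ID: trim whitespace and lowercase an 'RS' prefix."""
    snp = raw.strip()
    if snp[:2].lower() == "rs":
        snp = "rs" + snp[2:]
    return snp


def why_missing(snp_id, release_snps, r2, distance_bp,
                r2_min=0.8, max_distance_kb=500):
    """Return the first failing check among the explanations discussed above.

    release_snps: set of rs IDs in the HapMap release being queried
    (hypothetical input). r2 and distance_bp describe the expected pair;
    r2_min=None stands for the "No Limit" setting.
    """
    snp = normalize_snp_id(snp_id)
    if not RSID_PATTERN.match(snp):
        return "malformed SNP ID"
    if snp not in release_snps:
        return "not in this HapMap release"
    if distance_bp > max_distance_kb * 1000:
        return "beyond distance limit"
    if r2_min is not None and r2 < r2_min:
        return "below r^2 threshold"
    return "should be reported"


release = {"rs123", "rs456"}  # toy release contents
print(why_missing("rs456", release, r2=0.9, distance_bp=750_000))  # beyond distance limit
```

Checking the ID and release membership first is a design choice: if either fails, the filter settings are irrelevant.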

Can I generate plots based on my own data? Yes. You can upload your own association data (-log P) and your own LD data; the latter is useful if you have generated LD estimates within your own population and/or from a larger sample than is available in HapMap. If you don’t specify your own LD data, SNAP uses HapMap by default. If you don’t provide chromosome and position, SNAP fills these in based on HapMap. You can also include de novo markers that are not in HapMap, as long as you specify their chromosome, position and LD with the target SNP.
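If your association results are stored as raw P values, they need converting to -log10(P) (the usual meaning of -log P in association plots) before upload. The sketch below does that conversion in Python and writes tab-separated rows. The column order (SNP, chromosome, position, -log10 P) and the function name to_upload_rows are assumptions for illustration, so check the SNAP Documentation for the exact input format; the separate LD file that de novo markers also need is not shown.

```python
import math


def to_upload_rows(results):
    """Convert (snp_id, p_value, chrom, pos) tuples into tab-separated rows.

    chrom/pos may be None for HapMap SNPs (SNAP fills them in from HapMap),
    but de novo markers need them. The column order is an assumption.
    """
    rows = []
    for snp_id, p_value, chrom, pos in results:
        if not 0 < p_value <= 1:
            raise ValueError(f"{snp_id}: P value must be in (0, 1], got {p_value}")
        neg_log_p = max(0.0, -math.log10(p_value))  # avoids "-0.000" when P = 1
        rows.append("\t".join([
            snp_id,
            "" if chrom is None else str(chrom),
            "" if pos is None else str(pos),
            f"{neg_log_p:.3f}",
        ]))
    return rows


# A HapMap SNP without coordinates, and a hypothetical de novo marker with them.
print("\n".join(to_upload_rows([("rs123", 1e-8, None, None),
                                 ("novel_1", 0.01, 7, 1234567)])))
```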

Why do I observe different LD estimates for the same pair of SNPs in different HapMap releases? Identical SNP pairs generally have identical, or very similar, LD estimates across HapMap releases. Where the estimates differ slightly, the difference is attributable to differences in the genotype data between releases.
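To see why a handful of genotype differences can shift an LD estimate, consider the standard r^2 calculation for two biallelic SNPs: D = p_AB - p_A p_B and r^2 = D^2 / (p_A(1 - p_A) p_B(1 - p_B)). The Python sketch below assumes phased haplotypes coded 0/1 (tools that start from unphased genotypes, such as Haploview, first estimate haplotype frequencies). It illustrates the formula with made-up data; it is not how HapMap computed its released values.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples coded 0/1,
    one tuple per chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # freq of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n  # freq of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                     # linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


release_x = [(1, 1)] * 5 + [(0, 0)] * 5              # toy data: perfect LD
release_y = [(1, 1)] * 4 + [(1, 0)] + [(0, 0)] * 5   # one chromosome differs
print(r_squared(release_x))            # 1.0
print(round(r_squared(release_y), 2))  # 0.67
```

In this toy case a single discordant chromosome out of ten drops r^2 from 1.0 to about 0.67, so small genotype differences between releases can plausibly account for small differences in the reported values.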

What if I just want to query LD among a select group of SNPs? Click the “Pairwise LD” tab. Copy and paste your SNPid list, or upload a file. Your LD queries will be limited to only your SNPs of interest rather than all HapMap SNPs that meet the filtering criteria.
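Conceptually, restricting the query to your own list means only pairs drawn from that list are considered. A minimal Python sketch, assuming a hypothetical precomputed LD table keyed by unordered SNP pairs:

```python
from itertools import combinations


def pairwise_ld(snps, ld_table):
    """r^2 for every pair drawn from `snps` only.

    ld_table: hypothetical dict mapping frozenset({snp1, snp2}) -> r^2.
    Pairs missing from the table (e.g. too far apart, or not in the
    release) are simply omitted.
    """
    results = {}
    for a, b in combinations(sorted(set(snps)), 2):
        r2 = ld_table.get(frozenset((a, b)))
        if r2 is not None:
            results[(a, b)] = r2
    return results


ld_table = {frozenset(("s1", "s2")): 0.7, frozenset(("s2", "s3")): 0.1,
            frozenset(("s1", "s9")): 0.9}  # s9 is not in the list, so it is ignored
print(pairwise_ld(["s3", "s1", "s2"], ld_table))
# {('s1', 's2'): 0.7, ('s2', 's3'): 0.1}
```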

What if I want to find SNPs genotyped only on a specific array or group of arrays? You can quickly limit queries to specific arrays: click the ‘+’ on the “Filter By Array” option and select the arrays you want.
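Array filtering amounts to set operations on array contents. In this hypothetical Python sketch the array names and their SNP sets are placeholders. Whether a SNP must appear on any or on all of the selected arrays is exposed as a parameter because that choice is an assumption here, not a description of how SNAP's filter behaves.

```python
def filter_by_arrays(snps, array_contents, selected, require_all=False):
    """Keep SNPs present on any (or, with require_all, every) selected array.

    array_contents: hypothetical dict of array name -> set of SNP IDs.
    """
    chosen = [array_contents[name] for name in selected]
    if not chosen:
        return list(snps)  # no arrays selected: no filtering
    allowed = set.intersection(*chosen) if require_all else set.union(*chosen)
    return [snp for snp in snps if snp in allowed]


arrays = {"ArrayA": {"s1", "s2"}, "ArrayB": {"s2", "s3"}}  # placeholder contents
print(filter_by_arrays(["s1", "s2", "s3", "s4"], arrays, ["ArrayA", "ArrayB"]))
# ['s1', 's2', 's3']
print(filter_by_arrays(["s1", "s2", "s3", "s4"], arrays, ["ArrayA", "ArrayB"],
                       require_all=True))
# ['s2']
```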

What do I do if I want to calculate long-range LD or trans-chromosomal LD? SNAP returns results for pairwise LD between SNPs up to 500kb apart. This is greater than the 250kb limit of the pre-calculated HapMap LD data. In some cases users may want to assess SNPs that are farther apart. A few options exist, including 1) downloading the HapMap genotypes and loading them into Haploview with the pairwise distance limit removed, 2) calculating LD with PLINK, or 3) querying the GLIDERS website (http://mather.well.ox.ac.uk/GLIDERS/). GLIDERS supports very long-range queries within chromosomes as well as trans-chromosomal queries.
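For the first two options, the underlying calculation has no built-in notion of distance or chromosome; it simply compares genotypes across the same individuals. The Python sketch below computes the squared Pearson correlation of 0/1/2 genotype counts, a common genotype-based approximation to haplotype-based r^2. It is an illustration under that assumption, not a reproduction of what Haploview or PLINK compute by default, and the function name is hypothetical.

```python
def genotype_r2(g1, g2):
    """Squared correlation of 0/1/2 genotype counts at two SNPs.

    g1, g2: genotypes for the same individuals in the same order; None marks
    a missing call. No distance or chromosome restriction applies, so the
    two SNPs can be far apart or on different chromosomes.
    """
    pairs = [(x, y) for x, y in zip(g1, g2) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two individuals genotyped at both SNPs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined: a SNP is monomorphic in this sample")
    return sxy * sxy / (sxx * syy)


# Toy genotypes for six people; the second SNP is on another chromosome.
snp_chr2 = [0, 1, 2, 0, 1, 2]
snp_chr9 = [2, 1, 0, 2, 1, None]  # last person missing, so excluded
print(round(genotype_r2(snp_chr2, snp_chr9), 3))  # 1.0
```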

What are some specialized queries I can conduct with SNAP?

Find annotation for SNPs regardless of LD proxies. SNAP doesn’t have to be used as an LD querying tool; you can simply retrieve information about a list of SNPs. To do so, load your SNPids and, under “Search Options”, set the Distance Limit to 0 instead of the default 500kb. With the other settings left at their defaults, SNAP will now return information only for your query SNPs themselves. You can select additional options such as GeneCruiser annotations and MAFs. This is an excellent way to rapidly answer questions like: Would a SNP(s) be genotyped on my array(s) of interest? Which of my SNPs are nonsynonymous SNPs? What are the genomic coordinates for my list of SNPs in a specific genome build (just select the corresponding HapMap release: Release 21 = hg17, Release 22/HapMap 3 = hg18)? What are the HapMap MAFs for my list of SNPs? Of course, you can also turn on proxy querying and ask these same questions about query and/or proxy SNPs. For instance: are any of my significant GWAS SNPs in LD (r^2 > 0.5) with a known nonsynonymous SNP?
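The last question combines proxy search with annotation. Here is a hypothetical Python sketch: the LD pair format, the annotation labels and the function name are placeholders for whatever SNAP (or another source) returns, and the release-to-build table only restates the mapping given in the text. Query SNPs are checked directly as well, mirroring the Distance Limit 0 case.

```python
# Restates the release-to-build mapping given in the text.
RELEASE_TO_BUILD = {"21": "hg17", "22": "hg18", "HapMap3": "hg18"}


def nonsynonymous_hits(gwas_snps, ld_pairs, annotation, r2_min=0.5):
    """Find GWAS SNPs that are, or are in LD with, a nonsynonymous SNP.

    ld_pairs: iterable of (query, proxy, r2) tuples (hypothetical format).
    annotation: dict of SNP -> functional class label (hypothetical labels).
    """
    wanted = set(gwas_snps)
    # Query SNPs that are themselves nonsynonymous (the Distance Limit 0 case).
    hits = [(snp, snp, 1.0) for snp in gwas_snps
            if annotation.get(snp) == "nonsynonymous"]
    for query, proxy, r2 in ld_pairs:
        if (query in wanted and proxy != query and r2 > r2_min
                and annotation.get(proxy) == "nonsynonymous"):
            hits.append((query, proxy, r2))
    return hits


pairs = [("q1", "p1", 0.8), ("q1", "p2", 0.4), ("q2", "p3", 0.9)]  # toy data
labels = {"p1": "nonsynonymous", "p2": "nonsynonymous",
          "p3": "intronic", "q2": "nonsynonymous"}
print(nonsynonymous_hits(["q1", "q2"], pairs, labels))
# [('q2', 'q2', 1.0), ('q1', 'p1', 0.8)]
```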

Find alias or alternate SNPids for my SNPs of importance. Some people do not realize that SNP IDs can suffer from a historical aliasing problem, just like gene names. If you are using SNPids to query bioinformatics tools or databases, or to conduct cross-dataset queries, to be extra cautious you should rely on genome positions and alleles or account for potential alias IDs. SNAP can return alias SNPids: click the “Map SNP IDs” tab. You can retrieve IDs for a SNP across all previous dbSNP builds or target a specific build.
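Resolving aliases typically means following merge records from an old rs ID to the ID it was merged into, possibly through several builds, and falling back on positions and alleles when comparing datasets. A Python sketch, assuming a hypothetical merge table (old ID mapped to newer ID) rather than any particular dbSNP file format; both function names are illustrative.

```python
def resolve_alias(snp_id, merged):
    """Follow merge records (old ID -> newer ID) to the current rs ID.

    merged: hypothetical dict built from a merge-history table.
    """
    seen = set()
    current = snp_id
    while current in merged:
        if current in seen:
            raise ValueError(f"cycle in merge table at {current}")
        seen.add(current)
        current = merged[current]
    return current


def same_variant(a, b):
    """Compare (chrom, position, alleles) records instead of IDs.

    Assumes both records use the same genome build and the same strand.
    """
    return a[0] == b[0] and a[1] == b[1] and sorted(a[2]) == sorted(b[2])


merged = {"rs100": "rs200", "rs200": "rs300"}  # toy merge chain
print(resolve_alias("rs100", merged))  # rs300
print(same_variant(("1", 1000, ("A", "G")), ("1", 1000, ("G", "A"))))  # True
```

The position-and-allele comparison only holds within one genome build, which is another reason to note which HapMap release (and therefore which build) a dataset came from.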

What are some recently added features and planned future updates to SNAP?

SNAP is now at version 2.1. Recent updates have included the addition of 1) the HapMap 3 release featuring 12 population groupings, 2) SNP information for 7 new commercial SNP arrays (25 arrays now listed), and 3) the ability to include HapMap major and minor alleles, frequencies and observed genotype counts in output. We welcome suggestions for additional features; most of the features added in the past came from suggestions by active SNAP users and testers. In the future we plan to include query options based on LD data from the 1000 Genomes Project. We also plan to keep up with additional SNP array releases as they come to our attention.
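The new allele and frequency output columns are linked by simple arithmetic: allele counts follow from genotype counts, and the minor allele is the less frequent one. A small Python sketch, with a hypothetical function name and an arbitrary tie-breaking rule (the first allele is called major at exactly 50%):

```python
def allele_summary(n_11, n_12, n_22, allele1, allele2):
    """Major allele, minor allele and MAF from observed genotype counts.

    n_11, n_12, n_22: counts of allele1/allele1, allele1/allele2 and
    allele2/allele2 genotypes. Ties call allele1 major (arbitrary choice).
    """
    count1 = 2 * n_11 + n_12  # copies of allele1
    count2 = 2 * n_22 + n_12  # copies of allele2
    total = count1 + count2
    if total == 0:
        raise ValueError("no genotypes observed")
    if count1 >= count2:
        return allele1, allele2, count2 / total
    return allele2, allele1, count1 / total


print(allele_summary(50, 40, 10, "C", "T"))  # ('C', 'T', 0.3)
```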