Tag Archives: GWAS

Friday SNPpets

Welcome to our Friday feature link dump: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Tip of the Week: GRAIL for prioritizing SNPs

Perusing my copy of Nature Genetics last week, I was flipping through the pages and noticed an unusual graphic. I looked at it a little closer and was convinced it was one of the Spirographs that I used to make as a kid. (Remember those? I always liked that….) I looked a little bit closer and realized it was somewhat more informative than the Spirographs I used to draw. This represented the relationships between genes, based on the literature. Hmmm….how did they do this, exactly?

The paper I was reading was Genetic variants at CD28, PRDM1 and CD2/CD58 are associated with rheumatoid arthritis risk by Raychaudhuri et al, which was interesting enough.  I like to read the GWAS papers to see what the current techniques and strategies are, not only for the specific genes themselves.  And this paper reported the strategy that they used to prioritize their SNPs, and that they used GRAIL to generate the data for this graphic of gene relationships.  Check out Figure 1 for the strategy.

When I saw the name GRAIL I thought–huh….GRAIL is back with a new use?  I thought that was…ah…retired…at this point.  But this isn’t that GRAIL (http://compbio.ornl.gov/Grail-1.3/, Gene Recognition and Assembly Internet Link).  This is a different GRAIL–the new one is Gene Relationships Among Implicated Loci. So I had to go and read that paper, which is  Identifying Relationships among Genomic Disease Regions: Predicting Genes at Pathogenic SNP Associations and Rare Deletions by Raychaudhuri et al.

This new GRAIL is all about text mining.  It is a tool that relies on statistical text mining of the literature for genes in a region and examines the relationships among those genes in the text.  The focus in their case is disease regions, but there’s no reason that you couldn’t use it for a variety of other topics.   As the authors state:

Given only a collection of disease regions, GRAIL uses our text-based definition of relatedness (or alternative metrics of relatedness) to identify a subset of genes, more highly related than by chance; it also assigns a select set of keywords that suggest putative biological pathways.

So you pull a set of genes out of the literature based on SNPs or locations of interest, and you can begin to assess what’s interesting in the set. Now, the tool makes a lot of assumptions that you should be aware of if you are going to use it. It assumes each region contains a single pathogenic gene. I’m not sure that’s always going to be the case, but as long as you know that going in, it’s a fair assumption for this tool. They suggest this helps keep multigenic regions from dominating the analysis. Fair enough, but…what if that is the interesting aspect? Still–that’s ok as long as you know.
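To make the idea of text-based relatedness a bit more concrete, here is a toy sketch in Python. It scores pairs of genes by the cosine similarity of simple word counts from short descriptions. This is only an illustration of the general idea, not GRAIL's actual metric or code; the gene names and descriptions are invented placeholders, and the function names are hypothetical.

```python
import math
from collections import Counter


def word_vector(text):
    """Turn a piece of text into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented placeholder "abstracts", one per hypothetical gene.
abstracts = {
    "GENE_A": "cholesterol biosynthesis enzyme in liver",
    "GENE_B": "regulates cholesterol biosynthesis in liver cells",
    "GENE_C": "keratin filament protein in hair follicle",
}
vectors = {gene: word_vector(text) for gene, text in abstracts.items()}

# Genes whose text shares more words get a higher relatedness score.
score_ab = cosine(vectors["GENE_A"], vectors["GENE_B"])
score_ac = cosine(vectors["GENE_A"], vectors["GENE_C"])
print(f"A vs B: {score_ab:.2f}, A vs C: {score_ac:.2f}")
```

A real tool would work from full abstracts, weight the words, and deal with gene synonyms, which is where the caveats discussed in this post come in.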

In the paper they use validated SNPs from 4 different research areas:

  • SNPs associated with serum lipid levels: GRAIL finds genes in the cholesterol biosynthesis pathway.
  • SNPs associated with height; they identify pathways they consider plausible.
  • Crohn’s disease; they confirm associations that have been seen.
  • Schizophrenia–and here they used rare deletions as the items of interest; they find related genes, many highly enriched in the CNS. So this suggests GRAIL may be a useful strategy not only for SNPs but for CNVs as well.

Their Figure 1 nicely summarizes the strategy:


One curious tweak of the data analysis was that they used the literature prior to December 2006, because right after that there was an onslaught of GWAS papers that would list a whole bunch of genes associated with regions that might be more tenuous still.  I understand this in theory, but I imagine it also eliminates more current research on genes of interest from other methods too.  I saw in the tool you could choose either pre-Dec 06 or a more up-to-date literature set.  It would be useful to try both if you use GRAIL and keep that in mind.

Another point to keep in mind: some genes are just not found in the abstracts, and they mention that is an issue. So the genes you can examine are those that were in the abstracts and were identified properly with nomenclature, spelling, etc. Text mining is cool, but it has a lot of limitations around those aspects, and around the use of synonyms in general. It’s not just an issue for GRAIL, but for all text mining tools at this point.

They also devise a way to use Gene Ontology (GO) and some expression data in GRAIL as other “relatedness” metrics.  You’ll find those available from the GRAIL tool as well.

They don’t show any spirographs in their figures in this first GRAIL paper.  That one that drew me in was Figure 2 in the arthritis paper.  So I went over to the software to try to generate these myself.  The outcome at this point is a web page with text and links to UCSC Genome Browser, and Entrez Gene (from the individual genes and from the keyword list–keywords collect multiple Entrez Genes).  I was a little surprised that the keyword link wasn’t to PubMed as well.  Currently it doesn’t provide the graphic, but maybe that will come along over time.  If it does I’ll be sure to mention it on the blog.

One final note on the paper: in the supplemental section they compare GRAIL to other tools in this arena.  If you are interested in tools like we are here you may find some of them interesting as well.   The tools are listed with URLs in Table S5, and the comparison outcome is in Text S1:

Prioritizer [2], Gene2Disease (G2D) [3,4,5], Commonality of Functional Annotation (CFA) [6], and Prospectr [7]. There were five supervised tools: Endeavour [8], GeneSeeker [9], SUSPECTS [10], TOM [11], and CANDID [12]

So check out GRAIL and see if you find gene relationships.  But don’t forget those caveats about the genes not listed in the abstracts, or the literature coverage dates.  The software can be found here:  http://www.broad.mit.edu/mpg/grail/

I know it’s a beta.  But I think it has a lot of potential to help people sift through the results they are getting from a variety of techniques.  Check it out.

NOTE: you may find periods when you can’t run GRAIL because it puts a burden on the servers.  You should try again during off hours if you are having problems getting it to run. This happened to me during my testing of it last week.

The list of GWAS data I used to test GRAIL came from the NHGRI catalog, which we discussed here:  List of GWAS studies.  I tried the straight hair SNP list, and got a pretty interesting set of results that certainly included “epidermis” and “skin” as keywords, among other things.

++++++++++++ Citations ++++++++++++
Raychaudhuri, S., Plenge, R., Rossin, E., Ng, A., International Schizophrenia Consortium, Purcell, S., Sklar, P., Scolnick, E., Xavier, R., Altshuler, D., & Daly, M. (2009). Identifying Relationships among Genomic Disease Regions: Predicting Genes at Pathogenic SNP Associations and Rare Deletions PLoS Genetics, 5 (6) DOI: 10.1371/journal.pgen.1000534

Raychaudhuri, S., Thomson, B., Remmers, E., Eyre, S., Hinks, A., Guiducci, C., Catanese, J., Xie, G., Stahl, E., Chen, R., Alfredsson, L., Amos, C., Ardlie, K., Barton, A., Bowes, J., Burtt, N., Chang, M., Coblyn, J., Costenbader, K., Criswell, L., Crusius, J., Cui, J., De Jager, P., Ding, B., Emery, P., Flynn, E., Harrison, P., Hocking, L., Huizinga, T., Kastner, D., Ke, X., Kurreeman, F., Lee, A., Liu, X., Li, Y., Martin, P., Morgan, A., Padyukov, L., Reid, D., Seielstad, M., Seldin, M., Shadick, N., Steer, S., Tak, P., Thomson, W., van der Helm-van Mil, A., van der Horst-Bruinsma, I., Weinblatt, M., Wilson, A., Wolbink, G., Wordsworth, P., Altshuler, D., Karlson, E., Toes, R., de Vries, N., Begovich, A., Siminovitch, K., Worthington, J., Klareskog, L., Gregersen, P., Daly, M., & Plenge, R. (2009). Genetic variants at CD28, PRDM1 and CD2/CD58 are associated with rheumatoid arthritis risk Nature Genetics, 41 (12), 1313-1318 DOI: 10.1038/ng.479

Medland, S., Nyholt, D., Painter, J., McEvoy, B., McRae, A., Zhu, G., Gordon, S., Ferreira, M., Wright, M., & Henders, A. (2009). Common Variants in the Trichohyalin Gene Are Associated with Straight Hair in Europeans The American Journal of Human Genetics, 85 (5), 750-755 DOI: 10.1016/j.ajhg.2009.10.009

Whole genome association studies

Genetic Future reports: First ever association study using whole genome sequences.

New-technology DNA sequencing provider Complete Genomics will provide near-complete genome sequences of 100 individuals to the Institute for Systems Biology, driving the first ever association study for a complex trait using whole-genome sequencing. Here’s the press release, and GenomeWeb has some additional information

This study was done by Complete Genomics, and as Daniel mentions, does indicate some changes and advances to come. Read the entire post, he mentions some things learned at ASHG about how these studies will look in the future, and particularly, this sentence…

Now the real challenge - coming up with ways of handling the massive volumes of data generated by these technologies

goes to the heart of something I see as a very important question. Not only the right tools but funding them.

An embarrassment of riches.

Are you there God? It's me, and I have another question.

If you are an American woman of a certain age, chances are that like me your first written introduction to reproductive biology was with Judy Blume.  I was thinking back to that time this weekend as I came across several papers in Nature Genetics that addressed the timing of certain female-affecting SNPs.  Unfortunately I found that I would have preferred Judy Blume to the guidance I got from these papers.  A copy of her book would have been dramatically more cost effective for me.

The first two papers address variants that are associated with age at menarche.  The authors of both papers begin by describing age at menarche, which is approximately 13 years in the US for girls of Euro ancestry, and slightly earlier for African-American girls (Sulem et al give a bit more detail).  They talk about the risks of early or late menarche.  Both papers touch on BMI or body mass and/or height as related issues.  The Sulem paper describes the drop in average age from 16 years in the nineteenth century–likely from environmental factors.  It’s all very informative in a general sense.  But how are they getting the reports of first periods from study participants–“adult recalled”?  I mean, I know some people kept diaries and maybe some people wrote it down, but really–even in a place with national medical records these numbers have to be fuzzy at best.  And if there is such a large environmental component tied to nutrition or obesity, how much can you conclude from that?

Anyway, the work goes on to describe fairly standard techniques for identifying loci of interest.  The Perry et al paper finds 2 separate regions of interest: 9q31.2 and 6q21.  Sulem et al find the 6q21 region.  The 9q signal was associated with a 5 week reduction in menarcheal age.  5 weeks?  From largely adult recalled data?  Ok.  I guess.  The same group also finds a 5 week window for the 6q signal.  Huh.  5 weeks again.   The Sulem team find the 6q signals have effects that are 1.2 months (ah, 5 weeks?) or .9-1.9 months for a second allele in that region.  Both groups list adjacent genes that could be affected–but certainly there were no smoking guns.
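For anyone who wants to check the months-versus-weeks comparison, here is a quick sketch of the arithmetic in Python. It assumes an average month of 365.25/12 days, which is a convention chosen for this example and not something stated in either paper.

```python
# Assumption: average month length of 365.25 / 12 days (about 30.44 days).
DAYS_PER_MONTH = 365.25 / 12
DAYS_PER_WEEK = 7


def months_to_weeks(months):
    """Convert a duration in months to weeks under the average-month assumption."""
    return months * DAYS_PER_MONTH / DAYS_PER_WEEK


# Effect sizes quoted above from Sulem et al, in months.
for months in (1.2, 0.9, 1.9):
    print(f"{months} months is about {months_to_weeks(months):.1f} weeks")
```

So 1.2 months works out to a little over 5 weeks, which is why the two groups' numbers look so alike, and the 0.9 to 1.9 month range spans roughly 4 to 8 weeks.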

If you are looking for anything helpful for working with girls in your life, get Are You There God? It’s Me, Margaret.  Because there’s really nothing actionable in these papers.

The other paper that caught my eye was by Stolk et al (Stolk was also an author on the Perry paper): Loci at chromosomes 13, 19, and 20 influence age at menopause.  They begin by illuminating me on the various risks associated with this (cancer, heart disease, osteoporosis).  We have an even bigger age range for this process–age 40-60, with an average of 51 years. Again we have a self-reported age from participants.  We get a signal in these several regions, and again a list of adjacent genes without anything definitive. The outcome is about a 19% increased risk for menopause before age 50–for each of the regions.  How…similar. After they re-examined the data to account for some environmental factors (like BMI, contraceptive and hormone use), the results were unaffected.

I actually appreciate this work–I’m glad someone is looking at women’s health issues.  I firmly believe that the next really interesting data is going to come from temporal and spatial gene expression that the pure genomic data won’t reveal.  But I guess I wanted more–something actionable, something I could do with this.  But I know how this works and I’ll deal with the not knowing.  I wonder how people will take this sort of information when they have their genomes done.  Will they scan their sequences for these SNPs?  Will they worry about that 19% chance for themselves, or that 5 weeks for their daughters?  What will they do with that information?   It will be interesting to see.

Personally I was just wishing Judy Blume wrote a menopause book.  But I don’t see one….Are you there God?  I have a subsequent and related question….

Perry, J., Stolk, L., Franceschini, N., Lunetta, K., Zhai, G., McArdle, P., Smith, A., Aspelund, T., Bandinelli, S., Boerwinkle, E., Cherkas, L., Eiriksdottir, G., Estrada, K., Ferrucci, L., Folsom, A., Garcia, M., Gudnason, V., Hofman, A., Karasik, D., Kiel, D., Launer, L., van Meurs, J., Nalls, M., Rivadeneira, F., Shuldiner, A., Singleton, A., Soranzo, N., Tanaka, T., Visser, J., Weedon, M., Wilson, S., Zhuang, V., Streeten, E., Harris, T., Murray, A., Spector, T., Demerath, E., Uitterlinden, A., & Murabito, J. (2009). Meta-analysis of genome-wide association data identifies two loci influencing age at menarche Nature Genetics DOI: 10.1038/ng.386

Stolk, L., Zhai, G., van Meurs, J., Verbiest, M., Visser, J., Estrada, K., Rivadeneira, F., Williams, F., Cherkas, L., Deloukas, P., Soranzo, N., de Keyzer, J., Pop, V., Lips, P., Lebrun, C., van der Schouw, Y., Grobbee, D., Witteman, J., Hofman, A., Pols, H., Laven, J., Spector, T., & Uitterlinden, A. (2009). Loci at chromosomes 13, 19 and 20 influence age at natural menopause Nature Genetics DOI: 10.1038/ng.387

Sulem, P., Gudbjartsson, D., Rafnar, T., Holm, H., Olafsdottir, E., Olafsdottir, G., Jonsson, T., Alexandersen, P., Feenstra, B., Boyd, H., Aben, K., Verbeek, A., Roeleveld, N., Jonasdottir, A., Styrkarsdottir, U., Steinthorsdottir, V., Karason, A., Stacey, S., Gudmundsson, J., Jakobsdottir, M., Thorleifsson, G., Hardarson, G., Gulcher, J., Kong, A., Kiemeney, L., Melbye, M., Christiansen, C., Tryggvadottir, L., Thorsteinsdottir, U., & Stefansson, K. (2009). Genome-wide association study identifies sequence variants on 6q21 associated with age at menarche Nature Genetics DOI: 10.1038/ng.383

GWAS Monday

Ok, so we don’t have GWAS (Genome-wide Association Study) Mondays, but we might as well have. The field of study seems to be growing hugely fast, especially when you consider one of the first major GWAS was published just a short 2 years ago (or 4 years ago, depending on how you define major; still… a short time ago :) ).  I read this post at Spittoon last month and thought I’d link to it (better late than never), but it appears that there are now over 328 GWAS published and many more coming. The post goes on to wonder “what next?” and summarizes some interesting articles at the New England Journal of Medicine from last month.

While I’m at it, let me point out some past recent posts on GWAS and tools here ;). Last summer I posted a note about Ensembl and UCSC Genome Browser’s GWAS viewers, in November Mary posted a link to a list of then complete GWAS, in January she also posted a Tip of the Week on visualizing GWAS using HapMap (where a commenter pointed to this useful paper), in February I posted a quick link to a new GWAS viewer, and you can find a few other posts on GWAS by doing a simple boolean search of the blog.

Gene expression and SNPs…very neat stuff

A question on the blog last week got me going through my old posts, because I was sure that I had done one on a database of SNP effects on gene expression.  But it turned out that post existed only in my memory, and was still sitting in the draft posts for the blog….

I had come across the work on Genomeweb here:

Duke Team Finds Variants Linked to Tissue-Specific Gene Expression, Splicing

A team of Duke University researchers used a genome-wide screen to find interactions between genetic variants, gene expression, and alternative splicing in blood and brain tissue. In doing so, they found extensive between-tissue differences in SNP effects — only about half of the polymorphisms had common effects in both tissues tested. The team is starting to catalogue the data on the effects that specific genetic variants have on gene expression and splicing in various tissues.

So of course I went looking for the paper and the catalog….

Tissue-Specific Genetic Control of Splicing: Implications for the Study of Complex Traits by Heinzen et al.  The paper is from PLoS Biology last December, and they introduce some QTLs that were new to me–eQTLs, for expression quantitative trait loci, and sQTLs for splicing ones.  They interrogate exon-based microarrays and look for possible effects of nearby SNPs.  I think this approach has some limitations that they concede (you can’t tell exactly which transcript may be affected, just that there is likely an effect on that gene’s expression).  I also think that known exons do not represent the alternative splicing world completely yet.  I think there are a lot of rare temporal and spatial transcription events that aren’t captured in the public databases yet, and won’t be represented in the tissue types selected.  But I think it was a nice attempt to ask the question, and I’m sure  more tissues will be explored over time.

The resource you can explore that has this data is called SNPExpress, and the introduction states:

SNPExpress is a database and its user interface that we developed to permit interrogation of the effects of common SNPs on exon and transcript level expression, in two different human tissues: brain and PBMC ( Peripheral Blood Mononuclear Cell ).

So if this type of data is of interest to you, please check out their paper and their database.  They also have a related tool attached to SNPExpress that is a WGA viewer that might be of interest.

Specific links from post:

GenomeWeb article http://www.genomeweb.com/issues/news/151467-1.html

SNPExpress: http://people.genome.duke.edu/~dg48/SNPExpress/

WGA viewer: http://people.genome.duke.edu/~dg48/WGAViewer/

SNPs in splicing paper: http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pbio.1000001&ct=1

Heinzen, E., Ge, D., Cronin, K., Maia, J., Shianna, K., Gabriel, W., Welsh-Bohmer, K., Hulette, C., Denny, T., & Goldstein, D. (2008). Tissue-Specific Genetic Control of Splicing: Implications for the Study of Complex Traits PLoS Biology, 6 (12) DOI: 10.1371/journal.pbio.1000001


UPDATE: (6/13/2012) I just noticed that the links to this software weren’t working, so I checked with the team. The new link for SNPExpress is: http://compute1.lsrc.duke.edu/softwares/SNPExpress/

WGAViewer is http://compute1.lsrc.duke.edu/softwares/WGAViewer/

New GWA viewer

Genome-wide association studies (GWA/GWAS) generate a lot of data that needs to be viewed and analyzed. There are some software tools out there to do that, including UCSC’s Genome Graphs.

I haven’t looked at it in detail yet, but this new downloadable Java viewer was recently developed and reported in Bioinformatics: AssociationViewer (download here). I’m passing it on to you. As I said, I haven’t had a chance to give it a test drive, but as the title of the article states, it’s a “scalable and integrated software tool for visualization of large-scale variation data in genomic context.” At first glance, it looks interesting.

Tip of the Week: Visualizing GWAS with HapMap tools

We are seeing a lot of interest in visualizing GWAS data lately.  We cover this a bit in our UCSC Genome Browser tutorial.  And we recently did a pretty popular post on a quick look at the NHGRI GWAS catalog data using the UCSC Genome Graphs tool.

But as I was looking at the HapMap tools again recently, I noticed that they have a tool for this as well.  So today’s tip examines that tool for visualizing the NHGRI GWAS catalog data, and having a look at the GBrowse view of this data in genomic regions with the HapMap context.  In this movie I load up one of the sample data sets and move from that GWAS karyogram visualization to the HapMap GBrowse view.  Click the image to view the movie.

List of GWAS studies

They are still working on the recorded version of the NHGRI GWAS seminar that we attended last week, but I wanted to point you to a useful web page they mentioned. It is a collection of GWAS studies with the top 5 SNPs from each listed, as long as they made a certain threshold.

“As of 11/24/08, this table includes 202 publications and 435 SNPs,” according to the Catalog of Published Genome-Wide Association Studies.

So if you are interested in GWAS data this is a nice collection of that literature. It also comes as an Excel doc you can download.

The traits they cover are quite a range–from freckles to diabetes to bipolar disorder and many more. I think I would like to take some of these data over to the UCSC Genome Browser’s Genome Graphs feature where you can visualize the data on a handy genome graphic. To get this figure, here’s what I did:

1. Took the GWAS excel file.

2. Pulled out the rs IDs for the SNPs. Some cells had to be fixed because the data within them is a series of comma-delimited SNPs. Moved each SNP to its own cell.

3. Cleaned up any non-rsIDs. I ended up with 480 SNPs. I left the duplicates for now.

4. Created a plain text file of these SNPs. I gave each one a value of 1 just for the purposes of the genome graphs software. I just wanted to see all these SNPs on the genome in one graphic. Genome graphs tool tells me:

Loaded 12351941 elements from snp126 table for mapping.
Mapped 479 of 480 (99.8%) of markers
These data are now available in the drop-down menus on the main page for graphing
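The clean-up in steps 2 through 4 can be sketched in a few lines of Python. This is only an illustration and not necessarily how it was done for this post: the rs IDs below are made-up placeholders, the function names are hypothetical, and the tab-separated "marker, value" layout is an assumption about what the Genome Graphs upload accepts, so check the tool's own documentation before relying on it.

```python
import re

# A well-formed rs ID is "rs" followed by digits.
RS_ID = re.compile(r"^rs\d+$")


def extract_rs_ids(cells):
    """Split comma-delimited cells and keep only rs IDs (duplicates are kept)."""
    ids = []
    for cell in cells:
        for token in cell.split(","):
            token = token.strip()
            if RS_ID.match(token):
                ids.append(token)
    return ids


def to_marker_lines(rs_ids, value=1):
    """One line per SNP: marker, a tab, and a dummy value (assumed upload layout)."""
    return [f"{rs_id}\t{value}" for rs_id in rs_ids]


def percent_mapped(mapped, total):
    """Percentage of markers mapped, rounded to one decimal place."""
    return round(100 * mapped / total, 1)


# Placeholder cells standing in for the spreadsheet's SNP column.
example_cells = ["rs1234", "rs5678, rs9012", "NR", "rs1234"]
marker_lines = to_marker_lines(extract_rs_ids(example_cells))
print("\n".join(marker_lines))

# The tool's report above: 479 of 480 markers mapped.
print(percent_mapped(479, 480))
```

Keeping duplicates matches step 3; dropping them later would be a one-line change if you only want each SNP once.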

Off we go…Here are my SNPs on the genome graph–the SNPs are teeny blue dots. Ok, I don’t know what it means either. I just wanted a sense of what was coming out of all the GWAS studies and where they actually were on the genome. I would like to take another look at the data, this was just a quick pass–I’m intrigued by the SNPs that come up in multiple studies and I’m curious about what those genes do. Hmmm…..


Genes for Complex Traits in the Domestic Dog

The title for the next seminar in the NHGRI webinar series is just a teaser–I don’t have any more information about the next seminar right now. Thursday, January 8, 2009. 1pm ET.

It was posted on the webinar I attended yesterday on GWAS studies so I took a screen shot of it.

If this is a topic that interests you, watch the webinar website. You do need to register ahead of time for these, and an email comes with login information specific for you.

I’ll have more thoughts on the GWAS one later, but I wanted you to be able to put this on your calendar and save the date if it is something you might want to see.