Tag Archives: GWAS

Video Tip of the Week: ZBrowse for GWAS viewing and exploration

Maybe you’ve heard of the others. ABrowse. BBrowse. CBrowse. [you get the idea] GBrowse has been widely adopted. JBrowse is picking up steam. Into the orderly arrangement we now throw ZBrowse: a new way to look at genome-wide association study data.

Sharing and chatter about ZBrowse for viewing GWAS was abundant when the paper was published recently.

I could see the appeal immediately. One of the first things I check when exploring new software is the species range. See, I’m agnostic on species, and especially like to find tools that support a wide range of species. ZBrowse does this. Right in their paper they provide a chart comparing their features to other tools, and that tidbit jumped right out at me.

Although we usually like to highlight web-based tools, this one was really different and worth covering even though it requires you to do a bit more lifting on installing it. But they help with that, in their videos and instructions. And ultimately it runs in your browser, once you’ve got the right pieces in place. I was able to set it up and run it (after updating my R and RStudio).

I’m going to skip the installation and data loading videos for now, but you should go over and see them when you are ready to try it out. I’ll just give you a look at the features they show in their introductory video for the browser part. That will give you the best idea of why it’s worth trying it out.

It does require you to have both R and RStudio installed. We’ve talked about both of those before, but if they are new to you, check ’em out in these other Video Tips of the Week: Introduction to R Statistical Software, and RStudio as an Interface for using R.

It comes loaded with some plant data, but you can use other data you have. It was very easy to look at the Manhattan plot view, and then focus on smaller chosen regions. I really liked how easy it was to see what’s in the neighborhood of a selected item when you turn to the annotation tab.
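For anyone new to the Manhattan plot view: each SNP sits on the x-axis by chromosome and position, and its height is -log10 of its association p-value, so the strongest hits stand tallest. Here is a minimal Python sketch of that transform. The records are made up for illustration, chromosome names are assumed to be numeric, and none of this is ZBrowse's own code or data.

```python
import math

# Hypothetical GWAS results: (chromosome, position in bp, p-value).
results = [
    ("1", 1_200_000, 0.04),
    ("1", 5_600_000, 1e-8),
    ("2", 300_000, 0.5),
    ("2", 9_900_000, 1e-3),
]

def manhattan_points(records):
    """Return (cumulative_x, -log10(p), chromosome) for each record."""
    # Approximate each chromosome's length by its largest observed position.
    chrom_max = {}
    for chrom, pos, _ in records:
        chrom_max[chrom] = max(chrom_max.get(chrom, 0), pos)

    # Each chromosome starts where the previous ones end.
    # Assumption: chromosome names are numeric strings.
    offsets, running = {}, 0
    for chrom in sorted(chrom_max, key=int):
        offsets[chrom] = running
        running += chrom_max[chrom]

    return [(offsets[c] + pos, -math.log10(p), c) for c, pos, p in records]

points = manhattan_points(results)
top = max(points, key=lambda pt: pt[1])  # the tallest "skyscraper"
```

Plotting `points` as a scatter, colored by chromosome, gives the familiar skyline; zooming in on a region is just filtering on the x range.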

It might also be worth trying this out as a software delivery strategy–I was just reading about other folks who are offering tools that sit on top of R and RStudio this way (come back tomorrow for another example). People who want to offer you the chance to look over large data sets they are providing are considering this.

Quick links:

ZBrowse at Baxter Laboratory: http://www.baxterlab.org/#!/cqi0

R: http://www.r-project.org/

RStudio: http://www.rstudio.com/


Fertig, E. (2012) Getting Started in R.

Racine J.S. (2011). RStudio: A Platform-Independent IDE for R and Sweave, Journal of Applied Econometrics, 27 (1) 167-172. DOI: http://dx.doi.org/10.1002/jae.1278

Ziegler, Greg R., Ryan H. Hartsock, and Ivan Baxter. “Zbrowse: an interactive GWAS results browser.” PeerJ Computer Science 1 (2015): e3. DOI: 10.7717/peerj-cs.3


Friday SNPpets

This week’s SNPpets include finding hidden treasures in a “big data” repository, genomic epidemiology and malaria, cannabis strain phylogeny, hackathons and lessons learned, ClinGen for clinical genomics, and more….

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Tip of the Week: Human SNP-coexpression associations, SNPxGE2

Today’s tip is on a new database based on data from a single interesting paper, SNPxGE2. With a large-scale association study from HapMap data (269 individuals, 4 populations, over 500k SNPs and 15k expression profiles), the research reported:

the computationally predicted human SNP-coexpression associations, that is, the differential co-expression between 2 genes is associated with the genotype of an SNP.

This data is organized in an easily searchable database called SNPxGE2. As the paper only came out 2 months ago, it’s a promising database. It’s interesting and helpful as is, but I can see more data being added over time.
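To make "differential co-expression associated with genotype" concrete, here is a toy Python sketch: split the samples by genotype at one SNP and compare how strongly two genes' expression levels correlate within each group. The genotypes and expression values are invented, and a plain Pearson correlation per group is my simplification for illustration, not the statistical procedure used in the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def coexpression_by_genotype(genotypes, gene_a, gene_b):
    """Correlation between gene_a and gene_b within each genotype group."""
    groups = {}
    for g, a, b in zip(genotypes, gene_a, gene_b):
        xs, ys = groups.setdefault(g, ([], []))
        xs.append(a)
        ys.append(b)
    return {g: pearson(xs, ys) for g, (xs, ys) in groups.items()}

# Hypothetical data: eight individuals, one SNP, two genes.
genotypes = ["AA", "AA", "AA", "AA", "GG", "GG", "GG", "GG"]
gene_a = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 8.0, 6.0, 4.0, 2.0]

by_genotype = coexpression_by_genotype(genotypes, gene_a, gene_b)
```

In this contrived case the two genes rise together in the AA group and move in opposite directions in the GG group, which is the kind of genotype-dependent shift the database catalogs.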

Related Links:

Tip of the Week on SNPexp (correlation between SNPs and expression)
False Discovery Rate article

Wang, Y., Joseph, S., Liu, X., Kelley, M., & Rekaya, R. (2011). SNPxGE2: a database for human SNP-coexpression associations Bioinformatics, 28 (3), 403-410 DOI: 10.1093/bioinformatics/btr663

Are you wicked smart?

Or, as it’s pronounced around my house: ah you wikked smaht? There’s a recruiting effort by the BGI (Beijing Genomics Institute–which is not limited to Beijing anymore) to obtain a collection of wicked smart people and analyze their DNA. They are looking for genes for intelligence.

I looked at the automatic criteria, and I’m not eligible. At least by that measure. But maybe some of you are. On the first page of the study it says:

We are currently recruiting subjects for:

  1. A Genome Wide Association Study of intelligence. If you are cognitively gifted, we encourage you to participate!

On the participation page:

How to qualify

We currently seek participants with high cognitive ability. You can qualify for the study if you have obtained a high SAT/ACT/GRE score, or have performed well in academic competitions such as the Math, Physics, or Informatics Olympiads, the William Lowell Putnam Mathematical Competition, TopCoder, etc. You may also qualify via exceptional academic credentials, which you will have a chance to specify in the survey.

Automatic qualifying criteria include:

  • An SAT score of at least 760V/800M post-recentering or 700V/780M pre-recentering; ACT score of 35-36; or GRE score of at least 700V/800M.
  • A PhD from a top US program in physics, math, EE, or theoretical computer science.
  • Honorable mention or better in the Putnam competition.

Umm…never even heard of the Putnam. I think I’m out….

Anyway–if you want to advance the research and you qualify, you may want to check it out. Be sure to read and understand the privacy policy and the consent form (PDF).

They also intend to do another one on face blindness (prosopagnosia). I know some people with this, and I think it would be interesting to see if they can find genes for it. Right now they are directing people to faceblind.org for signing up for information.

Hat tip @BGI_Events:

RT @BGI_Events: Now at #google @hsu_steve is launching our drive to recruit US participants in our #cognitive #genomics project http://www.cog-genomics.org/

cross-posted at Genomes Are Us.

What’s the Answer? Open Thread (GWAS genotyping)

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

Question of the Week:

How much of the genome is captured by a GWAS?

There are two great answers to this question; here is a quote from the first one. Click the above link for more.

Human genome encodes 1 SNP/100-300bp; ~3GB sequence ~10million SNPs. It is impossible to analyze such a large number of data due to several limiting factors. To deal with this issue we can use Linkage Disequilibrium (LD) mapping (See section on D’, recombination rate), Haplotype, Haplotype blocks and Haplotype Tag SNPs (tagSNPs). (Read about HapMap project here). Instead of genotyping all the 10M SNPs we can genotype tagSNPs in a haplotype block. This is a representative SNP in a given region of genome with high LD. This will enable to find genetic variation without genotyping all the 10M SNPs. Previous studies indicated that genotyping chips with .5M-1M SNPs will be sufficient for a good GWAS.
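The tag SNP idea in that answer can be sketched in a few lines of Python. This is a simplified greedy picker, assuming genotypes are coded as 0/1/2 copies of an allele and that squared correlation (r²) stands in for LD; the SNP names, genotype vectors, and the 0.8 threshold are illustrative choices of mine, not anything taken from the BioStar answer or from a real genotyping chip design.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (vx * vy)

def pick_tag_snps(genotypes, threshold=0.8):
    """Greedily pick SNPs until every SNP is tagged at r^2 >= threshold."""
    def covered_by(snp, pool):
        return {t for t in pool
                if t == snp
                or r_squared(genotypes[snp], genotypes[t]) >= threshold}

    untagged, tags = set(genotypes), []
    while untagged:
        # Choose the SNP that covers the most still-untagged SNPs.
        best = max(sorted(untagged),
                   key=lambda s: len(covered_by(s, untagged)))
        tags.append(best)
        untagged -= covered_by(best, untagged)
    return tags

# Back-of-envelope count from the quote: ~3 Gb at roughly 1 SNP per 300 bp.
approx_snps = 3_000_000_000 // 300

# Hypothetical genotypes for six individuals at four SNPs.
genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],  # identical to snp1
    "snp3": [2, 1, 0, 2, 1, 0],  # mirror image of snp1, so r^2 is still 1
    "snp4": [0, 0, 1, 2, 2, 1],  # uncorrelated with the others
}
tags = pick_tag_snps(genotypes)
```

Here two SNPs are enough to stand in for four, which is the same economy that lets a chip with 0.5M-1M SNPs cover a genome with roughly ten million.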

Tip of the Week: Prioritizing Genes

Many types of experiments today return large lists of genes: association studies, expression arrays, linkage analysis and more. The researcher needs to determine which of those genes are the most interesting and promising, so the next step in the analysis is to prioritize the list and find a method to do so.

There are a lot of methods and tools to prioritize a list of genes, and getting a handle on which tool to use can be a bit of a daunting task. The Gene Prioritization Portal is an excellent resource to find the right tool. It’s a bit more than just a database of databases or tools. It’s a regularly updated list with detailed information about the tools (there are 25 at the moment), stats about what the data sources of the tools are, the outputs and references. There is also a nice search tool to find the tool that best fits your needs.

Today’s tip will introduce the site and perform a quick search. Future tips might be highlighting some of these tools.

Update: due to technical difficulties either in recording or the upload/processing the audio isn’t working. I’m trying to fix this. In the meantime, you can get a basic overview watching, but I’ll get a new version up as soon as I’m able to fix this.

“What’s the Answer” Thread

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday* we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

BioStar Question of the Week:

What’s the difference between GWAS and genome wide linkage studies?
I don’t understand the difference between a GWA (genome wide association study) and a GWLS (genome wide linkage study). I’m a computer scientist having to brush up biology!

Highlighted Answer:
The accepted, and well-detailed, answer is from David Quigley. Of course, if you are deep into GWAS and similar studies the answer is obvious :D, but for those who are new to the field or interested in it…

…Linkage studies are performed when you have pedigrees of related individuals and a phenotype (such as breast cancer) that is present in some but not all of the family members. These individuals could be humans or animals; linkage in humans is studied using existing families, so no breeding is involved. For each locus, you tabulate cases where parents and children who do or don’t show the phenotype also have the same allele. Linkage studies are the most powerful approach when studying highly penetrant phenotypes, which means that if you have the allele you have a strong probability of exhibiting the phenotype. They can identify rare alleles that are present in small numbers of families, usually due to a founder mutation. Linkage is how you find an allele such as the mutations in BRCA1 associated with breast cancer.

Association studies are used when you don’t have pedigrees; here the statistical test is a logistic regression or a related test for trends. They work when the phenotype has much lower penetrance; they are in fact more powerful than linkage analysis in those cases, provided you have enough informative cases and matched controls. Association studies are how you find common, low penetrance alleles such as the variations in FGFR2 that confer small increases in breast cancer susceptibility…

Go here for the rest of the answer
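For the computer scientists brushing up on biology, the association half of that answer can be sketched as a tiny case/control comparison. The answer mentions logistic regression or a trend test; what follows is the simpler allelic chi-square test on a 2x2 table of allele counts, swapped in because it fits in a few lines of standard-library Python. The counts are invented.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Chi-square test (1 df) on a 2x2 table of allele counts.

    Each argument is (count of allele of interest, count of other allele).
    Returns (chi-square statistic, p-value, odds ratio).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio

# Hypothetical allele counts: 50 cases and 50 controls, two alleles each.
chi2, p_value, odds_ratio = allelic_chi_square((60, 40), (40, 60))
```

A GWAS repeats a test like this at hundreds of thousands of SNPs, which is why the significance thresholds are so much stricter than the usual 0.05.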

Tip of the Week: RepTar, a database of miRNA target sites

microRNAs have become a rich source of research as they probably have a huge effect on gene expression and disease. The human genome may encode over 1,000 miRNAs that target over half of our genes. They might be implicated in a lot of common diseases (which have not yet been picked up in GWAS studies?). They are a fascinating area of biology that has only come into its own in the last decade. As such, the number of databases cataloging miRNAs is large. Today’s tip is on a new one, RepTar, which is reported in the upcoming NAR database issue. The niche RepTar is attempting to fill is making miRNA target predictions more comprehensive by including new research in the algorithm. This new research suggests there are more possible target sites than previously thought. As mentioned in the article,

Recently, the miRNA binding options were expanded further with the identification of ‘centered sites’, functional miRNA target sites that lack both perfect seed pairing and 3′-compensatory pairing and instead exhibit pairing with the target along 11–12 contiguous pairs at the center of the miRNA (4). While some algorithms relaxed the evolutionary conservation criterion (5–11) and/or offer also predictions of 3′-compensatory sites [e.g. (6,12,13)], few databases offer predictions of the whole repertoire of miRNA targeting patterns. Furthermore to date, no database lists genome-wide prediction of cellular targets of viral miRNAs. These miRNAs lack significant evolutionary conservation and their targets are not necessarily expected to be evolutionarily conserved. In addition, the few identified viral miRNA targets have shown both conventional seed binding and 3′-compensatory binding [e.g. (3,14)].

Here we present a database of genome-wide miRNA target predictions for mouse and human genes, based on the predictions of our novel target prediction algorithm, RepTar
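For readers who haven't met "seed pairing" before, here is a minimal Python sketch of the conventional case only: take nucleotides 2-8 of the miRNA, reverse-complement them, and look for that 7-mer in a target sequence. This is emphatically not the RepTar algorithm, and it ignores the 3′-compensatory and centered sites the quote is about. The let-7-like miRNA sequence and the toy UTR are there purely for illustration.

```python
# RNA base-pairing partners (Watson-Crick only; G:U wobble is ignored).
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(mirna):
    """Return the target-side 7-mer that pairs with miRNA nucleotides 2-8."""
    seed = mirna[1:8]                       # positions 2-8, 5' to 3'
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement

def find_seed_matches(mirna, utr):
    """Return 0-based start positions of perfect seed matches in the UTR."""
    site = seed_site(mirna)
    hits = []
    start = utr.find(site)
    while start != -1:
        hits.append(start)
        start = utr.find(site, start + 1)
    return hits

# Illustrative inputs: a let-7-like miRNA and a made-up 3' UTR fragment.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGGGCUACCUCUU"

site = seed_site(mirna)
hits = find_seed_matches(mirna, utr)
```

Real predictors layer conservation, site context, and the non-seed pairing patterns on top of a scan like this, which is where the differences between the databases come from.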

I’ll leave the predictive value up to miRNA researchers, but I thought I’d introduce the site.

While I’m at it, allow me to list a few other miRNA sites from labs and institutes as far flung as China, Italy, Israel, Canada and the U.S. Perhaps someday I’ll do a comparison.

CircuitsDB, which Jennifer did a great tip of the week tutorial on.

miRBase, which we have a full-length tutorial on.
PicTar, which has an annotation track for the UCSC Genome Browser
PuTmiR (in relation to transcription factors)

Two lists to catch some others: http://mirnablog.com/microrna-target-prediction-tools/ and http://www.ncrna.org/KnowledgeBase/link-database/mirna_target_database

Elefant, N., Berger, A., Shein, H., Hofree, M., Margalit, H., & Altuvia, Y. (2010). RepTar: a database of predicted cellular targets of host and viral miRNAs Nucleic Acids Research DOI: 10.1093/nar/gkq1233

DNA Deniers


From Michael Pollan:


Michael Pollan and his flock became all aerated the other day when Michael tweeted this tidbit. It links to a story with quite the title:

The Great DNA Data Deficit: Are Genes for Disease a Mirage?

Srsly. That’s what it says.

What do I think this is? The second case of gene denialism that I have observed. (The first was a group disputing autism genes.)

I knew that after the genome came along there would be woo. I knew that snake-oil salesmen would be pitching purchases that would work with your skin genes. I knew there would be anti-aging compounds that work with your genes. I know there’s already a DNA diet, and vitamins sold to you based on your DNA. I’ve seen DNA dating. But honestly, I didn’t expect the DNA deniers.

Probably I should have seen it coming. I’ve followed a couple of different topics that flow with anti-science woo: anti-vaxxers and anti-GMOsters. There is overlap between these groups, but it’s not complete. But there is remarkable coincidence between their argument styles. Both groups make big claims, mostly unsourced–or if sourced are cherry-picked points or entirely misused. And when the science isn’t going their way, they deny the science, and then they move the goal posts.

This article–which purportedly blows away the connection between genes and disease–is appallingly mistaken. Let me be clear: genes can influence disease risk. Period. Of course environment may influence biology. Diet and exercise can affect health, certainly. Exposure to natural or man-made carcinogens can trigger cancer. And even the hardest core gene jocks know this.  But this desire to sever the connection between genes and diabetes–or prostate cancer, or Crohn’s disease–because they haven’t found a single smoking gun gene yet, using one kind of study?  That’s just bizarre and twisted. There are numerous examples of leads on complex disorders that are quite strong, insights into disease pathways and mechanisms, and we’ve really just started. And new technologies are opening new paths as well. A nice article on this was in Nature this fall: Genomics: The search for association.

Sure, we’ve wanted more data and stronger signals from GWAS (genome-wide association studies). But it turns out humans are inveterate outbreeders and it’s hard to tease out strong pointers from them.

Probably if you are a regular reader of this blog I don’t need to convince you of that. But for anyone else who stumbles across this let me offer some resources:

What I can’t quite figure out is why the authors of that post attempt to discredit all the work and all of the discoveries that have been made so far–and those we are going to unearth. This is a relatively new strategy, and as we refine the tools and the study populations, and build on new knowledge, we are going to find more. And much has been done–check out the GWAS Catalog for an overview. Scroll down. And keep scrolling. As it says on that page: “As of 12/09/10, this table includes 725 publications and 3606 SNPs.”

And I also can’t figure out why Pollan’s minions are celebrating this. Here are samples of the responses to this:

If this was true (which it certainly isn’t), why would this be a reason to chuckle? Why celebrate? I honestly don’t get it. The actual emotion ought to be embarrassment for the credulity.

But ok, you aren’t down with the human GWAS data right now–let’s look at other GWAS and see what’s coming out of that. There have been some really stunning examples of these studies in dogs. There was a talk a couple of years back that we watched: Genes for Complex Traits in the Domestic Dog. You can watch that online to learn more. An advantage of working with dogs is that they are highly inbred. (A professor of mine in grad school once snarked that we can’t do that with humans, although that was part of the purpose of the Ivy League, he claimed.) A couple of hundred years of intensive breeding and good pedigree records make gene hunting in the canine genome somewhat easier than it is in the messy human populations. Here’s another recent article on dog traits in Nature. If you think that genes don’t cause complex disorders, you have to dispute that some dog breeds are prone to anxiety due to their genes. And that some are prone to deafness–it’s clearly the Dalmatian lifestyle, right? Or that dobermans are bringing their narcolepsy on themselves somehow. [Seriously--they are narcoleptic? Who knew....]

Clearly the authors have an agenda. At the end they make their case:

Nevertheless, most governments cooperate far more, for example, with their food industries than with those who wish to eat a healthy diet. The laying to rest of genetic determinism for disease, however, provides an opportunity to shift this cynical political calculus. It raises the stakes by confronting policy-makers as never before with the fact that they have every opportunity, through promoting food labeling, taxing junk food, or funding unbiased research, to help their electorates make enormously positive lifestyle choices.

Author Latham goes further at HuffPo (home to mucho woo of many stripes) emphasis mine:

That means environment must be the entire cause of ill health, i.e. junk food, pollution, lack of exercise, etc. The reason we wrote an article about human genetics (when we are a food and agriculture website) is that we believe that if people live right, agriculture and therefore the planet will more or less fix itself.

I don’t care if you want to discredit the food industry and if you hate Big Ag and want to say so on your own blog. But misusing and discrediting science and the efforts of scientists that have nothing to do with that is a stupid and flawed strategy. And Michael Pollan: please use better judgment before hitching your agenda to deniers.

This piece of tripe is one of those sorts of sciency-ness things that Mike the Mad Biologist once hailed as having The Asymmetric Advantage of Bullsh-t.  It has multiple levels of crap. And there isn’t a comment feature on it, so you can’t discuss it over at their site. I will look for other responses to this item and collect them here if I find them, or add them in the comments if you have them.  Anyway, I’m sure someone will take on the #FAIL in other parts of that post–there are plenty of opportunities. I wanted to address the denial aspect.  I agree with Deanna–wow–and I’d love to see a good Fisking by Genomes Unzipped–and it may be coming.

Top tweet on this so far goes to @emmecola:

But scientists, my plea to you: don’t let the DNA deniers get a foothold on this topic. We’ve seen what happened with anti-vaxxers. Like that group, this could affect the public health if people start dismissing real risks of colon cancer and subsequent screenings, or forgoing treatment for their psychiatric disorders because someone told them they could fix it with an organic carrot. There are real consequences to this.

Baker, M. (2010). Genomics: The search for association Nature, 467 (7319), 1135-1138 DOI: 10.1038/4671135a

Cyranoski, D. (2010). Genetics: Pet project Nature, 466 (7310), 1036-1038 DOI: 10.1038/4661036a

1. Here’s a take on it from Genomes Unzipped: Estimating Heritability Using Twins

2. Here’s a take at HuffPo–fount of crap science: Is There a Genie in the Genome? I am embarrassed to link to it, but did think this bit was funny: “suggests that genomics is one part boondoggle, one part conspiracy by the military-industrial establishment.” Snorf.

3. Oh, my–look what the twitter fairies just dropped on my desk. An evidence-based review of diabetes:  Genomics, Type 2 Diabetes, and Obesity and be sure to look especially at Table 1.

4. Another discussion of it from Mike at ScienceBlogs: GWAS FIGHT! (Hiss! Snarl!): Déja Vu All Over Again and check out Daniel MacArthur’s comment over there #FTW.  (Hat tip to GenomeWeb).

5. Oh, FFS: Marion Nestle catches teh stupid too. And Marion–this is not a “study”. It’s a polemic. You should know better than that. You have an appropriate degree. Pollan I can sorta cut some slack–he’s got an English art degree.

6. And Daniel brings the shredder. This is great: Bioscience Resource Project critique of modern genomics: a missed opportunity

7. ROFL: Keith snarks it up with this title: The Great Health Data Deficit: Are Environmental causes for Disease a Mirage? That was a fun start to my day.

8. my GenomiX summarizes the conflama with La complessità dei viventi è un dato di fatto. Teh Google tells me this says “The complexity of living beings is a fact”. Sì. (By the way, translators are getting much better–I think. That looked great to me.)

9. Mary Carmichael #FTW–she does a multi-media smackdown of “environmental determinism” with a great essay and accompanying video.

10. Mike the Mad Biologist weighs in with: On Genetic Denialism

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

  • RoBuST “has been developed as root and bulb plant community research platform for integrated analysis of root and bulb genomics data.” Cool. I’m a big fan of roots and bulbs–oh, crap,  just realized I forgot to buy carrots for the Pav Bhaji.  Will try to get them tomorrow at the farmer’s market or Faneuil.  [Mary]
  • FEAST is a sensitive local alignment program with multiple rates of evolution. An interesting project as part of a Ph.D. thesis :). I haven’t tried it yet, but from the commentary, it looks good. [Trey]
  • Because Trey often talks about the CLOCK gene, I found this set of Nature papers interesting: Editor’s Summary – Clocking on to diabetes [Jennifer]
  • From BioMed Central: CIG-DB: the database for human or mouse immunoglobulin and T cell receptor genes available for cancer studies plus a link to the actual site (free, no registration required):  CIG-DB [Jennifer]
  • announcement: GMOD Europe 2010, 13-16 September 2010, Cambridge UK [Jennifer]
  • As most parents and anyone who has watched a child over time knows, a large portion of our personalities is genetic. But like height and sexuality, they aren’t easily reduced to single (or even multiple) gene causes as this recent GWAS research is showing. [Trey]
  • There’s a site that is fielding questions predominantly on next-gen sequencing issues: http://i.seqanswers.com/ [Mary]