The Lancet paper, Clinical assessment incorporating a personal genome, has held my fascination this weekend (yes, I read it at the beach). Mary posted Friday and again Saturday on the paper and a related NPR segment. It feels to me like a seminal paper, though I do agree with Daniel at Genetic Future that there is a lot there we still don’t know. A large portion of the variation is in non-coding regions, so predictions and propensities are hard to come by with the available analysis tools. In fact, as he pointed out, for many of the coding-region variations there is little information about their effect on disease. I would also add that even if we reach that holy grail of sequencing a personal genome for $1,000, this kind of extensive analysis would still be time- and cost-prohibitive for the vast majority of sequenced genomes.
Yet, as with all early steps in science and medicine, there are missing pieces, large gaps and huge efforts (think “space travel,” “computers,” “microwave ovens,” “internet”) that over time become inexpensive and commonplace (ok, so the first isn’t necessarily “inexpensive”). Sequencing genomes will become inexpensive before the analysis does, but both will come, and I think this paper points to that future.
The other hurdle to large-scale personal genomics I see (of course) is understanding and using the genomics and data resources. The authors use a large (and, in my opinion, excellent) suite of genomics resources to obtain data and do their analysis. I’ll list them here with links in alphabetical order:
All of these resources have a wealth of data, but even so, each tool takes a good deal of familiarization before it can be used for analysis. Each tool does have documentation and tutorials, and of course OpenHelix has tutorials on many of those mentioned (the ones with linked “T”s after the name). Still, this one analysis required a large number of tools and familiarity with each of them.
The paper does have a pretty good figure (Figure 1) outlining the analysis process. For example, they SIFTed the genome to find variations that were gene-associated, non-synonymous, rare and novel, and disease-associated, and then examined those using dbSNP, HGMD, OMIM and PubMed to evaluate a gene like HFE2, which might have an association with haemochromatosis. One of my quibbles with the paper, as is often the case with these papers, is that there isn’t a good methods ‘walk-through’ using something like Galaxy or Taverna, in a history or workflow that would help others reproduce the analysis.
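To make the filtering idea concrete, here is a minimal Python sketch of a filter cascade in the spirit of the steps described above. It is not the authors’ pipeline: the `Variant` record, its field names, the consequence labels and the 1% “rare” cutoff are all hypothetical, and “rare and novel” is treated here as “below the cutoff or absent from reference data.” Logging the count after each stage is the kind of record a Galaxy history or Taverna workflow would keep for you.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Variant:
    # Hypothetical record; field names are illustrative, not taken from the paper.
    variant_id: str
    gene: Optional[str]               # None if the variant is not in or near a gene
    consequence: str                  # e.g. "missense", "synonymous", "intergenic"
    population_freq: Optional[float]  # None means not seen in reference data (novel)
    disease_associated: bool          # e.g. a hit in a curated disease database

RARE_FREQ = 0.01  # assumed "rare" cutoff; the paper's actual threshold may differ
NON_SYNONYMOUS = {"missense", "nonsense", "frameshift"}  # assumed label set

def filter_variants(variants: List[Variant]) -> Tuple[List[Variant], List[Tuple[str, int]]]:
    """Apply each filter in turn and record how many variants survive each stage."""
    stages: List[Tuple[str, Callable[[Variant], bool]]] = [
        ("gene-associated", lambda v: v.gene is not None),
        ("non-synonymous", lambda v: v.consequence in NON_SYNONYMOUS),
        ("rare or novel", lambda v: v.population_freq is None or v.population_freq < RARE_FREQ),
        ("disease-associated", lambda v: v.disease_associated),
    ]
    kept = list(variants)
    log = [("input", len(kept))]
    for name, keep in stages:
        kept = [v for v in kept if keep(v)]
        log.append((name, len(kept)))
    return kept, log
```

Keeping each stage as a named predicate makes it easy to swap in a different cutoff or annotation source and see how the per-stage counts change, which is most of what a reproducible methods walk-through needs.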
We also have a tutorial I’d like to point you to, one that walks through a similar process and teaches users the basics of it. You can find this tutorial here; it’s free and publicly available. The tutorial walks the user through the analysis of a gene variation, in this case in CYP2C9, that affects an individual’s response to warfarin. The paper discusses a similar variation (in a different gene, affecting the same drug response). The tutorial uses the NIEHS SNPs site to get an overview of the variation, including SIFT and PolyPhen predictions, then moves to the UCSC Genome Browser for an overview of the region, walks through the dbSNP information, and does a quick tag SNP analysis using GVS. That tutorial is only one very small step in what will have to be an immense education in genomics and genomics resources.
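For readers new to the idea, a tag SNP analysis picks a small set of SNPs that, through linkage disequilibrium (LD), stand in for their neighbors. As far as I know, GVS bases its tagging on the LDSelect greedy binning approach; the Python below is a simplified sketch of that idea, not GVS’s code. The pairwise r² values are assumed to be precomputed (for example, from genotype data) and passed in as a dictionary, and the 0.8 threshold is just a commonly used default.

```python
from typing import Dict, FrozenSet, List, Tuple

def greedy_tag_snps(
    snps: List[str],
    r2: Dict[FrozenSet[str], float],
    threshold: float = 0.8,
) -> List[Tuple[str, List[str]]]:
    """Simplified LDSelect-style binning: repeatedly choose the SNP that tags the most remaining SNPs."""

    def ld(a: str, b: str) -> float:
        # A SNP trivially tags itself; missing pairs are treated as r^2 = 0.
        return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)

    remaining = set(snps)
    bins = []
    while remaining:
        # For each candidate, the remaining SNPs it captures at r^2 >= threshold.
        coverage = {s: {t for t in remaining if ld(s, t) >= threshold} for s in remaining}
        # max() keeps the first best candidate, so sorting makes ties deterministic.
        tag = max(sorted(remaining), key=lambda s: len(coverage[s]))
        bins.append((tag, sorted(coverage[tag])))
        remaining -= coverage[tag]
    return bins
```

With this simplification, two SNPs whose mutual r² falls below the threshold can still share a bin if both are captured by the same tag; the published LDSelect method treats bins and alternative tags in more detail than this sketch does.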
All of this is to point out that the paper is a fascinating first step and, as a first step, reveals the gaping holes we will have to fill in bringing personal genomics to medicine.
Ashley, E., Butte, A., Wheeler, M., Chen, R., Klein, T., Dewey, F., Dudley, J., Ormond, K., Pavlovic, A., & Morgan, A. (2010). Clinical assessment incorporating a personal genome. The Lancet, 375(9725), 1525-1535. DOI: 10.1016/S0140-6736(10)60452-7