What would you do with your genome?

So last week I treated myself to my first vacation in a long time.  It was my birthday, and I wanted to disconnect a bit and recharge.  Mostly it worked, although the hundreds of emails I’m facing this morning are a bit daunting.  But just before I left I got an email from a colleague who asked me a really great question:

….I would love to know where you would start when you get back a personal genome sequence….

And I couldn’t shake this out of my head.  I was sitting on a bridge outside Windsor Castle thinking about it as the sun set on my first day.  (On subsequent days I found that the far superior ciders in the UK were able to push this question out of my head for some periods of time. And also pie.)

I’ve spent some significant time thinking about the onslaught of personal genomics, of course.  It’s all been very theoretical, because I would have refused to even begin the process of obtaining my personal genome sequence until the GINA legislation fully kicked in.  But now that barrier is down.  I’m still not ready to get mine done for a variety of reasons (cost, quality, informative value).  But it’s still worth thinking about what I would do with it if it was handed to me–in specific terms, with concrete actions.  So here’s what I decided I would do.  Your mileage may vary.  And I’d love to hear what others might do with theirs.

This assumes a fully-sequenced genome.  I don’t want a SNP check.  I want all the nucleotides.

Step 1: Assessment and QC

Assessment: The first thing I would do would be to open the files and figure out the formats.  I have no idea what the sequence might be like.  Do you get the read data? Do you get summary FASTA files?  Do you get processed/annotated stuff?  What’s in there?  So I’d assess the files.  Here’s what I’d want: I would want access to the raw output data (I’m assuming short-read or somewhat longer-read data. Technology may change, though).  We’ve seen in the past that there can be issues with the raw data, and if there was anything I was going to make any sort of a health-related decision on, I would want to go all the way back to the read data and check the quality.  It may also be that software improvements come along and you’d want to re-build the summary data.  But mostly I’d like to work with a summary/consensus file in FASTA format for my own analysis.  I’d also like more processed output from the provider, with annotations.  But I would use those as a starting point and probably verify it all myself anyway.  Although as I say this now, I think I would have asked about the output files prior to committing to a service provider for this.  And I would have requested all of the files in my contract.
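To make that "figure out the formats" step a bit more concrete, here is a minimal sketch in Python of peeking at the first few lines of a file and guessing what it is.  This is only a heuristic, and it rests on assumptions of mine: that the consensus would arrive as FASTA and that read data would arrive as FASTQ (a provider could well hand over something else).  The function name `sniff_format` is made up for illustration.

```python
def sniff_format(text: str) -> str:
    """Guess a sequence file format from its first few lines (heuristic only)."""
    # Ignore blank lines so stray whitespace at the top does not confuse us.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "empty"
    first = lines[0]
    # FASTA records begin with a '>' header line.
    if first.startswith(">"):
        return "fasta"
    # FASTQ records are four lines: @id, sequence, '+' separator, qualities.
    if first.startswith("@") and len(lines) >= 3 and lines[2].startswith("+"):
        return "fastq"
    # Anything else (annotation tables, alignments, ...) needs a closer look.
    return "unknown"


if __name__ == "__main__":
    # Toy inputs, not real data.
    print(sniff_format(">chr21 consensus\nACGTACGT\n"))
    print(sniff_format("@read1\nACGT\n+\nIIII\n"))
```

Anything that comes back "unknown" is exactly the sort of file I would want to ask the provider about before signing a contract.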

QC: My next step would be to take chunks of the data and look at sequence alignments of my sequence vs the reference sequence.  For example, I would take maybe a chunk of chromosome 21 and look closely at it.  Not just the known genes, but all the other pieces as well.  According to the GRCh37 view of Chr21 reference sequence I just called up on the UCSC Genome Browser there are over 48 million bases.  I’d take chunks that had known genes, and regions with no genes, maybe avoid the centromere and the very ends, but I’d use that to look at the data nearly letter-by-letter probably with BLAT.  If there were variations every 10 bases I’d be very concerned.  If my variations appeared to correspond with known SNPs, I’d be more confident in the quality.  I’d pull up some regions with known repeats and have a hard look at how those had been handled. (I’d also do a second check using BLAST at NCBI on the official reference sequence to make sure the conclusions were about the same. Can’t help myself, I love to QC things….).  I’d also probably spot-check some other chromosomes and other regions.  I should also say that I mean not only single-nucleotide variations (SNPs) but also copy-number variations (CNVs).  I’d be extra-curious if CNV-sized chunks were observed in my own genome and where they were, and how they were handled.
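The two sanity checks above (how dense are the variations, and how many of them match known SNPs) can be sketched roughly like this in Python.  Everything here is an illustration under my own assumptions: variants are represented as simple (chromosome, position, allele) tuples, the known-SNP set is something you would have to load yourself from a public resource, and the function names and the 1,000-base window are hypothetical choices, not anyone's standard pipeline.

```python
from collections import Counter


def variants_per_window(positions, window=1000):
    """Count how many variant positions fall in each fixed-size window."""
    return dict(Counter(pos // window for pos in positions))


def suspicious_windows(positions, window=1000):
    """Return window indices with roughly one variation per 10 bases or more."""
    threshold = window // 10  # "variations every 10 bases" would worry me
    counts = variants_per_window(positions, window)
    return sorted(w for w, n in counts.items() if n >= threshold)


def fraction_known(my_variants, known_snps):
    """Fraction of my variants that also appear in a set of known SNPs."""
    mine = set(my_variants)
    if not mine:
        return 0.0
    return len(mine & set(known_snps)) / len(mine)


if __name__ == "__main__":
    # Toy data only: two of my three variants are already "known".
    mine = [("chr21", 100, "A"), ("chr21", 250, "G"), ("chr21", 900, "T")]
    known = [("chr21", 100, "A"), ("chr21", 250, "G")]
    print(fraction_known(mine, known))
    print(suspicious_windows([pos for _, pos, _ in mine]))
```

A low known-SNP fraction or a cluster of flagged windows would not prove anything by itself, but it would tell me where to go back to the reads and look letter-by-letter.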

But let’s assume the quality is respectable.  Then I’d start to look at more targeted pieces.  I would look at well-known genes.  Not necessarily disease genes (although those are certainly in the pool).  I would look at how my collection of variation compares to highly-studied genes.  I might also look across the deeply examined ENCODE data, since that’s been re-examined now as well.  Probably I’d look at the genomes of some of the other individuals who have been sequenced for comparison.

I’d probably have to stop at this point and get back to my real work.  I also have mixed feelings about how much I want to know about my own “disease” variations right now.  So I’d need some more thought about how to process that psychologically before I went to look into the regions that are outside of the flashlight areas.

Step 2: Get someone to build me a genome browser

Yeah, I might be able to follow the GBrowse instructions and build my own. Maybe build my own UCSC Browser since I know that one so intimately.  It might be possible to get away with just creating your own DAS or custom track to load into an existing GBrowse or UCSC Browser, which I could definitely accomplish on my own.  But it seems to me that to be most useful over the long term I’d want my own coordinates, and my own database that I could curate as I go along and as new data comes out on variations, etc.  I’d want to add my own personal notes to some pieces.  So I’d lean toward a full browser, with a curation pipeline, with my own genome as the reference sequence/coordinates.  And it strikes me that it would be most time- and cost-effective to have someone do it for me.  I’d pay for that as a service.
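For the simpler custom-track route, a rough sketch of what I could do on my own might look like the Python below.  It writes single-base variants, each with a personal note, as a BED-style custom track with a `track` header line.  The function name, the default track name, and the toy variants are all made up for illustration, and it assumes positions are given 1-based and in the reference assembly's coordinates (which is exactly the limitation that makes me want a full browser eventually).

```python
def make_custom_track(variants, name="my_variants",
                      description="Personal variants (toy example)"):
    """Build BED custom-track text from (chrom, 1-based position, note) tuples."""
    lines = [f'track name="{name}" description="{description}"']
    for chrom, pos, note in sorted(variants):
        # BED intervals are 0-based and half-open, so a single base at
        # 1-based position pos becomes start = pos - 1, end = pos.
        lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{note}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy entries only; the BED name field should not contain spaces.
    track = make_custom_track([
        ("chr21", 33031597, "check_this_later"),
        ("chr21", 27000000, "matches_known_SNP"),
    ])
    print(track, end="")
```

The text it produces could be pasted or uploaded into an existing browser's custom track page, which keeps the notes in a file I control rather than in someone else's database.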

Step 3: Look closer, and ongoing monitoring

Ok, here’s where I hesitate a bit.  I’m honestly not quite sure what I want to know from my genome at this point.  I suspect I’d be unable to resist looking at longevity genes–we’ve seen some good long lives and healthy seniors in my family, and I’d want to see if I can hope for that.  I’d look at the tone-deaf genes as this is a long-standing problem in my family and probably good for a laugh.  I’d check on that curly hair gene to be sure mine was the same variation they’ve seen.  But here’s the hard part: do I want to know about the Alzheimer’s genes?  Do I want to know about the cancer ones?  I’m not sure that I do. Eventually I probably wouldn’t be able to resist this either.  I’d look at the NHGRI catalog of GWAS studies.  I’d check SNPedia.

And what about the other variations that aren’t in known genes?  I’d scan the regions.  I’d probably run GRAIL.  I’d probably set up a MyNCBI saved search to run regularly for papers that come out on either variations, regions, or diseases I’m concerned about.

But then what? Do I make changes based on what I’m seeing?  Do I alter my diet? Do I check my genome before getting prescriptions filled?  Do I discuss any findings with my siblings?  Do I drive myself insane with genome minutiae?  Do I start hanging out with the people doing recreational genomics?  Quite frankly, I don’t know.

So that’s what I did on my vacation–planned the analysis of my genome.  It was actually a rather fun exercise.

What have I missed? What would you do?

8 thoughts on “What would you do with your genome?”

  1. Devon Jensen

    Hi Mary,

    Like you I’m excited to one day get my genome sequenced! So much so that I’m building a genome browser right now. It is “consumer” / non-expert focused but I believe the platform will be adaptable and useful for more expert analysis too. I will be forming a company around the program but expect that personal and academic use will be free.

    It sounds like we have some similar ideas – one thing I’m adding is a place for personal notes on different genes and features.

    Given your history in testing biological software, I’d love to have you take a look at it when I reach the early beta stage. Expected in a month or two.

    I’ve been following the blog for a while now, thanks to you and your team –

    Devon Jensen

  2. Mary Post author

    Oh, how very cool–I’d definitely give it a spin. Thanks for the opportunity Devon! Do ping me when you have something. I love testing, and am a great bug finder. Highly tolerant of alpha and beta software.

  3. Daniel MacArthur

    Hey Mary,

    As you might have guessed, I have some thoughts on this as well!

    Firstly, I’d make sure that I had already bought one of the cheaper SNP chip-based personal genomics products – these are extremely accurate, and provide at least half a million points of data for QC.

    In terms of downstream analysis: by the time you and I have our genomes sequenced I suspect the landscape for free genome browsers will have changed quite dramatically. But there’s also a pretty good chance that whatever company sequenced your genome will offer a decent browser itself, along with extensive functional annotation.

  4. Mary Post author

    Hey Daniel–

    Yeah, I thought you might :)

    I can see the appeal of the SNP chips, but it’s only part of what I’d want to know. And since I’m not in a hurry, I’m willing to wait. The person who asked me about this is trying to get into the next round of PGP, though–and that may happen more quickly than my timeframe.

    But the other point about downstream analysis: I would specifically not want to be tied to the provider’s browser. I mean, I’d look at theirs. But I would want my own that was not subject to the whims of their software changes, their proprietary stuff, and the possibility that they simply vanish.

    It’s still the wild west as far as I’m concerned. And we’re certain to discover that I carry the control freak gene in this little foray anyway….

    EDIT (afterthought): one of the other things I don’t want is for anyone else to see my personal annotations either.

  5. Bryan

    “Get someone to build me a genome browser.” <- Some people (ahem) are already working on this, if you want to hook up I might know a few names.

  6. Mary Post author

    Thanks, Bryan–but I’m good with the network….I have a significant number of people on my email/skype list that could do it. But I’m not even close to having a sequence yet.

  7. Kevin

    I can see wanting one’s own browser for privacy reasons, but keeping the annotation current and interpretable would be a major maintenance headache. Maintaining the reference genome at UCSC requires a large staff for QA, even though everyone is using that coordinate system. Trying to import the annotation to a custom browser would be a nightmare.

    Much better to put up your genome as a custom track on a well-maintained browser.

  8. Mary Post author

    But I’m really good at curation and QC, and I really like to do it. I know that’s nuts. I’m sure there’s a curator gene. I definitely have the same phenotype as others who like this.

    I think the larger problem is that I’ll get roped into curating for the family, which will be a time sink.
