Drama surrounding the $1,000 genome erupts every so often. Earlier this year, when the HiSeq X Ten setup was unveiled, there was a lot of chatter and some questions: Is the $1,000 genome for real? There was also some push-back on the cost analysis: That “$1000 genome” is going to cost you $72M. A piece that offers a nice framework for the field of play is Welcome to the $1,000 genome: Mick Watson on Illumina and next-gen sequencing. Aside from the media flurry, though, what matters is the data, and not many people have had access to the data yet.
Via Gholson Lyon, I heard about access to some:
— Gholson Lyon (@GholsonLyon) July 30, 2014
The Garvan Institute of Medical Research, DNAnexus and AllSeq have teamed up to offer the genomics community open access to the first publicly available test data sets generated using Illumina’s HiSeq X Ten, an extremely powerful sequencing platform. Our goal is to provide sample data that will allow you to gain a deeper understanding of what this technological advancement means for your work today and in the future.
My focus won’t be on the data itself, but if you are interested in the technical aspects of this system and Garvan's process, have a listen to this informative presentation by Warren Kaplan from Garvan:
The sample data is derived from the GM12878 cell line, which is available from the Coriell repository (Catalog ID: GM12878). Conveniently, this is also one of the Tier 1 cell lines from the ENCODE project, so there is other public data on this cell line, some of which I have explored in the past.
There are two different data sets in the download files, and one of them can be viewed in the browser. I’m sure the Genoscenti will be all over the downloadable files. But because I’m always interested in new visualizations, I wanted to explore the genome browser they made available. Although I had heard of Biodalliance before, we hadn’t highlighted it as a tip, so I thought it would be interesting to explore. Biodalliance is a flexible, embeddable, extensible system that’s worth a look on its own, beyond delivering this test data. If you come by at a later date and the X Ten data is no longer available, go over to the Biodalliance site for some nice sample data sets. Their “getting started” page has a good introduction to the features.
In the video, I’ll take a quick test drive around some of the visualization features with the X Ten GM12878 data. I’ll look at a couple of sample regions: one with the SOD1 gene, just to illustrate the search and the tracks, and one that I knew from earlier ENCODE CNV data had a homozygous deletion, to see how that looked in this data set. (If you want to look for deletions later, search for the genes OR2T10 or UGT2B17.)
Note: the data is time-sensitive; apparently it’s only available until September 30, 2014. So get it while it’s hot, or browse around now.
Test data site: http://allseq.com/x-ten-test-data
Biodalliance browser software details: http://www.biodalliance.org/
Down T.A. & Hubbard T.J.P. (2011). Dalliance: interactive genome viewing on the web, Bioinformatics, 27 (6) 889-890. DOI: http://dx.doi.org/10.1093/bioinformatics/btr020
Check Hayden E. (2014). Is the $1,000 genome for real?, Nature, DOI: http://dx.doi.org/10.1038/nature.2014.14530
Dunham I., Aldred S.F., Collins P.J., Davis C.A., Doyle F., Epstein C.B., Frietze S., Harrow J., Kaul R., Khatun J. et al. (2012). An integrated encyclopedia of DNA elements in the human genome, Nature, 489 (7414) 57-74. DOI: http://dx.doi.org/10.1038/nature11247
Garvan NA12878 HiSeqX datasets by The Garvan Institute of Medical Research, DNAnexus and AllSeq is licensed under a Creative Commons Attribution 4.0 International License