The data isn’t in the papers anymore. Again.

8 April, 2012 (08:51) | Genomics Research, Genomics Resource News, New Resource | By: Mary

I know this is a topic I keep hammering on, but I’m not sure it’s really grokked by many people who aren’t as deep into the bioinformatics side of biology today, or by those who support biologists, such as publishers and librarians, who may not be as immersed in the daily software side of things.

There was a nice post by Ed Yong last week about a paper published on sticklebacks. There are several cool things about this paper, but one of them is simply that we can now use next-generation sequencing technology to examine species in ways we just couldn’t before. And Ed made the point that there wasn’t only one genome in this paper: there were 21 genome sequencing events.

Because of the cool biological niches these sticklebacks occupy, it was possible to compare populations that varied in interesting ways. Some lived in fresh water and some in salt water, and these populations could be examined in different regions of the planet to test whether the same adaptations happened in different places for the same reasons.

It really is a sweet paper. But it also illustrates a point I keep making over and over again: the data is not in the papers anymore. The paper is a nice sort of summary statement of the work. But you cannot put 21 genomes in print, and a big list of As, Ts, Gs, and Cs wouldn’t be that valuable on paper anyway. You cannot show all the analysis tracks in the paper; you can merely sample a subset of them. You can illustrate a few “compelling examples,” as we used to call them at one place I worked.

But if you want to explore other features, or you want to build on this work yourself, you need to turn to the databases. The real magic happens there now, not in the papers. Back in the days of my training and early career, the papers were enough. They are not anymore. It’s not clear to me that publishers in this field entirely appreciate this.

And the authors offer a whole genome browser (based on the UCSC Genome Browser software platform) for their stickleback data. It’s quite lovely, actually; I’ll link to it below. It’s also an excellent demonstration of how to use existing open source software to build a version tailored to your needs.

Quick links:

Here’s Ed’s post on the key features of the work: “Stickleback genome reveals detail of evolution’s repeated experiment”

Look at the Sticklebrowser yourself. It’s actually rather lovely, and informative: http://sticklebrowser.stanford.edu

To learn to use UCSC Genome Browser-based software, see the training materials sponsored by UCSC: http://openhelix.com/ucsc

Reference:

Jones, F., Grabherr, M., Chan, Y., Russell, P., Mauceli, E., Johnson, J., Swofford, R., Pirun, M., Zody, M., White, S., Birney, E., Searle, S., Schmutz, J., Grimwood, J., Dickson, M., Myers, R., Miller, C., Summers, B., Knecht, A., Brady, S., Zhang, H., Pollen, A., Howes, T., Amemiya, C., Baldwin, J., Bloom, T., Jaffe, D., Nicol, R., Wilkinson, J., Lander, E., Di Palma, F., Lindblad-Toh, K., & Kingsley, D. (2012). The genomic basis of adaptive evolution in threespine sticklebacks. Nature, 484(7392), 55-61. DOI: 10.1038/nature10944

Comments

Comment from Iddo
Time April 10, 2012 at 2:49 PM

GigaScience is coming at an opportune time. http://www.gigasciencejournal.com/

Comment from Mary
Time April 10, 2012 at 3:09 PM

Yeah, that’s a very timely idea. But existing publishers need to grab hold of this concept too.

Pingback from Secret source code? | The OpenHelix Blog
Time April 12, 2012 at 4:31 PM

[...] pretty dire. But Jeremy Hsu does have a point. And it’s related to my ongoing issue of the data not being in the papers anymore either. And it’s further related to the repeated issue of us joking around BioStar that we [...]

Pingback from “The magic is finding questions” | The OpenHelix Blog
Time April 13, 2012 at 11:38 AM

[...] illustrates. This is excellent as a summary of how big data changed everything. And why I said in my post on Sunday: But if you want to explore other features, or you want to build on this work yourself, you need to [...]

Pingback from Bibliotheken en het Digitale Leven in de tweede April week van 2012 | Dee'tjes
Time April 14, 2012 at 7:24 AM

[...] The data isn’t in the papers anymore. Again. (The OpenHelix Blog) [...]

Pingback from Video Tip of Week: Bioproject, it’s where to start finding data (hint, not the papers so much anymore) | The OpenHelix Blog
Time April 25, 2012 at 8:00 AM

[...] Mary (and we here at OpenHelix) keep not-so-gently reminding you, the data isn’t in the papers any more. Huge projects like 1000 Genomes, ENCODE and others [...]

Pingback from Video Tip of the Week: Publications track in UCSC Genome Browser | The OpenHelix Blog
Time June 20, 2012 at 7:53 AM

[...] processes can help. Of course, there’s also tons of stuff flowing into databases that’s not made it into the literature yet and may not ever in full, and that’s a whole ‘nother issue. But what if there was a [...]