Subtitled: the data is not in the papers anymore. Again. And again.
As the data deluge continues, and those next-gen sequencing setups and labs continue to crank out more and more data, the details cannot be captured in the papers anymore. They just can’t. Authors can summarize the key findings, and show compelling examples and representative pieces. But they simply can’t show the volume of data that comprises the complete oeuvre from a given project anymore.
This is a point we keep hammering on. Knowing how to effectively use the software that stores and displays this data is now just as important as learning how to read publications in the first place. In the stone age when I was in grad school, most of what you needed to grasp from a paper was within the text and figures in the main body. Those days are gone in genomics, and they are never coming back. However, the software has limitations too. I’ll get to that later…
I was alerted to this interesting paper on Google+ by Robert West (but the specific item was unlinkable, sorry). The research involves analysis of the human mitochondrial transcriptome, which even as a one-off sort of assessment would have been interesting. But this group evaluated the transcriptome in over a dozen tissues and cell lines. That’s a lot of data.
And the paper summarizes key highlights–like the fact that the transcriptome does vary by tissue. Heart and muscle have different energy requirements and it appears to be reflected in their mitochondria at the level of transcript abundance. And there is a terrific Circos diagram (Figure 1) to summarize a lot of what they examined and mapped.
But: there’s no way for you to convey in a traditional publication all of those results. No. Way. And yes, I realize there are 6 large supplements attached to this paper. But that’s still not good enough.
I am delighted to report that there is a whole genome browser provided for a large fraction of this data: http://mitochondria.matticklab.com . That is da bomb, IMHO.
So this week’s tip of the week will show you how to look at the data from this paper in the custom GBrowse that was built for it. We’ll have a look at how to display the tracks you want to explore.
As great as this special browser is, though, this paper made me aware of a limitation of this representation as well. The team of researchers was also interested in nuclear-encoded genes for mitochondrial proteins. That is also intriguing to think about–because you can imagine tissue-specific issues around nuclear gene expression impacting the functions of the mitochondria. But what you can’t do in this browser is layer that on. I mean, I can imagine a way to kludge that together, in fact. You could add those genes to one end of the linear representation, with some spacers, and sort of fake it out. Like this; I mean, pretend this is the reference sequence:
===nuclear gene1===nuclear gene2===nuclear gene3===mitochondrial genome===
And then you could compare them all together. But it’s certainly a work-around rather than a real complex visualization. We need better visualization tools. (I have a thought here that a custom Caleydo would work, but I’d be interested in other ideas too).
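To make that kludge a little more concrete, here is a minimal sketch in Python of how you might stitch such a pseudo-reference together and keep track of where each piece lands. To be clear, this is my own illustration, not anything the authors or the browser actually do: the function names, the choice of N as the spacer character, and the toy sequences are all made up for the example.

```python
# Hypothetical helpers (not part of GBrowse or the paper): build one linear
# pseudo-reference from several named sequences, separated by runs of N,
# and record where each piece lands.
def build_pseudo_reference(pieces, spacer_len=1000):
    """pieces: list of (name, sequence) tuples in display order.

    Returns (combined_sequence, offsets), where offsets maps each name to
    its 0-based (start, end) span in the combined sequence, end exclusive.
    """
    spacer = "N" * spacer_len
    parts = []
    offsets = {}
    cursor = 0
    for i, (name, seq) in enumerate(pieces):
        if i > 0:
            # Spacer goes between pieces only, not before the first one.
            parts.append(spacer)
            cursor += spacer_len
        offsets[name] = (cursor, cursor + len(seq))
        parts.append(seq)
        cursor += len(seq)
    return "".join(parts), offsets


def to_pseudo_coord(offsets, name, pos):
    """Translate a 0-based position within `name` to the combined reference."""
    start, end = offsets[name]
    if not 0 <= pos < end - start:
        raise ValueError(f"{pos} is outside {name}")
    return start + pos


# Toy placeholder sequences, not real genes.
pieces = [
    ("nuclear_gene1", "ACGTACGT"),
    ("nuclear_gene2", "GGCC"),
    ("mitochondrial_genome", "TTAACCGGTT"),
]
combined, offsets = build_pseudo_reference(pieces, spacer_len=5)
print(combined)
print(offsets)
```

The offsets table is the important part: any expression data mapped to a nuclear gene would need its coordinates shifted the same way before it could sit on the same linear track as the mitochondrial data. It also shows why this is only a work-around, since the layout says nothing about where those genes really live in the nuclear genome.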
So that’s what I think, to summarize: the data’s not in the papers; you need to be as adept at software as you are at reading; and we need more and better visualization tools. But this was one cool example of all of that, plus a very cool and informative set of results. I’ve been thinking about this for a while since I read it. And those are my favorite papers–the ones that make me think about a whole bunch of different things.
Human mitochondrial transcriptome browser: http://mitochondria.matticklab.com
Mercer, T., Neph, S., Dinger, M., Crawford, J., Smith, M., Shearwood, A., Haugen, E., Bracken, C., Rackham, O., Stamatoyannopoulos, J., Filipovska, A., & Mattick, J. (2011). The Human Mitochondrial Transcriptome. Cell, 146(4), 645-658. DOI: 10.1016/j.cell.2011.06.051