"Research news gives biologist a heart attack"

The GenomeWeb blog had a “This week in Genome News” post that led with:

In Genome Biology this week, scientists at the University of Rochester studied the evolution of R1 and R2 retrotransposons across 12 Drosophila genomes. These specifically insert into the 28S rRNA genes. They found that most copies of R1 and R2 in each species were found to exhibit less than 0.2% sequence divergence, suggesting that all copies are relatively new. In looking at target DNA cleavage and synthesis, they found that each active element generates its own independent lineage and that both R1 and R2 use “imprecise, rapidly evolving mechanisms” for second strand synthesis.

I read it wrong. I thought it said that there was less than 0.2% difference in copies _between_ species. I had a minor heart attack. Why?

Well, I did my Ph.D. research on R1 and R2 retrotransposons in the same lab. Our research at the time suggested that these elements have been stably maintained (and have diverged) within each species, and that the lineages were quite distinct. This was in contrast to other transposable elements, DNA elements like Hobo, that seem to just fly across species boundaries like Southwest planes. Suddenly, I had this minor surreal “was all that research a strange dream 15 years ago?” moment. Did the genomic era of complete genomes so totally upend everything we did? That couldn’t be right.

Of course, a quick scan of the paper (and a rereading of the article) quickly brought me to my senses. No, my Ph.D. research wasn’t a dream. This paper from the Eickbush lab was a good study of copies within a species lineage.

In fact, the results seem to really help explain the mechanisms by which these elements propagate and transpose.

Most copies of R1 and R2 in each species were found to exhibit less than 0.2% sequence divergence. However, in many species evidence was obtained for the formation of distinct sublineages of elements, particularly in the case of R1. Analysis of the hundreds of R1 and R2 junctions with the 28S gene revealed that cleavage of the first DNA strand was precise both in location and in the priming of reverse transcription. Cleavage of the second DNA strand was less precise within a species, differed between species, and gave rise to variable priming mechanisms for second strand synthesis.

But what interested me is that the data came from genomic analysis, specifically a BLAST of the trace archives at NCBI. When I teach my students how to use the UCSC Genome Browser, NCBI tools, Galaxy, etc., I joke that today, with the genomes completed and the data available in today’s databases, I could have finished my Ph.D. research in a few months instead of the few years it took (ok, so I could have shaved a year or so off back then.. but.. well :D). Well, after reading this paper, I’m thinking that might be an interesting exercise.

One thing I found myself wanting in the Materials and Methods was an annotation track, using UCSC Genome Browser custom tracks or Ensembl’s system, and an analysis history using Galaxy. This isn’t really a criticism of this paper in particular; 99% of papers don’t seem to have this. But these types of details (simply accessed, in-context data tracks and well-delineated analysis histories) really would help promote research on the topic. I’ve always known that on a theoretical level, but actually wanting to get to the data (as I have for this paper and for others recently) brings it out in full relief.
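To make the “custom track” idea concrete, here is a minimal sketch of what such a supplement could look like: a few lines of Python that turn a list of element insertions into BED-format custom track text for the UCSC Genome Browser. It is not from the paper. The `Insertion` type, the `to_custom_track` helper, the track name, and every coordinate below are made up for illustration.

```python
from typing import Iterable, NamedTuple


class Insertion(NamedTuple):
    """One annotated element copy (hypothetical record, not the paper's data)."""
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # exclusive (BED convention)
    name: str
    strand: str  # "+" or "-"


def to_custom_track(insertions: Iterable[Insertion],
                    track_name: str,
                    description: str) -> str:
    """Render insertions as BED6 text with a UCSC 'track' header line."""
    lines = [f'track name="{track_name}" description="{description}" visibility=2']
    # Sort by chromosome, then start, so the file reads in genome order.
    for ins in sorted(insertions, key=lambda i: (i.chrom, i.start)):
        fields = [ins.chrom, str(ins.start), str(ins.end),
                  ins.name, "0", ins.strand]  # score column fixed at 0
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Made-up coordinates, purely to show the shape of the output.
example = [
    Insertion("chrX", 2000, 5300, "R2_copy_1", "+"),
    Insertion("chrX", 100, 1500, "R1_copy_1", "-"),
]
track_text = to_custom_track(example, "R1_R2_insertions",
                             "Illustrative R1/R2 copies")
print(track_text)
```

The resulting text can be pasted into the Genome Browser’s custom track form or saved as a file and uploaded, which is the kind of in-context access to the data I was wishing for.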

Oh, and about that heart attack. When I told my colleague about this (she was in the same graduate class in the department), and about reading it wrong and my surreal moment, she said: “Yeah, you and about 4 others in the world who care about this stuff were probably panicking.”

Hey, it’s at least a few times that :-/.

2 thoughts on “Research news gives biologist a heart attack”

  1. Max

    I absolutely agree that journals/referees should oblige authors to add their data in “custom track” format to the papers. Actually, a FASTA file might be better, as it is still mappable when the assemblies are gone. The same applies to source code: a genomics paper without source code is quite useless, I think.

    An analysis track, “Galaxy style”, is usually impossible, since not everything can be done with Galaxy. I think adding source code as supplemental data would already be a big step.
