In the recent database issue of NAR, there are two reports of transposable element (TE) databases. I already discussed one in an earlier post: a database that covers Gypsy elements (LTR retrotransposons) and retroviruses and aims to be a database “devoted to the non-redundant analysis and evolutionary-based classification of mobile genetic elements,” and that will hopefully grow into a database of all TEs. The other paper, by Levy et al. (“TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates”)1, introduces a somewhat different TE database, which I’ll discuss briefly here.
As the paper discusses, TEs have been implicated in a wide range of effects on vertebrate genomes: affecting gene expression and “contributing to genetic diversity, genomic expansion, genomic content and genomic rearrangements.” Whether these immense changes are the raison d’être of transposable elements or the byproduct of parasitic DNA, I’ll leave open for discussion (though I side with the latter), but TEs are unarguably major factors in genome evolution and don’t fit the definition of “junk DNA.” :D
So, I welcome this database (actually two: TranspoGene and microTranspoGene). The authors used the UCSC Genome and Table Browsers to obtain a dataset of well-characterized protein-coding genes and TE insertions from seven diverse species: human, mouse, chicken, zebrafish, fruit fly, nematode and sea squirt. From this data, using Galaxy, they compiled a dataset of TE insertions within exons. They also analyzed ‘exonization events’ (TE insertions that gave rise to new exons), insertions that affect putative proximal promoters, associations with human diseases (using OMIM), and more.
The result? A database of TEs’ influence on genomes.
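The core step described above, finding TE insertions that fall within exons, is at heart an interval-overlap problem. The authors did this with the UCSC Table Browser and Galaxy; the sketch below is not their pipeline, just a minimal illustration of the idea in plain Python. It assumes BED-style half-open coordinates, and the chromosome names, coordinates and element names are all made up for the example.

```python
from collections import defaultdict

# Hypothetical records: (chrom, start, end, name), BED-style half-open [start, end)


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share at least one base."""
    return a_start < b_end and b_start < a_end


def te_insertions_in_exons(exons, tes):
    """Return (te_name, exon_name, overlap_bp) for every TE that overlaps an exon."""
    # Group exons by chromosome so TEs are only compared against exons on the same chromosome
    exons_by_chrom = defaultdict(list)
    for chrom, start, end, name in exons:
        exons_by_chrom[chrom].append((start, end, name))

    hits = []
    for chrom, t_start, t_end, t_name in tes:
        for e_start, e_end, e_name in exons_by_chrom.get(chrom, []):
            if overlaps(t_start, t_end, e_start, e_end):
                # Length of the shared region in bases
                overlap_bp = min(t_end, e_end) - max(t_start, e_start)
                hits.append((t_name, e_name, overlap_bp))
    return hits


# Made-up example data, not real genome coordinates
exons = [
    ("chr1", 1000, 1200, "geneA_exon1"),
    ("chr1", 3000, 3150, "geneA_exon2"),
]
tes = [
    ("chr1", 1150, 1450, "Alu_1"),  # overlaps exon1 by 50 bp
    ("chr1", 2000, 2300, "L1_1"),   # intronic, no overlap
    ("chr2", 1000, 1100, "Alu_2"),  # different chromosome
]

for hit in te_insertions_in_exons(exons, tes):
    print(hit)  # ('Alu_1', 'geneA_exon1', 50)
```

This all-pairs comparison is fine for a toy example; for genome-scale data, tools typically sort intervals or use interval trees to avoid comparing every TE with every exon.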
I found the database, TranspoGene, quite easy to use. You enter your gene or genes of interest, choose the kind of TE (or all) and the organism (or all), select the type of insertion you are interested in (or all), and search. A quick search on “BRCA1” across all species and all insertion types (exons, introns, promoters, etc.) returned a long list of possibilities.
The database is a welcome addition to the study of TEs. I do have a few minor quibbles. The results interface is hard to read: the tables are so wide that one has to scroll far to the side to reach some useful and relevant links (to OMIM data, etc.). This is mostly because the TE sequence is reported on a single line. A minor quibble, but it does hurt readability. The results link to the relevant regions in the UCSC Genome Browser, which brings me to something else I’d like to see: these data as an annotation track in the UCSC Genome Browser. VISTA and other databases already offer this, and it would be useful to view this database’s data directly as an annotation track, so I could see it in context and follow the other data links from there.
Of course, I can now imagine this database and the one reported earlier being merged into a growing ‘one-stop-shop’ database for TE evolution, effects, sequences and other data. Something I’ve dreamed about for a decade or so :).
Sidenotes: Genomics is intensely international. This database is from Israel, the one reported earlier is from Spain, and UCSC is in California. It knows no boundaries. Also, as you can tell, my ‘interest of the day’ seems to be reverting to my Ph.D. field. The graphic above is from the database site. It kind of cracks me up: little jumping jesters wreaking havoc in the genome :).
1. Levy, A, Sela, N & Ast, G, 2007, ‘TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates’, Nucleic Acids Research. 10.1093/nar/gkm949