I started my Ph.D. studies on the evolution of non-LTR retrotransposable elements in 1990 and found the world of mobile genetic elements, or transposons (known a long time ago as jumping genes), to be varied, complicated and fascinating. In 1993 I discovered the web. I’ve hoped for and searched for a database of these “Mobile genetic elements [that] are self-contained genomic units capable of proliferating within their host genomes”1, to little satisfaction. Some databases do exist, such as the Mouse Transposon Insertion Database, dbRIP (retrotransposon insertion polymorphisms in humans) and several others. But these are almost entirely organism-specific databases. That helps the study of the organism (mobile elements have an effect on the genome), but does little for the study of transposons as a class.
As the abstract of the paper states:
In this article, we introduce the Gypsy Database (GyDB) of mobile genetic elements, an in-progress database devoted to the non-redundant analysis and evolutionary-based classification of mobile genetic elements.1
Great! With reservations. Let’s quickly summarize what is in this database before I continue the discussion on what I’d like to see if it is to fulfill its potential.
This first iteration of the database focuses on the Ty3/Gypsy elements and Retroviridae. Hence the database name “Gypsy” (though that’s my first beef, see below :). These are a phylogenetic class of “LTR” retrotransposable elements and retroviruses. The Ty3/Gypsy elements are very similar in their coding structure to retroviruses (gag-pro-pol-env proteins), but are not themselves viruses. (Just FYI: retroviruses, LTR retrotransposons and non-LTR retrotransposons are assumed to have a common ancestor, but there is some discussion as to which came first2. LTR elements make up about 8% of the human genome, the non-LTR LINEs make up about another 20%, and SINEs… well, you get the picture3.)
This first version of the database includes “exhaustive [phylogenetic] analysis of 120 non-redundant Ty3/Gypsy and Retroviridae full-length genomes.” Their main analysis is a majority-rule consensus tree built from a gag-pro-pol alignment of these genomes. They also provide phylogenies based on the gag polyprotein, the pol polyprotein and all pol protein domains, and the env polyprotein, as well as trees built with the Neighbor Joining and Parsimony methods.
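For anyone who hasn’t met the term, here is a rough sketch of what “majority rule” means. This is not the authors’ pipeline, just a toy illustration: each tree is reduced to its set of clades, and a clade is kept only if it shows up in more than half of the trees. The function name, the taxon labels A to D and the three input trees are all made up for the example.

```python
from collections import Counter


def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: support} for clades found in more than `threshold` of the trees.

    Each tree is a collection of clades; each clade is a frozenset of taxon names.
    """
    counts = Counter()
    for clades in trees:
        counts.update(set(clades))  # count each clade at most once per tree
    n_trees = len(trees)
    return {
        clade: count / n_trees
        for clade, count in counts.items()
        if count / n_trees > threshold
    }


# Made-up input: three rooted trees over four placeholder taxa A, B, C, D.
trees = [
    [frozenset("AB"), frozenset("CD")],
    [frozenset("AB"), frozenset("CD")],
    [frozenset("AC"), frozenset("BD")],
]

consensus = majority_rule_clades(trees)
for clade, support in consensus.items():
    print(sorted(clade), round(support, 2))
```

Here the groupings A+B and C+D each appear in two of the three trees and survive, while A+C and B+D appear only once and are dropped. Real consensus programs also assemble the surviving clades back into a tree and usually work with bipartitions of unrooted trees, which this sketch skips.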
Also, included in the database are BLAST and HMM servers. These allow you to search the database with a sequence of interest.
Additionally, there is a literature search of Ty3/Gypsy-related citations.
So, some thoughts. Though I understand that they are focusing on Ty3/Gypsy elements, they aspire to be a much larger database of all mobile elements. So, why the name “Gypsy”? OK, it’s a small, and possibly petty ;-D, beef, but if they want a larger audience, another name could be used. Maybe JUMP! (someone’s going to have to come up with a nice acronym for that) or the McClintock Database of Mobile Elements.
The phylogeny is good. Minor quibbles with some of the analysis, but nothing worth noting. What would I like to see? More interactivity. I would like to be able to select a subset of sequences and/or elements and redo the trees with another analysis, perhaps allowing me to choose one of the basic phylogenetic methods (NJ, ML, Parsimony, etc.) and its parameters. That way, if I’m interested in, say, only gypsy elements in flies, I could select those, select the parts of them I want to analyze and do it. That would be powerful.
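To make that wish a bit more concrete, here is a minimal sketch of the “select a subset, then re-analyze” step. Nothing here reflects how GyDB actually stores or serves its data: the records, element names, host labels and sequences are invented, and the output is just a pairwise p-distance table, which is the kind of input a distance method such as NJ starts from.

```python
from itertools import combinations

# Hypothetical records: (element name, host, aligned sequence).
# Names, hosts and sequences are invented for illustration only.
elements = [
    ("elemA", "fly", "ACGTACGT"),
    ("elemB", "fly", "ACGTACGA"),
    ("elemC", "yeast", "TTGTACGA"),
    ("elemD", "fly", "ACCTAC-A"),
]


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites between the two sequences")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(records, host):
    """Pairwise p-distances for the elements found in the chosen host."""
    subset = [(name, seq) for name, h, seq in records if h == host]
    return {
        (n1, n2): p_distance(s1, s2)
        for (n1, s1), (n2, s2) in combinations(subset, 2)
    }


fly_distances = distance_matrix(elements, "fly")
for pair, dist in fly_distances.items():
    print(pair, round(dist, 3))
```

Swapping the host filter for any other selection (a protein domain, a clade, a list of accession numbers) is the interactivity I have in mind; the tree building itself would then be handed to whichever method the user picks.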
I’d like to be able to view and sort the elements by species.
It’d be nice if there were some type of wiki for the descriptions.
I would also like to see some sort of automatic submission of the data, pulling known non-redundant elements from the various completed genomes and databases. Though I am sure that is a herculean task.
They provide an empirical example in their paper of how to use the database to analyze a retrovirus. It shows the promise of the database.
It could be _the_ mobile element database to go to. I’ll be looking forward to seeing what their second iteration is like.
1. Llorens, C., Futami, R., Bezemer, D., Moya, A. (2007). The Gypsy Database (GyDB) of mobile genetic elements. Nucleic Acids Research, 36(Database), D38-D46. DOI: 10.1093/nar/gkm697
2. Flavell, A. (1999). Long terminal repeat retrotransposons jump between species. Proceedings of the National Academy of Sciences, 96(22), 12211-12212. DOI: 10.1073/pnas.96.22.12211
3. Lander, E., Linton, L., Birren, B., Nusbaum, C., Zody, M., Baldwin, J., Devon, K., Dewar, K., Doyle, M., et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921. DOI: 10.1038/35057062