Transposon Database

I started my Ph.D. studies on the evolution of non-LTR retrotransposable elements in 1990 and found the world of mobile genetic elements, or transposons (a long time ago known as jumping genes), to be varied, complicated and fascinating. In 1993 I discovered the web. I’ve hoped for and searched for a database of these “Mobile genetic elements [that] are self-contained genomic units capable of proliferating within their host genomes”1, to little satisfaction. Some databases do exist, such as the Mouse Transposon Insertion Database, dbRIP (retrotransposon insertion polymorphisms in humans) and several others, but these are almost entirely organism-specific. That helps the study of the organism (mobile elements have an effect on the genome), but does little for the study of transposons as a class.

So, I was happy to see the latest article in the NAR database issue, on GyDB.

As the abstract of the paper states:

In this article, we introduce the Gypsy Database (GyDB) of mobile genetic elements, an in-progress database devoted to the non-redundant analysis and evolutionary-based classification of mobile genetic elements.1

Great! With reservations. Let’s quickly summarize what is in this database before I discuss what I’d like to see if it is to fulfill its potential.

This first iteration of the database focuses on the Ty3/Gypsy elements and Retroviridae. Hence the database name “Gypsy” (though that’s my first beef, see below :). These are a phylogenetic class of “LTR” retrotransposable elements and retroviruses. The Ty3/Gypsy elements are very similar in their coding structure to retroviruses (gag-pro-pol-env proteins), but are not themselves viruses. (Just FYI, retroviruses, LTR and non-LTR retrotransposons are assumed to have a common ancestor, but there is some discussion as to which came first2. LTR elements make up about 8% of the human genome, the non-LTR LINEs make up about another 20%, and SINEs… well, you get the picture3.)

This first version of the database includes “exhaustive [phylogenetic] analysis of 120 non-redundant Ty3/Gypsy and Retroviridae full-length genomes.” Their main analysis is a majority rule consensus tree built from a gag-pro-pol alignment of these genomes. They also provide phylogenies based on the gag polyprotein, the pol polyprotein and each of the pol protein domains, and the env polyprotein, as well as trees built with the Neighbor Joining and Parsimony methods.

Also included in the database are BLAST and HMM servers. These allow you to search the database with a sequence of interest.

Additionally, there is a literature search of Ty3/Gypsy-related citations.

So, some thoughts. Though I understand that they are focusing on Ty3/Gypsy elements, they aspire to be a much larger database of all mobile elements. So, why the name “Gypsy”? OK, it’s a small, and possibly petty ;-D, beef, but if they want a larger audience, another name could be used. Maybe JUMP! (someone’s going to have to come up with a nice acronym for that) or the McClintock Database of Mobile Elements.

The phylogeny is good. I have minor quibbles with some of the analysis, but nothing worth noting. What would I like to see? More interactivity. I would like to be able to select a subset of sequences and/or elements and redo the trees with another analysis, perhaps choosing one of the basic phylogenetic methods (NJ, ML, Parsimony, etc.) and its parameters. That way, if I’m interested in, say, only gypsy elements in flies, I could select those, select the parts of them I want to analyze and run the analysis. That would be powerful.
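To make that wish a little more concrete, here is a minimal Python sketch of the "select a subset, then re-analyze" idea. Everything in it is an assumption for illustration: the element names, host labels and aligned sequences are made up (they are not GyDB records), and the simple p-distance stands in for whatever distance or model a real tree-building step (NJ, for example) would use.

```python
from itertools import combinations

# Hypothetical toy records: (element name, host group, aligned sequence).
# Names and sequences are invented for illustration, not taken from GyDB.
ELEMENTS = [
    ("elemA", "fly", "ATGGCTAGCT"),
    ("elemB", "fly", "ATGGCTAGAT"),
    ("elemC", "fly", "ATGACTTGAT"),
    ("elemD", "yeast", "TTGACTTGAA"),
]


def select_by_host(elements, host):
    """Keep only the elements found in the given host group."""
    return [(name, seq) for name, h, seq in elements if h == host]


def p_distance(a, b):
    """Proportion of aligned sites that differ between two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(selected):
    """Pairwise p-distances, keyed by (name1, name2)."""
    return {
        (n1, n2): p_distance(s1, s2)
        for (n1, s1), (n2, s2) in combinations(selected, 2)
    }


# Select only the fly elements, then compute the distances among them.
flies = select_by_host(ELEMENTS, "fly")
distances = distance_matrix(flies)
for pair, d in sorted(distances.items()):
    print(pair, round(d, 2))
```

A distance matrix like this is the usual input to a distance-based method such as Neighbor Joining, so the same selection step could sit in front of whichever tree-building method the user picks.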

I’d like to be able to view and sort the elements by species.

It’d be nice if there were some type of wiki for the descriptions.

I would also like to see some sort of automatic submission of the data, pulling known non-redundant elements from the various completed genomes and databases. Though I am sure that is a herculean task.

They provide an empirical example in their paper of how to use the database to analyze a retrovirus. It shows the promise of the database.

It could be _the_ mobile element database to go to. I’ll be looking forward to seeing what their second iteration is like.

1. Llorens, C., Futami, R., Bezemer, D., Moya, A. (2007). The Gypsy Database (GyDB) of mobile genetic elements. Nucleic Acids Research, 36(Database), D38-D46. DOI: 10.1093/nar/gkm697

2. Flavell, A. (1999). Long terminal repeat retrotransposons jump between species. Proceedings of the National Academy of Sciences, 96(22), 12211-12212. DOI: 10.1073/pnas.96.22.12211

3. Lander, E., Linton, L., Birren, B., Nusbaum, C., Zody, M., Baldwin, J., Devon, K., Dewar, K., Doyle, M., et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921. DOI: 10.1038/35057062

9 thoughts on “Transposon Database”

  1. Pingback: Research Blogging and CrossRef | The OpenHelix Blog

  2. Carlos Llorens

    Hi

    This is Carlos

    Whoever you are, thank you very much for your nice comments on my database; they have reached me.

    A few words

    Everything you raise is just what I want to do, but it will take time because, as you can imagine, it involves some effort; the world of mobile genetic elements is not easy to manage.

    The name “Gypsy” was adopted for just the reason you note. I do not think that one name or another is particularly relevant. However, if other authors call for another name, I will have no trouble changing it to a more appropriate one.

    Now I am trying to find a way to automate the database so that other authors can improve the project themselves. This is a long-term project and I want to improve it with my own insights, but I do understand that there are other ideas and perspectives, and indeed the diversity of mobile genetic elements is large enough that other researchers interested in the area should be able to join in. That is the idea, because it is the best way I see to keep the database interesting and informative. I am working on it.

    Best and thank you

    Carlos

  3. Trey

    Greetings Carlos,

    This is Trey, one of the three bloggers on this blog, and I wrote this post.

    Thanks for commenting. I definitely will be checking back and looking for the growth of your database! It’s pretty good and has great potential.

    I love transposable elements.

  4. Pingback: TE insertions in genomes | The OpenHelix Blog

  5. china bao

    Hi
    This is Bao, from China.
    I admire your idea of building a database of transposons.
    Now I have a question I want to ask you.
    If I have a sequence of about 500 bp, how can I know whether there is a transposable element in it? I ask because your blog is the first article in which I found someone building this kind of database. I have considered trying a bioinformatics method, but that needs a lot of knowledge about computer programming.
    If you can give me a reply, I will be very glad.

  6. Trey

    Bao,

    Just to clarify, the GyDB is the idea of Carlos and his team :) (though I’ve wanted something like this for a long time and am looking forward to watching how it develops).

    I’m afraid the answer to your question isn’t very straightforward.

    500bp (or did you mean it to be 500kb?) isn’t much sequence information. Do you have an idea what kind of transposon you suspect (DNA? LTR retroposon? non-LTR?)? That might help limit where you search.

    My first suggestion would be to use RepeatMasker, which “screens DNA sequences for interspersed repeats and low complexity DNA sequences.” Its use is self-explanatory on the site, I think.

    You could BLAST against the GyDB if you suspect a retroposon, or perhaps just BLAST the sequence against the genome of interest.

    I hope that gets you started.

  7. Pingback: So long SSAHA | The OpenHelix Blog

  8. MRA

    Hi!

    I’m also interested in mobile genetic element databases. What do you think about ACLAME? I’ve just discovered it…
