NAR database issue (always a treasure trove)

The advance access release of most of the NAR database issue articles is out. As usual, this database issue includes a wealth of new and updated data repositories and analysis tools. We’ll be writing more extensive blog posts on it and doing some tips of the week over the next couple of months, but I thought I’d highlight the issue and some of the reports:

There are updates to many of the databases we know and love (links go to the full-text articles): UCSC Genome Browser, Ensembl, UniProt, MINT, SMART, WormBase, Gene Ontology, ENCODE, KEGG, UCSC Archaeal Browser, IMG/M, DBTSS, InterPro and others (we have tutorials on all those listed here).

And, as an indication of the explosion of data available (itself the subject of a database issue article, SRA), there are a lot of new(ish) databases on specific datatypes, such as MINAS, a database of metal ions in nucleic acids (nice name :D); doRiNA, a database of RNA interactions in post-transcriptional regulation; BitterDB, a database of bitter compounds; and well over 100 more.

And I’ll give a special shout-out to my former PI at EMBL, because I can: Peer Bork’s group has 4 databases listed in the issue: eggNOG, SMART, STITCH and OGEE (and he and a couple of group members are on the InterPro paper as well).

This is going to be a wealth of information to wade through!

New databases and resources from NAR database issue

If you haven’t noticed, articles in the Database issue of Nucleic Acids Research have been going to Advance Access over the last week. There is a wealth of new resources and databases, as always, in this issue. I’ll be going through these in the coming month or so and will post more in-depth reviews of them then, but I thought I’d list some that were released Friday. Go to the link above for even more from earlier in the week:

AmoebaDB (article)
MicrosporidiaDB (article)
BriX (article)
WebGeSTer DB: a transcription terminator database. (article)
SCLD: Stem cell lineage database (article)
OrthoDB: Hierarchical Catalog of Eukaryotic Orthologs (article)
CaSNP: a database for interrogating copy number alterations of cancer genome from SNP array data (article)

Stuff read over the weekend…

Just a few links for your reading pleasure from the last week.

While the mainstream news is reporting on the demise (redefinition) of the ‘gene’, some high school kids are doing amazing things with ‘genes.’

Oh, and if, like us, you can’t wait until the annual NAR database issue is published officially, you can always check out the advance online publication of the articles to find new and updated databases (like SpBase, the sea urchin database that went public earlier this year, and SuperToxic, a database of over 60,000 toxic compounds) and genome resources! :D

TE insertions in genomes

In the recent database issue of NAR, there are two reports of transposable element (TE) databases. I already discussed one in an earlier post. That one is a database that includes Gypsy elements (LTR retrotransposons) and retroviruses and aims to be a database “devoted to the non-redundant analysis and evolutionary-based classification of mobile genetic elements.” Hopefully it will develop into a database of all TEs. The other paper, by Levy et al. (“TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates”)1, which I’ll discuss briefly here, introduces a somewhat different database of transposable elements.

As the paper discusses, TEs have been implicated in a large number of effects on the vertebrate genome: affecting expression and “contributing to genetic diversity, genomic expansion, genomic content and genomic rearrangements.” Whether these immense changes are the raison d’être for transposable elements or the byproduct of a parasitic DNA element, I’ll leave for discussion (though I would side with the latter), but they are unarguably major factors in genome evolution and don’t fit the definition of “junk DNA.” :D

Transposon Database

I started my Ph.D. studies on the evolution of non-LTR retrotransposable elements in 1990 and found the world of mobile genetic elements, or transposons (aka, a long time ago… jumping genes), to be varied, complicated and fascinating. In 1993 I discovered the web. Ever since, I've hoped for and searched for a database of these "Mobile genetic elements [that] are self-contained genomic units capable of proliferating within their host genomes"1, to little satisfaction. Some databases do exist, such as the Mouse Transposon Insertion Database, dbRIP (retrotransposon polymorphisms in humans) and several others. But these are almost entirely organism-specific databases. That helps the study of the organism (mobile elements have an effect on the genome), but does little for the study of transposons as a class.

So, I was happy to see the latest article in the NAR database issue on GypDB

Non-coding, non-functional or junk ncRNA

I just finished reading a paper out this month in PNAS, "Specific expression of long noncoding RNAs." From the looks of it, the paper has sparked an interesting discussion in the science blogosphere about the term "Junk DNA." Before I get to that discussion, let me give a brief synopsis of the paper and my thoughts on it (and a link to an ncRNA database at the end).

Eagerly awaiting the NAR database issue

For years I have been following the NAR database issue, watching the number of resources grow from a handful years ago to hundreds lately. It is great to see what new resources have been added each year–there are people who are solving genomics problems in really creative ways.

Last year (2007) there were 968 resources listed, which was over a hundred more than the previous year (Galperin, Nucleic Acids Research, 2007, Vol. 35, Database issue D3-D4). I’m going to bet on over another hundred new ones this year, for a total of 1086.

Now I’m going over to look for the new number in the Advance Access section…. In "The Molecular Biology Database Collection: 2008 update," Michael Galperin reports on 1078 databases. 110 more this year!

110…well, I will have to check them out and see who the new kids on the block are. I love new databases!