A recent paper in PLoS Biology from Hingamp et al. (published a couple of weeks ago) had me intrigued. It is entitled Metagenome Annotation Using a Distributed Grid of Undergraduate Students, and in it the lecturers describe a system they put together to teach bioinformatics to undergraduates using new, unannotated sequences from metagenome projects. As stated in the announcement,
“This method asks students to randomly pick and analyze unknown metagenomic DNA fragments from a real research sequence stockpile. The student’s mission, using Internet tools only, is to figure out from which organism the DNA comes from, and what biological function it might have. As well as gaining confidence and proficiency in bioinformatics, students experience the authentic research process of weighing the arguments, establishing prediction reliability, building hypotheses, and maintaining rigorous discourse.”
The lecturers have put together a teaching-annotation procedure in a publicly accessible “annotation environment” they call “Annotathon.” This web interface walks the student through the annotation process, following the procedure you see in the figure here. Since anyone can join and use this interface, I thought I’d give it a test drive.
You start by getting a user account, which is a painless process. After that, if you are a ‘student’ (as I am in this case), you are given a sequence. My sequence?
>GOS_748010 genomic DNA (North American East Coast: South of Charleston, SC)
TCTCATTATTTAGCCTTTGGACTATCAAAGAGTTATACATTAGGTAGTTATACTTTGGGTGTAGGATCAAATTTTCAATATGCACAGCTATTTACTGACA
ATAAATATGCTGTTTCCATTGATTTAGGACTTAAAAAACATATTTCTGATAAACTCAGGGCTGGTATTTTAATTGAAAACTTATCATCTAATAATATAGA
CTTACCTCTAAATAGTTCTTTAGGCTTTTCTATTTATAATAAAAAAATTAAAACTGAAATATTATTTGACTATAATTATTCATCAGTACATGATAATGGT
TTGCATCTAGGAGTCATTAACAAAAATAAATATCTGACATTAAATTTTGGCTATTCATTATATAAATCAAGAACGACCCTCTCATCTGGAGTTGACTTTA
TAATTAAAGAAAAATATAAGTTCATTTATTCAATTCTATCTTTAGAAAGTTCAAACTTAGGACTTTCACATTATTTTGGACTAGAGATATCAATTTAATC
CGTTGAGGAATTTGATTTAATTTCTTAATATATGGTGATAATTATTCCCATTATATAAAGGATGTCGATATGCTAGCTACAATTATAACCTTAGTTTCAT
CAATTTTATTTTCCCAGTCACTATTTTTTTCAGAGTATGCTGAGGGTACTTCTAATAATAAATATTTAGAGATATATAATCCGACTTCAGAATCTATTGA
TTTATCTGGGTATGCATTTCCTAGTACGGCAAATGAACCTTCTACTCCAGGTATGCATGAATATTGGAATGAATTTGATAGCGGTTCGATAATAGCCCCT
GGTGATGTATTTGTAATATGTCATGGCTCATCAGATGCGCTGATTCTAGCTGAATGCGATCAATTCCATACATATCTTAGTAATGGTGATGATGGATATT
GCTTAGTTTCACGACCTGAGAGCTCATATG
I’ve started the process using two of the three ORF finders they recommend, NCBI ORF Finder and EBI’s Transeq. Already I find (or rediscover, since I’ve used both before) that each has advantages and disadvantages. I actually ended up using both for this because the displays are different. I liked that Transeq gives me all 6 frames in one place, easy to copy and paste, and I liked the ability to immediately BLAST the sequence in NCBI’s tool. I found four possible ORFs in this sequence, with the translation of the third frame from nt 570 to 929 being the most likely to be a real coding sequence. The BLAST results for this translation show high similarity to a Flavobacteria bacterium hypothetical protein and a Hahella chejuensis extracellular nuclease protein; both species are marine bacteria. (A larger ORF in the first frame had only a low-similarity hit to a rat sequence.)
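For anyone curious what a six-frame translation involves under the hood, here is a minimal Python sketch. It is not how NCBI ORF Finder or Transeq are implemented; it assumes the standard genetic code, treats any stop-free stretch as a candidate ORF (no start codon required, which is convenient for fragments that may be cut off mid-gene), and the function names and the `min_aa` cutoff are my own illustrative choices.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard table is adequate for a first look at this fragment).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate from a 0-based offset; unknown codons (e.g. with N) become X."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to their translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


def open_stretches(dna, min_aa=50):
    """List stop-free stretches of at least min_aa residues, longest first.

    Each hit is (frame, start, end, peptide); start and end are 1-based
    positions on the forward strand, even for minus-strand frames.
    """
    n = len(dna)
    hits = []
    for frame, protein in six_frames(dna).items():
        offset = int(frame[1]) - 1
        pos = 0  # index of the current segment's first residue
        for segment in protein.split("*"):
            if len(segment) >= min_aa:
                start = offset + 3 * pos + 1
                end = offset + 3 * (pos + len(segment))
                if frame[0] == "-":
                    # Convert reverse-complement coordinates to forward ones.
                    start, end = n - end + 1, n - start + 1
                hits.append((frame, start, end, segment))
            pos += len(segment) + 1  # skip past the stop codon
    return sorted(hits, key=lambda hit: len(hit[3]), reverse=True)


if __name__ == "__main__":
    toy = "ATGAAATTTTAGCC"  # toy input, not the GOS fragment
    for frame, start, end, peptide in open_stretches(toy, min_aa=3):
        print(frame, start, end, peptide)
```

To try it on the real fragment, paste the sequence lines (without the `>` header) into one string and call `open_stretches` on it; the peptides it returns are what you would then hand to BLAST.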
The next steps in the Annotathon, which I have not yet accomplished, are to find the molecular weight and protein domains, build a sequence alignment, and work out the taxonomic placement through phylogenetic analysis. The tools they suggest for these are SMS and MWCALC (molecular weight), PROSITE, PFam and InterPro (protein domains), NCBI BLAST homolog search (or EBI’s BLAST interface), EBI’s Clustal interface, and http://www.phylogeny.fr/ or Mobyle (phylogeny).
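The molecular weight step is the easiest one to reproduce by hand, so here is a rough Python sketch of the arithmetic. This is not the code behind SMS or MWCALC; it assumes the commonly tabulated average residue masses (in daltons), adds one water for the free termini, and ignores any modifications. The function name is hypothetical.

```python
# Average residue masses in Da (amino acid minus one water); assumed values
# from standard tables, so expect small differences between tools.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the N-terminal H and C-terminal OH


def molecular_weight(protein):
    """Approximate average molecular weight (Da) of an unmodified peptide."""
    protein = protein.upper().rstrip("*")  # tolerate a trailing stop symbol
    unknown = set(protein) - AVG_RESIDUE_MASS.keys()
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    if not protein:
        return 0.0
    return sum(AVG_RESIDUE_MASS[aa] for aa in protein) + WATER


if __name__ == "__main__":
    print(round(molecular_weight("MKF"), 2))  # toy peptide, not my ORF
```

Feeding it the translation of a candidate ORF gives a number to compare against what the web tools report.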
There are several things I think would improve the Annotathon interface. I’d like to see more integration of the tools into the annotation ‘cart’, even if it is just a simple link to the relevant tools for each annotation section. Right off the bat, I can also think of several other tools that might be nice to use, or at least to compare and contrast for teaching purposes (different phylogeny programs, SMART, etc.). Oh, and of course a good introductory tutorial on all these tools would be nice for students. But these are minor quibbles about what looks like a great teaching tool. I know I’m having some fun with it, and I’ll report as I go along over the next couple of weeks. That brings me to the last conclusion of the paper…
These students have done some excellent annotations and the process allows their annotations to be submitted to the public databases.
“The 515 students that have taken part in the Annotathon over the past three years have analyzed a total of 2.3 Mb of ocean microbial DNA, representing 9,500 hours of cumulative annotation.”
And from the analysis, it appears their annotations are as good as those done by large automated annotation projects. I don’t see students taking over and solving the huge issue the scientific community has, and will increasingly have, with annotating the HUGE amounts of sequence data that we are generating, but every little bit helps. And if they are learning some valuable skills while contributing to our knowledge, it’s an excellent endeavour.
I suggest you go over and try your hand at it. Let me know what you are finding (here AND in their discussion boards).