If you didn’t know it already, there is an entire community out there that, having noted the similarity between biological sequences and music, has been working out methods to convert DNA and protein sequences into music using the properties of, and differences between, the bases and amino acids. Some of this fiddling has produced interesting, and often haunting and beautiful, music. One such site is Gene2Music. Here, several people (Rie Takahashi, Jeffery Miller et al.) have created an algorithm to translate gene sequences into music. In this tip of the week we’ll show you the site and how to take a gene of interest and create your own piece. Take a quick listen to Human ThyA (audio) and then listen to a piece composed by Rie Takahashi based on the themes of ThyA. Impressive. Maybe the next Mozart will do a series of compositions based on the human genome (that would be one long concert).
So, Carl Zimmer passes along a question from one of his readers:
What actually is the longest word (in any language) encoded by the reference human genome? If I had the time and computer power I’d have a look…
Guesstimate – it’ll be somewhere in the 4-5 letter range, depending on letter frequency in the target language.
Well, 6-letter words are actually fairly easy to find in the database of all known proteins. When I was doing my graduate research, I had to do it the old-fashioned way… I read gels. To pass the time when analyzing, I’d see if I could find words in the translated amino acid sequences. I found a few 6- and 7-letter words… and if my memory doesn’t fail me, an 8-letter word. But I don’t remember what it was! Doing a quick BLAST on two words, I found SEARCH (in Plesiocystis pacifica, a bacterium) and CHANGE (in Danio rerio, zebrafish). I’m sure there are many others.
Wanna play? (Who’s done this before?) The question above is of course about the human genome, but we could do subcategories… all genomes, human only… :). I suspect you could parse a simple dictionary into FASTA format and BLAST it against the genome :).
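For anyone who does want to play, here is a minimal sketch of the dictionary-to-FASTA step in Python. It assumes the words are meant to be matched against protein sequences, so it keeps only words spelled entirely with the 20 standard amino acid one-letter codes (which rules out anything containing B, J, O, U, X or Z). The function name `words_to_fasta` and the `min_length` cutoff are made up for illustration, not part of any existing tool.

```python
# The 20 standard amino acid one-letter codes; B, J, O, U, X and Z are not among them.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def words_to_fasta(words, min_length=4):
    """Return FASTA text with one record per word that can be spelled in amino acids.

    `words` is any iterable of strings (for example, the lines of a word list file).
    `min_length` is a hypothetical cutoff to skip very short words.
    """
    records = []
    seen = set()
    for word in words:
        seq = word.strip().upper()
        # Skip empty lines, words that are too short, and duplicates.
        if not seq or len(seq) < min_length or seq in seen:
            continue
        # Skip words containing letters that are not standard amino acid codes.
        if not set(seq) <= AMINO_ACIDS:
            continue
        seen.add(seq)
        # Use the word itself as both the FASTA header and the sequence.
        records.append(f">{seq}\n{seq}")
    return "\n".join(records) + ("\n" if records else "")


if __name__ == "__main__":
    sample = ["search", "change", "genome", "calvin", "blast"]
    # "genome" (contains O) and "blast" (contains B) are filtered out.
    print(words_to_fasta(sample), end="")
```

The resulting text could be saved to a file and used as a set of protein queries. Queries this short generally need more relaxed search settings than the defaults, so treat any hits (or lack of them) with that in mind.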
Update: I’ve really got to get back to work. I looked for my full name and I’ve found every part: “Warren”, “Calvin”, “Lathe” (and III is everywhere). And NO, the fact that the first is in Neisseria gonorrhoeae and the last is in Salmonella enterica is of NO significance whatsoever. Hey, “Calvin” is in platypus, so… oh, never mind.