For those of us who remember setting up those sequencing gels, running them, developing them, and begging a colleague to enter the nucleotides as we read them off the lightbox, some of the next-gen sequencing stuff is really new conceptually. And it is adding a whole new level of software complexity to sequence assembly. Right now the field is very active, and there are numerous platforms and packages available to work with. Since I’m not associated with a sequencing lab, I hear about it secondhand in presentations and papers.

So when I read the new paper in Nature Biotechnology by Trapnell and Salzberg I was delighted. They have pulled together a survey and assessment of various programs that map short sequence reads. They have a nice figure (Figure 1) that conceptually describes a couple of the main algorithmic approaches: “spaced seeds” and “Burrows-Wheeler”. There’s also a handy table with links to some of the software, and even more tools are mentioned in the text.
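To get a feel for the Burrows-Wheeler idea, here is a toy sketch in Python. It is not how Bowtie, BWA, or any of the tools in the paper are actually implemented; the function names `bwt` and `count_occurrences` are my own, the transform is built the slow way by sorting rotations, and it only counts exact matches. Real aligners use compressed indexes and tolerate mismatches.

```python
def bwt(text):
    """Burrows-Wheeler transform built from sorted rotations (toy version)."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str, pattern):
    """Count exact matches of pattern by backward search over the BWT."""
    # first[c] = how many characters in the text sort strictly before c
    first = {}
    total = 0
    for c in sorted(set(bwt_str)):
        first[c] = total
        total += bwt_str.count(c)

    lo, hi = 0, len(bwt_str)  # half-open range of sorted rotations still matching
    for c in reversed(pattern):  # extend the match one base at a time, right to left
        if c not in first:
            return 0
        lo = first[c] + bwt_str[:lo].count(c)
        hi = first[c] + bwt_str[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


genome = "GATTACAGATTACA"  # made-up reference sequence
index = bwt(genome)
print(count_occurrences(index, "GATTACA"))  # 2
```

The point of the trick is that the search walks the read backwards and never scans the genome itself, only the transformed string. In this sketch the `count` calls are linear scans, which a real index would replace with precomputed rank tables.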

It’s a nice introduction to the software that’s currently available for these projects. And if you expect to rely on this type of data later (and I assure you, you will…), this paper is worth a read. This field moves really fast, though, and by the time I finish this blog post some of this will be out of date, I’m sure. But they also point to the SEQanswers message board as a great place to get help from the community.

How to map billions of short reads onto genomes

Cole Trapnell & Steven L Salzberg. Nature Biotechnology 27, 455 – 457 (2009). doi:10.1038/nbt0509-455

Some of the software they touch on includes:

Bowtie, BWA, Maq, SAMtools, Mosaik, Novoalign, SOAP2, ZOOM, SHRiMP, G-Mo.R-Se, ERANGE, TopHat

