What’s the Answer? (bioinformatics one-liners)

As much of a fan as I am of web-based tools, there are times when the command line gets you what you need much faster. When I looked at this new post at Biostars, it was already hugely popular three hours into the day. It has clearly captured some attention from the field, and some of our readers might want to check out these ideas or offer their own. So this week’s unusual highlighted question is about the command line.

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

Question: Best bioinfo one-liners?

Although an infinity of efficient tools exists out there, it is sometimes still quicker, for simple tasks, to execute a single Linux command. I’m starting by sharing 3 I use quite often.

##1 get the sequence length distribution from a fastq file using awk
zcat file.fastq.gz | awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print l, lengths[l]}}'

##2 Reverse complement a sequence (I use that a lot when I need to design primers)
echo 'ATTGCTATGCTNNNT' | rev | tr 'ACTG' 'TGAC'

##3 split a multifasta file into single ones with csplit:
csplit -z -q -n 4 -f sequence_ sequences.fasta /\>/ {*}

I may be wrong, but I’ve not found such a list in Biostars.

So, what comes to your mind? I hope this post will yield some gold nuggets ;-)

Manu Prestat

There was a lot of chatter, so have a look.
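
If you want to try the first two one-liners from the question before pointing them at real data, here is a minimal sketch. The temporary directory, the file name demo.fastq and the three made-up reads are assumptions for illustration only and are not part of the original post. I use gzip -dc instead of zcat because zcat behaves differently on some systems, and I add sort -n because awk does not guarantee the order in which "for (l in lengths)" walks the array.

```shell
#!/bin/sh
# Hypothetical demo data: three made-up reads of lengths 4, 4 and 6.
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nACGTAC\n+\nIIIIII\n' > "$tmpdir/demo.fastq"
gzip "$tmpdir/demo.fastq"

# One-liner 1: in a 4-line-per-record FASTQ, line 2 of each record is the sequence.
# Count how many reads have each length, then sort numerically by length.
length_dist=$(gzip -dc "$tmpdir/demo.fastq.gz" \
  | awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print l, lengths[l]}}' \
  | sort -n)
echo "$length_dist"

# One-liner 2: reverse the string, then swap A<->T and C<->G.
# Characters not listed in tr (such as N) pass through unchanged.
revcomp=$(echo 'ATTGCTATGCTNNNT' | rev | tr 'ACTG' 'TGAC')
echo "$revcomp"

# Clean up the temporary demo files.
rm -r "$tmpdir"
```

With this toy input the length table should list two reads of length 4 and one of length 6, and the reverse complement of ATTGCTATGCTNNNT should come out as ANNNAGCATAGCAAT. Note that the tr mapping only covers uppercase bases, so lowercase or other IUPAC codes would be left as they are.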