Tag Archives: sequences

Friday SNPpets

Welcome to our Friday feature link dump: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Tip of the Week: File Format conversion

Many of us have worked with DNA and protein sequences in several different formats: GenBank and FASTA are two broadly used ones, but there are many others. Different tools and databases often require different formats. More often than not, converting from one format to another isn’t much of a problem: the database will do it for you, or there will be some help documentation. But this isn’t always the case. There are several ways to convert formats. For example, Galaxy has some limited ability to do this, and some databases let you export sequences in one of several formats, but often you’ll need a bit more help. ReadSeq is a publicly available software package (downloadable from that link) that will do just that. You could download and install it, but EBI also has a web interface for ReadSeq (as do some other services). Today’s tip is for those of you who are somewhat new to sequence formats (or even those of us who aren’t) and need a quick web interface for converting formats.
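To make the idea of conversion concrete, here is a minimal Python sketch of one common case, GenBank to FASTA. It is not how ReadSeq works internally. The function name `genbank_to_fasta` and the example record are made up for illustration, and the code assumes a single, well-formed GenBank flat-file record. For real data, a tested converter such as ReadSeq is the safer choice.

```python
import re

# A made-up GenBank-style record, used only for illustration.
EXAMPLE_RECORD = """\
LOCUS       EXAMPLE1      24 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Hypothetical example sequence.
ORIGIN
        1 gatcctccat atacaacggt atct
//
"""


def genbank_to_fasta(genbank_text, width=60):
    """Convert one GenBank flat-file record to a FASTA string.

    Simplified on purpose: the identifier comes from the LOCUS line, the
    description from the first DEFINITION line only, and the sequence from
    the ORIGIN block. Everything else (features, references) is ignored.
    """
    name = ""
    description = ""
    seq_parts = []
    in_origin = False

    for line in genbank_text.splitlines():
        if line.startswith("LOCUS"):
            name = line.split()[1]
        elif line.startswith("DEFINITION"):
            description = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break  # end of record
        elif in_origin:
            # ORIGIN lines look like "        1 gatcctccat atacaacggt";
            # drop the position numbers and the spaces between blocks.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))

    sequence = "".join(seq_parts).upper()
    header = f">{name} {description}".rstrip()
    # Wrap the sequence into fixed-width lines, as FASTA files usually do.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return header + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(genbank_to_fasta(EXAMPLE_RECORD), end="")
```

Run on the example record, this prints a `>EXAMPLE1 Hypothetical example sequence.` header followed by the 24 bases in upper case on one line.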

Teaching and annotating at the same time

A recent paper (a couple of weeks ago) in PLoS Biology from Hingamp et al. had me intrigued. In the paper, entitled Metagenome Annotation Using a Distributed Grid of Undergraduate Students, the lecturers describe a system for teaching bioinformatics to undergraduates that uses new, unannotated sequences from metagenome projects. As stated in the announcement,

This method asks students to randomly pick and analyze unknown metagenomic DNA fragments from a real research sequence stockpile. The student’s mission, using Internet tools only, is to figure out from which organism the DNA comes from, and what biological function it might have. As well as gaining confidence and proficiency in bioinformatics, students experience the authentic research process of weighing the arguments, establishing prediction reliability, building hypotheses, and maintaining rigorous discourse.

The lecturers have built this teaching-annotation procedure into a publicly accessible “annotation environment” they call “Annotathon.” This web interface walks the student through the annotation process, following the procedure you see in the figure here. Since anyone can join and use this interface, I thought I’d give it a test drive.


New Online Tutorials for Sequence Similarity Search Tools BLAST and FASTA

OpenHelix today announced the availability of new tutorial suites on two highly used sequence similarity search resources: BLAST and FASTA. BLAST, from the National Center for Biotechnology Information (NCBI) at NIH, and FASTA, accessed through a web interface at the European Bioinformatics Institute (EBI), are both excellent and widely used tools for finding sequence similarities for proteins and nucleic acids.

The tutorial suites, available for single purchase or through a low-priced yearly subscription to all OpenHelix tutorials, contain a narrated, self-run online tutorial, slides with full script, handouts, and exercises. With the tutorials, researchers can quickly learn to use these resources effectively and efficiently. These tutorials will teach users:

  • the principles of sequence comparisons
  • basic explanations of scoring matrices and alignments
  • main features of the FASTA and BLAST algorithms
  • how to perform a sequence similarity search
  • how to view and interpret the similarity results
  • how to find additional biological information about matching sequences

To find out more about these and other tutorial suites, visit the OpenHelix Tutorial Catalog and OpenHelix, or visit the OpenHelix Blog for up-to-date information on genomics.

About OpenHelix

OpenHelix, LLC, (http://www.openhelix.com) provides the genomics knowledge you need when you need it. OpenHelix currently provides online self-run tutorials and on-site training for institutions and companies on the most powerful and popular free, web-based, publicly accessible bioinformatics resources. In addition, OpenHelix is contracted by resource providers to provide comprehensive, long-term training and outreach programs.

Sequence Formats

There are a lot of them. FASTA comes to mind. GenBank is another. Clustal, EMBL, GCG, and the list goes on. I’d say FASTA is one of the most commonly used or accepted, but I could be wrong. Still, many databases and software programs have their own format that they accept and generate. Some of these programs and databases will accept several formats or generate files in several formats. It can get a bit confusing. So, you’ve got a sequence file in PAUP format but you need it in FASTA? Don’t even know what format it is? Or what the formats look like, or what information they contain?
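If you don’t even know what format a file is in, the first non-blank line is often enough to make a guess. The Python sketch below illustrates that idea and nothing more: the function name `guess_sequence_format` is hypothetical, the checks rely only on well-known opening lines (`>` for FASTA, `LOCUS` for GenBank, `ID` for EMBL, `CLUSTAL` for Clustal alignments, `#NEXUS` for the NEXUS files PAUP uses, and two integers for PHYLIP), and formats such as GCG are not covered at all.

```python
def guess_sequence_format(text):
    """Guess a sequence file format from its first non-blank line.

    A heuristic only: it returns "unknown" whenever the first line does
    not match one of a handful of well-known opening patterns.
    """
    first = None
    for line in text.splitlines():
        if line.strip():
            first = line
            break
    if first is None:
        return "unknown"  # empty or whitespace-only input

    stripped = first.strip()
    if stripped.startswith(">"):
        return "fasta"      # header line: >identifier description
    if stripped.startswith("LOCUS"):
        return "genbank"    # GenBank flat files open with a LOCUS line
    if first.startswith("ID   "):
        return "embl"       # EMBL flat files open with an ID line
    if stripped.upper().startswith("CLUSTAL"):
        return "clustal"    # e.g. "CLUSTAL W (1.83) multiple sequence alignment"
    if stripped.upper().startswith("#NEXUS"):
        return "nexus"      # the format PAUP reads and writes

    # PHYLIP files start with two integers: number of taxa, number of sites.
    fields = stripped.split()
    if len(fields) == 2 and all(field.isdigit() for field in fields):
        return "phylip"
    return "unknown"
```

For example, `guess_sequence_format(">seq1\nACGT\n")` returns `"fasta"`, and a file whose first line is ` 5 42` is reported as `"phylip"`. A guess like this is a starting point for picking the right converter, not a validation of the file.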

Here are some links that could help. I have gathered them over time, most recently while working with a PHYLIP file:
Oxford’s CGRB’s examples of sequence formats.

EMBOSS’s explanation of sequence formats.

EBI’s help section on sequence formats.

Here are two programs that will convert one format to another:

Readseq (home URL and downloadable code here)


Hopefully that will get you started in making sense of sequence formats. Have any other help pages or conversion programs to suggest?