Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…
RoBuST “has been developed as root and bulb plant community research platform for integrated analysis of root and bulb genomics data.” Cool. I’m a big fan of roots and bulbs–oh, crap, just realized I forgot to buy carrots for the Pav Bhaji. Will try to get them tomorrow at the farmer’s market or Faneuil. [Mary]
FEAST is a sensitive local alignment program with multiple rates of evolution. An interesting project as part of a Ph.D. thesis :). I haven’t tried it yet, but from the commentary, it looks good. [Trey]
Because Trey often talks about the CLOCK gene, I found this set of Nature papers interesting: Editor’s Summary – Clocking on to diabetes [Jennifer]
As most parents and anyone who has watched a child over time knows, a large portion of our personalities is genetic. But like height and sexuality, they aren’t easily reduced to single (or even multiple) gene causes, as this recent GWAS research is showing. [Trey]
There’s a site that is fielding questions predominantly on Next-Gen sequencing-related issues: http://i.seqanswers.com/ [Mary]
The VISTA comparative genome analysis resource updated their interface a few months ago. Additionally, they’ve added VISTA-Point (which replaces and greatly extends VISTA text browser) which, as the site says, allows the user to:
Access complete data and visual presentation of pairwise and multiple alignments of whole genome assemblies.
The homepage has undergone a very nice redesign. Much of the functionality and use of the underlying VISTA browser and other tools is similar (though updated of course). We also understand that there will be upcoming updates to some tools and the addition of others. Look for that here :D.
Also, we’ve updated our tutorial to reflect the new site and functions. As before, this tutorial is free to users and sponsored by VISTA. Check it out.
I put in 2K9G and 1ZLM, two SH3 domains. I used chain A for both and blast2seq as the comparison algorithm (there are several choices), and these are my results. I’m not promising any biological significance to my choices and results, but the widget works nicely and simply. I can see it as a nice addition to someone’s homepage.
I have a vague memory of reading about COBALT a while back, but at the time it was an executable file to download and I think I put it away as “to do.” Well, a couple days ago I was over at the NCBI BLAST site for something (tip of the week?), and noticed there was a “new” flash for COBALT. So, COBALT is now integrated as a web-tool on the NCBI site. The short description of what COBALT is, from the site:
COBALT is a multiple sequence alignment tool that finds a collection of pairwise constraints derived from conserved domain database, protein motif database, and sequence similarity, using RPS-BLAST, BLASTP, and PHI-BLAST.
Pairwise constraints are then incorporated into a progressive multiple alignment.
I haven’t tried it out yet or compared it to other multiple sequence alignment tools, but thought I’d point it out to those who haven’t yet noticed it.
Today’s tip is on TARGeT. TARGeT is, as the title of the paper in this year’s NAR issue states, “a web-based pipeline for retrieving and characterizing gene and transposable element families from genomic sequences.” There are several things you can do at TARGeT. Its main function is to quickly obtain gene and transposon families from a query sequence, using BLAST, PHI BLAST, MUSCLE and TreeBest. The tip today is a quick intro to the tool and a search on an R1 non-LTR transposon.
If you are a biomedical researcher, have you ever used protein databases like UniProt to get information about proteins that you are interested in? Do you know how that database got there? I don’t mean today, I mean decades ago: how did a resource like this come to even exist at all? When researchers search a protein database or align amino acid sequences, frequently they’ll come across a name that helped start it all years ago. Margaret Dayhoff was one of the people who pioneered this crucial functionality, a true founder in the field of bioinformatics. But in some histories and timelines of bioinformatics she barely gets a mention. To celebrate Ada Lovelace Day, I’m going to introduce you to Dr. Dayhoff and I hope to raise awareness of her important fundamental contributions to the field of bioinformatics.
Because we can access all the protein information we can stand with a few keystrokes today, it is easy to forget that this data 1) didn’t always exist, and 2) when it did exist, it wasn’t easy to find and work with. In the 1960s, only a handful of protein sequences were known. But it was clear that more of this data would be incredibly useful in a number of ways, and was certainly going to be generated at an increasingly faster pace. And soon it would overwhelm any one person’s ability to analyze and retain. DNA sequences…don’t even go there yet….
But there were some prepared minds ready to begin thinking about these data and the associated opportunities around them. They were also aware that computers might help with these problems. Robert Ledley was one of them. Ledley had trained as a dentist, but obtained a degree in physics and became increasingly interested in the possibilities of applying computational resources to biomedical problems. A report authored by Ledley is one of the earliest studies of biomedical computation, and can be viewed on Google Books today.
Working with Ledley at the National Biomedical Research Foundation was a woman named Margaret Dayhoff. With an undergraduate degree in mathematics and graduate studies in chemistry, Dayhoff had pioneered work with punch cards and data processing machines to evaluate molecular resonance energies of organic molecules. She obtained a Watson Computing Laboratory Fellowship to pursue the work to complete her PhD, which is described by a biographer as:
The process was iterative and required manually carrying cards from one type of machine to another (4 types), as no single machine could do the whole iteration. Convergence was slow and several months could be required for a result.
This week’s tip introduces a nice feature and tool of the Viral Bioinformatics Resource Center (VBRC). There are a lot of great tools at the VBRC to search and analyze hundreds of viral genomes. Most, if not all, of the tools can be used for searching and analyzing bacterial genomes also. The tool we are introducing in this tip is Base by Base. This tip actually came from a question from one of our readers in our weekly “WYP” feature a few weeks back. Reader Azalea asked:
I’m looking for a pairwise sequence alignment tool which can anchor specific nucleotides to be arbitrarily aligned. I just hope to fix certain positions to be aligned, which will change the whole alignment.
Chris Upton at VBRC suggested Base by Base. I’ve had the opportunity to use Base by Base and it’s a useful tool for working with pairwise alignments (could probably be used for any two sequences, not just bacterial and viral) and looks like a tool that Azalea might be able to use. Today’s tip shows you quickly how to add two sequences, align part by hand and select another region to align by algorithm (choice of T-Coffee, ClustalW or MUSCLE).
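For readers curious what "anchoring" positions in a pairwise alignment can look like in code, here is a minimal Python sketch. It is not how Base by Base works internally (we have not looked at its implementation); it simply assumes the simplest approach we could think of: cut both sequences at the anchored positions and run an ordinary Needleman-Wunsch global alignment on each piece in between. The function names `global_align` and `anchored_align`, the 0-based anchor pairs, and the scoring values are all our own illustrative choices.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Plain Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two sequences as equal-length strings padded with '-'.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append('-')
            i -= 1
        else:
            out_a.append('-')
            out_b.append(b[j - 1])
            j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b))


def anchored_align(a, b, anchors, **scoring):
    """Align a and b so that each (i, j) in anchors shares a column.

    anchors holds 0-based index pairs: position i of a is forced to pair
    with position j of b. The stretches between anchors are aligned
    independently with global_align.
    """
    out_a, out_b = [], []
    prev_i, prev_j = 0, 0
    for i, j in sorted(anchors):
        if i < prev_i or j < prev_j or i >= len(a) or j >= len(b):
            raise ValueError("anchors must be in range and increasing in both sequences")
        seg_a, seg_b = global_align(a[prev_i:i], b[prev_j:j], **scoring)
        out_a += [seg_a, a[i]]   # free segment, then the forced column
        out_b += [seg_b, b[j]]
        prev_i, prev_j = i + 1, j + 1
    seg_a, seg_b = global_align(a[prev_i:], b[prev_j:], **scoring)
    out_a.append(seg_a)
    out_b.append(seg_b)
    return ''.join(out_a), ''.join(out_b)


# Force the G at index 2 of the first sequence to pair with the T at
# index 2 of the second, even though that column is a mismatch.
aligned_a, aligned_b = anchored_align("ACGTT", "ACTT", [(2, 2)])
print(aligned_a)
print(aligned_b)
```

Splitting at the anchors keeps the sketch short, and it shows Azalea's point that fixing a few positions can change the rest of the alignment: the pieces on either side of an anchor are aligned without any knowledge of each other. A real tool would likely offer more, such as hand-editing individual regions or choosing among alignment algorithms as Base by Base does.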