One of the topics I keep an eye on is visualization of various types of genomics data, and I’m always interested in new tools for graphical representations. In the past some of our most popular posts have covered tools that don’t do heavy-lifting analysis, but instead offer better ways to visualize and explore data, or different ways to present it.
This week’s tip of the week is a tool of this type–Ambiscript Mosaic offers a new way to look at nucleotides in stretches of sequence data. Now, I know–you think: new ways to look at A, T, G, and C? Really? Do we need this? And I’ll admit I wasn’t convinced at first. When I read the first paper on it the sequences just looked like Elvish–which I thought was cute, but I wasn’t convinced it was useful. But the more I thought about this interesting abstraction and read more about it, the more I liked the concept.
The basic idea is that the Roman letters for ATGC are certainly important and useful. But they can be represented with graphical elements that convey more detail visually. And the 5′ to 3′ representation of letter-based sequence info offers one way to think about the sequence, but the reverse complement of that requires a translation step. However, if represented graphically, the same data is just a physical flip away with no additional changes.
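To make that translation step concrete, here is a minimal Python sketch of what has to happen when you work with letters. The function name reverse_complement and the lookup table are my own illustrative choices, not anything taken from Ambiscript or MEME, and the sketch only handles the four unambiguous bases.

```python
# Illustrative lookup table: each base maps to its Watson-Crick partner.
# (Hypothetical helper for this post; ambiguity codes are not handled.)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    # Step 1: swap every base for its partner. Step 2: reverse the order.
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("GATTACA"))  # TGTAATC
print(reverse_complement("GAATTC"))   # GAATTC, a palindromic site reads the same on both strands
```

With letters you need both the substitution and the reversal to get to the other strand. The appeal of the graphical notation is that neither step has to be done in your head.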
This strategy isn’t one that you’d want to replace every view of sequence data, of course. But for some purposes this might offer a new view of the information that will be better suited to seeing some types of motifs or patterns.
In this week’s tip I’ll illustrate an example of how this type of visualization could offer a complementary way to evaluate a particular DNA motif. As a bonus, I’ll also provide the video of the presentation by the Rozak team that helped me to understand why this offers something different from the letter system. You can see it on their site, but I wanted to have a video version of it as well for cross-platform access.
For the demonstration video, I chose to compare the sequence logo style representation generated by the MEME suite tools with this graphical notation. MEME is a tool that I would use to identify motifs–to do the heavy lifting of the analysis part–and then visualize the results. They offer several ways to visually examine the results, and one of them is a sequence logo. The MEME documentation offers a sample motif, which I used to then display the Mosaic style. Here is MEME above, and Mosaic below it:
In the demonstration video I don’t have the time to cover a number of the useful aspects of the graphical strategies employed by the Ambiscript tool–this just covers the basics. Be sure to read their papers and watch the bonus background video to understand more about the graphical choices they made, such as the details of colors and shading. There’s a lot more thought behind this than I had time to cover. I didn’t show gaps here either, but the tool can account for them.
This bonus video offers some of the background and foundations of the graphical representations they’ve selected. It is based on the prior work, so it doesn’t have some of the additional features that the Mosaic paper describes. But it helps to explain the conceptual basis for the styles, and it helped me connect with the reasoning behind the graphics choices. There’s no audio with it; it’s just a conversion of the slide walk-through.
This tool is unusual, I know, and I’m sure not everyone will want to let go of ATGCs as letters. And it won’t be suited for every sequence visualization purpose. It took me a while to wrap my head around the idea of not having the letters there. But as a different way to consider sequence data, I think it could be useful for exploring some features. You’ll still want to use algorithms like those in the MEME suite to discover features like possible transcription factor binding motifs. But you can think about seeing them differently with Ambiscript Mosaic.
Credits or quick links to things you saw in the demo video:
Ambiscript Mosaic site: http://www.ambiscript.org/
Rozak slide presentation on the foundations: http://rozak.us/design/DNA_notation/
Wikipedia Base pair page: http://en.wikipedia.org/wiki/Base_pair
MEME documentation sample motif: http://meme.nbcr.net/meme/doc/examples/meme_example_output_files/meme.html#motif_1
MEME Suite homepage: http://meme.sdsc.edu/
Thanks to David Rozak for permission to convert the slide presentation to video.
Rozak D. & Rozak A. (2008). Simplicity, function, and legibility in an enhanced ambigraphic nucleic acid notation, BioTechniques, 44 (6) 811-813. DOI: 10.2144/000112727
Rozak D.A. & Rozak A.J. (2014). Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes, BMC Genomics, 15 (1) 52. DOI: 10.1186/1471-2164-15-52
Bailey T.L., Boden M., Buske F.A., Frith M., Grant C.E., Clementi L., Ren J., Li W.W. & Noble W.S. (2009). MEME SUITE: tools for motif discovery and searching, Nucleic Acids Research, 37 (Web Server) W202-W208. DOI: 10.1093/nar/gkp335