Friday SNPpets

This week’s tips offer some software, including a pre-print from one of my favorite groups–the folks who do great visualizations of sets. I have talked about UpSet before, but now there’s an R package for it. Speaking of great visuals, check out the sponge-microbe symbiosis. And 10 legume genomes. Cannabis as a gateway to plant genomics. And a story of entrepreneurship, and how it ends.

And for kicks, there’s a video that I helped to script-edit that’s been popular: Are GMOs Good or Bad?

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don't make it to a blog post. Here they are for your enjoyment…


What’s the Answer? (to Venn or not to Venn)

This week’s highlighted item from Biostars gets back to the visualization challenges that I love to think about. The question asked for help with an 11-set Venn diagram. What was funny about the response was that the overwhelming consensus was: please, no! And alternatives were offered.

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

Question: Venn diagram with 11 sets

Hi All,

can anyone recommend me a tool/package, which allows me to create Venn diagrams up to 11 sets? The packages which I have found so far can support to create less sets only.

Many thanks!


This is a problem that comes up often, across many kinds of data sets. You want to find the members of groups that overlap across different conditions, treatments, genes present or absent in different species, whatever. In the most famous case of Venn illustration, the banana genome team created a much-discussed masterpiece of the gene families shared between banana and other plant species. It was so astonishing to look at that it even got Cory Doctorow’s attention: Just look at that banana genome Venn diagram. But genomics Venn diagrams get around. Here’s one that became fashion:

However, part of the problem with the Venn is that it was so difficult to interpret. As a developer of visualization tools told me later, Venns do not scale well for genomics types of data. He was UpSet about genome folks trying to force the data in, and created the very neat UpSet tool to help with that: Video Tip of the Week: UpSet about genomics Venn Diagrams?

So I added that to the responses, but there were other suggestions too. Go have a look at the ensuing discussion.
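
To make the scaling problem concrete: n sets can have up to 2^n - 1 non-empty regions, so 11 sets means up to 2,047 of them, which is why nobody wanted to draw that Venn. The alternative is to count how many items fall into each exact combination of sets and show those counts as a sorted list or bar chart. Here is a minimal sketch of that counting step in plain Python. It is not the UpSet code itself; the function name and the toy data are made up for illustration.

```python
from collections import Counter


def exclusive_intersections(sets):
    """Count items by the exact combination of sets they belong to.

    `sets` maps a set name to a collection of items. The result maps each
    observed combination of names (a frozenset) to the number of items that
    are in exactly those sets and no others.
    """
    membership = {}
    for name, items in sets.items():
        for item in set(items):
            membership.setdefault(item, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Hypothetical toy data: gene families present in three species.
example = {
    "banana": {"fam1", "fam2", "fam3", "fam4"},
    "rice": {"fam2", "fam3", "fam5"},
    "grape": {"fam3", "fam4", "fam5", "fam6"},
}

counts = exclusive_intersections(example)

# Largest combinations first, ties broken by name for a stable listing.
for combo, n in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(" & ".join(sorted(combo)), n)
```

Only combinations that actually occur are stored, so the same function works for 11 sets without ever enumerating all 2,047 possible regions.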

Online chart tools

One of the biggest challenges in the era of big data is representing the material you want to explore. As with a lot of bioinformatics tools, there are some excellent strategies with a variety of software tools depending on what you want to display. Sometimes you need something with a little less overhead, though–for presentations, classes, or even lab meetings you might want something quick and handy to do an effective visualization. Sometimes you will want to use the heavy tools–but other times these will be sufficient.

Recently a former colleague (Steffen Schmidt–thanks!) pointed me to several useful tools for generating charts online. The first source he provided was a blog post called 22 Useful Online Chart & Graph Generators. You’ll find a number of choices there you could explore–and I noticed in the comments that other people offered further suggestions.

I’ll highlight another one that was recommended by Steffen that’s not on the list. Gliffy looks like it offers a nice range of visualization types, and he’s had success with it before. There is a free version where you can do 5 diagrams (note: they are publicly visible under this free plan, which may not be the best thing for your unpublished data, but fine for lab meeting or blogging), and there are subscription versions with further features that look like they could be useful in lab/collaboration situations.

You may need more specific visualization tools for other scientific purposes, of course. In the past we’ve talked about the UCSC Genome Graphs tool as a way to show genome-wide data on a chromosome view. I know people have also used tools in R to make some of the graphics I’ve seen. One of them was a word cloud feature. (But you can also use other sites for that, such as Wordle.) One of our more popular posts over the years was this one about a collection of tools for Venn Diagrams that are sometimes more bio-data specific: I ♥ Venns. Drawing domains has been another popular feature we’ve done a couple of times, once with DomainDraw and also with MyDomains. People have also really enjoyed using WebLogo and IceLogo, which we’ve highlighted, that make these kinds of representations:

One of the biggest challenges going forward, though, is visual representation of the huge volumes of data we are seeing. How do you show what you want from 1000 genomes? 10,000 genomes? And that’s just the beginning….

Anyway–I hope this tip can help you visualize a few more things useful to your work, as we continue to wrestle with larger questions of big and complex data visualizations. If you have other handy quick drawing tools, or other smaller tools that just do something you need, let us know, we’d love to have a look at them.

“What’s the answer?” thread

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday* we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

BioStar Question of the Week:

Often in these days of “big data” people are coming to us and asking what they should do with their big lists of stuff, in order to understand more about the functions of the items in their list. A question at BioStar tackled this:

Tools for visualizing overlap between GO terms

Some sibling GO categories are highly correlated. Is there any tool / webserver that would take a list of GO terms and show me a matrix of the fraction of shared genes in a particular species? Or at least a Venn diagramm for two or three GO terms?

For example, almost 1000 human genes are annotated as “cell cycle”, but the sum of the node counts is much higher. I’d like to see which categories are overlapping with each other.

Although there was no “selected” answer, the discussion offered a variety of strategies to go about these kinds of comparisons, including a couple of tools that were new to me. Go over there to check out the answers.
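
If you just want a quick look at the kind of matrix the question describes, it can be sketched in a few lines of Python once you have the gene lists for each term. This is only an illustration, not one of the tools from the Biostar answers. The term-to-gene mapping below is invented placeholder data, and "fraction of shared genes" is assumed here to mean the Jaccard index (shared genes divided by all genes annotated to either term); other definitions would work just as well.

```python
def overlap_matrix(term_genes):
    """Return (terms, matrix) where matrix[(a, b)] is the Jaccard index
    of the gene sets annotated to terms a and b."""
    terms = sorted(term_genes)
    matrix = {}
    for a in terms:
        for b in terms:
            genes_a, genes_b = set(term_genes[a]), set(term_genes[b])
            union = genes_a | genes_b
            # Two empty terms share nothing; avoid dividing by zero.
            matrix[(a, b)] = len(genes_a & genes_b) / len(union) if union else 0.0
    return terms, matrix


# Hypothetical annotations: term names and gene IDs are placeholders only.
annotations = {
    "cell cycle": {"g1", "g2", "g3", "g4"},
    "mitotic cell cycle": {"g1", "g2", "g5"},
    "cell cycle arrest": {"g3", "g4"},
}

terms, matrix = overlap_matrix(annotations)

# Print one row per term, with values rounded for readability.
for a in terms:
    print(a, [round(matrix[(a, b)], 2) for b in terms])
```

A gene annotated to several terms is counted once in each of them, which is why the node counts in the question add up to more than the total number of genes. High off-diagonal values in the matrix point to the terms responsible.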

I ♥ Venns

Ok, I know they aren’t the most sexy graphics in biology–yes, you 3D protein structure geeks have that down. They are a pretty straightforward representation of the numbers of items in each group and in the overlap. But I have always found them a quick help when I’m trying to assess results from lists of things I have been working on.

So conveniently enough I just got a notice of a new Venn tool today–and it’s a Cytoscape plug-in. That’s a sample of one of the components of it on the right. And I just happen to be working on the update for our Cytoscape tutorial this week (subscription), so I’ll be able to add it to our tutorial. Anyway, here’s the notice as it came across the Cytoscape-announce mailing list:

Cytoscape plugin for venn diagrams, version 0.2 has been released.

The plugin provides a diagram view and a details view for comparing two to four Cytoscape groups at a time.

Project web site http://www.dishevelled.org/venn-cytoscape-plugin

Downloads http://sourceforge.net/projects/dishevelled/files/venn-cytoscape-plugin

Screencast http://www.youtube.com/watch?v=UtoW0nVwOV4

This plugin is for Cytoscape version 2.7.0 only, an incompatible change was made in Cytoscape version 2.8.0-beta that hasn’t been resolved or worked around yet.


And they even offer a screencast on it. Check out the interactive capacity of that–a segment of your Venn gets highlighted back on your network/pathway window. How nifty. Made my morning!

Are there other Venn tools out there that people use and like?