Video Tip of the Week: UpSet about genomics Venn Diagrams?

Who can forget the Banana Venn? It was one of the most talked-about visualizations in genomics that I’m aware of.

So, yeah–#NotSureWhatItMeansButDontCare, and the extended Storify of the responses are still worth reading. It even got the wider tech media’s attention: Just look at that banana genome Venn diagram, by Cory Doctorow. I remember trying to follow the diagram for about 20 minutes before I gave up. But I still loved it for its audacious attempt to genesplain. It was impenetrable. But seriously intriguing. It was awarded the title of “best genomics Venn Diagram ever” by Jonathan Eisen.

It also spawned other examples. The loblolly pine genome folks did one of their own. I recently had to look up what a jujube looks like to see if it resembled the Venn that team just delivered. Um, sorta, maybe–but I don’t know if that was the goal or just a happy coincidence of a kinda oval fruit. I did catch a fun discussion on the actual origin of the species GO Venn, though, and currently the evidence points to the rat genome team. However, the original published image lacks whiskers and eyes:

So as amusing as this has all been, one team took another approach to this issue. They wondered if this Venn craze was the best way to tackle this data, or if there were more effective and interactive ways to explore it. Some data set visualization tools may not be right for a task. One problem is scaling Venn diagrams to capture the full range of features that genomics folks want to illustrate. They are now prepared to UpSet the applecart. In their intro video to UpSet, they summarize with this:

I’ve talked about the terrific data visualization tools around the Caleydo project a number of times. They are developing really useful and intuitive strategies for looking at numerous types of data, and you can see our previous posts on StratomeX, LineUp, Entourage and enRoute (the combo of genomics data and pathways here is particularly nifty). They work really hard with the theories and techniques of data visualization, and implement effective ways to explore data. They recently looked across various genomics data papers to see how data sets were being used, and they attempt to encourage good behavior with the right visualizations to make the necessary points (Points of View reference below):

Understanding the tasks that the diagrams are meant to support and being aware of the data structure are required to find an appropriate representation.

They have also tried to help. UpSet, for visualization of intersecting sets, is one of their new efforts, championed by Alexander Lex with the other team members. Aimed at both effective and efficient representation of the types of data genomics researchers need, this interactive tool is a really nice way to explore which items belong in which subset. And, of course, which ones don’t. But that’s just the beginning. With this tool you can easily spot the intersections, query for ones you are interested in, and sort in various ways. There are ways to explore the attributes and elements for the items as well. The other great thing about the Caleydo team is that they make nice intro videos–I’ll embed the overview one as this week’s video Tip of the Week, but they have a shorter basic intro one as well. In this video the examples include Simpsons characters and movie data sets, but it will certainly allow you to quickly grasp the utility of this tool. And there’s a lot more to it. Read the UpSet paper linked below (you will spot a copy of the notorious banana Venn, in fact, which inspired their thoughts on a better way to illustrate sets). It has a lot of nice guidance on set theory and will help you think about the appropriate uses of different representations.
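If it helps to see the underlying idea in code, here is a minimal Python sketch of what "which items belong in which subset" means: every item is filed under the exact combination of sets that contain it, and the combinations are then sorted by size. To be clear, this is not UpSet's own code or API, just my rough illustration of the concept. The function names and the gene-family data are made up for the example.

```python
from collections import defaultdict


def exclusive_intersections(sets):
    """Group every element by the exact combination of sets that contain it."""
    # First pass: record, for each item, the names of all sets it appears in.
    membership = defaultdict(set)
    for name, members in sets.items():
        for item in members:
            membership[item].add(name)

    # Second pass: file each item under its full combination of set names.
    groups = defaultdict(set)
    for item, names in membership.items():
        groups[frozenset(names)].add(item)
    return dict(groups)


def sorted_by_size(groups):
    """Return (combination, items) pairs, largest group first, ties by name."""
    return sorted(
        groups.items(),
        key=lambda pair: (-len(pair[1]), sorted(pair[0])),
    )


# Hypothetical gene families per species, invented for illustration only.
example_sets = {
    "banana": {"f1", "f2", "f3", "f4"},
    "rice": {"f2", "f3", "f5"},
    "date_palm": {"f3", "f4", "f6", "f7"},
}

if __name__ == "__main__":
    for combo, items in sorted_by_size(exclusive_intersections(example_sets)):
        print(" & ".join(sorted(combo)), len(items), sorted(items))
```

Each item lands in exactly one group, so the group sizes add up to the total number of distinct items. With n sets there can be up to 2^n - 1 such groups, which is a big part of why a Venn diagram gets so hard to read as sets are added, while a sortable list of combinations stays manageable.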

The GitHub pages have more help, documentation, and a link to try out an installation with your own data. I also recently had the chance to meet Alexander at a talk he gave, and I know he’s interested in hearing what other visualization challenges genomics researchers face, and would welcome any feedback you have on the tools.

My dreams for this tool: it would be embeddable in journal articles. So I could see the data as the team presented it, but then also be able to explore the underlying stuff. And if it could be a sort of a “session” so I could snap back to the original view. And I wish I could embed an image faintly on the background….

Quick links:

UpSet: http://vcg.github.io/upset/about/

Live version to kick the tires: http://vcg.github.io/upset/

Caleydo tools overall project: http://www.caleydo.org/

References:

D’Hont A., France Denoeud, Jean-Marc Aury, Franc-Christophe Baurens, Françoise Carreel, Olivier Garsmeur, Benjamin Noel, Stéphanie Bocs, Gaëtan Droc, Mathieu Rouard, Corinne Da Silva et al. (2012). The banana (Musa acuminata) genome and the evolution of monocotyledonous plants, Nature, 488 (7410) 213-217. DOI: http://dx.doi.org/10.1038/nature11241

Lex A., Gehlenborg N., Strobelt H., Vuillemot R.V. & Pfister H. (2014). UpSet: Visualization of Intersecting Sets, IEEE Transactions on Visualization and Computer Graphics (InfoVis ’14), DOI: TBD

Lex A. & Nils Gehlenborg (2014). Points of view: Sets and intersections, Nature Methods, 11 (8) 779. DOI: http://dx.doi.org/10.1038/nmeth.3033

Gibbs R.A., George M. Weinstock, Michael L. Metzker, Donna M. Muzny, Erica J. Sodergren, Steven Scherer, Graham Scott, David Steffen, Kim C. Worley, Paula E. Burch, Geoffrey Okwuonu et al. (2004). Genome sequence of the Brown Norway rat yields insights into mammalian evolution, Nature, 428 (6982) 493-521. DOI: http://dx.doi.org/10.1038/nature02426