Tip of the week: CompaGB for comparing genome browser software
Here at OpenHelix we think a lot about the differences between nominally similar software that will accomplish some given task. For example, in our workshops we are often asked about the differences between genome browsers. Although UCSC sponsors our workshops and training materials on their browser, we know they aren’t the only genome browser out there and we can talk about them all–in fact, that’s one of the coolest things about being separate from UCSC or a specific software tool provider/grant–we can talk about everyone! And our answer is usually something like this:
The basic foundation of the “official reference sequence” is usually the same in all the main browsers. However, the way they choose to organize the display, the tools for showing/hiding annotation data, and the custom query and display options vary. But they generally all have some mechanism for this. For me, usually the choice comes down to what data I need to look at–and how a given software tool shows me that and lets me interact with it.
I know that’s largely an end-user perspective, but that’s who is attending our workshops. I can remember talking to one guy at our conference booth who only wanted to use a genome browser with the reference sequence display organized vertically. I gave him Map Viewer. Some people need a specific species–and no matter how good the software is, if your research species isn’t in there, it just doesn’t matter…. I’ve seen super-users on twitter complain about the look of the background at one browser or another. That doesn’t have much bearing on my choice–but I do have to say I really hate “hidden” menus and features you have to hover and dig to find, in general. What you don’t see is just impossible to know as an end-user.
But quite frankly when I’m looking for some details in a given region for a research use, I often explore all the browsers I know because of their differences in display and available data to show–to make sure I’m not missing anything. It doesn’t take that long to use them all (if you know your way around, and I think I do…).
But one group has tried to quantify the differences between software tools in a standardized way with specific metrics. A group from INRA has collected and assessed various characteristics of genome browsers, and has developed a database where you can look at what they have curated. It’s called CompaGB.
You can assess the features as one of these profiles: biologist, computational biologist, or computer scientist at this time. In this tip of the week I explore the CompaGB interface, from an end-user biologist perspective. I’ll choose a couple of browsers to compare, and we can look at the type of things that the CompaGB team scores to give you a sense of what you can find. For developers you’ll see there are different metrics and you should go back and explore those as well.
In their paper they describe their inspiration for this project–which is QSOS. The Qualification and Selection of Open Source Software project provides a model and framework to standardize descriptions of available software features. The QSOS framework is illustrated in this graphic on their Welcome page:
In short, they have 4 steps: defining frames of reference appropriate for the software tool, assessing the features, qualifying the features with a weighting mechanism, and selecting the appropriate tool.
You can easily see how the CompaGB team integrated these ideas in their database of genome browser comparisons. They let you choose criteria you are interested in, and offer a radar plot display as well as a tabular representation of the scores so you can consider the overall view or the details.
There are scores for “full, limited/medium, and poor” but not a lot of detail on that. They assessed the tools in 4 sections: (1) technical features, (2) data content, (3) GUI, and (4) annotation editing and creation. There is apparently no swimsuit competition. Alas.
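To give a feel for how a weighting step like this could turn section scores into a single number per browser, here is a minimal sketch in Python. It is not CompaGB's actual code or scoring formula: the numeric values for the three levels, the browser names, the scores, and the profile weights are all made up for illustration.

```python
# Assumed numeric mapping for the three score levels (hypothetical,
# not taken from the CompaGB paper or database).
SCORE_VALUES = {"poor": 0, "limited/medium": 1, "full": 2}


def weighted_score(scores, weights):
    """Return the weighted average of a browser's section scores (0 to 2)."""
    total_weight = sum(weights[section] for section in scores)
    if total_weight == 0:
        raise ValueError("at least one section needs a non-zero weight")
    weighted_sum = sum(
        SCORE_VALUES[level] * weights[section]
        for section, level in scores.items()
    )
    return weighted_sum / total_weight


def rank(browsers, weights):
    """Return browser names ordered from highest to lowest weighted score."""
    return sorted(
        browsers,
        key=lambda name: weighted_score(browsers[name], weights),
        reverse=True,
    )


# Made-up scores for two hypothetical browsers across the 4 sections.
browsers = {
    "Browser A": {
        "technical features": "full",
        "data content": "limited/medium",
        "GUI": "full",
        "annotation editing and creation": "poor",
    },
    "Browser B": {
        "technical features": "limited/medium",
        "data content": "full",
        "GUI": "limited/medium",
        "annotation editing and creation": "full",
    },
}

# Hypothetical weights for a "biologist" profile: data and GUI matter most.
biologist_weights = {
    "technical features": 1,
    "data content": 3,
    "GUI": 3,
    "annotation editing and creation": 1,
}

for name in rank(browsers, biologist_weights):
    print(name, weighted_score(browsers[name], biologist_weights))
```

Swapping in a different set of weights (say, one that favors technical features for a computer scientist profile) can reorder the same browsers, which is the point of having profiles in the first place.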
The paper says that 4 different evaluators examined the tools (at this time 7 different genome browsers: MuGeN, GBrowse, UCSC, Ensembl, Artemis, JBrowse, Dalliance). They have version numbers–for example you can compare the 2 widely used GBrowse versions right now. How often these will get re-evaluated I don’t know. And how to compare different installations of GBrowse at different sites is not really clear to me–they can vary a lot by what the project team wants and needs to implement.
One evaluator did each tool in most cases. And reportedly the results were sent to the database providers for checking. I have no idea what was sent to UCSC on the training issue… [*cough* I have issues with the UCSC training score details, for example...Yeah, we do workshops and so does UCSC. Lots of workshops around the world, we have slides and exercises...I'll show you in the tip where I saw it.] They do encourage users to leave comments or suggestions on their web site if you have supplementary information–I may want to add some details later. And it appears possible to create new items and curate, but I haven’t tried this. They also say they are re-vamping the evaluation process going forward to simplify it.
But…this statement in their paper surprised me:
The UCSC browser natively displays a broad range of human annotations, including cross-species comparisons. UCSC browser’s underlying strategy focuses upon centralizing data on UCSC servers and, as far as we know, no external lab has installed it locally for the purpose of storing and browsing their own data.
Ummm–no. We talk to people all over the place who maintain local installations of UCSC. Quite often in hospital situations where patient data privacy is a major issue there are local installs; certain companies have them. But there are others as well–among them a bunch of mirrors around the world. There’s a whole separate mailing list where people discuss their issues with their own installations. But we’ve also seen the UCSC infrastructure used for other species that UCSC doesn’t support such as HIV, malaria, phage browsers, and more. Maybe this unusual setup of the UCSC software at the Epigenetics Visualization Hub would be interesting to be aware of. And we know the UCSC team consults with groups and helps them to do it. And by the way–we mention in our tutorials and workshops that we’ve done around the world that other installations are possible and available. And we know that the materials we provide are used in many countries to do local trainings as well.
So it was an interesting attempt to measure software features, and I understand why they attempted it, but it seems challenging to scale and maintain. And the curation strategy will have to be considered when evaluating the data. These are fixable if the project proceeds beyond this early set of browsers and branches out to other types of open source software. It really is hard to know what’s worth spending your time on, I admit. And that’s why we hope end-users have a look at our training materials to get introduced to a specific site and see if it suits their needs, and they can kick the tires with the exercises.
Lacroix, T., Loux, V., Gendrault, A., Gibrat, J., & Chiapello, H. (2011). CompaGB: An open framework for genome browsers comparison. BMC Research Notes, 4 (1). DOI: 10.1186/1756-0500-4-133