Galaxy, a stride towards reproducible computational research
Galaxy started out as a very useful tool for doing genomics research that was reproducible and sharable. One of my pet peeves in reading research papers that use genomic analysis or online genomics resources is the materials and methods section. Often the methods and parameters used are mentioned only in a very cursory manner, if at all, and I would not be able to reproduce the research. Fixing this, along with making it easy to do and share analyses, is one of the fundamental purposes Galaxy was developed for, and it does a pretty good job of it (I am a bit biased*).
The Galaxy developers have recently published a paper: “Galaxy: a comprehensive approach for supporting accessible, reproducible and transparent computational research in the life sciences” in Genome Biology.
There are a couple of questions and features I have felt Galaxy needed to address to better fulfill the goal of reproducible and transparent computational research. One thing we’ve been asked in workshops on Galaxy is how long ‘histories’ and ‘workflows’ will persist. The Galaxy developers insisted these would persist indefinitely (as indefinite as anything in an online world can be). In this paper, the developers answer that question with what seems to me a pretty good, broad approach to persistence:
We are pursuing three strategies to ensure that any Galaxy analysis and associated objects can be made easily and persistently accessible. First, we are developing export and import support so that Galaxy analyses can be stored as files and transferred among different Galaxy servers. Second, we are building a community space where users can upload and share Galaxy objects. Third, we plan to enable direct export of Galaxy Pages and analyses associated with publications to a long-term, searchable data archive such as Dryad.
Another feature, one I knew was coming but am glad to see in published form and on the beta site, is a community of tools and users. It’s mentioned in the quote above, but it’s more than that. It’s an extension of the ability to share histories and workflows:
To help users make better and faster choices within Galaxy, we are extending Galaxy’s sharing model to help the Galaxy user community find and highlight useful items. Ideally, the community will identify histories, workflows, and other items that represent best practices; best practice items can be used to help guide users in their own analyses.
The beta site gives you a look at what’s coming in the “Galaxy Tool Shed,” a place to upload, download and share tools to import into Galaxy installations. Hopefully this will eventually also include the ability to rate and discuss tools. Another aspect I’ll be looking forward to is the ability to share workflows with an open and broader community. Right now there is the excellent ability to share histories and workflows with other users within your network of colleagues, but I would like to see an open community for sharing and rating workflows. From the quote above, it seems that is coming. It will be a very welcome addition.
One last added feature I’d like to mention is Pages:
Galaxy Pages (Figure 4) are the principal means for communicating accessible, reproducible, and transparent computational research through Galaxy. Pages are custom web-based documents that enable users to communicate about an entire computational experiment, and Pages represent a step towards the next generation of online publication or publication supplement. A Page, like a publication or supplement, includes a mix of text and graphs describing the experiment’s analyses. In addition to standard content, a Page also includes embedded Galaxy items from the experiment: datasets, histories, and workflows. These embedded items provide an added layer of interactivity, providing additional details and links to use the items as well.
I tried out Pages (click “user” at the top right of the page, then click “pages” to access them). I like the ability to write what is basically a materials and methods section for computational biology. You can describe what you did and embed histories, datasets and the like. Unfortunately, at the time of this writing I was able to build a page but unable to view it (server error; I used the latest versions of Safari and Firefox on Mac OS X 10.5). I am sure this is a temporary glitch.
Galaxy has made huge progress in the last couple of years and looks poised to become a go-to tool for computational analysis for experimental biologists. In that vein, you might want to check out their introductory tutorial or screencasts to get acquainted with the tool!
*disclaimer: The Galaxy group contracts with OpenHelix to provide an introductory tutorial on Galaxy (free and open to all users).
Goecks, J., Nekrutenko, A., Taylor, J., & The Galaxy Team (2010). Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11(8), R86. DOI: 10.1186/gb-2010-11-8-r86