Tag Archives: analysis

Video Tip of the Week: Visualizing the Galaxy

An antennae galaxy

Well, not that kind of galaxy (though visualizing those is quite nice), this kind of Galaxy. Galaxy is an excellent tool to analyze, reproduce and share genomics data, and the Galaxy folks are always updating, improving and adding features to the tool. We have a tutorial for Galaxy to help you get started using this tool. As you might have guessed from the previous sentence, Galaxy is a moving target. The basics (and that’s what the tutorial is for) are the same, but the tutorial is in the process of being updated to reflect some of those changes. That update should be out sooner rather than later, but even so, we just can’t fit everything into the tutorial. The relatively new visualization tool is something that will not be in the tutorial. As there are no tutorials on visualization at the Galaxy site that I can find (if you know of any, link them here!), I’ve included a quick intro to visualizations using Galaxy in this tip of the week.

There are other ways to visualize the data analyzed at Galaxy. Galaxy datasets can often be viewed directly at UCSC Genome Browser, Ensembl, RViewer or in GeneTrack within Galaxy. Those are all excellent tools and powerful ways to view and explore your analysis in depth. In addition, the Galaxy visualization tool is a way to quickly visualize your data to help discovery, direct further analysis and share what you’ve found. It is obviously not a full-fledged browser, but it is very useful for doing a simple visualization of your data from within Galaxy. Today’s tip gives a quick introduction to Galaxy visualization.

Quick Links:
Galaxy (OH tutorial-subscr.)
UCSC Genome Browser (OH tutorials-free)
Ensembl (OH tutorials-subscr.)

P.S. You might hear some bird song in the background. I am in, and working from, Hawaii for the next month (yeah, it’s tough work but someone has got to do it). No way to get those birds (or the frogs at night) to be silent for a bit.

What’s the Answer? (Bioinformatics Tools on Biostar)

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

Just about a month ago, BioStar added a tool section as a place to announce and update new and old bioinformatics tools. So, in a sense, today’s post is answers in search of your questions :):

We just added a new section to the site called **Tools**. We designate this section to announcements regarding bioinformatics software tools, both new and old.

There are about 10 tools listed currently for various areas of analysis. This dovetails nicely with the “obituary” section Mary created at Biostar for no-longer-supported tools and databases, which she discussed earlier this week. Circle of life and all that :).

What’s the Answer: Open Thread (NGS Tools)

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

Question of the week:

Now we are analysing NGS data, and I wonder if you know some collections of bioinformatics tools which can help me (like biopieces).

There were a few good answers suggesting lists of tools for the preparation and analysis of Next-Generation Sequencing data. Here’s one answer; click the link above for the rest:

(after some advice about mastering scripting language and unix commands…)

  1. learn the most used bioinformatics tools. e.g.:
    • one (or preferably many) short-read aligners
    • samtools
    • an NGS viewer (IGV is a good one to start)
    • Bedtools
    • A means to view and filter your NGS reads
    • and certainly many others depending on your specific focus.
  2. Then start to learn some of the common data repositories. e.g.:


Real bioinformaticians write code, real scientists…

Just over a week ago, Neil Saunders wrote a post I agreed with: Real bioinformaticians write code. The post was in response to a tweet conversation that started:

Many #biostar questions begin “I am looking for a resource..”. The answer is often that you need to code a solution using the data you have.

He’s right, and that’s very true for bioinformaticists to whom he’s talking. My concern is for the rest of biological researchers. He states in the post:

In other words: know the data sources, know the right tools and you can always sculpt a solution for your own situation.

This is very true and I wholeheartedly agree. So many solutions exist already in thousands of databases and analysis tools. It’s what we do here at OpenHelix: help experimental biologists, genomics researchers and bioinformaticists find the right data sources and tools and then go and “sculpt a solution for their situation.”

In the last part of my comment,

BioMart, UCSC Genome Browser, Galaxy, etc, etc are excellent tools and data sources and could probably answer about 80% of most posed questions :). But my caveat would be that knowing the data sources and right tools can be a bit of a daunting task.

And it is, despite the somewhat dismissive response :). We’ve all seen the graphs, exponentially rising amounts of data over time. It’s an issue as the Chronicle of Higher Education article title states:

Dumped on by Data: Scientists Say a Deluge is Drowning Research

The journal Science also had an entire 10 article section on the issue. It’s not a problem that will go away.

Along with that deluge of data has come a deluge of databases and data analysis tools (created for the most part by bioinformaticists!). Within many of these, even taken _alone_, finding the right data and tool is quite daunting. There are thousands of such databases and tools. I’ve lost count.

Neil Saunders is correct. The solution is out there: find the right tools and data, sculpt a solution. He responds to my comment with “Learning what you need to know in bioinformatics can certainly be daunting. But then, science isn’t for the easily daunted :-).” In other words, “if you are daunted, you aren’t a scientist?”

We give workshops to researchers around the world, from Singapore to the US to Morocco, and at institutions as varied as Harvard, Stanford, University of Missouri, Mt. Sinai, Stowers and Hudson-Alpha. The researchers we’ve given workshops to and answered questions from were also varied: developmental biologists, evolutionary biologists, medical researchers, bioinformaticists, researchers quite well versed in genomics and those not.

The overriding theme is that finding and knowing the data and the tools is not only daunting, but sometimes not possible. Not because they don’t exist, but because finding and knowing them is a drain on personal and lab resources, considering the sheer, growing field of things to find and know. I refer you to the Chronicle article… drowning in data.

They are real scientists not easily daunted, but daunted just the same, by what’s in front of them. And yes, many of those specific questions to specific research needs can be answered by existing tools. We come across many questions on Biostar that a well-crafted database search or analysis step will answer beautifully, without the need for reinventing the wheel with more code (and the answers are often code).

I suspect that most of those scientists out there who call themselves “bioinformaticists” should have a grasp of the tools and databases available to them (but I can tell you, even the brightest of them sometimes don’t). So, the advice and final words of the linked blog post above…

In other words: know the data sources, know the right tools and you can always sculpt a solution for your own situation…. real bioinformaticists write code

Yes, real bioinformaticists write code, but this advice is insufficient for the other 90% of real scientists who don’t. Perhaps Biostar is not the solution (I suspect a lot of the questions he points to are asked by non-bioinformaticists who have only a basic knowledge of coding, if any, and no access to those who do). Perhaps it, or something like it, can be.

Tip of the Week: Galaxy Pages

This week’s tip is a brief introduction to Galaxy Pages. These are special pages that users can create within the Galaxy system to annotate, describe and explain various analyses done using Galaxy. The user has many abilities to link to and embed histories, workflows and datasets along with using text and images and more to fully annotate analyses. As described last week, this is one of the many additions Galaxy has added to increase reproducibility and transparency of genomics research.

Galaxy, a stride towards reproducible computational research


Galaxy started out as a very useful tool to do genomics research that was reproducible and sharable. One of my pet peeves in reading research papers that use genomic analysis or online genomics resources is the materials and methods sections. Often the methods and parameters used are mentioned only in a very cursory manner, if at all. I would not be able to reproduce the research. Fixing this, along with making it easy to do and share analysis, is one of the fundamental purposes for which Galaxy was developed, and it does a pretty good job of it (I am a bit biased*).

The Galaxy developers have recently published a paper: “Galaxy: a comprehensive approach for supporting accessible, reproducible and transparent computational research in the life sciences” in Genome Biology.

There have been a couple of questions or functions I have felt Galaxy needed to address to better fulfill the goal of reproducible and transparent computational research. One of the things we’ve been asked in workshops on Galaxy has been how long ‘histories’ and ‘workflows’ will persist. The Galaxy developers insisted these would persist indefinitely (as indefinite as an online world could be). In this paper, the developers answer that question with what seems to me a pretty good, broad approach to persistence:

We are pursuing three strategies to ensure that any Galaxy analysis and associated objects can be made easily and persistently accessible. First, we are developing export and import support so that Galaxy analyses can be stored as files and transferred among different Galaxy servers. Second, we are building a community space where users can upload and share Galaxy objects. Third, we plan to enable direct export of Galaxy Pages and analyses associated with publications to a long-term, searchable data archive such as Dryad.

Another feature, which I knew was coming but am glad to see in published form and on the beta site, is a community of tools and users. It’s mentioned in the quote above, but it’s more than that. It’s an extension of the ability to share histories and workflows:

To help users make better and faster choices within Galaxy, we are extending Galaxy’s sharing model to help the Galaxy user community find and highlight useful items. Ideally, the community will identify histories, workflows, and other items that represent best practices; best practice items can be used to help guide users in their own analyses.

The beta site gives you a look at what’s coming in the “Galaxy Tool Shed,” a place to upload, download and share tools to import into Galaxy installations. Hopefully this will eventually also include the ability to rate and discuss tools. Another aspect I’ll be looking forward to is the ability to share workflows to an open and broader community. Right now there is the excellent ability to share histories and workflows with other users within your network of colleagues, but I would like to see an open community to share and rate workflows. From the comment above, it seems that is coming. It will be a very welcome addition.

One last feature added I’d like to mention is pages:

Galaxy Pages (Figure 4) are the principal means for communicating accessible, reproducible, and transparent computational research through Galaxy. Pages are custom web-based documents that enable users to communicate about an entire computational experiment, and Pages represent a step towards the next generation of online publication or publication supplement. A Page, like a publication or supplement, includes a mix of text and graphs describing the experiment’s analyses. In addition to standard content, a Page also includes embedded Galaxy items from the experiment: datasets, histories, and workflows. These embedded items provide an added layer of interactivity, providing additional details and links to use the items as well.

I tried out the pages (click “user” at the top right of the page, then click “pages” to access pages). I like the ability to write what is basically a materials and methods section for computational biology. You can describe what you did, embed histories, datasets and the like. Unfortunately, at the time of this writing I was able to build a page, but unable to view it (server error; I used the latest versions of Safari and Firefox on Mac OS X 10.5). I am sure this is a temporary glitch.

Galaxy has made huge progress in the last couple of years and looks poised to become a go-to tool for computational analysis for experimental biologists. In that vein, you might want to check out their introductory tutorial or screencasts to get acquainted with the tool!

*disclaimer: The Galaxy group contracts with OpenHelix to provide an introductory tutorial on Galaxy (free and open to all users).

Goecks, J., Nekrutenko, A., Taylor, J., & The Galaxy Team (2010). Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11 (8). DOI: 10.1186/gb-2010-11-8-r86

Galaxy http://www.galaxyproject.org

Tip of the Week: Sharing your analysis process

We’ve introduced Galaxy (http://www.usegalaxy.org) before in the Tip of the Week section, have shown you one useful thing you could do with it, and now we also have a free tutorial and training materials that introduce you to the basic use of the tool. In today’s tip of the week, I’m going to show you workflows. Workflows was in beta until recently, so it isn’t in the first version of our tutorial (though it will be in the second). It’s a great feature, so I wanted to introduce it here. Workflows allows you to set up an automated process that takes your data through a preset series of manipulation and analysis steps. This can be very useful if there is a process you are doing a lot and you don’t want to have to do each step every time, or if you create an analysis process you’d like to share with a colleague. (I’d also like to point out that Galaxy has a good number of other short screencasts of tasks you might want to check out after doing the tutorial.)
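To make the idea of a workflow concrete, here is a minimal Python sketch of "a preset series of manipulation and analysis steps" applied to a dataset. It is only an analogy: it does not use Galaxy or its API, and the names (`keep_chromosome`, `min_length`, `sort_by_start`, `run_workflow`) and the sample intervals are hypothetical, made up for illustration.

```python
from typing import Callable, List, Tuple

# A genomic interval as (chromosome, start, end). Hypothetical sample format.
Interval = Tuple[str, int, int]
Step = Callable[[List[Interval]], List[Interval]]


def keep_chromosome(chrom: str) -> Step:
    """Build a step that keeps only intervals on one chromosome."""
    def step(rows: List[Interval]) -> List[Interval]:
        return [r for r in rows if r[0] == chrom]
    return step


def min_length(n: int) -> Step:
    """Build a step that keeps only intervals at least n bases long."""
    def step(rows: List[Interval]) -> List[Interval]:
        return [r for r in rows if r[2] - r[1] >= n]
    return step


def sort_by_start(rows: List[Interval]) -> List[Interval]:
    """Sort intervals by chromosome, then by start position."""
    return sorted(rows, key=lambda r: (r[0], r[1]))


def run_workflow(steps: List[Step], rows: List[Interval]) -> List[Interval]:
    """Feed the output of each step into the next, in order."""
    for step in steps:
        rows = step(rows)
    return rows


# The "workflow" is just the ordered list of steps; it can be reused on any
# dataset, or handed to a colleague, without redoing each step by hand.
workflow = [keep_chromosome("chr1"), min_length(100), sort_by_start]

data = [("chr1", 500, 700), ("chr2", 10, 400), ("chr1", 100, 150), ("chr1", 50, 300)]
result = run_workflow(workflow, data)
print(result)
```

The point of the design is that the list of steps is separate from the data, which is what makes the same process repeatable and shareable.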

Tip of the Week: Galaxy

We’ve done two tips of the week on Galaxy so far, one showing you the interface and the other how to convert genome coordinates between assemblies. Galaxy is not a database, but rather an analysis tool that allows you to pull in data from many different sources such as the UCSC Genome Browser database, Ensembl’s BioMart or your own data. The tools in Galaxy also allow you to prepare and manipulate that data in many different ways: merging datasets, eliminating or adding columns of data, sorting, filtering and many other actions. Galaxy includes many tools that allow you to analyze the data in a variety of ways. One aspect of Galaxy that is very useful is that a history of your actions and steps is kept and can be revisited, shared and reconfigured, giving your analysis a collaborative, reproducible quality that makes your research much more useful.
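For readers who want a feel for what those data manipulations amount to, here is a rough Python sketch of merging two tab-delimited datasets, filtering rows, dropping a column and sorting. It uses only the standard library and does not call Galaxy; the column names, gene names and scores are invented for illustration, and the helpers (`read_tsv`, `join_on`, `drop_column`) are hypothetical.

```python
import csv
import io

# Two small tab-delimited datasets (invented values, for illustration only).
GENES = (
    "name\tchrom\tstart\tend\n"
    "geneA\tchr1\t100\t900\n"
    "geneB\tchr2\t50\t400\n"
    "geneC\tchr1\t2000\t2600\n"
)
SCORES = (
    "name\tscore\n"
    "geneA\t0.9\n"
    "geneB\t0.2\n"
    "geneC\t0.7\n"
)


def read_tsv(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on(left, right, key):
    """Merge two datasets on a shared column, keeping rows found in both."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def drop_column(rows, column):
    """Return the rows with one column removed."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


merged = join_on(read_tsv(GENES), read_tsv(SCORES), "name")      # merge datasets
filtered = [r for r in merged if float(r["score"]) >= 0.5]       # filter rows
trimmed = drop_column(filtered, "end")                           # eliminate a column
ordered = sorted(trimmed, key=lambda r: (r["chrom"], int(r["start"])))  # sort

for row in ordered:
    print(row)
```

In Galaxy each of these operations is a tool you run from the interface, and the sequence of operations is recorded in your history rather than in a script.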

This week, we would like to introduce an entire tutorial, quick reference cards and additional training materials for Galaxy. Galaxy has teamed up with OpenHelix to provide you with tutorials and training materials that will introduce you to this excellent tool. You can watch the tutorial from the Galaxy landing page (http://www.openhelix.com/galaxy). You can also download slides for your use with scripts, exercises to get you better acquainted with the tool or order Quick Reference Cards.

Tip of the Week: Galaxy intro

We had a tip last week on converting genome coordinates using Galaxy. This week I’d like to introduce you to the Galaxy interface. This screencast was actually done by one of the developers of Galaxy and is a quick introduction to the interface. We are currently working with Galaxy on a longer introduction to the tool, but thought I’d give you a taste of it here. Galaxy is an excellent analysis tool. It’s not a database, but rather a tool to analyze data you can obtain from other sources; it allows you to save your workflows and offers many other tools that help you analyze and collaborate. (If the movie to the left doesn’t load, try this link to view the movie.)


New On-line Tutorials on Promoter Analysis Tools

Comprehensive tutorials on a set of promoter analysis tools (Melina II, Consensus, MDscan, Gibbs and MEME) enable researchers to quickly and effectively use these invaluable resources.

OpenHelix today announced the availability of new tutorial suites on several promoter analysis resources including Melina II, Consensus, MDscan, Gibbs and MEME. The first tutorial of this set is on Melina II, a freely available web-based tool for promoter analysis. Researchers can use several algorithms with the Melina II analysis including Consensus, MDscan, Gibbs and MEME. Melina II and these four algorithms are used to find DNA motifs that may represent regulatory regions such as promoters or enhancers, or protein motifs and domains important in protein function. The additional tutorials in the OpenHelix promoter analysis set include short introductory tutorials on all four of these useful and important algorithms for promoter analysis.
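As a rough illustration of what a DNA motif is, the Python sketch below summarizes a handful of already-aligned binding sites as per-position base counts and a consensus sequence. This is not how Consensus, MDscan, Gibbs or MEME work internally (those tools search for motifs in unaligned sequences); the function names and the example sites are assumptions made up for this sketch.

```python
from collections import Counter

BASES = "ACGT"


def position_counts(sites):
    """Count the bases seen at each position of equal-length aligned sites."""
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(site[i] for site in sites) for i in range(length)]


def consensus(sites):
    """Pick the most frequent base per position (ties go to the first of ACGT)."""
    return "".join(
        max(BASES, key=lambda base: column[base])
        for column in position_counts(sites)
    )


# Invented example sites, loosely resembling a promoter element.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]

for index, column in enumerate(position_counts(sites), start=1):
    print(index, {base: column[base] for base in BASES})
print("consensus:", consensus(sites))
```

A table of per-position counts like this is the kind of information that graphic motif displays are typically drawn from.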

The tutorial suites, available for single purchase or through a low-priced yearly subscription to all OpenHelix tutorials, contain a narrated, self-run, online tutorial, slides with full script, handouts and exercises. With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. These tutorials will teach users:

* about the algorithms that can be used through Melina II
* how to do basic DNA and protein motif searches
* how to understand and interpret your search results
* how to display graphic representation of motifs

To find out more about these and other tutorial suites visit OpenHelix or the OpenHelix Blog for up-to-date information on genomics.

About OpenHelix
OpenHelix, LLC (http://www.openhelix.com) provides the genomics knowledge you need when you need it. OpenHelix currently provides online self-run tutorials and on-site training for institutions and companies on the most powerful and popular free, web-based, publicly accessible bioinformatics resources. In addition, OpenHelix is contracted by resource providers to provide comprehensive, long-term training and outreach programs.