Tip of the Week: GO EAST for GO terms

Well, actually, GOEAST young scientist. This week’s tip of the week builds on the data from my previous tip. I had generated a list of genes and I wanted to use that list at a variety of sites to analyze the features of my list. So this week I have tried that at GOEAST.

I used the first 1999 items in my UniProt disease list and uploaded them to GOEAST. The movie shows you the process and a quick look at the outcome.

This view is just a quick example of a basic list upload using their Batch tool. According to the authors, the Batch tool algorithm is somewhat different from the pre-loaded microarray gene analysis because of the way the background is calculated. It outputs the GO terms and groups the genes that fit each GO term. There are a number of other features of GOEAST that look intriguing and helpful. It looks like they handle various microarray platforms easily. They have a variety of outputs (web, text file, graphical). I couldn’t always get the graphical output to work (probably because my list is too large). I also would have liked to view the list the other way: to start from my genes and see the terms associated with each of them. So far I haven’t figured out whether there is a way to do that.
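If the text output can be read into a simple mapping of GO term to genes, flipping it around to get genes with their terms takes only a few lines. Here is a minimal sketch in Python, assuming the results are already loaded into a dictionary. The function name `invert_term_map`, the gene names, and the GO IDs are placeholders for illustration, not actual GOEAST output or part of the tool.

```python
from collections import defaultdict


def invert_term_map(term_to_genes):
    """Turn {GO term: [genes]} into {gene: [sorted GO terms]}."""
    gene_to_terms = defaultdict(set)
    for term, genes in term_to_genes.items():
        for gene in genes:
            gene_to_terms[gene].add(term)
    # Sort the terms so the output is stable from run to run.
    return {gene: sorted(terms) for gene, terms in gene_to_terms.items()}


# Made-up example data (placeholder IDs, NOT real GOEAST results).
example = {
    "GO:0000001": ["GENE_A", "GENE_B"],
    "GO:0000002": ["GENE_B", "GENE_C"],
}

by_gene = invert_term_map(example)
for gene in sorted(by_gene):
    print(gene, by_gene[gene])
```

Parsing the real text file into that starting dictionary would depend on the columns in the download, which I have not tried to reproduce here.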

I’m not going to draw many conclusions from this yet–I want to try a variety of tools and think about the features and the quality of the results. But this tool seemed to effectively group my genes into buckets with GO terms that could be helpful for an analysis.
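One thing to keep in mind when comparing tools is how much the choice of background can change the answer. Here is a minimal sketch of a generic hypergeometric enrichment test, assuming that is roughly the kind of statistic involved. It is not GOEAST's actual code, the function name `enrichment_p` is hypothetical, and all of the counts are invented for illustration.

```python
from math import comb


def enrichment_p(background_size, background_hits, list_size, list_hits):
    """Probability of seeing at least `list_hits` annotated genes when
    `list_size` genes are drawn without replacement from a background of
    `background_size` genes, `background_hits` of which carry the term."""
    max_hits = min(background_hits, list_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Invented numbers: a 100-gene list with 10 genes annotated to one GO term.
# Background 1: a whole array of 20000 genes, 400 of them annotated.
p_array = enrichment_p(20000, 400, 100, 10)
# Background 2: only the 5000 genes expressed in the tissue, same 400 annotated.
p_tissue = enrichment_p(5000, 400, 100, 10)

print("whole-array background: ", p_array)
print("tissue-only background: ", p_tissue)
```

With these made-up counts the same list looks strongly enriched against the whole array and unremarkable against the tissue-only background, because about 2 annotated genes are expected by chance in the first case and about 8 in the second.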

Check out their paper for more details:

Zheng Q, Wang XJ. GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis. Nucleic Acids Res. 2008 May 16. PMID: 18487275

2 thoughts on “Tip of the Week: GO EAST for GO terms”

  1. Carl

    The only issue I see is the inability to specify the background. The backgrounds offered are either the entire microarray set or species set. I’ve found this to be problematic for some cases. Say you’ve run an experiment of FACS-sorted cells from a given tissue which yields four cell types. You’ve run microarrays for each of the cell types. You want to see which GOs are enriched in each of the given cell types. If you use the entire species or microarray as background, all cell types will show GOs enriched for the entire tissue, rather than just the cell type of interest (as each of the cell types is derived from that tissue).

    The only way I’ve been able to get around this in the past is specifying the background as all “expressed genes” in the tissue type rather than the entire set of microarray genes. I imagine one would run into a similar problem if they wanted to compare GO enrichment across clusters derived from the same set of microarrays.

    Am I correct here or am I missing something?

  2. Mary

    Yeah, I see what you mean, Carl. That’s why I flagged the difference between the background methods in my comment, because that could be an issue in some cases. In my sample case it isn’t.

    Would it help if you could create your own background file? Or if you could run your experimental samples and then subtract your controls somehow?

    Software developers are often looking for input from users–we can suggest things to them :)
