Tag Archives: galaxy

What’s The Answer? (F1 of Biostars x Galaxy version!)

Offspring of original Biostars site, Galaxy Biostars replaces their support mailing list.

Generally each week we highlight a post from the main Biostars site, which answers some question or offers discussion of bioinformatics tools or analyses across many arenas. But this week I want to give you a look at the offspring of Biostar–Galaxy Biostar!

I’m calling it the F1 of Biostars x Galaxy, rather than the gendered “son of Galaxy”. There’s a post over at the Galaxy Biostar support site that describes a transition away from their traditional mailing-list-based support to this new format. I’ll quote part of it here, but it’s long, so you should go read the whole thing over there.

Forum: Welcome to Galaxy Biostar

Dear Galaxy Community,
Galaxy has teamed up with Biostar to create a Galaxy User support forum at https://biostar.usegalaxy.org!

We want to create a space where researchers using Galaxy can come together and share both scientific advice and practical tool help.  Whether on usegalaxy.org, a Cloudman instance, or any other Galaxy (public or local), if you have something to say about Using Galaxy, this is the place to do it!

[has a lot more detail--go read the whole thing over there]

Jennifer Hillman Jackson

As I noted over there, I’ve been using mailing lists happily for a long time because that’s what we had. But I think this is a great format to move support to, instead of email lists. Go check it out!

 

Video Tip of the Week: list of genes associated with a disease

I am currently in Puerto Varas, Chile at an EMBO genomics workshop. The workshop is mainly for grad students and the instructors are, for the most part, alumni of the Bork group. I gave a tutorial on genomics databases.

Anyway, the last two days of the workshop are a challenge: in teams of 3-4, each advised by an instructor, students are to develop a list of genes associated with epilepsy. Obviously, this could be a trivial task: just go to OMIM or GeneCards and grab a list. But this challenge requires them to go beyond that, using the available data to make predictions. At my suggestion, my team tried some brainstorming techniques to arrive at a more creative solution than they could come up with individually or by just jumping into normal group dynamics. It seemed to work: their solution was quite creative, and we will find out today how it fares.

That was my long way of saying that, in the process, we came across many databases of gene-disease information. Above you will find a video of rat gene-disease associations from RGD, which are of course often used to investigate human gene-disease associations.

Below you will find a list of some excellent databases and resources to find similar lists:

Genetic Association Database  http://geneticassociationdb.nih.gov/

G2D http://g2d2.ogic.ca

OMIM http://www.omim.org

Diseases http://diseases.jensenlab.org/

GeneCards http://genecards.org

DisGeNET http://ibi.imim.es/web/DisGeNET/

Several NCBI resources http://www.ncbi.nlm.nih.gov/guide/howto/find-gen-phen/

UCSC Genome Browser’s tracks for disease and phenotype http://genome.ucsc.edu

I’m sure there are several others. If you have a favorite not on this list, please comment.
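If you do grab lists from several of these resources, one simple way to combine them is to count how many sources support each gene. The snippet below is only a minimal sketch of that idea, not what any of our workshop teams did. The function name `rank_genes_by_support`, the source names and the gene symbols are all made-up placeholders, and it assumes you have already exported a plain list of gene symbols from each resource.

```python
from collections import Counter


def rank_genes_by_support(gene_lists):
    """Rank genes by how many sources list them.

    gene_lists maps a source name (hypothetical, e.g. "source_1") to an
    iterable of gene symbols exported from that source. Symbols are
    trimmed and upper-cased so that "gene_b " and "GENE_B" count as one gene.
    Returns (symbol, number_of_sources) pairs, best supported first,
    with ties broken alphabetically.
    """
    counts = Counter()
    for genes in gene_lists.values():
        # Use a set so a gene repeated within one source counts only once.
        counts.update({g.strip().upper() for g in genes if g.strip()})
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Counting sources is a crude measure of support, since many of these databases draw on the same literature. Treat the ranking as a starting point for a closer look rather than as evidence in itself.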

Reference for RGD:
Laulederkind S.J.F., Hayman G.T., Wang S.J., Smith J.R., Lowry T.F., Nigam R., Petri V., de Pons J., Dwinell M.R. & Shimoyama M. (2013). The Rat Genome Database 2013–data, tools and users, Briefings in Bioinformatics, 14 (4) 520-526. DOI:

Video Tip of the Week: MetaPhlAn and Galaxy

CPB Using Galaxy 2 from Galaxy Project on Vimeo.

See the video above for loading and using datatypes, and the OpenHelix Galaxy tutorial for getting familiar with the Galaxy interface and usage.

Metagenomics analysis can be a bit daunting at times, but there are a good number of tools out there to assist a researcher. Integrated Microbial Genomes at JGI has some excellent tools such as IMG/M and IMG HMP M (OpenHelix tutorial). There are other excellent tools that I suggest you check out; QIIME is one of them.

But the above is not per se a metagenomics tutorial; rather, it’s a short screencast of how to use the Galaxy interface for loading data and datatypes. Why? Because another excellent set of tools to use for metagenomic analysis is MetaPhlAn from the Huttenhower lab at Harvard.

The MetaPhlAn tools can be downloaded and used ‘offline’, but they also have an excellent Galaxy interface. If you walk yourself through the MetaPhlAn tutorials on their site, including their Galaxy module one, after familiarizing yourself with Galaxy above, that should help you get started on some excellent metagenomics analysis.

To get a feel for these and other tools and workflows, you might want to browse through this excellent slide set from Surya Saha, Research Associate at Cornell University, from last year.

Quick Links:

Galaxy

Nicola Segata, Levi Waldron, Annalisa Ballarini, Vagheesh Narasimhan, Olivier Jousson & Curtis Huttenhower (2012). Metagenomic microbial community profiling using unique clade-specific marker genes. Nature Methods, 9, 811-814. doi:10.1038/nmeth.2066

Video Tip of the Week: TrioVis for family genome data sets

I’m always interested in new strategies to visualize data. So when I saw discussion about a tool to help analyze family genomic data, I went to have a look. TrioVis is a new software tool that offers nice visualization and filtering strategies for exploring parent and child trio data sets. These data sets will become increasingly common as families seek out information for uncharacterized medical situations that may be affecting their kids. But they are being widely used already in many research situations.

TrioVis relies on the common VCF or Variant Call Format files that are generated from sequencing data. You can have a look at the types of information they carry at the 1000 Genomes project site. These files are created for each parent and the child in a trio situation, and then they are visualized with TrioVis in this manner:

The user interface consists of five sections: the main table (Fig. 1A), the global variant count bar graphs (Fig. 1B), the variant frequency sliders (Fig. 1C), the coverage sliders (Fig. 1D) and the histogram view (Fig. 1E). Each section focuses on a specific aspect of trio data and offers specific interactive features to calibrate the thresholds. Father, mother and child are colour-coded in green, orange and blue, respectively.
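To give a feel for the kind of filtering those coverage sliders do, here is a minimal sketch in Python. It is not TrioVis code and not the authors’ method. It assumes one single-sample VCF per family member, with read depth stored as `DP` in the INFO column, and the function names `parse_vcf` and `child_only_variants` are hypothetical. It reports variants that are well covered in the child and absent from both parents’ files.

```python
from typing import Dict, List, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def parse_vcf(text: str) -> Dict[Variant, int]:
    """Parse single-sample VCF text into {variant: read depth}.

    Assumption: depth is taken from the DP key of the INFO column (column 8).
    Variants without a DP value get depth 0. Lines starting with '#'
    (meta-information and the column header) are skipped.
    """
    variants: Dict[Variant, int] = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        info = fields[7] if len(fields) > 7 else ""
        depth = 0
        for entry in info.split(";"):
            if entry.startswith("DP="):
                depth = int(entry[3:])
        variants[(chrom, int(pos), ref, alt)] = depth
    return variants


def child_only_variants(
    father: Dict[Variant, int],
    mother: Dict[Variant, int],
    child: Dict[Variant, int],
    min_depth: int = 10,
) -> List[Variant]:
    """Return child variants with depth >= min_depth seen in neither parent."""
    return sorted(
        variant
        for variant, depth in child.items()
        if depth >= min_depth and variant not in father and variant not in mother
    )
```

A variant can be missing from a parent’s file simply because that position was poorly covered in the parent, so a list like this is only a set of candidates. That is exactly why being able to adjust the thresholds interactively, as TrioVis does, is so useful.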

You can read the paper for more details on their goals and strategies. They also point to some 1000 Genomes project sample data you can use to run their tool.

But I also want to commend the TrioVis folks for putting a screencast of their tool right in their abstract. So their video is what I’d like you to view as this week’s Tip of the Week:

TrioVis from Ryo Sakai on Vimeo.

Right now there isn’t a web interface to use, but I noticed in their paper that they plan to integrate this into Galaxy. I think that’s another great idea on their part.

So if you find yourself exploring family trio data sets, consider a look at TrioVis.

Hat tip to Justin Johnson for drawing my attention to this paper and resource.

Quick links:

TrioVis software: https://bitbucket.org/biovizleuven/triovis/wiki/Home

TrioVis video: http://vimeo.com/user6757771/triovis

Reference:

Sakai, R., Sifrim, A., Vande Moere, A., & Aerts, J. (2013). TrioVis: a visualization approach for filtering genomic variants of parent-child trios Bioinformatics DOI: 10.1093/bioinformatics/btt267

Galaxy Intro Webinar follow-up post (July 19)

We’ll be having our July 19th Galaxy webinar today, and we find there are questions to follow up afterwards that are often better handled in discussions on the blog.

If there are questions we didn’t have time to get to–or things we want to expand on with more detail–we can discuss them in this thread.

Or if you have other things you’ve been meaning to ask, let us know.

Whether or not you registered for the webinar, the same material is available in the training movie, slides, and exercises tutorial suite: http://www.openhelix.com/galaxy. You can also sign up to be informed of future webinars on these topics, UCSC, ENCODE and others.

Some questions asked in today’s webinar, with answers:

1) Galaxy seems to be downloadable in addition to the PSU portal and the cloud at Amazon. How would you choose?

Each has its purposes. From the Galaxy Wiki:
Install your own Galaxy if you want to:

a) Develop it further
b) Add new tools
c) Plug in new datasources
d) Run a local production server for your site because you have
   Sensitive data (e.g., clinical) or
   Large datasets or processing requirements that are too big to be processed on Main

Use the Cloud:

“With sporadic availability of data, individuals and labs may have a need to, over a period of time, process greatly variable amounts of data. Such variability in data volume imposes variable requirements on availability of compute resources used to process given data. Rather than having to purchase and maintain desired compute resources or having to wait a long time for data processing jobs to complete, the Galaxy Team has enabled Galaxy to be instantiated on cloud computing infrastructures”

2) Can I use Galaxy to analyze protein data?

Yes, there are a few tools for analysis on the main instance, but also you can add your own tools to a local instance.

3) What kind of local server? Can you describe the PSU instance as an example? Server size, storage, filesystem, etc.?

Check out this link for needs.

4) Can we use galaxy to align the whole genome sequences of rice to get SNPs?

This link might help.

5) Is there a link to the toolshed from the galaxy interface?

Not that I know of, but here it is: http://toolshed.g2.bx.psu.edu/

6) How secure is the data we run on galaxy.psu?

 From the site (emphasis added in answer):

This is a free, public, internet accessible resource. Data transfer and data storage are not encrypted. If there are restrictions on the way your research data can be stored and used, please consult your local institutional review board or the project PI before uploading it to any public site, including this Galaxy server. If you have protected data, large data storage requirements, or short deadlines you are encouraged to setup your own local Galaxy instance or run Galaxy on the cloud.

 

Tip of the Week: Galaxy Tool Shed

This week I attended and gave a talk at ISMB in Long Beach. While there I had the opportunity to attend a session on Galaxy where Jeremy Goecks spoke on Galaxy Visualizations and Greg Von Kuster spoke about the “first biomedical AppStore,” the Galaxy Toolshed. As always, I learned a few new things.

Today’s tip is a quick introduction to the Galaxy Tool Shed. The Tool Shed is a place to share tools you’ve developed or to find tools that other developers have built for your local instance of Galaxy. I won’t be going into the mechanics and specifics of the Tool Shed. It’s not aimed at the experimental biologist end user, but rather at developers of tools for use in Galaxy. That said, it can be useful for end users to know what tools might be available and how to get them into their local installation. If you or your institution is installing a local instance of Galaxy, you might want to check out the extensive documentation on how to use the Tool Shed.

There are a lot of tools available in the Tool Shed, over 1800 at last count. They range through many different categories. Though it’s only been a couple of years since the Tool Shed was implemented, some published tools have been added to it, such as CodonLogo, a logo-based viewer for codon patterns in aligned sequences.

If you want to learn more about Galaxy:

We have a webinar tomorrow (July 19, 2012 at 11am PDT) introducing Galaxy (free).

We have an online tutorial (fee)

And we’ve done tips (free of course) on Galaxy visualization, getting flanking sequences and converting genome coordinates using Galaxy, and Galaxy pages. And we’ve tipped and blogged a lot of Galaxy-related stuff.

Quick Links:
Galaxy Main Instance
Galaxy Tool Box
Galaxy Tool Box How-to
Setting up a local instance

 

Sharma V, Murphy DP, Provan G, & Baranov PV (2012). CodonLogo: a sequence logo-based viewer for codon patterns. Bioinformatics (Oxford, England), 28 (14), 1935-6 PMID: 22595210

Video Tip of the Week: Visualizing the Galaxy


An antennae galaxy

Well, not that kind of galaxy (though visualizing those is quite nice), this kind of Galaxy. Galaxy is an excellent tool to analyze, reproduce and share genomics data, and the Galaxy folks are always updating, improving and adding features to it. We have a tutorial for Galaxy to help you get started. As you might have guessed from the previous sentence, Galaxy is a moving target. The basics (and that’s what the tutorial is for) are the same, but the tutorial is being updated to reflect some of those changes. That update should be out sooner rather than later, but even so, we just can’t fit everything into the tutorial. The relatively new visualization tool is something that will not be in it. As there are no tutorials on visualization at the Galaxy site that I can find (if you know of any, link them here!), I’ve included a quick intro to visualizations using Galaxy in this tip of the week.

There are other ways to visualize the data analyzed at Galaxy. Galaxy datasets can often be viewed directly at UCSC Genome Browser, Ensembl, RViewer or in GeneTrack within Galaxy. Those are all excellent tools and powerful ways to view and explore your analysis in depth. In addition, the Galaxy visualization tool is a way to quickly visualize your data to help discovery, direct further analysis and share what you’ve found. It is obviously not a full-fledged browser, but it is very useful for doing a simple visualization of your data from within Galaxy. Today’s tip gives a quick introduction to Galaxy visualization.

Quick Links:
Galaxy (OH tutorial-subscr.)
UCSC Genome Browser (OH tutorials-free)
Ensembl (OH tutorials-subscr.)
RViewer
GeneTrack

P.S. You might hear some bird song in the background. I am in, and working from, Hawaii for the next month (yeah, it’s tough work but someone has got to do it). No way to get those birds (or the frogs at night) to be silent for a bit.

UPDATE: Galaxy servers are ̶d̶o̶w̶n̶ semi-up (they know). Other mirrors or sites

UPDATE: Galaxy is up–but…

Be nice–don’t run giant projects right now…and it might not be entirely stable anyway. If you can wait, it might be wise.

++++++++++++++++++++++++

I saw a notice earlier, but figured it would be short term. However, just now I saw this:

You can follow the Galaxy twitter feed for updates: @GalaxyProject

If you need one, here are links to some mirrors or other servers you can use, collected at BioStars: list of public Galaxy servers

I suspect this also means that the GenomeSpace one from today’s tip would also be down, as that’s a test server there.

This is just a PSA–I remember one time UCSC Genome Browser went down (they had a cable cut by construction work–not an earthquake that time), and the traffic to our mirrors post was astounding. So I thought people might be looking for this kind of info as well, and it’s hard to get the word out if your site is out of service…

 

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

And one last special item:

PhD The Movie is now available for streaming–check out the details here:

http://www.phdmovie.com/

Video Tips of the Week: Annual Review IV, 2nd half

As you may know, we’ve been doing these video tips-of-the-week for FOUR years now. We have completed around 200 little tidbit introductions to various resources through the end of last year, 2011 (yep, it’s 2012 now). We’ve established a sort of holiday tradition: at the end of each year we do a summary post to collect them all. If you have missed any of them, it’s a great way to have a quick look at what might be useful to your work.

You can see past years’ tips here: 2008 I, 2008 II, 2009 I, 2009 II, 2010 I, 2010 II. The summary of the first half of 2011 is available from last week.

July 2011

July 6: Prioritizing genes using the Gene Prioritization Portal

July 13: PolySearch, searching many databases at once

July 20: Human Epigenomics Visualization Hub

July 27: The new SIB Bioinformatics Resource Portal

 

August 2011

August 3: SNPexp, correlation between SNPs and gene expression 

August 10: CompaGB for comparing genome browser software

August 17: CoGe, comparing genomes revisited

August 24: Domain Draw for quick motif diagrams

August 31: From UniProt to the PSI SBKB and back again

 

September 2011

September 7: Plant comparative genomics using Plaza

September 14: phiGENOME for bacteriophage genome exploration

September 21: Getting flanking sequences of genomic locations

September 28: Introduction to R statistical software 

 

October 2011

October 5: VnD resource for genetic variation and drug information

October 12: Track Hubs in UCSC Genome Browser

October 19: Mitochondrial Transcriptome GBrowser 

October 26: Variation data from Ensembl

 

November 2011

November 2: MizBee Synteny Browser

November 9: The new database of genomic variants: DGV2

November 16: MapMi, automated mapping of microRNA loci

November 23: BioMart’s new central portal

November 30: Phosida, a post-translational modification database

December 2011

December 7: VarSifter, for identifying key sequence variations

December 14: Big changes to NCBI’s genome resources

December 21: eggNOG for the Holidays (or to explore orthologous genes)

December 28: Video Tips of the Week: Annual Review IV (first half of 2011)