Tag Archives: galaxy


Friday SNPpets

This week’s SNPpets offer both science and humor. I think people get a little punchy around the holidays/end of semester. There’s software for assembly, a bioinformatics network for African topics, bioethics of gene editing, cancer and personalized medicine, Dilbert comments on big data and health, interesting tools for open- and evidence-based medicine, microbiome concepts and a metagenomic journey. Most curious thing this week: the mouse poem constructed entirely from paper titles.

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…


Friday SNPpets

This week’s SNPpets include a dinoflagellate genome; artificial base pairs in use; the relative joy of entrepreneurship graphed; hidden Galaxy features; free image resources from NIH; new and updated tools for various purposes–as always; a trait-matching challenge; and veterans who continue to do service by offering their DNA to researchers. Oddest thing: an ear built with van Gogh’s DNA. And more….



Friday SNPpets

This week’s SNPpets include a FlyBook from FlyBase; deep thoughts on the cloud; “Sequence to Medical Phenotypes” software; DNA.land relaunching; reference genomes info; updated feature in ExAC; a legal look at “secret DNA computer source code” and secret DNA to thwart art thieves; Rabbit Turds at Galaxy (really); bad documentation behavior. Also good communication behavior for genetic counselors. And more….



Friday SNPpets

Everybody was full of vim and vigor this week, and a lot of great things were going around. Oddest thing was “fitbit cages for laboratory mice”. But I can see the value. Most heated post will probably be “Myths of Bioinformatics Software.” Other things included recommended data repositories from @PLOS journals, sequencing kids’ genomes, non-coding variants affecting drug activity, a new LD tool, DNA contamination issues, cancer data challenge, microbial databases (including IMG that we love), Galaxy talks, and more.



Friday SNPpets

This week’s SNPpets include mining disease genes with PPI and co-regulation networks, DNA and the law, great posts on germline genetic engineering moratorium discussions, a bioinformatics “middle class”, new human genome assembly models, and more….


Video Tip of the Week: Genome assemblers and #Docker

Last fall I did a tip on Docker, which was starting to pick up a lot of chatter among the genoscenti. It was starting to look like a good solution for some of the problems of reproducibility and re-use of software in genomics–containerize it. Box it up, hand it off. There’s certainly a lot of interest and appeal in the community, but there are still some issues to resolve with rolling out Docker everywhere. However, my impression is that the Docker team and community seem interested and active in evolving the tools to be as broadly useful as possible.

So when this tweet rolled through the #bioinformatics twitter column on my Tweetdeck, I was excited to see this talk by Michael Barton (who has the best twitter handle in the field: @bioinformatics). It’s a terrific example of how Docker can be aimed at some of the problems in the bioinformatics tool space. It’s not the only option, of course. Some workflow resources like Galaxy can cover other features of genomics researchers’ needs. But as a general solution to the problems of comparing software and distributing complete working containers, Docker seems to be developing into a very useful strategy.

Here’s the video:

Although this is longer than our typical “tips”, I’d recommend that you carve out some time to watch if you are new to the idea of Docker. In case you don’t have time right now for the talk, here’s a summary. For the first 10 minutes, there’s a gentle introduction for non-genomics nerds about what sequencing is like right now. Then Michael describes how the assembler literature works–with competing claims about the “better” assembler as each new paper comes along. This includes a sample of the types of problems that assemblers are trying to tackle with different strategies.

Around 14min, we begin to look at what it’s like to be the researcher who needs to access some assembler software. Then he describes how different lab groups–like remote islands–can instantly ship their sequence data around today, but biologists are like “longshoremen for data”: they have to unload, unpack, install, and try to get all the right pieces together to make it work in a new lab. We are doing “break bulk” science right now. That was a really terrific assessment of the state of play, I thought.

If you are ok with the other pieces, you can skip to around 16min, where we get to know about a specific example of the benefits of Docker for this type of research. Michael goes on to describe how Docker has helped him to build a system to catalog and evaluate various assemblers. He developed the project called nucleotid.es (pronounced just as “nucleotides”), which he goes on to describe. It offers details about various assemblers, which have been put into containers that are easy to access and to use to compare different software. There are examples of benchmarks, but you can also use these containers for your own assembly purposes. You can explore the site for more detail and a lot of data on the assembler comparisons that they have already. A good overview of the reasons to do this can also be found in the blog post over there: Why use containers for scientific software?
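To make the “box it up, hand it off” idea a bit more concrete, here is a minimal sketch of how you might line up the same reads against several containerized assemblers. It only builds the `docker run` command lines; it does not execute them. The image names, the `/data` mount point, and the idea that each image takes the reads path as its only argument are all assumptions for illustration, not the actual nucleotid.es interface, so check each container’s own documentation for the real invocation.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical image names, used only for illustration.
ASSEMBLER_IMAGES = ["example/assembler-a", "example/assembler-b"]


def docker_run_command(image, host_data_dir, reads_file, container_dir="/data"):
    """Build (but do not run) a `docker run` command for one assembler image.

    Assumes, hypothetically, that the image accepts the path to the reads
    file inside the container as its single argument.
    """
    reads_in_container = str(PurePosixPath(container_dir) / reads_file)
    return [
        "docker", "run",
        "--rm",                                            # remove the container when it exits
        "--volume", f"{host_data_dir}:{container_dir}",    # share the host data directory
        image,
        reads_in_container,
    ]


def comparison_commands(images, host_data_dir, reads_file):
    """One command per image, all pointed at the same input reads."""
    return [docker_run_command(img, host_data_dir, reads_file) for img in images]


if __name__ == "__main__":
    for cmd in comparison_commands(ASSEMBLER_IMAGES, "/home/me/run1", "reads.fq.gz"):
        print(shlex.join(cmd))
```

The point of the sketch is that the only thing that changes between assemblers is the image name, which is what makes side-by-side comparison less painful than installing each tool by hand.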

At about 25min, some of the constraints and problems are noted. Fitting Docker into existing infrastructure, and incentivising developers to create Docker containers, can be issues. But the outcomes–having a better strategy than traditional publication for reproducibility, having ongoing access to the software, and the “deduplication of agony”–seem to be worth investigating, for sure. Then Barton describes what the pipeline could look like for a researcher with some new sequence–you can use the data from a variety of assemblers to make decisions about how to proceed, rather than sifting through papers or just using what the lab next door did. And if you have a new assembler, you can use this setup to benchmark it as well.

So if you’ve been hearing about Docker, and have been concerned about access and reproducibility issues around genomics data and software, have a look at this video. It nicely presents the problems we face, and one possible solution, with a concrete example. There may be other useful methods as well–like offering a central portal for users to access multiple tools, as AutoAssemblyD has described–but that’s really for a different subset of users. For the more general problem of software comparisons, benchmarking, and access to bioinformatics tools, Docker seems to offer a useful strategy. I also did a quick PubMed check to see if Docker is percolating through the traditional publication system yet, and found that it is. I found that ballaxy (“a Galaxy-based workflow toolkit for structural bioinformatics”) is offered as a Docker image, which means that having a grasp of Docker going forward may really be useful for software users rather quickly….

Quick links:

nucleotid.es: http://nucleotid.es

Docker: http://www.docker.com

References (and in this case the slide deck):

And other useful and related items from this post:

Automating the Selection Process for a Genome Assembler, JGI Science Highlights. October 17, 2014. http://jgi.doe.gov/automating-selection-process-genome-assembler/

Veras A., Pablo de Sá, Vasco Azevedo, Artur Silva & Rommel Ramos (2013). AutoAssemblyD: a graphical user interface system for several genome assemblers, Bioinformation, 9 (16) 840-841. DOI: http://dx.doi.org/10.6026/97320630009840

Hildebrandt A.K., D. Stockel, N. M. Fischer, L. de la Garza, J. Kruger, S. Nickels, M. Rottig, C. Scharfe, M. Schumann, P. Thiel & H.-P. Lenhof (2014). ballaxy: web services for structural bioinformatics, Bioinformatics, 31 (1) 121-122. DOI: http://dx.doi.org/10.1093/bioinformatics/btu574

What’s The Answer? (F1 of Biostars x Galaxy version!)

Offspring of original Biostars site, Galaxy Biostars replaces their support mailing list.


Generally each week we highlight a post from the main Biostars site, which answers some question or offers discussion of bioinformatics tools or analyses across many arenas. But this week I want to give you a look at the offspring of Biostar–Galaxy Biostar!

I’m calling it the F1 of Biostars x Galaxy, rather than the gendered “son of Galaxy”. There’s a post over at the Galaxy Biostar support site that describes a transition away from their traditional mailing-list based support to this new format. I’ll link part of that here, but it’s long so you should go read the whole thing over there.

Forum: Welcome to Galaxy Biostar

Dear Galaxy Community,
Galaxy has teamed up with Biostar to create a Galaxy User support forum at https://biostar.usegalaxy.org!

We want to create a space where researchers using Galaxy can come together and share both scientific advice and practical tool help.  Whether on usegalaxy.org, a Cloudman instance, or any other Galaxy (public or local), if you have something to say about Using Galaxy, this is the place to do it!

[has a lot more detail–go read the whole thing over there]

Jennifer Hillman Jackson

As I noted over there, I’ve been using mailing lists happily for a long time because that’s what we had. But I think this is a great way to transition to support now instead of email lists. Go check it out!


Video Tip of the Week: list of genes associated with a disease

I am currently in Puerto Varas, Chile at an EMBO genomics workshop. The workshop is mainly for grad students and the instructors are, for the most part, alumni of the Bork group. I gave a tutorial on genomics databases.

Anyway, the last two days of the workshop are a challenge: in teams of 3-4, each advised by an instructor, students are to develop a list of genes associated with epilepsy. Obviously, this could be a trivial task: just go to OMIM or GeneCards and grab a list. But this challenge requires them to go beyond that, using the available data to make predictions. My team attempted, on my suggestion, some brainstorming techniques to ensure a more creative solution than they could come up with individually or by just jumping into normal group dynamics. It seemed to work: their solution was quite creative, and we will find out today how that worked.

That was my long way of saying that, in the process, we came across many databases of gene-disease information. Above you will find a video of rat gene-disease associations from RGD, often used of course to investigate human gene-disease associations.

Below you will find a list of some excellent databases and resources to find similar lists:

Genetic Association Database http://geneticassociationdb.nih.gov/

G2D http://g2d2.ogic.ca

OMIM http://www.omim.org

Diseases http://diseases.jensenlab.org/

GeneCards http://genecards.org

DisGeNET http://ibi.imim.es/web/DisGeNET/

Several NCBI resources http://www.ncbi.nlm.nih.gov/guide/howto/find-gen-phen/

UCSC Genome Browser’s tracks for disease and phenotype http://genome.ucsc.edu

There are several others, I’m sure. If you have a favorite not on this list, please comment.

Reference for RGD:
Laulederkind S.J.F., Hayman G.T., Wang S.J., Smith J.R., Lowry T.F., Nigam R., Petri V., de Pons J., Dwinell M.R. & Shimoyama M. (2013). The Rat Genome Database 2013–data, tools and users, Briefings in Bioinformatics, 14 (4) 520-526. DOI:

Video Tip of the Week: MetaPhlAn and Galaxy

CPB Using Galaxy 2 from Galaxy Project on Vimeo.

for loading and using datatypes and the OpenHelix Galaxy tutorial for getting familiar with the Galaxy interface and usage.

Metagenomics analysis can be a bit daunting at times, but there are a good number of tools out there to assist a researcher in analysis. Integrated Microbial Genomes at JGI has some excellent tools such as IMG/M and IMG HMP M (OpenHelix tutorial). There are other excellent tools that I suggest you check out; QIIME is one of them.

But the above is not per se a metagenomics tutorial; rather, it’s a short screencast of how to use the Galaxy interface for loading data and datatypes. Why? Because another excellent set of tools to use for metagenomic analysis is MetaPhlAn from the Huttenhower lab at Harvard.

The MetaPhlAn tools can be downloaded and used ‘offline’, but they also have an excellent Galaxy interface to the tools. If you walk yourself through the MetaPhlAn tutorials on their site, including their Galaxy module one, after familiarizing yourself with Galaxy above, that should help you get started on some excellent metagenomics analysis.

To get a feel of these and other tools and workflows, you might want to browse through this excellent slide set from Surya Saha, Research Associate at Cornell University, from last year.

Quick Links:


Nicola Segata, Levi Waldron, Annalisa Ballarini, Vagheesh Narasimhan, Olivier Jousson & Curtis Huttenhower (2012). Metagenomic microbial community profiling using unique clade-specific marker genes, Nature Methods, 9, 811-814. DOI: 10.1038/nmeth.2066

Video Tip of the Week: TrioVis for family genome data sets

I’m always interested in new strategies to visualize data. So when I saw discussion about a tool to help analyze family genomic data, I went to have a look. TrioVis is a new software tool that offers nice visualization and filtering strategies for exploring parent and child trio data sets. These data sets will become increasingly common as families seek out information for uncharacterized medical situations that may be affecting their kids. But they are being widely used already in many research situations.

TrioVis relies on the common VCF or Variant Call Format files that are generated from sequencing data. You can have a look at the types of information they carry at the 1000 Genomes project site. These files are created for each parent and the child in a trio situation, and then they are visualized with TrioVis in this manner:

The user interface consists of five sections: the main table (Fig. 1A), the global variant count bar graphs (Fig. 1B), the variant frequency sliders (Fig. 1C), the coverage sliders (Fig. 1D) and the histogram view (Fig. 1E). Each section focuses on a specific aspect of trio data and offers specific interactive features to calibrate the thresholds. Father, mother and child are colour-coded in green, orange and blue, respectively.

You can read the paper for more details on their goals and strategies. They also point to some 1000 Genomes project sample data you can use to run their tool.
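If you haven’t looked inside a VCF file before, here is a minimal sketch of the kind of threshold filtering that the coverage sliders let you do interactively. To be clear, this is not TrioVis’s code or method. It is an illustration that assumes one simplified VCF per family member, reads the standard `DP` (read depth) key from the INFO column, and keeps child variants that pass a depth cutoff and are not listed in either parent’s file. The sample records, the `record` helper, and the `min_depth` default are made up for the example.

```python
def record(chrom, pos, ref, alt, depth):
    """Build one tab-separated VCF data line (hypothetical sample data)."""
    return "\t".join([chrom, str(pos), ".", ref, alt, ".", "PASS", f"DP={depth}"])


def parse_vcf(text):
    """Map (chrom, pos, ref, alt) -> read depth taken from the INFO DP key."""
    variants = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
        variants[(chrom, int(pos), ref, alt)] = int(info.get("DP", 0))
    return variants


def child_only_variants(child, father, mother, min_depth=10):
    """Child variants with enough coverage that neither parent's file lists."""
    return sorted(
        key
        for key, depth in child.items()
        if depth >= min_depth and key not in father and key not in mother
    )


CHILD_VCF = "\n".join([
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    record("1", 100, "A", "G", 30),  # also present in the father
    record("1", 200, "C", "T", 4),   # below the depth cutoff
    record("2", 300, "G", "A", 25),  # child only, well covered
])
FATHER_VCF = record("1", 100, "A", "G", 28)
MOTHER_VCF = record("1", 500, "T", "C", 40)

if __name__ == "__main__":
    hits = child_only_variants(
        parse_vcf(CHILD_VCF), parse_vcf(FATHER_VCF), parse_vcf(MOTHER_VCF)
    )
    print(hits)
```

One caveat with this simple approach: a variant missing from a parent’s file may just reflect poor coverage at that position in the parent, which is part of why being able to adjust coverage thresholds per family member is useful.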

But I also want to commend the TrioVis folks for putting a screencast of their tool right in their abstract. So their video is what I’d like you to view as this week’s Tip of the Week:

TrioVis from Ryo Sakai on Vimeo.

Right now there isn’t a web interface to use, but I noticed in their paper that they plan to integrate this into Galaxy. I think that’s another great idea on their part.

So if you find yourself exploring family trio data sets, consider a look at TrioVis.

Hat tip to Justin Johnson for drawing my attention to this paper and resource.

Quick links:

TrioVis software: https://bitbucket.org/biovizleuven/triovis/wiki/Home

TrioVis video: http://vimeo.com/user6757771/triovis


Sakai, R., Sifrim, A., Vande Moere, A., & Aerts, J. (2013). TrioVis: a visualization approach for filtering genomic variants of parent-child trios, Bioinformatics. DOI: 10.1093/bioinformatics/btt267