Friday SNPpets (with special holiday oomycete song)

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Um. Yeah. You gotta love a research organism holiday song.

I’ll make it up to you with 5 Golden Mice.

What’s the Answer? (FANTOM5 promoter atlas)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s highlighted question at Biostar is something that rang a bell. I’ve been meaning to take a look at this resource, but it got buried on my desk under a pile of other stuff and I forgot to get back to it.

Question: FANTOM5 Promoter Atlas

Hi All,

I went through this paper on promoter atlas in FANTOM5 http://www.nature.com/nature/journal/v507/n7493/full/nature13182.html#the-fantom5-promoter-atlas , my question is that do they have cell wise CAGE dataset or is it global , as I cannot see the cell wise CAGE expression on their website.

Thanks in advance.

Aishwarya.Kulkarni

The thing that reminded me about that paper was in the answer: Taylor Raborn noted that this can be found in the ZENBU resources. As my drafts folder tells me, I started tip-of-the-week posts about ZENBU in both March and May, but other stuff came up. I really have to get back to that in the new year. If people could stop developing new resources for a while, I could catch up…? Right, that will happen.

Until I have a chance to get back to it (we have our annual special summary posts over the next two weeks and other stuff already in the hopper for early next year), you’ll have to settle for the ZENBU wiki details on their genome browser.

References:
Forrest A.R.R., Hideya Kawaji, Michael Rehli, J. Kenneth Baillie, Michiel J. L. de Hoon, Vanja Haberle, Timo Lassmann, Ivan V. Kulakovskiy, Marina Lizio, Masayoshi Itoh & Robin Andersson & al. (2014). A promoter-level mammalian expression atlas, Nature, 507 (7493) 462-470. DOI: http://dx.doi.org/10.1038/nature13182

Severin J., Marina Lizio, Jayson Harshbarger, Hideya Kawaji, Carsten O Daub, Yoshihide Hayashizaki, Nicolas Bertin & Alistair R R Forrest (2014). Interactive visualization and analysis of large-scale sequencing datasets using ZENBU, Nature Biotechnology, 32 (3) 217-219. DOI: http://dx.doi.org/10.1038/nbt.2840

Video Tip of the Week: yEd Graph Editor for visualizing pathways and networks

This week’s video tip of the week closes out a series that began last month. I started to explore one gene co-expression tool, which led me to another tool for visualization, and so on. This week’s tool is the final piece that you need to know about if you want to create the kind of interaction/network diagrams used in the modeling of a system that I covered last week.

[Image: yFiles layout options in Cytoscape]

The yEd Graph Editor is different from some of the other tools. As a corporate product, it doesn’t have the kind of scientific paper trail that some academic tools will. But if you search Google Scholar for “yEd Graph Editor” you’ll see that people from a wide range of disciplines have used it for their research projects. I first learned about yEd when I was using Cytoscape, and saw that some of the layout choices were based on yEd features. This short overview video from the yWorks folks explains what some of those layout styles are.

As you can see in this video, yEd isn’t only for biological interactions: it can do a whole lot of graphing that is entirely unrelated to biology. But the features work for biological networks, and you can customize the graphics to represent your own topic of interest.

There are longer videos with more detail on the use cases for yEd. This one uses a sample flow chart to illustrate the basic editing features. It quickly covers many helpful aspects of establishing and editing a visualization.

You can also find videos on YouTube from folks who use yEd for their projects, some of which might be more specific to a given field of research. But these should give you the basics of why yEd can be used for the types of projects that you saw in the previous tips with Virtually Immune and BioLayout Express3D. And as I noted with Virtually Immune, you can get your hands on the files in the Pathway Models collection and open a yEd file to explore the features with a detailed example. The complexity you can generate with these models is astonishing.

There was no reference specifically for yEd that I was able to locate, but Google Scholar shows that lots of people use yEd Graph Editor across a wide range of research topics. So if you are looking to see whether someone in your research area has used yEd, you may find some examples. If you are going to explore the BioLayout and Virtually Immune tools, it will help to understand the framework. And as I mentioned with Cytoscape, understanding yEd helped me to grasp the layout options there too. So try out yEd for pathway and network visualization if you need those types of representations in your research and presentations. It’s free to download and use.
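If you want to get your own network into yEd, one route is GraphML, the XML graph format yEd can open. Here is a minimal sketch in Python, assuming you start from a plain list of interactions; the gene names are hypothetical examples. Plain GraphML like this carries none of yEd’s own styling, so you may need to map the “name” field to node labels inside yEd (for example with its properties mapper).

```python
# Minimal sketch: build a plain (unstyled) GraphML document from an
# interaction list so it can be opened in a graph editor such as yEd.
# The interactions below are hypothetical examples.
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"


def qualified(tag):
    """Return the namespace-qualified tag name ElementTree expects."""
    return f"{{{NS}}}{tag}"


def edges_to_graphml(edges):
    """Return a GraphML document (as a string) for directed (source, target) edges."""
    ET.register_namespace("", NS)  # serialize with a default xmlns
    root = ET.Element(qualified("graphml"))
    # Declare a string attribute so every node carries a readable name.
    ET.SubElement(root, qualified("key"), {"id": "d0", "for": "node",
                                           "attr.name": "name", "attr.type": "string"})
    graph = ET.SubElement(root, qualified("graph"), {"id": "G", "edgedefault": "directed"})
    # Unique node names, kept in first-seen order.
    for name in dict.fromkeys(n for edge in edges for n in edge):
        node = ET.SubElement(graph, qualified("node"), {"id": name})
        ET.SubElement(node, qualified("data"), {"key": "d0"}).text = name
    for i, (source, target) in enumerate(edges, start=1):
        ET.SubElement(graph, qualified("edge"),
                      {"id": f"e{i}", "source": source, "target": target})
    return ET.tostring(root, encoding="unicode")


interactions = [("TLR4", "MYD88"), ("MYD88", "NFKB1")]
xml_text = edges_to_graphml(interactions)
print(xml_text)
```

Save the printed text to a file ending in .graphml and open it in yEd; from there the layout menus do the real work.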

Quick links:

yEd Graph Editor: http://www.yworks.com/yed

yEd Graph Editor Manual: http://yed.yworks.com/support/manual/index.html

References:

Wright D.W., Tim Angus, Anton J. Enright & Tom C. Freeman (2014). Visualisation of BioPAX Networks using BioLayout Express3D, F1000Research, DOI: http://dx.doi.org/10.12688/f1000research.5499.1

Smoot M.E., K. Ono, J. Ruscheinski, P.-L. Wang & T. Ideker (2010). Cytoscape 2.8: new features for data integration and network visualization, Bioinformatics, 27 (3) 431-432. DOI: http://dx.doi.org/10.1093/bioinformatics/btq675

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

What’s the Answer? (tidy data format)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s highlighted post at Biostar is about “tidy data”. Ah, quite the concept. The day when data becomes tidy will be one to celebrate. Anyway, I think it’s a worthwhile discussion to have, and I’m looking forward to the comments as this develops. If you have thoughts, please bring them over there too.

Usually I highlight most of the question here, but this time some pieces are too large (examples of format issues), so I’ll just give you the bullet points and send you over to Biostar to read the whole thing.

Forum: Principles of Tidy Data (Hadley Wickham) and the VCF format

Hadley Wickham, the author of ggplot and many other popular R packages, has recently published a very good paper regarding the principles of tidy data. This article introduces a new library called tidyr, and also proposes a standard for formatting and organizing data before data analysis.

I personally think that the principles proposed in the article are very good, and that they help a lot in data analysis. Some of these are already adopted by many ggplot2/plyr users, as you need a data frame in a long format in order to produce most of the plots.

My question is whether it would make sense to apply these principles to bioinformatics. In particular, if we look at the VCF format, it fails at least two of the rules mentioned in the paper:

- “3.1. Column headers are values, not variable names”  (because individuals are encoded on distinct columns)

- “3.2. Multiple variables stored in one column” (because each genotype column contains the status of one or more alleles, plus its coverage etc.)

For example, let’s take the example from the 4.0 specs of VCF:

[examples here]

[More discussion of the issues within samples, so go read over there]

What do you think? Will we all convert to tidy VCF in the far future?

–Giovanni M Dall’Olio

So, tidy VCF. What do you think? Some people are already musing about it. Discuss over there.
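To make the idea concrete, here’s a minimal sketch in Python of what “tidy VCF” could look like: each sample column becomes a row (rule 3.1), and the packed FORMAT string, here GT:DP, is split into one column per field (rule 3.2). The two records and the sample names are made up for illustration, and a real VCF parser would need to handle much more, such as missing values, multiple ALT alleles and INFO fields.

```python
# Minimal "tidy VCF" sketch: one row per (variant, sample), one column per variable.
# The records and sample names below are hypothetical, for illustration only.

TOY_VCF = (
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2\n"
    "20\t14370\t.\tG\tA\t29\tPASS\t.\tGT:DP\t0|0:1\t1|0:8\n"
    "20\t17330\t.\tT\tA\t3\tq10\t.\tGT:DP\t0|1:3\t0/0:5\n"
)


def tidy_vcf(text):
    """Reshape wide per-sample genotype columns into long, tidy records."""
    samples = []
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names: values, not variable names
            continue
        chrom, pos, vid, ref, alt = fields[:5]
        format_keys = fields[8].split(":")  # e.g. ["GT", "DP"]
        for sample, packed in zip(samples, fields[9:]):
            row = {"CHROM": chrom, "POS": int(pos), "ID": vid,
                   "REF": ref, "ALT": alt, "SAMPLE": sample}
            # Unpack "1|0:8" into GT="1|0" and DP="8": one variable per column.
            row.update(zip(format_keys, packed.split(":")))
            rows.append(row)
    return rows


for row in tidy_vcf(TOY_VCF):
    print(row)
```

Note that GT itself still packs two alleles and a phasing flag into one string, so a stricter reading of rule 3.2 would split it further.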

Reference:
Wickham H.W. (2014). Tidy Data, Journal of Statistical Software, 59 (10). http://www.jstatsoft.org/v59/i10

Video Tip of the Week: “Virtually Immune” computational immune system modeling

This week’s video tip of the week is the next in a series. It began when I took a look at GeneFriends, and their option to output the data for use in BioLayout Express3D. So of course we had to then take a look at BioLayout. While I was exploring BioLayout, I came across Virtually Immune. This project contains intricate network diagrams of immune-system related responses which you can load into BioLayout and explore. It is a very neat way to get further in your understanding of BioLayout functions, as well as being an amazing example of how to model a key system for human health. Here is their video overview:

Virtually Immune is developing computational models of the behavior of immune system responses, in part to help reduce the use of animal models. As part of a CrackIT project challenge, they developed a model of Influenza A lifecycle and macrophage responses that you can explore to help understand the goals of the project. On their “about” page, the overall goal includes:

By enabling scientists to run in silico experiments we hope to help them to model infectious and inflammatory disease-associated processes and thereby accelerate the development of therapeutic agents. In so doing we hope this resource will assist in the reduction and refinement of the use of animals in immunological research.

Their text-based tutorial walks you through the basic steps of building the kinds of models they have: read the literature, draw the pathway you want to represent, initialize the conditions, and then simulate with BioLayout Express3D. The last step, Verify, means you go back to the bench and see whether your computational model’s predictions make sense. Hopefully refining your ideas computationally can streamline the work in the lab.
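If “initialize the conditions, then simulate” sounds abstract, here’s a toy version in Python. It is a simple deterministic Petri-net-style token flow through a made-up three-step pathway; we picked that technique purely for illustration, and it is not BioLayout’s simulation code. You set the starting amounts, let each reaction fire whenever its inputs are available, and see what comes out.

```python
# Toy "initialize, then simulate" example: a deterministic Petri-net-style
# token flow through a made-up pathway. Not BioLayout's algorithm or code.

# (name, input places, output places): firing a transition removes one token
# from each input place and adds one token to each output place.
TRANSITIONS = [
    ("bind", ["virus", "receptor"], ["bound"]),
    ("signal", ["bound"], ["signal"]),
    ("secrete", ["signal"], ["cytokine"]),
]


def simulate(initial, steps):
    """Each step, visit transitions in order and fire each one once if enabled."""
    marking = dict(initial)
    history = [dict(marking)]
    for _ in range(steps):
        for _name, inputs, outputs in TRANSITIONS:
            if all(marking.get(place, 0) > 0 for place in inputs):
                for place in inputs:
                    marking[place] -= 1
                for place in outputs:
                    marking[place] = marking.get(place, 0) + 1
        history.append(dict(marking))
    return history


# Initial conditions: 3 virus tokens, but only 2 receptor tokens.
final = simulate({"virus": 3, "receptor": 2}, steps=4)[-1]
print(final)
```

With these starting amounts the receptor runs out first, so the run ends with two cytokine tokens and one unused virus token; change the initial conditions and the outcome changes. A prediction like that is what the Verify step would then check at the bench.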

To get the best sense of the capabilities of this project, go to their Pathway Models page. From here you can load up any of the examples in BioLayout and look around. When you hover over a pathway, a “Show Me” button will appear near the bottom; clicking it will load the data in a larger format that you can explore. At the bottom of the new page, you can click the BioLayout button to visualize it in 3D.

If you aren’t researching immune system features, that’s fine. But it will still help you to understand how pathways relevant to your work could be modeled.

Quick links:

Virtually Immune: http://www.virtuallyimmune.org/

Virtually Immune tutorial: http://www.virtuallyimmune.org/tutorial/

BioLayout Express3D: http://www.biolayout.org/

Reference:

[can't find one for Virtually Immune yet; will attach one if I find it in the future]

Enright A. & Ouzounis C. (2001). BioLayout: an automatic graph layout algorithm for similarity visualization, Bioinformatics, 17 (9) 853-854. DOI: http://dx.doi.org/10.1093/bioinformatics/17.9.853

Theocharidis A., Stijn van Dongen, Anton J Enright & Tom C Freeman (2009). Network visualization and analysis of gene expression data using BioLayout Express3D, Nature Protocols, 4 (10) 1535-1550. DOI: http://dx.doi.org/10.1038/nprot.2009.177 *cough* access it from their publications page…

Wright D.W., Tim Angus, Anton J. Enright & Tom C. Freeman (2014). Visualisation of BioPAX Networks using BioLayout Express3D, F1000Research, DOI: http://dx.doi.org/10.12688/f1000research.5499.1

Oxford plots from the gibbon genome paper

A while back I talked about the software in the gibbon genome paper. I went through to try to pull out as much of the software as I could as sort of a catalog of a representative genome project. Of course, there was a lot in there. Some of it, though, consisted of unpublished code.

One of the figures I liked very much, because it conveyed a lot of information quickly, was Figure 2 from the main paper, with the Oxford plots for comparison and then the view of the phylogenetic tree. I wondered whether this was available somewhere, so I contacted the team to find out. Javier Herrero has been really terrific about answering my questions and getting back to me with more details. The plot code was an internal script, and the tree layout wasn’t a special tool, just a graphical arrangement done by hand later.

So, knowing my interest in this software, Javier let me know the other day that he’s put the code for the plots on GitHub. You can access it yourself there. Note: it requires the eHive and Kent libraries. And it makes the dot plots, but you would still have to lay out the tree by hand.

But now you can plot these types of comparisons if you want to try it out.
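If you’re curious what sits underneath a dot plot like this, the core step is simple: lay each genome’s chromosomes end to end and turn every pair of matching positions into an (x, y) point. Here’s a minimal Python sketch with made-up genome names, chromosome sizes and ortholog pairs; it is a generic illustration, not Javier’s code, which (as noted above) builds on eHive and the Kent libraries.

```python
# Generic sketch of the coordinates behind an Oxford dot plot: chromosomes of
# each genome are laid end to end, and each orthologous pair becomes one point.
# Genome names, chromosome sizes and pairs are hypothetical toy values.
from itertools import accumulate


def axis_offsets(chrom_sizes):
    """Map each chromosome to its start position on a concatenated axis."""
    names = list(chrom_sizes)
    ends = list(accumulate(chrom_sizes[name] for name in names))
    starts = [0] + ends[:-1]
    return dict(zip(names, starts))


def dotplot_points(pairs, sizes_x, sizes_y):
    """Convert (chrom_x, pos_x, chrom_y, pos_y) pairs into (x, y) points."""
    off_x = axis_offsets(sizes_x)
    off_y = axis_offsets(sizes_y)
    return [(off_x[cx] + px, off_y[cy] + py) for cx, px, cy, py in pairs]


SIZES_A = {"a1": 100, "a2": 80}  # genome A, toy chromosome lengths
SIZES_B = {"b1": 120, "b2": 60}  # genome B, toy chromosome lengths
PAIRS = [("a1", 10, "b1", 15), ("a1", 90, "b2", 5), ("a2", 40, "b1", 100)]

points = dotplot_points(PAIRS, SIZES_A, SIZES_B)
print(points)  # → [(10, 15), (90, 125), (140, 100)]
```

The points can then go to any scatter-plot tool; with matplotlib, for example, `plt.scatter(*zip(*points))` would draw them. Rearrangements show up as runs of points that jump between chromosome blocks or flip direction.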

Quick link:

Oxford plots: https://github.com/jherrero/oxford-plots

Reference:

Carbone L., R. Alan Harris, Sante Gnerre, Krishna R. Veeramah, Belen Lorente-Galdos, John Huddleston, Thomas J. Meyer, Javier Herrero, Christian Roos, Bronwen Aken & Fabio Anaclerio & al. (2014). Gibbon genome and the fast karyotype evolution of small apes, Nature, 513 (7517) 195-201. DOI: http://dx.doi.org/10.1038/nature13679

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

What’s the Answer? (missing applications, revisited)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s highlighted post is actually a trip down memory lane. It floated up to the top recently because someone raised the question again:

Hi all,

Since the last post in this thread is almost 4 years old, I am just curious. What was already solved, what has changed and, more important, Which Application Is Truly Missing In Bioinformatics today?

One of the things I see is still the need for some data format standards. Another one is related to lack of global standards how to build data analysis pipelines.

I am curious about your thoughts.

klemen

Bioinformatics moves very fast in some ways, yet in other ways the same old problems remain. It was kind of interesting to look over the things we all desired years ago, and think about where we are since then.

Original post question:

Question: Which Application Is Truly Missing In Bioinformatics?

It’s a simple & straight questions. Just think about an app that when you found it, you first thought would be – “OMG!!! That’s it” – or smth like – “I wish I could have found/written/idealized it before”. Don’t need to be a bioinformatical swiss knife or a McGuyver paper clip. Just smth that would make your life much happier/easier.

My example is quite simple. I really wish that some sort of Monte Carlo Simulator of Generic Urn Models (population genetics rlz!) just appear in the net, with a nice, clean and well documented API (written in C) and bindings for my favorite scripting languages. That’s what I really miss, right now. What’s your story?

Jarretinha
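For what it’s worth, the simplest case of the kind of urn model Jarretinha describes, the neutral Wright-Fisher model (our choice of example; the post doesn’t name a specific model), fits in a few lines of Python today. This is a minimal sketch with arbitrary parameters, nowhere near the documented C API with scripting bindings he was wishing for.

```python
# Minimal Monte Carlo sketch of a neutral Wright-Fisher urn model.
# Population size, starting count and run counts are arbitrary choices.
import random


def wright_fisher(pop_size, start_count, max_generations, rng):
    """One run: each generation draws pop_size allele copies with replacement
    from the previous generation (the 'urn'). Returns the count trajectory,
    stopping early once the allele is lost (0) or fixed (pop_size)."""
    count = start_count
    trajectory = [count]
    for _ in range(max_generations):
        if count in (0, pop_size):
            break  # absorbing state: the urn holds only one allele type
        freq = count / pop_size
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        trajectory.append(count)
    return trajectory


def fixation_probability(pop_size, start_count, runs=2000, max_generations=1000, seed=1):
    """Estimate the fraction of runs in which the allele reaches fixation."""
    rng = random.Random(seed)
    fixed = sum(
        wright_fisher(pop_size, start_count, max_generations, rng)[-1] == pop_size
        for _ in range(runs)
    )
    return fixed / runs


if __name__ == "__main__":
    # Neutral theory predicts start_count / pop_size, i.e. 0.25 here.
    print(fixation_probability(pop_size=20, start_count=5))
```

The estimate should land near the neutral-theory value of 0.25, which makes a handy sanity check before trying fancier urn schemes.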

So go over and walk down memory lane. It’s an interesting kind of institutional memory for a specialist group to look back on, the sort of thing you don’t necessarily capture through the formal science routes.

Video Tip of the Week: BioLayout Express3D for network visualizations

My previous Video Tip of the Week highlighted the GeneFriends tool. With GeneFriends you can search for co-expression of genes in RNA-seq data sets. But you can take these results further and visualize them with the BioLayout Express3D tool, so I wanted to bring more details about BioLayout in this tip since we haven’t covered it before.

BioLayout isn’t a new tool; it’s been around for some time. The first published report of it appeared in 2001. Their publications page reflects their progress over the years, including a new paper recently put out for open peer review (very nifty, kudos on that). BioLayout keeps getting new features because it is under active development, and it keeps incorporating key standards like BioPAX that are important for the interoperability of tools in this space. You can learn more about BioPAX and related standards from the ‘COmputational Modeling in BIology’ NEtwork (COMBINE) site.

This video tip will highlight their overview video to give you a taste of what BioLayout Express can do. But they have a page with more videos that can take you further on understanding and using the features of the software.

There’s a Nature Protocols paper that they produced a few years back that helped me to grasp what they want to accomplish and how to work with BioLayout. Although some of the details will have changed, I like these kinds of papers as a way to approach the concepts of working with the tools, so I’ve included that below as well. You can access it from their publications page.

BioLayout Express can handle very impressive numbers of data sets and the corresponding nodes and edges. Their publications page also offers a look at how some researchers have used their tool to advance their research. I like it when tool providers offer these kinds of published examples; it helps to see how people really are using the tools.
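As a rough illustration of how expression data become a network you could load into a tool like BioLayout, here’s a minimal Python sketch: compute a Pearson correlation for each pair of genes and keep the pairs above a cutoff as weighted edges, which is the general idea behind correlation networks as we understand it. The gene names, expression values and the 0.9 cutoff are made up, and we’re assuming a simple tab-separated “node, node, weight” edge list as output; check the BioLayout documentation for the file formats it actually accepts.

```python
# Minimal co-expression network sketch: Pearson correlation between gene
# profiles, thresholded into a weighted edge list. All values are hypothetical.
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile has no defined correlation; treat as none
    return cov / (sx * sy)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with r >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


def to_edge_list(edges):
    """Format edges as tab-separated 'node node weight' lines (assumed format)."""
    return "\n".join(f"{a}\t{b}\t{w}" for a, b, w in edges)


PROFILES = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

print(to_edge_list(coexpression_edges(PROFILES)))
```

Only geneA and geneB survive the cutoff here; geneC is strongly anti-correlated with both, which a positive-only threshold like this one deliberately ignores.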

Quick links:

BioLayout Express3D: http://www.biolayout.org/

References:

Enright A. & Ouzounis C. (2001). BioLayout: an automatic graph layout algorithm for similarity visualization, Bioinformatics, 17 (9) 853-854. DOI: http://dx.doi.org/10.1093/bioinformatics/17.9.853

Theocharidis A., Stijn van Dongen, Anton J Enright & Tom C Freeman (2009). Network visualization and analysis of gene expression data using BioLayout Express3D, Nature Protocols, 4 (10) 1535-1550. DOI: http://dx.doi.org/10.1038/nprot.2009.177 *cough* access it from their publications page…

Wright D.W., Tim Angus, Anton J. Enright & Tom C. Freeman (2014). Visualisation of BioPAX Networks using BioLayout Express3D, F1000Research, DOI: http://dx.doi.org/10.12688/f1000research.5499.1