Friday SNPpets

This week’s SNPpets includes updates to existing tools, like Reactome and IMG, the biggest JBrowse display I ever saw, various tidbits from #ASHG15 as word clouds, gene editing, slices through personalized medicine and rare diseases, and a Virtual Machine catalog. Most unusual thing this week: recent evolutionary genomics work gets onto The Big Bang Theory. And more….

During the week we come across a lot of links and reads that we think are interesting, but don't make it to a blog post. Here they are for your enjoyment…

If the survey is closed by the time you come across this, see the answers (and the twitter discussion thread is hilarious)


What’s the Answer? (novel pathways in silico)

This week’s highlighted discussion is a different take on pathway and network tools. This is about the design of novel metabolic pathways in silico, not just exploring existing pathways to look for where your favorite genes play roles.

Is there free software available for designing metabolic pathways? (self.bioinformatics)

submitted by learnedidiot

I am interested in designing novel metabolic pathways in silico, and was curious if there is a good software packages that allows for finding the shortest path in terms of enzymes, regardless of species, for a given input to final output?

Although we have talked about COPASI before, in conjunction with GenoCAD, and we’ve done training on STRING, some of the other tools were new to me and I was pleased to learn of them. If you know of others, you can offer your suggestions there. Go have a look at the discussion.
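
To make the "shortest path in terms of enzymes" idea concrete, here is a minimal sketch using a breadth-first search over a reaction table. The compounds, enzyme names, and the `shortest_enzyme_path` helper are all hypothetical, made up for illustration; a real tool would draw reactions from a curated database and would have to deal with cofactors, reversibility, and thermodynamics, which this toy ignores.

```python
from collections import deque

# Hypothetical toy reaction table: (substrate, enzyme, product).
# Names are placeholders, not taken from any real database.
REACTIONS = [
    ("A", "enz1", "B"),
    ("B", "enz2", "C"),
    ("A", "enz3", "D"),
    ("D", "enz4", "E"),
    ("E", "enz5", "C"),
    ("C", "enz6", "F"),
]


def shortest_enzyme_path(reactions, source, target):
    """Return the fewest-enzyme route as a list of (enzyme, product) steps, or None."""
    # Index reactions by substrate so each compound knows its outgoing steps.
    graph = {}
    for substrate, enzyme, product in reactions:
        graph.setdefault(substrate, []).append((enzyme, product))

    # Breadth-first search: the first time we reach the target, the path is shortest.
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        compound, path = queue.popleft()
        if compound == target:
            return path
        for enzyme, product in graph.get(compound, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, path + [(enzyme, product)]))
    return None  # no route exists in this reaction table


route = shortest_enzyme_path(REACTIONS, "A", "F")
print(route)
```

With this toy table there are two routes from A to F, and the search returns the three-enzyme one through B and C rather than the longer one through D and E.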

What’s The Answer? (color genes in a pathway)

BioStar is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This question was started a while ago, but a recent answer floated it back up. It’s a question that people ask a lot as they deal with big lists of genes from various experiments or mining exercises. And I was recently highlighting some pathway stuff at BioStar, so I thought this would be a nice companion piece.

Question: Color specific proteins/genes on a signaling pathway

I have a list of genes/protiens. I would like to color my list of proteins/genes on a signaling pathway. I know KEGG has a tool for doing this (the search and color pathway map). Is they any other alternative that is publicly available? If not, how can I accomplish this task (any pointer or heads up). Thanks


I knew about the KEGG method, but really need to look harder at Biotapestry. Check out the answers to see more on that.
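
If you just want to see the general idea outside of any particular tool, here is a rough sketch that writes a small pathway as Graphviz DOT text and fills in the nodes that appear in your gene list. This is not the KEGG or BioTapestry approach; the simplified cascade, the `pathway_to_dot` helper, and the gene list are assumptions chosen for illustration, and you would need Graphviz (or any DOT viewer) to render the output.

```python
# Simplified toy cascade as directed edges; treat it as a placeholder,
# not as a curated pathway from any database.
PATHWAY_EDGES = [("RAS", "RAF"), ("RAF", "MEK"), ("MEK", "ERK")]


def pathway_to_dot(edges, highlight, color="red"):
    """Render edges as Graphviz DOT text, filling nodes that are in `highlight`."""
    highlight = set(highlight)
    nodes = sorted({name for edge in edges for name in edge})
    lines = ["digraph pathway {"]
    for node in nodes:
        if node in highlight:
            # Genes from the user's list get a filled, colored node.
            lines.append(f'  "{node}" [style=filled, fillcolor="{color}"];')
        else:
            lines.append(f'  "{node}";')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Genes in the list that are not on the pathway (here TP53) are simply ignored.
dot_text = pathway_to_dot(PATHWAY_EDGES, ["RAF", "ERK", "TP53"])
print(dot_text)
```

Saving the printed text to a file and running it through Graphviz's `dot` command would give a picture with the listed genes colored.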


Video Tip of the Week: GenoCAD for Synthetic Biology

The field of synthetic biology has been simmering for quite a while. It occasionally takes a big leap, such as when Venter’s team published about their work on M. genitalium, and it took a big leap recently with the paper about modeling a lot of the cellular processes in a simple cell that I talked about, and in the virtual journal club we had soon after on Google+.

But there have been folks who have been building and working on supporting software for this field for quite a while, and I wanted to explore one of those tools today. For this week’s tip we’ll explore GenoCAD. Computer-aided Design (CAD) for genomics will be necessary at some point–you’ll be able to mock up some system, enter some parameters, and test out the equations you’ve established. But as the recent paper showed–the foundation of this has to be extensive amounts of benchwork and existing information from publications and databases. But that information has to be assembled, corralled, and coded in new ways to make analysis and predictions possible.

GenoCAD will let you begin to construct and test cellular activities of this type. Here’s how they describe themselves:

GenoCAD is an open-source computer-assisted-design (CAD) application for synthetic biology. The foundation of GenoCAD is to consider DNA as a language to program synthetic biological systems. GenoCAD includes a large database of annotated genetic parts which are the words of the language. GenoCAD also includes design rules describing how parts should be combined in genetic constructs. These rules are used to build a wizard that guides users through the process of designing complex genetic constructs and artificial gene networks.

You’ll find it has a nice organization with 3 major steps: you can create and select the parts you want to use in step 1. In step 2 you assemble them into a construct. And if you want to take it further and model the activity of this item you can do that as a third step.

With the GenoCAD options you can select from parts lists that exist or upload more (including BioBricks and items from their Parts Registry) and use existing grammar rules for organizing your parts, but you aren’t limited to those. You can add your own parts and create your own grammar rules to accompany your components. You can quickly see how flexible and easy it is to construct these features. You can save them with an account there, and build on them in the future.
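
To give a feel for what "DNA as a language" with parts as words and grammar rules might look like, here is a small sketch that checks whether a list of parts follows a toy set of design rules. The grammar, the part names, and the `is_valid_construct` helper are hypothetical; they are not GenoCAD's actual rules, parts, or code.

```python
# Hypothetical design rules written as a tiny context-free grammar.
# These are NOT GenoCAD's real rules; they only illustrate the idea.
GRAMMAR = {
    "construct": [["cassette"], ["cassette", "construct"]],
    "cassette": [["promoter", "rbs", "cds", "terminator"]],
}

# Hypothetical parts list: part name -> category (the "words" of the language).
PART_CATEGORY = {
    "pLac": "promoter",
    "rbsA": "rbs",
    "gfp": "cds",
    "rfp": "cds",
    "termT1": "terminator",
}


def derive(symbol, tokens, pos):
    """Yield every position reachable after matching `symbol` from tokens[pos]."""
    if symbol not in GRAMMAR:
        # Terminal symbol: it must match the category at this position.
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for production in GRAMMAR[symbol]:
        positions = {pos}
        for sym in production:
            positions = {end for p in positions for end in derive(sym, tokens, p)}
            if not positions:
                break  # this production cannot match here
        yield from positions


def is_valid_construct(parts):
    """True if the parts, read in order, form one or more complete cassettes."""
    tokens = [PART_CATEGORY.get(part, "unknown") for part in parts]
    return len(tokens) in set(derive("construct", tokens, 0))


print(is_valid_construct(["pLac", "rbsA", "gfp", "termT1"]))
print(is_valid_construct(["pLac", "gfp", "rbsA", "termT1"]))
```

The first call describes a well-formed cassette and the second has the coding sequence before the ribosome binding site, so the toy rules accept one and reject the other. Adding your own rule here is just a matter of adding a production to `GRAMMAR`.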

In this tip we’ll only have time to examine the basic features of the interface, but you can go back and explore it yourself with further details from their help and documentation. There are also a number of helpful publications that I’ve linked below in the references section.

There are related tools that were recently explored in an article in The Scientist: Move Over, Mother Nature. GenoCAD is one of the featured tools, but you can read about others as well. Soon it will no longer be sufficient to use the software to find existing information (although that will still be crucial), but you’ll be modeling your ideas with some software to help refine and extend your research too. One of the really fascinating aspects of that Whole Cell Simulation paper recently was how there were cases where the models weren’t reflecting the biology, and the refinement of the models and a better understanding of the biology came out of that.

Start modeling your favorite biological systems in silico.  Or at least start thinking about it.

Quick links:

GenoCAD site:  http://www.genocad.org/

Handy new collection of PLoS papers on synthetic biology: http://blogs.plos.org/everyone/2012/08/15/plos-one-launches-synthetic-biology-collection/

Biology’s Master Programmers: http://www.technologyreview.com/featured-story/428187/biologys-master-programmers/

Move Over, Mother Nature: http://the-scientist.com/2012/07/01/move-over-mother-nature/


Tip of the Week: Whole Cell Simulation software

My twitter feed exploded over the last few days with conflama around the new publication about the simulation of the biological processes in a cell. Most of the ire was aimed at a fawning NYT piece that hyped the paper. And yes–the newspaper article is flawed; Jonathan Eisen spoke to that, and then filleted it in 7 ways (1, 2, 3, 4, 5, 6, 7) and created a Storify about related chatter. Some of the jokes were funny too–my favorite was a future forum help request:

RT @pathogenomenick: Synthanswers.com 2035: Hello I am trying to run M. genitalium on my PC but it doesn’t compile, please help.

That said, I actually think some of this is unfair, and I hope it doesn’t distract from the actual work that was done. Lead author Jonathan Karr and the folks from JCVI and Stanford have accomplished some important and necessary work. Modeling various features of whole cell activities, and developing suitable software to assess, test, predict, and visualize these aspects is going to be hugely important as we continue to move beyond the important–but largely represented as linear–genome sequence data. There have been some pathway and modeling tools that I have explored in the past and are also nice directions. (My previous favorite was CellDesigner, and I know PathCase has modeling tools, but there have been a lot of others as well–but these have been aimed mostly at individual pathways and processes, subsets of whole cells.)

But this new paper goes further and loads their software with a number of cellular processes that you can then run and visualize. An accompanying commentary paper describes it as:

It is in this spirit that the authors have produced a first-draft framework for asking and answering systems-level questions by using quantitative cell-scale models.

For this tip I’ll show you one of the movies they generate with this software, and direct you to the project pages for more details and access to the software and data sets. And you can set up and run your own simulations–but that’s beyond the scope of my usual tips, so you’ll have to explore that yourself.

They use 28 different models of cellular process activities–some are suited for metabolic pathways, some for DNA replication, RNA processing, protein decay, and various other things. The supplement is really key here–S1 has the models, the math, and the other crucial things you need to have to understand what was done. Don’t forget to download that if you buy this paper!

The amount of collection, collation, curation, and synthesis required to organize and put these models in the supplement is really impressive–never mind the software. I actually had to look back at the author list by p. 88 of the supplement, and I still can’t figure out why there aren’t several dozen more people on this paper. Of course, they stood on the shoulders of hundreds of others: “Our model is based on a synthesis of over 900 publications and includes more than 1,900 experimentally observed parameters.” And all that stuff in the databases–more than two dozen databases required by my current count.


Once you have the system programmed, you can then do simulated gene knockouts. When they did this they found that they had 79% correspondence with previous observations in vivo for viability. They could also use these types of studies in growth-rate comparisons. Those data were interesting especially when they didn’t match the models: the models are challenged, the biology further investigated, and you could determine where refinements would be needed to account for properties that weren’t already part of the configuration.
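
As a back-of-the-envelope illustration of what "correspondence" between simulated knockouts and in vivo observations means, here is a sketch that computes the fraction of genes where a predicted viability call agrees with an observed one. The gene names, the calls, and the `agreement` helper are made-up toy data for illustration, not the paper's data or method, and the number it prints has nothing to do with the 79% figure.

```python
# Hypothetical toy data: gene -> is the single-gene knockout viable?
# These calls are invented for illustration, not taken from the paper.
predicted = {"geneA": True, "geneB": False, "geneC": False, "geneD": True, "geneE": False}
observed = {"geneA": True, "geneB": False, "geneC": True, "geneD": True, "geneE": False}


def agreement(predicted, observed):
    """Return (fraction agreeing, sorted list of disagreeing genes) over shared genes."""
    shared = sorted(predicted.keys() & observed.keys())
    if not shared:
        raise ValueError("no genes in common between predictions and observations")
    mismatches = [gene for gene in shared if predicted[gene] != observed[gene]]
    fraction = (len(shared) - len(mismatches)) / len(shared)
    return fraction, mismatches


fraction, mismatches = agreement(predicted, observed)
print(f"agreement: {fraction:.0%}, disagreements: {mismatches}")
```

The disagreements are the interesting part: those are the genes where either the model or the earlier observation deserves a second look.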

The movies they provide are cool, but almost too much really. I was dreaming of a dashboard where I could slow pieces down and incrementally move along. I would also sort of like to have a Caleydo-style multi-panel view where I could focus on one or two in a couple of panels, with another panel of gene or structure or something.

I had a flash of the future of publication with this too… Some day a microbial species or tissue/cell type will be characterized in a number of ways, and a little video will accompany that paper that you will need to understand to evaluate their data. But that video will require copious amounts of data underneath, quality controlled, and curated. I was reading the supplemental information (S1, yes, the 122 page supplement that you also need to evaluate the 28 models that they used) and they acknowledge that they relied on: CMR, BioCyc, KEGG, NCBI, and UniProt for annotations, and a lot of other databases for different aspects as well (oh, and two pages of 3rd party software). These need to exist and to be correct–this new software is not swooping down and saving us from this type of important work. But–that said, the new data won’t be in the papers anymore, and it will require even more than a browser (but will probably also require a browser still anyway). And you will need to know how the software works to evaluate the biology effectively.

Some people are criticizing this work because it’s not really the first, and it’s not complete (only 28 modules), and the commentary paper by Freddolino and Tavazoie speaks to that too:

As the authors themselves acknowledge, the present model must be seen as a first draft, more important as a starting point for future refinement than as a productive model in its own right.

We can get to the future from here, but it won’t be a straight line and sometimes having a dartboard is a good way to start. So throw darts, it is what we do in science–but I hope some of them will be constructively aimed.

If you want to explore the paper further, there’s also going to be a virtual journal club hosted on Google+ to discuss the paper.

An exciting new article just published in the journal Cell describes an integrated effort to simulate the inner workings of a cell.  This Friday, +Stephen Larson will walk through the paper on a Google+ Hangout and explain its implications.  Please join us for a discussion on this interesting new development.

I’m going to try to attend this as well. The host of that–Stephen of OpenWorm–was part of a group I encountered when I talked about that 3D-worm software not too long ago (see Video Tip of the Week: A 3D Worm!). Because of the excellent resources around worm biology and development they are going to be well ahead of other communities in modeling. And a note about the journal club: the paper’s lead author saw the invitation and is planning to attend–and has provided links to the papers and supplement over there if you want access and don’t have it now.

Quick links:

Paper about the simulation: A Whole-Cell Computational Model Predicts Phenotype from Genotype

Commentary paper in same issue: The Dawn of Virtual Cell Biology

Software available here: https://simtk.org/home/wholecell

Stanford team project site: M. genitalium Whole-Cell Model & Knowledge Base


What’s the Answer? (build gene networks from literature)

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

Recently someone came by to ask about ways to extract useful information from the literature and construct a network from that. Respondents offered helpful software–but also nice guidance on thinking about how the networks need to be thought about in an organism:

Question: Building gene regulatory networks from literature

Hi everyone!!

Does anyone know of any available software that would build gene regulatory networks from literature alone?



The first answer offered some software to consider, but the second answer (at least as currently ordered) provided thoughtful commentary on aspects of this to consider beyond the literature.  I thought that was a nice balance. Check out the discussion.

Friday SNPpets

During the week we come across a lot of links and reads that we think are interesting, but don't make it to a blog post. Here they are for your enjoyment…

  • This is not a bad idea. Boston Sci-Geek Tours. I used to work for the Park Service. Hmmm… RT @YishaiKnobel: Fascinating tour and lecture on genomics at Broad Institute today.  Broad should be turned into a Boston tourist attraction. [Mary]
  • RT @BioCatalogue: The BioCatalogue iPhone/iPad app by @manniet3 is now out! http://bit.ly/p2tUQF Please do let us know what you think. [Mary]
  • Includes KEGG, iPATH 2.0, PathwayProjector, metaSHARK, MEGAN 4, HUMAnN: RT @phylogenomics: A Survey of Metabolic Reconstruction Tools for Metagenomic Datasets » The Bioinformatics Knowledgeblog http://shar.es/HK0U5 [Mary]
  • A Special Symposium Celebrating the 40th Anniversary of the Protein Data Bank, October 28 – 30, 2011, Cold Spring Harbor, NY – Poster Abstract deadline: August 15 [Jennifer]
  • Oh yes, plz: RT @nutrigenomics: like RT @grapealope: Bioinformatics is not just about building tools. We know our tools; we should use them first. @atulbutte #singularityu [Mary]
  • Giggle. I’ve done this with several species, not just plants… : @pcronald: Botanist holds up the entire salad bar. http://onion.com/pojv2t [Mary]
  • RT @wahwahnyc: On PubMed Central: The PathOlogist: an automated tool for pathway-centric analysis. BMC Bioinformatics. http://1.usa.gov/oDyBpc [Mary]
  • Looking for some geeky fun? The Twenty-First 1st Annual Ig Nobel Prize ceremony will occur Thursday, September 29, 2011 and tickets are on sale now. Note: “The Ig Nobel Prizes honor achievements that first make people laugh, and then make them think.” [Jennifer]
  • RT @genetics_blog: . @PLoS and @mendeley_com Call for Apps: http://bit.ly/oc2NGL and http://bit.ly/nHYqNa [Mary]
  • Just saw this in Nature News about Google and Microsoft: Computing giants launch free science metrics [Jennifer]
  • FameLab, a science and engineering communication competition – I haven’t seen an uninteresting one yet… [Jennifer]

Reactome Webinar coming up; Wed Feb 2

We were on the road last week doing workshops, so this is a few days old now. But if you aren’t on the GO Friends mailing list it’s possible it’s new for you. A quick word about the GO Friends list: because so many tools rely on Gene Ontology and have some kind of GO components, there is quite a range of things that come over that mailing list. It’s not just for GO developers per se. You might want to check it out.

Anyway, what I wanted to focus on today is this notice about the upcoming Reactome webinar. There have been BIG changes to the interface, but the underlying coolness and high quality of all those biological pathways remains intact, of course! Reactome is a tool we have loved for a long time, and we’ve coordinated with the Reactome folks around the next updates for our tutorial. We’re working on that update now.

If you want to learn more about Reactome and these new changes, there’s going to be a webinar soon. You have to register, and I’ll only give some of the details here. Head over to the GO Friends message link to see the rest.

The Ontario Genomics Institute (OGI) and the Ontario Institute for Cancer Research (OICR) are co-hosting a one hour web conference/webinar about the Reactome Pathway Database (http://www.reactome.org) – a freely available, manually-curated resource of core biological pathways. The Reactome database offers pathway data encapsulating areas of human biology ranging from basic pathways of metabolism to complex events such as GPCR signaling and apoptosis.

This follow up webinar will introduce the updated website with a more intuitive user interface and a new suite of data analysis tools. Learn to use this database through case studies from various research groups.

The presentation will be given by Dr. Robin Haw, Manager of Reactome Outreach, OICR, and will cover how to use the updated Reactome resource for:

• Browsing and searching pathway knowledge,
• Integrating network and pathway data,
• Using Pathway and Expression Analysis tools to analyze experimental datasets,
• Annotating experimental datasets with Reactome BioMart,
• Discovering network patterns related to cancer and other diseases using the Reactome Functional Interaction Network Cytoscape plug-in,
• Introducing use cases for Reactome data and analysis tools.


Go to the link for the registration details. I’ll be listening in (if we don’t schedule a workshop for that day!)

Friday SNPpets

During the week we come across a lot of links and reads that we think are interesting, but don't make it to a blog post. Here they are for your enjoyment…

Cytoscape releases v2.7

Just got an announcement from the Cytoscape mailing list. One of my favorite tutorials* that we’ve developed was Cytoscape. It was an interesting challenge for us because most of our focus has been tools with web interfaces and we had to address installation with this one. But it was such a nice tool with great features that I really simply enjoyed working with it.

Anyway, here’s what’s new:

Hi everyone,

The Cytoscape team is proud to announce the release of Cytoscape 2.7!  This release includes some exciting new features:

* Nested Networks: A node may now have a reference to another Network, which allows us to capture the relationships between networks in networks themselves.

* CyCommandHandlers: Addition of a mechanism to the core to provide inter-plugin communication.

* New Edge Types: Several new edge types between solid and dashed have been added.

* Newlines and list editing in attribute browser: The attribute browser has been updated to allow newline characters to be added by pressing the “Enter” key. List editing is now also enabled.

* Automatic label wrap: A new visual property has been added that sets the width of a label. Any label extending beyond this width will be automatically wrapped.

* Arrow color optionally locked to edge color: Arrow color may now be bound to the edge color by checking a box in the Dependencies pane of the Default Appearance Browser in the VizMapper, which avoids the necessity of creating separate-yet-identical mappings for edge, source, and target arrows.

* BioPAX Level 3 support.

You can get the release here: http://cytoscape.org

Please let us know if you have any questions!

The Cytoscape Team

*The Cytoscape tutorial we have is part of our subscription package. For freely available sponsored tutorials you can click here.