Tip of the Week: Whole Cell Simulation software

My Twitter feed exploded over the last few days with conflama around the new publication about simulating the biological processes in a cell. Most of the ire was aimed at a fawning NYT piece that hyped the paper. And yes–the newspaper article is flawed; Jonathan Eisen spoke to that, and then filleted it in 7 ways (1, 2, 3, 4, 5, 6, 7) and created a Storify about related chatter. Some of the jokes were funny too–my favorite was a future forum help request:

RT @pathogenomenick: Synthanswers.com 2035: Hello I am trying to run M. genitalium on my PC but it doesn’t compile, please help.

That said, I actually think some of this is unfair, and I hope it doesn’t distract from the actual work that was done. Lead author Jonathan Karr and the folks from JCVI and Stanford have accomplished some important and necessary work. Modeling various features of whole-cell activities, and developing suitable software to assess, test, predict, and visualize these aspects, is going to be hugely important as we continue to move beyond genome sequence data, which is important but largely represented as linear. I have explored some pathway and modeling tools in the past that are also nice directions. (My previous favorite was CellDesigner, and I know PathCase has modeling tools, and there have been a lot of others as well–but these have been aimed mostly at individual pathways and processes, subsets of whole cells.)

But this new paper goes further and loads their software with a number of cellular processes that you can then run and visualize. An accompanying commentary paper describes it as:

It is in this spirit that the authors have produced a first-draft framework for asking and answering systems-level questions by using quantitative cell-scale models.

For this tip I’ll show you one of the movies they generate with this software, and direct you to the project pages for more details and access to the software and data sets. And you can set up and run your own simulations–but that’s beyond the scope of my usual tips, so you’ll have to explore that yourself.

They use 28 different models of cellular process activities–some are suited for metabolic pathways, some for DNA replication, RNA processing, protein decay, and various other things. The supplement is really key here–S1 has the models, the math, and the other crucial things you need to have to understand what was done. Don’t forget to download that if you buy this paper!
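To make the idea of "many process models feeding one cell" a bit more concrete, here is a minimal toy sketch in Python. It is not the authors' code (their software and the real math are in the supplement and on the project pages). It assumes one simple scheme: each process reads the same snapshot of a shared state, and the changes are summed at every time step. The three processes, their rate constants, and the names `transcription`, `rna_decay`, `translation`, and `simulate` are all made up for illustration.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Process = Callable[[State, float], State]


def transcription(state: State, dt: float) -> State:
    """Toy process: make RNA at a constant, invented rate."""
    new = dict(state)
    new["rna"] += 2.0 * dt
    return new


def rna_decay(state: State, dt: float) -> State:
    """Toy process: first-order RNA decay with an invented rate constant."""
    new = dict(state)
    new["rna"] -= 0.1 * state["rna"] * dt
    return new


def translation(state: State, dt: float) -> State:
    """Toy process: make protein in proportion to the RNA present."""
    new = dict(state)
    new["protein"] += 0.5 * state["rna"] * dt
    return new


def simulate(state: State, processes: List[Process],
             steps: int, dt: float = 1.0) -> List[State]:
    """Advance the shared state, treating processes as independent per step."""
    history = [dict(state)]
    for _ in range(steps):
        # Every process sees the same snapshot; their changes are summed.
        deltas = {key: 0.0 for key in state}
        for process in processes:
            updated = process(state, dt)
            for key in state:
                deltas[key] += updated[key] - state[key]
        state = {key: state[key] + deltas[key] for key in state}
        history.append(dict(state))
    return history


if __name__ == "__main__":
    start = {"rna": 0.0, "protein": 0.0}
    for step, snapshot in enumerate(
            simulate(start, [transcription, rna_decay, translation], steps=5)):
        print(step, snapshot)
```

The point of the sketch is only the shape of the problem: each process can use whatever math suits it, as long as they all agree on the shared state they read and write.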

The amount of collection, collation, curation, and synthesis required to organize and put these models in the supplement is really impressive–never mind the software. I actually had to look back at the author list by p. 88 of the supplement, and I still can’t figure out why there aren’t several dozen more people on this paper. Of course, they stood on the shoulders of hundreds of others: “Our model is based on a synthesis of over 900 publications and includes more than 1,900 experimentally observed parameters.” And all that stuff in the databases–more than two dozen databases required by my current count.

 

Once you have the system programmed, you can then do simulated gene knockouts. When they did this they found that they had 79% correspondence with previous observations in vivo for viability. They could also use these types of studies in growth-rate comparisons. Those data were interesting especially when they didn’t match the models: the models are challenged, the biology further investigated, and you could determine where refinements would be needed to account for properties that weren’t already part of the configuration.
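If you are wondering what a "correspondence" number like that means in practice, here is a small hedged sketch: compare a predicted viable/not-viable call for each knockout against the observed call, and report the fraction that agree. The gene names and calls below are invented placeholders, not data from the paper, and `knockout_agreement` and `disagreements` are hypothetical helper names. The authors' actual scoring method is described in their supplement.

```python
from typing import Dict, List


def knockout_agreement(predicted: Dict[str, bool],
                       observed: Dict[str, bool]) -> float:
    """Fraction of shared genes where predicted and observed viability match."""
    shared = predicted.keys() & observed.keys()
    if not shared:
        raise ValueError("no genes in common between predictions and observations")
    matches = sum(predicted[gene] == observed[gene] for gene in shared)
    return matches / len(shared)


def disagreements(predicted: Dict[str, bool],
                  observed: Dict[str, bool]) -> List[str]:
    """Genes where the model and the bench disagree, sorted by name."""
    shared = predicted.keys() & observed.keys()
    return sorted(gene for gene in shared if predicted[gene] != observed[gene])


if __name__ == "__main__":
    # True = cell is viable after the knockout. All values are invented.
    model_calls = {"gene_a": True, "gene_b": False, "gene_c": True,
                   "gene_d": False, "gene_e": True}
    bench_calls = {"gene_a": True, "gene_b": False, "gene_c": False,
                   "gene_d": False, "gene_e": True}
    print(knockout_agreement(model_calls, bench_calls))
    print(disagreements(model_calls, bench_calls))
```

The mismatch list is arguably the more interesting output, since those are the genes where either the model or the earlier observation deserves a second look.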

The movies they provide are cool, but almost too much really. I was dreaming of a dashboard where I could slow pieces down and incrementally move along. I would also sort of like to have a Caleydo-style multi-panel view where I could focus on one or two in a couple of panels, with another panel of gene or structure or something.

I had a flash of the future of publication with this too… Some day a microbial species or tissue/cell type will be characterized in a number of ways, and a little video will accompany that paper that you will need to understand to evaluate their data. But that video will require copious amounts of data underneath, quality controlled and curated. I was reading the supplemental information (S1, yes, the 122-page supplement that you also need to evaluate the 28 models that they used) and they acknowledge that they relied on CMR, BioCyc, KEGG, NCBI, and UniProt for annotations, and a lot of other databases for different aspects as well (oh, and two pages of 3rd party software). These need to exist and to be correct–this new software is not swooping down and saving us from this type of important work. That said, the new data won’t be in the papers anymore, and getting at it will require more than a browser (though probably a browser too). And you will need to know how the software works to evaluate the biology effectively.

Some people are criticizing this work because it’s not really the first, and it’s not complete (only 28 modules), and the commentary paper by Freddolino and Tavazoie speaks to that too:

As the authors themselves acknowledge, the present model must be seen as a first draft, more important as a starting point for future refinement than as a productive model in its own right.

We can get to the future from here, but it won’t be a straight line and sometimes having a dartboard is a good way to start. So throw darts, it is what we do in science–but I hope some of them will be constructively aimed.

If you want to explore the paper further, there’s also going to be a virtual journal club hosted on Google+ to discuss the paper.

An exciting new article just published in the journal Cell describes an integrated effort to simulate the inner workings of a cell.  This Friday, +Stephen Larson will walk through the paper on a Google+ Hangout and explain its implications.  Please join us for a discussion on this interesting new development.

I’m going to try to attend this as well. The host of that–Stephen of OpenWorm–was part of a group I encountered when I talked about that 3D-worm software not too long ago (see Video Tip of the Week: A 3D Worm!). Because of the excellent resources around worm biology and development, they are going to be well ahead of other communities in modeling. And a note about the journal club: the paper’s lead author saw the invitation and is planning to attend–and has provided links to the papers and supplement over there if you want access and don’t have it now.

Quick links:

Paper about the simulation: A Whole-Cell Computational Model Predicts Phenotype from Genotype

Commentary paper in same issue: The Dawn of Virtual Cell Biology

Software available here: https://simtk.org/home/wholecell

Stanford team project site: M. genitalium Whole-Cell Model & Knowledge Base

References:

Karr, J., Sanghvi, J., Macklin, D., Gutschow, M., Jacobs, J., Bolival, B., Assad-Garcia, N., Glass, J. & Covert, M. (2012). A Whole-Cell Computational Model Predicts Phenotype from Genotype, Cell, 150 (2) 401. DOI: 10.1016/j.cell.2012.05.044

Freddolino, P. & Tavazoie, S. (2012). The Dawn of Virtual Cell Biology, Cell, 150 (2) 250. DOI: 10.1016/j.cell.2012.07.001