This week’s SNPpets include transcription factor binding site evolution, with its secret partner, transposable elements; PrecisionFDA coming along; bad habits of bioinformaticians; new synthetic biology tools and rock star status; consumer reluctance to share their health data; Russian genomes on the way. And more, including the XKCD on DNA in case you missed it.
Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…
Upshot: substitutions and indels are too slow and inefficient to evolve a new TFBS, let alone an enhancer.
So: call TEs to the rescue!
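For a rough feel for the point-mutation side of that argument, here is a toy calculation. It assumes a single fixed k-bp target motif, equal base frequencies, and independent positions; these are simplifications for illustration, not the model from the work being summarized. It gives the chance that a random stretch already matches the motif exactly and the expected number of substitutions separating the two.

```python
def match_probability(k: int) -> float:
    """Chance that a uniformly random k-mer exactly matches one fixed k-bp motif.

    Toy assumption: each position matches with probability 1/4, independently.
    """
    return 0.25 ** k


def expected_mismatches(k: int) -> float:
    """Expected Hamming distance between a random k-mer and the motif.

    Each position mismatches with probability 3/4, so the expectation is 3k/4.
    """
    return 0.75 * k


if __name__ == "__main__":
    for k in (6, 8, 10):
        print(k, match_probability(k), expected_mismatches(k))
```

For a 10-bp motif this gives an exact-match probability of 4^-10 (about one in a million) and an average distance of 7.5 substitutions. Real binding sites tolerate some mismatches, so this toy overstates how specific a site must be.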
This week’s highlighted discussion takes a different angle on pathway and network tools: it is about designing novel metabolic pathways in silico, not just exploring existing pathways to find where your favorite genes play roles.
Although we have talked about COPASI before, in conjunction with GenoCAD, and we’ve done training on STRING, some of the other tools were new to me and I was pleased to learn of them. If you know of others, please add your suggestions. Go have a look at the discussion.
Although it’s already posted in our news feed, I just wanted to add a reminder about our upcoming webinar on GenoCAD: open-source computer-assisted design software for synthetic biology. You can see the time and registration details here:
If you want to download the slides beforehand (so you don’t have to write everything down), you can access the slides and slide handouts for the Introduction here: www.openhelix.com/GenoCAD
If you can’t make it because of a scheduling conflict, you can still access the same materials on our web site’s GenoCAD suite pages. You can watch the video, download slides and handouts, and also try out the accompanying exercises.
There’s also a second tutorial with more advanced topics, and you can watch that video or access associated training materials for that version too: www.openhelix.com/genoCAD2
We’ll also leave this post open for questions from the webinar: if we don’t have time to get to an issue during the session, we’ll tackle it here. And if you try things on your own and get stuck, or come up with something else you want to know, we’ll get you answers or guidance.
If you are the type to want some publications to read to better understand the resource, be sure to check out the GenoCAD team’s page with many references that will help you to understand the foundations of the development and features, and their directions going forward.
Reference: Wilson, M.L., Hertzberg, R., Adam, L. & Peccoud, J. (2011). A step-by-step introduction to rule-based design of synthetic genetic constructs using GenoCAD. Methods in Enzymology. PMID: 21601678
Earlier this year I did a tip on the 3D Virtual Worm browser from the OpenWorm team. At that time, there wasn’t a publication to accompany the post. Since these tools get much more visibility when the posts go to ScienceSeeker and Research Blogging, I wanted to revisit the browser now that there is a publication for the project.
One of the things I keep harping on is how much we need better visualization for the tremendous volumes of data we are going to have. Imagine the day when we can code the series of transcription-factor binding events, followed by evidence of transcription, as cells differentiate into neurons. I can imagine it being amazing to watch, but also providing insights into basic processes. That was true when the Virtual Cell team recently coded up cellular processes: things they saw in the model helped refine the understanding of the biology, and vice versa. Understanding the initial steps better could also help us look for ways to enhance tissue repair. We’re a long way from this now, but some teams are working their way toward these strategies. It’s interesting to see what the data requirements are, and how crucial public data sets are for these projects. Projects like modENCODE will provide some of the necessary data foundations for the worm, and I can see ENCODE and later projects doing the same for virtual tissue culture cells someday.
In this paper they describe the work that has brought them to the point where we now find ourselves with the virtual worm. The schematic diagrams and the subsequent worm images are really compelling to look at and consider. It is also important to note the crucial underlying bench studies that make this possible. Sometimes there’s tension between the bench and the computational perspectives on projects like “virtual” cells or organisms. But there’s undoubtedly room for both.
Here’s an earlier prototype of the features of this work from this team. It gives you a better sense of the innards of the process:
Certainly we are in an early phase of the work. But it’s important to head there, and it’s nice to see people thinking through the issues and working it out. And the worm 3D browser will continue to improve as groups like this team bring their work to the project.
Palyanov, A., Khayrulin, S., Larson, S.D. & Dibert, A. (2012). Towards a virtual C. elegans: A framework for simulation and visualization of the neuromuscular system in a 3D physical environment. In Silico Biology, 11 (3), 137-147. DOI: 10.3233/ISB-2012-0445
But there have been folks building and working on supporting software for this field for quite a while, and I wanted to explore one of those tools today. For this week’s tip we’ll explore GenoCAD. Computer-aided design (CAD) for genomics will be necessary at some point: you’ll be able to mock up a system, enter some parameters, and test out the equations you’ve established. But as the recent paper showed, the foundation of this has to be extensive benchwork and existing information from publications and databases. And that information has to be assembled, corralled, and coded in new ways to make analysis and predictions possible.
GenoCAD is an open-source computer-assisted design (CAD) application for synthetic biology. The foundation of GenoCAD is to consider DNA as a language for programming synthetic biological systems. GenoCAD includes a large database of annotated genetic parts, which are the words of the language. It also includes design rules describing how parts should be combined in genetic constructs. These rules are used to build a wizard that guides users through the process of designing complex genetic constructs and artificial gene networks.
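To make the “DNA as a language” idea concrete, here is a toy sketch in the spirit of grammar-based design. It is not GenoCAD’s actual rule set, data model, or code. A hypothetical parts library maps part names (the words) to categories, and a small hypothetical context-free grammar says a construct is one or more cassettes, each made of a promoter, an RBS, a gene, and a terminator, in that order. The recognizer checks whether a given ordered list of parts can be derived from the start symbol.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical grammar: nonterminal -> alternative right-hand sides.
# Symbols that are not keys here are terminal part categories.
GRAMMAR: Dict[str, List[List[str]]] = {
    "CONSTRUCT": [["CASSETTE"], ["CASSETTE", "CONSTRUCT"]],
    "CASSETTE": [["PROMOTER", "RBS", "GENE", "TERMINATOR"]],
}

# Hypothetical parts library: part name (a "word") -> category.
PARTS: Dict[str, str] = {
    "pConst": "PROMOTER",
    "pInducible": "PROMOTER",
    "rbs1": "RBS",
    "gfp": "GENE",
    "lacI": "GENE",
    "term1": "TERMINATOR",
}


def is_valid(design: Sequence[str],
             grammar: Dict[str, List[List[str]]] = GRAMMAR,
             parts: Dict[str, str] = PARTS,
             start: str = "CONSTRUCT") -> bool:
    """Return True if the ordered list of parts can be derived from `start`.

    Raises KeyError if a part is missing from the library. Assumes the
    grammar has no left-recursive rules (e.g. A -> A x), which would recurse
    forever in this simple recognizer.
    """
    cats = tuple(parts[name] for name in design)

    @lru_cache(maxsize=None)
    def ends(symbol: str, i: int) -> FrozenSet[int]:
        # All positions j such that `symbol` can derive cats[i:j].
        if symbol not in grammar:
            matched = i < len(cats) and cats[i] == symbol
            return frozenset({i + 1}) if matched else frozenset()
        result = set()
        for rhs in grammar[symbol]:
            positions = {i}
            for sym in rhs:
                positions = {j for p in positions for j in ends(sym, p)}
            result |= positions
        return frozenset(result)

    return len(cats) in ends(start, 0)


if __name__ == "__main__":
    print(is_valid(["pConst", "rbs1", "gfp", "term1"]))  # well-formed cassette
    print(is_valid(["rbs1", "pConst", "gfp", "term1"]))  # parts out of order
```

In this toy, adding your own part or design rule is just another dictionary entry, which mirrors the flexibility described for GenoCAD; a new category would need both a part and a rule that uses it.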
You’ll find it nicely organized into three major steps: in step 1 you create and select the parts you want to use; in step 2 you assemble them into a construct; and in step 3, if you want to take it further, you can model the activity of the construct.
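GenoCAD’s own modeling step is not reproduced here. As a generic, hypothetical illustration of what “modeling the activity” of a simple expression construct can mean, the sketch below integrates a textbook two-variable model of constitutive expression with forward Euler: mRNA is made at rate k_tx and degraded at rate d_m, and protein is translated at rate k_tl per mRNA and degraded at rate d_p. The parameter values are invented, and the function name is an assumption for illustration.

```python
from typing import List, Tuple


def simulate_expression(k_tx: float, d_m: float, k_tl: float, d_p: float,
                        t_end: float, dt: float = 0.01) -> List[Tuple[float, float, float]]:
    """Forward-Euler integration of dm/dt = k_tx - d_m*m, dp/dt = k_tl*m - d_p*p.

    Starts from zero mRNA and protein and returns (time, mRNA, protein) tuples.
    """
    m, p = 0.0, 0.0
    trajectory = [(0.0, m, p)]
    for n in range(1, int(round(t_end / dt)) + 1):
        dm = k_tx - d_m * m
        dp = k_tl * m - d_p * p
        # Update both variables from the same (old) state.
        m, p = m + dm * dt, p + dp * dt
        trajectory.append((n * dt, m, p))
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary units.
    traj = simulate_expression(k_tx=2.0, d_m=0.5, k_tl=5.0, d_p=0.1, t_end=200.0)
    t, m, p = traj[-1]
    print(f"t={t:.0f}  mRNA={m:.3f}  protein={p:.3f}")
```

With these values the trajectory settles at the analytic steady state m* = k_tx/d_m = 4 and p* = k_tl * m*/d_p = 200. Changing a parameter (say, a stronger promoter as a larger k_tx) and rerunning is the kind of “what if” question a modeling step lets you ask before going to the bench.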
With GenoCAD you can select from existing parts lists or upload more (including BioBricks and items from their Parts Registry), and use existing grammar rules for organizing your parts, but you aren’t limited to those. You can add your own parts and create your own grammar rules to accompany your components. You can quickly see how flexible and easy it is to put these constructs together. You can save them with an account there, and build on them in the future.
In this tip we’ll only have time to examine the basic features of the interface, but you can go back and explore further with their help and documentation. There are also a number of helpful publications that I’ve linked below in the references section.
Related tools were recently explored in an article in The Scientist, “Move Over, Mother Nature.” GenoCAD is one of the featured tools, but you can read about others as well. Soon it will no longer be sufficient to use software only to find existing information (although that will still be crucial); you’ll also be modeling your ideas with software that helps refine and extend your research. One of the really fascinating aspects of the recent whole-cell simulation paper was that in some cases the models didn’t reflect the biology, and both refined models and a better understanding of the biology came out of that.
Start modeling your favorite biological systems in silico. Or at least start thinking about it.
Wilson, M.L., Hertzberg, R., Adam, L. & Peccoud, J. (2011). A step-by-step introduction to rule-based design of synthetic genetic constructs using GenoCAD. Methods in Enzymology, 498, 173-188. DOI: 10.1016/B978-0-12-385120-8.00008-5
Czar, M.J., Cai, Y. & Peccoud, J. (2009). Writing DNA with GenoCAD™. Nucleic Acids Research, 37 (Web Server), W47. DOI: 10.1093/nar/gkp361
Cai, Y., Wilson, M.L. & Peccoud, J. (2010). GenoCAD for iGEM: a grammatical approach to the design of standard-compliant constructs. Nucleic Acids Research, 38 (8), 2644. DOI: 10.1093/nar/gkq086
My Twitter feed exploded over the last few days with conflama around the new publication about the simulation of the biological processes in a cell. Most of the ire was aimed at a fawning NYT piece that hyped the paper. And yes, the newspaper article is flawed; Jonathan Eisen spoke to that, then filleted it in seven ways and created a Storify about related chatter. Some of the jokes were funny too; my favorite was a future forum help request:
RT @pathogenomenick: Synthanswers.com 2035: Hello I am trying to run M. genitalium on my PC but it doesn’t compile, please help.
That said, I actually think some of this is unfair, and I hope it doesn’t distract from the actual work that was done. Lead author Jonathan Karr and the folks from JCVI and Stanford have accomplished some important and necessary work. Modeling various features of whole-cell activities, and developing suitable software to assess, test, predict, and visualize these aspects, is going to be hugely important as we continue to move beyond genome sequence data, which is important but largely represented as linear. I have explored some pathway and modeling tools in the past that also point in nice directions. (My previous favorite was CellDesigner, and I know PathCase has modeling tools, and there have been many others as well, but these have been aimed mostly at individual pathways and processes: subsets of whole cells.)
But this new paper goes further and loads their software with a number of cellular processes that you can then run and visualize. An accompanying commentary paper describes it as:
It is in this spirit that the authors have produced a first-draft framework for asking and answering systems-level questions by using quantitative cell-scale models.
For this tip I’ll show you one of the movies they generate with this software, and direct you to the project pages for more details and access to the software and data sets. And you can set up and run your own simulations–but that’s beyond the scope of my usual tips, so you’ll have to explore that yourself.
They use 28 different models of cellular processes: some are suited to metabolic pathways, others to DNA replication, RNA processing, protein decay, and various other processes. The supplement is really key here: S1 has the models, the math, and the other crucial things you need to understand what was done. Don’t forget to download it if you buy this paper!
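The supplement is where the real rules for combining the 28 models are laid out. As a much simpler sketch of the general idea of several process models sharing one cell state, the code below assumes (hypothetically) that each submodel declares what it wants per fixed time step, that a scarce shared resource is split among submodels in proportion to their requests, and that each submodel then runs at the rate its scarcest resource allows. The Submodel class, resource names, and numbers are invented for illustration; this is not the authors’ code or their exact allocation scheme.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Submodel:
    """A hypothetical cellular-process submodel for one fixed time step."""
    name: str
    requests: Dict[str, float]  # shared resource -> amount wanted this step
    products: Dict[str, float]  # product -> amount made if fully supplied


def allocation_fractions(state: Dict[str, float],
                         submodels: List[Submodel]) -> Dict[str, float]:
    """Fraction of each request that can be met when a resource is split
    among submodels in proportion to what they ask for."""
    fractions: Dict[str, float] = {}
    for resource in {r for m in submodels for r in m.requests}:
        total = sum(m.requests.get(resource, 0.0) for m in submodels)
        available = state.get(resource, 0.0)
        fractions[resource] = 1.0 if total <= available else available / total
    return fractions


def step(state: Dict[str, float], submodels: List[Submodel]) -> Dict[str, float]:
    """Advance the shared state by one time step and return the new state."""
    fractions = allocation_fractions(state, submodels)
    new_state = dict(state)
    for m in submodels:
        # Each submodel runs at the rate allowed by its scarcest resource,
        # so total consumption never exceeds what was available.
        f = min((fractions[r] for r in m.requests), default=1.0)
        for r, amount in m.requests.items():
            new_state[r] = new_state.get(r, 0.0) - amount * f
        for p, amount in m.products.items():
            new_state[p] = new_state.get(p, 0.0) + amount * f
    return new_state


if __name__ == "__main__":
    state = {"ATP": 100.0, "dNTP": 10.0, "aa": 50.0}
    subs = [
        Submodel("replication", {"ATP": 10.0, "dNTP": 20.0}, {"DNA": 20.0}),
        Submodel("translation", {"ATP": 40.0, "aa": 50.0}, {"protein": 50.0}),
    ]
    print(step(state, subs))
```

Here replication asks for twice the dNTPs available, so it runs at half speed, while translation is fully supplied. Looping `step` many times gives a crude time course, and this is also where cross-talk between processes shows up: one process starving another of a shared pool.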
The amount of collection, collation, curation, and synthesis required to organize these models and put them in the supplement is really impressive, never mind the software. I actually had to look back at the author list by p. 88 of the supplement, and I still can’t figure out why there aren’t several dozen more people on this paper. Of course, they stood on the shoulders of hundreds of others: “Our model is based on a synthesis of over 900 publications and includes more than 1,900 experimentally observed parameters.” And then there is all that stuff in the databases: more than two dozen databases required, by my current count.
Once you have the system programmed, you can run simulated gene knockouts. When they did this, they found 79% agreement with previous in vivo observations of viability. They could also use these types of studies for growth-rate comparisons. Those data were especially interesting when they didn’t match the models: the models are challenged, the biology is investigated further, and you can determine where refinements are needed to account for properties that weren’t already part of the configuration.
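Scoring simulated knockouts against experiments comes down to counting agreements. The sketch below assumes each gene is reduced to a simple viable or not-viable call in both the simulation and the experimental data; the gene names and values are hypothetical, and the function is an illustration rather than the paper’s analysis code. A figure like the 79% above is an agreement fraction of this general kind, computed over a real gene set.

```python
from typing import Dict, Tuple


def knockout_concordance(predicted: Dict[str, bool],
                         observed: Dict[str, bool]) -> Tuple[float, Dict[str, int]]:
    """Compare predicted vs observed knockout viability (True = viable).

    Only genes present in both dictionaries are scored. Returns the fraction
    of scored genes where prediction and observation agree, plus counts for
    each "predicted/observed" combination.
    """
    shared = sorted(predicted.keys() & observed.keys())
    if not shared:
        raise ValueError("no genes in common")
    counts = {"viable/viable": 0, "viable/lethal": 0,
              "lethal/viable": 0, "lethal/lethal": 0}
    for gene in shared:
        pred = "viable" if predicted[gene] else "lethal"
        obs = "viable" if observed[gene] else "lethal"
        counts[f"{pred}/{obs}"] += 1
    agree = counts["viable/viable"] + counts["lethal/lethal"]
    return agree / len(shared), counts


if __name__ == "__main__":
    # Hypothetical calls for four genes; geneE has no prediction and is skipped.
    predicted = {"geneA": True, "geneB": False, "geneC": True, "geneD": False}
    observed = {"geneA": True, "geneB": False, "geneC": False,
                "geneD": False, "geneE": True}
    print(knockout_concordance(predicted, observed))
```

The mismatched cells (“viable/lethal” and “lethal/viable”) are the interesting ones: they are exactly the cases where the model and the biology disagree and one of them needs a closer look.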
The movies they provide are cool, but almost too much really. I was dreaming of a dashboard where I could slow pieces down and incrementally move along. I would also sort of like to have a Caleydo-style multi-panel view where I could focus on one or two in a couple of panels, with another panel of gene or structure or something.
I had a flash of the future of publication with this too. Some day a microbial species or tissue/cell type will be characterized in a number of ways, and a little video will accompany that paper that you will need to understand to evaluate the data. But that video will require copious amounts of underlying data, quality controlled and curated. I was reading the supplemental information (S1, yes, the 122-page supplement that you also need to evaluate the 28 models that they used), and they acknowledge that they relied on CMR, BioCyc, KEGG, NCBI, and UniProt for annotations, and on a lot of other databases for different aspects as well (oh, and two pages of third-party software). These need to exist and to be correct; this new software is not swooping down and saving us from this type of important work. That said, the new data won’t be in the papers anymore, and it will require even more than a browser (though it will probably still require a browser too). And you will need to know how the software works to evaluate the biology effectively.
Some people are criticizing this work because it’s not really the first and it’s not complete (only 28 modules), and the commentary paper by Freddolino and Tavazoie speaks to that too:
As the authors themselves acknowledge, the present model must be seen as a first draft, more important as a starting point for future refinement than as a productive model in its own right.
We can get to the future from here, but it won’t be a straight line, and sometimes having a dartboard is a good way to start. So throw darts; it’s what we do in science. But I hope some of them will be constructively aimed.
An exciting new article just published in the journal Cell describes an integrated effort to simulate the inner workings of a cell. This Friday, +Stephen Larson will walk through the paper on a Google+ Hangout and explain its implications. Please join us for a discussion on this interesting new development.
I’m going to try to attend this as well. The host, Stephen of OpenWorm, was part of a group I encountered when I talked about that 3D worm software not too long ago (Video Tip of the Week: A 3D Worm!). Because of the excellent resources around worm biology and development, they are going to be well ahead of other communities in modeling. And a note about the journal club: the paper’s lead author saw the invitation and is planning to attend, and has provided links to the papers and supplement over there if you want access and don’t have it now.
Karr, J., Sanghvi, J., Macklin, D., Gutschow, M., Jacobs, J., Bolival, B., Assad-Garcia, N., Glass, J. & Covert, M. (2012). A Whole-Cell Computational Model Predicts Phenotype from Genotype. Cell, 150 (2), 401. DOI: 10.1016/j.cell.2012.05.044