I’ve been following the development of KBase for a long time. A couple of times I started draft blog posts to highlight them. But it always seemed like they were just about ready to launch some new features–so I figured I’d wait–or they had a beta interface in testing and were about to change a lot of things. So I back-burnered my posts.
But I was notified of a new video on their YouTube channel recently, so I decided to have another look. It appears that there is a new drive to show folks what they have, and there is now a lot of mature documentation and outreach, so the timing seemed right. Their new intro is this week’s Video Tip of the Week.
As they describe themselves on their homepage:
KBase is an open platform for comparative functional genomics and systems biology for microbes, plants and their communities, and for sharing results and methods with other scientists.
KBase’s ultimate goal is to enable systems biology researchers to predict and even design biological function.
They have a framework that offers a large number of data sources, but they stress that they aren’t just a database. Their landing page notes that at this time they have “23,000 plant and microbial genomes and over 15,000 metagenomic datasets”.
KBase is not just a database–it includes tools and user interfaces as well as data. We have integrated data from a wide range of public resources. It is also possible for users to upload their own data and decide who to share it with.
But in one of the most definitive warnings I’ve seen ever on genomics software, there’s this part of the terms and conditions:
I was worried what would happen if I ran an analysis on a human protein by accident. A lot of my go-to examples are things I know well, and that have lots of annotation around them because they are medically relevant human examples. But this scares me off a bit. Jes’ sayin’.
Anyway, have a look at the KBase setup. And if you have non-human analyses to do, try out some of those tools. Look for more help on their metabolic modeling tools if that’s something else you might find useful; there’s a video specifically on that feature as well.
Benedict, M., Mundy, M., Henry, C., Chia, N., & Price, N. (2014). Likelihood-Based Gene Annotations for Gap Filling and Quality Assessment in Genome-Scale Metabolic Models PLoS Computational Biology, 10 (10) DOI: 10.1371/journal.pcbi.1003882
Wilke, A., Bischof, J., Harrison, T., Brettin, T., D’Souza, M., Gerlach, W., Matthews, H., Paczian, T., Wilkening, J., Glass, E., Desai, N., & Meyer, F. (2015). A RESTful API for Accessing Microbial Community Data for MG-RAST PLoS Computational Biology, 11 (1) DOI: 10.1371/journal.pcbi.1004008
My twitter feed exploded over the last few days with conflama around the new publication about the simulation of the biological processes in a cell. Most of the ire was aimed at a fawning NYT piece that hyped the paper. And yes–the newspaper article is flawed; Jonathan Eisen spoke to that, and then filleted it in 7 ways (1, 2, 3, 4, 5, 6, 7) and created a Storify about related chatter. Some of the jokes were funny too–my favorite was a future forum help request:
RT @pathogenomenick: Synthanswers.com 2035: Hello I am trying to run M. genitalium on my PC but it doesn’t compile, please help.
That said, I actually think some of this is unfair, and I hope it doesn’t distract from the actual work that was done. Lead author Jonathan Karr and the folks from JCVI and Stanford have accomplished some important and necessary work. Modeling various features of whole cell activities, and developing suitable software to assess, test, predict, and visualize these aspects is going to be hugely important as we continue to move beyond the important–but largely represented as linear–genome sequence data. I have explored some pathway and modeling tools in the past that are also nice directions. (My previous favorite was CellDesigner, and I know PathCase has modeling tools, and there have been a lot of others as well–but these have been aimed mostly at individual pathways and processes, subsets of whole cells.)
But this new paper goes further and loads their software with a number of cellular processes that you can then run and visualize. An accompanying commentary paper describes it as:
It is in this spirit that the authors have produced a first-draft framework for asking and answering systems-level questions by using quantitative cell-scale models.
For this tip I’ll show you one of the movies they generate with this software, and direct you to the project pages for more details and access to the software and data sets. And you can set up and run your own simulations–but that’s beyond the scope of my usual tips, so you’ll have to explore that yourself.
They use 28 different models of cellular process activities–some are suited for metabolic pathways, some for DNA replication, RNA processing, protein decay, and various other things. The supplement is really key here–S1 has the models, the math, and the other crucial things you need to have to understand what was done. Don’t forget to download that if you buy this paper!
The amount of collection, collation, curation, and synthesis required to organize and put these models in the supplement is really impressive–never mind the software. I actually had to look back at the author list by p. 88 of the supplement, and I still can’t figure out why there aren’t several dozen more people on this paper. Of course, they stood on the shoulders of hundreds of others: “Our model is based on a synthesis of over 900 publications and includes more than 1,900 experimentally observed parameters.” And all that stuff in the databases–more than two dozen databases required by my current count.
Once you have the system programmed, you can then do simulated gene knockouts. When they did this they found that they had 79% correspondence with previous observations in vivo for viability. They could also use these types of studies in growth-rate comparisons. Those data were interesting especially when they didn’t match the models: the models are challenged, the biology further investigated, and you could determine where refinements would be needed to account for properties that weren’t already part of the configuration.
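If you want a feel for what that kind of "correspondence" number means, here is a minimal sketch of how one might score it. To be clear, this is not the authors' code or their scoring method: the function name, the gene names, and the viability calls below are all hypothetical toy values, and I am assuming agreement is simply the fraction of genes where the simulated knockout and the in vivo observation give the same viable/non-viable call.

```python
def viability_agreement(predicted, observed):
    """Compare predicted and observed knockout viability, gene by gene.

    Both arguments map a gene name to True (knockout strain is viable)
    or False (not viable). Returns the fraction of shared genes where
    the two calls match, plus the list of genes where they differ.
    """
    shared = sorted(set(predicted) & set(observed))
    if not shared:
        raise ValueError("no genes in common between the two data sets")
    mismatches = [g for g in shared if predicted[g] != observed[g]]
    agreement = (len(shared) - len(mismatches)) / len(shared)
    return agreement, mismatches


# Hypothetical toy data, not taken from the paper.
predicted = {"geneA": True, "geneB": False, "geneC": True, "geneD": False, "geneE": True}
observed = {"geneA": True, "geneB": False, "geneC": False, "geneD": False, "geneE": True}

agreement, mismatches = viability_agreement(predicted, observed)
print(f"agreement: {agreement:.0%}")
print("follow up on:", mismatches)
```

The mismatch list is the interesting part: those are the genes where either the model or the biology needs another look.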
The movies they provide are cool, but almost too much really. I was dreaming of a dashboard where I could slow pieces down and incrementally move along. I would also sort of like to have a Caleydo-style multi-panel view where I could focus on one or two in a couple of panels, with another panel of gene or structure or something.
I had a flash of the future of publication with this too… Some day a microbial species or tissue/cell type will be characterized in a number of ways, and a little video will accompany that paper that you will need to understand to evaluate their data. But that video will require copious amounts of data underneath, quality controlled, and curated. I was reading the supplemental information (S1, yes, the 122 page supplement that you also need to evaluate the 28 models that they used) and they acknowledge that they relied on: CMR, BioCyc, KEGG, NCBI, and UniProt for annotations, and a lot of other databases for different aspects as well (oh, and two pages of 3rd party software). These need to exist and to be correct–this new software is not swooping down and saving us from this type of important work. But–that said, the new data won’t be in the papers anymore, and it will require even more than a browser (but will probably also require a browser still anyway). And you will need to know how the software works to evaluate the biology effectively.
Some people are criticizing this work because it’s not really the first, and it’s not complete (only 28 modules), and the commentary paper by Freddolino and Tavazoie speaks to that too:
As the authors themselves acknowledge, the present model must be seen as a first draft, more important as a starting point for future refinement than as a productive model in its own right.
We can get to the future from here, but it won’t be a straight line and sometimes having a dartboard is a good way to start. So throw darts, it is what we do in science–but I hope some of them will be constructively aimed.
An exciting new article just published in the journal Cell describes an integrated effort to simulate the inner workings of a cell. This Friday, +Stephen Larson will walk through the paper on a Google+ Hangout and explain its implications. Please join us for a discussion on this interesting new development.
I’m going to try to attend this as well. The host of that–Stephen of OpenWorm–was part of a group I encountered when I talked about that 3D-worm software not too long ago (see Video Tip of the Week: A 3D Worm!). Because of the excellent resources around worm biology and development they are going to be well ahead of other communities in modeling. And a note about the journal club: the paper’s lead author saw the invitation and is planning to attend–and has provided links to the papers and supplement over there if you want access and don’t have it now.
Karr, J., Sanghvi, J., Macklin, D., Gutschow, M., Jacobs, J., Bolival, B., Assad-Garcia, N., Glass, J. & Covert, M. (2012). A Whole-Cell Computational Model Predicts Phenotype from Genotype, Cell, 150 (2) 401. DOI: 10.1016/j.cell.2012.05.044
Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…
Shout out to Daniel MacArthur and Paul D. who sought us out at ASHG to say hello. It was great to meet you guys. [Mary]
While I was away last week an update email came from DGV–Database of Genomic Variants. They alert us to this: “The Database of Genomic Variants (DGV) has just been updated, with data added from four new studies. A total of 12,497 new variants have been added, including data from the HapMap III Consortium.” And you can learn more in their newsletter (a PDF). [Mary]
We spend a lot of time exploring genomic data, variations, and annotations. But of course a linear perspective on the genes and sequences is not the only way to examine the data. Understanding the pathways in which genes and molecular entities interact is crucial to understanding systems biology.
There are a number of tools which can help you to visualize and explore this kind of data. KEGG is one of the most venerable tools in bioinformatics, BioCyc is well known and used, and Reactome is one of our favorites. Recently NCBI BioSystems has come along, and the BioModels tool at EBI provides more data of this type as well. Pathway Interaction Database is another place to try. What you’ll find is that each one has a different emphasis, species focus, or set of available data, and different tools to graphically display the databases. The ways to customize or interact with the data will vary as well. So you may need to try several to find the one you want for your purposes.
But for today’s tip of the week I will highlight PathCase, a Pathways Database System from Case Western Reserve University. This is a tool I’ve had my eye on for a number of years, and they continue to add new features and data sets to their visualization and search interface which are very nicely done.
PathCase offers you several ways to browse and search for pathways, processes, organisms, and also molecular entities (such as ATP, ions, etc) as well as genes and proteins. It’s all integrated into the system, so when you find an item of interest you can move to the other related pieces. For example, from the Pathways you can find genes and learn more about the genes. From genes you can load the pathways in which they participate.
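For anyone wondering what "integrated" could look like underneath, here is a minimal sketch of a two-way lookup between pathways and genes. This is only an illustration of the idea and says nothing about how PathCase is actually built; the function name and the pathway and gene names are hypothetical placeholders, not real PathCase records.

```python
from collections import defaultdict


def build_indexes(memberships):
    """Build two lookup tables from (pathway, gene) pairs.

    One table answers "which genes are in this pathway?" and the
    other answers "which pathways does this gene participate in?".
    """
    pathway_to_genes = defaultdict(set)
    gene_to_pathways = defaultdict(set)
    for pathway, gene in memberships:
        pathway_to_genes[pathway].add(gene)
        gene_to_pathways[gene].add(pathway)
    return pathway_to_genes, gene_to_pathways


# Hypothetical placeholder records, not real PathCase data.
memberships = [
    ("pathway_1", "gene_a"),
    ("pathway_1", "gene_b"),
    ("pathway_2", "gene_b"),
    ("pathway_2", "gene_c"),
]

pathway_to_genes, gene_to_pathways = build_indexes(memberships)

# Start at a pathway, list its genes, then hop from one gene
# to the other pathways that gene takes part in.
genes_in_p1 = sorted(pathway_to_genes["pathway_1"])
related = sorted(gene_to_pathways["gene_b"] - {"pathway_1"})
print(genes_in_p1)
print(related)
```

Keeping both directions indexed is what lets you move from a pathway to its genes and back out to neighboring pathways without searching from scratch each time.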
When you have the pathway graphics loaded, you can interact with that pathway by clicking, dragging, re-organizing and more. Right-clicking offers more details about the items and ways to visualize the data. One option I didn’t have time to show in the movie is that you can use the H2O/CO2 box to load pathways that are linked to the one you are looking at, going even further along any route that you might be interested in. Here’s just a quick sample of that: from the NARS2 gene page I loaded the alanine pathway, and then added the fatty acid metabolism pathway. Now I can explore both of them with all the standard PathCase tools and understand many of their relationships. Once you start exploring these pathways you will be amazed at how complex the possible visualizations are.
So if you are interested in biological pathways, exploring them and representing them, check out PathCase.
Recently while watching the #bioinformatics tag on Twitter I saw Khader Shameer mention Caleydo. I was instantly hooked by the very clever visualization strategy that they are using to provide more surface area for examining the data you are interested in viewing. Their specific topics are pathways and gene expression, but it got me thinking about various data types that I would like to see connected in this way. This week’s Video Tip of the Week is about this software.
Caleydo delivers a 3D representation of the expression and pathway data. The main user interface has an area that is a box. They call it a bucket, but in my head buckets are round, so I think of this as a box. On the floor of the box you have a graphic. But because you also have 4 interior surfaces of the box you have 4 more places to display and link the data. You can have a heat map microarray representation on one side, and various pathways associated with the genes in that microarray on the other sides.
There’s a short systems biology Application Note in Bioinformatics that describes the framework and gives an overview of the tool. But there’s also a more detailed paper over at their publication site that will get you started (that 2010 paper for the Visualization conference in Taipei).
My computer is a bit underpowered, but I was able to load their webstart version and begin to look around. They provide some sample data you can select and examine. For the movie this week, though, I was unable to load that and run the recording software at the same time. So mostly it’s an introduction to the concept and the site. You’ll have to go over and load it up yourself to try it out. If the webstart version doesn’t work for you, there are a couple of other download options for different platforms.
The Caleydo team has also done a YouTube overview of the features that you can examine.
From a systems biology mailing list I’m on, I got this announcement. As I no longer qualify as…ah…young…apparently…I’m going to pass this along to young investigators interested in this area. Personally if I was starting out a career in biology/bioinformatics/genomics at this time I would move in this direction. There’s some very exciting stuff developing there, and it looks to me like there’s room to find a niche.
Anyway, here’s the announcement, reprinted with permission. There’s no web site yet; they tell me they’ll let me know when one is live.
The European Commission is funding the FutureSysBio project, a coordination and support action that aims at shaping and predicting the future development of the field Systems Biology. We are now planning for the first focused workshop, which will discuss the topic “What is needed for SysBio to enter the clinic”?
Focus will be very much on open discussions and brainstorming around specific topics and questions. The outcome of the workshop will be documented and disseminated in various reports to different target groups (funding agencies, industry, the scientific community, media, the general public) and possibly as a publication in a suitable scientific journal. Social events will also be organised.
The workshop will take place in Gothenburg, Sweden, with arrival in the morning of Nov. 19 and departure in the afternoon of Nov. 21.
We are looking for ten young scientists (max five years after PhD) who would like to actively participate in the workshop. Please send by e-mail (martin.markstrom[AT]gu.se): (i) a one-page letter of application describing your interests and why you would like to participate; (ii) a one page CV; (iii) the name and email address of a senior scientist at your department who can act as your reference.
Your application should reach us by mail no later than Oct. 20. Successful applicants will be notified by Oct. 24. The FutureSysBio project will cover participants’ costs as far as they are within the usual range (economy class ticket).
With the best wishes,
Stefan Hohmann and Jens Nielsen
Check it out. Pass it along to interested colleagues.
We are thinking quite a bit about pathway tools these days. I got a jolt of input on them from the ICSB meeting recently, too. As I continue to progress through my meeting notes I’ll be checking out more tools and writing about them.
One of the things that seemed new and important (well, to me at least) was the first release of a set of standards for drawing pathway diagrams. During this meeting they announced the release of 1.0 of the SBGN notations (also known as Level 1; more levels are anticipated as this progresses). SBGN is Systems Biology Graphical Notation. You can access the SBGN site here: http://www.sbgn.org/
The idea is that if we can standardize our representations of pathways we can all be sure that the meanings are the same for arrows, and boxes, and so on. For example, if I draw a pathway with an arrow, my arrow means the same thing as your arrow in your pathway diagram.
What the SBGN team says on their homepage:
SBGN defines a comprehensive set of symbols with precise semantics, together with detailed syntactic rules defining their use and how diagrams are to be interpreted.
So this is kind of like Gene Ontology for systems and pathway diagrams. Not only is it increasing the clarity of the diagrams, it will guide software development so that software for generating diagrams and analytical tools can work in this framework as well.
There are examples on the web site. Check out the document on the specifications (a big PDF), too–lots of detail on what, how, and why this is important. They want feedback on this. You can also check out the associated SBGN wiki with more details and the SBGN forum for interaction with the team.
Still enjoying the ICSB meeting, and it is a gorgeous fallish morning in Göteborg. What a great city and terrific people here. Not entirely sure I want to come home….
My brain is approaching “full” already, and there are still several days to go. I’ll have a lot of tools to talk about in the coming weeks as I check more of them out. But I wanted to talk about a couple of neat tools that I have heard about so far. First–CellDesigner 4.0, that I mentioned the other day, was a good choice of tutorials to attend for sure. You can access their tutorial material here. Turns out they are also about to release a web-based version of this that will be a collaborative community editing tool for networks and pathways. It is called Payao–which I’m told means a type of “fish-aggregating device” according to their poster. I was unable to catch the poster authors so far to discuss it further, but it looked like a neat tool. I can’t find a release on the web yet and there was no URL on the poster. I’ll try to track them down again today.
Another fun tool (which I haven’t had a chance to use much yet) is BioMyn. The idea behind BioMyn is that it is something like a Google search for systems biology and other relevant biological data types. It aggregates a lot of tools into a single search; here’s a partial list: ensembl, MINT, GAD, HPRD, Corum, InterPro, PDB, OMIM, GO, Reactome, KEGG, UniProt, HiMap, IntAct, GNF, and DIP. I spoke to Fidel Ramirez, the creator, about this tool and he was very eager to have users and feedback on this new beta phase. He was saying that people have suggested the results link should be re-organized a bit. If you do a search you get a list of results and some context. The link at the top goes to your “best” resource hit–leaving BioMyn. But if you click the link at the bottom of the result (View all annotations for …) you go to the aggregated results in BioMyn. Organized into a collection of data in tabs, you can find a wealth of information on this gene. You can find gene links, of course, but also diseases, pathways, interactions, GO terms, and on and on. Anyway–check it out. And keep in mind it is beta. Feel free to offer feedback here and I will pass it along to the developers–they don’t have a feedback link on the site yet. But they do have a blog, so I suppose you could put comments over there. In fact, I’ll suggest that to the team today if I see them.
I’m in Göteborg today, at the ICSB International Conference on Systems Biology. I’m taking a couple of the tutorials today. My first one is on the Edinburgh Pathway Editor or EPE (description of this can be found in this PDF of the session). They have made available a 2.0 version for the attendees here. I’m eager to learn more about this.
The next session I signed up for is CellDesigner 4.0. You can learn more about it from the session PDF here.
Both of these tools require downloading and installation. We usually focus more on web-based software, but these systems biology tools often require you to install and run them locally. You can get the software directly from these sites:
The Jackson Lab courses are really terrific–I took a couple of them while I was up there for my post-doc. This announcement just came over the mailing list and I thought I would pass it along. I’m thinking a lot about systems biology and the tools for it right now (more details on that later), and so I’m going to need more people generating data so I can play with the tools. And September in Maine is a nice time, btw:
Applications continue to be accepted for the Short Course on Systems Genetics being held September 23-29, 2008 at The Jackson Laboratory in Bar Harbor, Maine.
This one-week course will cover computational and experimental approaches to genetic studies that utilize the whole genome approaches. Lectures and computer workshops are designed to accommodate students with a wide variety of backgrounds. Biologists seeking to gain a deeper understanding of statistical and computational methods as well as quantitative scientists desiring exposure to biological problems are welcome. Topics to be covered include genetic mapping, gene expression microarray analysis and computational modeling of complex systems.