Tag Archives: vbrc

(re)Funding Databases II

So, I wrote about defunding resources and briefly mentioned a paper in Database about funding (or ‘re’funding) databases and resources. I’d like to discuss this a bit further. The paper, by Chandras et al., discusses how databases and, to use their term, Biological Resource Centers (BRCs) can maintain financial viability.

Let me state first that I completely agree with their premise: databases and resources have become imperative. The earlier model of “publication of experimental results and sharing of the related research materials” needs to be extended. As they state:

It is however no longer adequate to share data through traditional modes of publication, and, particularly with high throughput (‘-omics) technologies, sharing of datasets requires submission to public databases as has long been the case with nucleic acid and protein sequence data.

The authors state, factually, that the financial model for most biological databases (we are talking about the thousands that exist) has often been 3-5 years of development funding; once that runs out, the infrastructure needs to be supported by another source. In fact, this has led to the defunding of databases such as TAIR and VBRC (and many others), excellent resources with irreplaceable data and tools, which must then struggle to find funding to cover the considerable costs of maintaining infrastructure and continuing development.

The demands of scientific research for open, shared data require a funding model that maintains the publicly available nature of these databases. And thus the problem, as they state it:

If, for financial reasons, BRCs are unable to perform their tasks under conditions that meet the requirements of scientific research and the demands of industry, scientists will either see valuable information lost or being transferred into strictly commercial environment with at least two consequences: (i) blockade of access to this information and/or high costs and (ii) loss of data and potential for technology transfer for the foreseeable future. In either case the effect on both the scientific and broader community will be detrimental.

Again, I agree.

They discuss several possible solutions for maintaining the viability of publicly available databases, including a private-public dual-tier system where for-profit companies pay an annual fee and academic researchers have free access. They mention UniProt, which underwent a funding crisis over a decade ago, as an example; UniProt (then Swiss-Prot) went back to complete public funding in 2002. Several other databases are still attempting to fund themselves with such a model. BioBase is one company into which several databases have been folded, TRANSFAC among them. A free, reduced-functionality version of TRANSFAC is available to academics through gene-regulation.com, and the fuller version is available by subscription at BioBase. The free version allows some data to be shared, as one can see at VISTA or UCSC. I am not privy to the financials of BioBase and similar models, and I assume this approach will work for some, but I agree with the authors that many useful databases and resources would be hard-pressed to be maintained this way.

Other possibilities include fully incorporating databases under a single public institution’s funding mechanism. The many databases of NCBI and EBI fit this model. In fact, there is even a recent case of a resource being folded into this model at NCBI. Again, this works for some, but not all, useful resources.

Most will have to find a variety of ways to fund their databases. Considering the importance of doing so, it is imperative that viable models are found. The authors reject advertising out of hand. As they mention, most advertisers will not be drawn to website advertising without a visibility of at least 10,000 visitors per month. There might be some truth to this (and I need to read the reference they cite to back that up).

But the next model they suggest seems to me to have the same drawback. In this model, the database or resource would have a ‘partnership of core competencies.’ An example they cite is MMdb (not to be confused with MMDB). This virtual mutant mouse repository provides direct trial links from its gene information to the Invitrogen product page. They mention that although six companies were approached, only one responded. It would seem that this model has the same issues as directly selling advertising.

They also mention that, at least for their research community of mouse functional genomics, “Institutional Funding” seems the best solution for long-term viability and open access. Unfortunately, until institutions like NIH and EMBL are willing or able to fund these databases, I’m not sure that’s a solution.

As they mention in the paper, the amounts and types of data being generated are growing exponentially. I am not sure that government or institutional funding can keep up with the cost of the infrastructure needed to maintain and further develop these databases so that all the data generated can remain publicly and freely accessible.

Information should be free, but unfortunately it is not without cost. It will be interesting to see how the funding of databases and resources evolves in this fast-growing genomics world (and it is imperative that we figure out solutions).

PS: On a personal note, the authors use their resource, EMMA (European Mouse Mutant Archive), as an example in the paper. I like the name since it’s my daughter’s name, but it just goes to prove that names come in waves. We named our daughter thinking few others would choose the same name. When even databases take the same name, you know that’s not the case.

Chandras, C., Weaver, T., Zouberakis, M., Smedley, D., Schughart, K., Rosenthal, N., Hancock, J., Kollias, G., Schofield, P., & Aidinis, V. (2009). Models for financial sustainability of biological databases and resources. Database, 2009. DOI: 10.1093/database/bap017

Tip of the Week: ViralZone at ExPASy

For today’s tip I am going to continue with the theme I started yesterday: obtaining swine flu information. There are many wonderful viral resources that are publicly available. I linked to NCBI’s Influenza Virus Resource in yesterday’s post, and you can watch past tips that Trey has done on the Viral Bioinformatics Resource Center (VBRC) here and here. But for today’s tip I’d like to introduce you to ViralZone, which is available from the ExPASy Proteomics Server. ViralZone was created by the Swiss Institute of Bioinformatics (SIB) and provides a really clear, concise introduction to each virus as well as links to a wide variety of detailed information.

Cold genomes

Recently, we have been learning a lot about the cold virus. The genomes of many of these viruses have now been sequenced (that is a subscription-required Science report; you can read more about the report here).

You can find more genomic information at picornaviridae.com and at NCBI’s Entrez Genomes, and some structural information at MMDB. (Just a side note: rhinovirus is now classified as an enterovirus.)

Bioinformatics in the local news

I’m on a few local mailing lists, including the one for MassHighTech. I was perusing the news today and saw this tidbit:

URI taps De Groot to head new vaccine research center

….The purpose of the new program, called the Institute for Immunology and Informatics, is to create vaccines to prevent AIDS, malaria, tuberculosis, dengue fever and other diseases. Researchers will use cutting-edge bioinformatics tools to speed up creating treatments and cures for these illnesses, stated URI. This includes using immunomics — informatics, genomics and immunology –- to design better vaccines, diagnostics and therapeutics…..

Immunomics. Another -omic.  Just what I needed.

Actually, though, I think this is neat, and I would like to keep an eye on what they are doing. As an undergrad in microbiology I was talked out of a career in infectious diseases; I was told that funding was dissipating and there wasn’t much interest in the field anymore. But that was probably the most memorable course I took.

Oh, and if you find yourself here because you are looking for some resources for infectious disease genomes, I’ll add a few here.  Feel free to add your other favorites.

VBRC: http://athena.bioc.uvic.ca/

IMG: http://img.jgi.doe.gov/

GSID: http://www.gsid.org/

CMR: http://cmr.jcvi.org/

VectorBase: http://www.vectorbase.org/

EUPathDB: http://eupathdb.org/eupathdb/

BREAKING (really):  Court says vaccine not to blame for autism

Tip of the Week: A quick annotation of a genome

Hey, say you’ve got a bacterial genome you just sequenced in your spare time (hey, the way technology is going, it’s not far off) and you need to do a quick and dirty annotation to get you started. Well, there are several tools out there to do that: predict genes, annotate regions, etc. In this tip I’d like to show you one you might not have thought of that could be a useful way to get started. It’s GATU (Genome Annotation Transfer Utility) at VBRC. As the name suggests, it doesn’t do any real gene prediction; instead, it compares your genome to a closely related genome (the closer the better, of course) and transfers the annotation from the characterized genome. It comes from a viral resource (VBRC), but it works just as well with bacterial genomes, something that might not be obvious, and that puts another tool in your belt.
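To give a feel for what “transferring annotation” means, here is a deliberately simplified sketch in Python. It is not how GATU works internally: a real transfer tool has to tolerate mismatches, indels and repeats, and GATU relies on similarity searches against the reference rather than exact matching. This version, with hypothetical names like Feature and transfer_annotation, only carries a feature over when its reference sequence appears exactly, on either strand, in the new genome.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Complement table for DNA (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class Feature:
    name: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # '+' or '-'


def transfer_annotation(
    ref_seq: str, ref_features: List[Feature], new_seq: str
) -> Tuple[List[Feature], List[str]]:
    """Hypothetical helper: copy features from ref_seq onto new_seq by exact match.

    Only the first exact hit is used; mismatches, indels and repeated
    sequences are ignored in this sketch.
    """
    transferred: List[Feature] = []
    unmapped: List[str] = []
    for f in ref_features:
        # Feature sequence as written on the reference '+' strand.
        segment = ref_seq[f.start:f.end]
        hit = new_seq.find(segment)
        if hit != -1:
            # Same orientation in both genomes: keep the strand.
            transferred.append(Feature(f.name, hit, hit + len(segment), f.strand))
            continue
        hit = new_seq.find(revcomp(segment))
        if hit != -1:
            # Region is inverted in the new genome: flip the strand.
            flipped = "-" if f.strand == "+" else "+"
            transferred.append(Feature(f.name, hit, hit + len(segment), flipped))
        else:
            # No exact copy found on either strand.
            unmapped.append(f.name)
    return transferred, unmapped
```

For example, with the reference "AAAATGCCCTAGTTTT" and a feature geneA spanning 3 to 12 on the '+' strand, mapping onto "GGATGCCCTAGCC" places geneA at 2 to 11 on '+'. Anything that doesn’t map is handed back as a list of names, which is roughly where the manual curation would start.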


Tip of the Week: Tweaking those alignments

This week’s tip introduces a nice feature and tool of the Viral Bioinformatics Resource Center (VBRC). There are a lot of great tools at the VBRC to search and analyze hundreds of viral genomes. Most, if not all, of the tools can also be used to search and analyze bacterial genomes. The tool we are introducing in this tip is Base by Base. This tip actually came from a question one of our readers asked in our weekly “WYP” feature a few weeks back. Reader Azalea asked:

I’m looking for a pairwise sequence alignment tool which can anchor specific nucleotides to be arbitrarily aligned. I just hope to fix certain positions to be aligned, which will change the whole alignment.

Chris Upton at VBRC suggested Base by Base. I’ve had the opportunity to use Base by Base; it’s a useful tool for working with pairwise alignments (it could probably be used for any two sequences, not just bacterial and viral ones) and looks like a tool that Azalea might be able to use. Today’s tip quickly shows you how to add two sequences, align part of them by hand, and select another region to align by algorithm (a choice of T-Coffee, ClustalW or MUSCLE).
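Base by Base’s own implementation isn’t described here, so the following Python sketch only illustrates the general idea behind anchoring that Azalea asked about: the positions you fix are forced into the same column, and only the stretches between anchors are aligned by an algorithm. For simplicity it uses a basic Needleman-Wunsch global alignment with arbitrary scores (match +1, mismatch -1, gap -2) in place of T-Coffee, ClustalW or MUSCLE; the function names nw_align and anchored_align are hypothetical.

```python
from typing import List, Tuple

# Arbitrary illustrative scores (not Base by Base's settings).
MATCH, MISMATCH, GAP = 1, -1, -2


def nw_align(a: str, b: str) -> Tuple[str, str]:
    """Global Needleman-Wunsch alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            score[i][j] = max(diag, score[i - 1][j] + GAP, score[i][j - 1] + GAP)
    # Trace back from the bottom-right corner to rebuild the alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            MATCH if a[i - 1] == b[j - 1] else MISMATCH
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_align(a: str, b: str, anchors: List[Tuple[int, int]]) -> Tuple[str, str]:
    """Align a and b so that each anchor (i, j) puts a[i] and b[j] in one column.

    Anchors are 0-based and must increase strictly in both sequences.
    """
    out_a: List[str] = []
    out_b: List[str] = []
    prev_i = prev_j = 0
    for i, j in anchors:
        if not (prev_i <= i < len(a) and prev_j <= j < len(b)):
            raise ValueError(f"invalid anchor ({i}, {j})")
        # Align the stretch before this anchor, then emit the anchored column.
        seg_a, seg_b = nw_align(a[prev_i:i], b[prev_j:j])
        out_a += [seg_a, a[i]]
        out_b += [seg_b, b[j]]
        prev_i, prev_j = i + 1, j + 1
    # Align whatever remains after the last anchor.
    seg_a, seg_b = nw_align(a[prev_i:], b[prev_j:])
    out_a.append(seg_a)
    out_b.append(seg_b)
    return "".join(out_a), "".join(out_b)
```

For example, anchored_align("AAT", "TAA", [(2, 0)]) returns ("AAT--", "--TAA"): the two T’s are forced into the same column, and the A’s on either side end up opposite gaps. That is exactly the kind of “fix this position and let it change the whole alignment” behavior described in the question.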


Quick intro to Viral Bioinformatics Resource Center

I just spent a bit over an hour getting some one-on-one time with Chris Upton at the Viral Bioinformatics Resource Center. He was showing me the tools and resources they have (we used the new iChat screen-sharing feature of Mac OS X 10.5 Leopard; that alone was worth the cost of upgrading to 10.5 this week, or even dumping your PC and getting a Mac ;-)… but I digress…) and they look quite useful. You can analyze a large number of viral genomes in their database (or upload your own, or a bacterial genome for that matter) in many different ways. Their webpage navigation and look are going to change soon (and I’ll let you know when they do), but the software and tools will remain the same (they are mainly Java programs), so if you want, you can go check them out now. I suggest starting with VOCS, which is a sort of advanced search/filter/browser for viral genomes and from which you can access several of the other tools. We’ll be looking at these tools in more depth (I have a couple of tips and a tutorial planned for the near future), but I thought I’d point them out to you now. Quite nice.