
(re)Funding Databases II

So, I wrote about defunding resources and briefly mentioned a paper in Database about funding (or ‘re’funding) databases and resources. I’d like to discuss this a bit further. The paper, by Chandras et al., discusses how databases and, to use their term, Biological Resource Centers (BRCs) can maintain financial viability.

Let me state first that I completely agree with their premise: databases and resources have become essential. The earlier model of “publication of experimental results and sharing of the related research materials” needs to be extended. As they state:

It is however no longer adequate to share data through traditional modes of publication, and, particularly with high throughput (‘-omics) technologies, sharing of datasets requires submission to public databases as has long been the case with nucleic acid and protein sequence data.

The authors correctly point out that the financial model for most biological databases (and we are talking about the thousands that exist) has often been 3-5 years of development funding; once that runs out, the infrastructure must be supported by another source. This has led to the defunding of databases such as TAIR and VBRC (and many others), excellent resources with irreplaceable data and tools that must then struggle to find funding to cover the considerable costs of maintaining infrastructure and continuing development.

The demands of scientific research for open, shared data require a funding model that preserves the publicly available nature of these databases. Hence the problem, as they state it:

If, for financial reasons, BRCs are unable to perform their tasks under conditions that meet the requirements of scientific research and the demands of industry, scientists will either see valuable information lost or being transferred into strictly commercial environment with at least two consequences: (i) blockade of access to this information and/or high costs and (ii) loss of data and potential for technology transfer for the foreseeable future. In either case the effect on both the scientific and broader community will be detrimental.

Again, I agree.

They discuss several possible solutions for maintaining the viability of publicly available databases, including a public-private, two-tier system in which for-profit users pay an annual fee and academic researchers have free access. They mention UniProt, which went through a funding crisis over a decade ago, as an example: UniProt (then Swiss-Prot) returned to complete public funding in 2002. Several other databases are still attempting to fund themselves with this model. BioBase is one company into which several databases have been folded; TRANSFAC is one of them. A free, reduced-functionality version is available to academics through gene-regulation.com, and the full version is available by subscription from BioBase. The free version allows some data to be shared, as can be seen at VISTA or UCSC. I am not privy to the finances of BioBase or similar ventures, and I assume this model will work for some, but I agree with the authors that many useful databases and resources would be hard-pressed to sustain themselves this way.

Another possibility is to bring databases fully under a single public institution’s funding mechanism. The many databases at NCBI and EBI fit this model, and there is even a recent case of a resource being folded into NCBI this way. Again, this works for some, but not all, useful resources.

Most resources will have to find a variety of ways to fund themselves. Given the importance of doing so, it is imperative that viable models be found. The authors reject advertising out of hand. As they mention, most advertisers will not be drawn to a website without at least 10,000 visitors per month. There might be some truth to this (and I need to read the reference they cite to back that up).

But the next model they suggest seems to me to have the same drawback. In this model, the database or resource would enter a ‘partnership of core competencies.’ An example they cite is MMdb (not to be confused with MMDB). This virtual mutant mouse repository provides direct, trial links from its gene information to the corresponding Invitrogen product pages. They mention that although six companies were approached, only one responded. This model would seem to have the same problem as selling advertising directly.

They also mention that, at least for their research community of mouse functional genomics, “Institutional Funding” seems the best solution for long-term viability and open access. Unfortunately, until institutions like NIH and EMBL are willing or able to fund these databases, I’m not sure that’s a solution.

As they mention in the paper, the amount and variety of data being generated is growing exponentially. I am not sure government or institutional funding can keep up with the cost of the infrastructure needed to maintain and further develop these databases so that all the data generated remain publicly and freely accessible.

Information should be free, but unfortunately it is not without cost. It will be interesting to see how the funding of databases and resources evolves in this fast-growing genomics world (and it is imperative that we figure out solutions).

PS: On a personal note, the authors use their resource, EMMA (European Mouse Mutant Archive), as an example in the paper. I like the name since it’s my daughter’s name, but it just goes to show that names come in waves. We named our daughter thinking few others would choose the same name. When even databases are given that name, you know that’s not the case.

Chandras, C., Weaver, T., Zouberakis, M., Smedley, D., Schughart, K., Rosenthal, N., Hancock, J., Kollias, G., Schofield, P., & Aidinis, V. (2009). Models for financial sustainability of biological databases and resources. Database, 2009. DOI: 10.1093/database/bap017

Tip of the Week: Scitable Classrooms

Two weeks ago I pointed out our paper on genomic data resources in Scitable. This week’s tip is about Scitable itself. Scitable is a collaborative education site with reviewed content and lots of opportunities for experts, students and the public to interact and learn some biology. This tip will show you how to create a classroom in Scitable. A faculty member could use one to teach a group of students with both Scitable and outside content, or it could serve as a great place for a study group. I’m actually intrigued by that latter possibility. Check out this week’s tip to learn how to quickly set up a classroom on Scitable, and then check out Scitable itself.

Tip of the Week: Genomic Data Resources

Today’s tip isn’t on just one resource or a part of one; it’s a whirlwind tour of many! In fact, in less than 5 minutes I’ll take you through ALL of the genomics resources online. Ok, well, not all. Not even 1%, I don’t think, but we’ll go through some categories and very briefly discuss some “Challenges and Promises.” Actually, it’s my way of introducing you to a paper we just wrote for Nature Education’s site, Scitable, entitled, you guessed it, “Genomic Data Resources: Challenges and Promises.” Hopefully, after watching this short movie as I whisk you through some of the sites and topics we raise in the paper, you’ll be interested in reading it and checking out Scitable. Also, below the fold, you’ll find a nice table of links to all the resources we discuss in the paper, which by necessity are only a small portion of the thousands of resources available.


OpenHelix and UCSC authors publish introduction to the use of the UCSC Genome Browser

Mary Mangan and other OpenHelix authors, along with Donna Karolchik of UCSC, have published an introduction to the vast amount of information available from the UCSC Genome Browser in the journal Biotechnology Annual Review. In the review, entitled “UCSC Genome Browser: Deep support for molecular biomedical research,” the authors describe the diverse types of annotated sequence data, the “complex database queries [that] are also easily achieved with the Table Browser interface” and many of the associated tools available at the site. As the authors state,

The volume and complexity of genomic sequence data, and the additional experimental data required for annotation of the genomic context, pose a major challenge for display and access for biomedical researchers. Genome browsers organize this data and make it available in various ways to extract useful information to advance research projects.

The UCSC Genome Browser is one such excellent genome browser, and this review is a step toward helping users find and use the wealth of data available to them at this UCSC resource.

Mangan, M., Williams, J., Lathe, S., Karolchik, D., & Lathe III, W. (2008). UCSC Genome Browser: Deep support for molecular biomedical research. Biotechnology Annual Review, 14, 63-108. DOI: 10.1016/S1387-2656(08)00003-3

Additionally, to find out more about the UCSC Genome Browser, users can freely access the online tutorials, exercises, slides, Quick Reference Cards and other training materials created by OpenHelix scientists and sponsored by UCSC at the UCSC Genome Browser tutorial landing page. To find additional free and sponsored tutorial suites, visit OpenHelix Sponsored Tutorials. Dozens of additional tutorial suites are also available by subscription in the OpenHelix Tutorial Catalog. Do not forget to visit the OpenHelix Blog for up-to-date information on genomics.

About OpenHelix
OpenHelix, LLC (http://www.openhelix.com) provides the genomics knowledge you need when you need it. OpenHelix currently provides online self-run tutorials and on-site training for institutions and companies on the most powerful and popular free, web-based, publicly accessible bioinformatics resources. In addition, OpenHelix is contracted by resource providers to provide comprehensive, long-term training and outreach programs.