From Deepak Singh:
Scientists spend years collecting and generating increasing amounts of data. The data ranges from raw instrument data, “finished” data (e.g. a genome sequence which is constructed after aligning all the short reads from a next-gen sequencer), and annotated data, which has been marked up to add additional information. We have repositories where a lot of this data goes, RCSB, NCBI, etc. In many cases there is clarity in these destinations and for the better part, resources like RCSB and NCBI are well funded and long lived (although I am always nervous about RCSB). However, many data repositories are dependent on funding, with no guarantees that the funding will be renewed. Given the size of some of these data resources, shouldn’t we be thinking of a more sustainable model for funding? This is a general problem for infrastructure resources, given the cost and the fact that you shouldn’t be looking at these from a 3-5 year perspective. This especially baffles me when libraries come into play. Shouldn’t the timescale there be in the 10’s of years?
mndoci.com, The disconnect in funding data resources, Oct 2009
You should read the whole article.
A recent example of this is the Arabidopsis resource TAIR (The Arabidopsis Information Resource).
It is an excellent resource (tutorial), but their homepage currently includes a plea for a change in how long-term research data infrastructure is funded. Their previous NSF grant expired, and the current one provides dramatically decreasing levels of funding (as you can see in the image above). They’ve been encouraged to find other funding sources, such as subscriber fees, and the NSF is considering other ways to change its funding mechanisms.
Dealing with genomics and biology databases on a daily basis, we have seen this all too often. Funding exists to create and develop an excellent resource, but mechanisms to maintain these resources are hard to come by. As Deepak and the TAIR developers suggest, we researchers need to have a discussion about how to build a more sustainable model that keeps this data freely available and accessible to researchers for long periods of time.