Ok, it’s hot now, but it’s something we refer back to all year long. For people who don’t know about the NAR Database Issue: since the mid-90s, Nucleic Acids Research has been collecting bioinformatics databases and tools that are of use to a huge range of researchers. We’ve watched it grow over the years, and we’ve even graphed it. We’ll have to update that graph with the new data point for this year, but here’s the graph as we published it last year:
(You can get this figure from our paper; it is Figure 1.)
You can see steady growth in the resources collected in the NAR set. But that’s certainly not all of them: others can be found in NAR’s Web Server issue in the summer, and some just aren’t listed in many places. We estimate that there are roughly 3000 tools and resources of some sort around.
A nice overview of the state of play is always provided in the introduction paper for that issue. As the authors state, this year their list is up to 1330 data sources. They also highlight a couple of editorials that address important issues in this arena. One is about the need for data sources to talk to each other. This is an important point:
“these databases risk functioning increasingly as isolated islands in a sea of disparate biological data”
Another editorial speaks to how well we understand the data we have in our hands, and the need to understand it better. It describes COMBREX, a very cool effort:
“This project is designed to serve as a clearinghouse, collecting functional predictions from specialists in bioinformatics and functional genomics and then sending these predictions for testing by experimentalists.”
This is the kind of thing that makes me wish I still had a lab. There’s so much opportunity here… alas. The road not taken. But it’s a hot opportunity for smart youngsters who might like to carve out a niche with a lab that mines the computational materials and pairs them with great projects for students to do the bench characterizations. And COMBREX offers grants to do this work.
Anyway, check out the NAR database issue. It’s worth your time. Really.
EDIT: There’s a fun and interesting crowd-sourced analysis going on at BioStar, looking through the databases in the NAR list for features that are useful to bioinformatics geeks.
Williams, J., Mangan, M., Perreault-Micale, C., Lathe, S., Sirohi, N., & Lathe, W. (2010). OpenHelix: bioinformatics education outside of a different box. Briefings in Bioinformatics, 11(6), 598-609. DOI: 10.1093/bib/bbq026
Galperin, M., & Cochrane, G. (2010). The 2011 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection. Nucleic Acids Research, 39(Database). DOI: 10.1093/nar/gkq1243
Gaudet, P., Bairoch, A., Field, D., Sansone, S., Taylor, C., Attwood, T., Bateman, A., Blake, J., Bult, C., Cherry, J., Chisholm, R., Cochrane, G., Cook, C., Eppig, J., Galperin, M., Gentleman, R., Goble, C., Gojobori, T., Hancock, J., Howe, D., Imanishi, T., Kelso, J., Landsman, D., Lewis, S., Mizrachi, I., Orchard, S., Ouellette, B., Ranganathan, S., Richardson, L., Rocca-Serra, P., Schofield, P., Smedley, D., Southan, C., Tan, T., Tatusova, T., Whetzel, P., White, O., Yamasaki, C. (2010). Towards BioDBcore: a community-defined information specification for biological databases. Nucleic Acids Research, 39(Database). DOI: 10.1093/nar/gkq1173
Roberts, R., Chang, Y., Hu, Z., Rachlin, J., Anton, B., Pokrzywa, R., Choi, H., Faller, L., Guleria, J., Housman, G., Klitgord, N., Mazumdar, V., McGettrick, M., Osmani, L., Swaminathan, R., Tao, K., Letovsky, S., Vitkup, D., Segre, D., Salzberg, S., Delisi, C., Steffen, M., & Kasif, S. (2010). COMBREX: a project to accelerate the functional annotation of prokaryotic genomes. Nucleic Acids Research, 39(Database). DOI: 10.1093/nar/gkq1168