Video Tip of the Week: BioProject, it’s where to start finding data (hint: not so much the papers anymore)

A few months ago, Jennifer did a nice tip on NCBI’s Genome Resources and the changes there. In it she briefly mentioned that the Genome Project resource moved to a new home, BioProject, about a year ago. Today, I’d like to give you a quick overview of BioProject. It was described in this year’s NAR database issue: “BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata.” From the abstract:

As the volume and complexity of data sets archived at NCBI grow rapidly, so does the need to gather and organize the associated metadata. Although metadata has been collected for some archival databases, previously, there was no centralized approach at NCBI for collecting this information and using it across databases. The BioProject database was recently established to facilitate organization and classification of project data submitted to NCBI, EBI and DDBJ databases.

This is just one step the biological science community will have to take to get a handle on the data deluge. If scientists are to keep track of the projects and data being generated at breakneck speed, a key is knowing what data is being generated and organizing it by project.

As Mary (and we here at OpenHelix) keep not-so-gently reminding you, the data isn’t in the papers anymore. Huge projects like 1000 Genomes and ENCODE, together with falling sequencing costs, produce so much data that finding it is difficult.

BioProject grew out of a need to better organize these large projects’ datasets and metadata, and it replaces NCBI’s Genome Project resource. These projects produce data that is then deposited in several repositories. BioProject “provides an organizational framework to access metadata about research projects and the data from those projects which is deposited, or planned for deposition, into archival databases.”
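If you’d rather search BioProject from a script than through the web interface, NCBI’s E-utilities can query it. Below is a minimal Python sketch, assuming the standard ESearch endpoint and the Entrez database name `bioproject`. The helper names `bioproject_search_url`, `parse_ids` and `search_bioproject` are hypothetical, and the live request needs network access (and should respect NCBI’s E-utilities usage guidelines).

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: NCBI E-utilities ESearch endpoint; "bioproject" is the Entrez database name.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def bioproject_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that searches BioProject for `term` (hypothetical helper)."""
    params = {"db": "bioproject", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_ids(xml_data: bytes) -> list[str]:
    """Pull the record UIDs out of an ESearch XML response (hypothetical helper)."""
    root = ET.fromstring(xml_data)
    return [elem.text for elem in root.findall("./IdList/Id")]


def search_bioproject(term: str, retmax: int = 20) -> list[str]:
    """Run the search over the network and return matching BioProject UIDs."""
    with urllib.request.urlopen(bioproject_search_url(term, retmax)) as resp:
        return parse_ids(resp.read())


if __name__ == "__main__":
    # Only prints the query URL; call search_bioproject("ENCODE") to hit the live service.
    print(bioproject_search_url("1000 Genomes"))
```

Pasting the printed URL into a browser shows the raw XML that ESearch returns; splitting URL building and parsing from the network call keeps each piece easy to check on its own.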

Quick Links:

BioProject Help
BioSample (descriptions of biological source materials used in experimental assays)
ENCODE (sponsored tutorial)
1000 Genomes 


Barrett, T., Clark, K., Gevorgyan, R., Gorelenkov, V., Gribov, E., Karsch-Mizrachi, I., Kimelman, M., Pruitt, K., Resenchuk, S., Tatusova, T., Yaschenko, E., & Ostell, J. (2011). BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata. Nucleic Acids Research, 40 (D1). DOI: 10.1093/nar/gkr1163