Video Tip of the Week: GOLD, Genomes OnLine Database

Yes, I know some people suffer from YAGS-malaise (Yet Another Genome Syndrome), but I don’t. I continue to be psyched for every genome I hear about. I even liked the salmon lice one. And yaks. The crowd-funded Puerto Rican parrot project was so very neat. These genomes may not matter much for your everyday life, and these species may not exactly be celebrities. But we’ll learn something new and interesting from every one of them. It’s also very cool that these projects are bringing new researchers, trainees, and citizens into the field.

The good news is that there is still room for many, many more species. And decreasing costs will make it possible for more research teams to sequence locally important species. But it would be a shame to waste resources on 30 versions of something cute rather than tackling new problems. A central registry for sequencing projects could help manage this. The Genomes OnLine Database (GOLD) has been cataloging projects for years, and it would be great if folks would register their research there.

I was reminded of this by a tweet I saw come through my #bioinformatics column. This is what I saw flying by:

As much as I enjoy Twitter, and think that science nerds are pretty good at it, it’s hard to know whether the right people will see a tweet. Anyway, I suggested that this researcher check out GOLD and BioProject to see whether anyone had already registered a project.

I realized that although we have talked about GOLD in the past, it hadn’t been highlighted in one of our Tips of the Week before. So here is a video of a talk about GOLD. Ioanna Pagani gives an overview of GOLD, its foundations, and its purpose. Then she demonstrates how to enter project metadata into the registry (~12 min). Watching this will help you understand why GOLD is useful and what you can expect to find there. She describes both single-species project entry and a separate option for entering metagenome projects (~25 min).

The News section at GOLD mentions that their update this summer changed parts of the interface, so the specifics may differ a bit from the video. But the basic structure shown in the talk is still useful for understanding the registry’s goals and strategies. The talk may also help convey the importance of appropriate metadata for genome projects. If you are involved with these projects, the team’s paper on the structure and use of metadata is certainly worth reading.

With all this sequencing capacity available, people are going to start looking for new organisms to cover. Of course, some people will have good reasons to look at another strain, isolate, or geographical sample, but avoiding unnecessary duplication would be nice too. It would also be great if submitters conformed to the standards for genome metadata: the ‘Minimum Information about a Genome Sequence’ (MIGS) standards being developed by the Genomic Standards Consortium, now part of the broader collection of standard checklists in the MIxS project. (You can see how GOLD conforms to these standards in their other paper below.) Let’s spread the resources around to gain new knowledge when we can. I would also like to see a more formal mechanism that connects people who have a genome of interest with researchers who might have the bandwidth to sequence it. Social sequencing?

Quick links:


Genomic Standards Consortium:

Pagani I., Jansson J., Chen I.-M.A., Smirnova T., Nosrat B., Markowitz V.M. & Kyrpides N.C. (2011). The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Research, 40 (D1) D571-D579. DOI:

Liolios K., Hirschman L., Pagani I., Nosrat B., Sterk P., White O., Rocca-Serra P., Sansone S.-A., Taylor C., Kyrpides N.C. et al. (2012). The Metadata Coverage Index (MCI): a standardized metric for quantifying database metadata richness. Standards in Genomic Sciences, 6 (3) 444-453. DOI:

Field D., Gray T., Morrison N., Selengut J., Sterk P., Tatusova T., Thomson N., Allen M.J., Angiuoli S.V., Ashburner M. et al. (2008). The minimum information about a genome sequence (MIGS) specification. Nature Biotechnology, 26 (5) 541-547. DOI: