Tag Archives: gold

Video Tip of the Week: GOLD, Genomes OnLine Database

Yes, I know some people suffer from YAGS-malaise (Yet Another Genome Syndrome), but I don’t. I continue to be psyched for every genome I hear about. I even liked the salmon lice one. And Yaks. The crowd-funded Puerto Rican parrot project was so very neat. These genomes may not matter much for your everyday life, and may not exactly be celebrities among species. But we’ll learn something new and interesting from every one of them. It’s also very cool that it’s bringing new researchers, trainees, and citizens into the field.

The good news is there is opportunity still for many, many more species. And decreasing costs will make it possible for more research teams to do locally-important species. But–it would be a shame if we wasted resources by doing 30 versions of something cute, rather than tackling new problems. A central registry for sequencing projects may help to manage this. Genomes OnLine Database has been cataloging projects for years, and it would be great if folks would register their research there.

I was reminded of this by a tweet I saw come through my #bioinformatics column. This is what I saw flying by:

As much as I enjoy Twitter and think that science nerds are pretty good at it, it’s hard to know if the right people will see a tweet. Anyway, I suggested that this researcher check out GOLD and BioProject to see if anyone had registered anything.

I realized that although we have talked about GOLD in the past, it hadn’t been highlighted in our Tips of the Week before. So here I will include a video from a talk about GOLD. Ioanna Pagani gives an overview of GOLD, the foundations and the purpose. And then she goes on to demonstrate how to enter project metadata into their registry (~12min). Watching this will help you to understand the usefulness of GOLD, and what you can expect to find there. She describes both single-species project entry, and another option for entering metagenome data projects (~25min).

In the News at GOLD, they mention that their update this summer resulted in some changes to the interface–so the specifics might be a bit different from the video. But the basic structural features are still going to be useful to understand the goals and strategies. It may also help to convey the importance of appropriate metadata for genome projects. If you are involved with these projects, checking out the team’s paper on the structure and use of metadata is certainly worthwhile.

With all this sequencing capacity available, people are going to start looking for new organisms to cover. Of course, some people will want to look at another strain, isolate, or geographical sample for good reasons–but keeping a lot of unnecessary duplication from happening would be nice too. And it would be great if submitters also conformed to the standards for genome metadata–the ‘Minimum Information about a Genome Sequence’ (MIGS, now in the broader collection of standard checklists in the MIxS project) standards being developed by the Genomic Standards Consortium. (You can see how GOLD conformed to this in their other paper below.) Let’s spread the resources around to get new knowledge when we can. I would also like to see a more formal mechanism that connects people who have some genome of interest with researchers who might have the bandwidth to do it. Social sequencing?

Quick links:

GOLD: http://www.genomesonline.org

Genomic Standards Consortium: http://gensc.org/

Pagani I., J. Jansson, I.-M. A. Chen, T. Smirnova, B. Nosrat, V. M. Markowitz & N. C. Kyrpides (2011). The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata, Nucleic Acids Research, 40 (D1) D571-D579. DOI: http://dx.doi.org/10.1093/nar/gkr1100

Liolios K., Lynette Hirschman, Ioanna Pagani, Bahador Nosrat, Peter Sterk, Owen White, Philippe Rocca-Serra, Susanna-Assunta Sansone, Chris Taylor & Nikos C. Kyrpides (2012). The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness, Standards in Genomic Sciences, 6 (3) 444-453. DOI: http://dx.doi.org/10.4056/sigs.2675953

Field D., Tanya Gray, Norman Morrison, Jeremy Selengut, Peter Sterk, Tatiana Tatusova, Nicholas Thomson, Michael J Allen, Samuel V Angiuoli & Michael Ashburner (2008). The minimum information about a genome sequence (MIGS) specification, Nature Biotechnology, 26 (5) 541-547. DOI: http://dx.doi.org/10.1038/nbt1360

What’s the Answer? Haploview and LD plots?

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s highlighted question is….

 Is Haploview the best method for defining LD blocks?

The answer to this question is, not necessarily, but it’s good.

The answer given goes into some detail as to what methods one might want to consider:

I would look at the methods used by Gil McVean and others in determining recombination hotspots across the human genome. I would also read papers by DW Bowden on MYH9 as I know that they narrowed the susceptibility region (end-stage renal disease) with LD and recombination hotspots.

It may be that all three methods are more or less equally informative for your region, thereby giving “fuzzy” boundaries. Recall that LD boundaries are not as precise as a single SNP coordinate – although results from Haploview, HelixTree and others often give that impression.

Mentioned here:

Haploview (a publicly available analysis tool)
HelixTree (for fee analysis tool)
Might also want to check analysis tools listed here or here.
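If you want a feel for why those block boundaries come out “fuzzy,” here is a minimal sketch of the pairwise r² statistic that LD plots are commonly built from. To be clear, this is not how Haploview or HelixTree define blocks. The toy haplotypes, the '0'/'1' string format and the function name `r_squared` are all assumptions made up for illustration.

```python
def r_squared(haplotypes, i, j):
    """Pairwise LD (r^2) between biallelic sites i and j.

    haplotypes: list of equal-length strings of '0'/'1' alleles, one
    string per phased haplotype (a toy input format, assumed here).
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n   # allele frequency at site i
    p_b = sum(h[j] == "1" for h in haplotypes) / n   # allele frequency at site j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # r^2 is undefined when a site is monomorphic
    d = p_ab - p_a * p_b     # the classic D coefficient
    return d * d / denom


# Eight made-up haplotypes over four sites (hypothetical data).
HAPLOTYPES = [
    "1111", "1111", "1110", "1100",
    "0011", "0001", "0000", "0000",
]

for j in (1, 2, 3):
    print(f"r^2(site 0, site {j}) = {r_squared(HAPLOTYPES, 0, j):.2f}")
```

In this toy set, site 0 is in perfect LD with site 1 (r² = 1.00), in partial LD with site 2 (r² = 0.25), and in no LD with site 3 (r² = 0.00). Where the “block” ends depends entirely on the threshold you pick, which is why a block boundary is never as crisp as a single SNP coordinate.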

“What’s the answer?” thread

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday* we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

BioStar Question of the Week:

A question that we get all the time from people is related to the number of genomes sequenced, or projects underway, or similar sorts of metrics. There was a question that generated some helpful resources on these things at BioStar:

Exponentially increasing Genomes slide

I always see a slide in talks that shows an increasing number of genomes available in GenBank or another database. Where is this slide from? I have seen an outdated one from Genomes Online but nothing recent.

How can I find this graph and cite it for my own talk?

And the highlighted answer is shown below. But there were various other helpful directions that you might also want to see over there.

You might want to have a look at the statistics from GOLD the ‘Genomes OnLine Database’ here as this has statistics at the genome, not basepair level.  –by Daniel Swan

Check out the full set of answers for more.

Fun with word searching genomes

It’s fun day here at Openhelix :). We were noticing that we were getting searches for things like “word search for non infectious diseases” and “Protein word search,” apparently due to this earlier post about searching AA sequences for real words. So we thought we’d run with it :). Using this site, I’ve created a word search using a few (30) of the completed eukaryotic genomes (species’ common names) you can find on GOLD (a list of completed and ongoing genome projects, great resource). So, try it, test your knowledge of which genomes are completed and your ability to see words in weird places. If you want, you can download a pdf of the word search puzzle here (genomes word search) to mark up. Now, I haven’t listed the species names on the file or here, you know… to make it a bit more challenging. But if you really need to see which words you are looking for you can see them here, and if you just need the key, it’s at the bottom of the linked page.

So, when you are waiting for that experiment or on the bus home, have at it. We might do these occasionally for the fun of it.

A genome a day

I want to say ‘keeps the ? away’ but can’t think of anything. This is just a quick post. Mary’s first line on the corn genome post, “sometimes it feels like ‘another day, another genome’ round here,” got me thinking that it isn’t so far off the mark. According to GOLD there are 905 ongoing eukaryotic genome projects (according to Entrez Genome Project, it’s 225, but those are only the ones reported to NCBI). The cost of sequencing and completing a genome has drastically decreased. For example, Illumina recently reported sequencing a human genome in 1 month for $100,000.
