Community Annotation; Beyond Reference Genomes

I’m catching up with some mailing lists and news and I came across this interesting tidbit from our friends in the GMOD community. We are huge supporters of curation by humans for a couple of reasons: 1) we know the quality of that information can be the best and it captures so much of the information biologists need beyond sequence info; and 2) some of us have done curation and we know that it’s underappreciated but far from trivial :) .

We’ve also followed attempts by various groups to get the wider community involved, with two goals: to get authoritative and active researchers to put in stuff they know, and to reduce the burden on the overwhelmed professionals. There have been a variety of ambitious attempts to get people involved in curation. UCSC has a wiki they rolled out. Some journals required wiki updates. Seeding Wikipedia with some information and encouraging community input has been attempted. Separate new wikis on some topics, like WikiPathways, have been started. We have been “skeptical optimists” about how some of these would go. We understand the need, but we know that end users of data are busy, they don’t get any work-related credit or time to do this sort of thing, and sometimes they don’t understand the finer points of curation. But we like to see how the efforts work out and we’d like to see success.

Well, I’ve come across some results from various efforts that you ought to see. The GMOD community recently had a Community Annotation meeting that brought several groups together to discuss their experiences and outcomes. I’m not going to give it away: you need to go read it. One group had a 90% success rate with a strategy they attempted!! Some groups are using curation as a student project. Others report on things they’ve tried that haven’t had as much success. Anyway, it’s all very interesting to know what works and what doesn’t. And what about communities that don’t have a MOD (model organism database)? They touched on that issue too.

There was another meeting too that took on a separate topic: Post Reference Genome Tools.  The premise is this:

How are we going to visualize and exploit (or even cope with) the world three years from now, when small labs may be able to fully sequence 500 individuals or species (or more) in a month? How can we visualize and link together 500, 1000, or 10,000 genomes? Many existing tools assume a reference genome. Will a reference make sense in the future, or will it hold us back?

We know how much data is already out there that so many people aren’t aware of. As I was just saying the other day: the data’s not in the papers anymore. It’s in these databases and it’s up to you and me to find and deal with it. But how will the data providers offer it to you? These folks are thinking about this, and it may alter the way you interact with the data. Again, I’m not giving it away: go read the report.

Thanks to the GMOD community for doing these reports. They are nice to have, and for those of us who can’t be there they offer a really helpful look inside.

Quick links to the reports:

Community Annotation Satellite Meeting Report

Post Reference Genome Tools Satellite Meeting Report


4 thoughts on “Community Annotation; Beyond Reference Genomes”

  2. Mary Post author

    @gsgs: I don’t think that’s going to be sufficient for a lot of the things we are going to want to explore. For example, maybe you end up working on a specific cell line and would want to make that your reference genome–with all of its variations in the context of that cell line. Or maybe you want to have a breast cancer genome as your reference. Or maybe some kind of hypothetical constructed genome…

    But it’s more than that. There may be ways you want to assess 10 or 50 or thousands of genomes at a time, with none of them being a “reference” per se. I can imagine dozens of ways people are going to want to work through data, and I’m sure other people would have dozens of other ideas as well. A list of variations doesn’t capture the complexity, especially if you start going after epigenomic issues of what’s happening in a muscle vs a neuron and all sorts of other perspectives too.
