Speaking of GenBank’s 25th, a few weeks ago Science ran a news piece, “Proposal to ‘Wikify’ GenBank Meets Stiff Resistance.” Apparently, researchers in the mycology community have found many inaccuracies in GenBank records and want a change that would allow the community to annotate them:

a scheme like those used in herbaria and museums, where specimens often have multiple annotations: listing original and new entries side by side. It would be a community operation, like Wikipedia, in which the users themselves update and add information, but not anonymously.

But the idea is meeting resistance from GenBank’s managers:

GenBank’s managers are dead set against letting users into GenBank’s files, however. They say there already are procedures to deal with errors in the database, and researchers themselves have created secondary databases that improve on what GenBank has to offer.

But would this work? In his talk at the GenBank anniversary, Francis Collins mentions the proposal, though he isn’t sold on it (video, at 2hr 22min). Steven Salzberg, on the other hand, argued in an opinion piece in Genome Biology last year that the current approach isn’t working, and that one possible solution is to add a ‘wiki’ layer to the database.

“Wikifying” a genome database or GenBank would go a step beyond even what TAIR is doing. TAIR hopes that authors will curate the database themselves, saving the time and money now spent on curators. But there is some concern that this might not work as planned.

Though wikis have been quite successful in some instances (Wikipedia being the obvious example), the internet is littered with dead wikis that never got the community support needed to keep them going. It is also littered with wikis filled with inaccuracies, spam links, bad information and turf battles. If a wikified GenBank went the way of a lot of other wikis, the fix for the inaccuracies might be worse than the original problem.

A wikified GenBank would, I suppose, allow only credentialed editors to make changes. But in reaction to a suggestion that academics with research credentials enroll as identifiable editors of Wikipedia (in order to increase the accuracy of its articles), one blogger writes:

There’s nothing nastier or more tenacious than credentialed scholars squabbling about their area of research.

He might have a point, though this is GenBank and sequences, not the minutiae of population genetics (I’ve seen some quite acrimonious discussions there!). Perhaps a way could be found to add a ‘wikified’ layer to GenBank, and I can definitely see the usefulness. Wikis can be a great tool for the right community and subject.
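To make the idea of a ‘wikified’ layer a bit more concrete, here is a minimal sketch in Python of what the proposal seems to describe: the original entry is left untouched, and signed annotations sit alongside it. Everything in it is an assumption for illustration only. The class names, the fields, the accession "EXAMPLE0001" and the contributor are all made up, and none of this reflects how GenBank actually stores or updates records.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Annotation:
    """A signed, community-supplied note attached to a record (hypothetical)."""
    author: str      # identifiable contributor; unsigned notes are rejected
    field_name: str  # which part of the record is disputed, e.g. "organism"
    proposed: str    # the suggested replacement value
    evidence: str    # free-text pointer to the supporting analysis


@dataclass
class AnnotatedRecord:
    """The original submission plus a side-by-side list of annotations."""
    accession: str
    original: dict
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, note: Annotation) -> None:
        # The proposal is explicitly "not anonymous", so require a name.
        if not note.author.strip():
            raise ValueError("annotations must be signed")
        self.annotations.append(note)

    def view(self, field_name: str) -> dict:
        """Return the original value next to every proposed alternative."""
        return {
            "original": self.original.get(field_name),
            "proposed": [
                (a.author, a.proposed)
                for a in self.annotations
                if a.field_name == field_name
            ],
        }


# Made-up record and contributor, purely for illustration.
record = AnnotatedRecord("EXAMPLE0001", {"organism": "uncultured soil clone"})
record.annotate(
    Annotation("A. Mycologist", "organism", "known species X", "tree analysis")
)
print(record.view("organism"))
```

The point of the design is that an annotation never overwrites the submitter's entry; a reader sees both and decides which to trust. It does nothing about moderation, which is where the turf battles would come in.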

    Well, it doesn’t have to be full wikification – I think people are obsessing a little too much over that part.

    If someone does an analysis (like building a tree) on existing sequences that shows that a) the species/genus is wrong or b) the “uncultured soil clone” is actually a known species, then it should be possible to attach this information to a record, so that the next time someone does a BLAST search, etc., they have a way of building on that knowledge. The Letter to Science was primarily focused on attaching proper or more detailed species identification, while Steven’s opinion piece was more about fixing gene structure problems in GenBank (which is also a huge problem that needs help).

    The problem right now is that there is no way to update the actual record (or attach a pointer to a corrected one) unless the original author submits the request OR has turned that permission over to a proxy, the way all the yeast chromosome sequences have been proxied to SGD to make the updates.

    Just a webpage listing all these comments on sequences would be fine. For each sequence there would be a link to the discussion related to it and a suggested improvement. Databases of “improved” sequences would be available, and you could choose for yourself which database to trust most.

    I understand, and actually agree: the ability to attach information to a record could be highly useful, both for attaching proper or more detailed identification and for flagging gene structure problems in GenBank.

    In fact, when searching databases like GenBank, I find this one of the more frustrating aspects: finding a sequence or data and then spending an inordinate amount of time confirming that sequence (or not confirming it and getting misleading data). I can see that this could be very helpful.

    I do also see the issue of moderation, even among scientific research communities, to ensure the quality of the attached data.

    It will be interesting to see how this develops over time.

