Gene Wiki?

PLoS Biology has an article out today entitled “A Gene Wiki for Community Annotation of Gene Function.” The article describes the authors’ attempts to create a comprehensive wiki of gene functions by ‘seeding’ Wikipedia with a foundation of ‘stub’ articles populated with information from existing databases (such as Entrez Gene). This foundation would then be built upon, in Wikipedia fashion, by community editing.

The Gene Wiki, like the proposal for the ‘wikification of GenBank’ and the now-online Encyclopedia of Life, is an attempt to harness the power of the community to provide that community with a wealth of annotated information.

But, as even the authors admit, the Gene Wiki’s success so far has been muted. Part of the solution for making it a more useful tool is reported in this paper: the authors seeded the Gene Wiki with entry data from Entrez Gene. This is based on the observation that editors are more likely to add to or correct a pre-existing article than to create a new one. So far the seeding effort hasn’t been shown to succeed (but, as the authors suggest, it is recent and, as of this paper, hadn’t been announced).
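As a rough illustration of what ‘seeding’ means in practice, here is a minimal Python sketch. It is not the authors’ bot: `GeneRecord`, `render_stub`, and `seed` are hypothetical names, the record is a simplified stand-in for a real Entrez Gene entry, and the rule of skipping titles that already exist is an assumption. The one-sentence stub format follows the default header line quoted for MAP4 in the comments below.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    """Hypothetical, simplified stand-in for one Entrez Gene entry."""
    symbol: str
    full_name: str


def render_stub(record: GeneRecord) -> str:
    """Build a one-sentence stub from database fields."""
    name = record.full_name[0].upper() + record.full_name[1:]
    return f"{name}, also known as {record.symbol}, is a human gene."


def seed(records, existing_titles):
    """Return {title: stub text} only for genes that have no page yet.

    Assumption: pages that already exist are left untouched.
    """
    return {
        r.symbol: render_stub(r)
        for r in records
        if r.symbol not in existing_titles
    }


if __name__ == "__main__":
    records = [
        GeneRecord("MAP4", "microtubule-associated protein 4"),
        GeneRecord("RELN", "reelin"),
    ]
    # Pretend a page for RELN already exists, so only MAP4 gets a stub.
    for title, text in seed(records, existing_titles={"RELN"}).items():
        print(title, "->", text)
```

The point of the sketch is only that a stub gives editors something concrete to correct or extend, which is the behavior the seeding relies on.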

The authors reference the Nature article comparing Wikipedia with Encyclopedia Britannica, which showed that Wikipedia stacked up well. I’d add a caveat: the comparison covered fairly ‘basic’ science, scientific concepts that a college student might be expected to get correct. But the authors are suggesting a wiki for something much more complex and deep, the annotation of a gene. Even in the basic comparison, Wikipedia compared favorably only in the sense that it didn’t do nearly as badly as many would expect and was almost as good as Encyclopedia Britannica. Research needs something more than this.

And this brings me to what I consider the biggest hurdle. The Gene Wiki project is deep and complex science. Wikipedia is doing reasonably well in ‘general science’, but I’m not so sure the model will work for this kind of science. I’m not sure Wikipedia is the venue for this (a similar concern was voiced by BBGM) or that it will be able to bring the level of completeness and accuracy required by scientific research. Wikis have found uses and success, but for every successful wiki, there are a few hundred (thousand) failures littering the internet landscape. For success, a wiki needs a readership knowledgeable enough and large enough not only to contribute but to keep data accurate. Too small a readership, or too many unknowledgeable people adding to an article, allows too many errors to creep into the wiki and, more importantly for a wiki’s success, to remain there. I’m not convinced the knowledgeable readership of a gene wiki will be large enough.

Perhaps I’d use it the way I use Wikipedia now for general purposes: as a starting point for information and research, but never as a definitive source or reference.

And that is what the authors state in the end:

Importantly, this gene wiki effort is not meant to be a substitute for existing resources. Gene portals and model organism databases will continue to serve as authoritative references with a specific role for data curation and enforcement of data standards. Moreover, the structured and typed data in gene portals is amenable to incorporation into pipelines and systematic analyses in a way the information in a gene wiki cannot [22]. Most importantly, because articles are dynamic and not subject to rigorous peer review, the gene wiki is not intended to be a reference that is cited in a traditional peer-reviewed article or used exclusively as a source of gene annotation. Nevertheless, we believe that this gene wiki will be a valuable launch pad for collaboratively summarizing knowledge, and we expect that scientists will synergistically use the gene wiki with traditional gene portals.

Huss, J.W., Orozco, C., Goodale, J., Wu, C., Batalov, S., Vickers, T.J., Valafar, F., Su, A.I. (2008). A Gene Wiki for Community Annotation of Gene Function. PLoS Biology, 6(7), e175. DOI: 10.1371/journal.pbio.0060175

34 thoughts on “Gene Wiki?”

  1. Andrew Su

    Hi Trey, Thanks for the thoughtful post on our paper. You raise some valid reasons why the Gene Wiki might not work, and they are not points that are easily refuted. In reply, I can only say that we won’t know its impact and potential for success unless we try it. This model of harnessing the “Long Tail” has obviously been successful in the creation of Wikipedia as a whole, so as biologists, it only makes sense to try to adapt that successful model to gene annotation. (And on that note, I hope everyone reading this will go make one edit on their favorite gene…) Cheers,

  2. Trey

    Thanks for commenting Andrew!

    And you are right: I have no quibbles at all about trying, because you won’t know unless you give it an attempt.

    I do like the idea of ‘seeding’ the wiki and hope that ends up giving it the boost it will need (as perhaps this paper and the blog discussion will too).

    One thing that would help too, I suspect, is a link from specific gene entries in the established portals like Entrez Gene and Ensembl _TO_ the related Gene Wiki entries. That might encourage researchers searching the portals to then go and make edits. That might be a hurdle though :D.

  3. Mary

    I agree on the seeding–that’s a great component.

    There are still no “carrots” though…people are so busy trying to teach students, get grants, and actually do research that there’s not much incentive to edit a wiki. It just doesn’t count on your NIH grants. Maybe if there was a “community service” line item allowed…?

    I also fear hostile editing. I won’t describe where I saw this, but there is a disease I was researching and I went to look at a disease-specific community forum. There had been a recent paper about a genetic link to this disease and I wanted to see the general public response. There was active hostility to the idea that this disease had a genetic component. I understand it can get edited back….but who has time for those flame wars?

    I’m interested to see how it goes–and I hope it goes well. But I have to say I lean to professional curation.

  4. Andrew Su

    The incentive (or the lack thereof) is a very interesting point. My hope is that if the editing process is easy enough, scientists can make an edit in a minute or less. Then, it’s not something that competes with grant writing or research, but something you can do “just because” over coffee. Even better if you can do it just after you’ve read the article and learned something from another contributor.

    By example, check out the page for Reelin: http://en.wikipedia.org/wiki/Reelin . I commonly use this as an example of what the Gene Wiki could become, for every gene in the human genome. A few things to note:

    - Wikipedia encourages references into the primary literature, which will help scientists gauge the reliability of the statements

    - This page was edited by over 30 editors in over 200 edits since the page was created over 6 years ago. Wikipedia articles aren’t just the product of one person’s heroic efforts to write a complete article. It’s more about a large number of people who make individually small but collectively large contributions.

    - Truth be told, I think the vast majority of edits will be made by undergraduate and graduate students. There are simply more of them, and they have more time. But again, the presence or absence of references allows all users to judge for themselves the reliability of a given statement.

    - For class instructors, can you think about ways to incorporate the work students are already doing in class into Wikipedia gene articles?

  5. Mary

    Ok, I went to reelin. The page is nice. I’m sure hot, competitive, well funded genes will have people actively contributing. I fear it will be uneven, though. Housekeeping genes, not so hot. I searched for the gene I worked on in grad school way back…Map4. I used the mouse symbol though, Mtap4. Not found with the mouse symbol. I would edit it to add the mouse symbol, but I don’t see any place to put it and make that clear.

    But the first link I clicked was broken on the reelin page. But maybe there’s something wrong with HomoloGene today–it is looking flaky to me.

    But that raises a question: if one of the link providers changes format–can someone script a new syntax for the links? Or do editors have to go in and fix each one? What happens in a reference sequence update that changes coordinates–I see the UCSC links are coordinate based? If you aren’t someone who understands HTML or wiki editing or how to create UCSC links properly–who do you tell?

    (I tried the reelin/homologene link with 2 browsers in case it was browser specific, but it wasn’t.)

    I hate to sound so negative–because I wish this could work. I’ve just been involved in too many projects where these sorts of things become issues even for people with paid staff and dedicated time.

  6. Andrew Su

    Hi Mary, your edits to MAP4 look great. Thanks for contributing to the Gene Wiki. :) Not too difficult, right?

    You’re right, adding a new symbol to that right-side infobox is not yet simple. To make a long story short, we hid the code to create that infobox because it’s rather confusing, and we felt most people would be editing the “free text” section of the page. But an “edit” link above the infobox should show up on all pages soon… In the mean time, you could add the info about Mtap4 in the free text section.

    Wikipedia also has the concept of redirects. I just set one up so that http://en.wikipedia.org/wiki/Mtap4 goes to the MAP4 page. The simple code to do it is here: http://en.wikipedia.org/w/index.php?title=Mtap4&action=edit.

    Yes, something fishy seems to be up with Homologene right now. Nothing we can do about that one…

    Good question on updating link syntax. We’ve used Wikipedia “templates” so that if the link syntax changes, it’s just one simple update. For example, the links to OMIM are all constructed using this template: http://en.wikipedia.org/w/index.php?title=Template:OMIM&action=edit. Not that you need to understand how it works, but you can see in there the root of the OMIM URL. UCSC is similar, but slightly more complex.
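    The template idea can be sketched in a few lines of Python. This is only an analogy, not the actual Wikipedia template code: the URL patterns, page titles, and identifiers below are made-up placeholders, and `render_link` and `render_all` are hypothetical helper names.

```python
# Hypothetical URL pattern; the real template lives on Wikipedia and is not
# reproduced here. The ".example" domain is a placeholder.
LINK_TEMPLATES = {
    "OMIM": "https://omim.example/entry/{id}",
}

# Pages store only the database name and an identifier, never the full URL.
# Titles and identifiers are made up for illustration.
PAGES = {
    "GENE_A": ("OMIM", "100001"),
    "GENE_B": ("OMIM", "100002"),
}


def render_link(database: str, identifier: str) -> str:
    """Expand a link by filling the shared pattern for that database."""
    return LINK_TEMPLATES[database].format(id=identifier)


def render_all() -> dict:
    """Render the external link for every page from the shared patterns."""
    return {title: render_link(db, ident) for title, (db, ident) in PAGES.items()}


if __name__ == "__main__":
    print(render_all())
    # The provider changes its URL format: one edit to the shared pattern...
    LINK_TEMPLATES["OMIM"] = "https://omim.example/v2/{id}"
    # ...and every page picks up the new syntax on the next render.
    print(render_all())
```

    Because the pages never contain the URL itself, nobody has to visit each gene page when a link provider changes its format.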

    Although you may think you sound negative, I think your experience is exactly the one we’re trying to replicate. Except multiplied by a few thousand. After making the first edit, some will be hooked and “adopt” their favorite gene page, some will come back occasionally to edit, and some will not come back at all. But the point I think is that anyone can make that first edit without much difficulty at all… Thanks for giving it a try!

  7. Mary

    Well, I have to say that I didn’t think it was really easy. I’m embarrassed to say that. I’m a professional software trainer with a good grasp of blogging standards and simple HTML, and a couple of programming courses under my belt.

    If you think this is simple:
    http://screencast.com/t/uWOjq7rKoa then you are seeing very different users than we are. I don’t think it is obvious how to even create a link. I know how, but I only knew that from blogging elsewhere.

    And then it took me another 30 minutes to figure out that I couldn’t fix the Ensembl link and the UCSC link, which I can see are broken.

    http://en.wikipedia.org/wiki/Mtap4

    Maybe I’m doing it wrong. Or it needs training ;)

    Edit: Oh, and I can see from the subsequent edit that I did do the links wrong. Sigh.

  8. Andrew Su

    Hi Mary, your comments on usability are invaluable. Clearly we’ve been digging around at Wikipedia for quite some time, so you’re right, what we consider “easy” may not be for a completely new user.

    I agree that the screenshot above can be a bit intimidating. My suggestion to new users is to use your browser’s “search” function to find the area you want to edit. For example, the default header line on MAP4 is “Microtubule-associated protein 4, also known as MAP4, is a human gene.” If you wanted to add “… is a human gene involved in microtubule depolymerization”, search for “is a human gene”. Then you can simply add the new text after that. That helps to “see past” all of the other gibberish.

    I’ve added the link I promised which allows users to edit the infobox — it now appears above the upper-right corner. The code there is even more intimidating (which is why we decided to hide it one level deep), but the same search strategy above will work well.

    I’ve thought about putting together a screencast to walk people through their first edit. Perhaps your experience will prompt me to do that sooner rather than later.

    But a key point to emphasize! Your first edit doesn’t have to be perfect! If you’re adding useful information, don’t worry so much about the details of how it looks. Someone will probably come by and fix any minor issues, and if you’re “watching” the page you may learn about other tricks on how to edit pages (including adding links, references, images, etc etc.) But the key thing the Gene Wiki allows is two-way communication of information. How would you even have begun to think about adding that new information on MAP4 directly to NCBI’s gene report?

  9. Mary

    Well, NCBI had the correct link to Ensembl on the MAP4 page. And if they didn’t I would have written to the help desk. I write to them all the time. In fact, I am sure they just groan when they see an email come in from me now :) But they have been very responsive to me in the past. I write to databases all the time and have had mostly similar experiences.

    And NCBI also has the GeneRif option, so I could have entered my mapping paper on the Gene page.

    Maybe I’ll do my “tip of the week” next time on editing the Gene Wiki :) I am learning.

  10. Andrew Su

    You’re right, the help desk and GeneRifs are good avenues to provide feedback to NCBI. But these routes also have their limitations. I don’t think NCBI’s page for reelin, for example, will ever be filled with free-text, images, and diagrams like the Wikipedia page is. Nor do they want it to be — Wikipedia is a different kind of resource, and I hope all scientists use it synergistically with other existing tools.

    … and indeed, you are learning, and learning fast at that! Even your edits to the infobox template (an “advanced” form of Wikipedia editing) were spot on…

  11. Pingback: Genomics resource blog roundup | The OpenHelix Blog

  12. Mary

    I just had a new issue. I was reading that new paper on autism genes ( http://www.sciencemag.org/cgi/content/abstract/321/5886/218 ). I went over to Wikipedia to be sure the new info got added to c3orf58. There’s no entry for that. I have no idea how to create one with the right format.

    I tried to create it with the appropriate gene ID, but it didn’t give me a PBB box.
    http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=retrieve&dopt=full_report&list_uids=205428&log$=databasead&dbfrom=nuccore

    If I could create it, will the bot find it and update my stuff next time?

  13. Andrew Su

    For now, the best thing to do is to use this tool: http://diberri.dyndns.org/cgi-bin/templatefiller/index.cgi?ddb=&type=hgnc_id&id=28490

    As you can see, if you enter an HGNC ID (in your case 28490), this tool will produce properly formatted “wikicode”. You can then cut-and-paste this code into the top of the page you are creating. Note that this is a slightly less rich protein box. We are working on creating a similar interface for our standard Gene Wiki pages.

    Hope that helps…

  14. Andrew Su

    I should also mention that we’ve created ~9000 gene pages in Wikipedia so far, and we started at the most well-cited genes in PubMed and worked our way down. That’s why a page for c3orf58 wasn’t created earlier.

    Of course, that’s why we want to create that web interface, so that scientists whose favorite genes we’ve neglected can create the pages on their own…

  15. Mary

    Hmmm…ok, thanks–I’m with you on the code. But I can’t seem to figure out how to create a new page in Wikipedia.

    I swear–I read the manual :)

    The Wikipedia tutorial is ok if you want to learn how to edit existing pages, but I’m not succeeding with a new page yet…will continue to play in the sandbox….

  16. Andrew Su

    If you enter a page title in the search box in the left margin and hit “Go”, it will take you to a page with a “Create this page” link. Does that work? (Sorry, brief reply from my phone…)

  17. Pingback: Medicine 2.0 Carnival: Summertime « ScienceRoll

  18. Rhea

    I agree that these articles could be a great tool, not only for the field of genetics, but could possibly be expanded to multiple areas of biology.

    However, I am curious about how general editing might fare for these articles. New and innovative discoveries are always up for debate…and so I wonder if the editing feature may get overused, thus making it more opinionated than it should be??

  19. Andrew Su

    @Rhea, I’ll chime in with my two cents, but of course, anyone else’s guess on how this issue will come out is as good as mine. I believe that there will be relatively few controversial pages. After all, we as scientists I think are used to summarizing the literature in a relatively reasoned way (in review articles, for example). And more often than not, disputes even in highly competitive areas of science tend to be over who publishes first, and not so much over competing data. The one example of contentious edits that I can think of in my 18 months on Wikipedia was in the field of innate immunity, where advocates of two camps were really fighting for control of a page. In that case, third-party mediators tried to step in to help reach a mutual agreement. Generally, Wikipedia guidelines strive to present an acknowledgment of the controversy, something like “The exact mechanism of this is unclear. Camp A believes such and such, and this view is supported by this and that data. In contrast, Camp B believes this other view, supported by this other data”. Such a summarization I think reduces edit warring, and I think it’s scientifically the best thing to present to readers.

    This I think is how it’s supposed to work in theory — we’ll see how it comes out in practice…

  20. Mary

    Andrew–one of my concerns isn’t that labs will fight with each other. I’m concerned that the general public has certain issues that are complete flamewars around certain health topics. Read the talk here:
    http://en.wikipedia.org/wiki/Talk:Autism
    This article has to get locked because of vandalism and controversy.

    What if there is a paper with genes linked to sexual orientation? Do you think that’s going to be easy to keep track of?

  21. Martin

    Yet, in spite of the controversy over autism, that article is a very informative, complete and accurate Wikipedia article. Perhaps it is because of the controversy that this is such a good article, the crowd self-corrects.

  22. Andrew Su

    @Mary, to be sure, there are horrible edit wars all the time on biology/health related articles. Evolution and homeopathy come to mind as particularly egregious examples. I think gene pages will be a little more in the background, but I think your example above is a pretty good hypothetical case. My belief on Wikipedia is that regardless of the short-term peaks and valleys due to edit warring, the long-term trend is toward a complete and accurate article. *And* imperfect as it is, Wikipedia’s governance usually gets things right (e.g., http://en.wikipedia.org/wiki/Wikipedia:Requests_for_arbitration/Homeopathy).

    Of course, as we noted in the paper, there are other wiki models that take a different point on the “openness of editing” spectrum. For example, many hypothesize that requiring editors to edit under their real name leads to better behavior. So if the WP effort ultimately stagnates or fails, then it would be good to explore other implementations of the same “community intelligence” model.

  23. Mary

    It is because that article is locked that it isn’t full of nastiness. That’s my point. I’m glad someone has time to monitor that, but I don’t have time to get into those kinds of flamewars and do those cleanups. I’m glad someone does. But if people are casually coming in, doing a few genes, and not doing regular maintenance, I think some genes could get full of anti-science, woo, and in some cases real vitriol.

  24. Andrew Su

    … and Martin raises a great point. I heard an interesting anecdote recently from an IT person at calit2.net. They tried to get all of their faculty to write their own biosketches for the institute website, and of course, compliance was very spotty. So they made it into a research project by taking each investigator’s CV and publications and applying natural language processing to take an automated guess at a biosketch. Needless to say, NLP is far from perfect and so the text wasn’t very good, but they posted these NLP-bios nonetheless. And now that each user was faced with a highly imperfect and incomplete version of something they cared deeply about (their own bios), investigators were much quicker to correct and add new content.

  25. Pingback: Another Wiki, WikiPathways | The OpenHelix Blog

  26. Rhea Miller

    Well, I must say I am very excited to see how things end up. Besides, people know to take things in wiki with a grain of salt…most use the information to initiate their searches…so I’m definitely optimistic.

  27. Pingback: The Spittoon » Gene Wikiality

  29. Pingback: The Spittoon » Author! Author!

  30. Pingback: Personal Genomes, Candidates, and the NYT | The OpenHelix Blog

  31. Pingback: Paper compares interaction databases | The OpenHelix Blog

  32. Pingback: Community Annotation; Beyond Reference Genomes | The OpenHelix Blog
