data is not in the papers (nor is analysis): WebMedCentral & post-publication review

WebMedCentral is a “post publication peer review” publication. The purpose of the site is the fast, open, transparent and free dissemination of biomedical data. The process is to publish your research first; the review happens after publication. You can see more about this in their site’s peer review policy. I can see the value in this type of model, but I also see the serious issues involved. Two that come to mind (and are mentioned in that previous link) are review quality and author response. Does worthy research get the review quality it requires, and do the authors respond adequately to review criticism? Looking around the site, my first answer might be “no, not really” for most of the articles. Much like wikis and other community-sourced sites, the quality relies heavily on the number and expertise of the viewership. Not impossible, but a huge hurdle.

That said, there is some pretty good data there as evidenced by “RNA Structures Affected by Single Nucleotide Polymorphisms in transcribed regions of the Genome” by Andrew Johnson, Heather Trumbower and Wolfgang Sadee *.

As of this post, there have not been any reviews of this article, which is indicative of many possibly useful articles. I’ve read the article and found it interesting. When I get a moment, I plan to take a closer look and possibly review it. But that’s part of the crux of the matter: incentive for review and for author revision. It’s there for pre-publication review, but for post-publication? My incentive is that I know and trust the authors and find the research interesting, but beyond that…? Or is that enough?

I was also pointed to these two reviews of the site and post-publication review: “What is WebMedCentral?” by Journalology and “Wiki-Science and Moliere’s Beast” from FASEB Journal. The latter is much more critical (to say the least) of the possibility of WebMedCentral and “wiki-publication.” I have to say, I’m not sold that this model will work. Though I found the latter editorial pushed the point a bit much with this:

WebmedCentral promises new discoveries in biomedical science; and its venture into Wiki publication fulfills that promise. One finds on its site that smelling one’s feet can prevent epileptiform seizures (9), that vehicular accidents might induce fibromyalgia (10) and that beachgoers on Cancun have “a very high percentage of sunscreen use” (11). One can also learn about “Uner Tan syndrome” (quadripedal gait) from the evolutionary biologist who modestly named the syndrome by his own name: Uner Tan (12).

As mentioned and linked above, there is relevant and scientifically valid data to be found there. As expected with any ‘open’ system, there is also detritus. I believe our current pre-publication review system is the best system among a bunch of bad ones, but the FASEB editorial does put forward one criticism of the system that I have yet to find an answer for:

The most thorough argument for such a sea change appeared in a PLoS article by Young et al. entitled: “Why current publication practices may distort science” (18). They correctly describe the “extreme imbalance,” between the availability of excess supply (the growing global output of biomedical science and clinical studies) and the limited demand for that output by a coterie of reputable scientific journals. The result is that only a small proportion of all research results are eventually chosen for publication, and these results do not truly represent the larger body of results obtained by scientists world-wide. They argue that

… the more extreme, spectacular results (the largest treatment effects, the strongest associations, or the most unusually novel and exciting biological stories) may be preferentially published. Journals serve as intermediaries and may suffer minimal immediate consequences for errors of over- or mis-estimation (18)

This situation results in what economists who study auction behavior call “The Winner’s Curse.”

My colleague and co-blogger has written about this from a different angle, but it highlights the problem, “The data is not in the papers any more, you know.”  As she states:

I was also recently using the International Cancer Genome Consortium site’s new BioMart interface at their Data Coordination Center.  With their recent update they added some new features, I was using the new view of “Affected Genes” on that page. I picked a cancer type, I loaded up the Protein Coding genes, and there I was looking at the genes that had been repeatedly found to be affected in patient after patient. Some of the genes were not a surprise, certainly. But I sat there looking at data that a lot of people don’t know about–because it’s not in the papers yet. And it may not be for a long time.

Or ever. I find myself coming across data that might be interesting, conclusions that are useful and possible analysis that would add to the general understanding (if ever so slightly).

There is a deluge of data, even a deluge of analysis, and a bottleneck of limited review.

I’m not sure WebMedCentral or like publications, sites or wikis are the answer, but there needs to be one.

*Disclosure: Mr. Johnson has written for us on this blog before, and we know Heather Trumbower; this is how I knew of the article. And if you have a chance, go review the article :D.

Tip of the Week: BioGPS for expression data and more

This week’s tip introduces BioGPS, or Gene Portal System. We get a lot of questions about two things that BioGPS can help you tackle. The first: what do I do with a list of genes to find out what they are? And the next question people have after that: where are they expressed? BioGPS can help you with both of those problems. It is a tool that integrates and displays many types of data that researchers would be interested in. It also allows you to customize your display with the types of data that are most relevant to you, using their extensive plug-in collection. And you can do so from your browser, or access the basic portal from your iPhone!

Recently there was a question at BioStar about ways to quickly access some human gene expression data. The top rated answer over there was BioGPS, so we thought we’d provide a look at the kinds of things available to users via BioGPS. This 5-minute movie introduces some of the features.

Basically you can search for a gene or a list of genes, you can search with various types of IDs, you can search by keyword, or you can even search by genomic intervals. Your resulting list will quickly link you to all kinds of information from expression data, to annotation details and wikis, and more.  The results are provided in a handy default view with panels of information which may offer what you are looking for.

But you can go further with BioGPS using their customization and plug-in features. You aren’t tied to the default view. The system offers plug-ins: other tools can pipe their information over to BioGPS so you can use it within that framework. You can  register/create a login and then store views that are suited to your research needs.

At the time they wrote the paper provided below, they already had over 150 plug-ins available. As I write this today there are nearly 400 things you could bring in to supplement the views of the genes you are interested in. And the range of plug-ins is tremendous: interaction data sets, SNPs, phylogenetic data… Figure 2 in their paper gives a partial list of the plug-ins at that time, and the categories they highlight include: literature searching (such as PubMed, iHOP, patents, more), gene portals (such as Entrez Gene, UniProt, GeneCards, more), genetics (dbSNP, HapMap, HuGE, more), pathway tools (KEGG, Reactome, STRING, more) and even reagent providers. But there are more now, and it looks like more will continue to be developed and added. It really depends on what you need and want to display for your searches. You can browse around or search the plug-in collection to explore what’s available to view.

There are other tools you can use to explore expression data specifically. We like the UCSC Gene Sorter for some types of queries. Of course the large repositories of GEO and ArrayExpress can offer expression data as well. But for some users the BioGPS portal may offer integration and customization features that will suit their research needs. Go over and check it out. Register, set up some views, and you’ll be finding all sorts of useful annotations for your genes or regions of interest.

Just to also quickly mention: you can do searches from your lab bench, or from seminars, with the iPhone version of BioGPS as well. I didn’t have time to cover that in the movie but there’s more information over at their site about the tool. I’ve got mine installed and I’ve found it handy during talks!

BioGPS homepage: http://biogps.gnf.org/ EDIT: has moved: http://biogps.org/

BioGPS iPhone app: http://biogps.gnf.org/iphone/

Wu, C., Orozco, C., Boyer, J., Leglise, M., Goodale, J., Batalov, S., Hodge, C., Haase, J., Janes, J., Huss, J., & Su, A. (2009). BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources Genome Biology, 10 (11) DOI: 10.1186/gb-2009-10-11-r130

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Community Annotation; Beyond Reference Genomes

I’m catching up with some mailing lists and news and I came across this interesting tidbit from our friends in the GMOD community. We are huge supporters of curation by humans for a couple of reasons: 1) we know the quality of that information can be the best and it captures so much of the information biologists need beyond sequence info; and 2) some of us have done curation and we know that it’s underappreciated but far from trivial :) .

We’ve also followed attempts by various groups to get the wider community involved to do a couple of things–to get authoritative and active researchers to put in stuff they know, and to reduce the burden on the overwhelmed professionals.  There have been a variety of ambitious attempts to get people involved in curation. UCSC has a wiki they rolled out. Some journals required Wiki updates. Seeding Wikipedia with some information and encouraging community input has been attempted. Separate new wikis on some topics have been initiated like WikiPathways.  We have been “skeptical optimists” about how some of these would go. We understand the need–but we know that end users of data are busy, they don’t get any work-related credit or time to do this sort of thing, and sometimes they don’t understand the finer points of curation.  But we like to see how the efforts work out and we’d like to see success.

Well–I’ve seen some results on various efforts that you ought to see. The GMOD community recently had a Community Annotation meeting that brought several groups together to discuss their experiences and outcomes. I’m not going to give it away–you need to go read it. One group had a 90% success rate with a strategy they attempted!! Some groups are using curation as a student project. Others report on things they’ve tried that haven’t had as much success. Anyway: it’s all very interesting to know about–what works and what doesn’t. And what about communities that don’t have a MOD (model organism database)? They touched on those issues too.

There was another meeting too that took on a separate topic: Post Reference Genome Tools.  The premise is this:

How are we going to visualize and exploit (or even cope with) the world three years from now, when small labs may be able to fully sequence 500 individuals or species (or more) in a month? How can we visualize and link together 500, 1000, or 10,000 genomes? Many existing tools assume a reference genome. Will a reference make sense in the future, or will it hold us back?

We know a lot about the volume of data that we’ve already got that so many people aren’t aware of.  As I was just saying the other day: the data’s not in the papers anymore. It’s in these databases and it’s up to you and me to find and deal with it.  But how will the data providers offer it to you? These folks are thinking about this–and it may alter the way you interact with the data.  Again, I’m not giving it away: go read the report.

Thanks to the GMOD community for doing these reports. They are nice to have, and for those of us who can’t be there they offer a really helpful look inside.

Community Annotation Satellite Meeting Report

Post Reference Genome Tools Satellite Meeting Report

GBrowse: http://gmod.org/wiki/Gbrowse

Tip of the Week: UCSC wiki annotations


In the continuing effort to get scientists and researchers to annotate and curate data and to capture the huge amount of knowledge available, UCSC Genome Browser has added a wiki annotation track to the browser. It’s not the first effort, of course; GeneWiki is an effort, with mixed results so far, to annotate gene function information as a community exercise using Wikipedia. Some journals are requiring wiki entries, and several databases have opened wikis for curation. Wikis could be a solution for capturing the exponentially increasing amount of data, or they could be just another place for adding confusion… or both. I suspect that out of the plethora of wikis becoming available for annotation and curation of genomic data, something will stick and find that Goldilocks balance of a dedicated community, ease of use, and other aspects that will be needed for this to work.

Perhaps UCSC Genome Browser has that balance. It remains to be seen, but let’s get started. Today’s tip introduces the new wiki track in the UCSC Genome Browser.

Gene Wiki?

PLoS Biology has an article out today entitled “A Gene Wiki for Community Annotation of Gene Function.” The article describes the authors’ attempts to create a comprehensive gene wiki of gene functions by ‘seeding’ Wikipedia with a foundation of ‘stub’ articles with information from existing databases (such as Entrez Gene). This foundation would then be built upon in Wikipedia fashion by community editing.

Wikification of Genbank

Speaking of Genbank’s 25th, a few weeks ago Science had a news piece “Proposal to ‘Wikify’ Genbank Meets Stiff Resistance.” Apparently, those in the Mycology research community have found many inaccuracies in the Genbank records and wish to see a change that would allow annotations to be made by the community:

a scheme like those used in herbaria and museums, where specimens often have multiple annotations: listing original and new entries side by side. It would be a community operation, like Wikipedia, in which the users themselves update and add information, but not anonymously.

But the idea is meeting resistance from Genbank’s Managers:

