What’s The Answer? (authorship heartbreak)

This week’s highlighted discussion generated a lot more chatter than many of them do. I didn’t see that coming at this site. But apparently it’s an issue that many groups have faced and had opinions on: what is the relative contribution of the wet lab vs bioinformatics side when it comes to the paper, and how to recognize that?

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Thursdays we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

Forum: Do bioinformaticians often break molecular biologists’ hearts by being the first author?

Recently I knew a molecular biologist postdoc who was upset since she was placed by her boss as a second first-author in a paper that she wrote from scratch. Her boss put a bioinformatician/biostatistician as the first first-author, with a reasoning to the postdoc that she would not have anything if they take out the data processing part.

And it turned out that she is not alone. From this postdoc’s facebook, I learned that some of her friends admitted that they also experienced a very similar situation: being a molecular biologist, did all the labworks, wrote the paper, but put as a second author while the first one is a bioinformatician.

Apart from politics in the lab etc., I wonder if this is really quite common. If it is, don’t you think it’s quite unfair for the molecular biologist?


It’s hard to assess from an anecdote what the real story was. I’m sure it was painful for folks involved. And, of course, at a heavily bioinformatics site, popular answers pointed out that often the bioinformaticians are the ones feeling neglected. But I thought it was interesting to highlight the tensions between folks on teams like this. And some pro-active awareness of this for people running labs might be in order.

It generated quite a range of comments. Some aimed at the publishing in general, some project related. Have a look.


Video Tip of the Week: BioRxiv, A preprint server for biology

Open access to scientific research has been advocated for a long time, even before the advent of the internet. With the internet, the movement grew. There are now many open access journals, NIH requires NIH-funded research to be open access within a year of publication, and NSF and other agencies are working on similar plans.

As part of that movement toward open access to research, the “preprint” has also grown. Preprints allow for the fast dissemination of research and the open discussion of results before they’ve gone through peer review. Peer-reviewed research can take weeks, months and sometimes years to be publicly disseminated through publication. In a modern world of fast-developing and changing science, preprint distribution allows for much faster access to research.

The most well-known preprint server is arXiv. Started in 1991 at the Los Alamos National Laboratory and later moved to Cornell University Library, arXiv allows for the open access preprint dissemination of physics, mathematics and computer science research. By many standards, it has been a success in getting research quickly and openly discussed.

There have been other attempts at preprint servers for biological research, both past and current: Nature Precedings (which ceased accepting new manuscripts in 2012), PeerJ Preprints and others.

That a traditional publisher such as Nature (Nature Precedings) had made a foray into an open access preprint repository speaks to the need and demand for such a service. The number of biological research preprints in arXiv has grown rapidly. Just last year, a case was made in PLOS Biology for just such a server for life sciences research by Desjardins-Proulx et al.

The first and most often discussed advantage of open preprints is speed. The time between submission and the official publication of a manuscript can be measured in months, sometimes in years. For all this time, the research is known only to a select few: colleagues, editors, and reviewers. Thus, the science cannot be used, discussed, or reviewed by the wider scientific community. In a recent blog post, C. Titus Brown noted how posting a paper on arXiv quickly led to a citation (arXiv papers can be cited), and his research was used by another researcher. The current system of hiding manuscripts before acceptance poses problems for both scientists and publishers. Manuscripts that are unknown cannot be used and thus take more time to be cited. It has been shown that high-energy physics, with its high arXiv submission rate, has the highest immediacy among physics and mathematics.

And now we have it. Above you will find the promo video for a new life sciences open access preprint server: bioRxiv. Science has an introductory post about it from November 2013 (2 days after it was announced).

bioRxiv is housed and run by Cold Spring Harbor Laboratory. Like arXiv, it is an open access preprint server with similar rules. You can learn more about the specifics (such as journal preprint policies) on the about page. Articles can be on almost any life sciences topic from biochemistry to zoology, and on other fields, such as physics, if the research has direct relevance to life science. It will not, however, publish medical research such as clinical trials.

Articles are placed into three categories:

Articles in bioRxiv are categorized as New Results, Confirmatory Results, or Contradictory Results. New Results describe an advance in a field. Confirmatory Results largely replicate and confirm previously published work, whereas Contradictory Results largely replicate experimental approaches used in previously published work but the results contradict and/or do not support it.

The biological research community has asked for it, and here it is. Currently, there are only 200 or so manuscripts submitted; a quick search of ‘retrovirus’ brings up only 3 results. But bioRxiv is only 6 months old. Keep an eye on it or, better yet, test it out and submit.

Desjardins-Proulx, P., White, E., Adamson, J., Ram, K., Poisot, T., & Gravel, D. (2013). The Case for Open Preprints in Biology. PLoS Biology, 11 (5). DOI: 10.1371/journal.pbio.1001563

data is not in the papers (nor is analysis): WebMedCentral & post-publication review

WebMedCentral is a “post publication peer review” publication. The purpose of the site is the fast, open, transparent and free dissemination of biomedical data. The process is to publish your research first; the review process happens after publication. You can see more about this on their site’s peer review policy. I can see the value in this type of model, but I also see the serious issues involved. Two that come to mind (and are mentioned in that previous link) are review quality and author response. Does worthy research get the review quality it requires, and do the authors respond adequately to review criticism? Looking around at the site, my first answer might be “no, not really” for most of the articles. Much like wikis and other community-sourced sites, the quality relies heavily on the number and expertise of the viewership. Not impossible, but a huge hurdle.

That said, there is some pretty good data there as evidenced by “RNA Structures Affected by Single Nucleotide Polymorphisms in transcribed regions of the Genome” by Andrew Johnson, Heather Trumbower and Wolfgang Sadee *.

As of this post, there have not been any reviews of this article, which is indicative of many possibly useful articles. I’ve read the article and found it interesting. When I get a moment, I plan to take a closer look and possibly review it. But that’s part of the crux of the matter: incentive for review and for author revision. It’s there for pre-publication review, but not for post-publication. My incentive is that I know and trust the authors and find the research interesting, but beyond that…? Or is that enough?

I was also pointed to these two reviews of the site and post-publication review: What is WebMedCentral? by Journalology and Wiki-Science and Moliere’s Beast from FASEB Journal. The latter is much more critical (to say the least) of the possibility of WebMedCentral and “wiki-publication.” I have to say, I’m not sold that this model will work. Though I found the latter editorial pushed the point a bit much with this:

WebmedCentral promises new discoveries in biomedical science; and its venture into Wiki publication fulfills that promise. One finds on its site that smelling one’s feet can prevent epileptiform seizures (9), that vehicular accidents might induce fibromyalgia (10) and that beachgoers on Cancun have “a very high percentage of sunscreen use” (11). One can also learn about “Uner Tan syndrome” (quadripedal gait) from the evolutionary biologist who modestly named the syndrome by his own name: Uner Tan (12).

As mentioned and linked above, there is relevant and scientifically valid data to be found there. As expected with any ‘open’ system, there is also detritus. I believe our current pre-publication review system is the best system among a bunch of bad ones, but the FASEB editorial does put forward one criticism of the system that I have yet to find an answer for:

The most thorough argument for such a sea change appeared in a PLoS article by Young et al. entitled: “Why current publication practices may distort science” (18). They correctly describe the “extreme imbalance,” between the availability of excess supply (the growing global output of biomedical science and clinical studies) and the limited demand for that output by a coterie of reputable scientific journals. The result is that only a small proportion of all research results are eventually chosen for publication, and these results do not truly represent the larger body of results obtained by scientists world-wide. They argue that

… the more extreme, spectacular results (the largest treatment effects, the strongest associations, or the most unusually novel and exciting biological stories) may be preferentially published. Journals serve as intermediaries and may suffer minimal immediate consequences for errors of over- or mis-estimation (18)

This situation results in what economists who study auction behavior call “The Winner’s Curse.”

My colleague and co-blogger has written about this from a different angle, but it highlights the problem, “The data is not in the papers any more, you know.”  As she states:

I was also recently using the International Cancer Genome Consortium site’s new BioMart interface at their Data Coordination Center.  With their recent update they added some new features, I was using the new view of “Affected Genes” on that page. I picked a cancer type, I loaded up the Protein Coding genes, and there I was looking at the genes that had been repeatedly found to be affected in patient after patient. Some of the genes were not a surprise, certainly. But I sat there looking at data that a lot of people don’t know about–because it’s not in the papers yet. And it may not be for a long time.

Or ever. I find myself coming across data that might be interesting, conclusions that are useful and possible analysis that would add to the general understanding (if ever so slightly).

There is a deluge of data, even a deluge of analysis, but only a limited, bottlenecked supply of review.

I’m not sure WebMedCentral or like publications, sites or wikis are the answer, but there needs to be one.

*Disclosure: Mr. Johnson has written for us on this blog before, and we know Heather Trumbower; this is how I knew of the article. And if you have a chance, go review the article :D.

Briefings in Bioinformatics – our education paper is available now

Back in April I happened to mention that we (OpenHelix) were writing a paper on informal sources of bioinformatics education (in a Friday SNPets item) and we were asked to announce when the paper came out. Well, we got word late last week that the article has been published. The article appears in a special issue of Briefings in Bioinformatics that is devoted to bioinformatics education. I’m not sure if all the articles in the issue are available yet, but it looks like several are in the journal’s Advance Access area. Bioinformatics education is an area (obviously) that OpenHelix cares deeply about & we are anxiously awaiting our copies of the full issue so we can read all the articles, but I digress…

The title “OpenHelix: bioinformatics education outside of a different box” (if you hit a paywall, or have trouble accessing, we will gladly send a reprint. Just email the corresponding author, Jennifer, listed in the abstract, or ask via our contact link. -Trey) was a cool suggestion from one of the article’s reviewers; my original title was much tamer (ok, more boring). Regardless of the final title, what we wanted to do in the article is to discuss informal sources of bioinformatics education. By education we mean acquiring applicable information that allows a researcher to operate within the field of bioinformatics. By informal we mean outside of traditional, credit-based classes and degrees. Essentially we provide a bit of the knowledge and know-how that we’ve gathered over years of working with hundreds of resources, thousands of workshop attendees, and countless online contacts about where a researcher, or librarian, or whoever can turn for various informational needs in the field of bioinformatics.

Our contention is that not everyone needs to program in order to manage and manipulate their biological data these days. There are SO many fine publicly available databases, algorithms, tools and more that it is just a matter of awareness and training for anyone to be able to reformat and analyze their personal data sets. We maintain that:

…bioinformatics education needs to do a minimum of four things:

1. raise awareness of the available resources
2. enable researchers to find and evaluate resource functionality
3. lower the barrier between awareness and use of a resource
4. support the continuing educational needs of regular resource users

In the paper we walk through each of these: we first describe example needs associated with the point, and then cover possible informal resources that meet the needs. The article includes tables of resources and links to them, and many, many references. We really hope that it is a very useful resource in the field of bioinformatics education. I am already looking forward to contributing to the next special education issue, both to hone my writing skills and to extend the information we can provide readers. Please do comment, email, whatever and let us know about the resources that you use, what you learned from the article, etc. Oh, here’s the citation info:
Williams, J., Mangan, M., Perreault-Micale, C., Lathe, S., Sirohi, N., & Lathe, W. (2010). OpenHelix: bioinformatics education outside of a different box. Briefings in Bioinformatics. DOI: 10.1093/bib/bbq026

Our Current Protocols paper on the UCSC Genome Browser is out!

We teamed with Bob Kuhn of the UCSC Genome Browser group to create a series of pragmatic, useful, and (we hope) effective examples of how molecular biologists can use the UCSC Genome Browser to make their benchwork time more efficient, and to represent their discoveries as custom tracks in the browser as well. And that paper is now out on the Current Protocols site:

http://www.currentprotocols.com/protocol/mb1909 (or the short version: http://bit.ly/413iNw)


Thanks to my co-authors for their huge contributions and helpful suggestions.  We are happy to take questions here, or over at the CP site.  Or if you have comments/suggestions for future updates we are open to that as well.  We are encouraged by the publishers to update that manuscript periodically, and if something isn’t flowing for you, or you need more information, be sure to holler.

FYI: we have one in the pipeline on Galaxy too. We’ll let you know when that is available. For now, check out the free tutorial on Galaxy here.

Full reference and abstract: http://www.ncbi.nlm.nih.gov/pubmed/19816931

Curr Protoc Mol Biol. 2009 Oct;Chapter 19:Unit19.9.

The UCSC genome browser: what every molecular biologist should know.

Mangan ME, Williams JM, Kuhn RM, Lathe WC 3rd.

Electronic data resources can enable molecular biologists to query and display many useful features that make benchwork more efficient and drive new discoveries. The UCSC Genome Browser provides a wealth of data and tools that advance one’s understanding of genomic context for many species, enable detailed understanding of data, and provide the ability to interrogate regions of interest. Researchers can also supplement the standard display with their own data to query and share with others. Effective use of these resources has become crucial to biological research today, and this unit describes some practical applications of the UCSC Genome Browser.

PMID: 19816931