Tag: literature

Tip of the Week: Word Add-In for Ontology Recognition

17 March, 2010 (08:07) | Tip of the Week | By: Jennifer

In today’s tip I want to make you aware of a tool that I think will help researchers present their own data and publications in an accurate and universally searchable way. I learned of the resource (UCSDBioLit) through an article in one of my recent BioMed Central article alert emails. This resource allows authors to mark up their own publications with XML tags AS THEY WRITE their papers. This will allow faster and more accurate semantic searching of their research.

A huge problem in science today is how to search the vast literature base quickly and find the data that you are interested in accurately and efficiently. Here at OpenHelix we focus on ways of effectively and efficiently getting information out of public databases and resources, but at the other end of the process is the ability for scientific knowledge to be curated into those resources. We have featured biocurators and the phenomenal work that they do several times in the past, but it is work that never ends and can be very labor intensive. It often involves an initial triaging of a field’s literature, some level of automatic information gathering, and then careful manual effort on the part of scientists at the resource to gather and present the information through their site. I know from personal experience that the process of reading a paper, clarifying research details with an author, and then presenting that information to the author’s satisfaction can be a very long and labor-intensive process, for both the curator AND the original author.

For years there has been discussion of ‘expert curation’, in which experts in the field author review or summary pages in a resource, or of community curation jamborees, etc. There have been fruits from many of these efforts, but in general participation is low. But who is more of an expert on the research being published than the author? If authors could/would mark up their own papers during the publication process, not only could they be assured that it would be accurate, but they would help make their research universally searchable without the lag required for searchability through a specific resource. Thus far document mark-up has not been an easy process and has largely been deemed ‘not worth the effort’ for the level of attribution/recognition affiliated with it.

The BioMed Central article does a nice job of outlining and discussing many of these issues. It cites many other efforts and resources, explains their motivation and the implementation of their software. A nice feature of the tool is that there are interoperability features, and a real commitment to conforming with existing standards of practice. The article also presents an appendix of resource addresses of other groups involved in semantic searching and literature publication. I especially like this quote from the paper:

The Word add-in presented here will assist authors in this effort using community standards and by making it possible for the author of the document, the absolute expert on the content, to do so during the authoring process and to provide this information in the original source document.

You can also find brief tutorials on using the tool at SciVee: Word Add-in for Ontology Recognition Tutorial (1 of 4): Install Process

As a note, literature mark-up and enabling are currently an active area – Mary found another literature handling resource and paper as well: Check out the tip, the articles & the tools. Tell me what you find/think. Thanks! (OH, and Happy St. Patty’s to ya!)

UCSDBioLit Reference:
Fink, J., Fernicola, P., Chandran, R., Parastatidis, S., Wade, A., Naim, O., Quinn, G., & Bourne, P. (2010). Word add-in for ontology recognition: semantic enrichment of scientific literature. BMC Bioinformatics, 11 (1). DOI: 10.1186/1471-2105-11-103

Tip of the Week: Managing & sharing references with Mendeley

20 January, 2010 (00:02) | Tip of the Week | By: Trey

This week’s tip is a bit off-topic (as in genomics databases), but it is science/biology-related and something we all need. There are a lot of reference management software possibilities out there like EndNote, some great web 2.0 social networking sites like Connotea (Nature Publishing) and CiteULike (Springer), and some PDF management tools. Mendeley wants to be all three. I like the idea a lot. Instead of having 2 or 3 separate applications, desktop and/or web, etc., you have one to rule them all. Of course EndNote has EndNote Web, but it’s not free. (Mendeley is free, and the features they offer now will stay free. They will offer new features for professional users later for a fee.) You can export your references from Connotea and CiteULike, but it’s an extra, annoying step. My first experiences with Mendeley have been quite positive, so I thought I’d introduce it to you here.

Impact Factor

18 December, 2009 (16:07) | General Science, Genomics Research | By: Trey

I remember considering the “Impact Factor” of journals when submitting research papers, and wondering, out of curiosity, what the impact factor of a specific paper I published might be. Not particularly seriously: my field was narrow enough in my Ph.D. research that there were just a few journals to even consider, so choosing was usually pretty simple. And for individual articles, I am pretty sure I knew the 4 people in the world outside my lab that were interested in my research (I jest, a little). During my postdoc, my PI was pretty good at choosing journals based on the article, the journal’s audience… and impact factor.

But impact factor measuring has its issues (Article-Level Metrics and the Evolution of Scientific Impact, Neylon and Wu. PLoS Biol 7: e1000242), and there is always a search to measure the impact of journals and individual articles better, or at least differently. Well, one of my favorite science sites and one of my favorite journal publishers, ResearchBlogging.org and PLoS, have worked together to measure the impact of journal articles. PLoS has a lot of metrics to see what the ‘impact’ of an article might be, and now they’ve added a metric to see how many times it’s been written about on blogs, using blog aggregators like Postgenomic, Bloglines and Nature Blogs, and now ResearchBlogging.

I like the partnership with ResearchBlogging specifically because, whereas the other blog aggregators either do not necessarily pick up posts that discuss the science of the article (Postgenomic) or aggregate only a subset of the science blogs out there (Nature Blogs), ResearchBlogging specifically aggregates blog posts discussing the research in peer-reviewed articles.

Of course I don’t find this particularly useful for comparing one article against another (the best articles aren’t always written about, and those that are might not be in the blog aggregators), but I do think this will be a great way to carry on the conversation and dig deeper into the research topic.

You can view that metric at PLoS for any article. For example, for the one I link to above, click on the “metric” tab and scroll down a bit until you see the heading “Blog Coverage.” For that article, you’ll see two ResearchBlogging posts (as of this writing), a metric for this paper about metrics :) .

Tip of the Week: Fable, text mining for literature on human genes

18 November, 2009 (10:57) | Tip of the Week | By: Trey

A couple of weeks ago we brought you a tip of the week about the CHOP CNV Database. The same people who bring you that database also do FABLE (Fast Automated Biomedical Literature Extraction), a literature mining tool. The tool uses an advanced algorithm to find human genes that are directly related to the keywords searched on, and then finds literature on those genes. The tool has some great features and is a great way to quickly find the literature on a gene of interest. Today’s tip will give you a quick intro to the tool.

(re)Funding Databases II

16 November, 2009 (17:52) | Genomics Research, Genomics Resource News, New Resource | By: Trey

So, I wrote about defunding resources and briefly mentioned a paper in Database about funding (or ‘re’funding) databases and resources. I’d like to discuss this a bit further. The paper, by Chandras et al., discusses how databases and, to use their term, Biological Resource Centers (BRCs) are to maintain financial viability.

Let me state first, I completely agree with their premise, that databases and resources have become imperative. The earlier model of “publication of experimental results and sharing of the related research materials” needs to be extended. As they state:

It is however no longer adequate to share data through traditional modes of publication, and, particularly with high throughput (‘-omics) technologies, sharing of datasets requires submission to public databases as has long been the case with nucleic acid and protein sequence data.

The authors state, factually, that the financial model for most biological databases (we are talking about the thousands that exist) has often been 3-5 years of development funding; once that runs out, the infrastructure needs to be supported by another source. In fact, this has led to the defunding of databases such as TAIR and VBRC (and many others), excellent resources with irreplaceable data and tools, that then must struggle to find funding to cover the considerable costs of infrastructure and continued development.

The demands of scientific research, open, shared data, require a funding model that maintains the publicly available nature of these databases. And thus the problem as they state:

If, for financial reasons, BRCs are unable to perform their tasks under conditions that meet the requirements of scientific research and the demands of industry, scientists will either see valuable information lost or being transferred into strictly commercial environment with at least two consequences: (i) blockade of access to this information and/or high costs and (ii) loss of data and potential for technology transfer for the foreseeable future. In either case the effect on both the scientific and broader community will be detrimental.

Again, I agree.

They discuss several possible solutions to maintaining the viability of publicly available databases, including a private-public dual-tier system where for-profits pay an annual fee and academic researchers have free access. They mention UniProt, which underwent a funding crisis over a decade ago, as an example. UniProt (then Swiss-Prot) went back to complete public funding in 2002. There are still several other databases that are attempting to fund themselves by such a model. BioBase is one, into which several databases have been folded. TransFac is one of them. There is a free, reduced-functionality version that is available to academics through gene-regulation.com, and the fuller version is available by subscription at BioBase. The free version allows some data to be shared, as one can see at VISTA or UCSC. I am not privy to the financials of BioBase and other similar models, and I assume that will work for some, but I agree with the authors that many useful databases and resources would be hard-pressed to be maintained this way.

Other possibilities include folding databases fully under a single public institution’s funding mechanism. The many databases of NCBI and EBI fit this model. In fact, there is even a recent case of a resource being folded into this model at NCBI. Again, this works for some, but not all useful resources.

Most will have to find a variety of methods for funding their databases. Considering the importance of doing so, it is imperative that viable models are found. The authors reject, out of hand, advertising. As they mention, most advertisers will not be drawn to website advertising without a visibility of at least 10,000 visitors per month. There might be some truth to this (and I need to read the reference they cite to back that up).

But the next model they suggest seems to me to have the same drawback. In this model, the database or resource would have a ‘partnership of core competencies.’ An example they cite is MMdb (not to be confused with MMDB). This virtual mutant mouse repository provides direct trial links from its gene information to the product page at Invitrogen. They mention that though 6 companies were approached, only one responded. It would seem that this model has the same issues as directly selling advertising.

They also mention that, at least for their research community of mouse functional genomics, “Institutional Funding” seems the best solution for long-term viability and open access. Unfortunately, until institutions like NIH and EMBL are willing or able to fund these databases, I’m not sure that that’s a solution.

As they mention in the paper, the growth in the amounts and types of data being generated is exponential. I am not sure that government or institutional funding can financially keep up with housing the infrastructure needed to maintain and further develop these databases so that all the data generated can remain publicly and freely accessible.

Information should be free, but unfortunately it is not without cost. It will be interesting to see how funding of databases and resources evolves in this fast-growing genomics world (and it is imperative that we figure out solutions).

PS: On a personal note, the authors use their resource, EMMA (European Mouse Mutant Archive), as an example in the paper. I like the name since it’s the name of my daughter, but it just goes to prove that names come in waves. We named our daughter thinking few would give their daughters the same name. When even databases share the name, you know that’s not the case.

Chandras, C., Weaver, T., Zouberakis, M., Smedley, D., Schughart, K., Rosenthal, N., Hancock, J., Kollias, G., Schofield, P., & Aidinis, V. (2009). Models for financial sustainability of biological databases and resources. Database, 2009. DOI: 10.1093/database/bap017

Open Access Publishing… funding it

23 September, 2009 (12:49) | Genomics News | By: Trey

Five universities, Harvard, Cornell, Dartmouth, UC Berkeley and MIT, have formed a compact to support open-access publishing by funding publishing fees. Many open access journals, because they do not charge readers, use the model of charging for publication. This could be a barrier to publishing in an OA journal, so the compact:

…supports equity of the business models by committing each university to the timely establishment of durable mechanisms for underwriting reasonable publication fees for open-access journal articles written by its faculty for which other institutions would not be expected to provide funds.

Open access isn’t free; someone has to pay for it: the provider, the user, or someone else under some other model. I, personally, like the idea behind open access publishing of research. I believe it can be one model, among several, to make access to research free and available to help advance further research. Two years ago, the Consolidated Appropriations Act made it a requirement that NIH-funded research be published in such a manner that it was open after a certain period of time, a boon to open access publishing. Publishers of all stripes are attempting to develop ways to make research available to researchers, and to pay for it.

I look forward to seeing how these universities underwrite those fees and which other universities join the compact.

Tip of the Week: PLAN2L for Arabidopsis literature

19 August, 2009 (09:22) | Genomics Research, New Resource, Tip of the Week | By: Mary

For this tip of the week we look at a text-mining tool for the Arabidopsis literature, Plan2L, or PLant ANnotation to Literature. It has a very straightforward interface that permits searching of the paper space, and you can do that with a variety of focal points: the bibliome as a whole, or with emphasis on interactions, regulation, cell cycle, and more. The results offer links to the PubMed abstracts, and tabular statistics of the term occurrence in that area of focus. Green results indicate positive scores and likely relevance, while red results are likely to be non-relevant: a graphical guide to quickly finding the data of interest. Links to other resources including the BioCreative server, WikiGenes, iHOP and TAIR are provided as well.

The current emphasis for this resource is Arabidopsis, but it would be quite useful for other species too. If you are interested in text mining Arabidopsis I would also encourage you to compare the results with the Textpresso installation at TAIR to see what you discover in a different text miner interface as well.

Plan2L site: http://zope.bioinfo.cnio.es/plan2l/plan2l.html

For their recent paper on Plan2L see: http://www.ncbi.nlm.nih.gov/pubmed/19520768 or the full article freely available in PubMedCentral: http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=19520768

iPhone and research

24 July, 2009 (15:41) | General Science | By: Trey

Ok, so I just got my new iPhone 3GS. I couldn’t resist. Anyway, my contract on my first generation iPhone was up. So, it was time to reconfigure and explore the huge number of apps out there for the iPhone.

I use the iPhone for a lot of things, directions, finding out what stores are in the area, keeping my grocery list, listening to music, watching shows, browsing the web, keeping my calendar and contacts and a bunch more. Oh, and to make and receive phone calls :) .

I’ve read past posts on other blogs about scientific apps for the iPhone, and I decided it was time to check out what apps there are now.

I’ve found a few I like, some that might work (I do computational genomics now, so I haven’t tried the ones for the bench), and one that has nothing to do with biology (directly anyway), but I am in love with. Follow me below the fold.


Tip of the Week: It's a duplicate

22 July, 2009 (00:01) | Tip of the Week | By: Trey

There are a lot of research papers out there, more than ever. Along with the good news (increasing knowledge) comes some bad news: increasing duplication and plagiarism, more often than not going undetected. The developers of eTBLAST, which is a great tool we’ve had a tip on before, have created another tool using an eTBLAST search of Medline and other databases to find highly similar citations: Deja Vu.

These similar citations could be legitimate: a review of a previous article, an author using similar wording of an abstract from a previous paper for new research (the eTBLAST search can only search titles and abstracts), sanctioned duplications, etc., as the author of the post “Deja Boo” points out. But there are some real instances of duplication (authors attempting to pad their CVs) and plagiarism (stealing words and research). An earlier example (before Deja Vu) found at Panda’s Thumb is of a creationist attempting to pad a CV and look more legitimate. Errami and Garner (two of the developers of the tool) published a paper in Nature earlier this year with many such instances of duplication and plagiarism (and another in Science, reported on here with some interesting discussion).

Still, the database needs to be viewed with caution. Of the 74,792 ‘highly similar and duplicate citations’ found, 92% have not been verified. Of the 8% that have been verified (this has to be done by manual curation), 65% have been found to be probably legitimate (as stated above) and 35% to be duplicates. But even the duplicates aren’t necessarily nefarious. Since full texts are not available, it is often the case that the duplication might be perfectly understandable (reusing an abstract with some minor changes for new research, etc). Still, it is a tool that, with some work, can help tremendously in that search for true duplicates and plagiarism, and perhaps even just the threat of it might lower the instances?  :D
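
For a rough sense of scale, those percentages can be turned into approximate counts. This is only a back-of-the-envelope sketch in Python: the function name and dictionary keys are mine, and since the percentages quoted above are rounded, the counts are estimates, not figures taken from the Deja Vu database itself.

```python
def estimate_counts(total, verified_pct, legit_pct):
    """Turn the quoted percentages into approximate citation counts.

    total        -- number of highly similar citation pairs reported
    verified_pct -- percent of the total that has been manually verified
    legit_pct    -- percent of the verified set judged probably legitimate
    All names here are illustrative; the inputs are rounded percentages,
    so the outputs are estimates only.
    """
    verified = total * verified_pct / 100
    unverified = total - verified
    legitimate = verified * legit_pct / 100
    duplicates = verified - legitimate
    return {
        "unverified": round(unverified),
        "verified": round(verified),
        "legitimate": round(legitimate),
        "duplicates": round(duplicates),
        # confirmed duplicates as a percentage of ALL flagged citations
        "duplicates_pct_of_total": round(100 * duplicates / total, 1),
    }


counts = estimate_counts(total=74792, verified_pct=8, legit_pct=65)
for name, value in counts.items():
    print(f"{name}: {value}")
```

In other words, only about 2,100 of the roughly 75,000 flagged pairs (a little under 3%) had actually been confirmed as duplicates at the time, which is why the caution above matters.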

So, with that in mind, this week’s tip of the week is a quick view of “Deja Vu.”

Adventures in publishing

12 December, 2008 (18:22) | General Science | By: Trey

A new open access journal, Ideas in Ecology and Evolution, has, well, opened. It’s published at Queen’s University in Canada.

“So?” you ask, “there are lots of journals starting up all the time”.

This one is different. It’s experimenting with a lot of things (ok, so there seem to be a lot of journals experimenting with the model lately). The subject matter is not research per se, but ideas. Having been to my share of ecology and evolution conferences and discussions, I can see this journal has opened itself up to some quite lovely discussions.

As explained by Bob O’Hara, there are some interesting review process experiments going on here too. Authors pay to get their ideas published, reviewers are paid, and reviewers are not anonymous and get to publish their views of the article as a companion piece. Bob discusses the issues we’ve all heard about the pros of anonymity (and they are valid ones), but this might work in this case. I also agree with Bob on one point: this structure (reviewers publishing their views) will indeed increase discussion, but I too would like to see some mechanism for a broader discussion of the article. As it is designed now, it will be like watching TV pundits arguing the finer points of health policy, which I guess is informative. Something like what PLoS has, I think, would actually work better in a journal of ideas like this.

Well, we’ll see. Right now there is nothing there but the editorial. I’ll be watching though.

hat tip: Coturnix