Tag Archives: xplormed

Mining figure legends. Huh.

Every so often something comes up in your weekly literature search that makes you go: huh. That happened to me today with a paper on text mining. Now, I have used a variety of text-mining tools (Textpresso, iHOP, PubMatrix, XplorMed, etc. are among the ones we have subscription tutorials on) and they have all sorts of strengths and weaknesses. And I’m convinced of the utility of them for making new connections, finding related literature, examining over-represented terms, etc. Because of gene nomenclature issues they haven’t always been quite as effective as I’ve wanted for the different sorts of interaction data that I’d love to be able to extract from the literature. That’s still best done by professional curators, IMHO.

When I saw this paper, though, I thought–yeah, figures and figure legends. There could be some real utility there. And I wondered if the mining tools I’ve been using take the figure legends into account? And then it also led me to wonder about the supplemental materials that are becoming so crucial (and overwhelming) from these “big data” projects?  It was one of those realizations that you don’t know what you aren’t looking at….

So this specific paper took thousands of figures from a variety of publications, and mined them:

According to our pathway definition described in the previous
section, we manually checked the 75,350 figures and identified 375
pathway figures to be positive data. Another 11,251 figures other
than pathway figures were randomly selected as negative data.

There were a lot of pieces of the regular text mining strategies (stemming*, decision trees, weighting, etc.). The details of this are provided. And their method is supposedly novel in combining figure text and the paper body, which gives them improved results for figure information. But for me the issue was just the awareness of 1) the potential value of figures and legends, and 2) the fact that in other text mining tools I’m using I don’t know if those data are in there.

Like all paper components, the quality and depth of figure legends vary, of course.  But it did strike me that especially for pathway data people might assume the figure conveys a lot of useful information that might not be explicitly stated in the body of the paper.

As far as I can tell there’s no web interface around this. One link in the paper that was supposed to have some more info is currently 403, so I’ve written to the team. Their introduction also led me to a different tool called FigSearch that sounded like a web interface for a similar type of analysis, but that doesn’t seem to be available any more. Such is the world of software….sigh.

But still: I like it when a paper gives me a realization that I need to think about what I’m not seeing when I’m using software.  It’s an easy thing to forget.


Ishii, N., Koike, A., Yamamoto, Y., & Takagi, T. (2010). Figure classification in biomedical literature to elucidate disease mechanisms, based on pathways. Artificial Intelligence in Medicine, 49 (3), 135-143. DOI: 10.1016/j.artmed.2010.04.005


*The stemming example cracked me up. It appeared to be partially LOLcat: “This algorithm removes suffixes from words and leaves the stem (e.g., pathway or pathways becomes pathwai).”
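
For the curious, that "pathwai" result is what a Porter-style stemmer would be expected to produce: strip the plural "s", then turn a trailing "y" into "i" when the rest of the word contains a vowel. Here is a minimal sketch of just those two rules. It is my own illustration, not the authors' code; the function names are made up, and it assumes only steps 1a and 1c of the Porter algorithm matter for this word (the full algorithm has several more steps that are skipped here).

```python
def _is_vowel(word, i):
    """Porter's vowel test: a, e, i, o, u, or a 'y' that follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return True
    return ch == "y" and i > 0 and not _is_vowel(word, i - 1)


def _has_vowel(stem):
    """True if any position in the stem counts as a vowel."""
    return any(_is_vowel(stem, i) for i in range(len(stem)))


def step_1a(word):
    """Plural handling: sses -> ss, ies -> i, ss -> ss, s -> (removed)."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def step_1c(word):
    """Turn a final 'y' into 'i' when the rest of the word contains a vowel."""
    if word.endswith("y") and _has_vowel(word[:-1]):
        return word[:-1] + "i"
    return word


def partial_porter_stem(word):
    """Hypothetical helper: applies only steps 1a and 1c, not the full stemmer."""
    return step_1c(step_1a(word.lower()))


for w in ("pathway", "pathways"):
    print(w, "->", partial_porter_stem(w))  # both print "pathwai"
```

So the odd-looking stem is a feature rather than a bug: the point is only that "pathway" and "pathways" collapse to the same token, not that the token is a real word.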

New and Updated Online Tutorials for GoMiner and XplorMed

Comprehensive tutorials on the publicly available GoMiner and XplorMed databases enable researchers to quickly and effectively use these invaluable resources

Seattle, WA (PRWEB) February 16, 2010 — OpenHelix today announced the availability of a new tutorial on GoMiner, and an updated tutorial suite on XplorMed.

GoMiner is a set of publicly available tools that can enable you to ascribe biological significance to large lists of genes by annotating them with their corresponding Gene Ontology, or GO, categories. XplorMed is a public web-based tool which allows you to access a text-mining algorithm that can improve upon standard PubMed searches by mining abstracts for word frequencies and combinations. These two tutorials, in conjunction with OpenHelix tutorials on PubMatrix, STRING, Gene Ontology and Textpresso, give you a set of training resources to help you be efficient and effective at text and literature mining.

With these tutorials, researchers can quickly learn to effectively and efficiently use GoMiner and XplorMed.

The tutorial suites, available through an annual OpenHelix subscription, contain an online, narrated, multimedia tutorial, which runs in just about any browser connected to the web, along with slides with full script, handouts and exercises. With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. The scripts, handouts and other materials can also be used as a reference or for training others.

These tutorials will teach users:


GoMiner:
*to use both the downloadable GUI and web-based High-Throughput GoMiner tools
*to understand and manipulate your GO annotated data
*to construct beautiful visuals to display and present your results clearly

XplorMed:
*to extract fascinating relationships among PubMed abstracts
*to use stored sets of abstracts to discover new information
*to start with identifiers of interest and collect relevant abstracts for further examination

To find out more about these and over 85 other tutorial suites, visit the OpenHelix Catalog and OpenHelix. Or visit the OpenHelix Blog for up-to-date information on genomics and genomics resources.

About OpenHelix
OpenHelix, LLC, (www.openhelix.com) provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web. The OpenHelix Search portal searches hundreds of resources, tutorial suites and other material to direct researchers to the most relevant resources and OpenHelix training materials for their needs. Researchers and institutions can save time, budget and staff resources by leveraging a subscription to nearly 100 online tutorial suites available through the portal. More efficient use of the most relevant resources means quicker and more effective research.

Peer Bork wins 2009 award

The Royal Society and Académie des sciences Microsoft Award was won by Peer Bork this year. The award is funded by Microsoft (250,000 euro) and is given to

recognise and reward scientists working in Europe who have made a major contribution to the advancement of science through the use of computational methods.

It was awarded to Peer Bork for his work on the human microbiome. Peer definitely deserves it, as does his lab. The science and scientists that come from the Bork group are stellar. Ok, so I have a personal interest in this: I worked in his lab for 4 years, from 1999-2003. It was one of the best experiences (science and personal) of my life. Also, BioByte Solutions, started by a Bork lab researcher, has helped put together our new free database and resource search (which we’ll be introducing next week).

Congratulations Peer! Now, what is he going to do with that 368,000 dollars?!

And let me use this opportunity to point out some of the great tools and databases developed by the Bork group:
STRING: Analysis of known and predicted protein-protein interactions in all known genomes (OpenHelix Tutorial, by subscription)
STITCH: Database of known and predicted interactions of chemicals and proteins.
SMART: Domain analysis (OpenHelix Tutorial, by subscription)
iTOL: An online tool for the display and manipulation of phylogenetic trees.
XplorMed: Data mining in MedLine (OpenHelix Tutorial, by subscription)

And a whole lot more

Navigating the literature

We have a slide we like to present at some trainings showing the rise in the amount of raw sequence data and number of complete genomes over the last 18 years. There is another slide we show that indicates the rise of the number of databases and analysis tools over the years as listed in the annual database issue of NAR. The number has been doubling every 4 years.

Well, there is another slide we can show too, and this shows the growth of the number of abstract entries into PubMed over the last 20 years (from Hunter and Cohen, 2006). Like data and databases, the number of research articles published and indexed just keeps getting larger. This increase in number is both a bane and a boon to researchers. Of course, it is not only the number of papers indexed that is growing; the amount of text is growing too (open access, etc.) and is about to grow even more with the signing of the new open access act. Searching, mining and making sense of all this literature is going to be a challenge; it is a challenge now.
