Tag Archives: software

More Big Data to Consider: Bioimage Informatics

I’m not sure any more when I signed up for complimentary copies of Nature Methods, but just like clockwork my copy arrives each month. If you’d like to get it too, you can apply for a subscription here (Firefox seems to work better than IE, btw). This month’s issue particularly interested me because it contains a focus on Bioimage Informatics. The focus appears to be free to read online.

I found the focus just after having read the Science News article “Blast Injuries Linked to Neurodegeneration in Veterans” by Greg Miller. In Greg’s piece there is a description of a distinctive neuropathology that has been seen in athletes and military veterans who had incurred head injuries. This same distinctive pattern is seen in a mouse model of blast injury. The image of the tangles of tau protein shown in the article struck me as so interesting that I told my husband about it over dinner one night, so I already had bioimages on my mind. I am also always interested in the field of bioinformatics, both personally and as a member of the OpenHelix team.

I read the commentaries first, in the order that they were printed. The first commentary is by Gene Myers, who was also involved in early genome bioinformatics, and it provided a very interesting perspective on both the current state of bioimage informatics and on the historic use of bioimages in systems genetics. The following quote made me grin:

“The field is still in its early days, and there is no such thing as a typical bioimage informatician: they are either computer vision experts looking for new problems, classic sequence-based bioinformaticians looking for the new thing or physicists and molecular biologists whose experiments require them to bite the informatics bullet. … From my perspective, it is very reminiscent of the state of bioinformatics in the early 1980s: the exciting, somewhat chaotic free-for-all that is potentially the birth of something new.”

And the following paragraph stressing the importance of “due diligence of pilot studies” and “optimized protocols” reminded me of my days setting up a Biocore facility without enough funding for either sufficient pilot studies or optimization, which ultimately doomed the utility of the machine to my advisor and department alike. This commentary set the stage well for the rest of the articles. The other commentaries included a description of the difference in goals of the computer vision field and the bioimage informatics field, a plea for usability to be built into bioimaging software, and a historical commentary on the 25 years of NIH Image, now ImageJ.

The usability article sounded many of the same cries that we make here at OpenHelix – if you want to have usable bioscience software that IS in fact USED, at a minimum you must 1) have funding and a mandate to maintain it over the long run, 2) have motivated developers that are responsive to their users’ needs and feedback, including fixing bugs and 3) (last but absolutely not least) you must provide awareness and training on your software. And in my opinion, any old training WON’T do – it has to be high quality, up-to-date, and easier to use & absorb than your average dry documentation on programming your VCR clock (OK, I’m dating myself there, but you KNOW what I mean…) I like their suggestion that funding agencies request descriptions of how the software will be maintained and documented, and be prepared to provide funding not just for development, but also for maintenance. (Why reinvent the wheel over & over, just to let each one go flat with disrepair?)

There were also reports on specific software, such as OMERO.searcher, SimuCell, PhenoRipper, Fiji, BioImageXD, and Icy, as well as on the Broad Bioimage Benchmark Collection (BBBC), a collection of microscopy image sets available for the testing and validation of new image-analysis algorithms.

The focus then concludes with a great review of bioimaging software tools, with the goal of providing a “how to” summary of using open-source imaging software for every stage of bioimage informatics. It begins with a discussion of data acquisition & continues through data storage and workflow systems. I might tweak figure one just a bit, but it does visualize that today software is required at every stage of image analysis – from automated image acquisition to image retrieval and analysis. The authors also touch on the importance of image annotation and controlled vocabularies, or ontologies. Table 1 provides a nice resource listing including software names, primary function and URL – I have some new resources to check out now! :)

Overall, I’d suggest this focus on bioimage informatics to any life scientist, whether you are analyzing images today or not – I think it provides a glimpse into an up-and-coming, exciting field.

Quick Links:
BioImageXD: http://www.bioimagexd.net/

Broad Bioimage Benchmark Collection (BBBC): http://www.broadinstitute.org/bbbc/

Fiji: http://imagej.nih.gov/ij/

Icy: http://icy.bioimageanalysis.org/

OMERO.searcher: http://murphylab.web.cmu.edu/software/searcher/

PhenoRipper: http://www.phenoripper.org/

SimuCell: http://www.SimuCell.org/


Reference List:
Greg Miller (2012). Blast Injuries Linked to Neurodegeneration in Veterans. Science, 336 (6083), 790-791 DOI: 10.1126/science.336.6083.790

Gene Myers (2012). Why bioimage informatics matters. Nature Methods, 9, 659-660 DOI: 10.1038/nmeth.2024

Anne E Carpenter, Lee Kamentsky, & Kevin W Eliceiri (2012). A call for bioimaging software usability. Nature Methods, 9, 666-670 DOI: 10.1038/nmeth.2073

Kevin W Eliceiri, Michael R Berthold, Ilya G Goldberg, Luis Ibáñez, B S Manjunath, Maryann E Martone, Robert F Murphy, Hanchuan Peng, Anne L Plant, Badrinath Roysam, Nico Stuurmann, Jason R Swedlow, Pavel Tomancak, & Anne E Carpenter (2012). Biological imaging software tools. Nature Methods, 9, 697-710 DOI: 10.1038/nmeth.2084

Tip of the Week: World Tour of Genomics Resources

Most weeks our tip is a five-minute movie that quickly introduces you to a new resource, or a cool new function at an established resource. Occasionally we feature one of our full resource tutorials that is being made freely available through resource sponsorship of our training suite. In this week’s tip we provide access to one of our tutorials that is especially near and dear to our hearts. It is a World Tour of Genomics Resources in which we explore a variety of publicly-available biomedical, bioinformatics and bioscience databases and other resources.

This tutorial is quite different from our usual ones. Generally we focus on a specific software resource and describe step-by-step how to use its functions, such as how to do basic and advanced searches, how to understand and modify displays, and where to find specific types of data such as FASTA sequences. We even provide tips on ‘hidden features’ that even power users find useful and informative.  This type of software training is absolutely critical.

But many people need an even earlier step: just the *awareness* that resources are available that might serve their needs. This tutorial fills that niche. We present a sampling of resources, all free to use, from each of 9 categories including: Analysis & Algorithms, Expression, Genome Browsers (for Eukaryotes and for Prokaryotes and Viruses), Genome Variation, Literature, Nucleotides, Pathways and Proteins. After the World Tour, which is the majority of the tutorial, we then describe how to use OpenHelix’s free search and learn portal to find bioscience resources most appropriate for your research needs. From this the tour transitions into a brief discussion of the format of our training materials and how to use them, and then ends with information about other learning resources that we provide.

This tutorial has been wildly popular whenever we’ve done it as a live seminar. At the NIH they actually had to lock the doors because we’d hit the capacity of the room, and people were turned away. In fact, it has been so popular that we decided to produce it as a full tutorial suite and release it as one of our free trainings so that anyone and everyone could learn about the breadth of great public software options available for free use.

In addition to this free tutorial, we also have published a paper entitled “OpenHelix: bioinformatics education outside of a different box” in a special issue of Briefings in Bioinformatics entitled “Special Issue: Education in Bioinformatics”. This paper describes a plethora of places where researchers can access informal educational material on publicly available bioinformatics resources. These come in a wide variety of formats, including lists of resources, journals that regularly feature tool descriptions, and eLearning sources such as the MIT OpenCourseWare effort. If you know of other such resources that aren’t covered in our tour or paper, comment & let us know about them – we love to learn as much as we love to teach! :)

Quick link to World Tour of Genomics Resources tutorial here.

  • Williams, J., Mangan, M., Perreault-Micale, C., Lathe, S., Sirohi, N., & Lathe, W. (2010). OpenHelix: bioinformatics education outside of a different box. Briefings in Bioinformatics, 11 (6), 598-609 DOI: 10.1093/bib/bbq026

Friday SNPpets

Welcome to our Friday feature link dump: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

I can haz outreach? Nobody speaks for the end users.

Recently there was much buzz in the #bioinformatics twittersphere over this blog post by Sean Eddy: The next five years of computational genomics at NHGRI

It is a very nice post about some exciting prospects for the future.  The idea of planning “explicitly for sustainable exponential growth” is wise.  There will be no abatement of the flow of data at this point–it’s no longer a big bolus of one species’ data, or one type of project.  The taps are wide open now, and we just keep adding more taps.

I also love the idea of “democratization”.  In part, it includes:

….To enable individual investigators to make effective use of large datasets, we must create an effective infrastructure of data, hardware, and software. NHGRI has extensive experience in big data, and can lead and catalyze across the NIH….

Now, I know this is a snippet of some thoughts–there may be more to it in the actual planning meetings on this.  But it pushed my buttons because it sounds a lot like what we always hear about big data projects: build it and they will come.

It got a little better in another segment:

Spur better software development. Traditional academia and funding mechanisms do not reward the development of robust, well-documented research software; at the same time, the history of commercial software viability in a narrow, rapidly-moving research area like computational genomics is not at all encouraging….

Well-documented research software.  Sigh.  We probably read more documentation than most people. And even the good documentation can be brutal. Dated. And not particularly effective. But still–if nothing else, please reward time spent on documentation….

But what is missing for me from this–and not just this, but most of these big data types of projects–is a real commitment to outreach and support for end users.  Formal, organized, supported, rewarded, outreach.  Sometimes there is a place to write to with questions.  But we probably send in more questions to projects than most people too–and the success rate for answers varies widely.  But even when we get good answers–that’s not enough.

I know funding is hard.  We can’t fund everything.  Database and software projects have to struggle to even persist.  Curation is frequently not valued enough.  And often curators are expected to do outreach as just one of their tasks…which pushes outreach even further down the priority list.  But without dedicated outreach–formal, quality, active outreach–databases and software projects won’t have as many users, and not many effective users.   Which will make funding agencies wonder if they should keep supporting them.  Which…well, you can see where this spiral goes….

What bugs me, I guess, is essentially this: Nobody speaks for the end users. There’s really no one in these types of meetings who speaks for the consumers of this software and this data.  I mean people who aren’t directly attached to the data production and management.   The project teams think they are thinking about the users.  They really want users.  But ur not doin’ it rite.

I would like to see outreach and end user support valued, required, and really done right.  No matter how  much hardware and documentation you throw at these projects, if people 1) don’t know it exists, and 2) have no idea how to use it, the project will not yield all the results that it could. A marker paper is nice.  But it’s not sufficient, folks. And it’s nice to have the high-end team members talk at conferences. But that reaches only a tiny subset of the users or potential users.  And another thing about that: a lot of times people are hesitant to ask what sound like naive questions to the high-end representatives of these projects.  I’m jes’ sayin.

Yes, this is fairly self-serving for me to say.  But we see the users when we do outreach.  They crave it.  They love it.  We’ve been lucky to be a part of some great projects that do outreach right.  We have seen it work.  It should be Standard Operating Procedure on software and database projects.  Not an afterthought.

Tip of the Week: PhyloWidget

Got phylogeny? So, you’ve created a phylogeny using some software and would like to draw the tree from the Newick-formatted file* you exported. Of course there is no shortage of tree drawing programs out there (or phylogeny generating ones either). We found another one recently: a web-based, API-usable, open-source tree drawing tool that is quite intuitive and functional. It’s called PhyloWidget. The paper describing PhyloWidget is here and you can find some very nicely done step-by-step instructions here. I’ve done a 4-minute tutorial here to give you a quick overview. If it looks like something you could use, you might want to delve into it right away, or check out their ‘walk through.’ Happy climbing.

*Here’s the file I use in the tip: ((raccoon:19.19959,bear:6.80041):0.84600,((sea_lion:11.99700, seal:12.00300):7.52973,((monkey:100.85930,cat:47.14069):20.59201, weasel:18.87953):2.09460):3.87382,dog:25.46154);
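If you’d like to peek inside a Newick file before drawing it, here is a minimal sketch in Python. It has nothing to do with PhyloWidget itself: the function names `parse_newick` and `leaves` are made up for illustration, and the parser assumes a simple, well-formed tree like the one above (no quoted labels, no comments). It reads the tree into nested dictionaries and then lists each leaf with its summed branch length from the root.

```python
import re

# The example tree from this tip, copied verbatim.
NEWICK = (
    "((raccoon:19.19959,bear:6.80041):0.84600,"
    "((sea_lion:11.99700, seal:12.00300):7.52973,"
    "((monkey:100.85930,cat:47.14069):20.59201, weasel:18.87953):2.09460):3.87382,"
    "dog:25.46154);"
)

PUNCT = set("(),:;")


def parse_newick(text):
    """Parse a simple Newick string into nested dicts (hypothetical helper).

    Each node is {'name': str or None, 'length': float or None, 'children': [...]}.
    Assumes no quoted labels or bracketed comments.
    """
    # Split into punctuation marks and labels/numbers, skipping whitespace.
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_node():
        nonlocal pos
        node = {"name": None, "length": None, "children": []}
        if peek() == "(":
            pos += 1  # consume "("
            node["children"].append(parse_node())
            while peek() == ",":
                pos += 1  # consume ","
                node["children"].append(parse_node())
            if peek() != ")":
                raise ValueError("expected ')' in Newick string")
            pos += 1  # consume ")"
        if peek() is not None and peek() not in PUNCT:
            node["name"] = tokens[pos]
            pos += 1
        if peek() == ":":
            node["length"] = float(tokens[pos + 1])
            pos += 2  # consume ":" and the number
        return node

    root = parse_node()
    if peek() != ";":
        raise ValueError("expected ';' at end of Newick string")
    return root


def leaves(node, depth=0.0):
    """Yield (leaf name, summed branch length from the root) in file order."""
    dist = depth + (node["length"] or 0.0)
    if not node["children"]:
        yield node["name"], dist
    for child in node["children"]:
        yield from leaves(child, dist)


if __name__ == "__main__":
    tree = parse_newick(NEWICK)
    for name, dist in leaves(tree):
        print(f"{name}\t{dist:.5f}")
```

For anything beyond a quick look (quoted names, support values, rerooting), an established phylogenetics library is likely a better choice than a hand-rolled parser like this one.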

New GWA viewer

Genome-wide association studies (GWA/GWAS) generate a lot of data that needs to be viewed and analyzed. There are some software tools out there to do that, including UCSC’s Genome Graphs.

I haven’t looked at it in detail yet, but this new downloadable Java viewer was recently developed and reported in Bioinformatics: AssociationViewer (download here). I’m passing it on to you. As I said, I haven’t had a chance to give it a test drive, but as the title of the article states, it’s a “scalable and integrated software tool for visualization of large-scale variation data in genomic context.” At first glance, it looks interesting.

Looking for the Perfect Pedigree Software

Long ago, when our blog was young (less than 2 months old – where does the time go?), Mary wrote a post about the pedigree drawing programs that she knew of, or that were mentioned on the Mouse Genome Informatics (MGI) mailing list. There has been so much interest in that post, as judged by clicks, that we began looking into pedigree analysis tools with the idea of creating one of our trainings on the ‘best’ tool we found. I have been working on finding a great tool to train on – public (free), broadly applicable and web-based. I did find a nice little tool named PediDraw, and did a tip of the week on that, but it is so easy to use it doesn’t really warrant a full tutorial.

But all in all, I’m finding the search to be a somewhat difficult slog. Much of the software I have found is fairly old, and many of the programs are no longer supported by their creators. Others are only available commercially and/or have a very focused functionality, are only available on one system or the other, or are in various languages such as Fortran 77 or R. Last week I noticed that Mary’s blog was linked to in a blog post by Gregor Gorjanc, and I must confess that it felt great to know that he was having a similar slog for the ‘right’ pedigree program. Gregor’s post has some nice background information and information on plotting large, complex animal breeding pedigrees. I won’t repeat his information here but his post, as well as others he links to, convinced me I should share some of the gems of knowledge that I have acquired.

The first thing I learned is how easy it was to find useless information (at least to me) by googling for ‘pedigree’ – as in almost 23 million hits with the top I-don’t-know-how-many linking to information or coupons for Pedigree dog food. As I searched, I also learned how many different things people mean by pedigrees – there are pedigrees for keeping track of livestock breeding programs, laboratory mouse strains, dog breeds, thoroughbreds, and fancy bird crosses; others are for historians or hobbyists tracing their roots. The term pedigree is also used for business tracking and management software systems. The pedigrees closest to my personal interests are those that are medically relevant, but even those are amazingly diverse – pedigree can mean everything from a family history given by a patient (riskApps) to e-pedigrees soon to be required by some states for pharmaceutical manufacturers to reduce the chances of dangerous counterfeit chemicals.

Once I got a few promising hits and found some promising sites, I tried to extend my searches using the words that were on the sites that I had found. Using search terms such as ‘kinfolk’ and ‘ancestry’ led me to Anabaptist family databases and presidential genealogies, but not so many drawing programs. The most fruitful/directed search phrase was of course ‘pedigree drawing software’ which retrieved many articles and individual software pages. However, I think THE BEST hit (by far) that I found was the linkage software list at Rockefeller University, and I found it with the search phrase ‘linkage analysis software’. Why do I consider this THE best hit, you ask? Through my searches I have come to believe that pedigree drawing software is somewhat like religion – it is a very personalized thing & only you can know which is best for you. The Rockefeller list is the largest and most comprehensive list that I have found anywhere. It is up-to-date and publicly available. It lists over 450 programs in an alphabetized list and provides information such as system availability, recent publications, and a brief description for each. There is also a searchable version of the list. If I were hunting for the perfect pedigree drawing program for my research, I would search here rather than Google! My hat is off to those who maintain this wonderful list at the Laboratory of Statistical Genetics at Rockefeller University!!

I’ll keep you posted as to my finds as I cull their list. And if you are ‘in the know’, or have found the ‘perfect’ pedigree software PLEASE do comment and add your knowledge here.

Pedigree drawing software

A question that came across the MGI mailing list this weekend was: “Is there a good, easy to use template/program for drawing pedigrees that someone could recommend?”

So far the suggestions include:
Pedigree Draw: http://www.pedigree-draw.com/ A Mac-only tool available for purchase.

Pedigree Viewer: http://www-personal.une.edu.au/~bkinghor/pedigree.htm is a free tool, for Windows.

Another suggestion came for HaploPainter: http://haplopainter.sourceforge.net/html/index.html This is for installation on Windows/Linux, but it can also be installed on Macs with a bit of awareness about the installation process. The diagram on the HaploPainter page looked really nice, so I went to check out the paper. Thiele and Nürnberg were challenged by some genome-wide scans with over 10,000 SNPs that they wanted to display. They created this software to solve their problem, and released it for others’ use as well. Looks like it could be really useful–we will have to test it out.

If anyone has other suggestions, add them here and I can send them back to the mailing list–or, of course, you can sign up yourself!

EDIT: here’s another nice looking program submitted to the MGI mailing list: http://eyegene.ophthy.med.umich.edu/madeline/index.php Madeline 2.0

Progeny: this company has software that people have told us they like. I just realized they had a free pedigree tool that you can check out: http://www.progenygenetics.com/students/

I just remembered another project that I had heard about–the Surgeon General’s Family Health project has a tool for family medical history pedigrees: https://familyhistory.hhs.gov/ My Family Health Portrait.

PediDraw–found another one, and I’ll keep adding them as I find them.  This is a web-based tool.  The paper accompanying this one is available in PubMedCentral.

Thiele, H. and Nürnberg, P. (2004). HaploPainter: a tool for drawing pedigrees with complex haplotypes. Bioinformatics, 21(8), 1730-1732. DOI: 10.1093/bioinformatics/bth488