Tag Archives: tools

Tip of the Week: Galaxy Tool Shed

This week I attended and gave a talk at ISMB in Long Beach. While there I had the opportunity to attend a session on Galaxy where Jeremy Goecks spoke on Galaxy Visualizations and Greg Von Kuster spoke about the “first biomedical AppStore,” the Galaxy Toolshed. As always, I learned a few new things.

Today’s tip is a quick introduction to the Galaxy Tool Shed. The Tool Shed is a place to share tools you’ve developed or to find tools that other developers have built for your local instance of Galaxy. I won’t be going into the mechanics and specifics of the Tool Shed; it’s aimed not at the experimental biologist end user but at developers of tools for use in Galaxy. That said, it can be useful for end users to know what tools are available and how to get them into their local installation. If you or your institution is installing a local instance of Galaxy, you might want to check out the extensive documentation on how to use the Tool Shed.

There are a lot of tools available in the Tool Shed, over 1800 at last count, spanning many different categories. Though it’s only been a couple of years since the Tool Shed was implemented, some published tools, such as CodonLogo, a logo-based viewer for codon patterns in aligned sequences, have already been added.

If you want to learn more about Galaxy:

We have a webinar tomorrow (July 19, 2012 at 11am PDT) introducing Galaxy (free).

We have an online tutorial (fee)

And we’ve done tips (free of course) on Galaxy visualization, getting flanking sequences and converting genome coordinates using Galaxy, and Galaxy pages. And we’ve tipped and blogged a lot of Galaxy-related stuff.

Quick Links:
Galaxy Main Instance
Galaxy Tool Box
Galaxy Tool Box How-to
Setting up a local instance

 

Sharma V, Murphy DP, Provan G, & Baranov PV (2012). CodonLogo: a sequence logo-based viewer for codon patterns. Bioinformatics (Oxford, England), 28 (14), 1935-6 PMID: 22595210

More Big Data to Consider: Bioimage Informatics

I’m not sure any more when I signed up for complimentary copies of Nature Methods, but just like clockwork my copy arrives each month. If you’d like to get it too, you can apply for a subscription here (Firefox seems to work better than IE, btw). This month’s issue particularly interested me because it contains a focus on Bioimage Informatics. The focus appears to be free to read online.

I found the focus just after having read the Science News article “Blast Injuries Linked to Neurodegeneration in Veterans” by Greg Miller. In Greg’s piece there is a description of a distinctive neuropathology that has been seen in athletes and military veterans who had incurred head injuries. This same distinctive pattern is seen in a mouse model of blast injury & the image of the tangles of tau protein shown in the article struck me as so interesting that I told my husband about it over dinner one night, so I already had bioimages on my mind. I am also always interested in the field of bioinformatics, both personally and as a member of the OpenHelix team.

I read the commentaries first, in the order that they were printed. The first commentary is by Gene Myers, who was also involved in early genome bioinformatics, and it provided a very interesting perspective on both the current state of bioimage informatics and on the historic use of bioimages in systems genetics. The following quote made me grin:

“The field is still in its early days, and there is no such thing as a typical bioimage informatician: they are either computer vision experts looking for new problems, classic sequence-based bioinformaticians looking for the new thing or physicists and molecular biologists whose experiments require them to bite the informatics bullet. … From my perspective, it is very reminiscent of the state of bioinformatics in the early 1980s: the exciting, somewhat chaotic free-for-all that is potentially the birth of something new.”

And the following paragraph stressing the importance of “due diligence of pilot studies” and “optimized protocols” reminded me of my days setting up a Biocore facility without enough funding for either sufficient pilot studies or optimization, which ultimately doomed the utility of the machine to my advisor and department alike. This commentary set the stage well for the rest of the articles. The other commentaries included a description of the difference in goals of the computer vision field and the bioimage informatics field, a plea for usability to be built into bioimaging software, and a historical commentary on the 25 years of NIH Image, now ImageJ.

The usability article sounded many of the same cries that we make here at OpenHelix – if you want to have usable bioscience software that IS in fact USED, at a minimum you must 1) have funding and a mandate to maintain it over the long run, 2) have motivated developers who are responsive to their users’ needs and feedback, including fixing bugs, and 3) (last but absolutely not least) provide awareness and training on your software. And in my opinion, any old training WON’T do – it has to be high quality, up-to-date, and easier to use & absorb than your average dry documentation on programming your VCR clock (OK, I’m dating myself there, but you KNOW what I mean…). I like their suggestion that funding agencies request descriptions of how the software will be maintained and documented, and be prepared to provide funding not just for development, but also for maintenance. (Why reinvent the wheel over & over, just to let each one go flat with disrepair?)

There were also reports on specific software, such as OMERO.searcher, SimuCell, PhenoRipper, Fiji, BioImageXD, and Icy, as well as on the Broad Bioimage Benchmark Collection (BBBC), a collection of microscopy image sets available for the testing and validation of new image-analysis algorithms.

The focus then concludes with a great review of bioimaging software tools, with the goal of providing a “how to” summary of using open-source imaging software for every stage of bioimage informatics. It begins with a discussion of data acquisition & continues through data storage and workflow systems. I might tweak figure one just a bit, but it does show that today software is required at every stage of image analysis – from automated image attainment to image retrieval and analysis. The authors also touch on the importance of image annotation and controlled vocabularies, or ontologies. Table 1 provides a nice resource listing including software names, primary function and URL – I have some new resources to check out now! :)

Overall, I’d suggest this focus on bioimage informatics to any life scientist, whether you are analyzing images today or not – I think it provides a glimpse into an up-and-coming, exciting field.

Quick Links:
BioImageXD: http://www.bioimagexd.net/

Broad Bioimage Benchmark Collection (BBBC): http://www.broadinstitute.org/bbbc/

Fiji: http://imagej.nih.gov/ij/

Icy: http://icy.bioimageanalysis.org/

OMERO.searcher: http://murphylab.web.cmu.edu/software/searcher/

PhenoRipper: http://www.phenoripper.org/

SimuCell: http://www.SimuCell.org/

 

Reference List:
Greg Miller (2012). Blast Injuries Linked to Neurodegeneration in Veterans Science, 336 (6083), 790-791 DOI: 10.1126/science.336.6083.790

Gene Myers (2012). Why bioimage informatics matters Nature Methods, 9, 659-660 DOI: 10.1038/nmeth.2024

Anne E Carpenter, Lee Kamentsky, & Kevin W Eliceiri (2012). A call for bioimaging software usability Nature Methods, 9, 666-670 DOI: 10.1038/nmeth.2073

Kevin W Eliceiri, Michael R Berthold, Ilya G Goldberg, Luis Ibáñez, B S Manjunath, Maryann E Martone, Robert F Murphy, Hanchuan Peng, Anne L Plant, Badrinath Roysam, Nico Stuurmann, Jason R Swedlow, Pavel Tomancak, & Anne E Carpenter (2012). Biological imaging software tools Nature Methods, 9, 697-710 DOI: 10.1038/nmeth.2084

Video Tip of the Week: Visualizing the Galaxy


An antennae galaxy

Well, not that kind of galaxy (though visualizing those is quite nice), this kind of Galaxy. Galaxy is an excellent tool to analyze, reproduce and share genomics data, and the Galaxy folks are always updating, improving and adding features to the tool. We have a tutorial for Galaxy to help you get started using this tool. As you might have guessed from the previous sentence, Galaxy is a moving target. The basics (and that’s what the tutorial is for) are the same, but the tutorial is in the process of being updated to reflect some of those changes. That update should be out sooner rather than later, but that said, we just can’t fit everything into the tutorial. The relatively new visualization tool is something that will not be in the tutorial. As there are no tutorials on visualization at the Galaxy site that I can find (if you know of any, link them here!), I’ve included a quick intro to visualizations using Galaxy in this tip of the week.

There are other ways to visualize the data analyzed at Galaxy. Galaxy datasets can often be viewed directly at UCSC Genome Browser, Ensembl, RViewer or in GeneTrack within Galaxy. Those are all excellent tools and powerful ways to view and explore your analysis in depth. In addition, the Galaxy visualization tool is a way to quickly visualize your data to help discovery, direct further analysis and share what you’ve found. It is obviously not a full-fledged browser, but it is very useful for doing a simple visualization of your data from within Galaxy. Today’s tip gives a quick introduction to Galaxy visualization.

Quick Links:
Galaxy (OH tutorial-subscr.)
UCSC Genome Browser (OH tutorials-free)
Ensembl (OH tutorials-subscr.)
RViewer
GeneTrack

P.S. You might hear some bird song in the background. I am in, and working from, Hawaii for the next month (yeah, it’s tough work but someone has got to do it). No way to get those birds (or the frogs at night) to be silent for a bit.

What’s the Answer? (Bioinformatics Tools on Biostar)

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

Just about a month ago, BioStar added a tool section as a place to announce and update new and old bioinformatics tools. So, in a sense, today’s post is a set of answers in search of your questions :)

We just added a new section to the site called **Tools**. We designate this section to announcements regarding bioinformatics software tools, both new and old.

There are about 10 tools listed currently for various areas of analysis. This dovetails nicely with the “obituary” section Mary created at Biostar for no longer supported tools and databases, which she discussed earlier this week. Circle of life and all that :).

On a Mission for Protein Information

It’s probably just the human brain’s ability to connect dots & find patterns, but it can be interesting how many “unrelated” events and information bits accumulate in my head & eventually get mulled into an idea or theory. Take, for example, a recent biotech mixer, bits from an education leadership series & a past Nature article – each “event” has been meandering in my mind and now they are finding their way out as this blog post.

OK, now the explanation: At a recent local biotech event I heard about a company (KeraNetics) purifying keratin proteins & using them to develop therapeutic and research applications. The company & their research sounded very interesting & because a lot of it is aimed at aiding wounded soldiers, it also sounded directly beneficial. The talk was short, only about 20 minutes, so there wasn’t a lot of time for details or questions. I decided I’d venture forth through many of the bioscience databases and resources that I know and love, in order to learn more about keratin.

My quest was both fun and frustrating because of the nature of the beast – keratin is “well known” (i.e. it comes up in high school academic challenge competitions ‘a lot’, according to someone in the know), but it is hard to work with (i.e. tough, insoluble, fibrous structural proteins) and it is hard to find much general information on it in your average protein database (because it is made of many different gene products, all referred to as “keratin”). I decided to begin my adventure at two of my favorite protein resources, PDB & SBKB, but I found no solved structures for keratin. Because of the way model organism databases are curated and organized, I often begin a protein search there, just to get some basic background, gene names, sequence information, etc. I (of course) found nothing other than a couple of GO terms in the Saccharomyces Genome Database (SGD), but I found hundreds of results in both Mouse Genome Informatics (MGI) (660 genomic features) and Rat Genome Database (RGD) (162 rat genes, 342 human genes). I also found gene names (Krt*), sequences and many summary annotations with references to diseases with links to OMIM. When I queried for “keratin”, OMIM gave me 180 hits, including 61 “clinical synopses”, UniProt returned 505 reviewed entries and 2,435 unreviewed entries, Entrez Protein returned 10,611 results and PubMed returned 26,430 articles with 1,707 reviews. I sated my curiosity about KeraNetics’ research by using a PubMed advanced search for keratin in the abstract or title & the PI’s name as author (search = “(keratin[Title/Abstract]) AND Van Dyke[Author]”).
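For anyone who would rather script that last PubMed search than click through the advanced search form, here is a minimal sketch in Python. It only builds the query string and a URL for NCBI’s public E-utilities ESearch service; it does not fetch anything. The function names and the `retmax` default are my own choices for illustration, not something from the original search.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (public search service).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(term, author):
    """Combine a title/abstract term and an author name into one PubMed query.

    Hypothetical helper for illustration; it mirrors the search string
    quoted in the post.
    """
    return f"({term}[Title/Abstract]) AND {author}[Author]"


def build_esearch_url(query, retmax=20):
    """Return an ESearch URL for the query. Nothing is fetched here."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = build_pubmed_query("keratin", "Van Dyke")
url = build_esearch_url(query)
print(query)
print(url)
```

Pasting the printed URL into a browser should return the matching PubMed IDs as XML. If you script many requests, check NCBI’s usage guidelines first.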

I ended up with a lot of information leads that I could have hunted through, but it was a fun process in which I learned a lot about keratin. This is where the education stuff comes in. I’ve been seeing a lot of studies go by talking about reforming education to be more investigation driven, and I can totally see how that can work. “Learning” through memorization & regurgitation is dry for everyone & rough for the “memory challenged”, like me. Having a reason or curiosity to explore, with a new nugget of data or understanding lurking around each corner, the information just seems to get in better & stay longer. (OT, but thought I’d mention a related site that I found today w/ some neat stuff: Mind/Shift-How we will learn.)

And I could have done the advanced PubMed search in the beginning, but what fun would that have been? Plus there is a lot that I learned about keratin from what I didn’t find, like that there wasn’t a plethora of PDB structures for keratin proteins. That brings me to the final dot in my mullings – an article that I came across today as I worked on my reading backlog: “Too many roads not taken”. If you have a subscription to Nature you can read it, but the main point is that researchers are still largely focusing on the same set of proteins that they have been for a long time, because these are the proteins for which there are research tools (antibodies, chemical inhibitors, etc). This same sort of philosophy is fueling the Protein Structure Initiative (PSI) efforts, as described here. Anyway, I found the article interesting & agree with the authors’ general suggestions. I would however extend it beyond these physical research tools & say that going forward researchers need more data analysis tools, and training on how to use them – but I would, wouldn’t I? :)

References:

  • Sierpinski P, Garrett J, Ma J, Apel P, Klorig D, Smith T, Koman LA, Atala A, & Van Dyke M (2008). The use of keratin biomaterials derived from human hair for the promotion of rapid regeneration of peripheral nerves. Biomaterials, 29 (1), 118-28 PMID: 17919720
  • Edwards, A., Isserlin, R., Bader, G., Frye, S., Willson, T., & Yu, F. (2011). Too many roads not taken Nature, 470 (7333), 163-165 DOI: 10.1038/470163a

What’s the Answer: Open Thread (NGS Tools)

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

Question of the week:

Now we are analysing NGS data, and I wonder if you know some collections of bioinformatics tools which can help me (like biopieces).

There were a few good answers suggesting lists of tools for the analysis and preparation of Next-Generation Sequencing data. Here’s one answer; click the link above for the rest:

(after some advice about mastering scripting language and unix commands…)

  1. learn the most used bioinformatics tools. e.g.:
    • one (or preferably many) short-read aligners
    • samtools
    • an NGS viewer (IGV is a good one to start)
    • Bedtools
    • A means to view and filter your NGS reads
    • and certainly many others depending on your specific focus.
  2. Then start to learn some of the common data repositories. e.g.:

 

“What’s the Answer”

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

Today’s question and answer is:

Recommend easy to use microarray clustering software

The most highly voted answer (from the author who posted the recommendation thread):

One of my favorites is the MEV micro-array data analysis tool. It is simple to use and it has a very large number of features.

Works well for any type of data. You can also load into it data from a file that is in a simple text format:

GENE1, value1, value2
GENE2, value1, value2

Feel free to post your favorite clustering tool.

Several other excellent tools were suggested; you can check them out here.
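If you want to sanity-check a file in the simple text layout quoted in that answer before loading it into a clustering tool, a few lines of Python will do. This is just a sketch: the gene names and numbers below are made up, and the exact import rules of MEV (or any other tool) may differ from what I assume here, namely one gene name followed by comma-separated numeric values per line.

```python
import csv
import io

# Example data in the "GENE, value1, value2" layout quoted above.
# Gene names and numbers are invented for illustration.
EXAMPLE = """GENE1, 0.5, 1.2
GENE2, -0.3, 2.0
"""


def read_expression_table(handle):
    """Read rows of 'name, value, value, ...' into {name: [floats]}.

    Raises ValueError if a value cannot be parsed as a number, which is
    the kind of problem worth catching before clustering.
    """
    table = {}
    for row in csv.reader(handle, skipinitialspace=True):
        if not row:  # skip blank lines
            continue
        name, *values = row
        table[name.strip()] = [float(v) for v in values]
    return table


table = read_expression_table(io.StringIO(EXAMPLE))
print(table)
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.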

Have some NGS SAM/BAM files? Get a GUI interface

A recent paper introduces SAMMate, a GUI for working with SAM/BAM files. As the paper states:

With just a few mouse clicks, SAMMate will provide biomedical researchers easy access to important alignment information stored in SAM/BAM files.

You might want to check it out if you have Next Generation Sequencing data in the form of BAM/SAM files. A nice feature I haven’t been able to check is that it will export a ‘wiggle’ file for alignment visualization in the UCSC Genome Browser.
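If you are wondering what “alignment information” a SAM file actually holds, here is a small Python sketch that pulls apart one alignment line. It is not how SAMMate works internally (I have not looked); it simply follows the SAM format, where each alignment line has 11 mandatory tab-separated columns and the FLAG column is a bit field. The example read is made up, and the helper names are mine.

```python
from collections import namedtuple

# The 11 mandatory SAM columns, in order.
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual",
)

# A made-up alignment line for illustration (tab-separated).
EXAMPLE_LINE = "read1\t16\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"


def parse_sam_line(line):
    """Parse the mandatory columns of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


def is_unmapped(record):
    """FLAG bit 0x4 marks a read that did not align."""
    return bool(record.flag & 0x4)


def is_reverse_strand(record):
    """FLAG bit 0x10 marks a read aligned to the reverse strand."""
    return bool(record.flag & 0x10)


record = parse_sam_line(EXAMPLE_LINE)
print(record.rname, record.pos, is_unmapped(record), is_reverse_strand(record))
```

For real work on BAM files you would want samtools or a dedicated library rather than hand parsing, since BAM is a compressed binary format; this only shows what the columns mean.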

Coming up, Guest Posts

Greetings! OpenHelix Blog is instituting a new semi-weekly feature. Every Wednesday we have our “Tip of the Week,” on Thursdays we have our “What’s Your Problem,” and now on occasional Tuesdays we are going to have our “Provider Guest Post.” These will be posts from providers of genomics tools and databases, and will cover opinions, updates and upcoming features of the resource: whatever the provider would like to convey to users. We have several lined up for the coming weeks, so keep checking back.

Additionally, if you are a developer or provider of a free, publicly available genomics or biological resource, database or analysis tool and would like to post in our guest feature, be it an introduction to your tool, updates or upcoming features or even an opinion about the current state of genomics research and data, please write us at wlathe AT openhelix DOT com. We would love to put you in the queue for the next guest post.

Our first guest post, next Tuesday, will be from Inna Dubchak, principal investigator at the LBNL/JGI group, developers of the VISTA comparative genomics resource (which sponsors a tutorial, free to users). She’ll discuss some new tools at VISTA and give you a quick preview of some upcoming features.

Tip of the Week: TARGeT

Today’s tip is on TARGeT. TARGeT is, as the paper’s title in this year’s NAR issue states, “a web-based pipeline for retrieving and characterizing gene and transposable element families from genomic sequences.” There are several things you can do at TARGeT. Its main function, using BLAST, PHI-BLAST, MUSCLE and TreeBest, is to quickly obtain gene and transposon families from a query sequence. The tip today is a quick intro to the tool and a search on an R1 non-LTR transposon.