Tag Archives: bioinformatics


I’ve been using Biostar for the last month and am finding it to be a great resource. Biostar is a Q&A space for genomics and bioinformatics questions. It currently runs on ‘StackExchange’, a community-driven Q&A platform. Today’s biology research world is ripe for Biostar (we should know ;-).

The way Biostar works is that you register. Once registered, you can ask or answer questions concerning bioinformatics or genomics. Questions and answers are rated and discussion can ensue. Questions are organized by tags, and you can immediately see whether they’ve been answered and how they were rated. Users are given points based on the number of questions and answers given, voting actions and more. This gives the user ‘reputation,’ which is a “rough estimation of how much the biostar community trusts you,” as the FAQ says. The more reputation you receive, the more actions you can take in the community. For example, it takes no reputation to ask or answer, but some to vote, retag questions, etc.

So far the questions (there have been over 600 at the writing of this blog post) have ranged from the complex to the straightforward. Some questions are about coding to perform a given bioinformatics task, such as extracting a sequence from a 3GB FASTA file, or how to get data from a database or resource. Often there are questions about where to find specific kinds of data. A question a couple of weeks ago asked for “web resources to find cancer indication where a given gene is amplified.” The answers given (including one from our own Mary :) pointed to some interesting resources such as Oncomine. I wasn’t able to try it out since it requires registration and/or subscription, but from the description given it seems to be a useful resource. Between searching our database of publicly available databases for cancer-related resources and asking on Biostar, a researcher should be able to find the right resource for their needs. Biostar seems to be a great fit for what we do at OpenHelix. Consider comparative genomics: one could use our search to find the right resources, then use one of our tutorials to get a solid understanding of the resource, and then, once they have the foundation, go to Biostar for the more detailed how-to’s.
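To give a flavor of the kind of coding question that shows up on Biostar, here is a minimal sketch (not taken from any Biostar answer) of pulling one record out of a large FASTA file in Python. It reads the file one line at a time, so a multi-gigabyte file never has to fit in memory all at once; only the record currently being read is held. The function names read_fasta and extract_sequence, and the choice to treat the first whitespace-delimited token of the header as the record ID, are assumptions made for illustration.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, one record at a time."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None and line:
            # Sequence lines are collected until the next header.
            chunks.append(line)
    # Emit the final record once the file is exhausted.
    if header is not None:
        yield header, "".join(chunks)


def extract_sequence(handle: TextIO, wanted_id: str) -> Optional[str]:
    """Return the sequence whose ID matches wanted_id, or None if absent.

    Assumption: the record ID is the first whitespace-delimited
    token on the header line.
    """
    for header, sequence in read_fasta(handle):
        tokens = header.split()
        if tokens and tokens[0] == wanted_id:
            return sequence  # stop reading as soon as the record is found
    return None


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real FASTA file.
    demo = io.StringIO(">chr1 test record\nACGT\nTTAA\n>chr2\nGGCC\n")
    print(extract_sequence(demo, "chr2"))  # prints GGCC
```

For a real file you would pass an open file handle instead of the in-memory example. Established libraries such as Biopython also provide FASTA parsers, and are probably the better choice for anything beyond a quick script.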

We are considering adding a “Biostar answer” of the week and integrating/linking more closely with Biostar in our WYP threads. We believe this will only enhance the OpenHelix blog’s mission of education and outreach to researchers on genomics resources. Biostar has grown and matured quite well over the last few months since I last read about it, and I believe it’s developing into quite a nice community.

Briefings in Bioinformatics – our education paper is available now

Back in April I happened to mention that we (OpenHelix) were writing a paper on informal sources of bioinformatics education (in a Friday SNPpets item) and we were asked to announce when the paper came out. Well, we got word late last week that the article has been published. The article appears in a special issue of Briefings in Bioinformatics that is devoted to bioinformatics education. I’m not sure if all the articles in the issue are available yet, but it looks like several are in the journal’s Advance Access area. Bioinformatics education is an area (obviously) that OpenHelix cares deeply about & we are anxiously awaiting our copies of the full issue so we can read all the articles, but I digress…

The title, “OpenHelix: bioinformatics education outside of a different box,” was a cool suggestion from one of the article’s reviewers – my original title was much tamer (ok, more boring). (If you hit a paywall, or have trouble accessing the article, we will gladly send a reprint. Just email the corresponding author, Jennifer, listed in the abstract, or ask via our contact link. - Trey) Regardless of the final title, what we wanted to do in the article is to discuss informal sources of bioinformatics education. By education we mean acquiring applicable information that allows a researcher to operate within the field of bioinformatics. By informal we mean outside of traditional, credit-based classes and degrees. Essentially we provide a bit of the knowledge and know-how that we’ve gathered over years of working with hundreds of resources, thousands of workshop attendees, and countless online contacts about where a researcher, or librarian, or whoever can turn for various informational needs in the field of bioinformatics.

Our contention is that not everyone needs to program in order to manage and manipulate their biological data these days. There are SO many fine publicly available databases, algorithms, tools and more that it is just a matter of awareness and training for anyone to be able to reformat and analyze their personal data sets. We maintain that:

…bioinformatics education needs to do a minimum of four things:

1. raise awareness of the available resources
2. enable researchers to find and evaluate resource functionality
3. lower the barrier between awareness and use of a resource
4. support the continuing educational needs of regular resource users

In the paper we walk through each of these – we first describe example needs associated with the point, and then cover possible informal resources that meet the needs. The article includes tables of resources and links to them, and many, many references. We really hope that it is a very useful resource in the field of bioinformatics education. I am already looking forward to contributing to the next special education issue, both to hone my writing skills and to extend the information we can provide readers. Please do comment, email, whatever, and let us know about the resources that you use, what you learned from the article, etc. Oh, here’s the citation info:
Williams, J., Mangan, M., Perreault-Micale, C., Lathe, S., Sirohi, N., & Lathe, W. (2010). OpenHelix: bioinformatics education outside of a different box. Briefings in Bioinformatics. DOI: 10.1093/bib/bbq026

Galaxy, a stride towards reproducible computational research


Galaxy started out as a very useful tool for doing genomics research that is reproducible and sharable. One of my pet peeves in reading research papers that use genomic analysis or online genomics resources is the materials and methods section. Often the methods and parameters used are mentioned only in a very cursory manner, if at all, so I would not be able to reproduce the research. Addressing this, along with the ability to easily do and share analyses, is one of the fundamental purposes Galaxy was developed for, and it does a pretty good job of it (I am a bit biased*).

The Galaxy developers have recently published a paper: “Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences” in Genome Biology.

There have been a couple of questions or functions I have felt Galaxy needed to address to better fulfill the goal of reproducible and transparent computational research. One of the things we’ve been asked in workshops on Galaxy is how long ‘histories’ and ‘workflows’ will persist. The Galaxy developers have insisted these would persist indefinitely (as indefinite as an online world could be). In this paper, the developers answer that question with what seems to me a pretty good, broad approach to persistence:

We are pursuing three strategies to ensure that any Galaxy analysis and associated objects can be made easily and persistently accessible. First, we are developing export and import support so that Galaxy analyses can be stored as files and transferred among different Galaxy servers. Second, we are building a community space where users can upload and share Galaxy objects. Third, we plan to enable direct export of Galaxy Pages and analyses associated with publications to a long-term, searchable data archive such as Dryad.

Another feature, a community of tools and users, is one I knew was coming, but it’s good to see it in published form and on the beta site. It’s mentioned in the quote above, but it’s more than that. It’s an extension of the ability to share histories and workflows:

To help users make better and faster choices within Galaxy, we are extending Galaxy’s sharing model to help the Galaxy user community find and highlight useful items. Ideally, the community will identify histories, workflows, and other items that represent best practices; best practice items can be used to help guide users in their own analyses.

The beta site gives you a look at what’s coming in the “Galaxy Tool Shed,” a place to upload, download and share tools to import into Galaxy installations. Hopefully this will eventually also include the ability to rate and discuss tools. Another aspect I’ll be looking forward to is the ability to share workflows to an open and broader community. Right now there is the excellent ability to share histories and workflows with other users within your network of colleagues, but I would like to see an open community to share and rate workflows. From the comment above, it seems that is coming. It will be a very welcome addition.

One last feature added I’d like to mention is pages:

Galaxy Pages (Figure 4) are the principal means for communicating accessible, reproducible, and transparent computational research through Galaxy. Pages are custom web-based documents that enable users to communicate about an entire computational experiment, and Pages represent a step towards the next generation of online publication or publication supplement. A Page, like a publication or supplement, includes a mix of text and graphs describing the experiment’s analyses. In addition to standard content, a Page also includes embedded Galaxy items from the experiment: datasets, histories, and workflows. These embedded items provide an added layer of interactivity, providing additional details and links to use the items as well.

I tried out the pages (click “user” at the top right of the page, then click “pages” to access pages). I like the ability to basically write what is a materials and methods section for computational biology. You can describe what you did, embed histories, datasets and the like. Unfortunately, at the time of this writing I was able to build a page, but unable to view it (server error; I used the latest versions of Safari and Firefox on Mac OS X 10.5). I am sure this is a temporary glitch.

Galaxy has been making huge progress in the last couple of years and looks poised to become a go-to tool for computational analysis for experimental biologists. In that vein, you might want to check out their introductory tutorial or screencasts to get acquainted with the tool!

*disclaimer: The Galaxy group contracts with OpenHelix to provide an introductory tutorial on Galaxy (free and open to all users).

Goecks, J., Nekrutenko, A., Taylor, J., & Galaxy Team, T. (2010). Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11 (8). DOI: 10.1186/gb-2010-11-8-r86

Galaxy http://www.galaxyproject.org

Friday SNPpets

Welcome to our Friday feature link dump: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Tip of the Week: GeneMANIA for Fast Gene Function Predictions

I am the lucky scientist here at OpenHelix who has been selected to create a full tutorial on a network resource named GeneMANIA, but I just couldn’t wait to share some of the details about this resource’s great features. GeneMANIA stands for Multiple Association Network Integration Algorithm. Users can either input a single gene and see interaction networks associated with the gene, or (more powerfully) enter a list of related genes and use GeneMANIA’s ‘state-of-the-art analysis algorithm’ to quickly and easily find additional related genes.

GeneMANIA allows a scientist with no prior training in computational bioinformatics to easily use their gene list of interest to search a customized list of functional genomics publications, as well as a wide variety of public databases, and receive the results of their analysis in an easy-to-interpret PDF report. Users can even upload their own datasets and affect how the algorithm scores each gene and dataset. I cannot cover all this functionality in this short tip – you’ll have to see our tutorial for that – but here I will show you how to do a basic search and some of the display features of GeneMANIA.

For more information about GeneMANIA, please see this 2008 GeneMANIA publication, or the GeneMANIA documentation.

OH Launches Genomics Search and Learn Portal

OpenHelix today announced the launch of the new www.openhelix.com web site, a first-of-its-kind portal that assists researchers in conducting breakthrough research. The portal directs scientists to the most relevant publicly available bioinformatics and genomics resources on the web, and then gives them immediate access to tutorials, training materials, and useful tips and information developed by OpenHelix.

“There are now thousands of databases and analysis tools for the researcher to use when doing biological research,” said Scott Lathe, CEO of OpenHelix. “The first problem researchers have is just finding the most relevant tool. There are catalogs of resources on the web, but they are cumbersome to use, and a keyword search on them more often than not delivers irrelevant results. The OpenHelix portal solves this problem.”

The OpenHelix portal searches a highly relevant, curated database of genomics and biological resources, OpenHelix tutorials and training materials, and tips and information on the resources on the OpenHelix blog.  The portal delivers a list of relevant resources for the user using a proprietary relevancy system developed by OpenHelix scientists. Once a user finds the most relevant resources in the results, they can immediately go to training on the resource.

Searching for the most relevant resource is free to all users.  Some of the tutorial suites are also free, as they are sponsored by the resources themselves, such as the UCSC Genome Browser.

“The second, and possibly most significant, problem a researcher faces is learning how to use these often complex and changing resources,” explained Lathe. “Since OpenHelix has multiple trainers, all with PhDs in biological sciences, intimate knowledge of the resources, and years of experience with on-site and online training, we have a unique ability to provide the solution to this problem.”

With an OpenHelix individual or institution subscription, users can access the complete catalog of OpenHelix tutorial suites, over 80 currently, in categories including pathways, expression, variation, literature and general databases. These tutorials are updated each month, with new tutorials added frequently. (For a complete catalog, go to http://www.openhelix.com/cgi/tutorials.cgi)

With a subscription, scientists quickly learn how to use the tools they need when they need them. The online narrated tutorials, which run in just about any browser, can be viewed from beginning to end or navigated using chapters and forward and backward sliders. The 30-60 minute tutorials highlight and explain all the features and functionality needed to start using the resource effectively. The tutorials are used to introduce a new resource, to view new features and functionality, or simply as a reference tool to refresh a user’s knowledge of a resource.

In addition to the tutorials, subscribers also receive useful training materials including the animated PowerPoint slides used as a basis for the tutorial, a suggested script for the slides, slide handouts, and exercises. This can save teachers and professors a tremendous amount of time and effort in creating classroom content.

The new portal was partially funded by NHGRI grant 3R44HG004531.

About OpenHelix
OpenHelix, LLC, (www.openhelix.com) provides the genomics knowledge you need when you need it. OpenHelix provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web.  More efficient use of the most relevant resources means quicker and more effective research.

Galaxy, and writing code

I don’t write code.  There.  I said it.  Yes, I have been a bioinformatics professional for over a decade and I don’t write code.

I’ve taken the classes and I own the books.  I’m down with the philosophy.  I get the need. But writing code makes me cranky and miserable.  Chasing a stray comma or semi-colon for 45 minutes makes me want to pull my hair out.  I have the ultimate respect for the people who have the patience for this.  But I’m not one of them.

Personally, I’m interested in finding the leads, and answering the biological questions.  That’s what drives me.  I’m essentially a super-end-user.  And that’s what I really like.

Recently we did a training on Galaxy and we were working with some grad students/post-docs who shared my phenotype.  They had been trying really hard to write some scripts to accomplish what they needed to do.  They also reported that the comma-crazies affected them too.  After we showed them Galaxy, they were ready to put down those scripting books.  In fact, one of the comments to us was {paraphrased}, “We’ve been trying to teach ourselves programming.  I think that’s over.”  They said it would be more worth it to them to spend time learning Galaxy than learning programming.

That’s what I think, too.  Learning to use Galaxy is worth your time.

Now, I’m glad I know something about programming.  Probably the best back-end thing I ever learned was writing SQL statements. (Thanks, Rick :) At the Jackson Lab I shared an office with a very patient programmer who helped me with that.) Both of these have helped me to converse with professional programmers.  I know what to ask for in development projects better.  And I know enough that when handed some code I can sometimes get it to do what I need (or, ah, find someone who can help me locally….).

Galaxy is developing a user community that is doing that now: bridging the people with the questions and the people who like to build the bridges.  Across the mailing list the other day came word from Ido Tamir about a very helpful series of blog posts he’s going to do that offers help with Galaxy tasks.  The first one is on importing data into Galaxy.  (There are some built-in import strategies already; Ido just created one that offers a bit more customization for additional features.)

Visit Ido’s Adventures in the galaxy Pt.1 to learn more–and see the code.  But spend some time learning Galaxy.  Our free tutorial (sponsored by the Galaxy team) will get you started.  We provide an overview of what Galaxy is, and the fundamentals of the interface and how and why to use it.   The great screencasts the team does that address specific tasks will really get you further.

And if you are like me, you can skip the scripting and go right to answering the biological questions. If you are a developer helping people like me in your local group–get them using Galaxy, and you can build stuff that they can use and they can stop bugging you.

Use Galaxy.  It’s worth your time.

New GWA viewer

Genome-wide association studies (GWA/GWAS) generate a lot of data that needs to be viewed and analyzed. There are some software tools out there to do that, including UCSC’s Genome Graphs.

I haven’t looked at it in detail yet, but this new downloadable Java viewer was recently developed and reported in Bioinformatics: AssociationViewer (download here). I’m passing it on to you. As I said, I haven’t had a chance to give it a test drive, but as the title of the article states, it’s a “scalable and integrated software tool for visualization of large-scale variation data in genomic context.” At first glance, it looks interesting.

Advances in Genome Biology and Technology

If, like us, you were not able to go to sunny, warm Florida last week for the Advances in Genome Biology and Technology conference, well… never fear, that’s what blogging is for :D. Several bloggers who attended have given some interesting and informative overviews and highlights of the conference. Anthony Fejes gave a live-blogging blow by blow, Daniel MacArthur of Genetic Future writes about the battle lines being drawn at sequencing companies (and some highlights), and Dr. Robison discusses a new company planning to offer $5,000 genomes (but only human) this year (that $1,000 genome isn’t far away!). So, if you want to know some of the goings-on at the conference, those are good places to start. The “Complete Genomics” story seems to be a big one.

Free Tutorials on Model Organism Genomic Databases Released by OpenHelix

OpenHelix today announced the free availability of tutorial suites on model organism databases and resources used extensively in research. The first tutorial suites available are GBrowse, Rat Genome Database (RGD), Mouse Genome Informatics (MGI), and WormBase. To be added in the coming weeks are Zebrafish Information Network (ZFIN), FlyBase and Saccharomyces (Yeast) Genome Database (SGD).

The tutorial suites, funded in part by a grant from the National Human Genome Research Institute of the National Institutes of Health, include a self-run, narrated tutorial introducing the resource and how to use its features and functions. Each suite also includes PowerPoint slides, handouts, and exercises that can be used for reference or for training others.

One of the first tutorials available is on GBrowse, developed by the Generic Model Organism Database (GMOD) project, a popular tool used by researchers to develop genome browsers for model organisms, species of interest, and particular topics. By learning how to use this “generic” genome browser, you can leverage that knowledge to use dozens of resources devoted to a wide range of research areas.

“The OpenHelix GBrowse user tutorial is very well done and will be an excellent resource for the many research communities that use GBrowse to visualize genomic data,” said Dave Clements of the National Evolutionary Synthesis Center who runs the GMOD help desk.

Model organisms, such as yeast, mouse, rat, flies, and many others, have long been used by researchers to expand our understanding of biology and to assess the effectiveness and safety of therapies before going to human trials. Many of the genomes of these organisms have been completely sequenced, giving the scientific community even greater insight into the organisms and their relation to human biology. The genome data is now available and searchable in publicly available online databases and resources.

You can view the Model Organism tutorials at http://www.openhelix.com/model_organisms.shtml. OpenHelix provides over 60 other tutorial suites on a number of genomic databases and resources through an individual, group, or institutional subscription. Further information can be found at www.openhelix.com.

About OpenHelix
OpenHelix, LLC, (www.openhelix.com) provides the genomics knowledge you need when you need it. OpenHelix provides online self-run tutorials and on-site training for institutions and companies on the most powerful and popular free, web based, publicly accessible bioinformatics resources. In addition, OpenHelix is contracted by resource providers to provide comprehensive, long-term training and outreach programs.