Category Archives: Tip of the Week

Video Tip of the Week: The New OpenHelix Interface

Generally we like to highlight new features and new tools from bioinformatics software providers. But this week we wanted to introduce some new features of our own OpenHelix site. If you’ve been using the site for a while, you may have noticed that we recently rolled out some changes. All the same tutorial materials and tips are available, but we’ve provided new ways to access them.

The most important change on the new site is how you access our training suites.

Access the training video, slides, and exercises on the training suite page.


Access is now quicker, with buttons right on the OpenHelix main page for the full catalog or the list of free tutorials. When you find a suite you want to watch (like the UCSC Genome Browser one shown here as an example), it loads right on that landing page instead of requiring another click to launch a new window. If you still want the larger, original-size version in a new window, that’s still available from the button below. Our slides, handouts, and exercises are still there, right below the video. And you can still quickly hop over to the site described on the page with the “Visit the Resource” link.

Further down the page are links to related content. This can be other tutorials about this resource (for example, UCSC Genome Browser Advanced Topics), or about other genome browsers. We also collect the blog posts related to this resource that may cover new tools or features, and link them from this page. And we text-mine the BioMed Central open-access publications to find those citing this resource, so you can see what researchers are doing with the tools in their research programs.

Our search feature still provides access to the complete collection of resources we’ve examined over the years. The search results now also have a tab for our popular Video Tip of the Week subset, if you just want to locate the resources with short video tips.

With that overview, I’ll also offer this week’s video tip: a tour of the new site.

Our basic philosophy remains the same, as we explained in our paper (linked below).

To accomplish its outreach mandate, bioinformatics education needs to do a minimum of four things:

  1. raise awareness of the available resources
  2. enable researchers to find and evaluate resource functionality
  3. lower the barrier between awareness and use of a resource
  4. support the continuing educational needs of regular resource users

We want to provide introductory training on many of the core resources in bioinformatics, and help educators and trainers elsewhere to provide this to students and staff who will need to access these tools for their research. We hope you like the new look. If you have any issues, let us know.

Quick link:

Main site:


Williams J.M., M.E. Mangan, C. Perreault-Micale, S. Lathe, N. Sirohi & W. C. Lathe (2010). OpenHelix: bioinformatics education outside of a different box, Briefings in Bioinformatics, 11 (6) 598-609. DOI:

There’s a press release associated with this too, with further details: OpenHelix Unveils New Online Training Site, Subscriber Services.

Video Tip of the Week: Protein structure information for public outreach. Really.

This week’s tip isn’t about a specific tool–but a really interesting look at how a tool was used in the context of some general public outreach messaging. Recently I posted about Aquaria, a new tool available to let biologists explore protein structures, mutations, and domains in user-friendly ways. But an interesting example of how the information about protein structures can be used to drive understanding came from a video animation of protein accumulation in Alzheimer’s. Just have a look at the video first and enjoy it. How cool is that clathrin basket pulling the vesicle in?

Description from their brochure at the launch [PDF]:

Christopher Hammang’s “Alzheimer’s Enigma” which explores the neurons of the human brain, and reveals how normal protein breakdown processes become dysfunctional and result in plaque formation during Alzheimer’s disease.

I found out about it as I was looking at the upcoming VIZBI talks and exploring their site for other features. In the VizbiPlus section there are a number of excellent animations of molecular processes, and this video was one of them. Be sure to watch for other tweets with the #vizbi hashtag for the next few days. I bet you’ll see some amazing tools and visualizations, as always.

Recently I mentioned the longer, more comprehensive video from the Aquaria team, but I didn’t use it for my tip; I just used the short overview version. But the longer version had an extra bonus: a piece on how this animator had used their software. Here is Christopher Hammang, creator of this video, describing how he used the Aquaria information to generate the structural model for his animation:

Often it helps people to see how someone else used a tool for a project to get a better grasp of it. And this seemed like such a compelling and unusual example, I wanted to highlight it.

So again I’ll point you to the Aquaria tool tip from earlier this month to explore more, now that you’ve seen an example of its use. But I would also encourage you to have a look at the other animations coming out of VIZBI at the VizbiPlus page. I swear, the animated intestine is way cooler than you might expect. The diabetes + insulin receptor videos are really informative and helpful. A cancer video illustrates a misbehaving p53. Go look.

Quick links:

VizbiPlus videos:

VizbiPlus Poster from Hammang and team: Alzheimer’s Enigma: Putting the Pieces Together

Vizbi Posters:


O’Donoghue S.I., Kenneth S Sabir, Maria Kalemanov, Christian Stolte, Benjamin Wellmann, Vivian Ho, Manfred Roos, Nelson Perdigão, Fabian A Buske, Julian Heinrich, Burkhard Rost et al. (2015). Aquaria: simplifying discovery and insight from protein structures, Nature Methods, 12 (2) 98-99. DOI:

Video Tip of the Week: Designing proteins, using Rosetta

As often happens, last week’s tip on visualizing structures led me to some more reading and thinking about creating protein structures. Although it’s important for biologists to be able to use more of the information about protein structures and variations in their work from tools like Aquaria or PDB, it’s also important for some researchers to be on the other end of the pipeline, actually making the protein structures. This also opens up the possibility of better designs of novel proteins as therapeutics, for example, making antibodies like the ones that could possibly battle Ebola.

As I looked around for protein design software to highlight for a tip, it was clear to me that the level of complexity of the problems in designing proteins didn’t really lend itself to short videos. There are some introductory seminars and tutorials on the Rosetta tools, but these certainly require a bit of time to explore. Instead, I’ve decided to highlight this really nice overview on the aspects of protein design that you would have to tackle to make customized proteins.

This iBiology “Introduction to Protein Design” by David Baker is really well done. There’s also a second seminar that is more detailed about designing proteins with new functions to solve many problems in biomedical research and environmental challenges.

This seems incredibly important and useful–but certainly daunting to get started. One way to get a head start on this would be to take an intro workshop. I was recently notified about the opportunity to learn from a couple of researchers who are very skilled with the Rosetta tools–Daisuke Kuroda and Jared Adolf-Bryfogle.

I’m including in the references a nice review of the basics of computational antibody design by Kuroda et al. I’ve also included a paper by Adolf-Bryfogle and team on the structural classification of antibody CDRs, component parts of antibodies that you would need in order to predict structures and design new ones; these classifications are stored in the database they’ve created. This should give you a sense of the challenges and opportunities, and a good foundation for the concepts.

Rosetta software has been a powerhouse of protein design for many years. It has been a leader in the CASP competitions (Critical Assessment of protein Structure Prediction). It has a strong user community: Rosetta Commons. You can obtain and use the software in a variety of ways, including some servers for academic use. One important stop would be the ROSIE servers: “The Rosetta Online Server that Includes Everyone hosts several servers for combined computer power as a free resource for academic users.”

Quick links:

ROSIE servers: (note there’s a specific protocol section there that covers antibodies)

Rosetta Commons:


Short course: Designing Antibodies with Rosetta, Sunday May 3 2015. Early registration discount ends soon.


Kuroda D., H. Shirai, M. P. Jacobson & H. Nakamura (2012). Computer-aided antibody design, Protein Engineering Design and Selection, 25 (10) 507-522. DOI:

Adolf-Bryfogle J., Q. Xu, B. North, A. Lehmann & R. L. Dunbrack (2014). PyIgClassify: a database of antibody CDR structural classifications, Nucleic Acids Research, 43 (D1) D432-D438. DOI:

Rybicki E.P. (2014). Plant-based vaccines against viruses, Virology Journal, 11 (1) 205. DOI:

Note: OpenHelix is a part of Cambridge Healthtech Institute.

Video Tip of the Week: Aquaria, streamlined access to protein structures for biologists

This week’s Video Tip of the Week is Aquaria, a new resource for exploring protein structures, mutations, and similarities to other proteins. It’s a very well-designed and interactive experience for end users. It is aimed largely at biologists who could benefit from exploring the structural details of their proteins of interest, but are daunted by tools aimed at structural biologists. Tool developers should also look at how this rollout went. It’s one of the best examples of a tool launch I’ve seen in this field. And I’ve seen a lot.

So first, the tool. Aquaria offers users a streamlined way to access and explore protein structures, combining the kinds of information you get from the PDB structure resources with additional details like UniProt mutations. Currently you start with a basic search, asking for a protein by name or by PDB or UniProt ID. They have pre-calculated the relationships of proteins in PDB and Swiss-Prot to quickly offer you a structure and related proteins. The paper notes: “Currently, Aquaria contains 46 million precalculated sequence-to-structure alignments, resulting in at least one matching structure for 87% of Swiss-Prot proteins and a median of 35 structures per protein….” In addition, it lets you explore other important biological features, such as InterPro domains and post-translational modifications, so you can think about how mutations, structures, and functions affect a given protein you are interested in. As they describe it:

“We have loaded SNP data from Uniprot and Interpro so you can see where the mutations lie on your 3D model. And we have found that you may be pleasantly surprised to find your mutations clustering in 3D space!”
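To make the idea of mutations “clustering in 3D space” concrete, here is a minimal Python sketch; it is not Aquaria’s method. It assumes you already have one 3D coordinate per residue (for example, the C-alpha atom from a structure file). The coordinates, residue numbers, and the 8 Å cutoff are all made up for illustration.

```python
import math
from itertools import combinations


def close_mutation_pairs(coords, mutated, cutoff=8.0):
    """Return pairs of mutated residues whose coordinates lie within `cutoff` angstroms.

    coords:  dict mapping residue number -> (x, y, z) in angstroms (hypothetical values)
    mutated: iterable of residue numbers carrying mutations
    """
    pairs = []
    for a, b in combinations(sorted(mutated), 2):
        if math.dist(coords[a], coords[b]) <= cutoff:
            pairs.append((a, b))
    return pairs


# Made-up coordinates: residues 10, 55 and 200 are far apart in sequence
# but sit close together in the folded structure; residue 120 is distant.
coords = {
    10: (0.0, 0.0, 0.0),
    55: (3.0, 4.0, 0.0),
    120: (30.0, 0.0, 0.0),
    200: (0.0, 0.0, 6.0),
}
print(close_mutation_pairs(coords, [10, 55, 120, 200]))
```

Residues that are far apart along the sequence but still show up in these pairs are the kind of surprise the Aquaria team describes; a 2D sequence view would never reveal them.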

The Aquaria folks provided an intro video to get you started:

Another handy feature they provided is a Quick Reference Card with shortcuts to the functions [PDF]. In addition to this intro, they have a longer video as well. This is more like a typical lecture with the background, the framework, the goals of the project, and more about the underlying database.

Now, about the rollout of this software project. I found it when I was looking over the talks at the upcoming VIZBI conference (Visualizing Biological Data). Every year I find there are awesome ideas that come out of VIZBI, and tools I want to explore. Among them this year is Aquaria. So I went looking for more detail, and found the traditional stuff: the paper (below), the press release, etc. And then I found the Reddit discussion. The Aquaria team did a Science AMA on this tool. It engaged a range of folks, some of them just fans of science who had probably never seen protein structures before. That’s fine with me: the more folks who appreciate research and learn how researchers explore proteins, the better. But others had good technical questions for the team, such as other ways to find proteins of interest with sequence searches, or integration with other tools like the UCSC Genome Browser. All the answers are over there. I enjoyed the question about the name of the tool:

It seems you get the ideas we had in mind: using Aquaria lets us observe these fascinating creatures (proteins) from the natural world. Aquaria creates an artificial environment and lighting where we can observe isolated proteins; like aquarium fish, proteins are often beautiful and (usually) live in water.

I asked them about how this played out, and they had ~1000 folks visit their site as a result of this Reddit event. That was really interesting to me, and a very neat route to drive awareness.

They also provided a way to support users through one of my other favorite resources, Biostars. They created a support thread there where users can ask questions and get answers. I so prefer this to mailing lists, and I’m glad to see this easy method of getting support. In fact, I asked about something I couldn’t quite figure out. (Here’s the protein I was looking at: I wanted to see all the subunits in full color. To do that, you de-select autofocus, and color by chains for this version.)

Also, for the developer types: they offer an API so you can interact with the Aquaria software and add your own features of interest. Maybe you have found new mutations in some sequence you’ve obtained in your lab, for example. They are offering guidance on that here: They touch on this in the longer video (~27 min) if you want a bit more explanation. Judging from the high-quality support they are offering, I suspect they’d be interested to hear from you about what features you’d like to see applied to these proteins as well.

So kudos to this team for a nifty tool and really serious multi-media outreach efforts. I think it was well done on all counts. I’ll bet you Reddit reached more of the right folks than a press release ever will. PIOs take note–get your scientists on Reddit.

Quick links:

Aquaria site:

Reddit Science AMA:

Biostar support thread:

O’Donoghue S.I., Kenneth S Sabir, Maria Kalemanov, Christian Stolte, Benjamin Wellmann, Vivian Ho, Manfred Roos, Nelson Perdigão, Fabian A Buske, Julian Heinrich, Burkhard Rost et al. (2015). Aquaria: simplifying discovery and insight from protein structures, Nature Methods, 12 (2) 98-99. DOI:

Video Tip of the Week: Beacon, to locate genome variants of potential clinical significance

This week’s Video Tip of the Week follows on last week’s chatter about the Internet of DNA. As I mentioned then, the Beacon tool we touched on was going to get more coverage. So this week’s video is provided by the Beacon team, part of the larger Global Alliance for Genomics and Health project (GA4GH).

I’ve touched on some of the GA4GH work in the past. I heard more about a very interesting piece of it from David Haussler at the recent TRICON meeting.

D. Haussler, slide from TRICON talk.


The talk was called “Stable Reference Structures for Human Genome Analysis”, and it was important for me to see it. I’ve been wrestling with some of the literature (linked below) that describes ways to represent genome variation across massive numbers of humans. It really helped me to hear it described, and shown as cartoons on slides rather than as equations. And how this will play out in graphs and visualizations in software tools is of particular interest to me.

So one branch of the Data Working Group of the GA4GH is tasked with figuring out how to represent variations as multiple paths through a graph, instead of the single linear reference genome we think of today. It has to accommodate many types of variation: inversions, deletions, and duplications, as well as SNPs. So, as the kids say today, it’s complicated. But we have to figure it out. Stay tuned; I’m sure we’ll be talking more about this in the years to come.
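To illustrate “variations as paths through a graph”, here is a toy Python sketch; it is not the GA4GH data model. Nodes hold sequence fragments, and each haplotype is a path through them: a SNP appears as a fork with two alternative nodes, and a deletion as an edge that skips a node. Inversions would need nodes that can be traversed in reverse, which this sketch leaves out. The class, node IDs, and sequences are hypothetical.

```python
class SequenceGraph:
    """Toy sequence graph: nodes hold sequence fragments, directed edges join them."""

    def __init__(self):
        self.nodes = {}     # node id -> sequence fragment
        self.edges = set()  # (from_id, to_id) pairs

    def add_node(self, node_id, seq):
        self.nodes[node_id] = seq

    def add_edge(self, src, dst):
        self.edges.add((src, dst))

    def spell(self, path):
        """Concatenate the fragments along a path, checking every step is a real edge."""
        for src, dst in zip(path, path[1:]):
            if (src, dst) not in self.edges:
                raise ValueError(f"no edge {src}->{dst}")
        return "".join(self.nodes[n] for n in path)


g = SequenceGraph()
for node_id, seq in {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}.items():
    g.add_node(node_id, seq)
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4), (1, 4)]:
    g.add_edge(src, dst)

paths = {
    "reference": [1, 2, 4],      # ACGT + A + TTCA
    "snp_A_to_G": [1, 3, 4],     # alternative allele at the fork
    "deletion_of_A": [1, 4],     # edge that skips node 2
}
for name, path in paths.items():
    print(name, g.spell(path))
```

The design point is that no single path is privileged: the “reference” is just one walk through the graph, and every other haplotype is an equally valid walk.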


Beacon is like SETI for genome variations.

Another branch of this project is tasked with figuring out how to share genomic data among all the international producers of this data. If we can’t share the data, we won’t be able to look at the variations among humans and learn from them, never mind display them. This has additional layers of social and legal complexity we are just beginning to face. As a first pass at sharing this data, a “Beacon” system has been implemented to help researchers locate variations of interest to them.

You should read up on the whole Beacon philosophy and see its current implementation at their site. From what I gather, it is a minimal way to share genome information without running into the privacy and consent barriers you might hit if you were pulling down a whole genome. You can query any site that implements a Beacon to ask: do you have a variation at this position? And the Beacon responds with “yes” or “no”. If there are useful variations, you can pursue them from there, going through the appropriate channels if you need access to more. But at least you’ve possibly found some needles in some haystacks that you might not have known about otherwise.
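The yes/no query model can be sketched in a few lines of Python. This is a hypothetical in-memory stand-in, not the GA4GH Beacon API: a real Beacon is queried over the web and also needs to know the reference genome assembly, which this sketch ignores. It assumes each query names a chromosome, a position, and a specific alternate allele; the class name and the variants are made up.

```python
class ToyBeacon:
    """Hypothetical in-memory beacon that only reveals whether an allele is present."""

    def __init__(self, name, variants):
        self.name = name
        # Each variant is a (chromosome, position, alternate_allele) tuple.
        self._variants = set(variants)

    def query(self, chrom, pos, alt):
        """Answer 'yes' or 'no'; no sample IDs, genotypes, or counts are returned."""
        return "yes" if (chrom, pos, alt) in self._variants else "no"


beacon = ToyBeacon("Lab A", [("1", 100000, "T"), ("2", 5000, "G")])
print(beacon.query("1", 100000, "T"))  # yes
print(beacon.query("1", 100000, "C"))  # no
```

The important part of the design is what the answer leaves out: a single bit per query, with nothing about who carries the allele.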

The Beacon team has done a short video explaining this. It has no audio, just explanatory text with the graphics. Marc Fiume gave me permission to embed it here.

The “Beacon of Beacons” aggregates the query, sending it out to all the known Beacons. You can use it today to search for this kind of data. The video also notes that you can cloak the name of the institution to protect patient privacy.
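The aggregation step, with optional cloaking of institution names, could look something like this hypothetical Python sketch. Each remote Beacon is stood in for by a plain set of (chromosome, position, allele) tuples; in reality each one would be a separate web service, and the names and variants here are made up.

```python
def beacon_of_beacons(beacons, chrom, pos, alt):
    """Fan one query out to every beacon and collect (label, answer) pairs.

    beacons: list of (name, cloaked, variants) tuples, where `variants` is a set of
    (chromosome, position, alternate_allele) tuples standing in for a remote beacon.
    Cloaked beacons are reported without their institution name.
    """
    results = []
    for i, (name, cloaked, variants) in enumerate(beacons, start=1):
        label = f"anonymous beacon {i}" if cloaked else name
        answer = "yes" if (chrom, pos, alt) in variants else "no"
        results.append((label, answer))
    return results


beacons = [
    ("Lab A", False, {("1", 100000, "T")}),
    ("Hospital B", True, {("1", 100000, "T"), ("2", 5000, "G")}),
    ("Lab C", False, set()),
]
for label, answer in beacon_of_beacons(beacons, "1", 100000, "T"):
    print(label, answer)
```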

I have been more acutely concerned about genomic privacy issues than some of my colleagues in this arena. And I fully accept that there will not be privacy; what I want is protection from misuse of the information, which I find lacking in the US legal framework right now. That said, I think that Beacon is a nice work-around for that. If I had a variant of concern, I could ping these other sites to see if others had it. Or vice versa. But the framework under which the donor of that material provided the data would not be pierced. This makes total sense to me, and I can accept this strategy.

Sharing the genomic data from sequenced individuals is going to be tricky and complex. But I’m keen to see the GA4GH group tackle it. I like several of the directions that I’ve seen so far. But right now–check out Beacon. Implement one if you have this kind of data, and let’s see if it works.

Quick links:

Global Alliance for Genomics and Health:

Beacon (project details page):

Beacon of Beacons (where you would do a search):


Nguyen N., Glenn Hickey, Daniel R. Zerbino, Brian Raney, Dent Earl, Joel Armstrong, W. James Kent, David Haussler & Benedict Paten (2015). Building a Pan-Genome Reference for a Population, Journal of Computational Biology, 150107093755006. DOI:

During David Haussler’s talk, he also referenced these papers:

Video Tip of the Week: CRISPRdirect for editing tools and off-target information

Great RCSB PDB molecule-of-the-month page on CRISPR


Genome editing strategies are certainly a hot topic of late. We were astonished at the traffic that the animation of the CRISPR/Cas9 process recently drew to the blog. There’s a huge amount of potential for novel types of studies and interventions in human disease, but I’m already seeing applications in agriculture coming along. There’s an edited canola available in Canada already. China has edited wheat for disease resistance. There’s a project underway to remove horns from cattle by merely snipping out a bit of sequence with TALEN/ZFN strategies. They’ve already created cattle with edited myostatin too.

To accompany this work, new software tools have been developed to help design target sequences and evaluate potential off-target effects. Tools exist for designing both TALEN and CRISPR targets. For this post I’ll focus on just one of the CRISPR tools, but I’ll list a few others as well. Some sites have incorporated both options in their software. Some support a small range of species, some larger sets, so part of choosing a tool is asking which genomes it supports. In future Tips we may explore some of the others. There is something of a flood of these tools coming along, and I’ll continue to explore them.
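For a sense of what guide design involves at its very simplest, here is a Python sketch that lists candidate guides for the commonly used SpCas9 enzyme, assuming its NGG PAM and a 20-nt guide. It is not CRISPRdirect’s algorithm; real tools also score candidates, filter them, and check the rest of the genome.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_guides(seq, guide_len=20):
    """List (strand, start, protospacer, pam) for every NGG PAM with room for a full guide.

    The protospacer is the `guide_len` bases immediately 5' of the PAM. `start` is a
    0-based offset on the strand being scanned (for "-", on the reverse complement).
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(guide_len, len(s) - 2):  # i = first base of the PAM
            pam = s[i:i + 3]
            if pam[1:] == "GG":
                hits.append((strand, i - guide_len, s[i - guide_len:i], pam))
    return hits


print(find_guides("A" * 20 + "TGG"))
```

Scanning the reverse complement as well matters because Cas9 can target either strand; a sequence with a “CCN” on the top strand has an NGG PAM on the bottom one.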

This week’s focus is CRISPRdirect. A Japanese group has created this tool for generating guide sequences and evaluating potential off-target activity. This introductory video (with music, and with English annotations to convey the features) will give you an overview of the functions.

It seems to be an easy-to-use interface, with effective organization of the results. They have a nice range of species to examine–not only some of the mammalian genomes, but fish, chicken, worm, plants, and yeast too. There’s a graphical viewing component and an easy export option as well.
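As one simple illustration of an off-target check (a simplified sketch, not CRISPRdirect’s actual scoring), you can count exact matches of a guide’s PAM-proximal “seed” followed by an NGG PAM across a genome, on both strands. The 12-nt seed length is an assumption here, and the toy genome is made up. A count of 1 means only the intended site was found; anything higher flags possible off-target sites.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_seed_pam_matches(genome, guide, seed_len=12):
    """Count exact matches of the guide's 3' seed followed by an NGG PAM, on both strands."""
    seed = guide.upper()[-seed_len:]
    # Zero-width lookahead so overlapping sites are all counted.
    pattern = re.compile(f"(?={re.escape(seed)}[ACGT]GG)")
    genome = genome.upper()
    return sum(len(pattern.findall(s)) for s in (genome, revcomp(genome)))


guide = "GATTACAGATTACAGATTAC"
# Made-up genome: the intended site (guide + AGG) plus a second seed + CGG site.
genome = "CCCC" + guide + "AGG" + "TTTT" + guide[-12:] + "CGG" + "AAAA"
print(count_seed_pam_matches(genome, guide))  # 2: intended site plus one off-target
```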

So I’ve come across a few tools in my search, but if you have favorites please feel free to add them below in the comments. I’m going to continue to look into these tools and will be looking to highlight others in the future.

Quick link:


A few links to other tools I’ve been looking at:



TAL Effector Nucleotide Targeter 2.0:


ZiFiT Targeter software (TALEN/ZNF/CRISPR support):


CRISPY (specific for CHO cells):


Naito Y., K. Hino, H. Bono & K. Ui-Tei (2014). CRISPRdirect: software for designing CRISPR/Cas guide RNA with reduced off-target sites, Bioinformatics, DOI:

Video Tip of the Week: RStudio as an interface for using R

Although we typically focus on databases and algorithms used in bioinformatics and genomics, there are other tools supporting this work that are crucial as well. The statistical software and computing tools associated with R fall into this category. RStudio is increasingly being adopted by folks in genomics, and although we’ve talked about R in the past, I hadn’t highlighted the RStudio interface before. RStudio has really lowered the barrier to entry and changed how people use R effectively, so it’s time to include it in our Video Tips of the Week.

In a previous tip we highlighted some training on R that was delivered in a webinar by Heather Merk of Ohio State. So if you need an overall Introduction to R Statistical Software, that’s a good place to start. When you are ready to begin working with R, though, you should consider trying out RStudio.

This overview video will demonstrate the basics of the interface for RStudio.

RStudio Overview – 1:30 from RStudio, Inc. on Vimeo.

They provide more detail on many of RStudio’s features, and their Vimeo channel has a few more videos. Another thing about RStudio is that more and more types of support are coming from that front. A popular tip we did was on Slidify, for making slides directly from RStudio.

RStudio is not just for genomics, though; it’s widely used in many fields that engage in statistical analysis. I was surprised not to find many references to it in PubMed yet (some guidance and explainers in biotech), but I know it’s being widely used. You can see a lot of examples of its use in Google Scholar, including several enthusiastic uses of RStudio in teaching situations: An Attractive Template of a Reproducible Data Analysis Document for an Awesome Class Project; and Teaching precursors to data science in introductory and second courses in statistics. I did find a software review in an economics publication. And you can get a book if that’s how you like to learn.

But if you haven’t had a chance to check out RStudio yet, I’d recommend it.

Quick links:



RSeek: an R-specific search engine (hat tip Elana Fertig’s handy intro slide deck)


Gandrud, Christopher. Reproducible Research with R and R Studio. CRC Press, 2013.

Racine J.S. (2011). RStudio: A Platform-Independent IDE for R and Sweave, Journal of Applied Econometrics, 27 (1) 167-172. DOI:

Fertig, E. (2012) Getting Started in R.

Video Tip of the Week: IntOGen, for Integrative OncoGenomics

When I’m looking for upcoming Tips of the Week, frequently one tool or paper will lead me to looking at other related tools in that sphere. Last week’s tip on COSMIC got me looking through cancer genomics resources, and one of the others that I came across is IntOGen, “Integrative Onco Genomics”.

Their Nature Methods paper has a nice summary of their goals:

The IntOGen-mutations platform ( summarizes somatic mutations, genes and pathways involved in tumorigenesis. It identifies and visualizes cancer drivers, analyzing 4,623 exomes from 13 cancer sites. It provides support to cancer researchers, aids the identification of drivers across tumor cohorts and helps rank mutations for better clinical decision-making.

The paper has a nice description of the features they have incorporated into their interface. They run different mutation-calling workflows and then derive scores from them. There is a searchable component, where data from projects like TCGA and ICGC have been assessed with these tools, and you can explore the outcomes with their browser function. In this video from their team you can get a sense of how to search and examine the data [no audio, annotated with text for guidance]:

You can also upload samples that you have to examine and compare in their Mutation Analysis component. There are separate additional videos that they offer to help you to perform this type of analysis at their site. Here’s one of them:

I had actually been at their site recently, looking at a blog post about mutation diagrams: How to generate mutation distribution and frequency plots? People who read our blog seem to really gravitate towards tools that help them visualize various features of their research. Domain drawing, motif representation, and set visualizations are popular. Recently one of my posts highlighting a domain and mutation mapper at the cBio Portal got some traction. According to the blog post, the IntOGen team decided to create a similar tool that they can incorporate in an upcoming release of IntOGen. So it’s great to see new ideas and new tools coming along as well.
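The data behind a mutation distribution (“lollipop”) plot is essentially a count of mutations at each residue position. Here is a minimal Python sketch, assuming protein changes written with single-letter codes like “A2V” (reference residue, position, new residue, with “*” for a stop). Other notations, such as frameshifts, are not handled, the mutation list is made up, and the plotting itself is left to whatever tool you prefer.

```python
from collections import Counter


def mutation_profile(changes, protein_length):
    """Count mutations per residue position (1-based) from changes like 'A2V' or 'R5*'.

    Returns a list in which index i holds the count for position i + 1.
    """
    counts = Counter()
    for change in changes:
        pos = int(change[1:-1])  # strip the reference and new residue letters
        if not 1 <= pos <= protein_length:
            raise ValueError(f"position out of range: {change}")
        counts[pos] += 1
    return [counts[p] for p in range(1, protein_length + 1)]


profile = mutation_profile(["A2V", "A2T", "G5D", "R5*"], protein_length=6)
for position, n in enumerate(profile, start=1):
    print(f"{position:>3} {'o' * n}")  # crude text "lollipops"
```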

The tools they offer look very nice; check them out. And I’m trying not to be jealous of their location and photos as I continue to shovel snow….

Quick links:


Gonzalez-Perez A., Christian Perez-Llamas, Jordi Deu-Pons, David Tamborero, Michael P Schroeder, Alba Jene-Sanz, Alberto Santos & Nuria Lopez-Bigas (2013). IntOGen-mutations identifies cancer drivers across tumor types, Nature Methods, 10 (11) 1081-1082. DOI:

Video Tip of the Week: COSMIC, Catalogue Of Somatic Mutations In Cancer

When we do workshops at medical centers, one of the most common questions I get is about locating good resources for cancer data. We’ve talked about some of the large projects, like the ICGC. We’ve talked about ways to stratify data sets, and one example of this was in cancer, using data from The Cancer Genome Atlas. Going forward, the ability to rapidly sequence normal vs. tumor pairs should help us understand and target tumors even more rapidly. In some cases, it will also turn up entirely new leads.

But one of the really solid tools that I like to be sure to highlight for people is the COSMIC collection. It’s not new–it’s been around for a decade now. But it’s one of those types of core data resources that people really need to know about. Their long experience, their high quality curation, and their adaptations to new influxes of data volumes and data types, make them a really valuable source of information.

Reading their update paper in the 2015 NAR Database issue, I wanted to go over and refresh my memory of the features I knew, and explore some of the newer features too. There really is some serious depth over there, and I can’t touch on all of the aspects that they have in a blog post like this. But I also discovered that they’ve recently provided a number of videos to help people learn about the various tools and options.

For this week’s Video Tip of the Week, I’ll include their “overview” piece. But you should check out their Tutorials page for additional topics as well.

One feature I hadn’t realized they offer is a genome browser built on the JBrowse framework. There’s a separate video with some guidance on how to use it.

The future directions section of their paper makes it clear they are preparing to handle the incoming data on this topic, and they are evaluating new tools and analyses that may be appropriate. But they commit to maintaining their strong emphasis on curation, which is music to my ears. I think quality hand curation is simultaneously undervalued by end users (and sadly by funders) and entirely critical to handling all the big data that’s coming. So get familiar with COSMIC for cancer genomics data. It will be worth your time.

Quick link:



Forbes S.A., D. Beare, P. Gunasekaran, K. Leung, N. Bindal, H. Boutselakis, M. Ding, S. Bamford, C. Cole, S. Ward, C. Y. Kok et al. (2014). COSMIC: exploring the world’s knowledge of somatic mutations in human cancer, Nucleic Acids Research, 43 (D1) D805-D811. DOI:

Video Tip of the Week: Helium plant pedigree software, because “Plants are weird.”

A lot of people find our blog by searching for “pedigree” tools. We’ve covered them in the past, and we’ve got some training on the Madeline 2.0 web tools that we like. We have groused about the fact that some pedigree tools do not accommodate same-sex families. There are a variety of options, largely focused on human relationships.

Another branch of this type of software is animal colony management software, which can be used to track animals in breeding situations. We’ve highlighted The Jackson Lab’s Mouse Colony Management Software, and we see a lot of people going over to take a look. But there are other types of breeding software out there too.

Plant pedigrees are a special challenge, though. Although I did begin to look into that software at one point, I hadn’t looked again for a while. So when I saw the announcement about an upcoming talk at the Bio-IT World conference, I thought it was time to look again. Helium was new to me, and I admit I laughed out loud at my first introduction to it:

BioVis 2013: Poster: Evaluation of Helium: Visualization of Large Scale Plant Pedigrees from VGTCommunity on Vimeo.

“Ok, so, plants are weird….” Best poster intro I’ve heard.

But really, the potential complexity of plant breeding pedigrees is much more daunting than even tricky human pedigrees. Their paper on the Helium efforts (linked below) describes some of those aspects in more detail:

Firstly, the named entities in plant pedigrees may, but not always, represent a population of genetically identical individuals, not a single plant. While it is relatively simple to grow many plants from seed, potentially many decades after production, in humans and animals this is understandably not the norm. The generation of these genetically identical (homozygous) varieties is possible through doubled haploidy, inbreeding, or crossing of pairs of inbred lines to achieve what is termed an F1 hybrid. Successive inbreeding by self-pollination of these F1 generation plants leads to individual plants that are close to homozygous across all alleles.
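The last point in that quote follows a standard textbook expectation, shown here as a worked equation under simplifying assumptions (a single locus, no selection, every plant selfed): each round of self-pollination halves the expected heterozygosity H.

```latex
H_t = \left(\tfrac{1}{2}\right)^{t} H_0
```

Starting from an F1 hybrid of two inbred lines, H_0 = 1 at every locus where the parents differ. After t rounds of selfing (generation F_{t+1}), H_t = 2^{-t}. For example, an F6 line (t = 5) is expected to be heterozygous at 1/32, or about 3.1%, of those loci, and homozygous at about 96.9% of them.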

There are no standards for plant pedigrees yet, I learned from this paper. Zoiks! Well, I guess that gives them free rein to design something that users want. The folks on the Helium project got a bunch of potential users, asked them what they needed, what worked, what didn’t work, and they are building a nice looking tool with the specs they got. Their paper goes on to describe their paper prototyping, the feedback, and other interactions they got further downstream in the process. It’s a nice example of how to get some direction from the likely end users.

Another video (below) offers a slightly longer look at their software, but it has no audio. The most detailed video is the one attached to their paper in the supplemental files, but I can’t embed it. Go over there to download and watch it, with captions about what’s happening.

I wasn’t able to find any downloadable software yet to kick the tires myself. And because of the blizzard I’m worried I won’t have power for the next few days to check it out. But from what I can see and read in the paper, it looks promising and I’m eager to try it out at some point. Looking forward to Jessie Kennedy’s talk.

Quick link:

Helium project page:

Best intro video version, with explanation captions:

This is the item that caught my eye, via email. I’m going to be at Bio-IT World, so I’m hoping to be able to see this presented live.

Dr. Jessie Kennedy to Deliver Keynote Presentation on Visualization Tools Designed for Biologists at 2015 Bio-IT World Conference, as part of the Data Visualization and Exploration Tools Track.

Keynote Presentation: Pedigree Visualization in Genomics
Jessie Kennedy, Ph.D., Professor & Director, Institute for Informatics and Digital Innovation, Edinburgh Napier University

Most visualizations that display pedigree structure for genetic research have been designed to deal with human family trees. Animal and plant breeders study the inheritance of genetic markers in pedigrees to identify regions of the genome that contain genes controlling traits of economic benefit and, ultimately, to improve the quality of animal and plant breeding programs. However, due to the size and nature of plant and animal pedigree structures, human pedigree visualization tools are unsuitable for use in studying animal and plant genotype data. We discuss two visualization tools, VIPER (designed for cleaning genotyping errors in animal pedigree genotype datasets) and Helium (designed to visualize the transmission of alleles encoding traits and characteristics of agricultural importance in a plant pedigree-based framework), and show how they support the work of biologists.

Early Registration Rates Available Now!
Register by January 30 to Save up to $400



Shaw P.D., Martin Graham, Jessie Kennedy, Iain Milne & David F Marshall (2014). Helium: visualization of large scale plant pedigrees, BMC Bioinformatics, 15 (1) 259. DOI:
