Author Archives: Trey

Don’t mind the dust

Greetings, readers of the OpenHelix Blog.

You might have noticed a large change to the look and features of the OH blog. These are temporary. With the update of WordPress, some of the functionality of the old theme has become incompatible and unworkable. We are working on a new and better theme and appearance, but in the interim, we’ve reverted the blog to a default theme. Some of our old functions and features are temporarily missing, but the meat of the blog remains. You will still be able to read our posts, comment, etc.

If you do notice anything amiss or missing, please take the time to comment here or email us.

Thank you!

Tip of the Week: Gemini, exploration of genetic variation

You Tube:

This week’s tip of the week is on GEMINI, which is short for “GEnome MINIng.” Unlike most of the tips we give every week, this one is a software package. But it does use and integrate with many internet databases such as dbSNP, ENCODE, UCSC, ClinVar and KEGG. It’s also a freely available, open source tool that gives the researcher the ability to create quite complex queries based on genotypes, inheritance patterns, etc. The above 12 minute clip is a talk given at a conference that gives an introduction to the science behind the tool.

The abstract from the recent paper from the developers gives a good introduction concerning the functionality of the tool:

Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important, yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command line and graphical tools for common analyses. We demonstrate GEMINI’s utility for exploring variation in personal genomes and family based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility and our goal is to provide researchers with a standard framework for medical genomics.
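
To give a feel for what “complex queries based on sample genotypes, inheritance patterns, and annotations” can mean in practice, here is a minimal sketch in Python. It is not GEMINI itself and does not use GEMINI's schema or commands: the table, column names, genes and genotype labels below are all made up for illustration. It only shows the general idea of loading variants and annotations into one database and filtering with a single query, here for a simple autosomal recessive pattern in a trio.

```python
import sqlite3

# Toy, in-memory database. The schema is hypothetical and is NOT GEMINI's schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE variants (
        chrom TEXT, pos INTEGER, ref TEXT, alt TEXT,
        gene TEXT, impact TEXT, in_dbsnp INTEGER,
        gt_child TEXT, gt_mother TEXT, gt_father TEXT
    )
    """
)

# Made-up variants for one affected child and two unaffected parents.
rows = [
    ("chr1", 1000, "A", "G", "GENE1", "missense", 0, "hom_alt", "het", "het"),
    ("chr1", 2000, "C", "T", "GENE2", "synonymous", 1, "hom_alt", "het", "het"),
    ("chr2", 3000, "G", "A", "GENE3", "stop_gained", 0, "het", "het", "hom_ref"),
    ("chr3", 4000, "T", "C", "GENE4", "stop_gained", 0, "hom_alt", "het", "het"),
]
conn.executemany("INSERT INTO variants VALUES (?,?,?,?,?,?,?,?,?,?)", rows)


def recessive_candidates(connection):
    """Return novel, protein-altering variants that fit a recessive pattern."""
    query = """
        SELECT chrom, pos, gene, impact
        FROM variants
        WHERE gt_child = 'hom_alt'          -- affected child carries two copies
          AND gt_mother = 'het'             -- both parents are carriers
          AND gt_father = 'het'
          AND in_dbsnp = 0                  -- annotation filter: not in dbSNP
          AND impact IN ('missense', 'stop_gained')
        ORDER BY chrom, pos
    """
    return connection.execute(query).fetchall()


candidates = recessive_candidates(conn)
for row in candidates:
    print(row)
```

In this toy data set only the GENE1 and GENE4 variants pass all of the filters. For the real thing, the GEMINI documentation describes the actual database fields and query tools.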

If you’d like to learn more, there is some pretty good documentation of the software package here.

While I’m at it, and totally unrelated except it’s human genomics, there is this slideshare presentation of the ‘current’ state of personal genomics. Current is in quotes because the slideshare is actually from 3 years ago, but there is a lot of good information in there. Anyone know of a more up-to-date slide set or extensive intro to the current state of personal genomics science similar to this?


Relevant Links:

GEMINI Software package
UCSC Genome Browser

(tutorials are linked below for those tools in bold above)

Relevant Reference:

Paila U, Chapman BA, Kirchner R, & Quinlan AR (2013). GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations. PLoS computational biology, 9 (7) PMID: 23874191

Tip of the Week: Yeast genome? There is an app for that!

GBrowse navigation basics tutorial from yeastgenome on Vimeo.

The Saccharomyces Genome Database (SGD) has several short video tutorials that introduce basic navigation (shown above), expression data and more. Each of these tutorials is short, 1-2 minutes, and there are 21 of them (15 on YeastMine alone). If you want to go further in depth, we also have a tutorial on the Saccharomyces Genome Database (SGD) (subscription) that is about an hour long, modular and includes exercises. We have also done tips on YeastMine and other SGD-related tools (open access). You can also find a tutorial at OpenHelix on GBrowse, which is the browser used at SGD. And there is this short 5 minute GBrowse video tutorial.

So, a yeast researcher has no lack of video training on using SGD.

But, today I wanted to introduce you to SGD’s new app for the mobile researcher: YeastGenome App. The app has some pretty decent functionality. As their FAQ enumerates:


“Saccharomyces Genome Database by gene name or keyword to find fundamental information about your favorite gene. Browse the database by feature type and quickly view fundamental information, sequence information, Gene Ontology, interactions, phenotypes, and references associated with the terms.

Does the app have any other special features?
Yes, you can use the app to save your favorite genes in a convenient list. You can also e-mail yourself or your friends and colleagues any information you find about yeast genes in the Saccharomyces Genome Database.”

I’ve had a go at the app for a bit, and it makes browsing and searching yeast genome data pretty convenient and easy. The app was reported in this year’s Database issue at NAR and it gives a good rundown of the app. Don’t need convincing? Then you can go right to iTunes and get it now.

But this reminds me, I did a feature on mobile apps for genomics research last year that reviewed GeneWall, Wowser and MyGenome, and the year before that introduced an app for “moving molecules”. This new app and several I’ve seen since those posts suggest that perhaps it’s time to do a new post on mobile apps available for genome research. Perhaps that will be the next tip of the week from me in a few weeks.

Tip of the Week: Prezi and other nonlinear presentation methods

Though not specifically about genomics, today’s tip of the week will hopefully help you find alternative methods for presenting science that might at times work better than a straight linear slideshow presentation. I have a quick intro above to Prezi, but also see the Prezi site for Prezi presentations about Prezi use. Prezi (and other similar programs) is not for every presentation, but it does offer an alternative for some types of presentations that would work better in a less linear style than a slide show (wikipedia article). At least it can be a nice change after a slew of slide presentations, something to grab the audience’s attention. Of course, as with slide shows, you don’t want to get their attention because it’s a bad presentation.

What is a Prezi (or similar method)? Basically, a Prezi is somewhere between a slide show and a white board. It’s a canvas that is zoomable. That gives it some advantages (and disadvantages) that slide shows don’t have. First, look at some of the best Prezis of 2012 (as decided by Prezi :) to get an idea of what can be done. Here is one on the 2008 election results.

Prezis are shareable in the cloud (or private, or semi-private) and you can work in groups to edit them. Also, you can present them online or download a file that allows you to present without an internet connection. I’ve found that the file has worked for me regardless of OS or computer.

Prezi is free if all you want to do is create, edit and share presentations, but if you want to create private presentations or use your own logo, etc., it will cost anywhere from 5 to 13 dollars a month.

There are alternatives to Prezi that are free or open source. I personally like Prezi best, since it has the most features and is simple to use, but there are some others that are good. My favorite of the alternatives is Impress.js (downloaded software) and the editor Impressionist (can do 3D!). So if you like the idea of a Prezi-like presentation but want a more open alternative, that might be one to try.

Tip of the Week: Transfac (and HGMD, Proteome, etc)

BioBase is a provider of expert-curated biological databases. Two well-known BioBase databases are TransFac and HGMD. Both have publicly available data (see previous links), but if you go to the BioBase site, you’ll find there is also subscription-based, more feature-rich access. HGMD is the Human Gene Mutation Database and “represents an attempt to collate known (published) gene lesions responsible for human inherited disease.” TransFac, on the other hand, “provides data on eukaryotic transcription factors, their experimentally-proven binding sites, consensus binding sequences (positional weight matrices) and regulated genes.” As you can tell from a search of our blog, HGMD is often cited as a good location for human disease data, as TransFac is for transcription factor binding sites (TFBS).

BioBase has a series of video tutorials for both TransFac and HGMD (and more for the other databases such as Proteome, Genome Trax and ExPlain). For this week’s tip of the week, we’ve embedded two video tutorials.

The first explains MATCH, an analysis tool in TransFac that predicts binding sites for transcription factors in a particular DNA sequence.
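
If you have never seen how a positional weight matrix is used to predict binding sites, here is a minimal sketch of the general idea in Python. To be clear, this is not the MATCH algorithm and does not use any TransFac data: the count matrix, the pseudocount, the uniform background and the score threshold are all assumptions chosen for illustration. It simply converts counts to log-odds scores and slides the matrix along a sequence.

```python
import math

# Hypothetical count matrix for a 4-bp motif (not from TransFac).
# Each list holds the counts of that base at motif positions 0..3.
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 1],
    "G": [1, 10, 0, 0],
    "T": [1, 0, 1, 8],
}


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn base counts into log2 odds against a uniform background."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[base][i] for base in "ACGT") + 4 * pseudocount
        column = {}
        for base in "ACGT":
            frequency = (counts[base][i] + pseudocount) / total
            column[base] = math.log2(frequency / background)
        pwm.append(column)
    return pwm


def scan(sequence, pwm, threshold):
    """Score every window of the sequence; keep those at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


PWM = build_pwm(COUNTS)
hits = scan("TTAGCTGGAGCTAA", PWM, threshold=4.0)
for start, window, score in hits:
    print(start, window, score)
```

With this made-up matrix, only the two AGCT windows (at 0-based positions 2 and 8) clear the threshold. Real tools add refinements such as separate core and matrix similarity cut-offs and scanning of both strands, which the video covers for MATCH.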



The second video tip is a quick tutorial on how to get started with searching HGMD.


If you are interested in advanced searching of these two databases, or Genome Trax, Proteome or ExPlain, check out the video tutorials from BioBase.

Tip of the Week: FlyBase

I have a soft spot for FlyBase. My Ph.D. work used Drosophila and I’ve used Drosophila species to teach after that. Something about Dipteran genetics fascinates me. FlyBase is also one of the older genetics and genomics databases and we’ve got a tutorial on it. Today’s tip is their 12 minute video of FlyBase for Undergrads. One of the things I have always believed is that the databases and analysis tools we train on and come across in our daily work are excellent places for teaching and learning genomics for undergraduates. There is lots of data and lots of analysis that would make for very interesting projects and experiences that an undergraduate could do.

Today’s video starts off with a kind of silly live-action sequence :D, but fun silly, and walks through FlyBase on an introductory level. Check it out.

They have a YouTube channel with two additional (and much shorter) videos on using TermLink, a controlled vocabulary search tool, and an introduction to Fast-Track, a community paper curation tool.


Tip of the Week: NCBI Genome Workbench

Today’s tip is from NCBI. Specifically, NCBI’s Genome Workbench. The workbench is

…an integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publically available sequence databases at NCBI, and mix this data with your own private data.

It’s a useful program and they have a great set of videos to introduce you to the workbench’s functions and features. The video embedded here is the introduction, but they also have several additional videos including how to load a genome into the workbench, phylogenies and others. Check it out.

(Forgive the delay of this week’s tip. Snow canceled work and knocked out Internet access!)

Video Tip of the Week: PATRIC, Pathosystems Resource Integration Center

PATRIC is an integration portal (as the name implies) of data concerning disease-causing infectious bacteria. Or to put it in their words:

PATRIC is the Bacterial Bioinformatics Resource Center, an information system designed to support the biomedical research community’s work on bacterial infectious diseases via integration of vital pathogen information with rich data and analysis tools.

We mentioned PATRIC at the beginning of the year in a SNPpets. Also, recently I was speaking with a threat abatement specialist who was lamenting the lack of coordinated data on infectious bacteria genomes. I was sure there was such a site, so we checked our blog here and voila, sure enough, exactly what they needed.

PATRIC indeed coordinates a lot of different types of data from disease-causing infectious bacteria. This includes data from all NIAID biodefense A/B/C pathogens and hundreds of genomes from many isolation sources. For example, as of this writing there are nearly 500 genomes of Escherichia, including 57 complete. In addition to genomic data, there are many other types of data including phylogenetic, host-pathogen protein-protein interactions, protein, pathways and more. One interesting feature, of many, is the disease map (for Mycobacterium only right now) that shows local outbreaks and alerts. There are many tools to access and analyze this data, from specialized searches to browsers.

To get a good idea of what is available at PATRIC, check out the quick intro video embedded above from the PATRIC developers. They have two other video tutorials on the feature table and identifying novel proteins you also might want to check out. Also, check out the blog for more databases and resources for infectious disease pathogens.

To cite or learn more about PATRIC, see:

Gillespie, J., Wattam, A., Cammer, S., Gabbard, J., Shukla, M., Dalay, O., Driscoll, T., Hix, D., Mane, S., Mao, C., Nordberg, E., Scott, M., Schulman, J., Snyder, E., Sullivan, D., Wang, C., Warren, A., Williams, K., Xue, T., Seung Yoo, H., Zhang, C., Zhang, Y., Will, R., Kenyon, R., & Sobral, B. (2011). PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species. Infection and Immunity, 79 (11), 4286-4298 DOI: 10.1128/IAI.00207-11

Video Tip of the Week: Let Allie help you figure out those acronyms

Just ask Allie.

As I mentioned in a post nearly 5 years ago, I dislike acronyms. Ever since my time in the army, which speaks in a language of acronyms, they’ve made me cringe. Of course, genomics databases aren’t without their acronyms, from DBTSS to STRING, so I’ve got used to them. As you might have read, I’m taking a leave for an AAAS Science Policy fellowship (science and computing education at NSF). Working here allows me to work with many other fellows from NIH, DOD, HHS and State. Just the locations reek of acronyms (except State, they’re special :D), but to make matters worse, I haven’t been to a meeting, hearing, discussion group, panel or talk in the last month where the language wasn’t nearly all acronyms. Or so it seems.

The same can be very true of scientific research papers. It’s difficult enough keeping up with acronyms in your own field, but stray even a bit into a field you aren’t immersed in and the language can sound foreign. In that post 4 years ago linked to above, I mentioned two databases of science acronyms, ARGH (love that acronym) and the Stanford Biomedical Abbreviation Server, to help the researcher make sense of the acronyms they come across in reading research. Unfortunately, both seem to be no longer active.

Allie to the rescue. Allie is a life sciences acronym database. It’s computationally derived from Medline. Here is their description:

 Allie is a search service for abbreviations and long forms utilized in Lifesciences. It provides a solution to the issue that many abbreviations are used in the literature, and polysemous or synonymous abbreviations appear frequently, making it difficult to read and understand scientific papers that are not relevant to the reader’s expertise. Allie searches for abbreviations and their corresponding long forms from titles and abstracts in the entire MEDLINE®
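
To make the idea of pairing abbreviations with long forms from abstracts a bit more concrete, here is a small sketch in Python. It is not Allie's method and it does not call Allie's REST/SOAP interfaces: it is just a naive first-letter heuristic that I am assuming for illustration, and the sample sentence is made up. It looks for an all-caps abbreviation in parentheses and checks whether the initials of the words just before it match.

```python
import re


def find_abbreviations(text):
    """Naively pair '(ABBR)' with the preceding words whose initials match."""
    pairs = []
    for match in re.finditer(r"\(([A-Z]{2,10})\)", text):
        abbreviation = match.group(1)
        # All alphabetic words that appear before the opening parenthesis.
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(abbreviation):]
        initials_match = len(candidate) == len(abbreviation) and all(
            word[0].upper() == letter
            for word, letter in zip(candidate, abbreviation)
        )
        if initials_match:
            pairs.append((abbreviation, " ".join(candidate)))
    return pairs


sample = (
    "We scanned each positional weight matrix (PWM) against transcription "
    "factor binding sites (TFBS) and reported results (NA)."
)
pairs = find_abbreviations(sample)
for abbreviation, long_form in pairs:
    print(abbreviation, "=", long_form)
```

On the sample sentence this finds PWM and TFBS and skips “(NA)”, since no preceding words match its initials. A heuristic this simple misses hyphenated terms, lowercase abbreviations and long forms that skip small words, which is exactly why a curated, Medline-wide service like Allie is handy.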

There is a lot more to Allie than just looking up acronyms. Some of the suggested uses?

  • Users can search for the long forms of abbreviations or the abbreviations of long forms.
  • Bibliographic data which includes the inquired abbreviation or long form in titles or abstracts can be obtained.
  • Users can also obtain co-occurring abbreviations in titles and abstracts.
  • REST/SOAP interfaces are available which allow the users to call upon Allie from their scripts, programs, etc.

It’s all quite useful, though I especially like that I can find abstracts with certain acronyms and co-occurring abbreviations. Today’s tip will introduce you to Allie. Unfortunately, today the site was down temporarily (though I’m assured it will be up tomorrow) and I wasn’t able to create a tip. The tip above is from the developers and is quite good. You can also access the video from GOTV.

What I’m looking forward to is someone putting together an app from their REST/SOAP interface that will type out acronyms’ long forms while someone is talking to me. That’d be wonderful :).

Yamamoto, Y. (2011). Allie: a database and a search service of abbreviations and long forms. Database : the journal of biological databases and curation, 9 (5) DOI: 10.1093/database/bar013

Tip of the Week: MaizeGDB Genome Browser & other videos

Occasionally we highlight video tips and tutorials from other sites. Today I’d like to point you to MaizeGDB. Last year the folks at MaizeGDB highlighted their video tutorials in Database. As they state in their abstract:

Video tutorials are an effective way for researchers to quickly learn how to use online tools offered by biological databases.

We obviously agree here at OpenHelix. We have over 100 full tutorials and 200 tips in agreement :D.

Embedded above is their tutorial on using the MaizeGDB Genome Browser (v2). They have seven additional tutorials on their resources such as how to find a genetic map position of a locus and doing a phenotype search. On that page of tutorials they also have links to various other tutorials from others on Maize research and MaizeGDB.

You can learn more about MaizeGDB from our blog. We highlighted MaizeGDB in several posts over the last few years chronicling their move to GBrowse, its success and their experience.

Useful Links:
MaizeGDB tutorials
OpenHelix GBrowse Tutorial
Harper LC, Schaeffer ML, Thistle J, Gardiner JM, Andorf CM, Campbell DA, Cannon EK, Braun BL, Birkett SM, Lawrence CJ, & Sen TZ (2011). The MaizeGDB Genome Browser tutorial: one example of database outreach to biologists via video. Database : the journal of biological databases and curation, 2011 PMID: 21565781