Category Archives: Tip of the Week

Video Tip of the Week: NaviCell for custom interaction maps for systems biology

The onslaught of sequence data from a whole range of species and tissues continues, and certainly will for a long time. But moving from there to understanding the interactions among the genes that contribute to the structures, behaviors, and phenotypes of these systems requires other types of supporting software. NaviCell is a tool that aims to help standardize and represent the key features of the systems biology and molecular interactions that we need to capture.

NaviCell came to my attention as I was watching the tweets from the recent #isb2014 conference, the International Society for Biocuration meeting. Suddenly a bunch of excited attendees started chatting about it:

So I went to investigate. And I can see why they liked it. This excerpt from the abstract of their paper (linked below) captures the key aspects very nicely:

NaviCell is a web-based environment for exploiting large maps of molecular interactions, created in CellDesigner, allowing their easy exploration, curation and maintenance. It is characterized by a combination of three essential features: (1) efficient map browsing based on Google Maps; (2) semantic zooming for viewing different levels of details or of abstraction of the map and (3) integrated web-based blog for collecting community feedback.

So you can use this web-based interface to create, curate, and navigate around molecular interaction diagrams in a really useful way. I haven’t seen any video of this talk, but they do have a video that offers an overview of their tools.

They have maps available to get you started exploring the tools and features. You should also access their documentation. They have separate guides for general use and for creating and uploading your own maps. Their literature section also has a good collection of papers that will help you understand the context of their tool in the ecosystem of pathway and interaction data–but you can also read their paper for a nice description of it all. The standards efforts are important to grasp, and you’ll also need CellDesigner (which we covered in the past) to create your own maps.

Quick links:

NaviCell homepage:



Kuperstein I., Cohen D.P., Pook S., Viara E., Calzone L., Barillot E. & Zinovyev A. (2013). NaviCell: a web-based environment for navigation, curation and maintenance of large molecular interaction maps, BMC Systems Biology, 7 (1) 100. DOI:

Funahashi A., Matsuoka Y., Jouraku A., Morohashi M., Kikuchi N. & Kitano H. (2008). CellDesigner 3.5: A Versatile Modeling Tool for Biochemical Networks, Proceedings of the IEEE, 96 (8) 1254-1265. DOI:

Video Tip of the Week: list of genes associated with a disease

I am currently in Puerto Varas, Chile at an EMBO genomics workshop. The workshop is mainly for grad students and the instructors are, for the most part, alumni of the Bork group. I gave a tutorial on genomics databases.

Anyway, the last two days of the workshop are devoted to a challenge: in teams of 3-4, each advised by an instructor, students develop a list of genes associated with epilepsy. Obviously, this could be a trivial task: just go to OMIM or GeneCards and grab a list. But this challenge requires them to go beyond that, use the available data, and make predictions. At my suggestion, my team tried some brainstorming techniques intended to produce a more creative solution than they would come up with individually or by jumping straight into normal group dynamics. It seemed to work: their solution was quite creative, and we will find out today how well it did.

That was my long way of saying that in the process we came across many databases of gene-disease information. Above you will find a video on rat gene-disease associations from RGD (the Rat Genome Database), which are often used to investigate human gene-disease associations as well.

Below you will find a list of some excellent databases and resources to find similar lists:

Genetic Association Database






Several NCBI resources

UCSC Genome Browser’s tracks for disease and phenotype

There are surely several others. If you have a favorite that is not on this list, please comment.
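If you pull gene lists from several of these resources, one simple way to combine them is to count how many sources support each gene. The sketch below is a hypothetical illustration: the source names and gene symbols are placeholders, not real query results. It normalizes symbols to upper case and counts each gene at most once per source.

```python
from collections import Counter


def rank_by_support(source_lists):
    """Rank genes by how many independent sources list them.

    source_lists maps a source name (hypothetical here) to an iterable of
    gene symbols. Each gene is counted at most once per source.
    """
    support = Counter()
    for genes in source_lists.values():
        # A set, so duplicate entries within one source are not double counted
        support.update({gene.strip().upper() for gene in genes})
    # Most-supported genes first; ties broken alphabetically for stable output
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    toy_lists = {  # placeholder data for illustration only
        "source_1": ["GENE_A", "GENE_B"],
        "source_2": ["gene_a", "GENE_C"],
        "source_3": ["GENE_A", "GENE_B", "GENE_B"],
    }
    for gene, n_sources in rank_by_support(toy_lists):
        print(gene, n_sources)
```

Counting agreement across sources is only a starting point. A gene listed by one curated resource may still be better supported than one repeated across several databases that share the same upstream data.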

Reference for RGD:
Laulederkind S.J.F., Hayman G.T., Wang S.J., Smith J.R., Lowry T.F., Nigam R., Petri V., de Pons J., Dwinell M.R., Shimoyama M. et al. (2013). The Rat Genome Database 2013–data, tools and users, Briefings in Bioinformatics, 14 (4) 520-526. DOI:

Video Tip of the Week: EuPathDB

I love pathogens and parasites. Really, I do. I mean–not up very close. But their astonishing range and cleverness often earns my respect. How these small organisms can wreak havoc on larger systems, inveigle the larger organisms to fly, carry, or ship them around during appropriate points in their life cycle, and to cloak themselves from immune systems–it can really be amazing. The EuPathDB–the Eukaryotic Pathogen Database–connects you (safely) to many of these tricksy actors.

We’ve mentioned them a couple of times before, but we never had a tip of the week on their site. They store details on a whole bunch of organisms important for health and provide a wide range of tools to interact with the data. I also recently realized how many videos they’ve provided to help you access and effectively use their resources.

First there’s an overview sort of introduction that will help you to understand what their mission is:

But there are also a number of other videos that are more specific for the types of tasks you will want to accomplish. Some are more like tours of the sites, some are more detailed looks at specific types of data that are available. I’ll include the PlasmoDB overview here, but the same software features and behavior would be true for all of the parasites they host (ha–get it??).

They are also adding new videos–the one that I noticed the other day was this new one on orthology information.

So if you need information on pathogen genomics, be sure to investigate the EuPathDB site.

Quick links:

EuPathDB main site:

EuPathDB YouTube Channel


Aurrecoechea C., Brestelli J., Brunk B.P., Fischer S., Gajria B., Gao X., Gingle A., Grant G., Harb O.S., Heiges M. et al. (2009). EuPathDB: a portal to eukaryotic pathogen databases, Nucleic Acids Research, 38 (Database) D415-D419. DOI:

Video Tip of the Week: AGRICOLA for agricultural science searches

Although lots of attention and resources are focused on human genomics and disease, I keep track of a lot of work on agricultural genomics fronts as well. In many ways, genomics techniques and technologies are already paying real benefits in agriculture, and some tools are even further along in important agricultural species than they are in human health so far.

A frustrating thing, though, is that important literature can be siloed in different places. There is a tremendous amount of information in PubMed, some of it covering ag species and technology, but there is a whole other resource of relevant research that you might also need to investigate if you are working on ag topics. Some plant science tweeps on Twitter were recently noting this problem:

Because of this, I decided it was time to highlight AGRICOLA (AGRICultural OnLine Access). This week is also a celebration of a hero of agriculture, Norman Borlaug, who won the Nobel Peace Prize in 1970 for his work breeding plants with better properties. The #Borlaug100 festivities marked the 100th anniversary of his birth. But you know what: I think Norm would want people to focus on the science, and to drive it forward. So I wanted to raise awareness of this catalog of ag science for researchers to use.

The AGRICOLA team describes their resource this way:

AGRICOLA (AGRICultural OnLine Access) serves as the catalog and index to the collections of the National Agricultural Library, as well as a primary public source for world-wide access to agricultural information. The database covers materials in all formats and periods, including printed works from as far back as the 15th century.

This video tour of how to use the NAL site comes from the UT iSchool folks. Have a look to learn about what you can expect to find and how to accomplish searches.

You may go right to the AGRICOLA search page. But you can also use a more recent interface from the top of the NAL site (National Agricultural Library) to accomplish your searches. There’s a quick search box right there that can help you get started faster. Or you can still access the other interface (NAL Catalog) from the navigation menu at the top:


Another amazing feature of the NAL resources is their digital collections. They even have thousands of beautiful watercolors and drawings of plants available. Really–go look at these: USDA Pomological Watercolor Collection.

Looking around for some kind of integrated resource (the love child of PubMed and AGRICOLA), I noticed that the Europe PMC site (formerly known as UKPMC) contains literature from both sources. It doesn’t have many of the tools I’m used to using with PubMed, so for some things I would certainly still go to PubMed. But it’s a handy thing to keep in mind: you could cover both areas with a single search from the Europe PMC site.
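Europe PMC also offers a REST web service, so a combined PubMed-plus-AGRICOLA search can be scripted. The sketch below only builds the search URL. It rests on several assumptions that you should check against the current Europe PMC web service documentation before relying on it:

* the `/europepmc/webservices/rest/search` endpoint;
* its `query`, `format`, and `pageSize` parameters;
* the `SRC:MED` (MEDLINE/PubMed) and `SRC:AGR` (AGRICOLA) source codes.

The function name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed Europe PMC REST search endpoint (verify against current docs).
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(terms, sources=("MED", "AGR"), page_size=25):
    """Build a Europe PMC search URL restricted to the given record sources.

    Assumption: "MED" selects PubMed/MEDLINE records and "AGR" selects
    AGRICOLA records in Europe PMC's SRC: field syntax.
    """
    source_clause = " OR ".join(f"SRC:{src}" for src in sources)
    query = f"({terms}) AND ({source_clause})"
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Prints the URL only; fetching it (e.g. with urllib.request.urlopen)
    # would require network access.
    print(build_search_url("wheat stem rust"))
```

Keeping the source codes as a parameter makes it easy to drop `AGR` and compare how many hits come from AGRICOLA alone versus PubMed alone.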

So I close this brief intro with a bit more Norman Borlaug. He worried that progress in agricultural science could be blocked by people opposed to this work. But let’s persist. And now, go do science. And “play it hard”.

Video details: Play it Hard – A Tribute to Dr. Norman Borlaug

Quick Links:

National Agricultural Library homepage:

NAL AGRICOLA directly:

Europe PMC:


Borlaug N. (2007). Feeding a Hungry World, Science, 318 (5849) 359-359. DOI:

Borlaug N.E. (2000). Ending World Hunger. The Promise of Biotechnology and the Threat of Antiscience Zealotry, Plant Physiology, 124 (2) 487-490. DOI:

McEntyre J.R., Ananiadou S., Andrews S., Black W.J., Boulderstone R., Buttery P., Chaplin D., Chevuru S., Cobley N., Coleman L.A. et al. (2010). UKPMC: a full text article resource for the life sciences, Nucleic Acids Research, 39 (Database) D58-D65. DOI:

Video Tip of the Week: ICGC portal for cancer genomics

A question at Biostar about cancer “gene sets” recently got me looking at one of my favorite data sources again–the ICGC, International Cancer Genome Consortium, and their data portal. Previous posts we’ve done were based on their legacy portal (which is still available on their site). They changed things up a bit with a release last fall, and I hadn’t covered those changes yet.

Conveniently, they have done a short video explaining how to access the data that they offer. They’ve continued to add new data, and to refine the software. You should check it out.

ICGC Data Portal Tutorial from ICGC on Vimeo.

In the past I found some really useful info to compare with a lung cancer cell line I had been examining: I saw the same mutation in actual tumor samples that had been found in this cell line years back. There have also been recent publications that describe the project in more detail, along with some interesting outcomes from its data (linked below).

You really need to be mining these projects for data if they cover your research area. There’s a lot to learn that hasn’t been published yet–just be sure to read up on their usage policies before you deliver your great discoveries to the journals!

Quick links:

Data portal:

Project homepage:


Hudson (Chairperson) T.J., Anderson W., Aretz A., Barker A.D., Bell C., Bernabé R.R., Bhan M.K., Calvo F., Eerola I. & Gerhard D.S. & many others in a large consortium… (2010). International network of cancer genome projects, Nature, 464 (7291) 993-998. DOI:

Alexandrov L.B., Nik-Zainal S., Wedge D.C., Aparicio S.A.J.R., Behjati S., Biankin A.V., Bignell G.R., Bolli N., Borg A. & Børresen-Dale A.L. & many others in a large consortium… (2013). Signatures of mutational processes in human cancer, Nature, 500 (7463) 415-421. DOI:

Gonzalez-Perez A., Mustonen V., Reva B., Ritchie G.R.S., Creixell P., Karchin R., Vazquez M., Fink J.L., Kassahn K.S. & Pearson J.V. & many others in a large consortium… (2013). Computational approaches to identify functional genetic variants in cancer genomes, Nature Methods, 10 (8) 723-729. DOI:

Video Tip of the Week: JANE, comparing phylogenies

When I was doing my Ph.D. in the ancient days of Sanger sequencing, reading in the results with one hand on the keyboard while reading off the GATCs (and walking to the lab in the snow, uphill both ways), my purpose for slogging through all that was to eventually get a phylogeny of the sequences of the retrotransposable elements I was studying. Why did I want that phylogeny? Because I was comparing the phylogeny of the retroelements to that of the species in which they reside. We were trying to determine whether these retroelements were stable within the host lineages (they are) or whether promiscuous horizontal transfer was occurring. We did those comparisons, but it would have been nice to have a ‘cophylogeny reconstruction’ program :D.

Similar comparisons of phylogenies are often necessary: host-parasite studies, coevolution, and so on. Jane is a software package (free with registration) that uses a heuristic approach, “running a genetic algorithm with an internal fitness function that is evaluated using a dynamic programming algorithm.” It can often give an optimal solution for the cophylogeny you are studying. Jane was developed in the research group of Ran Libeskind-Hadas at Harvey Mudd College, and you can read more about the algorithm and approach here. They also have an extensive written tutorial.

In these tips we usually focus on web interfaces to tools, but I liked this package (and it’s free) and wanted to play around with it. So today I’ll walk you through a very quick intro to downloading the tool and getting started with it.

Quick Links:

Jane

Jane Tutorial

CoPhylogeny Reconstruction

TreeMap (another cophylogeny reconstruction software)

CopyCat (yet another)

Book Chapter on Cophylogeny and reconstruction

Conow C., Fielder D., Ovadia Y. & Libeskind-Hadas R. (2010). Jane: a new tool for the cophylogeny reconstruction problem, Algorithms for Molecular Biology, 5 (1). DOI: 10.1186/1748-7188-5-16
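Jane itself searches over host-switch scenarios with its genetic algorithm and dynamic program, which is well beyond a blog snippet. To get a feel for what "events" and "costs" mean in cophylogeny reconstruction, here is a sketch of a simpler, classic technique instead: LCA-based reconciliation. It maps each parasite node to the most recent common host and counts cospeciations, duplications, and losses. Unlike Jane, it does not model host switches, and it is not Jane's algorithm. Trees are nested tuples of leaf names. The cost values (cospeciation 0, duplication 1, loss 1) are illustrative assumptions, not Jane's settings.

```python
def index_host(tree, parent=None, depth=0, parents=None, depths=None):
    """Record each host node's parent and depth (root depth 0).

    Trees are nested tuples with string leaves, e.g. (("A", "B"), "C").
    """
    if parents is None:
        parents, depths = {}, {}
    parents[tree] = parent
    depths[tree] = depth
    if isinstance(tree, tuple):
        for child in tree:
            index_host(child, tree, depth + 1, parents, depths)
    return parents, depths


def lca(a, b, parents, depths):
    """Lowest common ancestor of two host nodes, found by walking upward."""
    while depths[a] > depths[b]:
        a = parents[a]
    while depths[b] > depths[a]:
        b = parents[b]
    while a != b:
        a, b = parents[a], parents[b]
    return a


def reconcile(parasite, host, tip_map):
    """Count events for a binary parasite tree mapped onto a binary host tree.

    tip_map gives the host leaf for each parasite leaf. Host switches are
    not modeled here (unlike Jane).
    """
    parents, depths = index_host(host)
    counts = {"cospeciation": 0, "duplication": 0, "loss": 0}

    def walk(node):
        if not isinstance(node, tuple):
            return tip_map[node]
        child_maps = [walk(child) for child in node]
        mapped = lca(child_maps[0], child_maps[1], parents, depths)
        # A node is a duplication if it maps to the same host as one of its children.
        duplication = mapped in child_maps
        counts["duplication" if duplication else "cospeciation"] += 1
        for child_map in child_maps:
            # Host lineages skipped along this parasite edge count as losses.
            counts["loss"] += depths[child_map] - depths[mapped] - (0 if duplication else 1)
        return mapped

    walk(parasite)
    return counts


def total_cost(counts, costs=None):
    """Weighted event total; the default costs are illustrative only."""
    costs = costs or {"cospeciation": 0, "duplication": 1, "loss": 1}
    return sum(costs[event] * n for event, n in counts.items())


if __name__ == "__main__":
    host = (("A", "B"), "C")
    tips = {"a": "A", "b": "B", "c": "C"}
    for parasite in [(("a", "b"), "c"), (("a", "c"), "b")]:
        events = reconcile(parasite, host, tips)
        print(parasite, events, total_cost(events))
```

For the congruent pair the sketch reports two cospeciations and no losses. For the incongruent parasite tree ((a,c),b) on host tree ((A,B),C) it reports one cospeciation, one duplication, and three losses. Allowing a host switch, as Jane does, can explain such incongruence more cheaply.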

Video Tip of the Week: Introduction to IGB Genome Browser

Last fall I noticed an announcement at Biostar about an upcoming webinar that would illustrate some new features in the IGB browser, and at the time I highlighted some of their materials as our video tip that week. For more details you can check out that overview.

But I was recently told that their longer-form introduction is available again. If you are interested in comparing the functionality of different browsers, this is perhaps a better overview. So now that it’s viewable again, I thought I’d offer it as the tip this week. It’s in 2 parts; the first one is here:

The second one is available with the other videos on their YouTube channel: Introduction to IGB Part II: Starting Analysis.

To get an idea of how it’s used in the field, have a look at data about blueberries. There’s a SlideShare of Ann Loraine’s recent presentation that shows you where to find their blueberry RNA-Seq data. On slide 13 there’s a neat example of the different transcripts present in ripe and unripe fruit. Using those details (focusing on the Cuff.187.1 region, loading the RNA-Seq Berry Development tracks, and loading the “coverage” files in the Graphs folder), I was able to see exactly the difference among the data sets that they show. And I loved how the tracks were color-coded to match the berry stages; I thought that was very effective. The slides go on to show further steps of annotation and exploration with Blast2GO and PlantCyc, along with sample data on pathways that are altered over developmental time points.
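The “coverage” graphs in that example summarize how many RNA-Seq reads overlap each position. The function below is a hypothetical sketch of how such a track can be computed; it is not IGB's own code or the Loraine lab's pipeline. It takes read alignments as half-open (start, end) intervals on one chromosome and emits bedGraph-style lines (chrom, start, end, depth), a plain-text graph format that many genome browsers can display.

```python
from collections import Counter


def coverage_bedgraph(chrom, reads):
    """Turn half-open (start, end) read intervals into bedGraph-style lines.

    Uses a sweep line: +1 at each read start, -1 at each read end.
    Adjacent runs with the same depth are merged; zero-depth gaps are skipped.
    """
    delta = Counter()
    for start, end in reads:
        delta[start] += 1
        delta[end] -= 1
    positions = sorted(delta)
    runs = []
    depth = 0
    for pos, nxt in zip(positions, positions[1:]):
        depth += delta[pos]
        if depth == 0:
            continue
        if runs and runs[-1][1] == pos and runs[-1][2] == depth:
            # Extend the previous run instead of starting a new one
            runs[-1] = (runs[-1][0], nxt, depth)
        else:
            runs.append((pos, nxt, depth))
    return [f"{chrom}\t{start}\t{end}\t{d}" for start, end, d in runs]


if __name__ == "__main__":
    toy_reads = [(0, 10), (5, 15), (20, 30)]  # made-up alignments
    print("\n".join(coverage_bedgraph("chr1", toy_reads)))
```

Comparing two such tracks (say, ripe and unripe fruit) over the same region is essentially what the color-coded graphs in the slides let you do by eye.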


So have a look at IGB, and the great example of how it can be used to visualize key features in genomic data. And have some blueberries.

Quick links:

IGB site:

YouTube channel with more videos: IGB Channel


Nicol J.W., Helt G.A., Blanchard S.G., Raja A. & Loraine A.E. (2009). The Integrated Genome Browser: free software for distribution and exploration of genome-scale datasets, Bioinformatics, 25 (20) 2730-2731. DOI:

Video Tip of the Week: Ambiscript Mosaic for visualizing nucleotide motifs

One of the topics I keep an eye on is visualization of various types of genomics data, and I’m always interested in new tools for graphical representations. In the past some of our most popular posts have been tools that aren’t heavy-lifting analysis types of tools–but better ways to visualize and explore data, or different ways to present it.

This week’s tip of the week is a tool of this type: Ambiscript Mosaic offers a new way to look at nucleotides in stretches of sequence data. Now, I know what you’re thinking: new ways to look at A, T, G, and C? Really? Do we need this? And I’ll admit I wasn’t convinced at first. When I read the first paper on it, the sequences just looked like Elvish, which I thought was cute but not obviously useful. But the more I thought about this interesting abstraction and read about it, the more I liked the concept.

The basic idea is that the Roman letters for ATGC are certainly important and useful. But they can be represented with graphical elements that convey more detail visually. And the 5′ to 3′ representation of letter-based sequence info offers one way to think about the sequence, but the reverse complement of that requires a translation step. However, if represented graphically, the same data is just a physical flip away with no additional changes.
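The reverse-complement point can be made concrete. In letter notation, getting the reverse complement takes two steps: complement each base, then reverse the order. The sketch below models an ambigraphic glyph set abstractly, under the assumption (taken from the description above) that rotating each glyph 180 degrees yields the glyph of its complementary base. It does not draw Ambiscript's actual glyphs.

```python
# Letter notation: the reverse complement needs an explicit complement step.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the order."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Abstract model of an ambigraphic glyph set (assumption): rotating a glyph
# 180 degrees turns it into the glyph for the complementary base.
ROTATED_GLYPH = {"A": "T", "T": "A", "C": "G", "G": "C"}


def rotate_180(seq):
    """Turn the written sequence upside down: the last glyph now comes first,
    and every glyph is seen rotated."""
    return "".join(ROTATED_GLYPH[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    for seq in ["GATTACA", "GAATTC"]:
        # GAATTC (the EcoRI site) is its own reverse complement: a palindrome.
        print(seq, reverse_complement(seq), rotate_180(seq))
```

In this model the two functions agree by construction. The point is where the work happens: with letters, the reader must do the complement lookup mentally, while with rotation-symmetric glyphs the lookup is built into the shapes, so flipping the page is enough. Palindromic sites like GAATTC look the same after the flip.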

This strategy isn’t one that you’d want to replace every view of sequence data, of course. But for some purposes this might offer a new view of the information that will be better suited to seeing some types of motifs or patterns.

In this week’s tip I’ll illustrate an example of how this type of visualization could offer a complementary way to evaluate a particular DNA motif. As a bonus, I’ll also provide the video of the presentation by the Rozak team that helped me to understand why this offers something different from the letter system. You can see it on their site, but I wanted to have a video version of it as well for cross-platform access.

For the demonstration video, I chose to compare the sequence logo style representation generated by the MEME suite tools with this graphical notation. MEME is a tool that I would use to identify motifs (to do the heavy lifting of the analysis part) and then visualize the results. They offer several ways to visually examine the results, and one of them is a sequence logo. The MEME documentation offers a sample motif, which I then displayed in the Mosaic style. Here is MEME above, and Mosaic below it:

In the demonstration video I don’t have time to cover many of the useful aspects of the graphical strategies employed by the Ambiscript tool; it just covers the basics. Be sure to read their papers and watch the bonus background video to understand more about the actual representational choices, such as the details of colors and shading. There’s a lot more thought behind this than I had time to cover. I didn’t show gaps here either, but the notation can represent them.
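Since the comparison above is against a sequence logo, it may help to recall how logo letter heights are commonly computed for DNA. A column's information content is 2 bits (the maximum entropy for four bases) minus its Shannon entropy. Each base's height is its frequency times that information content. The sketch below shows that basic calculation for one motif column from raw counts. It omits any pseudocounts or small-sample correction that logo tools, MEME's included, may apply, so its numbers will not necessarily match MEME's logos. The function name and counts are hypothetical.

```python
import math


def logo_heights(column_counts):
    """Letter heights (in bits) for one DNA motif column.

    column_counts maps bases to observed counts at that position.
    No pseudocounts or small-sample correction are applied.
    """
    total = sum(column_counts.values())
    freqs = {base: n / total for base, n in column_counts.items() if n > 0}
    # Shannon entropy of the column, in bits
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    # Maximum entropy for 4 bases is log2(4) = 2 bits
    information = 2.0 - entropy
    return {base: p * information for base, p in freqs.items()}


if __name__ == "__main__":
    print(logo_heights({"A": 18, "C": 0, "G": 2, "T": 0}))  # made-up counts
```

A fully conserved column gets a 2-bit letter, and a uniform column gets nothing. A logo conveys conservation through stack height, while a notation like Mosaic conveys the bases themselves through shape and color.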

This bonus video offers some of the background and foundations of the graphical representations they’ve selected. It is based on the prior work, so it doesn’t have some of the additional features that the Mosaic paper describes. But it helps to explain the conceptual basis for the styles. It helped me to connect to the ideas about the choices for graphics. There’s no audio with it, it’s just a conversion of the slide walk-through.

This tool is unusual, I know. I’m sure not everyone will want to let go of ATGCs as letters, and it won’t be suited for every sequence visualization purpose. It took me a while to wrap my head around the idea of not having the letters there. But as a different way to consider sequence data, I think it could be useful for exploring some features. You’ll still want to use algorithms like those in the MEME suite to discover features such as possible transcription factor binding motifs. But you can think about seeing them differently with Ambiscript Mosaic.


Credits or quick links to things you saw in the demo video:

Ambiscript Mosaic site:

Rozak slide presentation on the foundations:

Wikipedia Base pair page:

MEME documentation sample motif:

MEME Suite homepage:

Previous tips on motif visualization tools: iceLogo, WebLogo

Thanks to David Rozak for permission to convert the slide presentation to video.


Rozak D. & Rozak A. (2008). Simplicity, function, and legibility in an enhanced ambigraphic nucleic acid notation, BioTechniques, 44 (6) 811-813. DOI:

Rozak D.A. & Rozak A.J. (2014). Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes, BMC Genomics, 15 (1) 52. DOI:

Bailey T.L., Boden M., Buske F.A., Frith M., Grant C.E., Clementi L., Ren J., Li W.W. & Noble W.S. (2009). MEME SUITE: tools for motif discovery and searching, Nucleic Acids Research, 37 (Web Server) W202-W208. DOI:

Video Tip of the Week: Centralized Model Organism Database (CMOD)

This week’s Tip of the Week is a bit different. The database resource that’s the focus of this piece doesn’t exist yet. Parts of it do, but there’s a ways to go before we actually have a Centralized Model Organism Database (CMOD).

The ideas that Andrew Su offers for CMOD in this talk are ones that we have to start moving towards. There has got to be a way to capture more of the annotation information that scientists have–and others need–from and for all of these sequencing projects that are flowing in daily at this point.

Using the current infrastructure of GMOD (Generic Model Organism Database) and the large community of users of the resources like the numerous GBrowses that are already out there, we’ve got access to a lot of organism-specific community-based information (even yak, butterflies, and trees among them; will these all continue to be supported individually?). But some of these species coming along lack the community size or resourcing that the big ones have. And the way we are doing things now just doesn’t scale.

I know there have been multiple attempts to adopt the Wikipedia-style model of community curation, with varying success. Personally, I still want a group of professional curators involved, but if we can supplement their work with additional information from the wider community, that would be great. And if the professionals can help to seed and maintain information with new tools and strategies, that would help encourage volunteer curation too. In this talk you’ll hear more about these issues and how Su’s collaborators have approached them so far, with the reasons for the directions they chose and their experiences with Gene Wiki curation.

The video misses a bit of the intro and the questions at the end, but there’s plenty to chew on still. And you can follow along with the slide deck too (I put that in below, but you can also go directly there).

The missing piece, though, is still the one described in the biocuration article that Andrew cited in the talk:

To date, not much of the research community is rolling up its sleeves to annotate. What will be the tipping point? The main limitation in community annotation is the perceived lack of incentive.

I think some of the altmetrics strategies could come to support this part of the problem, but I still haven’t seen the real answer to this barrier yet.

For a paper on the framework of the Gene Wiki project: Good B.M., Clarke E.L., de Alfaro L. & Su A.I. (2011). The Gene Wiki in 2011: community intelligence applied to human gene annotation, Nucleic Acids Research, 40 (D1) D1255-D1261. DOI:
More references from the Su team’s projects:
Issues around biocuration in general: Howe D., Costanzo M., Fey P., Gojobori T., Hannick L., Hide W., Hill D.P., Kania R., Schaeffer M., St Pierre S. et al. (2008). Big data: The future of biocuration, Nature, 455 (7209) 47-50. DOI:

Video Tip of the Week: MetaPhlAn and Galaxy

CPB Using Galaxy 2 from Galaxy Project on Vimeo.

for loading and using datatypes, and the OpenHelix Galaxy tutorial for getting familiar with the Galaxy interface and usage.

Metagenomics analysis can be a bit daunting at times, but there are a good number of tools out there to assist a researcher. Integrated Microbial Genomes (IMG) at JGI has some excellent tools, such as IMG/M and IMG HMP M (see the OpenHelix tutorial). There are other excellent tools that I suggest you check out as well; QIIME is another good one.

But the video above is not a metagenomics tutorial per se. Rather, it’s a short screencast on how to use the Galaxy interface to load data and datatypes. Why? Because another excellent set of tools for metagenomic analysis is MetaPhlAn from the Huttenhower lab at Harvard.

The MetaPhlAn tools can be downloaded and used ‘offline’, but they also have an excellent Galaxy interface. Once you are familiar with Galaxy from the video above, walking through the MetaPhlAn tutorials on their site (including their Galaxy module) should help you get started on some excellent metagenomics analysis.
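The MetaPhlAn paper (referenced below) describes profiling communities with unique clade-specific marker genes. Below is a rough, hypothetical sketch of that idea; it is not MetaPhlAn's code and is simpler than the estimator described in the paper. The approach:

* normalize each marker's read count by the marker's length;
* average the normalized values per clade;
* scale so the clades sum to 1.

Marker IDs, clades, lengths, and counts are made up.

```python
from collections import defaultdict


def clade_abundances(marker_hits, marker_info):
    """Estimate relative clade abundances from marker-gene read counts.

    marker_hits: marker id -> number of reads mapped to that marker.
    marker_info: marker id -> (clade, marker length in bp).
    Markers with no hits count as zero coverage for their clade.
    """
    per_clade = defaultdict(list)
    for marker, (clade, length) in marker_info.items():
        # Length-normalize so long markers do not dominate
        per_clade[clade].append(marker_hits.get(marker, 0) / length)
    # Plain mean per clade (a simplification of a more robust average)
    raw = {clade: sum(vals) / len(vals) for clade, vals in per_clade.items()}
    total = sum(raw.values())
    return {clade: value / total for clade, value in raw.items()} if total else {}


if __name__ == "__main__":
    info = {"m1": ("clade_X", 1000), "m2": ("clade_X", 500), "m3": ("clade_Y", 1000)}
    hits = {"m1": 10, "m2": 5, "m3": 30}
    print(clade_abundances(hits, info))
```

Because each marker is unique to one clade, reads that hit it can be attributed to that clade without classifying every read in the sample. That is a large part of why the marker-based approach is fast.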

To get a feel for these and other tools and workflows, you might want to browse through this excellent slide set from last year by Surya Saha, Research Associate at Cornell University.

Quick Links:


Segata N., Waldron L., Ballarini A., Narasimhan V., Jousson O. & Huttenhower C. (2012). Metagenomic microbial community profiling using unique clade-specific marker genes, Nature Methods, 9, 811-814. DOI: 10.1038/nmeth.2066