Guest Post: WAVe – Pedro Lopes

This next post in our continuing semi-regular Guest Post series is from Pedro Lopes, developer of WAVe at the University of Aveiro Bioinformatics Group in Aveiro, Portugal.

I would like to start by thanking Trey Lathe for the opportunity to promote WAVe on this great blog. After his short tip of the week post, I’ll now try to give a more detailed overview of this new application.

What is WAVe?

WAVe stands for Web Analysis of the Variome and is a simple application focused on centralizing access to distributed and heterogeneous locus-specific databases (LSDBs). LSDBs are an emerging type of bioinformatics application that aims to provide gene-centric information about discovered genomic variants. In WAVe, we offer access both to LSDBs and to their variants. Moreover, we also provide access to a comprehensive list of carefully selected external resources. With this, users have, in a single lightweight and easy-to-use web application, access to gene and variation information enriched with a multitude of gene-related resources.

What are WAVe’s key features?

At this early stage, WAVe’s publicly available features are related to data access. Users can easily browse through available genes, search for genes, view gene info and access each gene’s RSS feed. On WAVe’s entry page, users simply need to start typing a gene’s HGNC-approved symbol and several suggestions will appear: accepting one of them leads directly to the gene view page. Following the “view all” link, users can browse all available genes or check, for each gene, how many LSDBs and variants are available.

To access the application data, users just need to navigate the gene tree. Each tree node represents a distinct data type and the various leaves provide access to external applications: when a leaf is clicked, the destination page is loaded in the main content area. Repeating this process, users can navigate through the dozens of links listed for each gene.

WAVe also offers its core data to other developers. To obtain the gene tree and its links, users just need to add the rss tag to the end of the gene address. This will output an RSS 2.0 feed that can be easily parsed by any application or added to a feed reader.
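As a minimal sketch of how another application might consume such a feed, the Python snippet below parses an RSS 2.0 document with the standard library. The sample feed, its item titles and its example.org links are invented for illustration; WAVe’s actual feed layout and addresses are not shown in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document standing in for a gene feed;
# the titles and links are placeholders, not real WAVe output.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>COL3A1</title>
    <link>http://example.org/gene/COL3A1</link>
    <description>Example gene feed</description>
    <item>
      <title>LSDB entry</title>
      <link>http://example.org/lsdb/COL3A1</link>
    </item>
    <item>
      <title>Protein entry</title>
      <link>http://example.org/protein/P02461</link>
    </item>
  </channel>
</rss>
"""


def parse_feed(xml_text):
    """Return (channel title, [(item title, item link), ...]) from RSS 2.0 text."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    title = channel.findtext("title", default="")
    items = [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in channel.findall("item")
    ]
    return title, items


gene, links = parse_feed(SAMPLE_FEED)
print(gene)
for name, url in links:
    print(f"{name}: {url}")
```

In a real client the XML text would come from the gene’s feed address rather than from an inline string.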

How was WAVe born?

The European GEN2PHEN project is an initiative to link, as deeply as possible, data from genotype features to its phenotype counterparts. The first step consisted of an attempt to improve various genomic variation resource scenarios. This implied normalizing LSDBs (the “LSDB-in-a-box” approach, LOVD) and defining novel data models and formats for data exchange to and from LSDBs.

In the long term, applying the GEN2PHEN-approved data models will enhance the creation of new services and applications that integrate and interact with the exponentially growing body of genomic variation data.

With WAVe we tried a different approach based on three questions: Why wait for everyone to adopt these new formats? What will happen to legacy LSDBs that won’t adopt the new formats? How can we have an immediate solution? We have created a lightweight integration architecture based on links to applications, and adopted a simple (yet familiar) tree-based navigation to deploy a new application that can be used right now and will easily scale to integrate the foreseen data exchange formats. Technical details aside, based on a manually curated LSDB list, we can connect and integrate any kind of LSDB application, whether it is a modern LOVD application or a simple text-based legacy LSDB.

How is it relevant?

To demo WAVe’s efficiency, let’s just try to perform a simple search in our lab: Are there any LSDBs for the COL3A1 gene in the human species? And known variants? And what are the associated proteins and pathways?

In a WAVe-free scenario, to find COL3A1 LSDBs (if any), researchers need to google the gene (the main COL3A1 LSDB does not appear on the first results page) or, if they are used to it, go to the HGVS site, open the “Databases & Tools” section, select “Locus-specific Mutation Databases” and then search for the gene in the search box. For the variants, researchers just need to browse the page they have just reached. How many clicks (and how much time!) does it take?

For protein information, researchers go to UniProt and search for COL3A1, which gives about 29 results. Adding a filter for the human species leaves 5 results, good enough to go directly to P02461 (SwissProt reviewed). However, there is now a new window/tab open. For pathway information, a KEGG quick search for COL3A1 lists 14 results. In the end, researchers have about 3 windows/tabs open and have made some 20 mouse clicks to obtain the desired information.

Using WAVe, researchers simply need to access WAVe, start typing the gene’s HGNC symbol, select COL3A1 from the suggestions and open the COL3A1 page. Once on the page, it’s as easy as browsing the tree… Variations? Check the variation node; they’re even grouped according to the change type. UniProt information? Check the protein node, where you have direct access to SwissProt, TrEMBL, PDB, Expasy and InterPro. And I guess you get the picture. In the end, one window/tab and about 6 or 7 mouse clicks.

Other UA.PT Bioinformatics tools

At the University of Aveiro’s Bioinformatics research group we are mainly young and enthusiastic computer science experts, simply trying to make biology easier (at least in terms of computer applications!). Our most relevant web-based tools include MIND (a microarray analysis tool), GeneBrowser (a gene expression tool, useful for processing data gathered from systems like MIND) and QuExT (a comprehensive MEDLINE mining application).

-Pedro Lopes

Guest Post: New features at CTD – Allan Peter Davis

This next post in our continuing semi-regular Guest Post series is from Allan Peter Davis, of the Comparative Toxicogenomics Database (CTD) at Mount Desert Island Biological Laboratory (MDIBL).

The Comparative Toxicogenomics Database (CTD) is a free, public resource that promotes understanding about the effects of environmental chemicals on human health.  Since Trey’s original Tip of the Week about CTD, we’ve added many new features we’d like to highlight.

* The redesigned CTD homepage makes navigation easier and more intuitive.  Check out the keyword quick search box on every page, and try the “All” setting to see the scope of information available at CTD.

* A new Data Status page uses tag clouds to display the updated content for that month.

* We are particularly pleased to announce new statistical analyses of CTD data.  Chemical pages now feature enriched Gene Ontology (GO) terms, garnered from the genes that interact with a chemical.  In this release, CTD connects over 5,000 enriched GO terms to more than 4,500 chemicals.  In addition, our inferred chemical-disease relationships are now statistically scored and ranked.  Both new features will help users explore and generate testable hypotheses about the biological effects of chemicals.

* GeneComps and ChemComps discover genes or chemicals with a similar toxicogenomic profile to your molecule of interest.  Learn more about this feature in our recent publication.

* Reactome data are now also included with KEGG, for a more comprehensive view of pathways affected by chemicals.

* VennViewer and MyGeneVenn are new tools that compare datasets for chemicals, diseases, or genes (including your own gene list) using Venn diagrams to discover shared and unique information.  These two visualization tools are a nice accompaniment to our original Batch Query tool for meta-analysis.

* The FAQ section under the “Help” menu provides examples of how to maximize your experience with CTD.

* Download our Resource Guide (pdf link) to keep as a handy reference card for CTD.
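The Venn-style comparison behind tools such as VennViewer and MyGeneVenn can be pictured with plain set operations. The sketch below is only an illustration in Python, not CTD’s implementation; the function name and the gene lists are made up for the example.

```python
def venn_partition(set_a, set_b):
    """Split two collections into the three regions of a two-set Venn diagram."""
    a, b = set(set_a), set(set_b)
    return {
        "only_a": a - b,   # unique to the first list
        "shared": a & b,   # present in both lists
        "only_b": b - a,   # unique to the second list
    }


# Hypothetical gene lists (placeholders, not curated CTD data).
my_genes = ["TP53", "CYP1A1", "GSTM1", "NQO1"]
reference_genes = ["CYP1A1", "NQO1", "AHR"]

regions = venn_partition(my_genes, reference_genes)
for name, members in regions.items():
    print(name, sorted(members))
```

The same idea extends to three or more lists by intersecting and subtracting the additional sets.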

From the homepage, you can also subscribe to our monthly email newsletter to keep current with CTD’s growing content and features.  You can always contact us to request curation of your favorite chemical or paper.  And with our new “Author Alert” email program, we’ll even contact you to let you know when we’ve curated data from one of your publications in CTD.

We strive to be the best possible resource of chemical-gene-disease networks for the biological community, so feedback and input from users are of great importance to us.

- Allan Peter Davis

Guest Post: CoGe, The Suite for Comparative Genomics – Eric Lyons

This next post in our continuing semi-regular Guest Post series is from Eric Lyons, of CoGe at the University of California, Berkeley.

Thanks both for the prior CoGe post (editor’s note: a tip of the week on CoGe) and the invitation to write a bit about CoGe.  Since most people are probably not familiar with CoGe, let me begin with how it is designed:

CoGe’s architecture and philosophy:  Solve a problem once

CoGe is a web-based platform for comparative genomics and consists of many interconnected web-based tools.  The entire system is hooked up to a database that can store any version of any genome in any state of assembly from any organism (currently ~9000 genomes from ~8000 organisms). Each of CoGe’s tools is designed to do one task (e.g. search and display information about a genome, compare two genomes and generate syntenic dotplots, search any number of genomes for similar sequence, manage a list of genes, etc.), and all are linked to one another. This means that there is no predefined analysis workflow. Instead, people can begin exploring a genome of interest, compare it to what they want, find something interesting, explore that, find something else, explore that, and so on. People anywhere in the world can perform computationally intense analyses by clicking a few buttons on a web page and letting our servers crunch away on whatever genomes we currently have loaded in our system.  Since each tool is web-based, links are used to move from tool to tool, which creates an easy way to save an analysis for future work or to send it to a colleague. This also has the benefit that as we develop new tools to solve a specific problem, we can generalize the solution, plug it into CoGe’s database and connect it to the pre-existing tool set. Overall, this gives us an easy way to expand CoGe’s functionality.

Guest Post: CHOP’s new tool, CNV Workshop – Xiaowu Gai

This next post in our continuing semi-regular Guest Post series is from Xiaowu Gai, the Bioinformatics Core Director at CHOP.

Thanks to Mary for running a Tip of the Week – “CHOP CNV database” a couple of months back. The CHOP CNV database is a high-resolution genome-wide survey of copy number variations in a large number (2,026) of apparently healthy individuals. It is publicly accessible and has been widely used by research groups worldwide. I am now pleased to announce the public release of the software system behind it: CNV Workshop. CNV Workshop is a suite of software tools that we have developed over the last few years. It provides a comprehensive workflow for analyzing, managing, and visualizing genome copy number variation (CNV) data.

It can be used for almost any CNV research or clinical project by offering the following capabilities for both individual samples and cohort studies:

CNV identification
* Implements a modified circular binary segmentation algorithm that reduces false positives
* Fully configurable parameters for sensitivity/specificity management
* Individual locus-specific annotations such as position, type of variation, call metrics, and overlap with CNVs of other data sets, including the Database of Genomic Variants
* Functional gene annotations such as genes affected and known disease associations
* Accepts user-provided annotations
* GBrowse-enabled visuals for querying, browsing, interpreting, and reporting CNVs
* Export of results into Excel, XML, CSV, and BED files
* Direct links to public resources such as the UCSC Genome Browser, NCBI Entrez, Entrez Gene, and FABLE
Project and Account Management
* Authentication and permission scheme that is especially useful for clinical diagnostic settings
* Analysis result sharing within and between projects
* Simple Web-based administrative interface
* Remote access and administration enabled

CNV Workshop currently accepts genotyping array data from Illumina’s 550k, 610- and 660-Quad, and Omni arrays, along with Affymetrix’s 5.0 and 6.0 arrays, and can be easily configured to accept data from other platforms. The package comes preloaded with publicly available reference data from more than 2,000 healthy control subjects (the CHOP CNV Database). CNV Workshop also allows the user to upload already processed CNV calls for annotation and presentation.
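To make the overlap annotation and BED export mentioned above more concrete, here is a minimal Python sketch that counts how many reference CNVs each uploaded call overlaps and writes the result as BED lines. It is an illustration under stated assumptions, not CNV Workshop’s code: the Cnv type, the function names and all coordinates are invented, and intervals are taken to be 0-based and half-open, as in the BED format.

```python
from typing import NamedTuple


class Cnv(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # e.g. "gain" or "loss"


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate(calls, reference):
    """Pair each call with the number of reference CNVs it overlaps."""
    return [(c, sum(overlaps(c, r) for r in reference)) for c in calls]


def to_bed(annotated):
    """BED lines: chrom, start, end, name (CNV type), score (overlap count)."""
    return "\n".join(
        f"{c.chrom}\t{c.start}\t{c.end}\t{c.kind}\t{n}" for c, n in annotated
    )


# Hypothetical calls and reference set (made-up coordinates).
calls = [Cnv("chr1", 1000, 5000, "loss"), Cnv("chr2", 200, 900, "gain")]
reference = [
    Cnv("chr1", 4000, 8000, "loss"),
    Cnv("chr1", 100, 1000, "gain"),   # ends where the first call starts: no overlap
    Cnv("chr3", 0, 10000, "gain"),
]

annotated = annotate(calls, reference)
bed_text = to_bed(annotated)
print(bed_text)
```

The pairwise scan is fine for a handful of calls; a production tool working against thousands of reference CNVs would more likely use an interval index.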

The software package is freely available. It is also described in more detail in our recent paper in BMC Bioinformatics.

-Xiaowu Gai

Guest Post: New at VISTA – Inna Dubchak

Our first guest post in our new semi-regular Guest Post series is from Inna Dubchak, principal investigator at the LBNL/JGI group, developers of the VISTA comparative genomics resource (who sponsor a tutorial, free to the users).

I would like to give you a heads up on some new VISTA updates and ongoing development!

Updates: As you probably know from this blog, a new, still free VISTA tutorial is available now. We have introduced a lot of updates to these tools - built new programs, improved the existing ones, and entirely changed the design of the site to make it more up-to-date and convenient.

The main addition to the site – VISTA Point – combines the capabilities of three tools currently available at the site (VISTA Gateway, VISTA Browser, and Text Browser), which are usually used step by step. VISTA Point makes analyzing multiple and pairwise genome alignments and extracting relevant numerical data much more straightforward, and it is easy to update, expand and extend with new programs.

Soon: We are actively working on visualizing synteny at scales ranging from whole-genome alignment to the conservation of individual genes, with seamless navigation across different levels of resolution. In our upcoming VISTA-Dot tool we used the concept of two-dimensional “dot-plots”, historically employed in the analysis of local alignment, and an interactive Google-map-like interface to visualize whole-genome alignments. You will be able to get a display and analyze large-scale duplication in plants in one click! It can also be useful in genome assembly and finishing. Another addition coming in the near future, VISTA Synteny Viewer, presents a novel interface as three cross-navigable panels representing different scales of the alignment.
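For readers unfamiliar with dot-plots, the idea can be sketched in a few lines of Python: every position pair where the two sequences share an identical word becomes a dot, and runs of dots along a diagonal indicate a conserved stretch. This is a toy, exact-match illustration with invented sequences and hypothetical dot_plot and render helpers; it says nothing about how VISTA-Dot itself is implemented.

```python
from collections import defaultdict


def dot_plot(seq_a, seq_b, k=4):
    """Return sorted (i, j) pairs where seq_a[i:i+k] == seq_b[j:j+k]."""
    # Index every k-letter word of seq_b by its start position.
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    # Look up each word of seq_a in that index.
    dots = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            dots.append((i, j))
    return sorted(dots)


def render(dots, rows, cols):
    """Draw the dots as text: rows follow seq_a, columns follow seq_b."""
    marked = set(dots)
    return "\n".join(
        "".join("*" if (i, j) in marked else "." for j in range(cols))
        for i in range(rows)
    )


# Made-up sequences for illustration only.
seq_a = "ACGTACGT"
seq_b = "TTACGTAA"
k = 4

dots = dot_plot(seq_a, seq_b, k)
print(render(dots, len(seq_a) - k + 1, len(seq_b) - k + 1))
```

Whole-genome dot-plots apply the same principle to alignments rather than exact words, which is what makes an interactive, zoomable display useful.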

Attention: do not forget to use our whole-genome capabilities – Whole-genome VISTA to align sequences of any quality, from draft to finished, up to 10MB long, and Whole Genome rVISTA to evaluate which transcription factor binding sites (TFBS) are over-represented in upstream regions in a group of genes.

-Inna Dubchak