This next post in our continuing semi-regular Guest Post series is from Pedro Lopez, developer of WAVe at the University of Aveiro Bioinformatic Group in Aveiro Portugal. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com or the contact form (write ‘guest post’ as subject heading). We welcome introductions to your resource, information on updates, highlights of little known gems or opinion pieces on the state of genomic research and databases.
I would like to start by thanking Trey Lathe for the opportunity to promote WAVe in this great blog. After his short tip of the week post, I’ll now try to make a more detailed overview of this new application.
What is WAVe?
WAVe stands for Web Analysis of the Variome and is a simple application focused on centralizing the access to distributed and heterogeneous locus-specific databases (LSDB). LSDBs are an emerging type of bioinformatics applications, aiming at providing gene-centric information regarding discovered genomic variants. In WAVe, we offer both LSDBs as well as to its variants. Moreover, we also provide access to a comprehensive list of carefully selected external resources. With this, users have, in a single application, access to gene and variation information enriched with a multitude of gene-related resources in a lightweight and easy to use web application.
What are WAVe’s key features?
At this early stage, WAVe’s publicly available features are related with data access. Users can easily browse through available genes, search for genes, view gene info and access each gene RSS feed. In WAVe’s entry page, users simply need to start typing a gene HGNC-approved symbol and several suggestions will appear: accepting one of them leads directly to the gene view page. Following the view all link, users can browse all available genes or check, for each gene, how many LSDBs and variants are available.
To access the application data, users just need to navigate in the gene tree. Each tree node represents a distinct data type and the various leaf provide access to external applications: by clicking a leaf, the destination page is loaded in the main content area. Repeating this process, users can navigate in the dozens of listed links for each gene.
WAVe also offers its core data to other developers. To obtain the gene tree and its links, users just need to add the rss tag to the end of gene address. This will output a RSS2.0 feed that can be easily parsed by any application or added to a feed reader.
How was WAVe born?
The european GEN2PHEN project is an initiative to link, as deeply as possible, data from genotype features to its phenotype counterparts. The first step consisted in an attempt to improve various genomic variation resource scenarios. This implied normalizing LSDBs (the “LSDB-in-a-box” approach, LOVD) and defining novel data models and formats for data exchanges from and to LSDBs.
In a long term perspective, applying the GEN2PHEN-approved data models, will enhance the creation of new services and applications to integrate and interact with the exponentially growing dataset of genomic variation data.
With WAVe we tried a different approach based on three questions: why wait for everyone to adopt these new formats? What will happen to legacy LSDBs that won’t adopt the new formats? How can we have an immediate solution? We have created a lightweight integration architecture, based on links to applications and adopted a simple (yet familiar) tree-based navigation interaction to deploy a new application that can be used right now and will easily scale to integrate the foreseen data exchanges formats. Technical details aside, based on a manually curated LSDB list, we can connect and integrated any kind of LSDB application whether it is a modern LOVD application or a simple text-based legacy LSDB.
How is it relevant?
To demo WAVe efficiency let’s just try to perform a simple search in our lab: Are there any LSDBs for COL3A1 gene in the human species? And known variants? And what are the associated proteins and pathways?
In a WAVe-free scenario, to find out COL3A1 LSDBs (if any), researchers need to google it (the main COL3A1 LSDB does not appear in the first result page) or, if you they are used to it, go to HGVS site, go to the “Databases & Tools” section, select “Locus-specific Mutation Databases” and then search for the gene in search box. Now for the variants researchers just need to browse the last page they’ve just entered. How many clicks (and time!) does it take?
For protein information, researchers enter in UniProt and search for COL3A1: that gives about 29 results. Add a filter for the human species and there are 5 results. Good enough to access directly to P02461 (SwissProt reviewed). Though, there is new window/tab open. Now for pathway information, a KEGG quick search for COL3A1 lists 14 results. In the end, there are about 3 windows/tabs and made some 20 mouse clicks to obtain the desired information.
Using WAVe, researchers simply need to access WAVe, start typing the gene HGNC symbol, select COL3A1 from the suggestions and access COL3A1 page. Once in the page, it’s as easy as browsing in the tree… Variations? Check the variation node, they’re even grouped according to the change type. UniProt information? Check the protein node where you have direct access to SwissProt, TrEMBL, PDB, Expasy and InterPro. And I guess you get the picture. In the end, one window/tab and about 6/7 mouse clicks.
Other UA.PT Bioinformatics tools
At the University of Aveiro’s Bioinformatics research group we are mainly young and enthusiast computer science experts, simply trying to make biology easier (at least in terms of computer applications!). Our more relevant web-based tools include MIND (a microarray analysis tool), GeneBrowser (a gene expression tools, useful to process data gathered from systems like MIND) and QuExT (a comprehensive MEDLINE mining application).