Tag Archives: genome variation


Friday SNPpets

This week’s SNPpets offered a lot of good stuff, everybody must be out of their summer vacation mode and back to the lab. There’s a BLASTX alternative, a helpful tip in workshop teaching, quack DTC tests, handy errors (really), new variant database DIVAS, exome sequencing patients and outcomes, imputation, 24M novel rare variants, variant caller comparison, the long term evolution experiment hits a milestone, new citrus resources, eQTLs, and more. The best item this week, though, was a story of a family with a rare disease situation that found genomics researchers via Reddit, and that gave them answers and hope.

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Friday SNPpets

This week’s SNPpets include an extensive list of commercial bioinformatics tools that’s been crowd-sourced, new resources for variation ranking and transposon detection, the 10K genome project path, and the chatter from the Future of Genomic Medicine conference–including that big story on cancers being found in pregnant women via non-invasive prenatal testing.

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Video Tip of the Week: The New Database of Genomic Variants – DGV2 (edited)

In today’s tip I will briefly introduce you to the beta version of the updated DGV resource. The Database of Genomic Variants, or DGV, was created in 2004 at a time early in the understanding of human structural variation, or SV, which is defined by DGV as genomic variation larger than 50bp. DGV has historically provided public access to SV data in humans who are non-diseased. In the past it both accepted direct data submissions on SV and also provided high quality curation and analysis of the data such that it was appropriate for use in biomedical studies.

We’ve had an introductory tutorial on using DGV for years, and we’ve posted on changes at DGV in the past, so we were quite interested to read in their recent newsletter that there is a newly updated beta version of the DGV resource. The increase in SV data being generated by many large-scale sequencing projects as well as individual labs, has made it difficult for the DGV to continue to collect SV data, to provide a stable and comprehensive data archive AND to manually curate it at the level they have in the past. Therefore the DGV team is now partnering with DGVa at EBI and dbVar at NCBI. DGVa and dbVar will accept SV data submissions, and will function as public data archives (PDA) and, according to the publication sited below, DGVa and dbVar will:

 “...provide stable and traceable identifiers and allow for a single point of access for data collections, facilitating download and meta-analysis across studies.

DGV will no longer accept data submissions, but will instead use accessioned SV data from the archives and focus on providing the scientific community and public at-large with a subset of the data. Again quoting from the paper referenced below:

The main role of DGV going forward will be to curate and visualize selected studies to facilitate interpretation of SV data, including implementing the highest-level quality standards required by the clinical and diagnostic communities.

The original DGV resource is still available while comments are collected on the updated beta site. For more information on the updated DGV I suggest you check out this documentation from the DGV team: From their FAQ – “What is the data model used for DGV2?” and from a link in their top navigation area – “DGV Beta User Tutorial“. Be sure to check out the new displays & data that’s available, and most importantly to send your comments & suggestions to the group so that they can design a resource best suited for your needs.

Quick Links:

Original Database of Genomic Variants: http://projects.tcag.ca/variation/

New beta version of the Updated DGV: http://dgvbeta.tcag.ca/dgv/app/home

Introductory OpenHelix on Original DGV: http://www.openhelix.com/cgi/tutorialInfo.cgi?id=88

DGV Beta User Tutorial from DGV: http://dgvbeta.tcag.ca/dgv/docs/20111019-DGV_Beta_User_Tutorial.pdf

Church, D., Lappalainen, I., Sneddon, T., Hinton, J., Maguire, M., Lopez, J., Garner, J., Paschall, J., DiCuccio, M., Yaschenko, E., Scherer, S., Feuk, L., & Flicek, P. (2010). Public data archives for genomic structural variation Nature Genetics, 42 (10), 813-814 DOI: 10.1038/ng1010-813
(Free access from PubMed Central here)

Edit, March 5, 2012 – I wanted to add a clarification that we recieved through our contact link. I am pasting it in full, with permission from Margie:

“Hi Jennifer
We at TCAG think you did a great job on your video blog of the New Database of Genomic Variants.
I wanted to make a correction to one of your statements: “The increase in SV data (…) at the level they have in the past.”
We, the DGV team, have built a system that CAN handle the new volumes and types of SV data now being published, and we are able to curate all of these data. The reason we partnered with DGVa and dbVar was primarily to provide stable, “universal” accessions for SV data. We also work with DGVa and dbVar to define standard terminology, data types, and data exchange formats.
I just wanted to make sure it was clear that we are fully capable to handle the SV data being published now. Our reason for partnership was to foster standardized data and open data sharing across systems.
Thanks again for your blog post!
Margie Manker”

Guest Post: WAVe – Pedro Lopes

This next post in our continuing semi-regular Guest Post series is from Pedro Lopez, developer of WAVe at the University of Aveiro Bioinformatic Group in Aveiro Portugal. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com or the contact form (write ‘guest post’ as subject heading). We welcome introductions to your resource, information on updates, highlights of little known gems or opinion pieces on the state of genomic research and databases.

I would like to start by thanking Trey Lathe  for the opportunity to promote WAVe in this great blog. After his short tip of the week post, I’ll now try to make a more detailed overview of this new application.

What is WAVe?

WAVe stands for Web Analysis of the Variome and is a simple application focused on centralizing the access to distributed and heterogeneous locus-specific databases (LSDB). LSDBs are an emerging type of bioinformatics applications, aiming at providing gene-centric information regarding discovered genomic variants. In WAVe, we offer both LSDBs as well as to its variants. Moreover, we also provide access to a comprehensive list of carefully selected external resources. With this, users have, in a single application, access to gene and variation information enriched with a multitude of gene-related resources in a lightweight and easy to use web application.

What are WAVe’s key features?

At this early stage, WAVe’s publicly available features are related with data access. Users can easily browse through available genes, search for genes, view gene info and access each gene RSS feed. In WAVe’s entry page, users simply need to start typing a gene HGNC-approved symbol and several suggestions will appear: accepting one of them leads directly to the gene view page. Following theview alllink, users can browse all available genes or check, for each gene, how many LSDBs and variants are available.

To access the application data, users just need to navigate in the gene tree. Each tree node represents a distinct data type and the various leaf provide access to external applications: by clicking a leaf, the destination page is loaded in the main content area. Repeating this process, users can navigate in the dozens of listed links for each gene.

WAVe also offers its core data to other developers. To obtain the gene tree and its links, users just need to add the rss tag to the end of gene address. This will output a RSS2.0 feed that can be easily parsed by any application or added to a feed reader.

How was WAVe born?

The european GEN2PHEN project is an initiative to link, as deeply as possible, data from genotype features to its phenotype counterparts. The first step consisted in an attempt to improve various genomic variation resource scenarios. This implied normalizing LSDBs (the “LSDB-in-a-box” approach, LOVD) and defining novel data models and formats for data exchanges from and to LSDBs.

In a long term perspective, applying the GEN2PHEN-approved data models, will enhance the creation of new services and applications to integrate and interact with the exponentially growing dataset of genomic variation data.

With WAVe we tried a different approach based on three questions: why wait for everyone to adopt these new formats? What will happen to legacy LSDBs that won’t adopt the new formats? How can we have an immediate solution? We have created a lightweight integration architecture, based on links to applications and adopted a simple (yet familiar) tree-based navigation interaction to deploy a new application that can be used right now and will easily scale to integrate the foreseen data exchanges formats. Technical details aside, based on a manually curated LSDB list, we can connect and integrated any kind of LSDB application whether it is a modern LOVD application or a simple text-based legacy LSDB.

How is it relevant?

To demo WAVe efficiency let’s just try to perform a simple search in our lab: Are there any LSDBs for COL3A1 gene in the human species? And known variants? And what are the associated proteins and pathways?

In a WAVe-free scenario, to find out COL3A1 LSDBs (if any), researchers need to google it (the main COL3A1 LSDB does not appear in the first result page) or, if you they are used to it, go to HGVS site, go to the “Databases & Tools” section, select “Locus-specific Mutation Databases” and then search for the gene in search box. Now for the variants researchers just need to browse the last page they’ve just entered. How many clicks (and time!) does it take?

For protein information, researchers enter in UniProt and search for COL3A1: that gives about 29 results. Add a filter for the human species and there are 5 results. Good enough to access directly to P02461 (SwissProt reviewed). Though, there is new window/tab open. Now for pathway information, a KEGG quick search for COL3A1 lists 14 results. In the end, there are about 3 windows/tabs and made some 20 mouse clicks to obtain the desired information.

Using WAVe, researchers simply need to access WAVe, start typing the gene HGNC symbol, select COL3A1 from the suggestions and access COL3A1 page. Once in the page, it’s as easy as browsing in the tree… Variations? Check the variation node, they’re even grouped according to the change type. UniProt information? Check the protein node where you have direct access to SwissProt, TrEMBL, PDB, Expasy and InterPro. And I guess you get the picture. In the end, one window/tab and about 6/7 mouse clicks.

Other UA.PT Bioinformatics tools

At the University of Aveiro’s Bioinformatics research group we are mainly young and enthusiast computer science experts, simply trying to make biology easier (at least in terms of computer applications!). Our more relevant web-based tools include MIND (a microarray analysis tool), GeneBrowser (a gene expression tools, useful to process data gathered from systems like MIND) and QuExT (a comprehensive MEDLINE mining application).

-Pedro Lopes

Tip of the Week: Genomic Variation Tour I

Today’s tip of the week is actually the first in a series of tips I will be doing over the next couple months. The recent paper in Lancet did a clinical assessment of an individual genome. In doing so, the researchers used various genomic resources do ascertain and interpret the data. We have a free tutorial on NIEHS SNPs that walks through some of these resources, but I thought it might be useful to follow one specific nucleotide variation through a lot more genomic databases to show the user what data is available and how to access it. Each tip I do over the next couples months (not every week, I do tips every 2-3 weeks) will follow a specific SNP through the databases. In this case, rs108622 in the CYP4F2 gene (cytochrome P450, family 4). These tips aren’t for the genome jockey’s and SNP surfers among us, they are more an introductory tour of what’s out there. They will be useful for those just starting to look at genomic variations, medical practitioners, clinicians or those just curious what is available. Today’s tip will start with the UCSC Genome Browser, find the variation and follow it through to dbSNP. Next tip will look closer at the dbSNP information and then follow the trail to OMIM and GeneTests. In later tips we’ll take the variation to another 4-6 different databases and genomic variation resources from HapMap and others. In the posts themselves I’ll link to even other variation databases. There is a plethora of them.