Tag Archives: dbVar

Video Tip of the Week: The New Database of Genomic Variants – DGV2 (edited)

In today’s tip I will briefly introduce you to the beta version of the updated DGV resource. The Database of Genomic Variants, or DGV, was created in 2004 at a time early in the understanding of human structural variation, or SV, which is defined by DGV as genomic variation larger than 50bp. DGV has historically provided public access to SV data in humans who are non-diseased. In the past it both accepted direct data submissions on SV and also provided high quality curation and analysis of the data such that it was appropriate for use in biomedical studies.

We’ve had an introductory tutorial on using DGV for years, and we’ve posted on changes at DGV in the past, so we were quite interested to read in their recent newsletter that there is a newly updated beta version of the DGV resource. The increase in SV data being generated by many large-scale sequencing projects as well as individual labs has made it difficult for the DGV to continue to collect SV data, to provide a stable and comprehensive data archive AND to manually curate it at the level they have in the past. Therefore the DGV team is now partnering with DGVa at EBI and dbVar at NCBI. DGVa and dbVar will accept SV data submissions and will function as public data archives (PDA). According to the publication cited below, DGVa and dbVar will:

“...provide stable and traceable identifiers and allow for a single point of access for data collections, facilitating download and meta-analysis across studies.”

DGV will no longer accept data submissions, but will instead use accessioned SV data from the archives and focus on providing the scientific community and public at-large with a subset of the data. Again quoting from the paper referenced below:

“The main role of DGV going forward will be to curate and visualize selected studies to facilitate interpretation of SV data, including implementing the highest-level quality standards required by the clinical and diagnostic communities.”

The original DGV resource is still available while comments are collected on the updated beta site. For more information on the updated DGV I suggest you check out this documentation from the DGV team: from their FAQ, “What is the data model used for DGV2?”, and from a link in their top navigation area, “DGV Beta User Tutorial”. Be sure to check out the new displays & data that are available, and most importantly to send your comments & suggestions to the group so that they can design a resource best suited for your needs.

Quick Links:

Original Database of Genomic Variants: http://projects.tcag.ca/variation/

New beta version of the Updated DGV: http://dgvbeta.tcag.ca/dgv/app/home

Introductory OpenHelix on Original DGV: http://www.openhelix.com/cgi/tutorialInfo.cgi?id=88

DGV Beta User Tutorial from DGV: http://dgvbeta.tcag.ca/dgv/docs/20111019-DGV_Beta_User_Tutorial.pdf

Church, D., Lappalainen, I., Sneddon, T., Hinton, J., Maguire, M., Lopez, J., Garner, J., Paschall, J., DiCuccio, M., Yaschenko, E., Scherer, S., Feuk, L., & Flicek, P. (2010). Public data archives for genomic structural variation. Nature Genetics, 42 (10), 813-814. DOI: 10.1038/ng1010-813
(Free access from PubMed Central here)

Edit, March 5, 2012 – I wanted to add a clarification that we received through our contact link. I am pasting it in full, with permission from Margie:

“Hi Jennifer
We at TCAG think you did a great job on your video blog of the New Database of Genomic Variants.
I wanted to make a correction to one of your statements: “The increase in SV data (…) at the level they have in the past.”
We, the DGV team, have built a system that CAN handle the new volumes and types of SV data now being published, and we are able to curate all of these data. The reason we partnered with DGVa and dbVar was primarily to provide stable, “universal” accessions for SV data. We also work with DGVa and dbVar to define standard terminology, data types, and data exchange formats.
I just wanted to make sure it was clear that we are fully capable to handle the SV data being published now. Our reason for partnership was to foster standardized data and open data sharing across systems.
Thanks again for your blog post!
Margie Manker”

Tip of the Week: CHOP CNV database

One of the hottest searches we see all the time is for more information on CNVs, or copy number variations.  These intriguing structural variants in our genomes explain a lot of the reason that SNP hunting for complex diseases like schizophrenia and autism wasn’t able to elucidate the problems as most people expected.  These spectrum sorts of conditions were just not going to turn out as straightforward as the sickle-cell variation or the cystic fibrosis stories.

Resources to catalog and look at CNVs have developed.  We have had a tutorial on DGV, the Database of Genomic Variants, for some time (subscription required for tutorial).  Just the other day I was looking around at the NCBI tool called dbVar, which has a nice diagrammatic overview of the kinds of structural variations CNVs represent (but I’m not sure I understand how to use the database yet–I’ll keep you posted :) ). Now there is also CHOP CNV.

Today I’ll be introducing you to the CHOP CNV resource.  I heard about it at ASHG a couple of weeks ago, and decided to look into it.  I had remembered hearing about the tool at one of the trainings we did at CHOP, but I wasn’t sure it was publicly available.  Now I’m sure it is!

The publication associated with the CHOP CNV resource provides an overview of the strategy. The authors highlight the reason they developed this one–to use a uniform technology (Illumina chips to start, and then subsequent validation with other techniques) and to have a large sample set.  They examine the genomes of over 2000 healthy individuals.  The point of looking at healthy folks is that they essentially form the reference set: you can now take the samples from affected patients and subtract the things that healthy folks appear to share.  This helps to narrow down your search for CNVs that might cause disease conditions.  They offer various statistics on the types and sizes of the structural variants observed in the healthy population.  It reminded me of another talk I heard at ASHG called “The first map of dispensable regions in the human genome” by Terry Vrijenhoek et al.–which was a cool talk that began with a Facebook chat that had us all giggling–but the serious message was that there’s a lot of missing genome that healthy people appear to tolerate just fine….

The paper goes on to describe the creation of their web interface.  Although I couldn’t find it mentioned in the paper, I asked one of the authors and my suspicion that it was based on GBrowse was confirmed–I thought the tracks and controls appeared “GBrowsy” to me.  It shows the variations on the graphical display.  The deletions are red, the duplications are blue.  There is also a table that contains the data which you can color code to indicate uniqueness with green.  And the table provides a column that summarizes the genes in that region (if there are some), and links to the UCSC Genome Browser in that region so you can choose to go there and examine the other genomic features in that region.  When you have that loaded at UCSC, the data becomes a custom track that you can then examine with all the UCSC tools, including detailed queries with the table browser.  It’s a nice example of a big data set from a publication getting displayed at UCSC for further query options.
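If you want to try the same kind of display with your own calls, here is a minimal Python sketch that writes a UCSC custom track in BED format, with deletions in red and duplications in blue to mirror the colors described above. I don't know exactly what CHOP CNV sends to UCSC, so the function name `to_bed_track`, the track name, and the sample coordinates are all made up for illustration.

```python
# Hypothetical example: build a 9-column BED custom track with per-item colors.
# The color mapping mirrors the display described in the post
# (deletions red, duplications blue); everything else is an assumption.
COLORS = {"deletion": "255,0,0", "duplication": "0,0,255"}


def to_bed_track(cnvs, name="example_cnvs"):
    """Return custom-track text for (chrom, start, end, kind) tuples."""
    # itemRgb="On" tells the browser to use the color in column 9.
    lines = [f'track name="{name}" description="Example CNV calls" itemRgb="On"']
    for i, (chrom, start, end, kind) in enumerate(cnvs, 1):
        fields = [
            chrom, str(start), str(end),   # position (BED starts are 0-based)
            f"{kind}_{i}",                 # item name
            "0", ".",                      # score and strand (not used here)
            str(start), str(end),          # thickStart / thickEnd
            COLORS[kind],                  # itemRgb
        ]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Made-up coordinates, purely for illustration.
track_text = to_bed_track([
    ("chr1", 1000, 5000, "deletion"),
    ("chr2", 200, 900, "duplication"),
])
print(track_text)
```

The resulting text can be pasted into the custom track form at UCSC, after which the items should be available to the other UCSC tools like any other custom track.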

Another nice feature of the tabular display is that it also links to FABLE.  FABLE (Fast Automated Biomedical Literature Extraction) is a literature mining tool that searches for papers relating to the genes you find in that region–so you can quickly assess what’s known about a given gene in a CNV region.

They also include a compelling “application” as a way to illustrate how you can use the CHOP CNV resource to make discoveries.  There was a clinical sample of a patient with a number of congenital anomalies.  The CNV detection of the genomic sample indicated that 32 of the 35 variations this patient had existed in the healthy controls–which means that targeting the remaining 3 for further study provides a much more helpful focus on the likely issues.  There were a couple of other examples of utility as well.
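The "subtract what healthy folks share" idea behind that example can be sketched in a few lines of Python. This is only my illustration, not the authors' method: the 50% reciprocal-overlap rule for deciding that two CNVs are "the same" is an assumption on my part, and the names `CNV`, `reciprocal_overlap`, and `novel_cnvs` and all the coordinates are hypothetical.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "deletion" or "duplication"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the CNVs cannot match."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def novel_cnvs(patient, controls, min_overlap=0.5):
    """Keep patient CNVs that no control CNV matches (assumed 50% threshold)."""
    return [
        p for p in patient
        if not any(reciprocal_overlap(p, c) >= min_overlap for c in controls)
    ]


# Made-up data: one patient CNV is also seen in controls, two are not.
controls = [
    CNV("chr1", 1000, 5000, "deletion"),
    CNV("chr2", 200, 900, "duplication"),
]
patient = [
    CNV("chr1", 1200, 5100, "deletion"),    # closely matches a control deletion
    CNV("chr2", 200, 900, "deletion"),      # same region, different type
    CNV("chr7", 50, 400, "duplication"),    # not seen in controls
]
candidates = novel_cnvs(patient, controls)
print(candidates)
```

With thousands of control CNVs, the nested loop would get slow and an interval index would be a better fit, but the filtering logic stays the same.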

When I asked the CHOP CNV team some questions about their Figure 1 in the paper (it showed what appeared to be lab group names with data sets), I was told that new versions will be coming that will offer some new features–including an option to upload your own samples to compare them to their data set.

If you are interested in structural variations in the genome you should check out the CHOP CNV database.  You might find some helpful information for your project!  I almost forgot to note–you can download all the data as well, and use it with other data you may have or for other analysis tools.

Direct to the site: http://cnv.chop.edu/

Shaikh, T., Gai, X., Perin, J., Glessner, J., Xie, H., Murphy, K., O’Hara, R., Casalunovo, T., Conlin, L., D’Arcy, M., Frackelton, E., Geiger, E., Haldeman-Englert, C., Imielinski, M., Kim, C., Medne, L., Annaiah, K., Bradfield, J., Dabaghyan, E., Eckert, A., Onyiah, C., Ostapenko, S., Otieno, F., Santa, E., Shaner, J., Skraban, R., Smith, R., Elia, J., Goldmuntz, E., Spinner, N., Zackai, E., Chiavacci, R., Grundmeier, R., Rappaport, E., Grant, S., White, P., & Hakonarson, H. (2009). High-resolution mapping and analysis of copy number variations in the human genome: A data resource for clinical and research applications. Genome Research, 19 (9), 1682-1690. DOI: 10.1101/gr.083501.108