Guest Post: CHOP’s new tool, CNV Workshop – Xiaowu Gai

This next post in our continuing semi-regular Guest Post series is from Xiaowu Gai, the Bioinformatics Core Director at CHOP. If you provide a free, publicly available genomics tool, database or resource and would like to reach users through our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.

Thanks to Mary for running a Tip of the Week, “CHOP CNV database”, a couple of months back. The CHOP CNV database is a high-resolution, genome-wide survey of copy number variations in a large number (2,026) of apparently healthy individuals. It is publicly accessible and has been widely used by research groups worldwide. I am now pleased to announce the public release of the software system behind it: CNV Workshop. CNV Workshop is a suite of software tools that we have developed over the last few years. It provides a comprehensive workflow for analyzing, managing, and visualizing genome copy number variation (CNV) data.

It can be used for almost any CNV research or clinical project, offering the following capabilities for both individual samples and cohort studies:

CNV identification
Implements a modified circular binary segmentation algorithm that reduces false positives
Fully configurable parameters for sensitivity/specificity management
Individual locus-specific annotations such as position, type of variation, call metrics, and overlap with CNVs of other data sets, including the Database of Genomic Variants
Functional gene annotations such as genes affected and known disease associations
Accepts user-provided annotations
GBrowse-enabled visuals for querying, browsing, interpreting, and reporting CNVs
Export of results into Excel, XML, CSV, and BED files
Direct links to public resources such as the UCSC Genome Browser, NCBI Entrez, Entrez Gene, and FABLE
Project and Account Management
Authentication and permission scheme that is especially useful for clinical diagnostic settings
Analysis result sharing within and between projects
Simple Web-based administrative interface
Remote access and administration enabled
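The list above mentions a modified circular binary segmentation (CBS) algorithm. The modifications aren't described here, so what follows is only a minimal sketch of plain CBS, not CNV Workshop's code. It assumes per-probe log-ratio values along one chromosome, a known noise level `sigma`, and a fixed cutoff on the test statistic in place of the permutation-based significance test that the original CBS method (as implemented in the Bioconductor DNAcopy package) uses. The function and parameter names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical sketch of plain circular binary segmentation (CBS).
# NOT CNV Workshop's modified algorithm; sigma and threshold are assumed inputs.


def max_arc_statistic(x: List[float], sigma: float) -> Tuple[float, int, int]:
    """Return (max |T|, i, j) over arcs x[i:j], comparing the arc mean with the
    mean of the rest of the segment (the points outside the arc, viewed circularly)."""
    n = len(x)
    s = [0.0]  # prefix sums: s[k] = x[0] + ... + x[k-1]
    for v in x:
        s.append(s[-1] + v)
    best = (0.0, 0, n)
    for i in range(n):
        for j in range(i + 1, n + 1):
            inside = j - i
            outside = n - inside
            if outside == 0:
                continue  # the arc would cover the whole segment
            mean_in = (s[j] - s[i]) / inside
            mean_out = (s[n] - s[j] + s[i]) / outside
            t = (mean_in - mean_out) / (sigma * math.sqrt(1 / inside + 1 / outside))
            if abs(t) > best[0]:
                best = (abs(t), i, j)
    return best


def segment(x: List[float], sigma: float, threshold: float = 5.0,
            min_len: int = 3, offset: int = 0) -> List[Tuple[int, int]]:
    """Recursively split x; return half-open (start, end) index ranges."""
    if len(x) < 2 * min_len:
        return [(offset, offset + len(x))]
    t, i, j = max_arc_statistic(x, sigma)
    cuts = sorted({0, i, j, len(x)})  # one or two change points
    pieces = list(zip(cuts, cuts[1:]))
    # Stop if the best arc is not significant or would leave a too-short piece.
    if t < threshold or len(pieces) < 2 or any(b - a < min_len for a, b in pieces):
        return [(offset, offset + len(x))]
    result: List[Tuple[int, int]] = []
    for a, b in pieces:
        result.extend(segment(x[a:b], sigma, threshold, min_len, offset + a))
    return result
```

For example, a noise-free run of ten probes at -1.0 inside fifty probes at 0.0 (`sigma=0.1`) comes back as three segments: (0, 20), (20, 30), (30, 50). The exhaustive search over all arcs is O(n²) per call, which is fine for illustration but not for a full genotyping array.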

CNV Workshop currently accepts genotyping array data from Illumina’s 550k, 610- and 660-Quad, and Omni arrays, along with Affymetrix’s 5.0 and 6.0 arrays, and can be easily configured to accept data from other platforms. The package comes preloaded with publicly available reference data from more than 2,000 healthy control subjects (the CHOP CNV Database). CNV Workshop also allows the user to upload already processed CNV calls for annotation and presentation.

The software package is freely available at http://sourceforge.net/projects/cnv/. It is also described in more detail in our recent paper in BMC Bioinformatics.

-Xiaowu Gai

Tip of the Week: Fable, text mining for literature on human genes

A couple of weeks ago we brought you a tip of the week about the CHOP CNV Database. The same people who bring you that database also developed FABLE (Fast Automated Biomedical Literature Extraction), a literature mining tool. FABLE uses an advanced algorithm to find human genes that are directly related to the keywords you search on, and then finds literature on those genes. It has some great features and is a quick way to find the literature on a gene of interest. Today’s tip will give you a quick intro to the tool.

Tip of the Week: CHOP CNV database

One of the hottest searches we see all the time is for more information on CNVs, or copy number variations. These intriguing structural variants in our genomes help explain why SNP hunting for complex diseases like schizophrenia and autism didn’t pay off the way most people expected. These spectrum conditions were just never going to be as straightforward as the sickle-cell or cystic fibrosis stories.

Resources for cataloging and viewing CNVs have been developed. We have had a tutorial on DGV, the Database of Genomic Variants, for some time (subscription required for the tutorial). Just the other day I was looking around at the NCBI tool called dbVar, which has a nice diagrammatic overview of the kinds of structural variation that CNVs represent (but I’m not sure I understand how to use the database yet; I’ll keep you posted :) ). Now there is also CHOP CNV.

Today I’ll be introducing you to the CHOP CNV resource. I heard about it at ASHG a couple of weeks ago and decided to look into it. I remembered hearing about the tool at one of the trainings we did at CHOP, but I wasn’t sure it was publicly available. Now I’m sure it is!

The publication associated with the CHOP CNV resource provides an overview of the strategy. The authors highlight why they developed this one: to use a uniform technology (Illumina chips to start, with subsequent validation by other techniques) and to have a large sample set. They examine the genomes of over 2,000 healthy individuals. The point of looking at healthy folks is that they essentially form the reference set: you can take samples from affected patients and subtract the variants that healthy folks appear to share. This helps narrow your search for CNVs that might cause disease. The authors also offer various statistics on the types and sizes of the structural variants observed in the healthy population. It reminded me of another talk I heard at ASHG, “The first map of dispensable regions in the human genome” by Terry Vrijenhoek et al. It was a cool talk that began with a Facebook chat that had us all giggling, but the serious message was that healthy people appear to tolerate a lot of missing genome just fine.
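The “subtract what healthy folks share” idea can be sketched as a simple interval filter. This is a minimal illustration under our own assumptions, not how CHOP CNV or CNV Workshop actually compares samples: CNVs are plain (chromosome, start, end, type) records with 0-based, half-open coordinates, and a patient CNV counts as “seen in controls” if a control CNV of the same type overlaps it by at least 50% reciprocally, a threshold picked only for illustration. The `CNV` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of reference-set filtering; the 50% reciprocal-overlap
# rule is an illustrative assumption, not the CHOP CNV criterion.


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    kind: str   # "del" or "dup"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two overlap fractions (overlap / length of each call)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def novel_cnvs(patient: List[CNV], controls: List[CNV],
               min_overlap: float = 0.5) -> List[CNV]:
    """Keep patient CNVs with no same-type control CNV at >= min_overlap."""
    return [
        p for p in patient
        if not any(c.kind == p.kind and reciprocal_overlap(p, c) >= min_overlap
                   for c in controls)
    ]
```

Applied to one patient's calls and a control set, this returns only the patient CNVs with no matching control call, the short list worth a closer look. The linear scan over controls keeps the sketch simple; a reference set of 2,000+ genomes would call for an interval index instead.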

The paper goes on to describe the creation of their web interface. The tracks and controls looked “GBrowsy” to me, and although I couldn’t find it mentioned in the paper, one of the authors confirmed my suspicion that it is based on GBrowse. The graphical display shows the variations: deletions are red and duplications are blue. There is also a data table that you can color-code, with green indicating uniqueness. The table also provides a column that summarizes the genes in each region (if there are any), plus links to the UCSC Genome Browser at that region, so you can go there and examine the other genomic features nearby. When you load a region at UCSC, the data becomes a custom track that you can examine with all the UCSC tools, including detailed queries with the Table Browser. It’s a nice example of a big data set from a publication being displayed at UCSC for further querying.
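Your own CNV calls can go onto the UCSC Genome Browser as a custom track in much the same way. Here is a minimal sketch that writes calls as a 9-column BED file with a `track` line and `itemRgb="On"`, so that deletions display in red and duplications in blue, as in the CHOP CNV display. It is not CNV Workshop's BED export; the input tuples (with 0-based starts), the `del`/`dup` labels, and the `to_bed` function are hypothetical choices for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical BED writer; colors mirror the CHOP CNV display
# (deletions red, duplications blue).
COLORS = {"del": "255,0,0", "dup": "0,0,255"}


def to_bed(cnvs: Iterable[Tuple[str, int, int, str]],
           track_name: str = "my_cnvs") -> str:
    """Render (chrom, start, end, kind) tuples as a UCSC custom-track BED text."""
    lines = [f'track name="{track_name}" description="CNV calls" itemRgb="On"']
    for chrom, start, end, kind in cnvs:
        # 9 BED columns: chrom, chromStart, chromEnd, name, score,
        # strand, thickStart, thickEnd, itemRgb
        lines.append("\t".join([chrom, str(start), str(end), kind, "0", ".",
                                str(start), str(end), COLORS[kind]]))
    return "\n".join(lines) + "\n"
```

The resulting text can be pasted or uploaded on the UCSC custom track page, after which the calls can be queried alongside the other annotation tracks.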

Another nice feature of the tabular display is that it also links to FABLE (Fast Automated Biomedical Literature Extraction), a literature mining tool that searches for papers relating to the genes you find in that region, so you can quickly assess what’s known about a given gene in a CNV region.

The authors also include a compelling “application” to illustrate how you can use the CHOP CNV resource to make discoveries. A clinical sample came from a patient with a number of congenital anomalies. CNV detection on this sample found 35 variations, 32 of which also existed in the healthy controls, so targeting the remaining 3 for further study gives a much more useful focus on the likely causes. The paper gives a couple of other examples of the resource’s utility as well.

When I asked the CHOP CNV team some questions about Figure 1 in their paper (it showed what appeared to be lab group names with data sets), I was told that new versions with new features are coming, including an option to upload your own samples and compare them to their data set.

If you are interested in structural variation in the genome, you should check out the CHOP CNV database. You might find some helpful information for your project! I almost forgot to mention: you can also download all the data and use it with other data you have or with other analysis tools.

Direct to the site: http://cnv.chop.edu/

Shaikh, T., Gai, X., Perin, J., Glessner, J., Xie, H., Murphy, K., O’Hara, R., Casalunovo, T., Conlin, L., D’Arcy, M., Frackelton, E., Geiger, E., Haldeman-Englert, C., Imielinski, M., Kim, C., Medne, L., Annaiah, K., Bradfield, J., Dabaghyan, E., Eckert, A., Onyiah, C., Ostapenko, S., Otieno, F., Santa, E., Shaner, J., Skraban, R., Smith, R., Elia, J., Goldmuntz, E., Spinner, N., Zackai, E., Chiavacci, R., Grundmeier, R., Rappaport, E., Grant, S., White, P., & Hakonarson, H. (2009). High-resolution mapping and analysis of copy number variations in the human genome: A data resource for clinical and research applications. Genome Research, 19(9), 1682-1690. DOI: 10.1101/gr.083501.108

Space-age training: mobile battery-powered workstations

A couple of weeks ago we did a training at the Children’s Hospital of Philadelphia (CHOP). What a terrific group: eager to learn, attentive, and smart. Our host, Xiaowu Gai, set up a great day for more than 50 people to get trained up on the UCSC Genome Browser. And Dr. Gai and his team turned an ordinary seminar room into one of the coolest training rooms we have been in.

Shown here are a couple of shots of the mobile workstations they used. These are battery-powered (they can also be plugged in), thin-client, wireless workstations. They looked capable of monitoring the trainees’ heart rate and breathing (although we didn’t try that :) ). Each had a nice-sized monitor, a keyboard area, and a desk surface for notes, handouts, and such. The trainees had no problems accessing the UCSC Genome Browser or following along with our handouts and exercise documents.

We’ve been in a lot of training rooms across the country now, and this was by far the most unusual conversion of a regular room into a training room we’ve seen. As trainers, we especially welcomed the reduced tripping hazard from wires! And it must have been relatively easy to break down afterward, too: the units can be wheeled away and stored until they are needed again.

Anyway, we had a great day, and we hope the CHOP folks did as well. It really felt like the age of genomics!