A couple of weeks ago we brought you a tip of the week about the CHOP CNV Database. The same people who bring you that database also develop FABLE (Fast Automated Biomedical Literature Extraction), a literature mining tool. The tool uses an advanced algorithm to find human genes that are directly related to the keywords you search on, and then finds literature on those genes. It has some great features and is a quick way to find the literature on a gene of interest. Today’s tip will give you a quick intro to the tool.
One of the hottest searches we see all the time is for more information on CNVs, or copy number variations. These intriguing structural variants in our genomes go a long way toward explaining why SNP hunting for complex diseases like schizophrenia and autism wasn’t able to elucidate the problems as most people expected. These spectrum sorts of conditions were just not going to turn out as straightforward as the sickle-cell variation or the cystic fibrosis stories.
Resources to catalog and look at CNVs have developed. We have had a tutorial on DGV, the Database of Genomic Variants, for some time (subscription required for tutorial). Just the other day I was looking around at the NCBI tool called dbVar, which has a nice diagrammatic overview of the kinds of structural variations CNVs represent (but I’m not sure I understand how to use the database yet–I’ll keep you posted). Now there is also CHOP CNV.
Today I’ll be introducing you to the CHOP CNV resource. I heard about it at ASHG a couple of weeks ago, and decided to look into it. I had remembered hearing about the tool at one of the trainings we did at CHOP, but I wasn’t sure it was publicly available. Now I’m sure it is!
The publication associated with the CHOP CNV resource provides an overview of the strategy. The authors highlight the reason they developed this one–to use a uniform technology (Illumina chips to start, and then subsequent validation with other techniques) and to have a large sample set. They examined the genomes of over 2000 healthy individuals. The point of looking at healthy folks is that they essentially form the reference set: you can now take the samples from affected patients and subtract the things that healthy folks appear to share. This helps to narrow down your search for CNVs that might cause disease conditions. They offer various statistics on the types and sizes of the structural variants observed in the healthy population. It reminded me of another talk I heard at ASHG called “The first map of dispensable regions in the human genome” by Terry Vrijenhoek et al.–which was a cool talk that began with a Facebook chat that had us all giggling–but the serious message was that there’s a lot of missing genome healthy people appear to tolerate just fine….
The paper goes on to describe the creation of their web interface. Although I couldn’t find it mentioned in the paper, I asked one of the authors and my suspicion that it was based on GBrowse was confirmed–I thought the tracks and controls appeared “GBrowsy” to me. It shows the variations on the graphical display. The deletions are red, the duplications are blue. There is also a table that contains the data, which you can color code so that unique variations are marked in green. The table also provides a column that summarizes the genes in that region (if there are some), and links to the UCSC Genome Browser so you can choose to go there and examine the other genomic features in that region. When you have that loaded at UCSC, the data becomes a custom track that you can then examine with all the UCSC tools, including detailed queries with the table browser. It’s a nice example of a big data set from a publication getting displayed at UCSC for further query options.
Another nice feature of the tabular display is that it also links to FABLE. FABLE (Fast Automated Biomedical Literature Extraction) is a literature mining tool that will search for papers relating to the genes you find in that region–so you can quickly assess what’s known about a given gene in a CNV region.
They also include a compelling “application” as a way to illustrate how you can use the CHOP CNV resource to make discoveries. There was a clinical sample of a patient with a number of congenital anomalies. The CNV detection of the genomic sample indicated that 32 of the 35 variations this patient had existed in the healthy controls–which means that targeting the remaining 3 for further study provides a much more helpful focus on the likely issues. There were a couple of other examples of utility as well.
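To make the “subtract what healthy folks share” idea concrete, here is a minimal Python sketch. It is not the CHOP team’s method: the `CNV` record, the `unique_to_patient` helper, the 50% reciprocal-overlap cutoff, and all of the coordinates are my own assumptions, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    """Hypothetical CNV record; not the CHOP CNV download format."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "del" or "dup"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Return the smaller of the two overlap fractions (0.0 if no match)."""
    # Only compare calls of the same type on the same chromosome.
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def unique_to_patient(patient, controls, min_overlap=0.5):
    """Keep patient CNVs that no control CNV matches at min_overlap or better.

    The 0.5 default is an assumed cutoff, not one taken from the paper.
    """
    return [
        p for p in patient
        if not any(reciprocal_overlap(p, c) >= min_overlap for c in controls)
    ]


# Made-up coordinates, for illustration only.
controls = [
    CNV("chr1", 1_000, 5_000, "del"),
    CNV("chr2", 10_000, 20_000, "dup"),
]
patient = [
    CNV("chr1", 1_200, 5_100, "del"),    # mostly overlaps a control deletion
    CNV("chr2", 10_000, 20_000, "del"),  # same region, but a different type
    CNV("chr7", 500, 900, "dup"),        # no control call on this chromosome
]

candidates = unique_to_patient(patient, controls)
for cnv in candidates:
    print(cnv)
```

In this toy run the chr1 deletion is filtered out because a healthy control carries nearly the same deletion, while the other two calls remain as candidates for follow-up. A real analysis would need to settle how strict the matching should be, which is exactly the sort of choice the cutoff parameter stands in for here.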
When I asked the CHOP CNV team some questions about their Figure 1 in the paper (it showed what appeared to be lab group names with data sets), I was told that new versions will be coming that will offer some new features–including an option to upload your own samples to compare them to their data set.
If you are interested in structural variations in the genome you should check out the CHOP CNV database. You might find some helpful information for your project! I almost forgot to note–you can download all the data as well, and use it with other data you may have or for other analysis tools.
Direct to the site: http://cnv.chop.edu/
Shaikh, T., Gai, X., Perin, J., Glessner, J., Xie, H., Murphy, K., O’Hara, R., Casalunovo, T., Conlin, L., D’Arcy, M., Frackelton, E., Geiger, E., Haldeman-Englert, C., Imielinski, M., Kim, C., Medne, L., Annaiah, K., Bradfield, J., Dabaghyan, E., Eckert, A., Onyiah, C., Ostapenko, S., Otieno, F., Santa, E., Shaner, J., Skraban, R., Smith, R., Elia, J., Goldmuntz, E., Spinner, N., Zackai, E., Chiavacci, R., Grundmeier, R., Rappaport, E., Grant, S., White, P., & Hakonarson, H. (2009). High-resolution mapping and analysis of copy number variations in the human genome: A data resource for clinical and research applications. Genome Research, 19(9), 1682-1690. DOI: 10.1101/gr.083501.108