Tag Archives: CNV

The PhenogramViz team illustrates how they analyze and visualize gene-phenotype relationships

Video Tip of the Week: PhenogramViz for evaluating phenotypes and CNVs

As I’ve mentioned before, once I start looking over some new tools I’m often led to others in the same arena that offer related but different features. That’s what happened when I looked at the Proband iPad app for human pedigrees. I noted that they are using important community standards, and I decided to follow those threads a bit. That led me to last week’s tip, the Human Phenotype Ontology (HPO).

HPO has been around for a while and I’ve been aware of it, but this recent re-investigation made me realize how mature it has become, and I was impressed with the amount of adoption there’s been in the genomics community in the big projects. But it also led me to some new tools that I hadn’t encountered before. This week’s tip highlights PhenogramViz–combining my appreciation for controlled vocabularies, standards, and data visualization.

Here’s how the PhenogramViz team describes their tool:

A tool that automatically analyses and visualizes gene-to-phenotype relations for a set of genes affected by CNV of a patient and a set of HPO-terms representing the symptoms of said patient. The tool makes full use of the cross-species phenotype ontology “uberpheno” (see here).

So if you have a patient with copy-number variations in their genome, this tool may help you identify the genes in a CNV segment that could account for the patient’s phenotypes. The goal–as stated in their paper linked below–is to assist with the clinical interpretation of these genome alterations.
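To make the first step concrete, here is a minimal sketch of finding which genes fall inside a CNV segment. This is not PhenogramViz’s code: the `Interval` type, the gene names, and all coordinates are hypothetical, and I’m assuming 1-based inclusive coordinates.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (assumption for this sketch)
    end: int    # 1-based, inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """Two intervals overlap if they share a chromosome and at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_in_cnv(cnv: Interval, genes: Dict[str, Interval]) -> List[str]:
    """Return the sorted names of genes that overlap the CNV segment."""
    return sorted(name for name, iv in genes.items() if overlaps(cnv, iv))


# Hypothetical gene coordinates, for illustration only.
genes = {
    "GENE_A": Interval("1", 1_000, 5_000),
    "GENE_B": Interval("1", 40_000, 60_000),
    "GENE_C": Interval("2", 1_000, 5_000),
}
cnv = Interval("1", 4_500, 45_000)

affected = genes_in_cnv(cnv, genes)
print(affected)  # ['GENE_A', 'GENE_B']
```

The gene list that comes out of a step like this is what then gets compared against the patient’s HPO terms; that comparison and the scoring are the part to read about in the paper.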

The additional layer of this effort that I find useful is that they use another ontology to take this even further for supporting information. They employ the “Uberpheno” cross-species phenotype ontology to find further details in model organisms.

I’ll let you get a sense of how this works with one of their tutorial videos from their YouTube channel. They have others too–which will help you with everything from installation to analyses. I’ll embed the one that shows how you start with a list of patient symptoms or phenotypes, then load the CNVs or genes, and then click from the results list to get graphical representations of the gene-phenotype relationships. Then with the Cytoscape tools you can interact with the “phenograms” in more detail. There’s no sound, but you can read the guidance in the callouts.

The videos include some abbreviations–like HPO. That’s why I talked last week about the Human Phenotype Ontology. I was prepping you for this one. And in another video (Prioritization of pathogenic CNVs) they reference the scoring strategies, which need more explanation than the video gives; you will find that in their paper linked below (the Journal of Medical Genetics one). I would spend some time looking over how the scoring and ranking happens to understand what’s shown.

Although the focus of this is using the data for human diagnosis, I think it could also help researchers to choose a more appropriate animal model for further testing. There are lots of complaints about the unsuitability of animal models for a range of subjects–but refining those choices would also be a huge benefit. Saving resources by helping to choose the right animal model would be another worthwhile use of this tool.

Check out PhenogramViz as a bridge between genomic segments and possible phenotypes. You can try it yourself with sample files they have available on their landing page.

Quick links:

PhenogramViz: http://compbio.charite.de/contao/index.php/phenoviz.html

Cytoscape: http://cytoscape.org/


Köhler, S., Doelken, S., Mungall, C., Bauer, S., Firth, H., Bailleul-Forestier, I., Black, G., Brown, D., Brudno, M., Campbell, J., FitzPatrick, D., Eppig, J., Jackson, A., Freson, K., Girdea, M., Helbig, I., Hurst, J., Jahn, J., Jackson, L., Kelly, A., Ledbetter, D., Mansour, S., Martin, C., Moss, C., Mumford, A., Ouwehand, W., Park, S., Riggs, E., Scott, R., Sisodiya, S., Vooren, S., Wapner, R., Wilkie, A., Wright, C., Vulto-van Silfhout, A., Leeuw, N., de Vries, B., Washington, N., Smith, C., Westerfield, M., Schofield, P., Ruef, B., Gkoutos, G., Haendel, M., Smedley, D., Lewis, S., & Robinson, P. (2013). The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data Nucleic Acids Research, 42 (D1) DOI: 10.1093/nar/gkt1026

Köhler S., Doelken S.C., Ruef B.J., Bauer S., Washington N., Westerfield M., Gkoutos G., Schofield P., Smedley D., Lewis S.E., et al. (2013). Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research., F1000Research, PMID: http://www.ncbi.nlm.nih.gov/pubmed/24358873

Köhler, S., Schoeneberg, U., Czeschik, J., Doelken, S., Hehir-Kwa, J., Ibn-Salem, J., Mungall, C., Smedley, D., Haendel, M., & Robinson, P. (2014). Clinical interpretation of CNVs with cross-species phenotype data Journal of Medical Genetics, 51 (11), 766-772 DOI: 10.1136/jmedgenet-2014-102633

Shannon P., Markiel A., Ozier O., Baliga N.S., Wang J.T., Ramage D., Amin N., Schwikowski B. & Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks., Genome research, PMID: http://www.ncbi.nlm.nih.gov/pubmed/14597658

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Tip of the Week: MutaDATABASE, a centralized and standardized DNA variation database

We all know and love dbSNP, and DGV, and 1000 Genomes, and HapMap, and OMIM, and the couple dozen other variation databases I can think of off the top of my head. But–even though there’s a lot of stuff out there–you never know what you aren’t seeing. What *isn’t* yet stored in those resources? One new consortium suggests that there’s a lot you aren’t seeing. And they aim to make it easier to collect variation data, curate it, visualize it, and have it all in one place. The resource they are constructing is called MutaDATABASE.

MutaDATABASE is a new effort to bring together a lot of variation information that is just not getting into existing databases as it should be. The group is described as “a large consortium of diagnostic testing laboratories in Europe, the United States, Australia, and Asia.” In their Nature Biotechnology correspondence they describe many of the barriers facing deposition of new variants in databases. Among them are lack of incentive (or lack of pressure by publishers and other organizations), challenging/difficult software interfaces for submissions, privacy concerns for medical testing situations, and some desire to withhold novel variations as intellectual property. Not all of these issues can be overcome with some software, but they aim to try.

The structural organization of the consortium and contributor community that they wish to develop is described in this slide, which is like Figure 1 in the publication:

So there is a group of MutaAdministrators who oversee the project as a whole (this name makes me giggle a little bit–like a sci-fi government might be called…). There are MutaCurators who assemble and review data on a given gene (is it really just genes? what about non-genic regions and large deletions and such–this isn’t entirely clear to me). Clinicians can give input into the curation, and MutaCircles are groups of labs that do diagnostic testing for a gene and that can also discuss, submit, and evaluate data. The MutaCurator acts as the gatekeeper and is accountable for what finally appears in the record.

The gene-specific collections will be freely available online in their database, and link to disease/phenotype information associated with those variations as well. In the tip-of-the-week movie I’ll show you an example of how you might expect a gene record to look when it’s been filled out to some extent.

MutaReviews is a separate aspect that they describe this way on the web site:

MutaREVIEWS is a new “Gene review journal ” published only online, which is freely available to all users. It consists of a compilation of gene review studies that describe the most common human disease genes in a standardised way and lists all observed gene variants. The variants include monogenic variants with high penetrance, rare variants with reduced penetrance and polymorphisms without clinical significance. Each gene review is edited by a specific MutaCURATOR for that gene. These gene reviews are updated every 6 months. There are 12 issues per year.

It’s certainly in the early stages of this project. A lot of the genes I checked just haven’t been curated yet, and I understand that. I hope it works out: I do like the organization and structure, and a one-stop-shop would be handy. But the “build a platform they will come and curate” system has had mixed success elsewhere around biology. And some of the things that need to happen for this to take off are philosophical or possibly legal barriers that are going to vary quite a bit around the research and genetic testing world.

One thing I’d like to see them do is permit and encourage citizen science curation by people who are adopters of personal genomics and looking at data, and by disease community groups who have a specific interest in these genes, but have even more barriers to contribution than the researchers often do.  I’ve found stuff from my genome scan that I don’t really have any place to take, and there’s no way to supplement records at that provider’s site as far as I know. But maybe that’s another variation project somewhere….

Anyway, have a look at MutaDATABASE and see what you think. Or if you participate in this project and I’ve not got some part of this right, drop a note in the comments. I know it’s early in the project and I may not have all the finer points in hand from my looking around and reading.

MutaDATABASE: http://www.mutadatabase.org/ (freely available online database with the variation content)

The sample gene that’s well filled out: http://www.mutareporter.org/mutareporter/Mutadatabase.html?showgene=L1CAM#

MutaReporter: http://www.mutabase.com/index.php?option=com_content&view=article&id=48&Itemid=54 (requires a license and user subscriptions; but supposedly the MutaDATABASE will have a function to submit that does not require use of this specifically, if I understood that correctly)

MutaBASE: http://www.mutabase.com A company associated with the MutaReporter software. (We have no relationship with that company)


Bale, S., Devisscher, M., Criekinge, W., Rehm, H., Decouttere, F., Nussbaum, R., Dunnen, J., & Willems, P. (2011). MutaDATABASE: a centralized and standardized DNA variation database Nature Biotechnology, 29 (2), 117-118 DOI: 10.1038/nbt.1772

Changes to DGV display standards (and yay standards!)

This notice came from DGV (Database of Genomic Variants) while I was on vacation last week, but I wanted to highlight this for a couple of reasons. First–it’s very cool that these groups have now chosen to establish a standard across databases for the representations of the copy-number variation displays.  But I also like that they are now also providing support for the red-green colorblind. As someone from a family of the colorblind, that’s something I like to be able to access.

Here’s the note from the mailing list:

As a result of discussions surrounding the representation of structural variants at the recent ISCA meeting, groups at DGV, NCBI and DECIPHER have decided to standardize colour schemes for gains and losses. Moving forward, deletions/losses will be displayed as red, gains/duplications will be displayed as blue. Regions where both gains and losses occur at the same locus will be represented as brown, and we will continue to represent inversions as purple(indigo). In addition to ensuring the colour schemes are consistent across databases, changes have also been implemented to ensure ease of use for individuals with red-green colour blindness.

And here’s a link to the announcement on their site: Notice of change to DGV display
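If you draw your own CNV tracks and want them to match, the scheme in the note can be captured as a small lookup table. This is only a sketch: the dictionary keys and the `colour_for` helper are names I made up for illustration, and only the colour assignments themselves come from the announcement.

```python
# Colours as stated in the DGV/NCBI/DECIPHER note; key names are hypothetical.
STANDARD_COLOURS = {
    "loss": "red",         # deletions/losses
    "gain": "blue",        # gains/duplications
    "gain+loss": "brown",  # both gains and losses at the same locus
    "inversion": "purple", # described as purple (indigo) in the note
}

# Alternative labels mapped onto the keys above (assumed, for convenience).
ALIASES = {"deletion": "loss", "duplication": "gain"}


def colour_for(variant_type: str) -> str:
    """Look up the display colour for a variant type, case-insensitively."""
    key = variant_type.strip().lower()
    key = ALIASES.get(key, key)
    return STANDARD_COLOURS[key]


print(colour_for("Deletion"))   # red
print(colour_for("gain+loss"))  # brown
```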

We’ll be taking a look at our tutorials as these changes roll out, and we’ll be updating them appropriately.

Tip of the Week: Varietas. A plaid database.

For this week’s Tip of the Week I’ll introduce Varietas, a resource that integrates human variation information such as SNP and CNV data, and offers a handy tabular output with links to additional databases that will enable researchers to quickly explore other sources of information about the variations or regions of interest.

I think this is the first resource I’ve used from Finland. And it’s definitely the first resource I have used that is plaid. But it struck me that plaid is a pretty good conceptualization of the variations that we see in the genomes. Some are a single thread, some are larger sections, and the overlaps  between the variations we observed in the genome are important to our understanding of them as well. And the history of computation leads back to textile manufacturing, in fact. So I thought it was a pretty good concept.

But let’s explore the threads of Varietas. You can read the paper which is linked below, but here I’ll just summarize some of the main features. First let me say the focus of this database appears to be human variation, although the site doesn’t make that very clear. As far as I could tell there wasn’t any data for other species. But if you want human variation data, you’ll find a variety of threads available to you. If you check out the About page, you’ll see the source data available includes Ensembl, the NHGRI GWAS catalog, SNPedia, and GAD. These sources also provide OMIM data, HGNC nomenclature, phenotypes, and MeSH terms. And the threads out include dbSNP, PubMed, SNPedia, and WikiGenes as well. This is also summarized nicely in Figure 1 of their paper.

It’s a very straightforward interface. There is a basic search with a text box for quick searching, and you select the type of data you are starting with: SNPs, genes, keywords, or locations. And the output will be a table with the results that correspond to  your query.

If  you have larger sets of features that you want to interrogate you can use the advanced forms to enter more data.

The tabular output can be viewed on the web with all the handy links. Or you can download the data as a text file to be used in other ways.

I’ll demonstrate the sample search for the movie, but you won’t see the full range of data that’s available there. I wish they had samples for each type of search. But I found one sample that will also show CNV results: choose the Location radio button and enter this location range to see some CNV samples: 6:1234-123400
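If you are building location queries like that one in a script, it helps to validate the `chromosome:start-end` string first. The parser below is my own sketch, not anything from Varietas; the accepted chromosome names and the tolerance for a `chr` prefix and thousands separators are assumptions.

```python
import re
from typing import Tuple

# chromosome:start-end, with an optional "chr" prefix and optional commas in numbers.
LOCATION_RE = re.compile(
    r"^(?:chr)?([0-9]{1,2}|X|Y|MT?)\s*:\s*([0-9,]+)\s*-\s*([0-9,]+)$",
    re.IGNORECASE,
)


def parse_location(text: str) -> Tuple[str, int, int]:
    """Split a location string into (chromosome, start, end)."""
    match = LOCATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chromosome:start-end location: {text!r}")
    chrom = match.group(1).upper()
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end in {text!r}")
    return chrom, start, end


location = parse_location("6:1234-123400")
print(location)  # ('6', 1234, 123400)
```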

Varietas home page: http://kokki.uku.fi/bioinformatics/varietas/

PubMed record for the paper: http://www.ncbi.nlm.nih.gov/pubmed/20671203


Paananen, J., Ciszek, R., & Wong, G. (2010). Varietas: a functional variation database portal Database, 2010 DOI: 10.1093/database/baq016

Friday SNPpets

Welcome to our Friday feature link dump: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Sorry, wordpress ate the post during an update….will try to re-create ASAP

  • From BioTechniques: Human Microbiome Project publishes first data – way cool, I’ve been hearing about this project since I first created our IMG tutorial. It will be fun to check the data out! [Jennifer]
  • There’s a competition about the greatest discovery in bioinformatics.  It’s a small prize, and I don’t know this organization so I can’t verify that you’d get the prize, but the discussion is interesting and can’t hurt, I think.  I’ve entered my thoughts.  But if I win I’ll donate the money to Kiva. [Mary]
  • Researchers to Map Ozzy Osbourne’s Genome, Find Out Why He’s Alive – Um, ok, so are they using Keith Richards & Ronnie Wood’s genomes to confirm the findings? And what about the controls – people who didn’t survive massive overdoses (Janis Joplin, Jimi Hendrix, Sid Vicious, etc.)? And don’t even get me started on the quack genetic products that might result… hat tip to Cyndy [Jennifer]
  • The Economist has a special report on human genomics. Favorite line so far: “The casual observer, then, might be forgiven for thinking the whole thing a damp squib, and the $3 billion spent on the project to be so much wasted money. But the casual observer would be wrong.” Chew on that Mandel. Hat tip to Eric Topol’s tweeting.  [Mary]
  • A new tool to associate CNVs with phenotypes is CONAN.  [Mary]
  • Splice. A scary cloning movie.  It’s funny–last month seemed to be breakthrough month. This month seems to be  disappointment and fear month. Sigh. [Mary]

Tip of the Week: International Cancer Genome Consortium

So, remember that tidal wave of data we were going to get from the human genome project?  Yeah.  That was a puddle compared to what’s coming your way now. For this week’s tip of the week I will introduce the very ambitious big data project from the International Cancer Genome Consortium (ICGC).  In addition, you’ll get your first look at the shiny new interface for BioMart!

People reading this blog know that we have made great progress on many fronts in the war on cancer.  But there’s an awful lot we don’t know yet.  The ICGC network of researchers plans to change that.  This international group of researchers has organized and standardized an effort to learn about tumors.  From their homepage:

ICGC Goal: To obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical and societal importance across the globe.

Check that out:

  • 50 tumor types.  Oh–and by the way–they will also obtain a normal tissue sample from the same individual so you can see what’s part of the normal constitution and what has changed in the tumor.
  • Hundreds of samples of that tumor type.  Except for some rare tumors, they intend to obtain 500 of each tumor.
  • More than a dozen types of cancer. Breast, lung, brain, pancreas, liver, leukemia…and on and on.
  • Genomic. Transcriptomic. Epigenomic.  Each of these is a separate data set that needs to be obtained.  Oh, and already there are simple variations (small numbers of nucleotides), CNVs, structural re-arrangements, expression data….And that’s just the initial release.

Are you overwhelmed yet?  50 x 500 x more than a dozen x 3+ types of data (and that’s just back-of-the-napkin, there’s more…)  I am daunted just thinking about the scale of this.
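Just to put a number on that napkin, here is the multiplication written out. I’m treating “more than a dozen” as 12 and “3+” as 3, so these are floor values I chose for the sketch, not figures from ICGC, and the product is only an order-of-magnitude impression.

```python
# Napkin figures from the list above; the 12 and 3 are assumed lower bounds.
tumor_types = 50
samples_per_type = 500
cancer_types = 12   # "more than a dozen"
data_types = 3      # genomic, transcriptomic, epigenomic

tumor_samples = tumor_types * samples_per_type
napkin_total = tumor_samples * cancer_types * data_types

print(tumor_samples)  # 25000
print(napkin_total)   # 900000
```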

They have organized and standardized the protocols, technologies, data collection, data submissions, and more.  You should check out their marker paper for a complete description of their framework.  They are going to make 2 types of data available: open access data that is de-identified, and a controlled access data set with clinical details that you’ll have to register to access.

Do note though: the data (like all these large data projects) is subject to data usage policies that you need to be aware of.  There is a publication moratorium that gives the data submitters a window to publish their findings before others are allowed to publish.  It’s that typical balance of rapid access to data + a non-scoop window for the data providers.  Be sure to familiarize yourself with the policies if you are going to use this data.

But let’s say you are ready for it–you understand the framework, you understand the usage policies–how do you get the data?  You use the very cool new interface for BioMart to do it!  This is your first opportunity to look at the GUI developed for BioMart v 0.8.  There’s more coming, this is an early version.  But that’s how you are going to be able to build great custom queries on the underlying data and pull it down.  You may be familiar with BioMart from any number of places now (Ensembl, Gramene, FlyBase, WormBase….more).  But this is the first implementation of the new look–you are going to want to check that out.

For this week’s Tip of the Week you’ll see the ICGC site, and a quick query of the initial data that is available in the Data Coordination Center (DCC).  But this is just an appetizer.  Brace yourselves–the deluge is coming.

A Nature News article offers a nice overview, but be sure to check out the full paper for the project details.

The International Cancer Genome Consortium site: http://icgc.org/

Oh, and this made me laugh:

Be sure to contact the ICGC team if you have any questions.  They want to help you use this data, and will be happy to answer your questions.  Personally, I’m making it a mission to help them populate the FAQ–I’ve sent in questions.  And so far the answers have been quite speedy :)

Oy. The reference is longer than the blog post.  Sigh.

Hudson (Chairperson), T., Anderson, W., Aretz, A., Barker, A., Bell, C., Bernabé, R., Bhan, M., Calvo, F., Eerola, I., Gerhard, D., Guttmacher, A., Guyer, M., Hemsley, F., Jennings, J., Kerr, D., Klatt, P., Kolar, P., Kusuda, J., Lane, D., Laplace, F., Lu, Y., Nettekoven, G., Ozenberger, B., Peterson, J., Rao, T., Remacle, J., Schafer, A., Shibata, T., Stratton, M., Vockley, J., Watanabe, K., Yang, H., Yuen, M., Knoppers (Leader), B., Bobrow, M., Cambon-Thomsen, A., Dressler, L., Dyke, S., Joly, Y., Kato, K., Kennedy, K., Nicolás, P., Parker, M., Rial-Sebbag, E., Romeo-Casabona, C., Shaw, K., Wallace, S., Wiesner, G., Zeps, N., Lichter (Leader), P., Biankin, A., Chabannon, C., Chin, L., Clément, B., de Alava, E., Degos, F., Ferguson, M., Geary, P., Hayes, D., Hudson, T., Johns, A., Kasprzyk, A., Nakagawa, H., Penny, R., Piris, M., Sarin, R., Scarpa, A., Shibata, T., van de Vijver, M., Futreal (Leader), P., Aburatani, H., Bayés, M., Bowtell, D., Campbell, P., Estivill, X., Gerhard, D., Grimmond, S., Gut, I., Hirst, M., López-Otín, C., Majumder, P., Marra, M., McPherson, J., Nakagawa, H., Ning, Z., Puente, X., Ruan, Y., Shibata, T., Stratton, M., Stunnenberg, H., Swerdlow, H., Velculescu, V., Wilson, R., Xue, H., Yang, L., Spellman (Leader), P., Bader, G., Boutros, P., Campbell, P., Flicek, P., Getz, G., Guigó, R., Guo, G., Haussler, D., Heath, S., Hubbard, T., Jiang, T., Jones, S., Li, Q., López-Bigas, N., Luo, R., Muthuswamy, L., Francis Ouellette, B., Pearson, J., Puente, X., Quesada, V., Raphael, B., Sander, C., Shibata, T., Speed, T., Stein, L., Stuart, J., Teague, J., Totoki, Y., Tsunoda, T., Valencia, A., Wheeler, D., Wu, H., Zhao, S., Zhou, G., Stein (Leader), L., Guigó, R., Hubbard, T., Joly, Y., Jones, S., Kasprzyk, A., Lathrop, M., López-Bigas, N., Francis Ouellette, B., Spellman, P., Teague, J., Thomas, G., Valencia, A., Yoshida, T., Kennedy (Leader), K., Axton, M., Dyke, S., Futreal, P., Gerhard, D., Gunter, C., Guyer, M., Hudson, T., McPherson, J., 
Miller, L., Ozenberger, B., Shaw, K., Kasprzyk (Leader), A., Stein (Leader), L., Zhang, J., Haider, S., Wang, J., Yung, C., Cross, A., Liang, Y., Gnaneshan, S., Guberman, J., Hsu, J., Bobrow (Leader), M., Chalmers, D., Hasel, K., Joly, Y., Kaan, T., Kennedy, K., Knoppers, B., Lowrance, W., Masui, T., Nicolás, P., Rial-Sebbag, E., Lyman Rodriguez, L., Vergely, C., Yoshida, T., Grimmond (Leader), S., Biankin, A., Bowtell, D., Cloonan, N., deFazio, A., Eshleman, J., Etemadmoghadam, D., Gardiner, B., Kench, J., Scarpa, A., Sutherland, R., Tempero, M., Waddell, N., Wilson, P., McPherson (Leader), J., Gallinger, S., Tsao, M., Shaw, P., Petersen, G., Mukhopadhyay, D., Chin, L., DePinho, R., Thayer, S., Muthuswamy, L., Shazand, K., Beck, T., Sam, M., Timms, L., Ballin, V., Lu (Leader), Y., Ji, J., Zhang, X., Chen, F., Hu, X., Zhou, G., Yang, Q., Tian, G., Zhang, L., Xing, X., Li, X., Zhu, Z., Yu, Y., Yu, J., Yang, H., Lathrop (Leader), M., Tost, J., Brennan, P., Holcatova, I., Zaridze, D., Brazma, A., Egevad, L., Prokhortchouk, E., Elizabeth Banks, R., Uhlén, M., Cambon-Thomsen, A., Viksna, J., Ponten, F., Skryabin, K., Stratton (Leader), M., Futreal, P., Birney, E., Borg, A., Børresen-Dale, A., Caldas, C., Foekens, J., Martin, S., Reis-Filho, J., Richardson, A., Sotiriou, C., Stunnenberg, H., Thomas, G., van de Vijver, M., van’t Veer, L., Calvo (Leader), F., Birnbaum, D., Blanche, H., Boucher, P., Boyault, S., Chabannon, C., Gut, I., Masson-Jacquemier, J., Lathrop, M., Pauporté, I., Pivot, X., Vincent-Salomon, A., Tabone, E., Theillet, C., Thomas, G., Tost, J., Treilleux, I., Calvo (Leader), F., Bioulac-Sage, P., Clément, B., Decaens, T., Degos, F., Franco, D., Gut, I., Gut, M., Heath, S., Lathrop, M., Samuel, D., Thomas, G., Zucman-Rossi, J., Lichter (Leader), P., Eils (Leader), R., Brors, B., Korbel, J., Korshunov, A., Landgraf, P., Lehrach, H., Pfister, S., Radlwimmer, B., Reifenberger, G., Taylor, M., von Kalle, C., Majumder (Leader), P., Sarin, R., Rao, T., Bhan, M., 
Scarpa (Leader), A., Pederzoli, P., Lawlor, R., Delledonne, M., Bardelli, A., Biankin, A., Grimmond, S., Gress, T., Klimstra, D., Zamboni, G., Shibata (Leader), T., Nakamura, Y., Nakagawa, H., Kusuda, J., Tsunoda, T., Miyano, S., Aburatani, H., Kato, K., Fujimoto, A., Yoshida, T., Campo (Leader), E., López-Otín, C., Estivill, X., Guigó, R., de Sanjosé, S., Piris, M., Montserrat, E., González-Díaz, M., Puente, X., Jares, P., Valencia, A., Himmelbaue, H., Quesada, V., Bea, S., Stratton (Leader), M., Futreal, P., Campbell, P., Vincent-Salomon, A., Richardson, A., Reis-Filho, J., van de Vijver, M., Thomas, G., Masson-Jacquemier, J., Aparicio, S., Borg, A., Børresen-Dale, A., Caldas, C., Foekens, J., Stunnenberg, H., van’t Veer, L., Easton, D., Spellman, P., Martin, S., Barker, A., Chin, L., Collins, F., Compton, C., Ferguson, M., Gerhard, D., Getz, G., Gunter, C., Guttmacher, A., Guyer, M., Hayes, D., Lander, E., Ozenberger, B., Penny, R., Peterson, J., Sander, C., Shaw, K., Speed, T., Spellman, P., Vockley, J., Wheeler, D., Wilson, R., Hudson (Chairperson), T., Chin, L., Knoppers, B., Lander, E., Lichter, P., Stein, L., Stratton, M., Anderson, W., Barker, A., Bell, C., Bobrow, M., Burke, W., Collins, F., Compton, C., DePinho, R., Easton, D., Futreal, P., Gerhard, D., Green, A., Guyer, M., Hamilton, S., Hubbard, T., Kallioniemi, O., Kennedy, K., Ley, T., Liu, E., Lu, Y., Majumder, P., Marra, M., Ozenberger, B., Peterson, J., Schafer, A., Spellman, P., Stunnenberg, H., Wainwright, B., Wilson, R., & Yang, H. (2010). International network of cancer genome projects Nature, 464 (7291), 993-998 DOI: 10.1038/nature08987

FYI: NCBI Has Officially Announced New dbVar Resource

As I was browsing over NCBI’s homepage, I happened to notice an announcement dated March 2nd that stated that the dbVar resource that Mary mentioned briefly in a weekly tip a while back is now publicly available. Here’s the brief announcement:

Tuesday, March 02, 2010, 1:00:00 PM
NCBI’s new database of Genomic Structural Variation (dbVar) archives large scale genomic variation data as well as associations of defined variants with phenotypic information.

You can find this announcement and others here.

From the dbVar documentation, it looks like it is mostly in ‘collection mode’ at the moment with lots and lots of data being added, FAQs on how to submit to dbVar, and some background information on what structural variation is, and how it is detected. It looks like the actual graphical displays of the variations use NCBI’s Sequence Viewer. It will be interesting to see how this new NCBI resource grows and is utilized.

edit: 3/16 9am – links to dbVar all appear to be down today. We have an email in to NCBI & will keep you posted on anything that we hear from them.

edit 2: 3/16 1pm – The links to dbVar are working for me now. Thanks, NCBI, for the quick fix!

Guest Post: CHOP’s new tool, CNV Workshop – Xiaowu Gai

This next post in our continuing semi-regular Guest Post series is from Xiaowu Gai, the Bioinformatics Core Director at CHOP. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.

Thanks to Mary for running a Tip of the Week – “CHOP CNV database” a couple of months back. The CHOP CNV database is a high-resolution genome-wide survey of copy number variations in a large number (2,026) of apparently healthy individuals. It is publicly accessible and has been widely used by research groups world-wide. I am now pleased to announce the public release of the software system behind it: CNV Workshop. CNV Workshop is a suite of software tools that we have developed over the last few years. It provides a comprehensive workflow for analyzing, managing, and visualizing genome copy number variation (CNV) data.

It can be used for almost any CNV research or clinical project by offering the following capabilities for both individual samples and cohort studies:

CNV identification
Implements a modified circular binary segmentation algorithm that reduces false positives
Fully configurable parameters for sensitivity/specificity management
Individual locus-specific annotations such as position, type of variation, call metrics, and overlap with CNVs of other data sets, including the Database of Genomic Variants.
Functional gene annotations such as genes affected and known disease associations
Accepts user-provided annotations
GBrowse-enabled visuals for querying, browsing, interpreting, and reporting CNVs
Export of results into Excel, XML, CSV, and BED files
Direct links to public resources such as the UCSC Genome Browser, NCBI Entrez, Entrez Gene, and FABLE
Project and Account Management
Authentication and permission scheme that is especially useful for clinical diagnostic settings
Analysis result sharing within and between projects
Simple Web-based administrative interface
Remote access and administration enabled
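As an illustration of the BED export mentioned in the list above, here is a minimal sketch of writing CNV calls as BED lines. It is not CNV Workshop’s exporter: the function name and the example calls are hypothetical, and it assumes the input coordinates are 1-based and inclusive, which BED’s 0-based, half-open convention requires shifting by one at the start.

```python
from typing import Iterable, Tuple

CnvCall = Tuple[str, int, int, str]  # (chromosome, start, end, label)


def cnv_to_bed_line(chrom: str, start: int, end: int, name: str) -> str:
    """Format one 1-based inclusive CNV call as a tab-separated BED4 line."""
    if start < 1 or end < start:
        raise ValueError(f"invalid interval {chrom}:{start}-{end}")
    # BED starts are 0-based and ends are exclusive, so only the start shifts.
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


def cnvs_to_bed(calls: Iterable[CnvCall]) -> str:
    """Join several calls into BED text, one line per call."""
    return "\n".join(cnv_to_bed_line(*call) for call in calls)


# Hypothetical calls, for illustration only.
calls = [("chr6", 1234, 123400, "loss"), ("chr1", 500, 900, "gain")]
bed_text = cnvs_to_bed(calls)
print(bed_text)
```

A file written this way can be loaded as a custom track in a genome browser that accepts BED.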

CNV Workshop currently accepts genotyping array data from Illumina’s 550k, 610- and 660-Quad, and Omni arrays, along with Affymetrix’s 5.0 and 6.0 arrays, and can be easily configured to accept data from other platforms. The package comes preloaded with publicly available reference data from more than 2,000 healthy control subjects (the CHOP CNV Database). CNV Workshop also allows the user to upload already processed CNV calls for annotation and presentation.

The software package is freely available at http://sourceforge.net/projects/cnv/. It is also described in more detail in our recent paper in BMC Bioinformatics.

-Xiaowu Gai