Tag: CNV

Friday SNPpets

18 June, 2010 (08:48) | Genomics Research, SNPpets | By: Mary

Welcome to our Friday feature link dump: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Sorry, WordPress ate the post during an update… will try to re-create it ASAP.

  • From BioTechniques: Human Microbiome Project publishes first data – way cool, I’ve been hearing about this project since I first created our IMG tutorial. It will be fun to check the data out! [Jennifer]
  • There’s a competition about the greatest discovery in bioinformatics.  It’s a small prize, and I don’t know this organization so I can’t verify that you’d get the prize, but the discussion is interesting and can’t hurt, I think.  I’ve entered my thoughts.  But if I win I’ll donate the money to Kiva. [Mary]
  • Researchers to Map Ozzy Osbourne’s Genome, Find Out Why He’s Alive – Um, ok, so are they using Keith Richards’ & Ronnie Wood’s genomes to confirm the findings? And what about the controls – people who didn’t survive massive overdoses (Janis Joplin, Jimi Hendrix, Sid Vicious, etc.)? And don’t even get me started on the quack genetic products that might result… hat tip to Cyndy [Jennifer]
  • The Economist has a special report on human genomics. Favorite line so far: “The casual observer, then, might be forgiven for thinking the whole thing a damp squib, and the $3 billion spent on the project to be so much wasted money. But the casual observer would be wrong.” Chew on that, Mandel. Hat tip to Eric Topol’s tweeting.  [Mary]
  • A new tool to associate CNVs with phenotypes is CONAN.  [Mary]
  • Splice. A scary cloning movie.  It’s funny–last month seemed to be breakthrough month. This month seems to be  disappointment and fear month. Sigh. [Mary]

Tip of the Week: International Cancer Genome Consortium

28 April, 2010 (09:10) | Genomics News, Genomics Research, Genomics Resource News, New Resource, Tip of the Week | By: Mary

So, remember that tidal wave of data we were going to get from the human genome project?  Yeah.  That was a puddle compared to what’s coming your way now. For this week’s tip of the week I will introduce the very ambitious big data project from the International Cancer Genome Consortium (ICGC).  In addition, you’ll get your first look at the shiny new interface for BioMart!

People reading this blog know that we have made great progress on many fronts in the war on cancer.  But there’s an awful lot we don’t know yet.  The ICGC network of researchers plans to change that.  This international group of researchers has organized and standardized an effort to learn about tumors.  From their homepage:

ICGC Goal: To obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical and societal importance across the globe.

Check that out:

  • 50 tumor types.  Oh–and by the way–they will also obtain a normal tissue sample from the same individual so you can see what’s part of the normal constitution and what has changed in the tumor.
  • Hundreds of samples of that tumor type.  Except for some rare tumors, they intend to obtain 500 of each tumor.
  • More than a dozen types of cancer. Breast, lung, brain, pancreas, liver, leukemia…and on and on.
  • Genomic. Transcriptomic. Epigenomic.  Each of these is a separate data set that needs to be obtained.  Oh, and already there are simple variations (small numbers of nucleotides), CNVs, structural re-arrangements, expression data….And that’s just the initial release.

Are you overwhelmed yet?  50 x 500 x more than a dozen x 3+ types of data (and that’s just back-of-the-napkin, there’s more…)  I am daunted just thinking about the scale of this.
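
To put rough numbers on that napkin, here is a minimal Python sketch. The inputs are the round figures quoted above. The factor of 2 for the matched normal sample, and the choice to treat the dozen-plus cancer sites as already counted within the 50 tumor types, are assumptions of this sketch rather than ICGC figures, so read the result as an order-of-magnitude estimate only.

```python
# Back-of-the-napkin count of ICGC data sets, using the round numbers above.
TUMOR_TYPES = 50          # tumor types and/or subtypes in the ICGC goal
SAMPLES_PER_TYPE = 500    # target per tumor type (fewer for rare tumors)
TISSUES_PER_DONOR = 2     # assumption: one tumor + one matched normal sample
DATA_LAYERS = 3           # genomic, transcriptomic, epigenomic

donors = TUMOR_TYPES * SAMPLES_PER_TYPE
specimens = donors * TISSUES_PER_DONOR
datasets = specimens * DATA_LAYERS   # assumes every layer is run on every specimen

print(f"donors:    {donors:,}")
print(f"specimens: {specimens:,}")
print(f"data sets: {datasets:,}")
```

Under those assumptions that is 25,000 donors and on the order of 150,000 individual data sets, before counting the separate variation types within each layer.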

They have organized and standardized the protocols, technologies, data collection, data submissions, and more.  You should check out their marker paper for a complete description of their framework.  They are going to make 2 types of data available: open access data that is de-identified, and a controlled access data set with clinical details that you’ll have to register to access.

Do note though: the data (like all these large data projects) is subject to data usage policies that you need to be aware of.  There is a publication moratorium that gives the data submitters a window to publish their findings before others are allowed to publish.  It’s that typical balance of rapid access to data + a non-scoop window for the data providers.  Be sure to familiarize yourself with the policies if you are going to use this data.

But let’s say you are ready for it–you understand the framework, you understand the usage policies–how do you get the data?  You use the very cool new interface for BioMart to do it!  This is your first opportunity to look at the GUI developed for BioMart v 0.8.  There’s more coming, this is an early version.  But that’s how you are going to be able to build great custom queries on the underlying data and pull it down.  You may be familiar with BioMart from any number of places now (Ensembl, Gramene, FlyBase, WormBase….more).  But this is the first implementation of the new look–you are going to want to check that out.

For this week’s Tip of the Week you’ll see the ICGC site, and a quick query of the initial data that is available in the Data Coordination Center (DCC).  But this is just an appetizer.  Brace yourselves–the deluge is coming.

A Nature News article offers a nice overview, but be sure to check out the full paper for the project details.

The International Cancer Genome Consortium site: http://icgc.org/

Oh, and this made me laugh:

Be sure to contact the ICGC team if you have any questions.  They want to help you to use this data, and will be happy to answer your questions.  Personally, I’m making it a mission to help them populate the FAQ–I’ve sent in questions.  And so far my answers have been quite speedy :)

Oy. The reference is longer than the blog post.  Sigh.

Hudson (Chairperson), T., Anderson, W., Aretz, A., Barker, A., Bell, C., Bernabé, R., Bhan, M., Calvo, F., Eerola, I., Gerhard, D., Guttmacher, A., Guyer, M., Hemsley, F., Jennings, J., Kerr, D., Klatt, P., Kolar, P., Kusuda, J., Lane, D., Laplace, F., Lu, Y., Nettekoven, G., Ozenberger, B., Peterson, J., Rao, T., Remacle, J., Schafer, A., Shibata, T., Stratton, M., Vockley, J., Watanabe, K., Yang, H., Yuen, M., Knoppers (Leader), B., Bobrow, M., Cambon-Thomsen, A., Dressler, L., Dyke, S., Joly, Y., Kato, K., Kennedy, K., Nicolás, P., Parker, M., Rial-Sebbag, E., Romeo-Casabona, C., Shaw, K., Wallace, S., Wiesner, G., Zeps, N., Lichter (Leader), P., Biankin, A., Chabannon, C., Chin, L., Clément, B., de Alava, E., Degos, F., Ferguson, M., Geary, P., Hayes, D., Hudson, T., Johns, A., Kasprzyk, A., Nakagawa, H., Penny, R., Piris, M., Sarin, R., Scarpa, A., Shibata, T., van de Vijver, M., Futreal (Leader), P., Aburatani, H., Bayés, M., Bowtell, D., Campbell, P., Estivill, X., Gerhard, D., Grimmond, S., Gut, I., Hirst, M., López-Otín, C., Majumder, P., Marra, M., McPherson, J., Nakagawa, H., Ning, Z., Puente, X., Ruan, Y., Shibata, T., Stratton, M., Stunnenberg, H., Swerdlow, H., Velculescu, V., Wilson, R., Xue, H., Yang, L., Spellman (Leader), P., Bader, G., Boutros, P., Campbell, P., Flicek, P., Getz, G., Guigó, R., Guo, G., Haussler, D., Heath, S., Hubbard, T., Jiang, T., Jones, S., Li, Q., López-Bigas, N., Luo, R., Muthuswamy, L., Francis Ouellette, B., Pearson, J., Puente, X., Quesada, V., Raphael, B., Sander, C., Shibata, T., Speed, T., Stein, L., Stuart, J., Teague, J., Totoki, Y., Tsunoda, T., Valencia, A., Wheeler, D., Wu, H., Zhao, S., Zhou, G., Stein (Leader), L., Guigó, R., Hubbard, T., Joly, Y., Jones, S., Kasprzyk, A., Lathrop, M., López-Bigas, N., Francis Ouellette, B., Spellman, P., Teague, J., Thomas, G., Valencia, A., Yoshida, T., Kennedy (Leader), K., Axton, M., Dyke, S., Futreal, P., Gerhard, D., Gunter, C., Guyer, M., Hudson, T., McPherson, J., 
Miller, L., Ozenberger, B., Shaw, K., Kasprzyk (Leader), A., Stein (Leader), L., Zhang, J., Haider, S., Wang, J., Yung, C., Cross, A., Liang, Y., Gnaneshan, S., Guberman, J., Hsu, J., Bobrow (Leader), M., Chalmers, D., Hasel, K., Joly, Y., Kaan, T., Kennedy, K., Knoppers, B., Lowrance, W., Masui, T., Nicolás, P., Rial-Sebbag, E., Lyman Rodriguez, L., Vergely, C., Yoshida, T., Grimmond (Leader), S., Biankin, A., Bowtell, D., Cloonan, N., deFazio, A., Eshleman, J., Etemadmoghadam, D., Gardiner, B., Kench, J., Scarpa, A., Sutherland, R., Tempero, M., Waddell, N., Wilson, P., McPherson (Leader), J., Gallinger, S., Tsao, M., Shaw, P., Petersen, G., Mukhopadhyay, D., Chin, L., DePinho, R., Thayer, S., Muthuswamy, L., Shazand, K., Beck, T., Sam, M., Timms, L., Ballin, V., Lu (Leader), Y., Ji, J., Zhang, X., Chen, F., Hu, X., Zhou, G., Yang, Q., Tian, G., Zhang, L., Xing, X., Li, X., Zhu, Z., Yu, Y., Yu, J., Yang, H., Lathrop (Leader), M., Tost, J., Brennan, P., Holcatova, I., Zaridze, D., Brazma, A., Egevad, L., Prokhortchouk, E., Elizabeth Banks, R., Uhlén, M., Cambon-Thomsen, A., Viksna, J., Ponten, F., Skryabin, K., Stratton (Leader), M., Futreal, P., Birney, E., Borg, A., Børresen-Dale, A., Caldas, C., Foekens, J., Martin, S., Reis-Filho, J., Richardson, A., Sotiriou, C., Stunnenberg, H., Thomas, G., van de Vijver, M., van’t Veer, L., Calvo (Leader), F., Birnbaum, D., Blanche, H., Boucher, P., Boyault, S., Chabannon, C., Gut, I., Masson-Jacquemier, J., Lathrop, M., Pauporté, I., Pivot, X., Vincent-Salomon, A., Tabone, E., Theillet, C., Thomas, G., Tost, J., Treilleux, I., Calvo (Leader), F., Bioulac-Sage, P., Clément, B., Decaens, T., Degos, F., Franco, D., Gut, I., Gut, M., Heath, S., Lathrop, M., Samuel, D., Thomas, G., Zucman-Rossi, J., Lichter (Leader), P., Eils (Leader), R., Brors, B., Korbel, J., Korshunov, A., Landgraf, P., Lehrach, H., Pfister, S., Radlwimmer, B., Reifenberger, G., Taylor, M., von Kalle, C., Majumder (Leader), P., Sarin, R., Rao, T., Bhan, M., 
Scarpa (Leader), A., Pederzoli, P., Lawlor, R., Delledonne, M., Bardelli, A., Biankin, A., Grimmond, S., Gress, T., Klimstra, D., Zamboni, G., Shibata (Leader), T., Nakamura, Y., Nakagawa, H., Kusuda, J., Tsunoda, T., Miyano, S., Aburatani, H., Kato, K., Fujimoto, A., Yoshida, T., Campo (Leader), E., López-Otín, C., Estivill, X., Guigó, R., de Sanjosé, S., Piris, M., Montserrat, E., González-Díaz, M., Puente, X., Jares, P., Valencia, A., Himmelbaue, H., Quesada, V., Bea, S., Stratton (Leader), M., Futreal, P., Campbell, P., Vincent-Salomon, A., Richardson, A., Reis-Filho, J., van de Vijver, M., Thomas, G., Masson-Jacquemier, J., Aparicio, S., Borg, A., Børresen-Dale, A., Caldas, C., Foekens, J., Stunnenberg, H., van’t Veer, L., Easton, D., Spellman, P., Martin, S., Barker, A., Chin, L., Collins, F., Compton, C., Ferguson, M., Gerhard, D., Getz, G., Gunter, C., Guttmacher, A., Guyer, M., Hayes, D., Lander, E., Ozenberger, B., Penny, R., Peterson, J., Sander, C., Shaw, K., Speed, T., Spellman, P., Vockley, J., Wheeler, D., Wilson, R., Hudson (Chairperson), T., Chin, L., Knoppers, B., Lander, E., Lichter, P., Stein, L., Stratton, M., Anderson, W., Barker, A., Bell, C., Bobrow, M., Burke, W., Collins, F., Compton, C., DePinho, R., Easton, D., Futreal, P., Gerhard, D., Green, A., Guyer, M., Hamilton, S., Hubbard, T., Kallioniemi, O., Kennedy, K., Ley, T., Liu, E., Lu, Y., Majumder, P., Marra, M., Ozenberger, B., Peterson, J., Schafer, A., Spellman, P., Stunnenberg, H., Wainwright, B., Wilson, R., & Yang, H. (2010). International network of cancer genome projects Nature, 464 (7291), 993-998 DOI: 10.1038/nature08987

FYI: NCBI Has Officially Announced New dbVar Resource

15 March, 2010 (15:55) | Genomics News, Genomics Resource News, New Resource | By: Jennifer

As I was browsing over NCBI’s homepage, I happened to notice an announcement dated March 2nd that stated that the dbVar resource that Mary mentioned briefly in a weekly tip a while back is now publicly available. Here’s the brief announcement:

Tuesday, March 02, 2010, 1:00:00 PM
NCBI’s new database of Genomic Structural Variation (dbVar) archives large scale genomic variation data as well as associations of defined variants with phenotypic information.

You can find this announcement and others here.

From the dbVar documentation, it looks like it is mostly in ‘collection mode’ at the moment with lots and lots of data being added, FAQs on how to submit to dbVar, and some background information on what structural variation is, and how it is detected. It looks like the actual graphical displays of the variations use NCBI’s Sequence Viewer. It will be interesting to see how this new NCBI resource grows and is utilized.

edit: 3/16 9am – links to dbVar all appear to be down today. We have an email in to NCBI & will keep you posted on anything that we hear from them.

edit 2: 3/16 1pm – The links to dbVar are working for me now. Thanks, NCBI, for the quick fix!

Guest Post: CHOP’s new tool, CNV Workshop – Xiaowu Gai

2 March, 2010 (00:01) | Genomics Resource News, Guest Posts, New Resource | By: Guest

This next post in our continuing semi-regular Guest Post series is from Xiaowu Gai, the Bioinformatics Core Director at CHOP. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.

Thanks to Mary for running a Tip of the Week – “CHOP CNV database” a couple of months back. The CHOP CNV database is a high-resolution genome-wide survey of copy number variations in a large number (2,026) of apparently healthy individuals. It is publicly accessible and has been widely used by research groups world-wide. I am now pleased to announce the public release of the software system behind it: CNV Workshop. CNV Workshop is a suite of software tools that we have developed over the last few years. It provides a comprehensive workflow for analyzing, managing, and visualizing genome copy number variation (CNV) data.

It can be used for almost any CNV research or clinical project by offering the following capabilities for both individual samples and cohort studies:

CNV identification
  • Implements a modified circular binary segmentation algorithm that reduces false positives
  • Fully configurable parameters for sensitivity/specificity management

Annotation
  • Individual locus-specific annotations such as position, type of variation, call metrics, and overlap with CNVs of other data sets, including the Database of Genomic Variants
  • Functional gene annotations such as genes affected and known disease associations
  • Accepts user-provided annotations

Presentation
  • GBrowse-enabled visuals for querying, browsing, interpreting, and reporting CNVs
  • Export of results into Excel, XML, CSV, and BED files
  • Direct links to public resources such as the UCSC Genome Browser, NCBI Entrez, Entrez Gene, and FABLE

Project and Account Management
  • Authentication and permission scheme that is especially useful for clinical diagnostic settings
  • Analysis result sharing within and between projects
  • Simple Web-based administrative interface
  • Remote access and administration enabled
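
For readers who have not met circular binary segmentation (CBS) before, the core idea can be sketched in a few lines of Python: treat the probe values along a chromosome as a circle and search for the arc whose mean differs most from the mean of everything else. This is a minimal sketch of the standard statistic only, not CNV Workshop's modified version, whose details are not given in this post. The function name, the toy log ratios, and the crude pooled noise estimate are assumptions made for illustration, and the significance test and recursion that a real caller needs are left out.

```python
import math
from itertools import accumulate

def best_arc(values):
    """Return (i, j, score) for the arc values[i:j] whose mean differs most
    from the mean of the remaining points, using a t-like statistic."""
    n = len(values)
    prefix = [0.0] + list(accumulate(values))   # prefix[k] == sum(values[:k])
    total = prefix[n]
    mean = total / n
    # Assumption: overall standard deviation as a rough noise estimate.
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1)) or 1.0
    best = (0, n, 0.0)
    for i in range(n):
        for j in range(i + 1, n + 1):
            inside = j - i
            outside = n - inside
            if outside == 0:
                continue   # the arc may not cover the whole circle
            arc_sum = prefix[j] - prefix[i]
            diff = arc_sum / inside - (total - arc_sum) / outside
            score = abs(diff) / (sd * math.sqrt(1 / inside + 1 / outside))
            if score > best[2]:
                best = (i, j, score)
    return best

# Toy log2 ratios: four probes (indices 4 to 7) sit near +1, the rest near 0.
log_ratios = [0.0, 0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0, 0.0, -0.1, 0.1, 0.0]
start, end, score = best_arc(log_ratios)
print(f"candidate segment: probes {start}..{end - 1}, score {score:.2f}")
```

On the toy data the arc returned is the four raised probes. A full CBS implementation would then test that score for significance (typically by permutation) and repeat the search within each resulting piece.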

CNV Workshop currently accepts genotyping array data from Illumina’s 550k, 610- and 660-Quad, and Omni arrays, along with Affymetrix’s 5.0 and 6.0 arrays, and can be easily configured to accept data from other platforms. The package comes preloaded with publicly available reference data from more than 2,000 healthy control subjects (the CHOP CNV Database). CNV Workshop also allows the user to upload already processed CNV calls for annotation and presentation.

The software package is freely available at http://sourceforge.net/projects/cnv/. It is also described in more detail in our recent paper in BMC Bioinformatics.

-Xiaowu Gai

Corn: 85% not corn, and missing big pieces

20 November, 2009 (10:51) | Genomics News, Genomics Research | By: Mary

So I’m all excited about the genome festival that I’m seeing, related to the publication of the new sequence version of corn. You can access the main paper in Science, and there’s a very neat diagram in figure 1 that is like looking across time at the sequence data and into the corn nebula.  But the thing that cracked me up was this line from the abstract:

Nearly 85% of the genome is composed of hundreds of families of transposable elements, dispersed nonuniformly across the genome.

That means 85% of corn isn’t corn!!  And what business do those elements have messing with the genomes??  I am told all the time that messing with plant genomes is wrong and unnatural.  Heh.

For full coverage of the big news today I’ll point you to James and the Giant Corn (appropriately enough) who seems to be the CNN (Corn News Network) of 24-hour coverage of many aspects of the work.

I spent my morning looking over the PLoS Maize Special Collection papers, including the intriguing appetizer:  10 Reasons to be Tantalized by the B73 Maize Genome.  But I spent longer looking at the CNVs and PAVs paper.  I’ve been thinking about CNVs a lot  lately, and was interested to see this covered in a non-mammalian species.

Figure 1 is a nice example of how to use VISTA for effective displays in comparative genomics.  (If you haven’t used VISTA before you might check out our sponsored free tutorial on that–we are currently working with the VISTA team to update that with their new features too.)

There’s a really striking segment of chromosome 6 that appears to be present in one of the strains they examine and absent in the other (illustrated in figure 4).  And it looks like it has genes that are expressed and active in the B73 strain.  The ongoing investigation of that is pretty intriguing as well.

The structural variations are not evenly distributed across the genomes.  Some places have large occurrences, and some are untouched.  It’s clear that just in these two strains there’s a lot more structural diversity than in other species that have been examined:

In the human, rat, dog, mouse, macaque and chimpanzee genomes the average number of CNVs between two individuals is between 15 and 75 [43]–[48]. A high resolution study of eight human genomes [49] revealed only several hundred insertions and deletions, including CNV and PAV sequences, in the comparison of any two human genomes. In contrast, even after very stringent filtering we identified >3,700 CNV or PAV sequences that represent at least 2,000 events between these two maize genomes.

Emphasis mine.  Plants are so much more flexible, apparently….

This is going to lead to some neat clues on heterosis (or hybrid vigor) as the research proceeds with these new tools.  What a great time to be a plant scientist.  There are some very exciting projects coming along with the tools of genomics.

What I couldn’t locate was any reference to a CNV database (like DGV or CHOP CNV) where you can examine the whole set.  I’ll dig through the supplementary data to see if I can find out more on that.  But I wanted to get this post out to celebrate the very nice work and collection of papers on this project. Congrats to the teams involved!

References:

Springer, N., Ying, K., Fu, Y., Ji, T., Yeh, C., Jia, Y., Wu, W., Richmond, T., Kitzman, J., Rosenbaum, H., Iniguez, A., Barbazuk, W., Jeddeloh, J., Nettleton, D., & Schnable, P. (2009). Maize Inbreds Exhibit High Levels of Copy Number Variation (CNV) and Presence/Absence Variation (PAV) in Genome Content PLoS Genetics, 5 (11) DOI: 10.1371/journal.pgen.1000734

Schnable, P., Ware, D., Fulton, R., Stein, J., Wei, F., Pasternak, S., Liang, C., Zhang, J., Fulton, L., Graves, T., Minx, P., Reily, A., Courtney, L., Kruchowski, S., Tomlinson, C., Strong, C., Delehaunty, K., Fronick, C., Courtney, B., Rock, S., Belter, E., Du, F., Kim, K., Abbott, R., Cotton, M., Levy, A., Marchetto, P., Ochoa, K., Jackson, S., Gillam, B., Chen, W., Yan, L., Higginbotham, J., Cardenas, M., Waligorski, J., Applebaum, E., Phelps, L., Falcone, J., Kanchi, K., Thane, T., Scimone, A., Thane, N., Henke, J., Wang, T., Ruppert, J., Shah, N., Rotter, K., Hodges, J., Ingenthron, E., Cordes, M., Kohlberg, S., Sgro, J., Delgado, B., Mead, K., Chinwalla, A., Leonard, S., Crouse, K., Collura, K., Kudrna, D., Currie, J., He, R., Angelova, A., Rajasekar, S., Mueller, T., Lomeli, R., Scara, G., Ko, A., Delaney, K., Wissotski, M., Lopez, G., Campos, D., Braidotti, M., Ashley, E., Golser, W., Kim, H., Lee, S., Lin, J., Dujmic, Z., Kim, W., Talag, J., Zuccolo, A., Fan, C., Sebastian, A., Kramer, M., Spiegel, L., Nascimento, L., Zutavern, T., Miller, B., Ambroise, C., Muller, S., Spooner, W., Narechania, A., Ren, L., Wei, S., Kumari, S., Faga, B., Levy, M., McMahan, L., Van Buren, P., Vaughn, M., Ying, K., Yeh, C., Emrich, S., Jia, Y., Kalyanaraman, A., Hsia, A., Barbazuk, W., Baucom, R., Brutnell, T., Carpita, N., Chaparro, C., Chia, J., Deragon, J., Estill, J., Fu, Y., Jeddeloh, J., Han, Y., Lee, H., Li, P., Lisch, D., Liu, S., Liu, Z., Nagel, D., McCann, M., SanMiguel, P., Myers, A., Nettleton, D., Nguyen, J., Penning, B., Ponnala, L., Schneider, K., Schwartz, D., Sharma, A., Soderlund, C., Springer, N., Sun, Q., Wang, H., Waterman, M., Westerman, R., Wolfgruber, T., Yang, L., Yu, Y., Zhang, L., Zhou, S., Zhu, Q., Bennetzen, J., Dawe, R., Jiang, J., Jiang, N., Presting, G., Wessler, S., Aluru, S., Martienssen, R., Clifton, S., McCombie, W., Wing, R., & Wilson, R. (2009). 
The B73 Maize Genome: Complexity, Diversity, and Dynamics Science, 326 (5956), 1112-1115 DOI: 10.1126/science.1178534

Tip of the Week: Fable, text mining for literature on human genes

18 November, 2009 (10:57) | Tip of the Week | By: Trey

A couple of weeks ago we brought you a tip of the week about the CHOP CNV Database. The same people who bring you that database also do FABLE (Fast Automated Biomedical Literature Extraction), a literature mining tool. The tool uses an advanced algorithm to find human genes that are directly related to the keywords you search on, and then finds literature on those genes. The tool has some great features and is a great way to quickly find the literature on a gene of interest. Today’s tip will give you a quick intro to the tool.

Tip of the Week: CHOP CNV database

4 November, 2009 (07:45) | Genomics Research, Genomics Resource News, New Resource, Tip of the Week | By: Mary

One of the hottest searches we see all the time is for more information on CNVs, or copy number variations.  These intriguing structural variants in our genomes explain a lot of the reason that SNP hunting for complex diseases like schizophrenia and autism wasn’t able to elucidate the problems as most people expected.  These spectrum sorts of conditions were just not going to turn out as straightforward as the sickle-cell variation or the cystic fibrosis stories.

Resources to catalog and look at CNVs have developed.  We have had a tutorial on DGV, the Database of Genomic Variants for some time (subscription required for tutorial).  Just the other day I was looking around at the NCBI tool called dbVar, which has a nice diagrammatic overview of the kinds of structural variations CNVs represent (but I’m not sure I understand how to use the database yet–I’ll keep you posted :) ). Now there is also CHOP CNV.

Today I’ll be introducing you to the CHOP CNV resource.  I heard about it at ASHG a couple of weeks ago, and decided to look into it.  I had remembered hearing about the tool at one of the trainings we did at CHOP, but I wasn’t sure it was publicly available.  Now I’m sure it is!

The publication associated with the CHOP CNV resource provides an overview of the  strategy. The authors highlight the reason they developed this one–to use a uniform technology (Illumina chips to start, and then subsequent validation with other techniques) and to have a large sample set.  They examine the genomes of over 2000 healthy individuals.  The point of looking at healthy folks is that they form the reference set essentially: you can now take the samples from affected patients and subtract the things that healthy folks appear to share.  This helps to narrow down your search for CNVs that might cause disease conditions.  They offer various statistics on the types and sizes of the structural variants observed in the healthy population.  It reminded me of another talk I heard at ASHG called “The first map of dispensable regions in the human genome” by Terry Vrijenhoek et al–which was a cool talk that began with a Facebook chat that had us all giggling–but the serious message was there’s a lot of missing genome healthy people appear to tolerate just fine….

The paper goes on to describe the creation of their web interface.  Although I couldn’t find it mentioned in the paper, I asked one of the authors and my suspicion that it was based on GBrowse was confirmed–I thought the tracks and controls appeared “GBrowsy” to me.  It shows the variations on the graphical display.  The deletions are red, the duplications are blue.  There is also a table that contains the data which you can color code to indicate uniqueness with green.  And the table provides a column that summarizes the genes in that region (if there are some), and links to the UCSC Genome Browser in that region so you can choose to go there and examine the other genomic features in that region.  When you have that loaded at UCSC, the data becomes a custom track that you can then examine with all the UCSC tools, including detailed queries with the table browser.  It’s a nice example of a big data set from a publication getting displayed at UCSC for further query options.
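
If you would like to load your own CNV calls at UCSC in a similar way, here is a minimal Python sketch. It writes a BED custom track with the itemRgb column filled in, so that deletions draw in red and duplications in blue, matching the display described above. The sample calls, the track name, and the function name are made up for illustration, and this is not a description of how the CHOP CNV site builds its own track.

```python
# Colors follow the convention described above: deletions red, duplications blue.
COLORS = {"deletion": "255,0,0", "duplication": "0,0,255"}

def cnvs_to_bed(calls, track_name="my_cnvs"):
    """Format (chrom, start, end, name, kind) tuples as a 9-column BED custom track.
    Coordinates are assumed to be 0-based and half-open, as BED expects."""
    lines = [f'track name="{track_name}" description="CNV calls" itemRgb="On"']
    for chrom, start, end, name, kind in calls:
        rgb = COLORS[kind]
        # Columns: chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb
        lines.append(f"{chrom}\t{start}\t{end}\t{name}\t0\t.\t{start}\t{end}\t{rgb}")
    return "\n".join(lines) + "\n"

# Hypothetical calls, for illustration only.
calls = [
    ("chr1", 1000000, 1050000, "sample1_del", "deletion"),
    ("chr6", 2500000, 2600000, "sample1_dup", "duplication"),
]
bed_text = cnvs_to_bed(calls)
print(bed_text)
```

The resulting text can be pasted into the UCSC custom track form, after which the table browser and the other UCSC tools work on it like any other track.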

Another nice feature of the tabular display is that it also links to FABLE.  FABLE is a literature mining tool (Fast Automated Biomedical Literature Extraction) that will search for papers relating to the genes you find in that region–so you can quickly assess what’s known about a given gene in a CNV region.

They also include a compelling “application” as a way to illustrate how you can use the CHOP CNV resource to make discoveries.  There was a clinical sample of a patient with a number of congenital anomalies.  The CNV detection of the genomic sample indicated that 32 of the 35 variations this patient had existed in the healthy controls–which means that targeting the remaining 3 for further study provides a much more helpful focus on the likely issues.  There were a couple of other examples of utility as well.
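
The "subtract what healthy folks share" step can be sketched in Python as a simple interval filter. This is a minimal sketch under loud assumptions: the calls below are invented, the tuple layout and function names are hypothetical, and treating any overlap of the same CNV type as a match is a simplification (real analyses usually require a minimum reciprocal overlap). It is not the method used in the paper.

```python
def overlaps(a, b):
    """True if two (chrom, start, end, kind) calls are on the same chromosome,
    are the same kind of CNV, and share at least one base (half-open coordinates)."""
    return a[0] == b[0] and a[3] == b[3] and a[1] < b[2] and b[1] < a[2]

def novel_calls(patient, controls):
    """Keep the patient CNVs that overlap no CNV seen in the control set."""
    return [p for p in patient if not any(overlaps(p, c) for c in controls)]

# Invented example calls: (chromosome, start, end, kind)
patient = [
    ("chr1", 100, 200, "del"),
    ("chr2", 500, 900, "dup"),
    ("chr7", 1000, 1500, "del"),
]
controls = [
    ("chr1", 150, 300, "del"),    # overlaps the chr1 deletion
    ("chr2", 500, 900, "del"),    # same place, but a different kind of CNV
    ("chr7", 1500, 1600, "del"),  # adjacent to the chr7 call, no shared base
]
candidates = novel_calls(patient, controls)
print(candidates)
```

Here the chr1 deletion is dropped because a control carries an overlapping deletion, while the chr2 and chr7 calls survive as candidates for follow-up.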

When I asked the CHOP CNV team some questions about their Figure 1 in the paper (it showed what appeared to be lab group names with data sets), I was told that new versions will be coming that will offer some new features–including an option to upload your own samples to compare them to their data set.

If you are interested in structural variations in the genome you should check out the CHOP CNV database.  You might find some helpful information for your project!  I almost forgot to note–you can download all the data as well, and use it with other data you may have or for other analysis tools.

Direct to the site: http://cnv.chop.edu/

++++++++++++++++++
Shaikh, T., Gai, X., Perin, J., Glessner, J., Xie, H., Murphy, K., O’Hara, R., Casalunovo, T., Conlin, L., D’Arcy, M., Frackelton, E., Geiger, E., Haldeman-Englert, C., Imielinski, M., Kim, C., Medne, L., Annaiah, K., Bradfield, J., Dabaghyan, E., Eckert, A., Onyiah, C., Ostapenko, S., Otieno, F., Santa, E., Shaner, J., Skraban, R., Smith, R., Elia, J., Goldmuntz, E., Spinner, N., Zackai, E., Chiavacci, R., Grundmeier, R., Rappaport, E., Grant, S., White, P., & Hakonarson, H. (2009). High-resolution mapping and analysis of copy number variations in the human genome: A data resource for clinical and research applications Genome Research, 19 (9), 1682-1690 DOI: 10.1101/gr.083501.108

UCSC Genome Browser on TV

17 July, 2009 (10:38) | General Science, Genomics Research | By: Mary

Sometimes I complain about my TiVo.  It thinks I like to cook and that I speak Portuguese.  Neither of these things is correct (not that there’s anything wrong with that).  But it must also know that I like science because it does find some nuggets that are suitable.  And when I got back from a recent road trip of trainings on the UCSC Genome Browser it cracked me up to see the browser on my TV!

The segment is about finding genes related to autism.  As a popular press sort of story it doesn’t quite get all the science right.  There’s some phrasing of the description of autism that I think is incorrect–or misleading as it was described.  But they talk about the importance of collections of DNA from affected families, they interview Mark Daly and Rudy Tanzi, and they show some software that identifies CNVs (copy number variations) and Rudy shows genes on the UCSC Genome Browser.

I like to see working scientists doing this kind of outreach.  I think it is something we need more of.  Click the image to go to the site and watch the piece (about 15 minutes long).

Specific episode and segment link:  http://www.pbs.org/wgbh/nova/sciencenow/0402/04.html

DGV releases a pre-publication data set

5 May, 2009 (08:10) | Genomics Research, Genomics Resource News | By: Mary

I got my newsletter for May from the Database of Genomic Variants, or DGV.  They announce the availability of a large data set of variants from HapMap individuals.  There are more than 8000 variations available in this set.

It’s not peer-reviewed at this point, so keep that in mind.  But if you are eager for new CNVs (copy number variations), you may want to have a look.

This data are released in DGV pre-publication, and we will therefore not incorporate these regions with the rest of the data in DGV (which has all gone through peer-review).
At this stage, the data will be made available through DGV in two ways. The entire data set will be available as a text file for download on the DGV download page, and it will be shown as a separate track in the DGV browser under the heading “Provisional data release from the Genome Structural Variation Consortium”, in a track with the name “NG42M_CNV (CNVE)”.

The data is subject to the “Fort Lauderdale” non-scoop rules:  you can use the data, but the data’s owners reserve the right to publish on global aspects of the data set first. You can see more on the details of use here: http://projects.tcag.ca/variation/ng42m_cnv.php

You can access DGV here: http://projects.tcag.ca/variation/

The newsletter with the details links from the bottom of the homepage.  Here’s a link to that (warning, PDF): http://projects.tcag.ca/variation/DGV_Newsletter.pdf

How do you represent genomes?

9 March, 2009 (16:49) | Genomics Research | By: Trey

Not just the genome, but genomeS. As Jan at Saaien Tist has mentioned, human (and other species) genomes are quite variable. Though the linear representation of genome browsers (like the UCSC Genome Browser, Ensembl, GBrowse and MapViewer among others) makes perfect sense for much annotated data of the genome, structural variations are not so well visualized in a linear representation. And, as we are finding that human and other species’ genomes are quite variable, we might need to come up with another way to visualize these genomic data beyond the ‘reference genome’ linear model. Jan suggests de Bruijn graphs, pictured here. I find some difficulty in ‘visualizing’ how these are going to work for the _other_ annotations in the data. Though this representation looks like it might work great for CNV and the like, it seems to make viewing other types of data (expression, SNP, etc) more complicated. I’m looking forward to seeing how this develops.
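
To make the de Bruijn idea a little more concrete, here is a minimal Python sketch. It builds a graph whose nodes are (k-1)-mers and whose edges come from the k-mers of two toy sequences, one of which carries a small tandem duplication. The sequences, the value of k, and the function name are made up for illustration and are not taken from Jan's post.

```python
from collections import defaultdict

def de_bruijn(sequences, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in any sequence."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])   # edge: prefix node -> suffix node
    return graph

reference = "ACGTTGCA"
variant = "ACGTTGTTGCA"   # "TTG" repeated: a toy duplication
graph = de_bruijn([reference, variant], k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

The sequence the two genomes share is stored once, and the duplication shows up as a loop (TTG -> TGT -> GTT -> TTG) rather than as extra length on a line. That is the appeal for structural variation, and it also hints at the difficulty: an annotation with a single linear coordinate has no obvious place to sit on a loop.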

Or perhaps we’ll be looking at genomes like this (ok, maybe not, but it’s geeky cool).