Category: New Resource

Tip of the Week: 1000 Genomes Project Browser

21 July, 2010 (08:27) | Genomics Research, New Resource, Tip of the Week | By: Mary


You may have been hearing about the 1000 Genomes project–it’s one of the ongoing “big data” projects that is going to yield a great deal of variation information about the human genome. The goal is to sequence well over 1,000 genomes to identify “most genetic variants that have frequencies of at least 1% in the populations studied”. They are doing this by sequencing large numbers of samples at 4x coverage. You can read more about their strategy on the About page of their web site, which also lists the anticipated sample populations.

In this week’s Tip of the Week I’m going to take a quick spin through their browser. (You can also download all the data, but I’ll be focusing on the browser.) They have begun to release data, and 6 individual sequences are available at this time. These are part of their “pilot” studies. You can get some details on the pilot from their About page, which links to this PDF about the samples.

They are using the Ensembl framework to display their data, so if you are familiar with Ensembl you’ll find it easy to move around this browser. One thing that isn’t apparent right away is that you can click the Resembl link on the display to turn on a track that puts the read/coverage data in the viewer. I also liked the alignment display of all 6 genomes–but I’m sure that’s going to get challenging to view as more and more genomes are added.

In an exchange with their very helpful help desk yesterday, I got this quick summary of the samples you’ll see:

For the high-coverage populations, NA12891, NA12892 and NA12878 are the CEU trio, and NA19238, NA19239 and NA19240 are the YRI trio. In both trios the IDs are father, mother and child respectively, and both children are daughters.

If you have questions about their data, be sure to go ask them for help–they were very speedy with answers for me :) .

Some of the project data has also been picked up by UCSC, and you can access the same sequences in the UCSC Genome Browser in the Genome Variants track on the March 2006 human assembly. (You’ll also see Venter, Watson, and some other individual genomes there.)

Quick links:

The Project: http://www.1000genomes.org/

The Browser: http://browser.1000genomes.org/

An article in Science with some background:  A Plan to Capture Human Diversity in 1000 Genomes

We’ve got widgets

28 June, 2010 (13:57) | Genomics Research, New Resource | By: Trey

I’ve mentioned others’ widgets before. They can be very handy tools on websites and blogs to add content and useful interactive searches, etc.

Well, we now have our own. As many of our readers know, we have a genomics and bioinformatics search engine that helps researchers find the database or analysis tool that best fits their needs. Type in a term and you get a list of genomics resources ranked by relevance. You are also shown the context in which the term was found (on the resource’s web site, or in our tutorials or blog if it appears there). Additionally, you’ll find tutorials we’ve created on nearly 100 of these resources: about a dozen free to the user, like PDB, SGKB and the UCSC Genome Browser, and another 80 or so by subscription.

Anyway, you can now put the search (which of course is publicly available) on your blog or web site using one of the widgets we’ve just had created (by the same people who helped create our database search). They come in three sizes, and you can find them and their code on this page.

You’ll also see I’ve put the smaller widget on the right column here on the blog. You can put a term in there and test it out. It will open another page with the results of our search. Try it out!

Guest Post: SNAP — Andrew Johnson

22 June, 2010 (14:01) | Genomics Research, Genomics Resource News, Guest Posts, New Resource | By: Trey

This next post in our continuing semi-regular Guest Post series is from Andrew Johnson, one of the developers and the concept designer of SNAP, SNP Annotation and Proxy Search which is hosted at the Broad Institute. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com or the contact form (write ‘guest post’ as subject heading). We welcome introductions to your resource, information on updates, highlights of little known gems or opinion pieces on the state of genomic research and databases.

SNAP (http://www.broadinstitute.org/mpg/snap/, Johnson et al. (2008) Bioinformatics 24(24): 2938), “SNP Annotation and Proxy search”, is a flexible, web-based tool that allows anyone in the world to quickly accomplish a range of SNP-related genetics and bioinformatics tasks. This post highlights some common questions and features of SNAP, some more obscure uses, and recent and planned developments.

How did SNAP come about?

The idea for SNAP was originally sparked by GWAS analysts within a large collaborative group (the Framingham Heart Study SHARe project). This was in the pre-imputation era, when GWAS investigators from different groups using different SNP arrays often wanted to find the best proxy SNPs based on HapMap for comparison when their groups had no genotyped SNPs in common. We initially implemented local programs to look up HapMap LD and also consider the presence of query and proxy SNPs on different commercial genotyping arrays. We quickly realized this was a community-wide problem when we received requests from outside collaborators, so we decided it was worth developing a public tool and approached investigators at the Broad Institute. Through collaboration with Paul de Bakker, Bob Handsaker and others at the Broad Institute we were able to add more features like plotting and build a nice, quick and accessible interface. Many people have contributed ideas, testing and improvements to SNAP, and Bob Handsaker and Pei Lin in particular continue to maintain and update SNAP.

What do you use SNAP for the most?

The two most widely used features of SNAP are 1) SNP LD queries and 2) plotting of LD and association data. There are a number of flexible options for these functions. Beyond these, as a SNP bioinformatics specialist, I often use SNAP to rapidly retrieve information about a list of SNPs for other uses (see specialized queries below).
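To make the proxy idea concrete: given phased haplotypes (as HapMap provides), you compute LD as r² between the query SNP and each candidate, then keep the best candidate that is actually on the other group’s array. The Python below is a minimal sketch of that general technique under our own assumptions (0/1-coded haplotypes, a default r² cutoff of 0.8, made-up SNP names and data); it is not SNAP’s code or its exact method.

```python
from typing import Dict, Optional, Sequence, Set, Tuple


def r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """LD r^2 between two biallelic SNPs, from phased haplotypes coded 0/1."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                            # monomorphic SNP: no usable LD
    d = p_ab - p_a * p_b                      # linkage disequilibrium D
    return d * d / denom


def best_proxy(
    query: str,
    haplotypes: Dict[str, Sequence[int]],
    on_array: Set[str],
    min_r2: float = 0.8,                      # assumed cutoff, adjust as needed
) -> Optional[Tuple[str, float]]:
    """Highest-r^2 SNP on the target array that passes the cutoff, or None."""
    best = None
    for snp, hap in haplotypes.items():
        if snp == query or snp not in on_array:
            continue
        r2 = r_squared(haplotypes[query], hap)
        if r2 >= min_r2 and (best is None or r2 > best[1]):
            best = (snp, r2)
    return best


# Made-up example: 8 phased haplotypes for four hypothetical SNPs.
HAPS = {
    "rs_query":   [0, 0, 1, 1, 0, 1, 0, 1],
    "rs_perfect": [0, 0, 1, 1, 0, 1, 0, 1],   # perfect LD, but not on the array
    "rs_good":    [0, 0, 1, 1, 0, 1, 0, 0],
    "rs_poor":    [1, 0, 0, 1, 0, 1, 1, 0],
}
ARRAY = {"rs_good", "rs_poor"}

if __name__ == "__main__":
    print(best_proxy("rs_query", HAPS, ARRAY))              # None at r^2 >= 0.8
    print(best_proxy("rs_query", HAPS, ARRAY, min_r2=0.5))  # rs_good, r^2 = 0.6
```

With real data you would load phased genotypes instead of toy vectors; r² is only one LD measure (D′ is another), and the cutoff is a judgment call.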

What are some commonly asked questions from users of SNAP?

Click to continue reading “Guest Post: SNAP — Andrew Johnson”

Tip of the Week: BioCatalogue for finding web services

16 June, 2010 (08:45) | Genomics Research, Genomics Resource News, New Resource, Tip of the Week | By: Mary

A couple of years back at a conference I was introduced to BioCatalogue. It seemed to me to be a really useful idea: locate bioinformatics tools and databases that are web-accessible and that also offer web services, a way to access the tool or server without going through the site’s main web interface. There are some introductions to the concept of web services out there–a few are aimed at general readers, but most are aimed at programmers. Essentially, a web service is a kind of back door into the tool: it lets you pull out the information you need in the ways you want, without being constrained by the main user interface.
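What that “back door” looks like in practice varies by service, but many are REST-style: you encode your query as URL parameters and get structured data (often JSON or XML) back instead of a web page. A minimal Python sketch of the pattern, assuming a hypothetical endpoint (`https://example.org/api/genes`) and an assumed JSON response shape; a real service’s documentation defines its actual parameters and output.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical REST endpoint standing in for a service found via a catalogue.
BASE_URL = "https://example.org/api/genes"


def build_query_url(symbol: str, species: str) -> str:
    """Encode the query as URL parameters, as many REST services expect."""
    params = {"symbol": symbol, "species": species, "format": "json"}
    return BASE_URL + "?" + urlencode(params)


def extract_ids(payload: str) -> List[str]:
    """Keep only the 'id' field from an (assumed) JSON list of records."""
    records = json.loads(payload)
    return [rec["id"] for rec in records if "id" in rec]


def fetch_ids(symbol: str, species: str, timeout: float = 10.0) -> List[str]:
    """Live network call; only works against a real service."""
    with urlopen(build_query_url(symbol, species), timeout=timeout) as resp:
        return extract_ids(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_query_url("COL3A1", "human"))
    sample = '[{"id": "G1", "name": "first"}, {"name": "record without id"}]'
    print(extract_ids(sample))  # ['G1']
```

Keeping URL building and response parsing separate from the network call makes each piece easy to test offline.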

BioCatalogue is a curated collection of these web services. The creators of BioCatalogue provide the framework and perform some of the collection and annotation–but they also enable the user community to bring in web services and annotate them as well. This means that you can use BioCatalogue to find and learn more about the services, and you can feed back into the system if you join the community. If you are a software provider you can register your service there, so more people can find you and learn about your project. Another really nice aspect of BioCatalogue is that they monitor the services. As we know at OpenHelix, plenty of times a tool you have accessed in the past is suddenly unavailable. Sometimes there are intermittent server problems, but sometimes there are longer-term issues. BioCatalogue regularly checks the status of the tools, so you can have confidence that a tool has been up and seems stable.
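The post doesn’t describe how BioCatalogue’s monitoring works internally, so the following is only a minimal sketch of the general idea using Python’s standard library: request a service URL, label the outcome, and track how often past checks found it up. The labels and the “up” rule (any 2xx/3xx status) are our own assumptions.

```python
import urllib.error
import urllib.request
from typing import List, Optional


def classify(status: Optional[int]) -> str:
    """Map an HTTP status code (None = no response at all) to a coarse label."""
    if status is None:
        return "unreachable"
    if 200 <= status < 400:
        return "up"
    return "error"


def check_service(url: str, timeout: float = 10.0) -> str:
    """Request the URL once and label the outcome."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:   # server answered with 4xx/5xx
        return classify(err.code)
    except OSError:                          # DNS failure, refused, timeout, ...
        return classify(None)


def uptime(history: List[str]) -> float:
    """Fraction of past checks labelled 'up' (0.0 if there is no history)."""
    return history.count("up") / len(history) if history else 0.0


if __name__ == "__main__":
    # Offline example with a made-up check history.
    print(uptime(["up", "up", "error", "up"]))  # 0.75
```

`HTTPError` is caught before the broader `OSError` because it is a subclass of it; a real monitor would also schedule checks and store the history.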

The Web Server issue (see the 2009 issue here) of Nucleic Acids Research provides a wealth of information about useful servers with bioinformatics tools. And there’s a paper in the 2010 Web Server issue about BioCatalogue that offers more details on the background (linked below). In this week’s movie I can only briefly introduce the site and the features available. Check out the paper from the BioCatalogue team, and explore the documentation wiki to learn more about the features and functions provided.

Now, these web services are not for everyone. For many people the main user interface will still be the best way to access a tool. But if you need more advanced or customized queries, if you want to feed data into your own tools, or if you want to use some of the cool workflow software that’s out there now (such as Galaxy or Taverna)–web services may be right for you.

Check out BioCatalogue  (and remember the -ue spelling!) http://www.biocatalogue.org/

Bhagat, J., Tanoh, F., Nzuobontane, E., Laurent, T., Orlowski, J., Roos, M., Wolstencroft, K., Aleksejevs, S., Stevens, R., Pettifer, S., Lopez, R., & Goble, C. (2010). BioCatalogue: a universal catalogue of web services for the life sciences Nucleic Acids Research DOI: 10.1093/nar/gkq394

Tip of the Week: Database of mouse databases

2 June, 2010 (09:15) | Genomics Resource News, New Resource, Tip of the Week | By: Mary


We are acutely aware of the thousands of bioinformatics resources out there, and we are often asked for guidance on finding a particular type of tool for some function or other. There are some excellent lists that attempt to catalog the various tools–the NAR Database Issue and corresponding list, the Resource Collection at the Univ. of Pittsburgh, and others. But recently we saw one developed with a specific focus, which claims to bring together over 200 resources for the mouse. The Mouse Resource Browser collects and categorizes a number of different types of things–not just databases, as we’ll see. Find them here: http://bioit.fleming.gr/mrb

The curated collection of sites that may be of use to mouse researchers has a number of features. The developers used a questionnaire to elicit information from the resource providers, and when they didn’t get that input they created some basic information for the records themselves. You can do a basic search for resources with a quick search box, and there is also an advanced search option. I found browsing by category (they have 22 categories) the most informative way to figure out what kinds of resources they had collected.

The data for a given record is organized across a series of tabs:

  • General: description, highlights and subject matter of the resource
  • Ontologies and Standards: if the resource relies on any of the important vocabularies or standards formats in the field, they are listed here
  • Technical: details of implementation, type of database, access methods, if there is a web services component, whether there are downloads or not
  • CASIMIR DDF: this is an interesting tab that assesses some of the features of the resources such as currency/updates, quality control process, versioning, technical documentation, user support, and more.
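If you wanted to mirror this record layout in your own local list of resources, one way is a small Python data class with one group of fields per tab, plus a helper that reproduces the browse-by-category view. This is purely a hypothetical sketch: the field names, category names and example entries are ours, not MRB’s schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceRecord:
    """Hypothetical model of one resource entry, one field group per tab."""
    name: str
    category: str
    # General: description / highlights / subject matter
    description: str = ""
    # Ontologies and Standards: vocabularies or formats the resource relies on
    standards: List[str] = field(default_factory=list)
    # Technical: implementation and access details
    has_web_services: bool = False
    has_downloads: bool = False
    # CASIMIR DDF: assessment criterion -> value (e.g. "versioning" -> "yes")
    casimir: Dict[str, str] = field(default_factory=dict)


def browse_by_category(records: List[ResourceRecord]) -> Dict[str, List[str]]:
    """Group resource names by category, sorted for stable display."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        groups[rec.category].append(rec.name)
    return {cat: sorted(names) for cat, names in sorted(groups.items())}


# Made-up example entries.
RECORDS = [
    ResourceRecord("example-genome-db", "Databases", has_downloads=True),
    ResourceRecord("example-pathways", "Pathways", has_web_services=True),
    ResourceRecord("example-mailing-list", "Community"),
]

if __name__ == "__main__":
    print(browse_by_category(RECORDS))
    print([r.name for r in RECORDS if r.has_web_services])
```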

Although the focus is mouse, you’ll see some broader types of resources in there. For example, the UCSC Genome Browser is listed because it includes a mouse database, and Reactome is listed too. These cover a range of species that includes mouse, but are certainly not focused on mice. Other types of resources include commercial suppliers such as Charles River. So it isn’t limited to sequence databases and the like–it covers more aspects that researchers using mouse as a model system might find useful.

There are some choices they made that I’m not sure I would have made. They list the MGI mailing list as a separate entry from MGI. But as I thought more about it, I could see why: there is good information there, and if you don’t know of it already, a pointer might help. Still, thinking of the 200+ resources just for mouse, I suspect choices like this inflate the total somewhat.

If you use mouse as your model system, you will probably find some useful databases and other web sites that are handy for your work.  If you don’t work with mice, there are probably still some useful resources for your work as well.  Check out MRB’s site for more information: http://bioit.fleming.gr/mrb

Reference:
Zouberakis, M., Chandras, C., Swertz, M., Smedley, D., Gruenberger, M., Bard, J., Schughart, K., Rosenthal, N., Hancock, J., Schofield, P., Kollias, G., & Aidinis, V. (2010). Mouse Resource Browser–a database of mouse databases Database, 2010 DOI: 10.1093/database/baq010

Guest Post: WAVe – Pedro Lopes

25 May, 2010 (00:02) | Genomics Resource News, Guest Posts, New Resource | By: Guest

This next post in our continuing semi-regular Guest Post series is from Pedro Lopes, developer of WAVe at the University of Aveiro Bioinformatics Group in Aveiro, Portugal. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com or the contact form (write ‘guest post’ as subject heading). We welcome introductions to your resource, information on updates, highlights of little known gems or opinion pieces on the state of genomic research and databases.

I would like to start by thanking Trey Lathe for the opportunity to promote WAVe on this great blog. Following his short Tip of the Week post, I’ll now try to give a more detailed overview of this new application.

What is WAVe?

WAVe stands for Web Analysis of the Variome and is a simple application focused on centralizing access to distributed and heterogeneous locus-specific databases (LSDBs). LSDBs are an emerging type of bioinformatics application, aiming to provide gene-centric information about discovered genomic variants. In WAVe, we offer access both to LSDBs and to their variants. Moreover, we also provide access to a comprehensive list of carefully selected external resources. With this, users have access, in a single lightweight and easy-to-use web application, to gene and variation information enriched with a multitude of gene-related resources.

What are WAVe’s key features?

At this early stage, WAVe’s publicly available features are related to data access. Users can easily browse through available genes, search for genes, view gene information and access each gene’s RSS feed. On WAVe’s entry page, users simply need to start typing a gene’s HGNC-approved symbol and several suggestions will appear: accepting one of them leads directly to the gene view page. Following the “view all” link, users can browse all available genes and check, for each gene, how many LSDBs and variants are available.

To access the application data, users just need to navigate the gene tree. Each tree node represents a distinct data type, and the leaves provide access to external applications: clicking a leaf loads the destination page in the main content area. Repeating this process, users can navigate through the dozens of links listed for each gene.

WAVe also offers its core data to other developers. To obtain the gene tree and its links, users just need to add “rss” to the end of the gene’s address. This outputs an RSS 2.0 feed that can easily be parsed by any application or added to a feed reader.
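Since the feed is standard RSS 2.0, Python’s built-in `xml.etree.ElementTree` is enough to read it. A minimal sketch, assuming you have already downloaded the feed text (for example with `urllib.request.urlopen`); we don’t reproduce WAVe’s exact URL here, and how WAVe maps tree nodes onto feed items is an assumption, so the sample feed below is made up.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_rss_items(xml_text: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


# Made-up feed in generic RSS 2.0 shape; the real WAVe feed content will differ.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example gene feed</title>
    <link>https://example.org/gene</link>
    <description>Illustrative only</description>
    <item><title>Example LSDB</title><link>https://example.org/lsdb</link></item>
    <item><title>Example protein entry</title><link>https://example.org/protein</link></item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for title, link in parse_rss_items(SAMPLE_FEED):
        print(title, "->", link)
```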

How was WAVe born?

The European GEN2PHEN project is an initiative to link, as deeply as possible, genotype data to the corresponding phenotype data. The first step was an attempt to improve various genomic variation resource scenarios. This involved normalizing LSDBs (the “LSDB-in-a-box” approach, LOVD) and defining novel data models and formats for data exchange to and from LSDBs.

In the long term, applying the GEN2PHEN-approved data models will ease the creation of new services and applications that integrate and interact with the exponentially growing body of genomic variation data.

With WAVe we tried a different approach, based on three questions: Why wait for everyone to adopt these new formats? What will happen to legacy LSDBs that won’t adopt them? How can we have an immediate solution? We created a lightweight integration architecture based on links to applications, and adopted a simple (yet familiar) tree-based navigation interface, to deploy a new application that can be used right now and will easily scale to integrate the anticipated data exchange formats. Technical details aside, based on a manually curated LSDB list, we can connect and integrate any kind of LSDB application, whether it is a modern LOVD application or a simple text-based legacy LSDB.
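The link-based idea is simple enough to sketch: a manually curated list of links, grouped per gene into typed nodes whose leaves are URLs. The Python below is our own illustration under that assumption; the rows, node names and URLs are placeholders, not WAVe’s actual list or code (only the P02461 accession comes from the example later in this post).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical curated list: (gene symbol, node type, link label, URL).
# Placeholder URLs; not WAVe's actual data.
CURATED_LINKS = [
    ("COL3A1", "LSDB", "Example COL3A1 LSDB", "https://example.org/lsdb/COL3A1"),
    ("COL3A1", "Protein", "UniProt P02461", "https://example.org/uniprot/P02461"),
    ("COL3A1", "Pathway", "Example pathway", "https://example.org/pathway/1"),
    ("COL1A1", "LSDB", "Example COL1A1 LSDB", "https://example.org/lsdb/COL1A1"),
]

Tree = Dict[str, Dict[str, List[Tuple[str, str]]]]


def build_gene_tree(links: Iterable[Tuple[str, str, str, str]]) -> Tree:
    """Group curated links into gene -> node type -> [(label, url), ...]."""
    tree = defaultdict(lambda: defaultdict(list))
    for gene, node, label, url in links:
        tree[gene][node].append((label, url))
    # Convert to plain dicts so the result prints and compares cleanly.
    return {gene: dict(nodes) for gene, nodes in tree.items()}


if __name__ == "__main__":
    tree = build_gene_tree(CURATED_LINKS)
    for node, leaves in tree["COL3A1"].items():
        print(node, [label for label, _ in leaves])
```

The appeal of this design, as described above, is that adding a legacy LSDB only takes one more curated row, not a new data format.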

How is it relevant?

To demonstrate WAVe’s efficiency, let’s try a simple search from our lab: Are there any LSDBs for the COL3A1 gene in humans? Any known variants? And what are the associated proteins and pathways?

In a WAVe-free scenario, to find COL3A1 LSDBs (if any), researchers need to google it (the main COL3A1 LSDB does not appear on the first results page) or, if they are used to it, go to the HGVS site, open the “Databases & Tools” section, select “Locus-specific Mutation Databases” and then search for the gene in the search box. For the variants, researchers then need to browse the last page they entered. How many clicks (and how much time!) does that take?

For protein information, researchers go to UniProt and search for COL3A1: that gives about 29 results. Adding a filter for the human species leaves 5 results, which is good enough to go directly to P02461 (the reviewed SwissProt entry). However, that means another window/tab is open. For pathway information, a KEGG quick search for COL3A1 lists 14 results. In the end, researchers have about 3 windows/tabs open and have made some 20 mouse clicks to obtain the desired information.

Using WAVe, researchers simply need to open WAVe, start typing the gene’s HGNC symbol, select COL3A1 from the suggestions and go to the COL3A1 page. Once there, it’s as easy as browsing the tree… Variations? Check the variation node; they’re even grouped by change type. UniProt information? Check the protein node, where you have direct access to SwissProt, TrEMBL, PDB, ExPASy and InterPro. I think you get the picture. In the end: one window/tab and about 6 or 7 mouse clicks.

Other UA.PT Bioinformatics tools

At the University of Aveiro’s Bioinformatics research group we are mainly young and enthusiastic computer science experts, simply trying to make biology easier (at least in terms of computer applications!). Our most relevant web-based tools include MIND (a microarray analysis tool), GeneBrowser (a gene expression tool, useful for processing data gathered from systems like MIND) and QuExT (a comprehensive MEDLINE mining application).

-Pedro Lopes

Tip of the Week: International Cancer Genome Consortium

28 April, 2010 (09:10) | Genomics News, Genomics Research, Genomics Resource News, New Resource, Tip of the Week | By: Mary

So, remember that tidal wave of data we were going to get from the human genome project?  Yeah.  That was a puddle compared to what’s coming your way now. For this week’s tip of the week I will introduce the very ambitious big data project from the International Cancer Genome Consortium (ICGC).  In addition, you’ll get your first look at the shiny new interface for BioMart!

People reading this blog know that we have made great progress on many fronts in the war on cancer.  But there’s an awful lot we don’t know yet.  The ICGC network of researchers plans to change that.  This international group of researchers has organized and standardized an effort to learn about tumors.  From their homepage:

ICGC Goal: To obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical and societal importance across the globe.

Check that out:

  • 50 tumor types. Oh–and by the way–they will also obtain a normal tissue sample from the same individual, so you can see what’s part of the normal constitution and what has changed in the tumor.
  • Hundreds of samples of each tumor type. Except for some rare tumors, they intend to obtain 500 samples of each.
  • More than a dozen types of cancer. Breast, lung, brain, pancreas, liver, leukemia…and on and on.
  • Genomic. Transcriptomic. Epigenomic. Each of these is a separate data set that needs to be obtained. Oh, and already there are simple variants (changes of small numbers of nucleotides), CNVs, structural rearrangements, expression data…. And that’s just the initial release.

Are you overwhelmed yet?  50 x 500 x more than a dozen x 3+ types of data (and that’s just back-of-the-napkin, there’s more…)  I am daunted just thinking about the scale of this.
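To put rough numbers on that napkin, here is the arithmetic in Python, using only the goals quoted above. Assumptions (ours): exactly 500 samples for every tumor type, every data type generated for both the tumor and the matched normal sample, and just three data types. We also leave out the “more than a dozen” cancers factor, since those cancers are already counted among the 50 tumor types. Treat the result as an order-of-magnitude figure, not a project estimate.

```python
# Back-of-the-napkin scale for the ICGC goals (rough assumptions, see lead-in).
TUMOR_TYPES = 50          # tumor types and/or subtypes in the ICGC goal
SAMPLES_PER_TYPE = 500    # target per type (rare tumors will have fewer)
TISSUES_PER_DONOR = 2     # tumor plus matched normal tissue
DATA_TYPES = 3            # genomic, transcriptomic, epigenomic (at least)

tumors = TUMOR_TYPES * SAMPLES_PER_TYPE
tissue_samples = tumors * TISSUES_PER_DONOR
datasets = tissue_samples * DATA_TYPES

print(f"{tumors:,} tumors -> {tissue_samples:,} tissue samples -> {datasets:,} data sets")
```

That prints `25,000 tumors -> 50,000 tissue samples -> 150,000 data sets`, before counting the separate variant, CNV and expression analyses built on top.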

They have organized and standardized the protocols, technologies, data collection, data submissions, and more. You should check out their marker paper for a complete description of their framework. They are going to make 2 types of data available: an open-access data set that is de-identified, and a controlled-access data set with clinical details that you’ll have to register to use.

Do note, though: the data (as with all these large data projects) are subject to data usage policies that you need to be aware of. There is a publication moratorium that gives the data submitters a window to publish their findings before others are allowed to publish. It’s that typical balance of rapid access to data plus a no-scoop window for the data providers. Be sure to familiarize yourself with the policies if you are going to use this data.

But let’s say you are ready for it–you understand the framework, you understand the usage policies–how do you get the data? You use the very cool new interface for BioMart! This is your first opportunity to look at the GUI developed for BioMart v0.8. It’s an early version, and there’s more coming. But that’s how you are going to be able to build great custom queries on the underlying data and pull it down. You may be familiar with BioMart from any number of places now (Ensembl, Gramene, FlyBase, WormBase….more). But this is the first implementation of the new look, and you are going to want to check it out.
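Behind the GUI, BioMart queries can also be written as XML and sent to a mart’s `martservice` endpoint, which is handy once you have settled on a query interactively. The Python below sketches building such a query with the standard library. It follows the XML query style used by BioMart 0.7-era marts (the style MartView can export); whether the ICGC’s v0.8 portal accepts exactly this form is an assumption, and the dataset, filter and attribute names and the base URL are placeholders, not real ICGC names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode


def build_biomart_query(dataset: str, filters: Dict[str, str],
                        attributes: List[str], formatter: str = "TSV") -> str:
    """Serialize a BioMart-style XML query (placeholder names, see lead-in)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,
        "header": "1",
        "uniqueRows": "1",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def martservice_url(base_url: str, query_xml: str) -> str:
    """GET-style URL carrying the XML in the 'query' parameter."""
    return base_url + "?" + urlencode({"query": query_xml})


if __name__ == "__main__":
    xml = build_biomart_query(
        "example_dataset",                 # hypothetical dataset name
        {"example_filter": "example"},     # hypothetical filter
        ["example_attribute_1", "example_attribute_2"],
    )
    print(xml)
    print(martservice_url("https://example.org/biomart/martservice", xml))
```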

For this week’s Tip of the Week you’ll see the ICGC site, and a quick query of the initial data that is available in the Data Coordination Center (DCC).  But this is just an appetizer.  Brace yourselves–the deluge is coming.

A Nature News article offers a nice overview, but be sure to check out the full paper for the project details.

The International Cancer Genome Consortium site: http://icgc.org/

Oh, and this made me laugh:

Be sure to contact the ICGC team if you have any questions. They want to help you use this data, and will be happy to answer your questions. Personally, I’m making it a mission to help them populate the FAQ–I’ve sent in questions. And so far the answers have been quite speedy :)

Oy. The reference is longer than the blog post.  Sigh.

Hudson (Chairperson), T., Anderson, W., Aretz, A., Barker, A., Bell, C., Bernabé, R., Bhan, M., Calvo, F., Eerola, I., Gerhard, D., Guttmacher, A., Guyer, M., Hemsley, F., Jennings, J., Kerr, D., Klatt, P., Kolar, P., Kusuda, J., Lane, D., Laplace, F., Lu, Y., Nettekoven, G., Ozenberger, B., Peterson, J., Rao, T., Remacle, J., Schafer, A., Shibata, T., Stratton, M., Vockley, J., Watanabe, K., Yang, H., Yuen, M., Knoppers (Leader), B., Bobrow, M., Cambon-Thomsen, A., Dressler, L., Dyke, S., Joly, Y., Kato, K., Kennedy, K., Nicolás, P., Parker, M., Rial-Sebbag, E., Romeo-Casabona, C., Shaw, K., Wallace, S., Wiesner, G., Zeps, N., Lichter (Leader), P., Biankin, A., Chabannon, C., Chin, L., Clément, B., de Alava, E., Degos, F., Ferguson, M., Geary, P., Hayes, D., Hudson, T., Johns, A., Kasprzyk, A., Nakagawa, H., Penny, R., Piris, M., Sarin, R., Scarpa, A., Shibata, T., van de Vijver, M., Futreal (Leader), P., Aburatani, H., Bayés, M., Bowtell, D., Campbell, P., Estivill, X., Gerhard, D., Grimmond, S., Gut, I., Hirst, M., López-Otín, C., Majumder, P., Marra, M., McPherson, J., Nakagawa, H., Ning, Z., Puente, X., Ruan, Y., Shibata, T., Stratton, M., Stunnenberg, H., Swerdlow, H., Velculescu, V., Wilson, R., Xue, H., Yang, L., Spellman (Leader), P., Bader, G., Boutros, P., Campbell, P., Flicek, P., Getz, G., Guigó, R., Guo, G., Haussler, D., Heath, S., Hubbard, T., Jiang, T., Jones, S., Li, Q., López-Bigas, N., Luo, R., Muthuswamy, L., Francis Ouellette, B., Pearson, J., Puente, X., Quesada, V., Raphael, B., Sander, C., Shibata, T., Speed, T., Stein, L., Stuart, J., Teague, J., Totoki, Y., Tsunoda, T., Valencia, A., Wheeler, D., Wu, H., Zhao, S., Zhou, G., Stein (Leader), L., Guigó, R., Hubbard, T., Joly, Y., Jones, S., Kasprzyk, A., Lathrop, M., López-Bigas, N., Francis Ouellette, B., Spellman, P., Teague, J., Thomas, G., Valencia, A., Yoshida, T., Kennedy (Leader), K., Axton, M., Dyke, S., Futreal, P., Gerhard, D., Gunter, C., Guyer, M., Hudson, T., McPherson, J., 
Miller, L., Ozenberger, B., Shaw, K., Kasprzyk (Leader), A., Stein (Leader), L., Zhang, J., Haider, S., Wang, J., Yung, C., Cross, A., Liang, Y., Gnaneshan, S., Guberman, J., Hsu, J., Bobrow (Leader), M., Chalmers, D., Hasel, K., Joly, Y., Kaan, T., Kennedy, K., Knoppers, B., Lowrance, W., Masui, T., Nicolás, P., Rial-Sebbag, E., Lyman Rodriguez, L., Vergely, C., Yoshida, T., Grimmond (Leader), S., Biankin, A., Bowtell, D., Cloonan, N., deFazio, A., Eshleman, J., Etemadmoghadam, D., Gardiner, B., Kench, J., Scarpa, A., Sutherland, R., Tempero, M., Waddell, N., Wilson, P., McPherson (Leader), J., Gallinger, S., Tsao, M., Shaw, P., Petersen, G., Mukhopadhyay, D., Chin, L., DePinho, R., Thayer, S., Muthuswamy, L., Shazand, K., Beck, T., Sam, M., Timms, L., Ballin, V., Lu (Leader), Y., Ji, J., Zhang, X., Chen, F., Hu, X., Zhou, G., Yang, Q., Tian, G., Zhang, L., Xing, X., Li, X., Zhu, Z., Yu, Y., Yu, J., Yang, H., Lathrop (Leader), M., Tost, J., Brennan, P., Holcatova, I., Zaridze, D., Brazma, A., Egevad, L., Prokhortchouk, E., Elizabeth Banks, R., Uhlén, M., Cambon-Thomsen, A., Viksna, J., Ponten, F., Skryabin, K., Stratton (Leader), M., Futreal, P., Birney, E., Borg, A., Børresen-Dale, A., Caldas, C., Foekens, J., Martin, S., Reis-Filho, J., Richardson, A., Sotiriou, C., Stunnenberg, H., Thomas, G., van de Vijver, M., van’t Veer, L., Calvo (Leader), F., Birnbaum, D., Blanche, H., Boucher, P., Boyault, S., Chabannon, C., Gut, I., Masson-Jacquemier, J., Lathrop, M., Pauporté, I., Pivot, X., Vincent-Salomon, A., Tabone, E., Theillet, C., Thomas, G., Tost, J., Treilleux, I., Calvo (Leader), F., Bioulac-Sage, P., Clément, B., Decaens, T., Degos, F., Franco, D., Gut, I., Gut, M., Heath, S., Lathrop, M., Samuel, D., Thomas, G., Zucman-Rossi, J., Lichter (Leader), P., Eils (Leader), R., Brors, B., Korbel, J., Korshunov, A., Landgraf, P., Lehrach, H., Pfister, S., Radlwimmer, B., Reifenberger, G., Taylor, M., von Kalle, C., Majumder (Leader), P., Sarin, R., Rao, T., Bhan, M., 
Scarpa (Leader), A., Pederzoli, P., Lawlor, R., Delledonne, M., Bardelli, A., Biankin, A., Grimmond, S., Gress, T., Klimstra, D., Zamboni, G., Shibata (Leader), T., Nakamura, Y., Nakagawa, H., Kusuda, J., Tsunoda, T., Miyano, S., Aburatani, H., Kato, K., Fujimoto, A., Yoshida, T., Campo (Leader), E., López-Otín, C., Estivill, X., Guigó, R., de Sanjosé, S., Piris, M., Montserrat, E., González-Díaz, M., Puente, X., Jares, P., Valencia, A., Himmelbaue, H., Quesada, V., Bea, S., Stratton (Leader), M., Futreal, P., Campbell, P., Vincent-Salomon, A., Richardson, A., Reis-Filho, J., van de Vijver, M., Thomas, G., Masson-Jacquemier, J., Aparicio, S., Borg, A., Børresen-Dale, A., Caldas, C., Foekens, J., Stunnenberg, H., van’t Veer, L., Easton, D., Spellman, P., Martin, S., Barker, A., Chin, L., Collins, F., Compton, C., Ferguson, M., Gerhard, D., Getz, G., Gunter, C., Guttmacher, A., Guyer, M., Hayes, D., Lander, E., Ozenberger, B., Penny, R., Peterson, J., Sander, C., Shaw, K., Speed, T., Spellman, P., Vockley, J., Wheeler, D., Wilson, R., Hudson (Chairperson), T., Chin, L., Knoppers, B., Lander, E., Lichter, P., Stein, L., Stratton, M., Anderson, W., Barker, A., Bell, C., Bobrow, M., Burke, W., Collins, F., Compton, C., DePinho, R., Easton, D., Futreal, P., Gerhard, D., Green, A., Guyer, M., Hamilton, S., Hubbard, T., Kallioniemi, O., Kennedy, K., Ley, T., Liu, E., Lu, Y., Majumder, P., Marra, M., Ozenberger, B., Peterson, J., Schafer, A., Spellman, P., Stunnenberg, H., Wainwright, B., Wilson, R., & Yang, H. (2010). International network of cancer genome projects Nature, 464 (7291), 993-998 DOI: 10.1038/nature08987

Tip of the Week: MitoCheck, a human functional genomics database

7 April, 2010 (09:15) | Genomics News, Genomics Research, Genomics Resource News, New Resource | By: Mary

As much as I love computational aspects of biology, there are times when the sort of flat and binary nature of the discipline leaves me craving some more three-dimensional,  real live cellular work.  My background was in cell biology and microtubule-associated proteins before I moved to the computational side of biology.  And there are days when I would love to see more linkage between the digital and the dimensional.  And days when I’d love to look around in the scope at mitosis and mitotic spindles again.

Today I saw it.  And I’m going to show you where.  We’ll be looking at the MitoCheck database.  Below I’ll offer some discussion of the associated research papers, and in the movie I’ll show you how to navigate around the MitoCheck site a bit to find their data online.

It was actually coverage on the BBC* that tipped me off to this resource.  And then I went looking for more.  A press release on the work provided details and links.  And then the Nature News article added additional information.

In short, this group of researchers used a couple of different genomics approaches to examine what happens to HeLa cells when you mess with the mitotic apparatus and processes. They transfect cells with either RNA interference constructs or GFP-tagged proteins, and film what happens to the cells over time. They analyze the movies and make all this data available in the MitoCheck resource. As we say here in Boston–this is wicked cool.

But now, on to the papers:  these researchers have 2 articles out that talk about the work, one focused more on the RNAi approach, and a separate one on the tagged proteins.  I’ll address them separately below.

RNA interference experiments:

In this series of experiments, the MitoCheck team targeted over 20,000 human protein-coding genes, transfected HeLa cells with the corresponding siRNAs, and let the cells divide over a couple of days. The nuclei could be visualized via a GFP-tagged histone that the team had already introduced into the cells. They could light up the cells, film them, and monitor whether cell division looked normal or not. They were able to identify a number of cases where things were going awry, and in various ways. Sometimes there was cell death. Other times they could see a variety of phenotypes such as delayed mitosis, binuclear, polylobed, or “grape”-looking aberrations. Some cells were too large. These could all be categorized, compared and quantified, and are now stored as movies, processed data, and phenotypic assignments in the MitoCheck database.
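The post doesn’t describe MitoCheck’s actual image-analysis pipeline, so the following is only a hedged illustration of the “categorize, compare, quantify” step: assuming each nucleus in each movie frame has already been given a class label, it computes a phenotype profile and flags a knockdown as a candidate hit. The class names (echoing the phenotypes above), the 0.2 cutoff and the data are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical class names; the phenotype labels echo those mentioned above.
NORMAL_CLASSES = {"interphase", "mitosis"}


def phenotype_profile(frames: List[List[str]]) -> Dict[str, float]:
    """Fraction of all nucleus observations, across all frames, in each class."""
    counts = Counter(label for frame in frames for label in frame)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def is_candidate_hit(profile: Dict[str, float], threshold: float = 0.2) -> bool:
    """Flag a knockdown whose aberrant-nucleus fraction meets the (assumed) cutoff."""
    aberrant = sum(f for label, f in profile.items() if label not in NORMAL_CLASSES)
    return aberrant >= threshold


# Made-up per-frame nucleus labels for one siRNA movie (3 frames, 4 nuclei each).
MOVIE = [
    ["interphase", "interphase", "mitosis", "interphase"],
    ["interphase", "binuclear", "interphase", "mitosis"],
    ["binuclear", "grape", "interphase", "interphase"],
]

if __name__ == "__main__":
    profile = phenotype_profile(MOVIE)
    print(profile)
    print(is_candidate_hit(profile))  # True: 3 of 12 nuclei are aberrant
```

A real analysis would also look at timing (for example, delayed mitosis) and at replicate siRNAs, which this sketch ignores.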

I have some minor concerns about how thoroughly the transcripts are knocked down. The authors report that target mRNA levels drop substantially, but these numbers vary quite a bit (the amount of supplemental data with the paper is excruciating….). It’s also hard to be sure what that means for protein levels at this point. Also, HeLa cells have some characteristics that may not be typical. But that said, as a general method and a hunting license to find genes to assess in more detail, I think this is an excellent strategy. If I were still in the lab, I’d try the same thing with the cell system I used to work with: C2C12 cells for muscle development. You could track whether cell fusion and myotube formation were disrupted….man, sometimes I do still crave the lab….

Tagged protein experiments:

In a second paper, the research teams use a similar strategy of monitoring cell behavior during mitosis via movies. But this time, instead of knocking down a gene, they add a GFP tag to selected proteins (mostly mouse proteins) and introduce them into HeLa cells. The tagged genes are carried on BACs (bacterial artificial chromosomes) that are stably transfected into the cells, an approach they call BAC TransgeneOmics (ahem, another -omics?). They look for where these proteins end up in dividing cells, and again, the movies are now available in their database. They also pull down protein complexes and examine them in more detail with other techniques.

Again, I have minor questions about the approach: mouse proteins in HeLa cells, and whether the bulky GFP tag affects interactions, expression levels, etc. But again, as a hunting-license sort of effort, I think this is a very neat way to move downstream from digital genomics to real cells, and it’s worth it. The team demonstrates that you can begin to characterize the functions of unknown proteins with this strategy.

So, for this week’s tip I show you MitoCheck and how to access its data so you can take it further if you like. One technical note: on my Windows machine I ran into all of the issues described in their “troubleshooting” document (PDF), and I had to apply all 4 of their recommended fixes to get the movies to run. FYI.

MitoCheck site: http://mitocheck.org/

Nature paper with RNAi data:
Neumann, B., Walter, T., Hériché, J., Bulkescher, J., Erfle, H., Conrad, C., Rogers, P., Poser, I., Held, M., Liebel, U., Cetin, C., Sieckmann, F., Pau, G., Kabbe, R., Wünsche, A., Satagopam, V., Schmitz, M., Chapuis, C., Gerlich, D., Schneider, R., Eils, R., Huber, W., Peters, J., Hyman, A., Durbin, R., Pepperkok, R., & Ellenberg, J. (2010). Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes. Nature, 464 (7289), 721-727. DOI: 10.1038/nature08869

Sciencexpress article with protein and complexes data:
Hutchins, J., Toyoda, Y., Hegemann, B., Poser, I., Hériché, J., Sykora, M., Augsburg, M., Hudecz, O., Buschhorn, B., Bulkescher, J., Conrad, C., Comartin, D., Schleiffer, A., Sarov, M., Pozniakovsky, A., Slabicki, M., Schloissnig, S., Steinmacher, I., Leuschner, M., Ssykor, A., Lawo, S., Pelletier, L., Stark, H., Nasmyth, K., Ellenberg, J., Durbin, R., Buchholz, F., Mechtler, K., Hyman, A., & Peters, J. (2010). Systematic Localization and Purification of Human Protein Complexes Identifies Chromosome Segregation Proteins. Science. DOI: 10.1126/science.1181348

*Tip of the hat to Alex, who heard the BBC story and told me about it. I owe you a cider.

Cerebellar Development Transcriptome Database

1 April, 2010 (10:59) | Genomics Resource News, New Resource | By: Mary

Announcement today from the MGI mailing list about the CDT-DB:

Updated: Cerebellar Development Transcriptome Database (CDT-DB)

Dear Colleagues,

We are pleased to announce the release of the updated CDT-DB version 4.2.

* CDT-DB ver4.2: www.cdtdb.brain.riken.jp

CDT-DB ver4.2 provides you with valuable information on spatiotemporal gene expression profiles during mouse cerebellar development.

You can download “User’s Guide” (provisional) by clicking “Download” tab menu.

We hope that you find the enhanced functions (including data search, data display, data analysis, ontology search, hyperlinks to various bioinformatics) and website design to be useful for your study.

We would be grateful for any feedback you have. [they offer an email address that I don't want to offer to spammers--Mary]

Cheers.

CDT-DB

Looks like a nice resource, but I haven’t had a chance to dig very deep yet. It also links to the Allen Brain Atlas data. If you are interested in brain development, this looks like a resource you should know about.

Tip of the week: ResearchBlogging and PubGet

31 March, 2010 (09:01) | General Science, Genomics Research, New Resource, Tip of the Week | By: Mary

A couple of years back, the OpenHelix team attended the first ScienceOnline science blogging conference (and the subsequent ones too). Its effects continue to shape what we do today. We got tips on effective blogging, we got leads on great tools, and we became part of the science blogging community: a chatty and helpful network of people who really want to communicate science more broadly and effectively.

One of the tools we learned about back then, and still use regularly, is ResearchBlogging. I thought I’d introduce this utility today because we keep finding ourselves in conversations with people who would like some way to discuss research in their fields but aren’t aware of this option.

The short story: ResearchBlogging is a blog post aggregator (and more). If you blog about peer-reviewed research papers, you can obtain a little bit of code from the ResearchBlogging citation generator. Once you have registered your blog and placed this code in your post, ResearchBlogging’s sweep of RSS feeds detects the post and brings it over to the main site. It also distributes the post to other sites that host the widget showing recent posts. If you hang out at ScienceBlogs, you’ve probably seen the widget on the right-hand side of blogs over there. ResearchBlogging also automatically tweets your entry on Twitter. Every time we use this, we see increased traffic from the main site, the widget, and Twitter.

The longer story: ResearchBlogging is a community of science communicators. Some of them are in your field, some are in distant fields, but they all want to talk science. They offer substantive discussion of papers they’ve read. Sometimes this is praise for the work, sometimes not. Sometimes it opens the discussion to new ideas; sometimes it is a launching point for discussion in other directions. The posts vary, of course. Some are like a chat around the water cooler about a paper a colleague read; others are more like a journal club. There are guidelines that describe the goals in more detail, and there is a community forum for discussions about it. There’s also an editors’ selection: if the editors pick your post for its quality, even more people will see your work. They also recently held a competition for quality in science blogging and recognized many science bloggers who are taking science out of the journals and onto the web.

We often use this to discuss new software papers we’ve seen. As great as papers are, for software in particular we want to offer a short movie showing the tool and how it is used, so for us the paper is usually the launching point for a software tip.

Recently ResearchBlogging has also teamed up with the PubGet folks. PubGet is a cool kind of literature search that can be integrated with your local journal subscriptions, and it’s a speedy way to get the PDFs you want. The bonus is that if a paper in PubGet has been blogged in the ResearchBlogging system, a little icon indicates this, so you can go see what the science blogger had to say about that paper as well.

So for this week’s tip I demonstrate how to get that bit of code from ResearchBlogging using the DOI (digital object identifier), where to put it in your blog post, and then how PubGet can lead you to cool discussions of papers you might be interested in.

For more details:

ResearchBlogging help: http://researchblogging.org/static/index/page/help

ResearchBlogging guidelines: http://researchblogging.org/news/?p=53

PubGet: http://pubget.com/

OpenHelix page of stuff we’ve done with ResearchBlogging: http://researchblogging.org/blog/home/id/154