Tag Archives: MGI

On a Mission for Protein Information

It’s probably just the human brain’s ability to connect dots & find patterns, but it can be interesting how many “unrelated” events and information bits accumulate in my head & eventually get mulled into an idea or theory. Take, for example, a recent biotech mixer, bits from an education leadership series & a past Nature article – each “event” has been meandering in my mind and now they are finding their way out as this blog post.

OK, now the explanation: At a recent local biotech event I heard about a company (KeraNetics) purifying keratin proteins & using them to develop therapeutic and research applications. The company & their research sounded very interesting & because a lot of it is aimed at aiding wounded soldiers, it also sounded directly beneficial. The talk was short, only about 20 minutes, so there wasn’t a lot of time for details or questions. I decided I’d venture forth through many of the bioscience databases and resources that I know and love, in order to learn more about keratin.

My quest was both fun and frustrating because of the nature of the beast – keratin is “well known” (i.e. it comes up in high school academic challenge competitions ‘a lot’, according to someone in the know), but it is hard to work with (keratins are tough, insoluble, fibrous structural proteins) and it is hard to find much general information on it in your average protein database (because it is made of many different gene products, all referred to as “keratin”). I decided to begin my adventure at two of my favorite protein resources, PDB & SBKB, but I found no solved structures for keratin. Because of the way model organism databases are curated and organized, I often begin a protein search there, just to get some basic background, gene names, sequence information, etc. I (of course) found nothing other than a couple of GO terms in the Saccharomyces Genome Database (SGD), but I found hundreds of results in both Mouse Genome Informatics (MGI) (660 genomic features) and Rat Genome Database (RGD) (162 rat genes, 342 human genes). I also found gene names (Krt*), sequences and many summary annotations with references to diseases with links to OMIM. When I queried for “keratin”, OMIM returned 180 hits, including 61 “clinical synopses”; UniProt returned 505 reviewed entries and 2,435 unreviewed entries; Entrez Protein returned 10,611 results; and PubMed returned 26,430 articles, including 1,707 reviews. I sated my curiosity about KeraNetics’ research by using a PubMed advanced search for keratin in the abstract or title & the PI’s name as author (search = “(keratin[Title/Abstract]) AND Van Dyke[Author]”).
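For anyone who would rather script that last PubMed search than click through the advanced search form, here is a minimal sketch in Python. It assumes NCBI's public E-utilities ESearch endpoint; the helper names (build_pubmed_query, build_esearch_url, count_hits) are my own inventions for illustration, not part of any NCBI library. Only the URL-building part runs here; count_hits needs network access and is left for you to call.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed here; check NCBI's documentation
# for current usage guidelines and rate limits before scripting against it).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(keyword, author):
    """Hypothetical helper: keyword in title/abstract AND author name."""
    return f"({keyword}[Title/Abstract]) AND {author}[Author]"


def build_esearch_url(term, retmax=20):
    """Hypothetical helper: build the ESearch URL without touching the network."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def count_hits(term):
    """Fetch the hit count for a term. Requires network access; not called below."""
    import json
    from urllib.request import urlopen

    with urlopen(build_esearch_url(term, retmax=0)) as response:
        payload = json.load(response)
    # ESearch's JSON output reports the count as a string
    return int(payload["esearchresult"]["count"])


# Same search string as the one quoted in the post
query = build_pubmed_query("keratin", "Van Dyke")
url = build_esearch_url(query)
print(query)
print(url)
```

Swapping "pubmed" for another Entrez database name in the params would, in principle, let the same sketch count hits elsewhere, but the field tags ([Title/Abstract], [Author]) are PubMed-specific.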

I ended up with a lot of information leads that I could have hunted through, but it was a fun process in which I learned a lot about keratin. This is where the education stuff comes in. I’ve been seeing a lot of studies go by talking about reforming education to be more investigation driven, and I can totally see how that can work. “Learning” through memorization & regurgitation is dry for everyone & rough for the “memory challenged”, like me. Having a reason or curiosity to explore, with a new nugget of data or understanding lurking around each corner, the information just seems to get in better & stay longer. (OT, but thought I’d mention a related site that I found today w/ some neat stuff: Mind/Shift-How we will learn.)

And I could have done the advanced PubMed search in the beginning, but what fun would that have been? Plus there is a lot that I learned about keratin from what I didn’t find, like that there wasn’t a plethora of PDB structures for keratin proteins. That brings me to the final dot in my mullings – an article that I came across today as I worked on my reading backlog: “Too many roads not taken”. If you have a subscription to Nature you can read it, but the main point is that researchers are still largely focusing on the same set of proteins that they have been for a long time, because these are the proteins for which there are research tools (antibodies, chemical inhibitors, etc). This same sort of philosophy is fueling the Protein Structure Initiative (PSI) efforts, as described here. Anyway, I found the article interesting & agree with the authors’ general suggestions. I would however extend it beyond these physical research tools & say that going forward researchers need more data analysis tools, and training on how to use them – but I would, wouldn’t I? :)


  • Sierpinski P, Garrett J, Ma J, Apel P, Klorig D, Smith T, Koman LA, Atala A, & Van Dyke M (2008). The use of keratin biomaterials derived from human hair for the promotion of rapid regeneration of peripheral nerves. Biomaterials, 29 (1), 118-28 PMID: 17919720
  • Edwards, A., Isserlin, R., Bader, G., Frye, S., Willson, T., & Yu, F. (2011). Too many roads not taken Nature, 470 (7333), 163-165 DOI: 10.1038/470163a

Tip of the Week: Database of mouse databases

We are acutely aware of the thousands of bioinformatics resources out there, and we are often asked for guidance on finding a particular type of tool for some function or other. There are some excellent lists out there which attempt to catalog the various tools–the NAR Database Issue and corresponding list, the Resource Collection at the Univ. of Pittsburgh, and others. But recently we saw one developed with a specific focus, which claims to bring together over 200 resources for the mouse. The Mouse Resource Browser collects and categorizes a number of different types of things–not just databases, as we’ll see. Find them here: http://bioit.fleming.gr/mrb

The curated collection of sites that may be of use to mouse researchers has a number of features. The developers used a questionnaire to elicit some information from the resource providers, and where they didn’t have that input they created some basic information for the records themselves. You can do a basic search for resources with a quick search box, and there is an advanced search option. I found browsing by category (they have 22 categories) the most informative way to figure out what kind of resources they had collected.

The data for a given record is organized across a series of tabs:

  • General: description, highlights and subject matter of the resource
  • Ontologies and Standards: if the resource relies on any of the important vocabularies or standards formats in the field, they are listed here
  • Technical: details of implementation, type of database, access methods, if there is a web services component, whether there are downloads or not
  • CASIMIR DDF: this is an interesting tab that assesses some of the features of the resources such as currency/updates, quality control process, versioning, technical documentation, user support, and more.

Although the focus is mouse, you’ll see some broader types of resources in there. For example, UCSC Genome Browser is listed as there is a mouse database there. Reactome is listed. These have a species range that includes mouse, but are certainly not focused on mice. Other types of resources include commercial suppliers such as Charles River. So it isn’t limited just to things like sequence databases and things of that nature–it’s got more aspects that researchers employing mouse as a model system might find useful.

There are some choices they have made that I’m not sure I would have.  They list the MGI mailing list as a separate feature from MGI.  But as I thought more about it, I could see why.  There is good information there, and if you don’t know of it already a pointer might help.  But as I was thinking of the 200+ resources just for mouse, I thought that sort of affected the total.

If you use mouse as your model system, you will probably find some useful databases and other web sites that are handy for your work.  If you don’t work with mice, there are probably still some useful resources for your work as well.  Check out MRB’s site for more information: http://bioit.fleming.gr/mrb

Zouberakis, M., Chandras, C., Swertz, M., Smedley, D., Gruenberger, M., Bard, J., Schughart, K., Rosenthal, N., Hancock, J., Schofield, P., Kollias, G., & Aidinis, V. (2010). Mouse Resource Browser–a database of mouse databases Database, 2010 DOI: 10.1093/database/baq010

Phenotype resources and databases

For another project I’m on, I had to research some of the sources of information around phenotyping experimental animal models. And just as I needed it, the RGD team produced a very nice screencast on accessing the rat phenotyping data and resources that they offer. If you are interested in that type of data, go have a look at their video and explore the resources that they introduce: RGD Phenotyping Portal screencast.

If that is data you are interested in, you might also explore some of the other things I’ve been looking at. The Jackson Laboratory, home of the MGI team, has a division that focuses on mouse phenotype data as well. It is called the Mouse Phenome Database (MPD), and through it you can access quite a range of data that already exists. They also have protocols for phenotyping that may be useful to folks who are characterizing animal models.

I also came across a project based in the EU that offers data and standardized protocols for phenotyping.  EUMORPHIA offers the Empress SOPs that they are developing, as well as access to the Europhenome database that contains the results.

As much as I still love to look at genomes, I’m ready to move down the path and look at the phenomes too :)

Does anybody have a Cre-whatever mouse…?

By: Darryl Leja, NHGRI

Ever since I was a post-doc at Jax, I’ve been on the MGI mailing list. Some days it brings back memories.  Some days it brings laughter.  The other day I had a major problem with the spam filter because the discussion was about:  “Breeding from male with low sex drive” which, for obvious reasons, my mail filters thought was naughty.

But most often it is informative about research topics, mutant mice, and about resources that are useful either at MGI or other sites that mouse researchers like to use. Yesterday it was an announcement about a new segment of the MGI database: a Cre-Recombinase Portal.  One of the frequent questions on the list is “does anyone have a floxed mouse with _____ tissue expression….?” or “under control of ______ promoter….?”

The new portal will help researchers to find the right models. Here is the first part of the announcement mail (you’d have to go in and find Thursday’s mail for the rest; it has more details about how to use the Cre portal):

MGI has released a Cre Recombinase Data Portal (www.creportal.org) that specifically addresses the need to access cre expression and specificity data. Through this portal, users can access information about all existing cre transgenes and knock-ins. Data include the molecular description of the cre transgene or knock-in, the driver / promoter used, inducibility information, publications, and availability of cre mice through the IMSR. Detailed data, including annotated images showing cre activity / expression for the tissues analyzed are being added as available. Access to phenotypes displayed by cre-deleted mice is provided via integration with MGI’s phenotype data.

Currently, there are over 1,040 recombinase-containing transgenes or knock-in alleles cataloged.

Check it out if you or people you know need these mice.  Might save a lot of time if you can find the right mouse in the database rather than on the mailing lists….

Cre-Recombinase Portal: http://www.creportal.org/

New SNPs in the Mouse Phenome Database

From the MGI mailing list the other day came notice of 2 new SNP sets that have been added to the Mouse Phenome Database SNP collection (MPD).

From their note:

- Center for Genome Dynamics (CGD) – SNP data from Mouse Diversity Genotyping Array (CGD2). 582,000+ locations and 72 strains.

- Palmer A – SNP data, 8200+ locations, 58 strains (Chicago1)

I like the mouse SNP tools at MGI.  I covered them in my full tutorial that you can watch for free here.  But there’s a separate access point for mouse SNPs from the MPD interface as well.  Phenotype might be an interesting way to be thinking about your genes and topics of interest if you haven’t considered that before.  A lot of people start with their genes of interest and look under the flashlight, but maybe look around at phenotypes as a starting point for some new ideas and directions.

Tip of the week: A mouse for all reasons

At first the title of this paper made me laugh, as I am a major fan of Paul Scofield’s performance in A Man for All Seasons. And then I remembered what happened to Thomas More. Well, the analogy drops away for me there…. A Mouse for All Reasons by the International Mouse Knockout Consortium presented the framework and foundations of the project to knock out every single protein-coding gene in mice, generate the corresponding ES (embryonic stem) cells, and make them available for development of subsequent transgenic mice. Some of these mice will go on to give their life for science in a noble manner, I guess–so maybe the analogy picks back up :)

The project has made tremendous progress since that paper was published, and there are a lot of knockouts you should know about if you are interested in using mouse as a model organism.

For this tip of the week we’ll explore the new portal for the International Knockout Mouse Consortium (IKMC), which lives at what used to be the URL for the KOMP, or Knockout Mouse Project. It appears that the groups referenced in the Mouse for All Reasons paper have now converged on the knockoutmouse.org site, and use a single portal for access to the information and reagents. There are a variety of ways to search: browsing genes, specific text searching, and even a BioMart interface for the portal. This short movie takes a look at those pieces to introduce you to the site.

The announcement for this came over the MGI mailing list as this:

The IKMC web portal

The International Knockout Mouse Consortium (IKMC) has launched its official web site at www.knockoutmouse.org, formerly the URL for the Knockout Mouse Project (KOMP). This extended site, supported by the NIH and EU, now serves as the common web portal for access to information on knockout vectors, ES cells and mice available from the international high-throughput knockout projects: KOMP, EUCOMM, NorCOMM and TIGM. Stay tuned for future enhancements as the content continues to evolve. We welcome your comments and feedback. (Please email to contact@knockoutmouse.org).

This site is maintained by the I-DCC and the KOMP-DCC (http://www.knockoutmouse.org/about). Supported by the European Union (Project number: 223592) and the National Institutes of Health (Grant number: NIH HG004074).

The International Mouse Knockout Consortium (2007). A Mouse for All Reasons Cell, 128 (1), 9-13 DOI: 10.1016/j.cell.2006.12.018

Pointing us out at Genome.gov :)

NHGRI recently pointed out our new set of tutorials on model organism databases (funded mainly by NHGRI :) on their home page, genome.gov. Always nice to be recognized :D.

And it gives me the opportunity to again point out that we do indeed have seven publicly available tutorials and training materials (slides, exercises, etc) on model organism databases including SGD, RGD, MGI, WormBase, FlyBase and ZFIN… and a seventh on GBrowse, a generic genome browser used by some of these and other genome databases.

Check them out (and fill out the new poll to the left) :D.

Free Tutorials on Model Organism Genomic Databases Released by OpenHelix

OpenHelix today announced the free availability of tutorial suites on model organism databases and resources used extensively in research. The first tutorial suites available are GBrowse, Rat Genome Database (RGD), Mouse Genome Informatics (MGI), and WormBase. To be added in the coming weeks are Zebrafish Information Network (ZFIN), FlyBase and Saccharomyces (Yeast) Genome Database (SGD).

The tutorial suites, funded in part by a grant from the National Human Genome Research Institute of the National Institutes of Health, include a self-run, narrated tutorial introducing the resource and how to use its features and functions. Each suite also includes PowerPoint slides, handouts, and exercises that can be used for reference or for training others.

One of the first tutorials available is on GBrowse, developed by the Generic Model Organism Database (GMOD) project, a popular tool used by researchers to develop genome browsers for model organisms, species of interest, and particular topics. By learning how to use this “generic” genome browser, you can leverage that knowledge to use dozens of resources devoted to a wide range of research areas.

“The OpenHelix GBrowse user tutorial is very well done and will be an excellent resource for the many research communities that use GBrowse to visualize genomic data,” said Dave Clements of the National Evolutionary Synthesis Center who runs the GMOD help desk.

Model organisms, such as yeast, mouse, rat, flies, and many others, have long been used by researchers to expand our understanding of biology and to assess the effectiveness and safety of therapies before going to human trial. Many of the genomes of these organisms have been completely sequenced, giving the scientific community even greater insight into the organisms and their relation to human biology. The genome data is now available and searchable on publicly available online databases and resources.

You can view the Model Organism tutorials at http://www.openhelix.com/model_organisms.shtml. OpenHelix provides over 60 other tutorial suites on a number of genomic databases and resources through an individual, group, or institutional subscription. Further information can be found at www.openhelix.com.

About OpenHelix
OpenHelix, LLC, (www.openhelix.com) provides the genomics knowledge you need when you need it. OpenHelix provides online self-run tutorials and on-site training for institutions and companies on the most powerful and popular free, web based, publicly accessible bioinformatics resources. In addition, OpenHelix is contracted by resource providers to provide comprehensive, long-term training and outreach programs.

GXD wants your input

From the MGI mailing list comes a request for your input on the Gene Expression Database. C’mon, fill it out if you use the site.

The Gene Expression Database for Mouse Development (GXD)
(http://www.informatics.jax.org/expression.shtml) is a well-established public database funded by NIH and constitutes an important component of the Mouse Genome Informatics (MGI) resource.

I would like to invite you to take our survey about the GXD resource.
The survey form is accessible through our web site, or directly at http://www.surveymonkey.com/s.aspx?sm=Er4G5wADG9gd4SGsEKwuNA_3d_3d

Your feedback and input is very important for the GXD project. Please take a few minutes of your valuable time to help us improve GXD.

Mouse KOMP, all over the browsers

An email from the MGI mailing list alerted me to some interesting new data on the browsers. The mouse KOMP project is generating knock-out mouse ES cells for every gene in the genome (well, that’s the goal anyway). This means you will be able to buy off-the-shelf mouse knockout cells for lots of regions you might want to study. You can grow ‘em up, and then breed ‘em with other mutants. You can characterize them in your favorite developmental stages and tissues. What a terrific reagent collection. In fact, if I was a post-doc, I would be looking for very interesting genes in this data set to pursue. You could start a whole career characterizing some of these beasts.

The email from MGI says:

The NIH funded Knockout Mouse Project (KOMP) is in full swing and reagents (vectors, ES cells, mice) from this project are becoming available to the research community.

Find out more about the KOMP project, which genes are being targeted, and which genes have reagents available for distribution by visiting the Knockout Mouse Project Data Coordination Center at http://www.knockoutmouse.org.

The UCSC genome browser (http://genome.ucsc.edu) now has a “KOMP Gene” track that shows which mouse genes are being targeted by the KOMP project. The tracks are linked to the KOMP Data Coordination Center site for the most recent information on project status and reagent availability.

KOMP gene information is also available in the Ensembl genome browser for mouse (http://www.ensembl.org/Mus_musculus/index.html) and will soon be available on the MGI Mouse Genome Browser (http://gbrowse.informatics.jax.org/cgi-bin/gbrowse/mouse_current/).

Of course I went looking for some examples. I found a couple to show you on the UCSC Genome Browser and I created sessions to share. If you would like to look at tracks that indicate the region of the knockout you can see this one that indicates the gene Xpr1 is knocked out and ready for you–see the bright green bar track in about the center of the page.

Another example is March4. Here blue and yellow tracks indicate a different status at the 2 groups performing the knockouts. Blue is “not started/on hold” and yellow is “in progress” according to the code on the details page (click the blue or yellow track to see that color code info on the description page).

Although it says the data is also in Ensembl I couldn’t find it–I have a query in to the help desk on that. Will let you know what I learn.

Edit: word back from Ensembl = “KOMP data is in a track along with other KO alleles (EUCOMM and NorCOMM). In Ensembl it is called “Alleles”, available in the drop-down list in contigview.” But I’m looking in those same regions where I know data exists from the UCSC stuff, and I still don’t see anything. You can go to those URLs but you’ll have to open the DAS sources menu and check the “alleles” box. March4 in Ensembl. Xpr1 gives me an error message in the track (“Error retrieving KO_vectors features (Can’t connect to the host!)”). A second attempt said “No KO alleles” in that section.

And it will be on the MGI GBrowse soon, too. I’ll try to find a sample of that when it is available too.