Tag Archives: RGD

Video Tip of the Week: RGD’s OLGA tool, Object List Generator and Analyzer

One of the really persistent issues in genomics is how to get a list of things, handle a list of things, or find the overlap among the things. I think that was one of the most popular topics we dealt with in the early days of OpenHelix, but it’s still an issue that people need to handle in various ways. Some of the most interesting solutions have been various organism Venn diagrams, and the Rat Genome one is a classic, modeled here by Lior Pachter. I’m certain the need to list and organize genome features won’t go away. So when I saw that the RGD folks had another tool to offer ways to do this, I put it right in my list of upcoming tips. And then the draft post got buried under a list of other things I had to do. But I wanted to get back to it, so here is their step-by-step guide to the OLGA tool they offer, as this week’s Video Tip of the Week.

OLGA stands for Object List Generator and Analyzer. Their newsletter announcement describes it in more detail.

OLGA is a straightforward list builder for rat, human and mouse genes or QTLs, or rat strains, using any (or all) of a variety of querying options.  The new tutorial video will walk you through the process of querying the RGD database using OLGA, including

  • how to perform a simple query in OLGA
  • how to further expand or filter your result set using additional criteria
  • how to change your query parameters on the fly to refine your result set
  • what options OLGA gives for analysis of your list once you have it.

You can get a list of items using various ontologies. Maybe you want a specific type of receptor, for example; you can get a list of them. Or you can quickly create a list of genes in a certain genomic span. You can get the items that fall in a QTL. Or you can start with a list and get annotations. You can also look for overlaps among sets.
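If you end up exporting lists and want to check the overlap or a genomic span yourself, the idea can be sketched in a few lines of Python. This is only an illustration of the two operations, not how OLGA works internally: the function names, the gene lists and the coordinates below are all made up for the example.

```python
# Minimal sketch: overlap between two lists, and items that fall in a span.
# All names, gene lists and coordinates here are hypothetical examples.

def venn_regions(list_a, list_b):
    """Split two lists into the three regions of a two-set Venn diagram."""
    a, b = set(list_a), set(list_b)
    return {
        "only_a": sorted(a - b),
        "both": sorted(a & b),
        "only_b": sorted(b - a),
    }

def genes_in_span(genes, chrom, start, end):
    """Return symbols of genes overlapping chrom:start-end.

    genes is an iterable of (symbol, chrom, start, end) tuples.
    """
    return [sym for sym, c, s, e in genes
            if c == chrom and s <= end and e >= start]

list_a = ["Krt1", "Krt5", "Krt14"]
list_b = ["Krt5", "Krt14", "Krt18"]
regions = venn_regions(list_a, list_b)
print(regions)

# Invented coordinates, purely to show the interval test.
toy_genes = [("GeneA", "7", 100, 200),
             ("GeneB", "7", 500, 600),
             ("GeneC", "8", 150, 250)]
in_span = genes_in_span(toy_genes, "7", 150, 550)
print(in_span)
```

Sets make the overlap question a one-liner, which is why they are a handy default once a list is out of the database and on your own machine.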

The video is a nice walk-through of how to construct your query and what you can access. One key feature is that it’s not just rat data as you might expect at RGD. Mouse and human data are also available.

You can create complex and clever queries, and link to all sorts of related data in very easy steps. Have a look at their resources, and their other videos for more help with different aspects of their collections.

Quick links:

RGD main site: http://rgd.mcw.edu/

OLGA directly: http://rgd.mcw.edu/rgdweb/generator/list.html

Reference:

Shimoyama, M., De Pons, J., Hayman, G., Laulederkind, S., Liu, W., Nigam, R., Petri, V., Smith, J., Tutaj, M., Wang, S., Worthey, E., Dwinell, M., & Jacob, H. (2014). The Rat Genome Database 2015: genomic, phenotypic and environmental variations and disease Nucleic Acids Research, 43 (D1) DOI: 10.1093/nar/gku1026

On a Mission for Protein Information

It’s probably just the human brain’s ability to connect dots & find patterns, but it can be interesting how many “unrelated” events and information bits accumulate in my head & eventually get mulled into an idea or theory. Take, for example, a recent biotech mixer, bits from an education leadership series & a past Nature article; each “event” has been meandering in my mind and now they are finding their way out as this blog post.

OK, now the explanation: At a recent local biotech event I heard about a company (KeraNetics) purifying keratin proteins & using them to develop therapeutic and research applications. The company & their research sounded very interesting & because a lot of it is aimed at aiding wounded soldiers, it also sounded directly beneficial. The talk was short, only about 20 minutes, so there wasn’t a lot of time for details or questions. I decided I’d venture forth through many of the bioscience databases and resources that I know and love, in order to learn more about keratin.

My quest was both fun and frustrating because of the nature of the beast. Keratin is “well known” (i.e. it comes up in high school academic challenge competitions ‘a lot’, according to someone in the know), but it is hard to work with (i.e. keratins are tough, insoluble, fibrous structural proteins), and it is hard to find much general information on it in your average protein database (because it is made of many different gene products, all referred to as “keratin”). I decided to begin my adventure at two of my favorite protein resources, PDB & SBKB, but I found no solved structures for keratin. Because of the way model organism databases are curated and organized, I often begin a protein search there, just to get some basic background, gene names, sequence information, etc. I (of course) found nothing other than a couple of GO terms in the Saccharomyces Genome Database (SGD), but I found hundreds of results in both Mouse Genome Informatics (MGI) (660 genomic features) and Rat Genome Database (RGD) (162 rat genes, 342 human genes). I also found gene names (Krt*), sequences and many summary annotations with references to diseases with links to OMIM. When I queried for “keratin”, OMIM gave me 180 hits, including 61 “clinical synopses”; UniProt returned 505 reviewed entries and 2,435 unreviewed entries; Entrez Protein returned 10,611 results; and PubMed returned 26,430 articles, including 1,707 reviews. I finally sated my curiosity about KeraNetics’ research by using a PubMed advanced search for keratin in the abstract or title & the PI’s name as author (search = “(keratin[Title/Abstract]) AND Van Dyke[Author]”).
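For anyone who would rather script that last PubMed search than click through the advanced search form, here is a rough sketch. It assumes NCBI’s E-utilities `esearch` endpoint, which is not something the search above used (that was done in the PubMed web interface); the helper names are made up, and the code only builds the query string and URL without contacting NCBI.

```python
# Sketch only: build the same PubMed query as a URL for NCBI E-utilities.
# Assumption: the esearch endpoint below; helper names are hypothetical.
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_term(keyword, author):
    """Keyword in title/abstract AND a given author, in PubMed field syntax."""
    return f"({keyword}[Title/Abstract]) AND {author}[Author]"

def esearch_url(term, retmax=20):
    """Return a URL-encoded esearch request for PubMed (no request is sent)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)

term = pubmed_term("keratin", "Van Dyke")
url = esearch_url(term)
print(term)
print(url)
```

Pasting the printed URL into a browser should return the matching PubMed IDs; the hit count will of course differ from the numbers quoted above, since the literature keeps growing.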

I ended up with a lot of information leads that I could have hunted through, but it was a fun process in which I learned a lot about keratin. This is where the education stuff comes in. I’ve been seeing a lot of studies go by talking about reforming education to be more investigation driven, and I can totally see how that can work. “Learning” through memorization & regurgitation is dry for everyone & rough for the “memory challenged”, like me. Having a reason or curiosity to explore, with a new nugget of data or understanding lurking around each corner, the information just seems to get in better & stay longer. (OT, but thought I’d mention a related site that I found today w/ some neat stuff: Mind/Shift-How we will learn.)

And I could have done the advanced PubMed search in the beginning, but what fun would that have been? Plus there is a lot that I learned about keratin from what I didn’t find, like that there wasn’t a plethora of PDB structures for keratin proteins. That brings me to the final dot in my mullings: an article that I came across today as I worked on my reading backlog, “Too many roads not taken”. If you have a subscription to Nature you can read it, but the main point is that researchers are still largely focusing on the same set of proteins that they have been for a long time, because these are the proteins for which there are research tools (antibodies, chemical inhibitors, etc). This same sort of philosophy is fueling the Protein Structure Initiative (PSI) efforts, as described here. Anyway, I found the article interesting & agree with the authors’ general suggestions. I would however extend it beyond these physical research tools & say that going forward researchers need more data analysis tools, and training on how to use them. But I would, wouldn’t I? :)

References:

  • Sierpinski P, Garrett J, Ma J, Apel P, Klorig D, Smith T, Koman LA, Atala A, & Van Dyke M (2008). The use of keratin biomaterials derived from human hair for the promotion of rapid regeneration of peripheral nerves. Biomaterials, 29 (1), 118-28 PMID: 17919720
  • Edwards, A., Isserlin, R., Bader, G., Frye, S., Willson, T., & Yu, F. (2011). Too many roads not taken Nature, 470 (7333), 163-165 DOI: 10.1038/470163a

Tip of the Week: InterMine for mining “big data”

Integrating large data sets for queries within–and across–various collections is one of the arenas that has lately been pretty active in bioinformatics. As more and more “big data” projects yield huge numbers of data points and data types, this is only becoming more necessary.  I love to browse data, but there are times when a large-scale customized query is what you’ll want to make some broader discoveries.

Right now there are a number of resources and interfaces that I turn to for structured and customized queries of data collections. The UCSC Table Browser, BioMart, Galaxy–these are the ones I have my hands on almost continuously. But there is another warehouse and interface system that we’re seeing more and more: InterMine.

My first real encounter with InterMine was for the modENCODE data. There’s some really terrific data flowing out of that project now (I talked a bit about that recently here), and the interface and storage system they are using is InterMine.

FlyMine was the initial impetus for the “Mine” system. Some years back FlyMine was created as a warehouse and query system for the increasing amounts of fly data that was coming from various projects. The goal was to have a system powerful enough for bioinformatics + super users, but also a friendly yet powerful interface for bench biologists to use.

The initial paper described a user interface with 3 primary components: a Quick Search that’s great for browsing; a Template library that lets users access some pre-defined standard or likely query types that they can tweak for their needs; and a fully customizable Query Builder for the most advanced access. Since this paper, development has continued, and there are other new and cool features present as well.

Another big goal of the FlyMine effort was to be able to deal with lists. One of the most common questions we still get in workshops is: “I have a list of _____.  What’s the best way to deal with that?” FlyMine–and the InterMines in general–help people to query and manage their explorations with lists of stuff.

The MyMine feature of the InterMines is also a nice component. You can create a login and store things you want to have repeated access to: queries, lists, etc.

There are other people using InterMine for their systems too–a recent paper on TargetMine, for “Gene Prioritization and Target Discovery” is available, and might appear as an upcoming tip! Jennifer did a tip on YeastMine from SGD once as well.

But what triggered me to do this tip is that a letter came from the RGD mailing list last week that said this:

Effective Friday, May 20th, 2011 the MCW BioMart tool will be retired by RGD and the MCW Proteomics Center.  For mining rat data, we have found that the RatMine tool is easier to use, more flexible and incorporates more types of data than BioMart.  In addition, RatMine includes analysis tools not found in BioMart, giving RatMine users a single, intuitive interface for both obtaining and analyzing data.

So they are moving fully to InterMine and retiring the Rat BioMart, exclusively using RatMine at their installation. So this tip of the week will explore InterMine, RatMine, and some other Mines. That’s a lot of ground to cover–but it’s probably worth your time to know about InterMine as it becomes more broadly available.  It’s also important to understand how to query with the Mines if you want to bring the data to Galaxy for further analysis. If you visit Galaxy you’ll see that their “Get Data” section lets you access Mine tools–but you still need to know how to do the basic queries at the host site first.

Although this tip will touch on RatMine, the focus is the more general InterMine suite. RGD also said this in their notice:

For an overview of RatMine and how to use it, go to the RGD tutorial video, “An Introduction to the RatMine Database”, at http://rgd.mcw.edu/wg/home/rgd_rat_community_videos/an-introduction-to-the-ratmine-database2.  Alternatively, follow the “self-guided tour” of RatMine by clicking the “Take a tour” link at the top of any RatMine page.

To try out RatMine for yourself, go to http://ratmine.mcw.edu/ and get started with simplified data mining and analysis.

So if you want to have more specific information about using RatMine, be sure to check out their introduction.

Quick Links:

InterMine: http://intermine.org/

RatMine: http://ratmine.mcw.edu/

modENCODE: http://www.modencode.org/

Galaxy: http://usegalaxy.org/

Reference:
Lyne, R., Smith, R., Rutherford, K., Wakeling, M., Varley, A., Guillier, F., Janssens, H., Ji, W., Mclaren, P., North, P., Rana, D., Riley, T., Sullivan, J., Watkins, X., Woodbridge, M., Lilley, K., Russell, S., Ashburner, M., Mizuguchi, K., & Micklem, G. (2007). FlyMine: an integrated database for Drosophila and Anopheles genomics Genome Biology, 8 (7) DOI: 10.1186/gb-2007-8-7-r129

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Tip of the Week: RatMine

RatMine is a ‘data warehouse’ that allows the user to construct queries across different areas of biological knowledge, from SNPs to pathways. It’s developed by the people at RGD and uses InterMine, a system originally developed for FlyMine. RatMine is part of a project between RGD, SGD and ZFIN to implement InterMine for these databases and “develop new methods of interoperability for cross-organism research.” We’ve mentioned InterMine before, and it’s also used in modENCODE. InterMine is going to have to be the subject of a later post, I think :).

This tip is actually a video done by the RGD group, and one of those gems I’ve found at SciVee in our attempts to integrate our tips at SciVee (which will be coming). We occasionally highlight a short tutorial done by someone else here in our tips, and since I’ve found this gem and just got back from vacation in Florida :)…
Btw, while you are at it, you might want to check out this interesting set of tutorials on biomedical ontologies.

Phenotype resources and databases

For another project I’m on, I had to research some of the sources of information around phenotyping experimental animal models.  And just as I needed it, the RGD team produced a very nice screencast of access to the phenotyping data and resources that they offer about rat phenotyping.  If you are interested in that type of data, go have a look at their video and explore the resources that they introduce:  RGD Phenotyping Portal screencast.

If that is data you are interested in, you might also explore some of the other things I’ve been looking at.  The Jackson Laboratory, home of the MGI team, has a division that focuses on mouse phenotype data as well.  Through their Mouse Phenome Database (MPD), you can access quite a range of data that already exists. They also have protocols for phenotyping that may be useful to folks who are characterizing animal models.

I also came across a project based in the EU that offers data and standardized protocols for phenotyping.  EUMORPHIA offers the EMPReSS SOPs that they are developing, as well as access to the EuroPhenome database that contains the results.

As much as I still love to look at genomes, I’m ready to move down the path and look at the phenomes too :)

Pointing us out at Genome.gov :)

NHGRI recently pointed out our new set of tutorials on model organism databases (funded mainly by NHGRI :) on their home page, genome.gov. Always nice to be recognized :D.

And it gives me the opportunity to again point out that we do indeed have seven publicly available tutorials and training materials (slides, exercises, etc) on model organism databases including SGD, RGD, MGI, WormBase, FlyBase and ZFIN… and a seventh on GBrowse, a generic genome browser used by some of these and other genome databases.

Check them out (and fill out the new poll to the left :D).

Tip of the Week: Model Organism Database tutorials

For the tip of the week today, we’d like to point out a number of new (free to you) tutorials on model organism database resources. These seven tutorials (each includes a Flash movie tutorial, slides for downloading, exercises and handouts) were partly funded by an NHGRI grant. We just put out a press release on this, but I thought the Tip of the Week would be a great place to introduce you to these tutorials. We have seven tutorials that are (or will soon be) publicly available (this link takes you to a list and links to all these tutorials). The first four available are on GBrowse, WormBase, RGD (Rat Genome Database) and MGI (Mouse Genome Informatics). GBrowse (the tutorial linked to here) was developed by the Generic Model Organism Database (GMOD) project and is a great tool to develop genome browsers for model and research organisms. Many model organism databases use GMOD resources in full or part, including many of the ones we have tutorials on here. Three more will be coming very soon on ZFIN (Zebrafish), FlyBase (Drosophila) and SGD (yeast). Check them out :).

Free Tutorials on Model Organism Genomic Databases Released by OpenHelix

OpenHelix today announced the free availability of tutorial suites on model organism databases and resources used extensively in research. The first tutorial suites available are GBrowse, Rat Genome Database (RGD), Mouse Genome Informatics (MGI), and WormBase. To be added in the coming weeks are Zebrafish Information Network (ZFIN), FlyBase and Saccharomyces (Yeast) Genome Database (SGD).

The tutorial suites, funded in part by a grant from the National Human Genome Research Institute of the National Institutes of Health, include a self-run, narrated tutorial introducing the resource and how to use its features and functions. Each suite also includes PowerPoint slides, handouts, and exercises that can be used for reference or for training others.

One of the first tutorials available is on GBrowse, developed by the Generic Model Organism Database (GMOD) project, a popular tool used by researchers to develop genome browsers for model organisms, species of interest, and particular topics. By learning how to use this “generic” genome browser, you can leverage that knowledge to use dozens of resources devoted to a wide range of research areas.

“The OpenHelix GBrowse user tutorial is very well done and will be an excellent resource for the many research communities that use GBrowse to visualize genomic data,” said Dave Clements of the National Evolutionary Synthesis Center who runs the GMOD help desk.

Model organisms, such as yeast, mouse, rat, flies, and many others, have long been used by researchers to expand our understanding of biology and to assess the effectiveness and safety of therapies before going to human trial. Many of the genomes of these organisms have been completely sequenced, giving the scientific community even greater insight into the organisms and their relation to human biology. The genome data is now available and searchable on publicly available online databases and resources.

You can view the Model Organism tutorials at http://www.openhelix.com/model_organisms.shtml. OpenHelix provides over 60 other tutorial suites on a number of genomic databases and resources through an individual, group, or institutional subscription. Further information can be found at www.openhelix.com.

About OpenHelix
OpenHelix, LLC, (www.openhelix.com) provides the genomics knowledge you need when you need it. OpenHelix provides online self-run tutorials and on-site training for institutions and companies on the most powerful and popular free, web based, publicly accessible bioinformatics resources. In addition, OpenHelix is contracted by resource providers to provide comprehensive, long-term training and outreach programs.

Tip of the Week: RGD's GViewer

This week has been quite busy here at OpenHelix and the U.S. For the tip of this week, as we do occasionally, I’m going to highlight a short tutorial tip done by someone else. This week’s was done by the developers of RGD and shows the uses of the GViewer. The tutorial is at SciVee; as the description states:

The rat genome comes to life through the use of the Gviewer tool. This video will show you how to use this helpful tool within the RGD website at http://rgd.mcw.edu/. Genes, QTLs, and species syntenies of interest can all be visualized with ease as the Gviewer zooms in and navigates through the rat genome with a few clicks.