Tag Archives: KEGG

Video Tip of the Week: TargetMine, Data Warehouse for Drug Discovery

Browsing around genomic regions, layering on lots of associated data, and beginning to explore new data types I might come across are things that really fire up my brain. For me, visualization is key to forming new ideas about the relationships between genomic features and patterns of data. But frequently I want to take this to the next step–asking where else these patterns appear, how many other instances of this situation there are in a data set, and maybe adding complexity to the problem to refine the quest. This is not always easy to do with primarily visual software tools. This is when I turn to tools like the UCSC Table Browser, BioMart, and InterMine to handle some list of genes, or regions, or features.

We’ve touched on all of these before–sometimes with full tutorial suites (UCSC, BioMart), and sometimes as a Tip of the Week (InterMine, and InterMine for complex queries). Learning about the foundations of these tools will let you use various versions or flavors of them at other sites. I love to see tools that are re-used for different topics when that’s possible, rather than building a whole new system. There are ModENCODE, rat, and yeast mines, and more. This week’s tip is about one of those others–TargetMine is built on the InterMine foundation, with a specific focus on prioritizing candidate genes for pharmaceutical interventions. From their site overview, I’ll add this description they use:

TargetMine is an integrated data warehouse system which has been primarily developed for the purpose of target prioritisation and early stage drug discovery.

For more details about their framework and philosophy, you should see their papers (linked below). The earlier one sets out the rationale, the data types, and the data sources they are incorporating. They also establish their place in the ecosystem of other databases in this arena, which helps you to understand their role. But you should see the next paper for a really good grasp of how their candidate prioritization works with the “Integrated Pathway Clusters” concept they’ve added. They combined data from KEGG, Reactome, and NCI’s PID collections to enhance the features of their data warehouse system.

This week’s Video Tip of the Week highlights one of the tutorial movies that the TargetMine team provides. There’s no spoken audio with it, but the captions that help you to understand what’s going on are in English. I followed along on a browser with their example–they have a sample list to simply click on, and you can see various enrichments of the sets–pathways, Gene Ontology, Disease Ontology, InterPro, CATH, and compounds. They call these the “biological themes” and I find them really useful. You can create new lists from these theme collections. They also illustrate the “template” option–pre-defined queries with typical features people may wish to search. The example shows how to go from the list of genes you had to pathways–but there are other templates as well.
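If you are wondering what an “enrichment” of a gene list means in practice, here is a minimal sketch of one common approach, a one-sided hypergeometric test. This is an assumption for illustration only: I have not confirmed which statistic or multiple-testing correction TargetMine itself applies, and the function name and the numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, sample: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated genes by chance.

    population: number of genes in the background set
    annotated:  number of background genes that carry the theme (e.g. a pathway)
    sample:     number of genes in your list
    hits:       number of genes in your list that carry the theme
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for hits, hits + 1, ..., max_hits.
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return favorable / total


# Hypothetical numbers: 20,000 background genes, 150 in the pathway,
# a list of 40 genes of which 6 fall in the pathway.
p_value = hypergeom_enrichment_p(20000, 150, 40, 6)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value suggests the theme shows up in your list more often than chance alone would explain. When many themes are tested at once, a correction for multiple testing would normally be applied on top of this.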

Another section of the video has an example of a custom query with the Query Builder. They ask for structural information for proteins targeted by acetaminophen. It’s a nice example of how to go from a compound to protein structure–a question I’ve seen come up before in discussion threads.

In their more recent paper (also below), they have some case studies that illustrate the concepts of prioritizing targets for different disease situations with their system.  They also expand on the functions with additional software to explore the pathways: http://targetmine.mizuguchilab.org/pathclust/ .

So have a look at the features of TargetMine for prioritization of candidate genes. I think the numerous “themes” are a really useful way to assess lists of genes (or whatever you are starting with).

Quick Links:

TargetMine: http://targetmine.mizuguchilab.org/ [note: their domain name has changed since the publications, this is the one that will persist.]

InterMine: http://intermine.github.io/intermine.org/

References:

Chen, Y., Tripathi, L., & Mizuguchi, K. (2011). TargetMine, an Integrated Data Warehouse for Candidate Gene Prioritisation and Target Discovery PLoS ONE, 6 (3) DOI: 10.1371/journal.pone.0017844

Chen, Y., Tripathi, L., Dessailly, B., Nyström-Persson, J., Ahmad, S., & Mizuguchi, K. (2014). Integrated Pathway Clusters with Coherent Biological Themes for Target Prioritisation PLoS ONE, 9 (6) DOI: 10.1371/journal.pone.0099030

Kalderimis, A., Lyne, R., Butano, D., Contrino, S., Lyne, M., Heimbach, J., Hu, F., Smith, R., Stěpán, R., Sullivan, J., & Micklem, G. (2014). InterMine: extensive web services for modern biology Nucleic Acids Research, 42 (W1), W468-W472 DOI: 10.1093/nar/gku301

NAR database issue (always a treasure trove)

The advance access release of most of the NAR database issue articles is out. As usual, this database issue includes a wealth of new and updated data repositories and analysis tools. We’ll be writing up more extensive blog posts on it and doing some tips of the week over the next couple of months, but I thought I’d highlight the issue and some of the reports:

There are a lot of updates to many of the databases we know and love (links go to the full-text articles): UCSC Genome Browser, Ensembl, UniProt, MINT, SMART, WormBase, Gene Ontology, ENCODE, KEGG, UCSC Archaeal Browser, IMG/M, DBTSS, InterPro and others (we have tutorials on all those listed here).

And, as an indication of the explosion of data available (itself a subject of a database issue article, SRA), there are a lot of new(ish) databases on specific datatypes, such as MINAS, a database of metal ions in nucleic acids (nice name :D); doRiNA, a database of RNA interactions in post-transcriptional regulation; BitterDB, a database of bitter compounds; and well over 100 more.

And I’ll give a special shout out to my former PI at EMBL, because I can: Peer Bork’s group has 4 databases listed in the issue: eggNOG, SMART, STITCH and OGEE (and he and a couple of members are on the InterPro paper also).

This is going to be a wealth of information to wade through!

UCSC Genome Browser: http://genome.ucsc.edu
Ensembl: http://www.ensembl.org/
UniProt: http://www.uniprot.org/
MINT: http://mint.bio.uniroma2.it/mint/
SMART: http://smart.embl.de/
WormBase: http://www.wormbase.org/
Gene Ontology: http://www.geneontology.org/
ENCODE: http://genome.ucsc.edu/ENCODE/
KEGG: http://www.kegg.jp
UCSC Archaeal Browser: http://archaea.ucsc.edu/
IMG: http://img.jgi.doe.gov/cgi-bin/w/main.cgi
DBTSS: http://dbtss.hgc.jp/
InterPro: http://www.ebi.ac.uk/interpro


Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

  • This is not a bad idea. Boston Sci-Geek Tours. I used to work for the Park Service. Hmmm… RT @YishaiKnobel: Fascinating tour and lecture on genomics at Broad Institute today.  Broad should be turned into a Boston tourist attraction. [Mary]
  • RT @BioCatalogue: The BioCatalogue iPhone/iPad app by @manniet3 is now out! http://bit.ly/p2tUQF Please do let us know what you think. [Mary]
  • Includes KEGG, iPATH 2.0, PathwayProjector, metaSHARK, MEGAN 4, HUMAnN: RT @phylogenomics: A Survey of Metabolic Reconstruction Tools for Metagenomic Datasets » The Bioinformatics Knowledgeblog http://shar.es/HK0U5 [Mary]
  • A Special Symposium Celebrating the 40th Anniversary of the Protein Data Bank, October 28 – 30, 2011, Cold Spring Harbor, NY – Poster Abstract deadline: August 15 [Jennifer]
  • Oh yes, plz: RT @nutrigenomics: like RT @grapealope: Bioinformatics is not just about building tools. We know our tools; we should use them first. @atulbutte #singularityu [Mary]
  • Giggle. I’ve done this with several species, not just plants… : @pcronald: Botanist holds up the entire salad bar. http://onion.com/pojv2t [Mary]
  • RT @wahwahnyc: On PubMed Central: The PathOlogist: an automated tool for pathway-centric analysis. BMC Bioinformatics. http://1.usa.gov/oDyBpc [Mary]
  • Looking for some geeky fun? The Twenty-First 1st Annual Ig Nobel Prize ceremony will occur Thursday, September 29, 2011 and tickets are on sale now. Note: “The Ig Nobel Prizes honor achievements that first make people laugh, and then make them think.” [Jennifer]
  • RT @genetics_blog: . @PLoS and @mendeley_com Call for Apps: http://bit.ly/oc2NGL and http://bit.ly/nHYqNa [Mary]
  • Just saw this in Nature News about Google and Microsoft: Computing giants launch free science metrics [Jennifer]
  • FameLab, a science and engineering communication competition – I haven’t seen an uninteresting one yet… [Jennifer]

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

  • Interesting story, but NOT what our legislation did, unfortunately: GINA does NOT cover life + long term care insurance. RT @gmopundit: RT @Genomengin: Genome power is about to sweep world: Nobel laureate http://t.co/CS41Rby via @theage [Mary]
  • RT @bffo: ICGC Data Coordination Center released version 5 of the ICGC data portal http://dcc.icgc.org/ #cancer #genomics #bioinformatics [Mary]
  • RT @westr: ISCB Honors Michael Ashburner and Olga Troyanskaya with Top Bioinformatics/Computational Biology Awards for 2011 http://bit.ly/ivsA2q [Mary]
  • Nature mentoring awards, this year in France. Nominations are due by June 27th, 2011. [Jennifer]
  • Sounds good–investigating now: RT @EinsteinMed: Einstein offers easy-to-use, open-source GenPlay #genome analyzer to scientific community http://ein.st/mkqCtg [Mary]
  • RT @davidweisss: RT @GenePattern: InSilicoDB, a sophisticated query engine for selecting GEO datasets and analyzing in @GenePattern: http://insilico.ulb.ac.be #bioinformatics [Mary]
  • Articles about how synonymous mutations possibly cause disease by altering miRNA regulation: The Sound of Silence and the original Crohn’s research article (subscription required for both) [Jennifer]
  • ‘nother day, ‘nother genome: RT @GenomeBrowser: Today we released the newest genome assembly of the green anole lizard, Anolis carolinensis (produced by @broadinstitute): anoCar2. [Mary]
  • KEGG 3D Mapping tool–here’s a sample image: http://bit.ly/iWTI3Q and you can see the recent changes indicated currently on the KEGG Mapper page, including the new Color Pathway features (with more planned). Hat tip to @d_kihara for retweeting @takujida ‘s item that I would have missed otherwise. [Mary]

KEGG needs your help

Seriously. This just came over the twitter, and here’s what we saw:

RT @karlkugler: What? That’s bad news RT @onertipaday KEGG will be available only to paid subscribers: http://bit.ly/lxyE8H #bioinformatics

I went over to look at the page, and it looks like it’s not a done deal yet. There may be a way to help, from the Plea to Support KEGG page (emphasis mine):

KEGG is now one of the most widely used biological databases in the world as indicated by the web access statistics (150 to 200 thousand unique visitors per month) and the number of KEGG paper citations (one thousand per year). I intend to ensure that KEGG remains a freely available web resource. However, this will be possible only with your support. First, I would like to ask all of you who have benefited from KEGG to write, email, tweet, and blog about your support for KEGG. I hope, in the long run, your voices will increase our chances of getting more stable funding.

Talk this up!

KEGG was one of the earliest computational/database resources I can remember using in my career. And it’s everywhere–you know that there are going to be links to KEGG, and that you can expect quality information there. I can’t imagine a world without it….

Tip of the Week: BioGPS for expression data and more

This week’s tip introduces BioGPS, or Gene Portal System. We get a lot of questions about two things that BioGPS can help you to tackle: what do I do with a list of genes to find out what they are? And the next question people have after that is: and where are they expressed? BioGPS can help you with both of those problems. It is a tool that integrates and displays many types of data that researchers would be interested in. It also allows you to customize your display with the types of data that are most relevant to you–using their extensive plug-in collection. And it can do so from your browser, or access the basic portal from your iPhone!

Recently there was a question at BioStar about ways to quickly access some human gene expression data. The top rated answer over there was BioGPS, so we thought we’d provide a look at the kinds of things available to users via BioGPS. This 5-minute movie introduces some of the features.

Basically you can search for a gene or a list of genes, you can search with various types of IDs, you can search by keyword, or you can even search by genomic intervals. Your resulting list will quickly link you to all kinds of information from expression data, to annotation details and wikis, and more.  The results are provided in a handy default view with panels of information which may offer what you are looking for.
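To make the “search by genomic intervals” idea concrete, here is a minimal sketch of what such a lookup boils down to: keep every gene on the right chromosome whose coordinates overlap the query range. This is not BioGPS code; the `Gene` record, the function name, and the coordinates are all hypothetical placeholders.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def genes_in_interval(genes, chrom, start, end):
    """Return symbols of genes that overlap chrom:start-end."""
    return [
        g.symbol
        for g in genes
        # Two closed intervals overlap when each starts before the other ends.
        if g.chrom == chrom and g.start <= end and g.end >= start
    ]


# Made-up annotation table, for illustration only.
GENES = [
    Gene("GENE_A", "chr1", 100, 200),
    Gene("GENE_B", "chr1", 300, 400),
    Gene("GENE_C", "chr2", 150, 250),
]

print(genes_in_interval(GENES, "chr1", 150, 350))
```

A linear scan like this is fine for a handful of genes. A real portal would more likely sit on an indexed database or an interval tree, which is an assumption on my part rather than something the BioGPS paper states.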

But you can go further with BioGPS using their customization and plug-in features. You aren’t tied to the default view. The system offers plug-ins: other tools can pipe their information over to BioGPS so you can use it within that framework. You can  register/create a login and then store views that are suited to your research needs.

At the time they wrote the paper provided below, they already had over 150 plug-ins available. As I write this today there are nearly 400 things you could bring in to supplement the views of the genes you are interested in. And the range of plug-ins is tremendous: interaction data sets, SNPs, phylogenetic data… The Figure 2 in their paper gives a partial list of the plug-ins at that time, and the categories they highlight include: literature searching (such as PubMed, iHop, patents, more), gene portals (such as Entrez Gene, UniProt, Gene Cards, more), genetics (dbSNP, HapMap, HuGE, more), pathway tools (KEGG, Reactome, STRING, more) and even reagent providers. But there are more now, and it looks like more will continue to be developed and added. It really depends on what you need and want to display for your searches. You can browse around or search the plug-in collection to explore what’s available to view.

There are other tools you can use to explore expression data specifically. We like the UCSC Gene Sorter for some types of queries. Of course the large repositories of GEO and ArrayExpress can offer expression data as well. But for some users the BioGPS portal may offer integration and customization features that will suit their research needs. Go over and check it out. Register, set up some views, and you’ll be finding all sorts of useful annotations for your genes or regions of interest.

Just to also quickly mention: you can do searches from your lab bench, or from seminars, with the iPhone version of BioGPS as well. I didn’t have time to cover that in the movie but there’s more information over at their site about the tool. I’ve got mine installed and I’ve found it handy during talks!

Quick links:

BioGPS homepage: http://biogps.gnf.org/ EDIT: has moved: http://biogps.org/

BioGPS iPhone app: http://biogps.gnf.org/iphone/

Reference:
Wu, C., Orozco, C., Boyer, J., Leglise, M., Goodale, J., Batalov, S., Hodge, C., Haase, J., Janes, J., Huss, J., & Su, A. (2009). BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources Genome Biology, 10 (11) DOI: 10.1186/gb-2009-10-11-r130

Tip of the Week: PathCase for pathway data

We spend a lot of time exploring genomic data, variations, and annotations. But of course a linear perspective on the genes and sequences is not the only way to examine the data. Understanding the pathways in which genes and molecular entities interact is crucial to understanding systems biology.

There are a number of tools which can help you to visualize and explore this kind of data. KEGG is one of the most venerable tools in bioinformatics, BioCyc is well known and used, Reactome is one of our favorites. Recently NCBI BioSystems has come along, and the BioModels tool at EBI provides more data of this type as well. Pathway Interaction Database is another place to try. What you’ll find is that each one has different emphasis, species focus or data sets available, and different tools to use to graphically display the databases. The ways to customize or interact with the data will vary as well. So you may need to try several to find the one you want for your purposes.

But for today’s tip of the week I will highlight PathCase, a Pathways Database System from Case Western Reserve University. This is a tool I’ve had my eye on for a number of years, and they continue to add new features and data sets to their very nicely done visualization and search interface.

PathCase offers you several ways to browse and search for pathways, processes, organisms, and also molecular entities (such as ATP, ions, etc) as well as genes and proteins. It’s all integrated into the system, so when you find an item of interest you can move to the other related pieces.  For example, from the Pathways you can find genes and learn more about the genes. From genes you can load the pathways in which they participate.
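The back-and-forth navigation described above (pathway to genes, gene to pathways) can be pictured as a pair of lookup tables built from the same membership records. Here is a minimal sketch under that assumption. It says nothing about how PathCase actually stores its data, and the gene and pathway names are placeholders.

```python
from collections import defaultdict


def build_indexes(memberships):
    """Build gene->pathways and pathway->genes lookups from (gene, pathway) pairs."""
    gene_to_pathways = defaultdict(set)
    pathway_to_genes = defaultdict(set)
    for gene, pathway in memberships:
        gene_to_pathways[gene].add(pathway)
        pathway_to_genes[pathway].add(gene)
    return gene_to_pathways, pathway_to_genes


def linked_pathways(pathway, gene_to_pathways, pathway_to_genes):
    """Other pathways that share at least one gene with `pathway`."""
    linked = set()
    for gene in pathway_to_genes.get(pathway, ()):
        linked |= gene_to_pathways[gene]
    linked.discard(pathway)  # a pathway is not "linked" to itself
    return linked


# Placeholder membership records, for illustration only.
MEMBERSHIPS = [
    ("geneA", "pathway_1"),
    ("geneA", "pathway_2"),
    ("geneB", "pathway_2"),
    ("geneC", "pathway_3"),
]

g2p, p2g = build_indexes(MEMBERSHIPS)
print(sorted(g2p["geneA"]))                          # pathways for one gene
print(sorted(linked_pathways("pathway_1", g2p, p2g)))  # neighbours of one pathway
```

Keeping both directions indexed is what makes it cheap to hop from an item of interest to its related pieces and back again.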

When you have the pathway graphics loaded, you can interact with that pathway by clicking, dragging, re-organizing and more. Right-clicking offers more details about the items and ways to visualize the data. One option I didn’t have time to show in the movie is that you can use the H2O/CO2 box to load up pathways that are linked to the one you are looking at, going even further along any route that you might be interested in. Here’s just a quick sample of that: from the NARS2 gene page I loaded the alanine pathway, and then added the fatty acid metabolism pathway. Now I can explore both of them with all the standard PathCase tools and understand many of their relationships. Once you start exploring these pathways you’ll be amazed at how complex the possible visualizations are.

So if you are interested in biological pathways, exploring them and representing them, check out PathCase.

PathCase site: http://nashua.cwru.edu/pathwaysweb/

Reference:
Elliott, B., Kirac, M., Cakmak, A., Yavas, G., Mayes, S., Cheng, E., Wang, Y., Gupta, C., Ozsoyoglu, G., & Meral Ozsoyoglu, Z. (2008). PathCase: pathways database system Bioinformatics, 24 (21), 2526-2533 DOI: 10.1093/bioinformatics/btn459

Guest Post: New features at CTD – Allan Peter Davis

This next post in our continuing semi-regular Guest Post series is from Allan Peter Davis, of the Comparative Toxicogenomics Database (CTD) at Mount Desert Island Biological Laboratory (MDIBL). If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.

The Comparative Toxicogenomics Database (CTD) is a free, public resource that promotes understanding about the effects of environmental chemicals on human health.  Since Trey’s original Tip of the Week about CTD, we’ve added many new features we’d like to highlight.

* The redesigned CTD homepage makes navigation easier and more intuitive.  Check out the keyword quick search box on every page, and try the “All” setting to see the scope of information available at CTD.

* A new Data Status page uses tag clouds to display the updated content for that month.

* We are particularly pleased to announce new statistical analyses of CTD data.  Chemical pages now feature enriched Gene Ontology (GO) terms, garnered from the genes that interact with a chemical.  In this release, CTD connects over 5,000 enriched GO terms to more than 4,500 chemicals.  As well, now our inferred chemical-disease relationships are also statistically scored and ranked.  Both new features will help users explore and generate testable hypotheses about the biological effects of chemicals.

* GeneComps and ChemComps discover genes or chemicals with a similar toxicogenomic profile to your molecule of interest.  Learn more about this feature in our recent publication.

* Reactome data are now also included with KEGG, for a more comprehensive view of pathways affected by chemicals.

* VennViewer and MyGeneVenn are new tools that compare datasets for chemicals, diseases, or genes (including your own gene list) using Venn diagrams to discover shared and unique information.  These two visualization tools are a nice accompaniment to our original Batch Query tool for meta-analysis.

* The FAQ section under the “Help” menu provides examples of how to maximize your experience with CTD.

* Download our Resource Guide (pdf link) to keep as a handy reference card for CTD.
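The kind of comparison that a Venn diagram of two datasets displays can be sketched with plain set operations. The example below is only an illustration of that idea, not code from VennViewer or MyGeneVenn; the function name and the gene identifiers are hypothetical.

```python
def venn_regions(a, b):
    """Split two collections into the three regions of a two-circle Venn diagram."""
    a, b = set(a), set(b)
    return {
        "only_a": a - b,   # unique to the first dataset
        "shared": a & b,   # present in both
        "only_b": b - a,   # unique to the second dataset
    }


# Placeholder gene lists, for illustration only.
genes_for_chemical = ["gene1", "gene2", "gene3", "gene4"]
my_gene_list = ["gene3", "gene4", "gene5"]

regions = venn_regions(genes_for_chemical, my_gene_list)
for name, members in regions.items():
    print(name, sorted(members))
```

The "shared" region is usually the interesting one for hypothesis generation, while the two "only" regions show what each dataset contributes on its own.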

From the homepage, you can also subscribe to our monthly email newsletter to keep current with CTD’s growing content and features.  You can always contact us to request curation of your favorite chemical or paper.  And with our new “Author Alert” email program, we’ll even contact you to let you know when we’ve curated data from one of your publications in CTD.

We strive to be the best possible resource of chemical-gene-disease networks for the biological community, so feedback and input from users are of great importance to us.

- Allan Peter Davis

Tip of the Week: WAVe, Web Analysis of the Variome

Today’s Tip of the Week is a short introduction to WAVe, or the Web Analysis of the Variome. The tool was recently introduced to us, and I’ve found it a welcome introduction to the tools available to the researcher to analyze human variation. This is apropos considering the recent paper we’ve been discussing on the clinical assessment of a personal genome (here, here and here) and that paper’s implications for personalized medicine and the use of online variation resources. WAVe also has introduced me to some additional tools I’ve either not been aware of, or haven’t used, which might be of use, such as LOVD (Leiden Open Variation Database), QuExT (Query Expansion Tool, from the same developers as WAVe), and others. Of course there is also database information pulled in from Ensembl, Reactome, KEGG, InterPro, PDB, UniProt, NCBI and many others. Take some time to check it out.

Video Tip of the Week: Caleydo for gene expression and pathway visualization

Recently while watching the #bioinformatics tag on Twitter I saw Khader Shameer mention Caleydo.  I was instantly hooked by the very clever visualization strategy that they are using to provide more surface area for examining the data you are interested in viewing.  Their specific topics are pathways and gene expression, but it got me thinking about various data types that I would like to see connected in this way. This week’s Video Tip of the Week is about this software.

To skip right over to Caleydo and start trying it out, go here: http://www.caleydo.org/

Caleydo delivers a 3D representation of the expression and pathway data.  The main user interface has an area that is a box.  They call it a bucket, but in my head buckets are round, so I think of this as a box.  On the floor of the box you have a graphic.  But because you also have 4 interior surfaces of the box you have 4 more places to display and link the data.  You can have a heat map microarray representation on one side, and various pathways associated with the genes in that microarray on the other sides.

There’s a short systems biology Application Note in Bioinformatics that describes the framework and gives an overview of the tool.  But there’s also a more detailed paper over at their publication site that will get you started (that 2010 paper for the Visualization conference in Taipei).

My computer is a bit underpowered, but I was able to load their webstart version and begin to look around.  They provide some sample data you can select and examine.  For the movie this week, though, I was unable to load that and run the recording software at the same time.  So mostly it’s an introduction to the concept and the site.  You’ll have to go over and load it up yourself to try it out.  If the webstart version doesn’t work for you, there are a couple of other download options for different platforms.

The Caleydo team has also done a YouTube overview of the features that you can examine.

http://www.caleydo.org

So try out this visualization strategy and see what you think.  I really like the concept.

+++++++++++++++++++++

Streit, M., Lex, A., Kalkusch, M., Zatloukal, K., & Schmalstieg, D. (2009). Caleydo: connecting pathways and gene expression Bioinformatics, 25 (20), 2760-2761 DOI: 10.1093/bioinformatics/btp432