Tag Archives: string

Video Tip of the Week: ChromoHub, annotated trees of chromatin-mediated signaling

Today’s tip of the week is a quick introduction to ChromoHub, an annotated phylogeny of chromatin-mediated signaling genes. As the ChromoHub site says, these are “genes involved in writing, reading and erasing the histone code.” These are epigenetic modifications that are emerging as target classes for future drug therapies.

ChromoHub maps annotations for these genes onto their phylogenetic trees, where a researcher can find a wealth of information, ranging from cancer data, SNPs, protein structures and protein-protein interactions to PubMed and funding information.

Today’s tip introduces you to the tool and shows how to add and view the annotations. There is a lot more at ChromoHub: you can suggest data that the developers have missed, and you can download the information, alignment files, images and more.

ChromoHub was developed by SGC, the Structural Genomics Consortium. This is a private-public partnership that supports discovery of new medicines through open access research. ChromoHub is just one of the tools and resources developed by the consortium.

To find out more about the resource, check out the links and reference below.

Quick Links:

UCSC Genome Browser
Structural Genomics Consortium (SGC)

ChromoHub Reference:

Liu L, Zhen S, Denton E, Marsden B, Schapira M. (2012). ChromoHub: a data hub for navigators of chromatin-mediated signalling. Bioinformatics DOI: 10.1093/bioinformatics/bts340 (open access)

Video Tip of the Week: eggNOG for the holidays (or to explore orthologous genes)

Who can resist a nice cup of eggnog for the holidays (especially with added brandy)? I know I can’t. I make my grandpa’s recipe every December and, considering that it uses tons of sugar, eggs, heavy cream and alcohol, and that 1/2 & 1/2 is the lightest ingredient, only in December.

Oh, that’s not what this tip is about. It’s about eggNOG, a database of orthologous groups of genes. We’ve mentioned eggNOG several times before, but only in passing or in relation (orthologous? :D) to another database or tool. Today, in perfect timing for the season, I thought I’d do a quick tip to introduce eggNOG.

eggNOG is brought to you by the same research group that developed a lot of other excellent tools, such as SMART (protein domains), STRING (protein-protein interactions), STITCH (protein-chemical interactions), iTOL and so much more. Of course, they do some fascinating research too.

eggNOG is a relatively straightforward database to use, but it has a wealth of information you might want to check out. As the recent paper in NAR states:

Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses…. Orthology, defined as homology via speciation, is a crucial concept in evolutionary biology and is essential for disciplines such as comparative genomics, metagenomics and phylogenomics. The concepts of orthology and paralogy, with the latter being defined as homology via duplication, have been used as a foundation to introduce the concept of clusters of orthologous groups: proteins that have evolved from a single ancestral sequence existing in the last common ancestor (LCA) of the species that are being compared, through a series of speciation and duplication events. Orthologous groups (OGs) have proven useful for functional analyses and the annotation of newly sequenced genomes  as orthologs tend to have equivalent functions.

eggNOG contains:

721 801 orthologous groups, encompassing a total of 4 396 591 genes…. from 1133 species.

For more about orthologous groups, methods used and pros and cons of methodology, you might want to check out the paper referenced below. They’ve included several informative and helpful reviews and references.

Right now, take a quick tour of what eggNOG can offer.

Quick Links:
STRING tutorial
SMART tutorial
Tip of the Week: iTOL

Powell, S., Szklarczyk, D., Trachana, K., Roth, A., Kuhn, M., Muller, J., Arnold, R., Rattei, T., Letunic, I., Doerks, T., Jensen, L., von Mering, C., & Bork, P. (2011). eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges Nucleic Acids Research DOI: 10.1093/nar/gkr1060

Video Tip of the Week: Phosida, a post-translational modification database

Over 2 years ago I did a tip of the week on Phosida (links to Phosida). Phosida is a database of phosphorylation, acetylation, and N-glycosylation data. Since the last tip, Phosida has undergone significant growth and some changes, including the addition of much more data (80,000 phosphorylated, acetylated and N-glycosylated sites from 9 different species) and tools (prediction and motif analysis). You can read more about those changes in this year’s NAR database issue article.


Today’s tip will revisit the database and redo a search that was done in the tip from 2009, this time using a protein search instead of a category search.
Gnad, F., Gunawardena, J., & Mann, M. (2010). PHOSIDA 2011: the posttranslational modification database Nucleic Acids Research, 39 (Database) DOI: 10.1093/nar/gkq1159

Quick link to Phosida: http://www.phosida.com/

Below the fold you’ll find the text of the last tip of the week for more information:


Many Protein Resources Have Recently Announced Updates

PDB structure 3rg9

In our ongoing pursuit of up-to-date tutorials, I’ve been tracking changes that are occurring at resources and planning our updates accordingly. Protein resources are especially going to keep me out of trouble this summer, because their developers and curators have been busy! I’ve compiled a short synopsis below, and would appreciate comments on any other resources you know about, or want to brag about! :)

  • I featured the ExPASy list of proteomic tools in a past tip. As of Tuesday this list is no longer being kept up-to-date, but the ExPASy resource has been expanded beyond being “just” a proteomics resource and is now the new SIB Bioinformatics Resource Portal. According to its developers, the portal:

    “provides access to scientific databases and software tools in different areas of life sciences including proteomics, genomics, phylogeny, systems biology, population genetics, transcriptomics etc. … On this portal you find resources from many different SIB groups as well as external institutions.”

    And never fear, there is still an up-to-date list of proteomics tools found here.

  • I mentioned in my tip last week that NCBI’s MMDB has undergone an update & I’ll be updating our tutorial on it soon.
  • NCI/Nature Pathway Interaction Database, or PID, had an update June 14th that includes new and updated pathway information.
  • PROSITE had an update June 21st, which is Release 20.73, and now includes 1618 documentation entries, 1308 patterns, 936 profiles and 925 ProRules.
  • The RCSB PDB resource has announced updates to their Browse Database function, enhanced sequence displays from structure summary pages and the PDB-101 educational resource available from blackboard logos on PDB pages. For more details on using PDB, please see our free PDB Introductory tutorial sponsored by the RCSB.
  • STRING’s 9.0 release is now available, and we’ll be looking into anything we need to update in our tutorial as a result.
  • UniProt released an update June 28th that included a major update on many bacterial and archaeal Type II Toxin-Antitoxin modules, as is described here.

Enjoy all the new information – I know I will! :)

What’s the Answer? Open Thread (gene networks)

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

Question of the week:

I have 10 genes linked to a particular disease (for the sake of example say cancer).
I want to build a gene network for these 10 genes.
Any web based tools available which can do the job?

I immediately thought of GeneMania (publicly available OpenHelix tutorial) and STRING (tutorial, by subscription). The best answer beat me to it, with a lot of other excellent tools. Go check it out.
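
For readers who would rather script this than click through a web form, here is a minimal Python sketch of how one might ask STRING for a network around a gene list. It only builds the request URL for STRING's public REST API. The gene symbols are placeholders rather than a real disease gene set, the helper name `build_string_network_url` is made up for this example, and the endpoint layout and parameter names (`identifiers`, `species`, `required_score`) are assumptions based on the API documentation, so check them against the STRING site before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL of STRING's REST API; verify against the STRING help pages.
STRING_API = "https://string-db.org/api"


def build_string_network_url(genes, species=9606, required_score=400, fmt="tsv"):
    """Return a STRING 'network' request URL for a list of gene names.

    species is an NCBI taxonomy ID (9606 = human); required_score is the
    minimum combined interaction score on STRING's 0-1000 scale.
    """
    params = {
        # Multiple identifiers are sent separated by a carriage return.
        "identifiers": "\r".join(genes),
        "species": species,
        "required_score": required_score,
    }
    return f"{STRING_API}/{fmt}/network?{urlencode(params)}"


# Placeholder gene symbols, purely for illustration.
example_genes = ["TP53", "BRCA1", "BRCA2", "ATM", "CHEK2"]
url = build_string_network_url(example_genes)
print(url)
```

Opening the printed URL in a browser, or fetching it with `urllib.request.urlopen`, should return tab-separated rows describing the interactions STRING knows about among the listed proteins.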





Peer Bork wins 2009 award

The Royal Society and Académie des sciences Microsoft Award was won by Peer Bork this year. The award is funded by Microsoft (250,000 euro) and is given to

recognise and reward scientists working in Europe who have made a major contribution to the advancement of science through the use of computational methods.

It was awarded to Peer Bork for his work on the human microbiome. Peer definitely deserves it, as does his lab. The science and scientists that come from the Bork group are stellar. OK, so I have a personal interest in this: I worked in his lab for 4 years, from 1999 to 2003. It was one of the best experiences (scientific and personal) of my life. Also, BioByte Solutions, started by a Bork lab researcher, has helped put together our new free database and resource search (which we’ll be introducing next week).

Congratulations Peer! Now, what is he going to do with that 368,000 dollars?!

And let me use this opportunity to point out some of the great tools and databases developed by the Bork group:
STRING: Analysis of known and predicted protein-protein interactions in all known genomes (OpenHelix Tutorial, by subscription)
STITCH: Database of known and predicted interactions of chemicals and proteins.
SMART: Domain analysis (OpenHelix Tutorial, by subscription)
iTOL: An online tool for the display and manipulation of phylogenetic trees.
XplorMed: Data mining in MEDLINE (OpenHelix Tutorial, by subscription)

And a whole lot more

Tip of the Week: Acetylome, STRING and a new database

I recently read an article in Science entitled “Lysine Acetylation Targets Protein Complexes and Co-Regulates Major Cellular Functions,” written by Choudhary et al. The research uses “high-resolution mass spectrometry to identify 3600 lysine acetylation sites on 1750 proteins” and “demonstrate[s] that the regulatory scope of lysine acetylation is broad and comparable with that of other major posttranslational modifications.”

I’m going to admit, I know little of acetylation as a regulatory mechanism, though after reading through the paper, I found this quite an interesting find, and it suggests to me that genomics has a lot to offer in advancing our understanding of regulation and evolution.

Three things jumped out at me though.

The first is minor. The authors use the term Acetylome. You can now add that to the huge list of -omics terms to keep straight :D.

The second is that they use STRING to complete an analysis of networked interactions of the proteins discovered in their study and the processes where they are found, as you can see in their figure.

I did my postdoc and some later research in the lab (Peer Bork, EMBL) that developed STRING, and I’ve created a tutorial on it, so any time it’s used, I’m interested :D. So, I went to Methods and Materials to see how the analysis was done. Though there was a decent explanation of the process, it was not enough for me to recreate the analysis. This is not a criticism of the paper or the authors, but of how papers are being published. More and more, papers include genomics analysis, but rarely are these reported in the research paper in the detail needed to easily reproduce the analysis. Projects like Galaxy (publicly available tutorial) and Taverna are filling that void, so I’d like to see more Methods and Materials sections include analysis histories and workflows. It definitely would help in the advancement of science.

And now to the tip of the week. The paper also refers to a database called Phosida (new to me at least, though it’s at least two years old and was reported in “Phosida: management, structural and evolutionary investigation and prediction of phosphosites”). The database “allows retrieval of phosphorylation and acetylation data of any protein of interest.” The Tip-of-the-Week today is a quick introduction to that database.

Choudhary, C., Kumar, C., Gnad, F., Nielsen, M., Rehman, M., Walther, T., Olsen, J., & Mann, M. (2009). Lysine Acetylation Targets Protein Complexes and Co-Regulates Major Cellular Functions Science, 325 (5942), 834-840 DOI: 10.1126/science.1175371

Tip of the week: Harvester, a "Swiss army knife" of bioinformatics

This week I’m going to introduce a tool that searches a whole bunch of resources for you with one single click. Harvester, from the Karlsruhe Institute of Technology, offers a really simple interface for searching. If your species is one of the ones collected in their search, you will find that Harvester will enable you to search a slew of databases with just one query: NCBI, UCSC, MINT, STRING and many others. The results will provide quick links to some databases, and some results pages will be embedded in one big web page that you can scroll down and overview really quickly. The embedded pages aren’t just summary text; they are the actual database pages in situ! You can see them and interact with them just as if you were on that site doing the search.

This 3 minute movie introduces you to Harvester. If you quickly need a summary of what’s in all the databases they collect, it is a very handy tool. It does remind me of a Swiss army knife–not earth shatteringly novel from an algorithmic perspective, but many useful tools pulled together in one place. Try it out!

Learn about protein-protein interactions.

Bioinformatics.org is a great organization and web site (disclosure: I’ve taught an online course with them :D), and they regularly offer online courses in bioinformatics that lean toward theory and analysis (where ours focus more on the use of and access to resources). If you need to get up to speed on protein-protein interactions, there is room in next week’s course on that subject.

We have training on several protein-protein interaction resources, such as STRING and, soon, MINT, so this bioinformatics course seems a nice complement. To learn more about the course, follow me under the fold…


Speaking of the Bork lab…

In the previous post I briefly mentioned a paper coming out of the Bork lab at EMBL.

The lab just made public a new tool: STITCH, “a resource to explore known and predicted interactions of chemicals and proteins.” This is a sister project to STRING, a great tool for exploring the interactions of proteins