Tag Archives: protein

Tip of the Week: From UniProt to the PSI SBKB and Back Again

It is often beneficial to visit multiple biomedical databases or resources, even if they seem to provide overlapping information, because no two resources focus on exactly the same information or present it in exactly the same way. Instead of duplicating each other’s curation efforts, databases often link out to related information at other resources. You can think of these links as “social connections” if you want, and in today’s tip I want to show you a couple of connections between protein information resources, including a new connection that really features some of the core value of the PSI’s Structural Biology Knowledgebase, or SBKB.

I begin the tip at the UniProtKB, where I search for a UniProt ID number. From the resulting protein report I first briefly show you how to link out to a corresponding RCSB PDB report, where you can find high-quality protein structure information and more. If you are interested in learning more about the RCSB PDB & how to use it, please check out OpenHelix’s full, free tutorial that is sponsored by the RCSB PDB.

From there I return to the UniProt report and demonstrate a new link-out option that leads to protein protocols, available materials, and information about theoretical models and predicted protein targets from the SBKB. I don’t have time to show it, but a recent update to the SBKB now allows users to search the Structural Biology Knowledgebase with a UniProt accession number. These searches provide users with additional information, including protein structure information and information about pre-released structure sequences. As with the RCSB PDB, we have a free tutorial on the SBKB that is sponsored by the Protein Structure Initiative.
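If you plan to paste accession numbers into a search box, a quick format check can catch typos first. Here is a minimal sketch in Python, assuming the accession pattern described in UniProt's help pages (6 or 10 characters). The function name is my own, and the check only looks at the shape of the string; it does not confirm that the entry actually exists.

```python
import re

# Accession shape as I understand it from UniProt's help pages:
# either 6 characters, or 10 characters for the newer extended form.
UNIPROT_ACCESSION = re.compile(
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)


def looks_like_uniprot_accession(text: str) -> bool:
    """Return True if text has the shape of a UniProtKB accession number.

    Hypothetical helper for illustration; it does not query UniProt.
    """
    return UNIPROT_ACCESSION.fullmatch(text.strip().upper()) is not None
```

An entry name such as TEC_HUMAN is a different kind of identifier and will not pass this check, which is a handy reminder to look up the accession before searching.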

As I scroll through the UniProt protein report, you will see information and links for a wide variety of bioscience resources. OpenHelix, as I’m sure many of you are aware, has tutorials on how to use many of these resources. Our tutorials on the RCSB PDB and the PSI SBKB are both free. Our tutorials on UniProt and many other resources are available through a subscription to our database of trainings or through purchase of individual access. Whether you learn the resources through our tutorials, through the references I list below, or through your own explorations of the databases, there really is an amazing amount of information available through these interlinked, publicly funded resources. Please make use of them in your research!

Quick Links:

UniProt Knowledgebase – http://www.uniprot.org/

OpenHelix Tutorial on UniProt – http://www.openhelix.com/cgi/tutorialInfo.cgi?id=77

RCSB PDB – http://www.pdb.org

OpenHelix Tutorial on the RCSB PDB – http://www.openhelix.com/pdb

The Protein Structure Initiative Structural Biology Knowledgebase (SBKB) – http://www.sbkb.org/

OpenHelix Tutorial on the SBKB – http://www.openhelix.com/sbkb

Catalog of all OpenHelix tutorials – http://www.openhelix.com/cgi/tutorials.cgi

The UniProt Consortium. (2009). The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Research, 38 (Database). DOI: 10.1093/nar/gkp846

Rose, P., Beran, B., Bi, C., Bluhm, W., Dimitropoulos, D., Goodsell, D., Prlic, A., Quesada, M., Quinn, G., Westbrook, J., Young, J., Yukich, B., Zardecki, C., Berman, H., & Bourne, P. (2010). The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Research, 39 (Database). DOI: 10.1093/nar/gkq1021

Berman, H., Westbrook, J., Gabanyi, M., Tao, W., Shah, R., Kouranov, A., Schwede, T., Arnold, K., Kiefer, F., Bordoli, L., Kopp, J., Podvinec, M., Adams, P., Carter, L., Minor, W., Nair, R., & Baer, J. (2009). The protein structure initiative structural genomics knowledgebase. Nucleic Acids Research, 37 (Database). DOI: 10.1093/nar/gkn790

Tip of the Week: DomainDraw for quick motif diagrams

Bioinformatics resources can be really complex: sometimes daunting, heavily loaded with crucial data, and capable of amazing visualizations of large data sets and various features of the underlying data. And other times, that’s way more than you need. Overkill. Like aiming an elephant gun at a mosquito.

A couple of times in the last week I noticed that questioners at BioStar wanted something simpler, just to illustrate a few features of their genes/proteins of interest in a slide or figure format. They didn’t want to draw them in PowerPoint, because they wanted some specificity in the locations rather than something freehand, but the UCSC Genome Browser, for example, has way too many things for what they wanted to show.

If you have a known protein, one of my favorite tools to show a domain diagram is SMART. If you call up a protein, such as the sample one they offer (TEC_Human), you get a domain diagram like this:

When you are using SMART, not only do you get these great diagrams, but you can link to details of the domains, do a whole bunch of searches, and more. It also gives you a nice numerical output of the domains, which I’ll show here because I’m going to use it in the example:

But there may be times when you want to show a diagram like this, but customized for your work. Maybe you made this protein without the SH2 domain. Maybe you discovered a splice variant that lacks the BTK domain. Or you want to show several homologs that have the domains in slightly different places. There are plenty of reasons to want a figure like that.

Although there were multiple answers at BioStar to solve this, I wanted to focus on one of the cool answers that was provided. The answer that solved the problem for the questioner was DomainDraw.

DomainDraw has been around for a while, but I can see it being quite useful for a long time. It can also be used for any organism–or even any synthetic construction. You can create a more complex file, but for simple drawings you can just enter a few parameters and draw a little protein. Using the numbers I had for TEC above, I drew this:
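To make the idea of "enter a few parameters and draw a little protein" concrete, here is a minimal sketch in Python of the underlying arithmetic: scale residue positions to pixels and emit one box per domain. This is not DomainDraw's code or its input format; the function name, the SVG styling, and the coordinates are all my own placeholders (the numbers are not the real TEC values).

```python
from typing import List, Tuple

Domain = Tuple[str, int, int]  # (label, start residue, end residue)


def domain_diagram_svg(length: int, domains: List[Domain],
                       width: int = 600, height: int = 60) -> str:
    """Return an SVG string: a backbone line plus one to-scale box per domain."""
    scale = width / length  # pixels per residue
    mid = height / 2        # vertical centre of the drawing
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<line x1="0" y1="{mid}" x2="{width}" y2="{mid}" stroke="black" stroke-width="2"/>',
    ]
    for label, start, end in domains:
        if not 1 <= start <= end <= length:
            raise ValueError(f"{label}: {start}-{end} is outside 1-{length}")
        x = (start - 1) * scale        # left edge of the box
        w = (end - start + 1) * scale  # box width covers start..end inclusive
        parts.append(
            f'<rect x="{x:.1f}" y="{mid - 12}" width="{w:.1f}" height="24" '
            f'fill="lightsteelblue" stroke="black"/>'
        )
        parts.append(
            f'<text x="{x + w / 2:.1f}" y="{mid + 4}" font-size="10" '
            f'text-anchor="middle">{label}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Placeholder coordinates for illustration only; not the real TEC values.
example = domain_diagram_svg(600, [("Domain A", 1, 100), ("Domain B", 250, 350)])
```

Saving `example` to a file with an `.svg` extension should give something you can open in a browser. Dropping a tuple from the list is the equivalent of drawing a construct or splice variant that lacks that domain. Labels are inserted as-is, so avoid characters such as `<` or `&` in them.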

It’s not quite as striking as the SMART one, but it gets the job done. And you could very quickly tweak it to have smaller pieces, or remove domains, or whatever you might want to do to illustrate features of interest among splice variants or related proteins.

I wish there were more simple problem-solving sort of apps like this. If anyone has other handy little items like this, let me know in the comments. We are always looking for new ones to highlight in our weekly tips!

Quick link to DomainDraw: http://domaindraw.imb.uq.edu.au/

Previous tip on MyDomains: http://blog.openhelix.eu/?p=679

OpenHelix Tutorial on SMART (subscription required)

BioStar questions related to this:

How to generate a simple isoform diagram

Protein domain display


Fink JL, & Hamilton N (2007). DomainDraw: a macromolecular feature drawing program. In silico biology, 7 (2), 145-50 PMID: 17688439

Many Protein Resources Have Recently Announced Updates

PDB structure 3rg9

In our ongoing pursuit of up-to-date tutorials, I’ve been tracking changes that are occurring at resources and planning our updates accordingly. Protein resources are especially going to keep me out of trouble this summer, because their developers and curators have been busy! I’ve compiled a short synopsis below, and would appreciate comments on any other resources you know about, or want to brag about! :)

  • I featured the ExPASy list of proteomic tools in a past tip. As of Tuesday this list is no longer being kept up-to-date, but the ExPASy resource has been expanded beyond being “just” a proteomics resource and is now the new SIB Bioinformatics Resource Portal. According to its developers, the portal:

    “provides access to scientific databases and software tools in different areas of life sciences including proteomics, genomics, phylogeny, systems biology, population genetics, transcriptomics etc. … On this portal you find resources from many different SIB groups as well as external institutions.”

    And never fear, there is still an up-to-date list of proteomics tools found here.

  • I mentioned in my tip last week that NCBI’s MMDB has undergone an update & I’ll be updating our tutorial on it soon.
  • NCI/Nature Pathway Interaction Database, or PID, had an update June 14th that includes new and updated pathway information.
  • PROSITE had an update June 21st, which is Release 20.73, and now includes 1618 documentation entries, 1308 patterns, 936 profiles and 925 ProRules.
  • The RCSB PDB resource has announced updates to their Browse Database function, enhanced sequence displays from structure summary pages and the PDB-101 educational resource available from blackboard logos on PDB pages. For more details on using PDB, please see our free PDB Introductory tutorial sponsored by the RCSB.
  • STRING’s 9.0 release is now available, and we’ll be looking into anything we need to update in our tutorial as a result.
  • UniProt released an update June 28th that included a major update on many bacterial and archaeal Type II Toxin-Antitoxin modules, as is described here.

Enjoy all the new information – I know I will! :)

Tip of the Week: Update to NCBI’s Cn3D Viewer

As I say in the tip movie, I like to visit NCBI’s homepage & just roam around over there to get an idea of what’s new. They develop so many bioscience tools, algorithms & other resources that there’s always SOMETHING new. Today I found out that they have updated their Cn3D interactive 3D viewer software from version 4.1 to 4.3; version 4.2 was a preview version released only in a bundle with the CDTree software. The 4.3 version is a stand-alone version that can communicate with CDTree, and it offers users some advanced features compared to the 4.1 version of the Cn3D software. It may be important to note for some of you that 4.3 is only offered for Windows & Mac. Users wanting a Unix version will have to continue using version 4.1, as explained here.

Before using the Cn3D software you must download it to your computer. Having previously downloaded the 4.1 version of Cn3D does not affect your ability to download and use the 4.3 version. In fact, you can use them side-by-side if you wish. As I said, the Cn3D software is a 3D interactive viewer, so once you download the software you will probably visit another database that provides protein structure views in a Cn3D format. In the tip I use the myosin VI structure summary page from NCBI’s Molecular Modeling Database (MMDB) as my example. I didn’t have time to show you how to access the page in the tip movie, but it is the exact page we use in our full MMDB tutorial, so you can find all the details there. (NOTE: I’ll be updating the MMDB tutorial soon, so watch for that announcement later.)

Opening the myosin VI file in the Cn3D viewer allows you to rotate the molecule, label it as you desire, or view it in stereo. The controls are pretty intuitive & you can click around & see how each one affects your image. For example, under the “Style>Rendering Shortcuts” menu you can render the molecule as worms, tubes, wire, ball & stick or space filled. You can also go under the “Style>Coloring Shortcuts” menu and choose to color the image by domains, residues, charge, and many other options. If you want more details, check out the Cn3D “Help” menu or the Cn3D citation (below), or work through their tutorial.

I know MMDB and GoMiner use the Cn3D viewer – are there other resources where you use the Cn3D software?


Baxevanis AD. (2008) Searching NCBI databases using Entrez. Curr Protoc Bioinformatics. 2008 Dec;Chapter 1:Unit 1.3. DOI: 10.1002/0471250953.bi0103s24

Wang Y, Geer LY, Chappey C, Kans JA, Bryant SH. (2000) Cn3D: sequence and structure views for Entrez. Trends Biochem Sci. 2000 Jun;25(6):300-2. DOI: 10.1016/S0968-0004(00)01561-9 (subscription required)

WhatsYourProblem to WhatsTheAnswer

Our “What’s Your Problem” post will be transitioning to a “What’s the Answer” post this week and going forward. BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every week we will be highlighting one of those questions and answers here in this thread. You can still ask questions in this thread, or you can always join in at BioStar.

BioStar Question of the Week:

What is a good ontology for experimental results? If I want to publish experimental results, preferably via RDFa using a standardized ontology, what would be a good source to use? I am thinking of a triple such as:
Protein X — Interacts with — Protein Y
Where the ontology would spell out “Interacts with”.

Highlighted Answer:

I would recommend formatting your data using the IMEx (International Molecular Exchange Consortium) curation guidelines. This will allow you to submit your data easily to any of the participant databases (DIP, MINT, INTACT, etc). IMEx uses the PSI (Proteomics Standards Initiative) Molecular Interactions controlled vocabulary. There is a PSI-MI XML/CV validator here.
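To make the questioner's triple a bit more concrete, here is a minimal sketch in Python of a subject/predicate/object statement serialized as one N-Triples line. The `example.org` URIs and the `interactsWith` term are placeholders I made up for illustration; in practice the predicate would come from a controlled vocabulary such as the PSI-MI one recommended in the answer, and this sketch does not produce RDFa or PSI-MI XML.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject/predicate/object statement, each part given as a URI."""
    subject: str
    predicate: str
    obj: str

    def to_ntriples(self) -> str:
        # N-Triples wraps each URI in angle brackets and ends the line with " ."
        return f"<{self.subject}> <{self.predicate}> <{self.obj}> ."


BASE = "http://example.org/"  # hypothetical namespace, not a real vocabulary

interaction = Triple(
    subject=BASE + "protein/X",
    predicate=BASE + "vocab/interactsWith",  # placeholder for a controlled term
    obj=BASE + "protein/Y",
)
line = interaction.to_ntriples()
```

The point of a standardized ontology is that the middle URI means the same thing to every database that reads it, which is exactly what the questioner is asking for.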

Check out the other answers, or provide one if you have insights into the problem.

Tip of the Week: Introduction to Changes to NCBI’s Protein Database

In today’s tip I will introduce you briefly to the changes at NCBI’s Protein database. I highlighted that changes had been made in a Friday SNPets, and someone asked for more details. Our full updated tutorial will be much more complete than this short tip, so be watching for that to be completed in the near future – but for now, enjoy this tip & head over to NCBI to do some exploring of your own!

Video Tip of the Week: Allergen Atlas

When this week’s sweep through the literature led me to a new resource on a topic that we haven’t covered in the past, I was psyched. But then it led me even further, to a bunch of other resources and strategies that I hadn’t been exposed to before. And yet it is an area close to me (very close), but I didn’t know the bioinformatics resources that were in place around it.

I’m allergic to peanuts. And I come from this allergy family. Mostly our allergies are different items–I’m the only peanut one. But the rest of my family is loaded with them–eggs, pollens, pets, strawberries, and plenty more. Quite a range. And Trey is allergic to coconut. So we never have Thai food when we do training sessions because it could kill either of us :O

So I was fascinated to learn about allergen databases. I’ll start with my look at Allergen Atlas (that started this quest) and then I’ll move on to mention some other resources I discovered on this hunt. This week’s Video Tip of the Week is on allergenicity resources:

Allergen Atlas: http://tiger.dbs.nus.edu.sg/ATLAS/ and the paper for it: http://www.ncbi.nlm.nih.gov/pubmed/19213741

IUIS: International Union of Immunological Societies http://www.iuisonline.org/

Allergome: http://www.allergome.org/

Allermatch: http://www.allermatch.org/ FAO/WHO sequence matching tool

Tip of the Week: Molecular INTeraction Database (MINT)

MINT, the Molecular INTeraction Database, is so much fun to use. I know–there is high-quality curated information from the scientific literature. And that’s the real point. But quite frankly, I just love to examine the protein-protein interactions in the MINT viewer. In this brief (about 3 minutes) exploration of some of the high-level features of MINT I will offer a taste of how fun and informative this resource is.
A team at the University of Rome brings MINT to you. Check it out here: http://mint.bio.uniroma2.it/mint/

But at just a few minutes, we can’t provide the full detail about how to understand the graphics and how to use the site most effectively. We have a full tutorial on MINT that you might want to examine if this is a tool you would want to use on a regular basis.

And for more detail on the background and goals of MINT you should check out their paper. From their abstract:

Over the past few years the number of curated physical interactions has soared to over 95000.

That’s a lot of MINT. If you are like me and the previous owners of your house planted mint, you’ll understand the scope :)

Tip of the Week: SMART protein domain analysis

We find that many people are interested in learning more about the domains in their proteins of interest. A great tool to use to examine the domains, their architecture/organization, and their relationships is SMART, the Simple Modular Architecture Research Tool. It is developed and maintained by the Bork lab at EMBL (a source of many great tools). This appetizer movie may get you interested in examining your proteins of interest with the SMART analysis. We show you how to find hundreds of proteins with “transmembrane receptor” domains in about 3 minutes.

We have a complete tutorial on SMART that presents the information in much more detail.