Tip of the Week: RCSB PDB Data Distribution Summaries


In today’s tip I will feature the data distribution summaries and their drill-down features, which appear in the results of many RCSB PDB searches. We are in the process of updating our full tutorial sponsored by the RCSB PDB team, and as part of that effort I’ve gotten to know and appreciate this new data presentation format. Over the last five years the RCSB PDB has been working hard to redesign their resource so that it is more easily accessed by a wide variety of users. Below you will find a recent citation from the group explaining all of their updates and the logic behind them. The paper is a good read for two reasons: I won’t have time to do anything except scratch the surface of the redesign, and you’ll get the details there; and the intro gives a great glimpse into what resources are dealing with in the way of ‘data deluge’. The increase in users AND data that the RCSB PDB has experienced over the last few years is mind-boggling!

OK, back to the data distributions. To me these are really elegant ways of helping any user (PDB is by no means just for structural biologists) come to the RCSB PDB, quickly and easily access whole categories of interesting information, and then drill down in detailed ways to the specific structure or data that they are most interested in. For example, I could begin with a keyword search for something as general as ‘kinase’. This search retrieves over 4,000 hits, which could be quite daunting, but at the top of the report the results are displayed under categories such as Organism, Taxonomy, Experimental Method, SCOP classification and more. Subcategories under each of these categories tell me how many hits are, for example, of a mixed polymer type, from human, or alpha and beta proteins. I can mouse over any subcategory title to find out the percentage of all hits that fall in that subcategory, or click on the title to drill down into just that subcategory of results. The distribution summaries then update to show the distribution of THOSE data specifically. Using these summaries is much more intuitive than any text description that I can muster.
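The drill-down behavior described above amounts to faceted counting: tally the hits in each subcategory, express each tally as a share of all hits, then filter to one subcategory and tally again. Here is a minimal Python sketch of that idea, assuming a tiny made-up list of hit records. The record IDs, field names, and the `distribution` and `drill_down` helpers are hypothetical illustrations, not the RCSB PDB's actual code or web services.

```python
from collections import Counter

# Hypothetical hit records standing in for search results (made-up IDs and fields).
hits = [
    {"id": "hit-1", "organism": "Homo sapiens", "method": "X-RAY DIFFRACTION"},
    {"id": "hit-2", "organism": "Homo sapiens", "method": "SOLUTION NMR"},
    {"id": "hit-3", "organism": "Mus musculus", "method": "X-RAY DIFFRACTION"},
    {"id": "hit-4", "organism": "Escherichia coli", "method": "X-RAY DIFFRACTION"},
]

def distribution(records, category):
    """Count records per subcategory and give each count's percent of all records."""
    counts = Counter(r[category] for r in records)
    total = len(records)
    # most_common() lists the largest subcategories first, like a summary panel.
    return {value: (n, round(100 * n / total, 1)) for value, n in counts.most_common()}

def drill_down(records, category, value):
    """Keep only the records in one subcategory, mimicking a click on its title."""
    return [r for r in records if r[category] == value]

# Usage: summarize by organism, then drill down to human hits and summarize by method.
print(distribution(hits, "organism"))
# → {'Homo sapiens': (2, 50.0), 'Mus musculus': (1, 25.0), 'Escherichia coli': (1, 25.0)}
human_hits = drill_down(hits, "organism", "Homo sapiens")
print(distribution(human_hits, "method"))
# → {'X-RAY DIFFRACTION': (1, 50.0), 'SOLUTION NMR': (1, 50.0)}
```

Note that the percentages after drilling down are relative to the filtered set, which matches the way the summaries refocus on just the selected subcategory.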

My advice? Check out the tip, then check out the data distribution summaries, the drill-down utility, and all the other great features of the RCSB PDB, and see how easy it is to find information on your favorite gene. Oh yeah, and watch for us to release our full, free and newly updated tutorial on the RCSB PDB resource soon!

Rose, P., Beran, B., Bi, C., Bluhm, W., Dimitropoulos, D., Goodsell, D., Prlic, A., Quesada, M., Quinn, G., Westbrook, J., Young, J., Yukich, B., Zardecki, C., Berman, H., & Bourne, P. (2010). The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Research, 39 (Database). DOI: 10.1093/nar/gkq1021