Video Tip of the Week: Aquaria, streamlined access to protein structures for biologists

This week’s Video Tip of the Week is Aquaria, a new resource for exploring protein structures, mutations, and similarities to other proteins. It’s a very well-designed and interactive experience for end users. It is aimed largely at biologists who could benefit from exploring the structural details of their proteins of interest but are daunted by tools aimed at structural biologists. Tool developers should also look at how this rollout went: it’s one of the best examples of a tool launch I’ve seen in this field. And I’ve seen a lot.

So first, the tool. Aquaria offers users a streamlined way to access and explore protein structures, combining the kinds of information you get from the PDB structure resources with additional details like UniProt mutations. Currently you start with a basic search, asking for a protein by name or by PDB or UniProt ID. They have pre-calculated the relationships of proteins in PDB and Swiss-Prot to quickly offer you a structure and related proteins. The paper notes: “Currently, Aquaria contains 46 million precalculated sequence-to-structure alignments, resulting in at least one matching structure for 87% of Swiss-Prot proteins and a median of 35 structures per protein….” In addition, it lets you explore other important biological features, such as InterPro domains and post-translational modifications, so you can think about how the mutations + structures + functions impact a given protein that you are interested in. As they describe it:

“We have loaded SNP data from Uniprot and Interpro so you can see where the mutations lie on your 3D model. And we have found that you may be pleasantly surprised to find your mutations clustering in 3D space!”
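The idea of mutations clustering in 3D space can be made concrete with a simple distance check. The Python sketch below is not how Aquaria computes anything; it assumes you have already pulled one representative coordinate (for example, the C-alpha atom) for each mutated residue out of a structure file, and the 8 Å cutoff and the residue labels are arbitrary, hypothetical choices.

```python
import itertools
import math

# Hypothetical input: residue label -> (x, y, z) coordinate in angstroms,
# e.g. the C-alpha atom of each mutated residue taken from a PDB file.
Coord = tuple[float, float, float]


def close_pairs(coords: dict[str, Coord], cutoff: float = 8.0) -> list[tuple[str, str, float]]:
    """Return every pair of residues whose coordinates lie within `cutoff` angstroms."""
    pairs = []
    # Sorting the items makes the output order deterministic.
    for (res_a, xyz_a), (res_b, xyz_b) in itertools.combinations(sorted(coords.items()), 2):
        distance = math.dist(xyz_a, xyz_b)  # Euclidean distance in 3D
        if distance <= cutoff:
            pairs.append((res_a, res_b, round(distance, 2)))
    return pairs
```

Residues that are far apart in the sequence but still show up as close pairs are the interesting case: that is the kind of 3D clustering a sequence-only view hides.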

The Aquaria folks provided an intro video to get you started:

Another handy feature they provided is a Quick Reference Card with shortcuts to the functions [PDF]. In addition to this intro, they have a longer video as well. This is more like a typical lecture with the background, the framework, the goals of the project, and more about the underlying database.

Now, about the rollout of this software project. I found Aquaria when I was looking over the talks at the upcoming VIZBI conference (Visualizing Biological Data). Every year I find there are awesome ideas that come out of VIZBI, and tools I want to explore. Among them this year is Aquaria. So I went looking for more detail, and found some of the traditional stuff: the paper (below), the press release, etc. And then I found the Reddit discussion. The Aquaria team did a Science AMA on this tool. It engaged a range of folks, some of them just fans of science who had probably never seen protein structures before. That’s fine with me: the more folks who appreciate research and learn how researchers explore proteins, the better. But others had good technical questions for the team, such as other ways to find proteins of interest with sequence searches, or integration with other tools like the UCSC Genome Browser. All the answers are over there. I enjoyed the question about the name of the tool:

It seems you get the ideas we had in mind: using Aquaria lets us observe these fascinating creatures (proteins) from the natural world. Aquaria creates an artificial environment and lighting where we can observe isolated proteins; like aquarium fish, proteins are often beautiful and (usually) live in water.

I asked them about how this played out, and they had ~1000 folks visit their site as a result of this Reddit event. That was really interesting to me, and a very neat route to drive awareness.

They also provided a way to support users through one of my other favorite resources, Biostars. They created a support thread there where users can ask questions and get answers: https://www.biostars.org/t/aquaria/ I much prefer this to mailing lists, and I’m glad to see this easy method to get support. In fact, I asked something that I couldn’t quite figure out yet. (Here’s the protein I was looking at: http://aquaria.ws/P09616/7ahl/A I wanted to see all the subunits in full color. To do that, you de-select autofocus and, for this version, color by chains.)
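The Aquaria link above appears to follow a simple pattern: a UniProt accession (P09616), a PDB ID (7ahl), and a chain (A). As a minimal Python sketch, assuming that pattern holds generally (it is inferred from this one example link, not from Aquaria documentation), the code below tells the two kinds of IDs apart and checks each part before building a link. The accession regular expression follows the format UniProt documents; the PDB check covers classic four-character IDs and single-character chains only; `classify_query` and `aquaria_url` are hypothetical helper names.

```python
import re

# Accession format documented by UniProt (6- or 10-character accessions).
UNIPROT_ACC = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Classic four-character PDB ID: a digit 1-9 followed by three letters/digits.
PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def classify_query(query: str) -> str:
    """Guess whether a search string is a UniProt accession, a PDB ID, or a free-text name."""
    q = query.strip().upper()
    if UNIPROT_ACC.match(q):
        return "uniprot"
    if PDB_ID.match(q):
        return "pdb"
    return "name"


def aquaria_url(accession: str, pdb_id: str, chain: str) -> str:
    """Build a link in the aquaria.ws/<accession>/<pdb_id>/<chain> form (inferred pattern)."""
    if classify_query(accession) != "uniprot":
        raise ValueError(f"not a UniProt accession: {accession!r}")
    if classify_query(pdb_id) != "pdb":
        raise ValueError(f"not a PDB ID: {pdb_id!r}")
    if not re.fullmatch(r"[A-Za-z0-9]", chain):
        raise ValueError(f"unexpected chain identifier: {chain!r}")
    return f"http://aquaria.ws/{accession.upper()}/{pdb_id.lower()}/{chain}"
```

For example, `aquaria_url("P09616", "7AHL", "A")` rebuilds the link quoted above, while swapping the accession and PDB ID raises a `ValueError`.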

Also, for the developer types: they offer a way for you to interact with the Aquaria software and add your own features of interest with their API. Maybe you have found new mutations in a sequence you’ve obtained in your lab, for example. They offer guidance on that here: http://bit.ly/aquaria-features. They touch on this in the longer video (~27 min) if you want a bit more explanation. Judging from the high-quality support they are offering, I suspect they’d be interested to hear from you about what features you’d like to see applied to these proteins as well.

So kudos to this team for a nifty tool and really serious multi-media outreach efforts. I think it was well done on all counts. I’ll bet you Reddit reached more of the right folks than a press release ever will. PIOs take note–get your scientists on Reddit.

Quick links:

Aquaria site: http://aquaria.ws/

Reddit Science AMA: https://www.reddit.com/r/science/comments/2w2jvw/science_ama_series_we_are_dr_sean_odonoghue_and/

Biostar support thread: https://www.biostars.org/t/aquaria/

O’Donoghue S.I., Kenneth S Sabir, Maria Kalemanov, Christian Stolte, Benjamin Wellmann, Vivian Ho, Manfred Roos, Nelson Perdigão, Fabian A Buske, Julian Heinrich & Burkhard Rost & (2015). Aquaria: simplifying discovery and insight from protein structures, Nature Methods, 12 (2) 98-99. DOI: http://dx.doi.org/10.1038/nmeth.3258

RCSB PDB free webinar

I just wanted to give you a heads-up about our free webinar on the RCSB Protein Data Bank. It’s going to be next Wednesday at 11am Pacific US time (that’s 18:00 UTC). It’ll be about an hour, and did I say it’s free?

Seating is limited, so register early.

You can see our announcement here: http://blog.openhelix.eu/?p=11055

and you can go register right now here: http://www.openhelix.com/cgi/webinars.cgi

If you can’t make the webinar, you can check out our training materials and tutorial here (also free, sponsored by RCSB PDB).

News Updates Related to the PDB (Protein Data Bank), etc.

This is a quick post to update you on events & news items that I’ve gathered that are associated with the Protein Data Bank and other protein resources.

  • We recently received a suggestion for the RCSB PDB through twitter:

    RT @27andaphd: @Comprendia @openhelix you know what? I really think the pdb should have a Firefox toolbar for struct bio ppl like me. cc @openhelix

Well, you’re in luck, @27andaphd! I talked to someone on the PDB team, and they have you covered with a toolbar, as described in this Summer 2010 newsletter item: Search the RCSB PDB in Your Web Browser

  • The Structural Biology KnowledgeBase, or SBKB, is a sister database to the RCSB PDB, and last Thursday they released version 4.0 of the SBKB. I’ve been checking out the release & it looks good: they’ve organized categories of information and resources into “hubs”, such as a Structural Targets Hub, a Sequence, Structure, & Function Hub, a Methods Hub, and more. I’m in the process of updating our sponsored (free) SBKB tutorial now, but you may want to check out the release & see what’s new.

  •  We’ll have more exciting news items soon, so stay tuned!

Quick links:
The Structural Biology KnowledgeBase (SBKB): http://www.sbkb.org/

The Protein Data Bank (PDB) http://www.pdb.org/

OpenHelix’s free introductory tutorial on the SBKB: http://www.openhelix.com/sbkb

OpenHelix’s free introductory tutorial on the PDB: http://www.openhelix.com/pdb

On a Mission for Protein Information

It’s probably just the human brain’s ability to connect dots & find patterns, but it can be interesting how many “unrelated” events and information bits accumulate in my head & eventually get mulled into an idea or theory. Take, for example, a recent biotech mixer, bits from an education leadership series & a past Nature article – each “event” has been meandering in my mind and now they are finding their way out as this blog post.

OK, now the explanation: At a recent local biotech event I heard about a company (KeraNetics) purifying keratin proteins & using them to develop therapeutic and research applications. The company & their research sounded very interesting & because a lot of it is aimed at aiding wounded soldiers, it also sounded directly beneficial. The talk was short, only about 20 minutes, so there wasn’t a lot of time for details or questions. I decided I’d venture forth through many of the bioscience databases and resources that I know and love, in order to learn more about keratin.

My quest was both fun and frustrating because of the nature of the beast: keratin is “well known” (i.e. it comes up ‘a lot’ in high school academic challenge competitions, according to someone in the know), but keratins are hard to work with (i.e. tough, insoluble, fibrous structural proteins), and it is hard to find much general information on them in your average protein database, because “keratin” refers to many different gene products. I decided to begin my adventure at two of my favorite protein resources, PDB & SBKB, but I found no solved structures for keratin. Because of the way model organism databases are curated and organized, I often begin a protein search there, just to get some basic background, gene names, sequence information, etc. I (of course) found nothing other than a couple of GO terms in the Saccharomyces Genome Database (SGD), but I found hundreds of results in both Mouse Genome Informatics (MGI) (660 genomic features) and the Rat Genome Database (RGD) (162 rat genes, 342 human genes). There I also found gene names (Krt*), sequences, and many summary annotations, with references to diseases and links to OMIM. When I queried for “keratin”, OMIM returned 180 hits, including 61 “clinical synopses”; UniProt returned 505 reviewed entries and 2,435 unreviewed entries; Entrez Protein returned 10,611 results; and PubMed returned 26,430 articles, including 1,707 reviews. I sated my curiosity about KeraNetics’ research with a PubMed advanced search for keratin in the abstract or title & the PI’s name as author (search = “(keratin[Title/Abstract]) AND Van Dyke[Author]”).
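The same PubMed search can also be scripted. The minimal Python sketch below builds a request to NCBI’s E-utilities ESearch endpoint using the query string from the paragraph above; `db`, `term`, `retmode`, and `retmax` are standard ESearch parameters, while the function names are my own. By default it only prints the URL; actually fetching results needs network access, and NCBI asks scripted users to stay under 3 requests per second without an API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that asks PubMed for matching PMIDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


def pubmed_ids(term: str, retmax: int = 20) -> list[str]:
    """Run the search and return the PMIDs (requires network access)."""
    with urlopen(pubmed_search_url(term, retmax)) as resp:
        result = json.load(resp)
    return result["esearchresult"]["idlist"]


if __name__ == "__main__":
    # The advanced-search query used in the text above.
    print(pubmed_search_url("(keratin[Title/Abstract]) AND Van Dyke[Author]"))
```

Keeping URL construction separate from the network call makes the query easy to inspect (or paste into a browser) before running it.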

I ended up with a lot of information leads that I could have hunted through, but it was a fun process in which I learned a lot about keratin. This is where the education stuff comes in. I’ve been seeing a lot of studies go by talking about reforming education to be more investigation driven, and I can totally see how that can work. “Learning” through memorization & regurgitation is dry for everyone & rough for the “memory challenged”, like me. When I have a reason or curiosity to explore, with a new nugget of data or understanding lurking around each corner, the information just seems to get in better & stay longer. (OT, but thought I’d mention a related site that I found today w/ some neat stuff: Mind/Shift-How we will learn.)

And I could have done the advanced PubMed search in the beginning, but what fun would that have been? Plus, I learned a lot about keratin from what I didn’t find, like the fact that there wasn’t a plethora of PDB structures for keratin proteins. That brings me to the final dot in my mullings: an article that I came across today as I worked on my reading backlog, “Too many roads not taken”. If you have a subscription to Nature you can read it, but the main point is that researchers are still largely focusing on the same set of proteins that they have been for a long time, because these are the proteins for which research tools (antibodies, chemical inhibitors, etc.) exist. This same sort of philosophy is fueling the Protein Structure Initiative (PSI) efforts, as described here. Anyway, I found the article interesting & agree with the authors’ general suggestions. I would, however, extend them beyond these physical research tools & say that going forward researchers need more data analysis tools, and training on how to use them – but I would, wouldn’t I? :)


  • Sierpinski P, Garrett J, Ma J, Apel P, Klorig D, Smith T, Koman LA, Atala A, & Van Dyke M (2008). The use of keratin biomaterials derived from human hair for the promotion of rapid regeneration of peripheral nerves. Biomaterials, 29 (1), 118-28 PMID: 17919720
  • Edwards, A., Isserlin, R., Bader, G., Frye, S., Willson, T., & Yu, F. (2011). Too many roads not taken Nature, 470 (7333), 163-165 DOI: 10.1038/470163a

Video Tip of the Week: VnD Resource for Genetic Variation and Drug Information

In today’s tip I am going to feature a resource that I found recently. I’ve been updating our dbSNP tutorial, which Mary & Trey will be presenting at workshops in Morocco, and also our free PDB tutorial, which is sponsored by the RCSB PDB team. I have therefore been thinking about protein structures and small sequence variations a lot lately. As I explored the latest Database issue of NAR looking for resources to do a tip on, I found an article describing the VnD (genetic Variation and Drug) resource, which can also be accessed at the URL www.vandd.org, according to the NAR article. The article is “VnD: a structure-centric database of disease-related SNPs and drugs“, and figure one shows a veritable Who’s Who of protein, variation and disease resources, so I had to investigate.

What I found at VnD made me sure that this was a resource I wanted to feature in a tip. VnD is from the Korean Bioinformation Center, or KOBIC, which maintains a list of the databases and tools it provides. I’ll save the rest of the KOBIC resources for another post & concentrate on VnD here. Compiling data from resources such as RefSeq, OMIM, UniProt, PDB, DrugBank, dbSNP, GAD and more might have been cool enough, depending on how it was done, but VnD also does its own structure modeling analysis of how the variation affects the protein structure and drug/ligand binding.

This tip movie isn’t long enough to really show you the breadth of what is available from VnD, but I hope it will be enough to encourage you to read the NAR article (listed below) and to check out VnD. One thing to note: don’t expect to find every dbSNP rs# there; one that I’ve been using in our tutorial isn’t included. They are specifically interested in variations within genes that might affect drug binding. But hey, you can’t query DrugBank with rs#s, and I’ve never seen structure modeling done like VnD’s, so it is a worthy resource that you may want to investigate if you are interested in how genetic variations connect with disease and drug therapies.

Quick links:

VnD: genetic Variation and Drug resource – http://vnd.kobic.re.kr:8080/VnD/index.jsp

Korean Bioinformation Center (KOBIC) – http://www.kobic.re.kr/

RCSB PDB – http://www.pdb.org

OpenHelix Tutorial on the RCSB PDB – http://www.openhelix.com/pdb

dbSNP: Short Genetic Variations, from NCBI -  http://www.ncbi.nlm.nih.gov/projects/SNP/

OpenHelix Tutorial on NCBI’s dbSNP – http://www.openhelix.com/cgi/tutorialInfo.cgi?id=39

For links to other resources and OpenHelix tutorials mentioned in this post, please see our catalog of resources – http://www.openhelix.com/cgi/tutorials.cgi

Yang, J., Oh, S., Ko, G., Park, S., Kim, W., Lee, B., & Lee, S. (2010). VnD: a structure-centric database of disease-related SNPs and drugs Nucleic Acids Research, 39 (Database) DOI: 10.1093/nar/gkq957

Tip of the Week: From UniProt to the PSI SBKB and Back Again

It is often beneficial to visit multiple biomedical databases or resources, even if they seem to provide overlapping information, because no two resources focus on exactly the same information or present it in exactly the same way. Instead of duplicating each other’s curation efforts, databases often link out to related information at other resources. You can think of these links as “social connections”, if you like. In today’s tip I want to show you a couple of connections between protein information resources, including a new connection that really showcases some of the core value of the PSI’s Structural Biology Knowledgebase, or SBKB.

I begin the tip at the UniProtKB, where I search for a UniProt ID number. From the resulting protein report I first briefly show you how to link out to a corresponding RCSB PDB report, where you can find high quality protein structure information and more. If you are interested in learning more about the RCSB PDB & how to use it, please check out OpenHelix’s full, free tutorial that is sponsored by the RCSB PDB.
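For readers who would rather script the UniProt-to-PDB link out, here is a minimal Python sketch. It assumes today’s UniProt REST API (rest.uniprot.org), which postdates this tip, and that a UniProtKB JSON entry lists its cross-references under `uniProtKBCrossReferences` with `database` and `id` fields; please check the current UniProt documentation before relying on it. The RCSB structure-page URL pattern is also an assumption, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed URL patterns (current UniProt REST API and RCSB PDB structure pages).
UNIPROT_ENTRY = "https://rest.uniprot.org/uniprotkb/{acc}.json"
RCSB_STRUCTURE = "https://www.rcsb.org/structure/{pdb_id}"


def fetch_uniprot_entry(accession: str) -> dict:
    """Download one UniProtKB entry as parsed JSON (requires network access)."""
    with urlopen(UNIPROT_ENTRY.format(acc=accession)) as resp:
        return json.load(resp)


def pdb_ids(entry: dict) -> list[str]:
    """Collect the unique PDB cross-reference IDs from a UniProtKB JSON entry."""
    xrefs = entry.get("uniProtKBCrossReferences", [])
    return sorted({x["id"].upper() for x in xrefs if x.get("database") == "PDB"})


def rcsb_links(entry: dict) -> list[str]:
    """Turn each PDB cross-reference into an RCSB PDB structure-page URL."""
    return [RCSB_STRUCTURE.format(pdb_id=p) for p in pdb_ids(entry)]
```

With network access, `rcsb_links(fetch_uniprot_entry(accession))` would list one RCSB PDB page per linked structure, mirroring the link out shown in the movie. Parsing is kept separate from fetching so the logic can be checked against a saved entry.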

From there I return to the UniProt report and demonstrate a new link out option that links to protein protocols and available materials, as well as information about theoretical models and predicted protein targets from the SBKB. I don’t have time to show it, but a recent update to the SBKB now lets users search the Structural Biology Knowledgebase with a UniProt accession number. These searches provide additional information, including protein structure information and information about pre-release structure sequences. As with the RCSB PDB, we have a free tutorial on the SBKB that is sponsored by the Protein Structure Initiative.

As I scroll through the UniProt protein report, you will see information and links for a wide variety of bioscience resources. OpenHelix, as I’m sure many of you are aware, has tutorials on how to use many of these resources. Our tutorials on the RCSB PDB and the PSI SBKB are both free. Our tutorials on UniProt and many other resources are available through a subscription to our database of trainings or through purchase of individual access. Whether you learn the resources through our tutorials, through the references I list below, or through your own explorations of the databases, there really is an amazing amount of information available through these interlinked, publicly-funded resources – please make use of them in your research!

Quick Links:

UniProt Knowledgebase -  http://www.uniprot.org/

OpenHelix Tutorial on UniProt – http://www.openhelix.com/cgi/tutorialInfo.cgi?id=77

RCSB PDB – http://www.pdb.org

OpenHelix Tutorial on the RCSB PDB – http://www.openhelix.com/pdb

The Protein Structure Initiative Structural Biology Knowledgebase (SBKB) -  http://www.sbkb.org/

OpenHelix Tutorial on the SBKB – http://www.openhelix.com/sbkb

Catalog of all OpenHelix tutorials – http://www.openhelix.com/cgi/tutorials.cgi

The UniProt Consortium. (2009). The Universal Protein Resource (UniProt) in 2010 Nucleic Acids Research, 38 (Database) DOI: 10.1093/nar/gkp846

Rose, P., Beran, B., Bi, C., Bluhm, W., Dimitropoulos, D., Goodsell, D., Prlic, A., Quesada, M., Quinn, G., Westbrook, J., Young, J., Yukich, B., Zardecki, C., Berman, H., & Bourne, P. (2010). The RCSB Protein Data Bank: redesigned web site and web services Nucleic Acids Research, 39 (Database) DOI: 10.1093/nar/gkq1021

Berman, H., Westbrook, J., Gabanyi, M., Tao, W., Shah, R., Kouranov, A., Schwede, T., Arnold, K., Kiefer, F., Bordoli, L., Kopp, J., Podvinec, M., Adams, P., Carter, L., Minor, W., Nair, R., & Baer, J. (2009). The protein structure initiative structural genomics knowledgebase Nucleic Acids Research, 37 (Database) DOI: 10.1093/nar/gkn790

Many Protein Resources Have Recently Announced Updates

PDB structure 3rg9

In our ongoing pursuit of up-to-date tutorials, I’ve been tracking changes that are occurring at resources and planning our updates accordingly. Protein resources are especially going to keep me out of trouble this summer, because their developers and curators have been busy! I’ve compiled a short synopsis below, and would appreciate comments on any other resources you know about, or want to brag about! :)

  • I featured the ExPASy list of proteomic tools in a past tip. As of Tuesday this list is no longer being kept up-to-date, but the ExPASy resource has been expanded beyond being “just” a proteomics resource and is now the new SIB Bioinformatics Resource Portal. According to its developers, the portal:

    “provides access to scientific databases and software tools in different areas of life sciences including proteomics, genomics, phylogeny, systems biology, population genetics, transcriptomics etc. … On this portal you find resources from many different SIB groups as well as external institutions.”

    And never fear, there is still an up-to-date list of proteomics tools found here.

  • I mentioned in my tip last week that NCBI’s MMDB has undergone an update & I’ll be updating our tutorial on it soon.
  • NCI/Nature Pathway Interaction Database, or PID, had an update June 14th that includes new and updated pathway information.
  • PROSITE had an update June 21st, which is Release 20.73, and now includes 1618 documentation entries, 1308 patterns, 936 profiles and 925 ProRules.
  • The RCSB PDB resource has announced updates to their Browse Database function, enhanced sequence displays from structure summary pages and the PDB-101 educational resource available from blackboard logos on PDB pages. For more details on using PDB, please see our free PDB Introductory tutorial sponsored by the RCSB.
  • STRING’s 9.0 release is now available, and we’ll be looking into anything we need to update in our tutorial as a result.
  • UniProt released an update June 28th that included a major update on many bacterial and archaeal Type II Toxin-Antitoxin modules, as is described here.

Enjoy all the new information – I know I will! :)

Tip of the Week: New and Improved OMIM®

In the realm of bioinformatics resources, few are more venerable than OMIM®, Online Mendelian Inheritance in Man [well, originally not online, on index cards...]. For those who might be new to OMIM, it is a catalog of human genes, their variations, and the resulting phenotypes, with a more clinical perspective than some resources offer. As I was reviewing the history of OMIM for this post, I began to wonder whether any repository in genomics has been maintained on a computer framework longer. I know of an older protein analysis program, from Margaret Dayhoff and Robert Ledley, that I once wrote about here. But as for an ongoing repository or catalog stored on a computer, Victor McKusick wrote:

Mendelian Inheritance in Man has been maintained on the computer since 1964.

It was stored on a mainframe at Johns Hopkins at that time. The other one that I thought was probably close was RCSB PDB, which is described on their “about” page in this way:

The PDB was established in 1971 at Brookhaven National Laboratory and originally contained 7 structures.

It’s likely that the PDB existed in some form on a computer system earlier than that, and it may give MIM a run for the record. Bruno Strasser described four resources developed around the same time (1965): the Cambridge Structural Database, MIM, Index Medicus, and the Atlas of Protein Sequence and Structure.

It’s not easy to maintain and develop a resource for this long. Just this past month we learned about the risk of KEGG going away. But in bioinformatics–like biology–a resource needs to evolve or die. (Actually, I can remember in grad school that phrase was used by the chair of our department to describe what biology faculty needs to do as well.) In this week’s tip of the week, I report that OMIM is evolving, and I introduce you to the new interface.

Most people have encountered OMIM at the NCBI. But if you go there today, you’ll see this notice on the homepage:

This is because OMIM has a new home. It’s not clear at this time if the NCBI incarnation will be updated going forward. The OMIM team at JHU is requesting that software providers who serve links to OMIM now migrate those links to the new OMIM.org site, which the OMIM team considers to be the official site and will be the up-to-date one.

Let’s talk specifically about the evolved OMIM now: it’s entered a new century! Yay! The incredibly deep collection of data curated over the decades remains, but the new interface is very nice to use and to look at; it no longer looks like 1995 over there. There are also new handy links, new search options, and new features still to come.

Compare this same record for the APC gene (a contributor to hereditary colon cancer) in both places:

Old OMIM at NCBI: http://www.ncbi.nlm.nih.gov/omim/611731

New OMIM at JHU: http://www.omim.org/entry/611731
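If you maintain pages that link to OMIM, the two URLs above suggest how old links could be migrated: keep the six-digit MIM number and swap the NCBI prefix for the omim.org entry path. The minimal Python sketch below assumes every old link has the `/omim/<MIM number>` form shown above; other NCBI URL forms (for example, query-string searches) are left untouched, and `migrate_omim_links` is a hypothetical helper name.

```python
import re

# Old NCBI-hosted OMIM record links, e.g. http://www.ncbi.nlm.nih.gov/omim/611731
OLD_OMIM_LINK = re.compile(r"https?://www\.ncbi\.nlm\.nih\.gov/omim/(\d{6})\b")


def migrate_omim_links(text: str) -> str:
    """Rewrite NCBI OMIM record links to the omim.org/entry/<MIM number> form."""
    return OLD_OMIM_LINK.sub(r"http://www.omim.org/entry/\1", text)


if __name__ == "__main__":
    print(migrate_omim_links("APC: http://www.ncbi.nlm.nih.gov/omim/611731"))
```

The `\b` after the six digits keeps the pattern from rewriting a longer number that merely starts with six digits.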

I find the new page to be significantly tidier, don’t you? The links you need to other resources are still there, but you can toggle open the menus to find them now. And some of the links are to resources that weren’t available on the old page (for example, BioGPS and PharmGKB which we like very much!).  I’m also told that on appropriate pages there will be links to the DECIPHER resource.

You can still browse around the MIM map by clicking on the Advanced search for Gene Map link: http://www.omim.org/search/advanced/geneMap . I have done this on many days when I have found something intriguing in a chromosomal region and I want to see what was reported in that area and stored at OMIM.

Another feature that I think is very cool is the option to change the language with the Google Translate menu. I know it’s not perfect, but I’m finding increasingly that I want to read blog posts in other languages and I am finding it works pretty well. Making the OMIM data so easily accessible to non-English speakers is a really nice touch.

Although sometimes it is tough to transition to new software, I think this is a good sign. In addition to maintaining the excellent knowledge collection that began so long ago, a new interface means that OMIM is continuing to grow and change to meet the needs of today. And as we move forward to identify more and more genomic variations and alterations that impact human health, well-curated and deep knowledge bases like this are crucial.

Congrats to the OMIM team on the new look and new home.

Quick link to the new OMIM: http://www.omim.org/

(Bonus: did you know there’s also an Online Mendelian Inheritance in Animals (OMIA)? http://www.ncbi.nlm.nih.gov/omia )

Follow them on Twitter: http://twitter.com/#!/OmimOrg

McKusick, V. (2006). A 60-Year Tale of Spots, Maps, and Genes Annual Review of Genomics and Human Genetics, 7 (1), 1-27 DOI: 10.1146/annurev.genom.7.080505.115749

Amberger, J., Bocchini, C., & Hamosh, A. (2011). A new face and new challenges for online mendelian inheritance in man (OMIM®) Human Mutation, 32 (5), 564-567 DOI: 10.1002/humu.21466

Strasser, B. (2009). Collecting, Comparing, and Computing Sequences: The Making of Margaret O. Dayhoff’s Atlas of Protein Sequence and Structure, 1954–1965 Journal of the History of Biology, 43 (4), 623-660 DOI: 10.1007/s10739-009-9221-0 (PDF available on his faculty web site here.)

Strasser, B.J. (2006). Collecting and Experimenting: The moral economies of biological research, 1960s-1980s. Preprints of the Max-Planck Institute for the History of Science, 310, 105-23.