Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…
From the PDBe mailing list: The 2012 Nobel Prize in Chemistry has been awarded to Robert J. Lefkowitz and Brian K. Kobilka for their studies of G-protein–coupled receptors. You can learn more about this important family of receptors by exploring their structures within the PDB. http://pdbe.org/nobel2012
Nice video on RNA interference by Nature Reviews Genetics. You can access all of the featured RNAi multimedia links from this page, or go straight to the video on this page. [Jennifer]
Interesting, The Repertoire 10K (R10K) Project: RT @deannachurch: CG: go to http://t.co/rekf2Gkd for more information on joining the project! #AGBT [Mary]
And it’s not in the papers anymore… RT @genome_gov: Pachter: “My worst nightmare: the curse of deep sequencing” aka too much data. #AGBT [Mary]
Read a Nature Outlook on allergies from Nov. 2011 – lot of new philosophies & theories that I wasn’t aware of. Currently free full access is available to the Nature Allergy Outlook [Jennifer]
I just wanted to give a heads up to our free webinar on the RCSB Protein Databank. It’s going to be next Wednesday, at 11am Pacific US time. (that’s 18:00 UTC). It’ll be about an hour, and did I say it’s free?
This is a quick post to update you on events & news items that I’ve gathered that are associated with the Protein Data Bank and other protein resources.
We recently received a suggestion for the RCSB PDB through twitter:
RT @27andaphd: @Comprendia@openhelix you know what? I really think the pdb should have a Firefox toolbar for struct bio ppl like me. cc @openhelix
Well, you are in luck, @27andaphd! I talked to someone on the PDB team and they have you covered with the toolbar, as described in this Summer 2010 newsletter item: Search the RCSB PDB in Your Web Browser
The Structural Biology Knowledgebase, or SBKB, is a sister database to the RCSB PDB, and last Thursday they released version 4.0 of the SBKB. I’ve been checking out the release & it looks good – they’ve organized categories of information and resources into “hubs”, such as a Structural Targets Hub, a Sequence, Structure, & Function Hub, a Methods Hub, and more. I’m in the process of updating our sponsored (free) SBKB tutorial now, but you may want to check out the release & see what’s new.
It’s probably just the human brain’s ability to connect dots & find patterns, but it can be interesting how many “unrelated” events and information bits accumulate in my head & eventually get mulled into an idea or theory. Take, for example, a recent biotech mixer, bits from an education leadership series & a past Nature article – each “event” has been meandering in my mind and now they are finding their way out as this blog post.
OK, now the explanation: At a recent local biotech event I heard about a company (KeraNetics) purifying keratin proteins & using them to develop therapeutic and research applications. The company & their research sounded very interesting & because a lot of it is aimed at aiding wounded soldiers, it also sounded directly beneficial. The talk was short, only about 20 minutes, so there wasn’t a lot of time for details or questions. I decided I’d venture forth through many of the bioscience databases and resources that I know and love, in order to learn more about keratin.
My quest was both fun and frustrating because of the nature of the beast – keratin is “well known” (i.e. it comes up in high school academic challenge competitions ‘a lot’, according to someone in the know), but it is hard to work with (keratins are tough, insoluble, fibrous structural proteins), and it is hard to find much general information on it in your average protein database (because it is made of many different gene products, all referred to as “keratin”). I decided to begin my adventure at two of my favorite protein resources, PDB & SBKB, but I found no solved structures for keratin. Because of the way model organism databases are curated and organized, I often begin a protein search there, just to get some basic background, gene names, sequence information, etc. I (of course) found nothing other than a couple of GO terms in the Saccharomyces Genome Database (SGD), but I found hundreds of results in both Mouse Genome Informatics (MGI) (660 genomic features) and Rat Genome Database (RGD) (162 rat genes, 342 human genes). I also found gene names (Krt*), sequences and many summary annotations with references to diseases with links to OMIM. When I queried for “keratin”, OMIM returned 180 hits, including 61 “clinical synopsises”; UniProt returned 505 reviewed entries and 2,435 unreviewed entries; Entrez Protein returned 10,611 results; and PubMed returned 26,430 articles, including 1,707 reviews. I sated my curiosity about KeraNetics’ research by using a PubMed advanced search for keratin in the abstract or title & the PI’s name as author (search = “(keratin[Title/Abstract]) AND Van Dyke[Author]”).
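For anyone who would rather script that last PubMed search than click through the advanced search form, here is a minimal sketch. It assumes NCBI's E-utilities `esearch` endpoint with its `db`, `term`, `retmax` and `retmode` parameters; the helper names `build_pubmed_query` and `build_esearch_url` are made up for illustration, and the code only builds the request URL rather than sending it.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed to be the right entry point here).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(keyword, author):
    """Combine a title/abstract keyword and an author name using PubMed field tags."""
    return f"({keyword}[Title/Abstract]) AND {author}[Author]"


def build_esearch_url(term, retmax=20):
    """Return an esearch URL for PubMed; urlencode handles brackets and spaces."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical helpers applied to the search described in the post.
term = build_pubmed_query("keratin", "Van Dyke")
url = build_esearch_url(term)
print(term)
print(url)
```

Pasting the printed URL into a browser, or fetching it with any HTTP client, should return the matching PubMed IDs. The counts will differ from the ones I quote above, since PubMed keeps growing.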
I ended up with a lot of information leads that I could have hunted through, but it was a fun process in which I learned a lot about keratin. This is where the education stuff comes in. I’ve been seeing a lot of studies go by talking about reforming education to be more investigation driven, and I can totally see how that can work. “Learning” through memorization & regurgitation is dry for everyone & rough for the “memory challenged”, like me. Having a reason or curiosity to explore, with a new nugget of data or understanding lurking around each corner, the information just seems to get in better & stay longer. (OT, but thought I’d mention a related site that I found today w/ some neat stuff: Mind/Shift-How we will learn.)
And I could have done the advanced PubMed search in the beginning, but what fun would that have been? Plus there is a lot that I learned about keratin from what I didn’t find, like that there wasn’t a plethora of PDB structures for keratin proteins. That brings me to the final dot in my mullings – an article that I came across today as I worked on my reading backlog: “Too many roads not taken”. If you have a subscription to Nature you can read it, but the main point is that researchers are still largely focusing on the same set of proteins that they have been for a long time, because these are the proteins for which there are research tools (antibodies, chemical inhibitors, etc). This same sort of philosophy is fueling the Protein Structure Initiative (PSI) efforts, as described here. Anyway, I found the article interesting & agree with the authors’ general suggestions. I would however extend it beyond these physical research tools & say that going forward researchers need more data analysis tools, and training on how to use them – but I would, wouldn’t I?
Sierpinski P, Garrett J, Ma J, Apel P, Klorig D, Smith T, Koman LA, Atala A, & Van Dyke M (2008). The use of keratin biomaterials derived from human hair for the promotion of rapid regeneration of peripheral nerves. Biomaterials, 29 (1), 118-28 PMID: 17919720
Edwards, A., Isserlin, R., Bader, G., Frye, S., Willson, T., & Yu, F. (2011). Too many roads not taken Nature, 470 (7333), 163-165 DOI: 10.1038/470163a
What I found at VnD made me sure that this was a resource that I wanted to feature in a tip. VnD is from the Korean Bioinformation Center, or KOBIC, which provides a list of databases and tools. I’ll save the rest of the KOBIC resources for another post & concentrate on VnD here. Compiling data from resources such as RefSeq, OMIM, UniProt, PDB, DrugBank, dbSNP, GAD and more might have been cool enough, depending on how it was done, but VnD also does its own structure modeling analysis of how the variation affects the protein structure and drug/ligand binding.
This tip movie isn’t long enough to really show you the breadth of what is available from the VnD, but I hope it will be enough to encourage you to read the NAR article (listed below), and to check out VnD. One thing to note: don’t expect to find every dbSNP rs# over there – one that I’ve been using in our tutorial isn’t over there. They are specifically interested in variations within genes that might affect drug binding. But hey, you can’t query DrugBank with rs#s, and I’ve never seen the structure modeling done like VnD, so it is a worthy resource that you may want to investigate if you are interested in how genetic variations connect with disease and drug therapies.
Reference: Yang, J., Oh, S., Ko, G., Park, S., Kim, W., Lee, B., & Lee, S. (2010). VnD: a structure-centric database of disease-related SNPs and drugs Nucleic Acids Research, 39 (Database) DOI: 10.1093/nar/gkq957
It is often beneficial to visit multiple biomedical databases or resources, even if they seem to provide overlapping information, because no two resources focus on exactly the same information or present it in exactly the same way. Instead of duplicating each other’s curation efforts, databases often link out to related information at other resources. You can think of these links as “social connections”, if you want, and in today’s tip I want to show you a couple of connections between protein information resources, including a new connection that really features some of the core value of the PSI’s Structural Biology Knowledgebase, or SBKB.
I begin the tip at the UniProtKB, where I search for a UniProt ID number. From the resulting protein report I first briefly show you how to link out to a corresponding RCSB PDB report, where you can find high quality protein structure information and more. If you are interested in learning more about the RCSB PDB & how to use it, please check out OpenHelix’s full, free tutorial that is sponsored by the RCSB PDB.
From there I return to the UniProt report and demonstrate a new link out option that links to protein protocols, available materials, as well as information about theoretical models and predicted protein targets from the SBKB. I don’t have time to show it, but a recent update to the SBKB now allows users to search the Structural Biology Knowledgebase with a UniProt accession number. These searches provide users with additional information including protein structure information and information about pre-released structure sequence. As with the RCSB PDB, we have a free tutorial on the SBKB that is sponsored by the Protein Structure Initiative.
As I scroll through the UniProt protein report users will see information and links for a wide variety of bioscience resources. OpenHelix, as I’m sure many of you are aware, has tutorials on how to use many of these resources. Our tutorials on the RCSB PDB and the PSI SBKB are both free. Our tutorials on UniProt and many other resources are available through a subscription to our database of trainings or through purchase of individual access. Whether you learn the resources through our tutorials, through the references I list below, or through your own explorations of the databases, there really is an amazing amount of information available through these interlinked, publicly-funded resources – please make use of them in your research!
References: The UniProt Consortium. (2009). The Universal Protein Resource (UniProt) in 2010 Nucleic Acids Research, 38 (Database) DOI: 10.1093/nar/gkp846
Rose, P., Beran, B., Bi, C., Bluhm, W., Dimitropoulos, D., Goodsell, D., Prlic, A., Quesada, M., Quinn, G., Westbrook, J., Young, J., Yukich, B., Zardecki, C., Berman, H., & Bourne, P. (2010). The RCSB Protein Data Bank: redesigned web site and web services Nucleic Acids Research, 39 (Database) DOI: 10.1093/nar/gkq1021
Berman, H., Westbrook, J., Gabanyi, M., Tao, W., Shah, R., Kouranov, A., Schwede, T., Arnold, K., Kiefer, F., Bordoli, L., Kopp, J., Podvinec, M., Adams, P., Carter, L., Minor, W., Nair, R., & Baer, J. (2009). The protein structure initiative structural genomics knowledgebase Nucleic Acids Research, 37 (Database) DOI: 10.1093/nar/gkn790
In our ongoing pursuit of up-to-date tutorials, I’ve been tracking changes that are occurring at resources and planning our updates accordingly. Protein resources are especially going to keep me out of trouble this summer, because their developers and curators have been busy! I’ve compiled a short synopsis below, and would appreciate comments on any other resources you know about, or want to brag about!
I featured the ExPASy list of proteomic tools in a past tip. As of Tuesday this list is no longer being kept up-to-date, but the ExPASy resource has been expanded beyond being “just” a proteomics resource and is now the new SIB Bioinformatics Resource Portal. According to its developers, the portal:
“provides access to scientific databases and software tools in different areas of life sciences including proteomics, genomics, phylogeny, systems biology, population genetics, transcriptomics etc. … On this portal you find resources from many different SIB groups as well as external institutions.”
And never fear, there is still an up-to-date list of proteomics tools found here.
I mentioned in my tip last week that NCBI’s MMDB has undergone an update & I’ll be updating our tutorial on it soon.
In the realm of bioinformatics resources, few are more venerable than OMIM®, Online Mendelian Inheritance in Man [well, originally not online, on index cards...]. For those who might be new to OMIM, it is a catalog of genes and their variations, and resulting phenotypes in human, with a more clinical perspective than some resources offer. As I was reviewing the history of OMIM for this post, I began to wonder if there even is any repository in genomics that’s been maintained on a computer framework longer. I know of an older protein analysis program that I wrote about once here–from Margaret Dayhoff and Robert Ledley. But as an ongoing repository or catalog that was stored, Victor McKusick wrote:
Mendelian Inheritance in Man has been maintained on the computer since 1964.
It was stored on a mainframe at Johns Hopkins at that time. The other one that I thought was probably close was RCSB PDB, which is described on their “about” page in this way:
The PDB was established in 1971 at Brookhaven National Laboratory and originally contained 7 structures.
It’s likely that in some form it existed on a computer system earlier than that–and may give MIM a run for the record. Bruno Strasser described 4 resources developed around the same time–1965–as the Cambridge Structural Database, MIM, Index Medicus, and Atlas of Protein Sequence and Structure.
It’s not easy to maintain and develop a resource for this long. Just this past month we learned about the risk of KEGG going away. But in bioinformatics–like biology–a resource needs to evolve or die. (Actually, I can remember in grad school that phrase was used by the chair of our department to describe what biology faculty needs to do as well.) In this week’s tip of the week, I report that OMIM is evolving, and I introduce you to the new interface.
Most people have encountered OMIM at the NCBI. But if you go over there today, you’ll see this notice on the homepage:
This is because OMIM has a new home. It’s not clear at this time if the NCBI incarnation will be updated going forward. The OMIM team at JHU is requesting that software providers who serve links to OMIM now migrate those links to the new OMIM.org site, which the OMIM team considers to be the official site and will be the up-to-date one.
Let’s talk specifically about the evolved OMIM now: it’s entered a new century! Yay! The incredibly deep collection of data curated over the decades still remains, but the new interface is very nice to use and to look at–it no longer looks like 1995 over there. There are also handy new links, new search options, and new features still to come.
Compare this same record for the APC gene (a contributor to hereditary colon cancer) in both places:
I find the new page to be significantly tidier, don’t you? The links you need to other resources are still there, but you can toggle open the menus to find them now. And some of the links are to resources that weren’t available on the old page (for example, BioGPS and PharmGKB which we like very much!). I’m also told that on appropriate pages there will be links to the DECIPHER resource.
You can still browse around the MIM map by clicking on the Advanced search for Gene Map link: http://www.omim.org/search/advanced/geneMap . I have done this on many days when I have found something intriguing in a chromosomal region and I want to see what was reported in that area and stored at OMIM.
Another feature that I think is very cool is the option to change the language with the Google Translate menu. I know it’s not perfect, but I’m finding increasingly that I want to read blog posts in other languages and I am finding it works pretty well. Making the OMIM data so easily accessible to non-English speakers is a really nice touch.
Although sometimes it is tough to transition to new software, I think this is a good sign. In addition to maintaining the excellent knowledge collection that began so long ago, a new interface means that OMIM is continuing to grow and change to meet the needs of today. And as we move forward to identify more and more genomic variations and alterations that impact human health, well-curated and deep knowledge bases like this are crucial.
Congrats to the OMIM team on the new look and new home.
Amberger, J., Bocchini, C., & Hamosh, A. (2011). A new face and new challenges for online mendelian inheritance in man (OMIM®) Human Mutation, 32 (5), 564-567 DOI: 10.1002/humu.21466
Strasser, B. (2009). Collecting, Comparing, and Computing Sequences: The Making of Margaret O. Dayhoff’s Atlas of Protein Sequence and Structure, 1954–1965 Journal of the History of Biology, 43 (4), 623-660 DOI: 10.1007/s10739-009-9221-0 (PDF available on his faculty web site here.)
The new tutorial reflects the many changes and enhancements on the RCSB PDB site, and includes a narrated on-line tutorial, PowerPoint slides, handouts, and exercises.
Bellevue, WA (PRWEB) April 12, 2011
The Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB) has partnered with OpenHelix to provide a revised and updated tutorial (http://www.openhelix.com/PDB) on its free web-based resource for studying biological macromolecules (http://www.pdb.org).
The RCSB PDB provides a variety of tools and resources to use to study biological macromolecules. The PDB is the single worldwide repository of experimentally-determined 3D biological structures of proteins, nucleic acids and complex assemblies. As a member of the Worldwide PDB collaboration (wwpdb.org), the RCSB PDB curates and annotates PDB data, and presents basic and advanced search, display and visualization methods to access these data.
The new tutorial reflects the many changes and enhancements on the RCSB PDB site, including a new data drill-down and data summary feature, updated ligand features such as a download page, images and binding affinity data, new report types and visualization options, among many others.
The new training materials (at http://www.openhelix.com/pdb) include an online narrated tutorial that demonstrates: basic and advanced searches, how to generate reports, the different options for exploring individual structures, and many of the research and educational resources and tools available at the RCSB PDB. The approximately 60-minute tutorial, which runs in just about any browser, can be viewed from beginning to end or navigated using chapters and forward and backward sliders.
In addition to the tutorial, RCSB PDB users can also access useful training and teaching materials including the animated PowerPoint slides used as a basis for the tutorial, suggested script for the slides, slide handouts, and exercises. This can save teachers and professors a tremendous amount of time and effort in creating classroom content.
About the RCSB PDB
The RCSB Protein Data Bank (http://www.pdb.org), administered by the Research Collaboratory for Structural Bioinformatics (RCSB), supports scientific research and education worldwide by providing an essential resource of information about biomolecular structures. These molecules of life are found in all organisms, from bacteria and plants to animals and humans.
The RCSB PDB member institutions jointly manage the project: Rutgers, The State University of New Jersey and the San Diego Supercomputer Center and the Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California, San Diego.
OpenHelix, LLC, (http://www.openhelix.com) provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web. The OpenHelix Search portal searches hundreds of resources, tutorial suites and other material to direct researchers to the most relevant resources and OpenHelix training materials for their needs. Researchers and institutions can save time, budget and staff resources by leveraging a subscription to nearly 100 online tutorial suites available through the portal. More efficient use of the most relevant resources means quicker and more effective research.