This week is a bit different from the usual “What’s the Answer?” post, where we highlight a question from a forum that our readers might be interested in. However, one of the answers in this post includes BioStar, so it sort of comes back around!
Stephen Turner (aka @genetics_blog) wrote up a blog post recently in response to people asking him how he stays current in bioinformatics/genomics. I know a lot of people retweeted that post at the time, and his very kind inclusion of OpenHelix led to some good traffic to this blog and new twitter followers. And just last night I saw this as well, confirming my impression of this post:
His list covers blogs, forums, automated searches, Twitter, and of course the literature. A lot of people might know about some of these, but it’s nice to see someone assemble a collection. It is also interesting to see how similar it is to my strategy.
So it seemed like this might be a fun item to highlight as a good source of answers on a number of things, and a good way to find useful sites and folks to be aware of in this field. Check it out.
NCBI was created in 1988 and has maintained the GenBank database for years. They also provide many computational resources and data retrieval systems for many types of biological data. As such they know all too well how quickly the data that biologists collect has changed and expanded. As uses for various data types have been developed, it has become obvious that new types of information (such as expanded metadata) need to be collected, and new ways of handling data are required.
NCBI has been adapting to such needs throughout the years and recently has been adapting its genome resources. Today’s tip will be based on some of those changes. My video will focus on the “completely redesigned Genome site”, which was recently rolled out and announced in the most recent NCBI newsletter. I haven’t found a publication describing the changes, but the newsletter goes into some detail and the announcement found at the top of the Genome site (& that I point out in the video) has very helpful details about the changes.
As you will see in the announcement, the Genome resource is not the only related resource to have undergone changes recently, including the redesign of the Genome Project resource into the BioProject resource and the creation of the BioSample resource. I won’t have time to go into detail about those two resources but at the end of my post I will link to two recent NCBI publications that came out in Nucleic Acids Research this month – these are good resources to read for more information on BioProject, BioSample, and on the NCBI as a whole. For a historical perspective I also link to the original Genome reference, which is in Bioinformatics and currently free to access.
Some of the changes are very interesting, including that “Single genome records now represent an organism and not a genome for one isolate.” The NCBI newsletter states that “Major improvements include a more natural organization at the level of the organism for prokaryotic, eukaryotic, and viral genomes. Reports include information about the availability of nuclear or prokaryotic primary genomes as well as organelles and plasmids.” There’s also a note that “Because of the reorganization to a natural classification system, older genome identifiers are no longer valid. Typically these genome identifiers were not exposed in the previous system and were used mainly for programmatic access.” That makes me wonder what changes this will require in other NCBI resources, as well as in external resources. I haven’t seen any announcements on that yet, so I’ll just have to stay tuned & check around often.
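For anyone whose scripts stored the old genome identifiers, one possible workaround is to look up current records by organism name instead of hard-coding IDs. Here is a minimal Python sketch that only builds an E-utilities ESearch URL and does not contact NCBI. The helper name `genome_search_url`, the `[Organism]` field tag, and the assumption that the redesigned resource is searchable as `db=genome` are mine, not from the NCBI announcement, so please check them against the E-utilities documentation before relying on them.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities (assumed current location).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def genome_search_url(organism, retmax=5):
    """Build an ESearch URL that looks up Genome records by organism name.

    Hypothetical helper: searching by name avoids depending on stored
    numeric identifiers, which the redesign invalidated.
    """
    params = {
        "db": "genome",                    # assumed Entrez database name
        "term": f"{organism}[Organism]",   # assumed field tag
        "retmax": retmax,                  # maximum number of IDs to return
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(genome_search_url("Escherichia coli"))
```

Fetching that URL should return an XML list of whatever identifiers are current, which a script can then use in place of the old ones.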
Enjoy the tip & let us, or NCBI, know what you think of their changes!
Historic Entrez Genome reference: Tatusova, T., Karsch-Mizrachi, I., & Ostell, J. (1999). Complete genomes in WWW Entrez: data representation and analysis. Bioinformatics, 15 (7), 536-543. DOI: 10.1093/bioinformatics/15.7.536
Barrett, T., Clark, K., Gevorgyan, R., Gorelenkov, V., Gribov, E., Karsch-Mizrachi, I., Kimelman, M., Pruitt, K., Resenchuk, S., Tatusova, T., Yaschenko, E., & Ostell, J. (2011). BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata. Nucleic Acids Research. DOI: 10.1093/nar/gkr1163
Sayers, E., Barrett, T., Benson, D., Bolton, E., Bryant, S., Canese, K., Chetvernin, V., Church, D., DiCuccio, M., Federhen, S., Feolo, M., Fingerman, I., Geer, L., Helmberg, W., Kapustin, Y., Krasnov, S., Landsman, D., Lipman, D., Lu, Z., Madden, T., Madej, T., Maglott, D., Marchler-Bauer, A., Miller, V., Karsch-Mizrachi, I., Ostell, J., Panchenko, A., Phan, L., Pruitt, K., Schuler, G., Sequeira, E., Sherry, S., Shumway, M., Sirotkin, K., Slotta, D., Souvorov, A., Starchenko, G., Tatusova, T., Wagner, L., Wang, Y., Wilbur, W., Yaschenko, E., & Ye, J. (2011). Database resources of the National Center for Biotechnology Information. Nucleic Acids Research. DOI: 10.1093/nar/gkr1184
It is often beneficial to visit multiple biomedical databases or resources, even if they seem to provide overlapping information, because no two resources focus on exactly the same information or present it in exactly the same way. Instead of duplicating each other’s curation efforts, databases often link out to related information at other resources. You can think of these links as “social connections” if you want, and in today’s tip I want to show you a couple of connections between protein information resources, including a new connection that really features some of the core value of the PSI’s Structural Biology Knowledgebase, or SBKB.
I begin the tip at the UniProtKB, where I search for a UniProt ID number. From the resulting protein report I first briefly show you how to link out to a corresponding RCSB PDB report, where you can find high quality protein structure information and more. If you are interested in learning more about the RCSB PDB & how to use it, please check out OpenHelix’s full, free tutorial that is sponsored by the RCSB PDB.
From there I return to the UniProt report and demonstrate a new link-out option that links to protein protocols, available materials, as well as information about theoretical models and predicted protein targets from the SBKB. I don’t have time to show it, but a recent update to the SBKB now allows users to search the Structural Biology Knowledgebase with a UniProt accession number. These searches provide users with additional information, including protein structure information and information about pre-released structure sequences. As with the RCSB PDB, we have a free tutorial on the SBKB that is sponsored by the Protein Structure Initiative.
As I scroll through the UniProt protein report users will see information and links for a wide variety of bioscience resources. OpenHelix, as I’m sure many of you are aware, has tutorials on how to use many of these resources. Our tutorials on the RCSB PDB and the PSI SBKB are both free. Our tutorials on UniProt and many other resources are available through a subscription to our database of trainings or through purchase of individual access. Whether you learn the resources through our tutorials, through the references I list below, or through your own explorations of the databases, there really is an amazing amount of information available through these interlinked, publicly-funded resources – please make use of them in your research!
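The link-outs on a UniProt report are also recorded in the entry’s flat-text format as `DR` (database cross-reference) lines, so the same “social connections” can be collected by a script. Below is a minimal Python sketch of that idea. The function name `cross_references` is hypothetical, and the sample entry uses placeholder accessions and identifiers invented for illustration rather than data from a real record.

```python
def cross_references(flat_text):
    """Group the identifiers on UniProt-style DR lines by target database."""
    refs = {}
    for line in flat_text.splitlines():
        # Cross-reference lines start with the two-letter code "DR".
        if not line.startswith("DR   "):
            continue
        # Fields are separated by semicolons; the line ends with a period.
        fields = [f.strip() for f in line[5:].rstrip(".").split(";")]
        if len(fields) < 2:
            continue
        database, identifier = fields[0], fields[1]
        refs.setdefault(database, []).append(identifier)
    return refs


# Placeholder entry for illustration only; not a real UniProt record.
SAMPLE = """\
ID   EXAMPLE_HUMAN           Reviewed;         100 AA.
AC   P00000;
DR   PDB; 1ABC; X-ray; 2.00 A; A=1-100.
DR   PDB; 2XYZ; NMR; -; A=1-100.
DR   Pfam; PF00000; Example_domain; 1.
"""

if __name__ == "__main__":
    for database, identifiers in cross_references(SAMPLE).items():
        print(database, identifiers)
```

Grouping by database name keeps the sketch general: the PDB identifiers are one key among many, just as the RCSB PDB link is one of many link-outs on the report page.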
References: The UniProt Consortium. (2009). The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Research, 38 (Database). DOI: 10.1093/nar/gkp846
Rose, P., Beran, B., Bi, C., Bluhm, W., Dimitropoulos, D., Goodsell, D., Prlic, A., Quesada, M., Quinn, G., Westbrook, J., Young, J., Yukich, B., Zardecki, C., Berman, H., & Bourne, P. (2010). The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Research, 39 (Database). DOI: 10.1093/nar/gkq1021
Berman, H., Westbrook, J., Gabanyi, M., Tao, W., Shah, R., Kouranov, A., Schwede, T., Arnold, K., Kiefer, F., Bordoli, L., Kopp, J., Podvinec, M., Adams, P., Carter, L., Minor, W., Nair, R., & Baer, J. (2009). The protein structure initiative structural genomics knowledgebase. Nucleic Acids Research, 37 (Database). DOI: 10.1093/nar/gkn790
***This is a special offering of our introductory programming course, given in-person in Kendall Square, Cambridge, Massachusetts.***
This course teaches IT specialists, entry level bioinformaticians and biologists to leverage Linux and various open-source bioinformatics tools, together with scripting and data management tools, to perform computations in biological research and create information from data. Examples in this course will use data from DNA and amino acid sequences, microarray profiles, images, mass spectrometry, LIMS, and biological annotations.
The course is divided into 4 sessions, with roughly 1-1.5 hours of lecture, and additional time for lab exercises. A 5th session is reserved for project discussion and any other course relevant issues the students may want to discuss.
Session 1 will cover an overview of the computing landscape for bioinformatics. Common data management and mining issues will be highlighted, along with an overview of challenging problems and their solutions. An overview of the Linux OS will follow, along with lab exercises to get participants familiar with Linux.
Session 2 will cover fundamentals of scripting with Perl, such as scalars, arrays, variable interpolation, operators (mathematical, conditional, logical), file input/output, printing, control structures (if-then-else, for, while), list operations, etc. Lab exercises will be conducted.
Session 3 will cover functions/subroutines, hashes, and regular expressions. Participants will be introduced to MySQL databases.
Session 4 will cover the installation of Perl packages, and some examples of using the well-known BioPerl package for manipulating sequences, automating BLAST queries, etc. A project will be assigned that uses BioPerl and the Perl Database Interface.
PREREQUISITES: No programming experience is required, just a need to learn how to program.
The new tutorial reflects the many changes and enhancements on the RCSB PDB site, and includes a narrated on-line tutorial, PowerPoint slides, handouts, and exercises.
Bellevue, WA (PRWEB) April 12, 2011
The Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB) has partnered with OpenHelix to provide a revised and updated tutorial (http://www.openhelix.com/PDB) on its free web-based resource for studying biological macromolecules (http://www.pdb.org).
The RCSB PDB provides a variety of tools and resources to use to study biological macromolecules. The PDB is the single worldwide repository of experimentally-determined 3D biological structures of proteins, nucleic acids and complex assemblies. As a member of the Worldwide PDB collaboration (wwpdb.org), the RCSB PDB curates and annotates PDB data, and presents basic and advanced search, display and visualization methods to access these data.
The new tutorial reflects the many changes and enhancements on the RCSB PDB site, including a new data drill-down and data summary feature, updated ligand features such as a download page, images and binding affinity data, new report types and visualization options, among many others.
The new training materials (at http://www.openhelix.com/pdb) include an online narrated tutorial that demonstrates: basic and advanced searches, how to generate reports, the different options for exploring individual structures, and many of the research and educational resources and tools available at the RCSB PDB. The approximately 60-minute tutorial, which runs in just about any browser, can be viewed from beginning to end or navigated using chapters and forward and backward sliders.
In addition to the tutorial, RCSB PDB users can also access useful training and teaching materials including the animated PowerPoint slides used as a basis for the tutorial, suggested script for the slides, slide handouts, and exercises. This can save teachers and professors a tremendous amount of time and effort in creating classroom content.
About the RCSB PDB
The RCSB Protein Data Bank (http://www.pdb.org), administered by the Research Collaboratory for Structural Bioinformatics (RCSB), supports scientific research and education worldwide by providing an essential resource of information about biomolecular structures. These molecules of life are found in all organisms, from bacteria and plants to animals and humans.
The RCSB PDB member institutions jointly manage the project: Rutgers, The State University of New Jersey and the San Diego Supercomputer Center and the Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California, San Diego.
OpenHelix, LLC, (http://www.openhelix.com) provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web. The OpenHelix Search portal searches hundreds of resources, tutorial suites and other material to direct researchers to the most relevant resources and OpenHelix training materials for their needs. Researchers and institutions can save time, budget and staff resources by leveraging a subscription to nearly 100 online tutorial suites available through the portal. More efficient use of the most relevant resources means quicker and more effective research.
Free tutorial suite on the Structural Biology Knowledgebase includes an online narrated movie, PowerPoint slides, slide handouts and exercises.
Bellevue, WA (PRWEB) April 11, 2011
The Structural Biology Knowledgebase (SBKB), a one-stop shop for information about proteins hosted at Rutgers University, has partnered with OpenHelix to provide an updated and revised free tutorial suite (http://www.openhelix.com/sbkb) on its online protein “portal” located at http://sbkb.org/.
The SBKB is a free, comprehensive resource produced through a collaboration between the National Institutes of Health’s Protein Structure Initiative: Biology program and the Nature Publishing Group. The PSI SBKB contains genetic, structural, functional and experimental information about proteins that is easily accessible through a variety of reports and displays. The portal also includes links to many additional resources.
The new tutorial reflects the many changes and enhancements to the SBKB, including a recent name change from Structural Genomics Knowledgebase to Structural Biology Knowledgebase, new navigation organization, and remodeled Protein Model Portal reports, among many others.
The online narrated tutorial runs in just about any browser and can be navigated in a number of ways. In about 60 minutes, the tutorial highlights and explains the features and functionality needed to start using the SBKB effectively. The tutorial can be used by new users to introduce them to the protein portal, by previous users to view new features and functionality, or simply as a reference tool to understand specific features.
In addition to the tutorial, users can also access useful training and teaching materials including the animated PowerPoint slides used as a basis for the tutorial, suggested script for the slides, slide handouts, and exercises. This can save teachers and professors a tremendous amount of time and effort in creating classroom content.
About the PSI
The Protein Structure Initiative (PSI, http://www.nigms.nih.gov/Initiatives/PSI/psi_biology/), which is supported by the National Institutes of Health, is a federal, university and industry effort aimed at dramatically reducing the costs and lessening the time it takes to determine a three-dimensional protein structure. The long-range goal of the PSI is to make the three-dimensional atomic-level structures of most proteins easily obtainable from knowledge of their corresponding DNA sequences. The PSI strives to gain biological insights from new structures and to help the broad biomedical research community make use of PSI research findings.
OpenHelix, LLC, (http://www.openhelix.com) provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web. The OpenHelix Search portal searches hundreds of resources, tutorial suites and other material to direct researchers to the most relevant resources and OpenHelix training materials for their needs. Researchers and institutions can save time, budget and staff resources by leveraging a subscription to over 100 online tutorial suites available through the portal. More efficient use of the most relevant resources means quicker and more effective research.
Online tutorial gives researchers and scientists a place to learn about the many biology resources available to them.
These links assist scientists by guiding them to relevant technical tutorials on resources which may be unfamiliar to them. Thanks to this partnership with OpenHelix, BioMed Central journals are able to make their scientific content more useful and accessible.
Bellevue, WA (PRWEB) April 6, 2011
The science community now has a valuable launching point to explore and find the many bioinformatics and genomics resources available to them through the “World Tour of Genomics Resources” tutorial suite by OpenHelix.
The free tutorial suite includes a sampling of resources organized by categories such as algorithms and analysis tools, expression resources, genome browsers (both eukaryotic and prokaryotic/microbial), literature and text mining resources, and resources focused on nucleotides, proteins, pathways, disease and variation.
In each category, the tutorial explores not only the most popular resources, but also some lesser known ones that fill unique scientific needs or are especially helpful to researchers.
The tour also shows easy ways of accomplishing the difficult task of finding and learning about other resources with the free OpenHelix search tool, tutorial suites, and other tools.
“With the ever-expanding data sets and resources of the genomics era,” said Warren (Trey) Lathe, Chief Science Officer at OpenHelix, “this tutorial suite fills the critical need of giving scientists an overview of resources and showing them ways to find them and learn how to use them.”
The online narrated tutorial, which runs in just about any browser, can be viewed from beginning to end or navigated using chapters and forward and backward sliders.
Included in the tutorial suite are animated PowerPoint slides used as a basis for the tutorial, suggested script for the slides, slide handouts, and a list of the resources and tutorial landing pages mentioned in the tutorial. This saves teachers and professors a tremendous amount of time and effort when giving this tour to others.
A companion piece to this free tutorial, exploring ways to find and learn about online biology computational tools, is the paper “OpenHelix: bioinformatics education outside of a different box” published in a special issue of Briefings in Bioinformatics entitled “Special Issue: Education in Bioinformatics”. This paper describes a wide range of repositories where researchers can access informal educational sources of learning on publicly available bioinformatics resources. These include a wide variety of formats and strategies including lists of resources, journals that regularly feature tool descriptions, and eLearning sources such as the MIT OpenCourseWare effort.
We usually don’t blog specifically about OpenHelix tutorial purchasing (we do that with press releases, and it’s not the purpose of the blog), but I really wanted to give a quick heads up. Many of our tutorials are free to the end-user because the resource provider has funded the training and outreach. UCSC, ENCODE, PDB, VISTA, SBKB and Galaxy are just some examples. The bulk of our tutorials (check out the catalog: reaching 100!) are behind a subscription wall. For trainers, professors teaching genomics, power learners, groups and institutions, subscriptions make a lot of sense. Sometimes, though, individual users need to train on only one or two resources, and then their need is fulfilled. We’ve just added an individual purchase function to our tutorials for those users.
If you are not subscribed, you’ll notice new green “purchase” and “subscribe” buttons. (If you are subscribed, those buttons won’t appear, of course, since subscribers have unlimited access to the tutorials.) Click on the “purchase” button and you can get access to that specific tutorial immediately after a $28.50 purchase through Google Checkout, which requires a free Google account; if you have a Google email address, that will be all you need. Your access lasts for three days after purchase and includes unlimited viewing of the Flash movie and the ability to download the slides, handouts and exercises.
Again, you can see the tutorials in our list here, or search for the resources on our home and search page. Just type the resource (or general topic) you are interested in into the search box. If we have a tutorial on the resource, there will be a ‘puzzle’ icon to the left of the search result. Green means it’s sponsored and free; red means you can view it with a subscription or an individual purchase. Just click the icon :D.
I would just like to announce that Mary and I will be giving an all-day hands-on workshop on Tuesday, November 2nd, 2010 in Washington DC (my home town), right before the ASHG conference (where we will also be). The title of the workshop is A World Tour of Genome Browsers and a Galaxy of Analysis Tools. We’ll be covering UCSC Genome and Table Browsers, an overview of other genome browsers, BioMart, Galaxy and a tour of genome resources and how to find them. For more information on location, cost, and topics, you can continue reading here. There are workshops on UCSC and Galaxy at ASHG for attendees (which we will be at, but Bob, Anton and others will be presenting), but those have sold out. We are offering this workshop for those who would like to learn these topics and more, both DC residents and ASHG attendees.
Back in April I happened to mention that we (OpenHelix) were writing a paper on informal sources of bioinformatics education (in a Friday SNPets item) and we were asked to announce when the paper came out. Well, we got word late last week that the article has been published. The article appears in a special issue of Briefings in Bioinformatics that is devoted to bioinformatics education. I’m not sure if all the articles in the issue are available yet, but it looks like several are in the journal’s Advanced Access area. Bioinformatics education is an area (obviously) that OpenHelix cares deeply about & we are anxiously awaiting our copies of the full issue so we can read all the articles, but I digress…
The title “OpenHelix: bioinformatics education outside of a different box” (if you hit a paywall, or have trouble accessing, we will gladly send a reprint. Just email the corresponding author, Jennifer listed in the abstract or ask from our contact link- Trey) was a cool suggestion from one of the article’s reviewers – my original title was much tamer (ok, more boring). Regardless of the final title, what we wanted to do in the article is to discuss informal sources of bioinformatics education. By education we do mean acquiring applicable information that allows a researcher to operate within the field of bioinformatics. By informal we mean outside of traditional, credit based classes and degrees. Essentially we provide a bit of the knowledge and know-how that we’ve gathered over years of working with hundreds of resources, thousands of workshop attendees, and countless online contacts about where a researcher, or librarian, or whoever can turn for various informational needs in the field of bioinformatics.
Our contention is that not everyone needs to program in order to manage and manipulate their biological data these days. There are SO many fine publicly available databases, algorithms, tools and more that it is just a matter of awareness and training for anyone to be able to reformat and analyze their personal data sets. We maintain that:
…bioinformatics education needs to do a minimum of four things:
1. raise awareness of the available resources
2. enable researchers to find and evaluate resource functionality
3. lower the barrier between awareness and use of a resource
4. support the continuing educational needs of regular resource users
In the paper we walk through each of these: we first describe example needs associated with the point, and then cover possible informal resources that meet the needs. The article includes tables of resources with links to them, and many, many references. We really hope that it is a very useful resource in the field of bioinformatics education. I am already looking forward to contributing to the next special education issue, both to hone my writing skills and to extend the information we can provide readers. Please do comment, email, whatever and let us know about the resources that you use, what you learned from the article, etc. Oh, here’s the citation info: Williams, J., Mangan, M., Perreault-Micale, C., Lathe, S., Sirohi, N., & Lathe, W. (2010). OpenHelix: bioinformatics education outside of a different box. Briefings in Bioinformatics. DOI: 10.1093/bib/bbq026