Category Archives: Tip of the Week

Video Tip of the Week: TargetMine, Data Warehouse for Drug Discovery

Browsing around genomic regions, layering on lots of associated data, and beginning to explore new data types I might come across are things that really fire up my brain. For me, visualization is key to forming new ideas about the relationships between genomic features and patterns of data. But frequently I want to take this to the next step–asking where else these patterns appear, how many other instances of this situation there are in a data set, and maybe adding complexity to the problem and refining the quest. This is not always easy to do with primarily visual software tools. This is when I turn to tools like the UCSC Table Browser, BioMart, and InterMine to handle some list of genes, or regions, or features.

We’ve touched on all of these before–sometimes with full tutorial suites (UCSC, BioMart), and sometimes as a Tip of the Week, InterMine and InterMine for complex queries. Learning about the foundations of these tools will let you use various versions or flavors of them at other sites. I love to see tools that are re-used for different topics when that’s possible, rather than building a whole new system. There are ModENCODE, rat, yeast mines, and more. This week’s tip is about one of those others–TargetMine is built on the InterMine foundation, with a specific focus on prioritizing candidate genes for pharmaceutical interventions. From their site overview, I’ll add this description they use:

TargetMine is an integrated data warehouse system which has been primarily developed for the purpose of target prioritisation and early stage drug discovery.

For more details about their framework and philosophy, you should see their papers (linked below). The earlier one sets out the rationale, the data types, and the data sources they are incorporating. They also establish their place in the ecosystem of other databases in this arena, which helps you to understand their role. But you should see the next paper for a really good grasp of how their candidate prioritization works with the “Integrated Pathway Clusters” concept they’ve added. They combined data from KEGG, Reactome, and NCI’s PID collections to enhance the features of their data warehouse system.

This week’s Video Tip of the Week highlights one of the tutorial movies that the TargetMine team provides. There’s no spoken audio with it, but the captions that help you to understand what’s going on are in English. I followed along on a browser with their example–they have a sample list to simply click on, and you can see various enrichments of the sets–pathways, Gene Ontology, Disease Ontology, InterPro, CATH, and compounds. They call these the “biological themes” and I find them really useful. You can create new lists from these theme collections. They also illustrate the “template” option–pre-defined queries with typical features people may wish to search. The example shows how to go from the list of genes you had to pathways–but there are other templates as well.
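If you are wondering what an “enrichment” number means for a list like this, here is a minimal sketch of the general idea. It assumes a one-sided hypergeometric test, which is a common way to score over-representation of a term in a gene list; I am not claiming this is exactly how TargetMine computes its values, and the counts in the example are made up for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in a list.

    population: number of genes in the background set
    annotated:  number of background genes carrying the term (e.g. a pathway)
    sample:     number of genes in your list
    hits:       number of genes in your list carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail P(X = k) for k = hits .. upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Made-up numbers: 20 background genes, 5 in the pathway,
# a list of 4 genes, 2 of which are in the pathway.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=2)
print(f"p = {p_value:.4f}")
```

A real analysis would also correct for testing many terms at once, which this sketch leaves out.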

Another section of the video has an example of a custom query with the Query Builder. They ask for structural information for proteins targeted by acetaminophen. It’s a nice example of how to go from a compound to protein structure–a question I’ve seen come up before in discussion threads.
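For those who prefer to script this kind of question, InterMine-based sites can also take queries written as PathQuery XML. The sketch below only builds such an XML string with the Python standard library; it does not contact any server. The class and field names (Compound.name, Compound.targetProteins, and so on) are hypothetical placeholders I chose for illustration, not TargetMine’s actual data model, so check the model in their Query Builder before relying on any of them.

```python
import xml.etree.ElementTree as ET


def build_path_query(views, constraints, model="genomic"):
    """Build an InterMine-style PathQuery XML string.

    views:       list of output column paths
    constraints: list of (path, operator, value) tuples
    """
    query = ET.Element("query", {"model": model, "view": " ".join(views)})
    for path, op, value in constraints:
        ET.SubElement(query, "constraint", {"path": path, "op": op, "value": value})
    return ET.tostring(query, encoding="unicode")


# Hypothetical paths: going from a compound to the structures of its targets.
views = [
    "Compound.name",
    "Compound.targetProteins.primaryAccession",
    "Compound.targetProteins.structures.pdbId",
]
constraints = [("Compound.name", "=", "acetaminophen")]

query_xml = build_path_query(views, constraints)
print(query_xml)
```

The point is the shape of the question: a starting class, a constraint on it, and a list of paths that walk outward to the data you want back.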

In their more recent paper (also below), they have some case studies that illustrate the concepts of prioritizing targets for different disease situations with their system.  They also expand on the functions with additional software to explore the pathways: http://targetmine.mizuguchilab.org/pathclust/ .

So have a look at the features of TargetMine for prioritization of candidate genes. I think the numerous “themes” are a really useful way to assess lists of genes (or whatever you are starting with).

Quick Links:

TargetMine: http://targetmine.mizuguchilab.org/ [note: their domain name has changed since the publications, this is the one that will persist.]

InterMine: http://intermine.github.io/intermine.org/

References:

Chen, Y., Tripathi, L., & Mizuguchi, K. (2011). TargetMine, an Integrated Data Warehouse for Candidate Gene Prioritisation and Target Discovery PLoS ONE, 6 (3) DOI: 10.1371/journal.pone.0017844

Chen, Y., Tripathi, L., Dessailly, B., Nyström-Persson, J., Ahmad, S., & Mizuguchi, K. (2014). Integrated Pathway Clusters with Coherent Biological Themes for Target Prioritisation PLoS ONE, 9 (6) DOI: 10.1371/journal.pone.0099030

Kalderimis A., R. Lyne, D. Butano, S. Contrino, M. Lyne, J. Heimbach, F. Hu, R. Smith, R. Stěpán, J. Sullivan & G. Micklem (2014). InterMine: extensive web services for modern biology, Nucleic Acids Research, 42 (W1) W468-W472. DOI: http://dx.doi.org/10.1093/nar/gku301

Video Tip of the Week: Viewing Amino Acid info in the UCSC Genome Browser

We’ve been doing training on the UCSC Genome Browser for over 10 years now. We’ve seen it grow from just a few genomes and a few tracks to the enormous trove of information it is today. In fact, one of the toughest things about training is how to balance all the new information and features with the foundational things one needs to really grok the framework and the functionality.

In the training materials that we have today, we touch briefly on the amino acid display in the reference genome track. And one of our very first Tip-of-the-Week blog posts was about how to visualize the 3-frame translation on the browser. But we don’t have time to go into every track and show the options that you can employ for all the displays. We stress that you should check each track for more features, but in the short workshops we can only cover a few examples.

The UCSC team has started a great video series that can supplement the main training suites that are available. You can see their whole YouTube channel here: UCSC Genome Browser. But today I’ll highlight one of the recent ones so you can see the type of help they offer. And it covers a feature–display of amino acids, codon numbers, and amino acid variation information–that we don’t have time to do in our workshops.
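To make the idea of a 3-frame translation with codon numbers concrete, here is a small sketch in Python. It assumes the standard genetic code and the forward strand only, and it numbers codons from the start of whatever fragment you hand it. That is not the same as the transcript-based codon numbering the browser shows, so treat it as an illustration of the concept rather than a reproduction of the UCSC display.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(dna, frame):
    """Translate one forward frame (0, 1, or 2).

    Returns a list of (codon_number, codon, amino_acid) tuples, with codon
    numbers counted from 1 within this fragment. Unknown codons become "X".
    """
    dna = dna.upper()
    result = []
    starts = range(frame, len(dna) - 2, 3)
    for number, start in enumerate(starts, start=1):
        codon = dna[start:start + 3]
        result.append((number, codon, CODON_TABLE.get(codon, "X")))
    return result


def three_frame_translation(dna):
    """Return {frame_number: protein_string} for frames 1, 2, and 3."""
    return {
        frame + 1: "".join(aa for _, _, aa in translate_frame(dna, frame))
        for frame in range(3)
    }


frames = three_frame_translation("ATGGCCTAA")
for frame_number, protein in frames.items():
    print(frame_number, protein)
```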

How do I identify codon numbers with the UCSC Genome Browser?

Bob Kuhn and Pauline Fujita’s video here, as well as the others in the series, focus on a specific task and give great tips and details. Be sure to check out the others as well, and subscribe to their channel to be notified when new ones become available.

Quick links:

UCSC Genome Browser: http://genome.ucsc.edu

UCSC YouTube Channel: http://bit.ly/genomebrowserYoutube

OpenHelix full training suite materials:

Note: UCSC has sponsored our training materials for years, and because of this sponsorship they are freely available to everyone.

Reference:

Rosenbloom K.R., J. Armstrong, G. P. Barber, J. Casper, H. Clawson, M. Diekhans, T. R. Dreszer, P. A. Fujita, L. Guruvadoo, M. Haeussler & R. A. Harte (2014). The UCSC Genome Browser database: 2015 update, Nucleic Acids Research, 43 (D1) D670-D681. DOI: http://dx.doi.org/10.1093/nar/gku1177

Video Tip of The Week: Jalview for multiple sequence alignment editing and visualization

The multiple sequence alignment editing question recently on our What’s the Answer? feature was popular. We have covered MSA editors in the past, and we include a bit on Jalview in our Clustal tutorial, but I hadn’t revisited them lately. In preparation for that post I specifically looked over at the Jalview site, and I realized that they have recently provided a number of training videos to help people use their tools. So this week’s tip of the week will highlight them.

At the Jalview site, they give this brief description of the features:

Jalview is a free program for multiple sequence alignment editing, visualisation and analysis. Use it to view and edit sequence alignments, analyse them with phylogenetic trees and principal components analysis (PCA) plots and explore molecular structures and annotation.

There are 2 flavors of Jalview. There is a JalviewLite applet you can demo by simply clicking on some examples at their site. Or you can run the Jalview desktop for more features (you can do this from the web or by downloading a local copy). The description on their About page will tell you more about the distinctions. You may also encounter Jalview that’s being incorporated in other tools. Here’s a handy list of those on their Community resources page.

On the Jalview online training Youtube channel, they have a number of videos. Some are general overview, some are specific tasks. For a general overview of what it does, this intro video will help you to decide if it’s a tool that would help you:

If you are ready to try it out, there are some handy tips in this video with more details about actually using the features of the software. It covers basic navigation, understanding the interface layout, working on editing, and good tips for accomplishing things efficiently.

For more of the philosophy and foundations of Jalview, check out their paper (linked below). And check out their other videos to go further.

Quick link:

Jalview: http://www.jalview.org/

Reference:

Waterhouse, A.M., Procter, J.B., Martin, D.M.A, Clamp, M. and Barton, G. J. (2009)
“Jalview Version 2 – a multiple sequence alignment editor and analysis workbench”
Bioinformatics 25 (9) 1189-1191 doi: 10.1093/bioinformatics/btp033

Video Tip of the Week: The New OpenHelix Interface

Generally we like to highlight new features and new tools from bioinformatics software providers. But this week we wanted to introduce some new features of our own OpenHelix site. If you’ve been using the site for a while, you will have noticed that recently we rolled out some changes. All the same tutorial materials and tips are available, but we’ve provided new ways to access them.

The most important thing about the new site is accessing our training suites.

Access the training video, slides, and exercises on the training suite page.


This is now quicker with buttons right from the main page at OpenHelix: full catalog, or list of free tutorials. And when you find a suite you want to watch (like the UCSC Genome Browser one shown here as an example), it loads right on that landing page instead of requiring another click to launch a new window. If you still want the larger original size version in a new window, though, that’s still available from the button below.  Access to our slides, handouts, and exercises is still there–right below the video. And you can still quickly hop over to the site that’s described on the page with the “Visit the Resource” link.

Further down on the page we have the links to related content. This can be other tutorials about this resource (for example, UCSC Genome Browser Advanced Topics), or other genome browsers. We also collect the blog posts related to this resource that may offer new tools or features and link them from this page. And we also text-mine the BiomedCentral open-access publications to seek out those citing this resource–this way you can see what researchers are doing with the tools in their research programs.

Our search feature still provides access to our complete collection of resources that we’ve examined over the years. But the search results are now also refined to let you tab to those with our popular Video Tip of the Week subset if you just want to locate those with short video tips.

With that overview, I’ll also offer this week’s video tip overview of the new site as well.

Our basic philosophy remains the same, as we explained in our paper (linked below).

To accomplish its outreach mandate, bioinformatics education needs to do a minimum of four things:

  1. raise awareness of the available resources
  2. enable researchers to find and evaluate resource functionality
  3. lower the barrier between awareness and use of a resource
  4. support the continuing educational needs of regular resource users

We want to provide introductory training on many of the core resources in bioinformatics, and help educators and trainers elsewhere to provide this to students and staff who will need to access these tools for their research. We hope you like the new look. If you have any issues, let us know.

Quick link:

Main site: www.openhelix.com

Reference:

Williams J.M., M.E. Mangan, C. Perreault-Micale, S. Lathe, N. Sirohi & W. C. Lathe (2010). OpenHelix: bioinformatics education outside of a different box, Briefings in Bioinformatics, 11 (6) 598-609. DOI: http://dx.doi.org/10.1093/bib/bbq026

There’s a press release associated with this too, with further details: OpenHelix Unveils New Online Training Site, Subscriber Services.

Video Tip of the Week: Protein structure information for public outreach. Really.

This week’s tip isn’t about a specific tool–but a really interesting look at how a tool was used in the context of some general public outreach messaging. Recently I posted about Aquaria, a new tool available to let biologists explore protein structures, mutations, and domains in user-friendly ways. But an interesting example of how the information about protein structures can be used to drive understanding came from a video animation of protein accumulation in Alzheimer’s. Just have a look at the video first and enjoy it. How cool is that clathrin basket pulling the vesicle in?

Description from their brochure at the launch [PDF]:

Christopher Hammang’s “Alzheimer’s Enigma” which explores the neurons of the human brain, and reveals how normal protein breakdown processes become dysfunctional and result in plaque formation during Alzheimer’s disease.

I found out about it as I was looking at the upcoming VIZBI talks and exploring their site for other features. In the VizbiPlus section there are a number of excellent animations of molecular processes, and this video was one of them. Be sure to watch for other tweets with the #vizbi hashtag for the next few days. I bet you’ll see some amazing tools and visualizations, as always.

Recently I mentioned the longer, more comprehensive, video from the Aquaria team, but I didn’t use that for my tip–I just used the short version overview. But the longer version had this extra bonus piece of how their software had been used by this animator. Here is Christopher Hammang, creator of this video, describing how he used the Aquaria information to generate the structural model for his animation:

Often it helps people to see how someone else used a tool for a project to get a better grasp of it. And this seemed like such a compelling and unusual example, I wanted to highlight it.

So again I’ll point you to the Aquaria tool tip from earlier this month to explore more, now with an understanding of an example of its use. But I would also encourage you to have a look at the other animations coming out of VIZBI at the VizbiPlus page. I swear, the animated intestine is way cooler than you might expect. The diabetes + insulin receptor videos are really informative and helpful. A cancer video illustrates a misbehaving p53.  Go look.

Quick links:

VizbiPlus videos: http://www.vizbi.org/Plus/

VizbiPlus Poster from Hammang and team: Alzheimer’s Enigma: Putting the Pieces Together http://www.vizbi.org/Posters/2015/B08

Vizbi Posters: http://www.vizbi.org/Posters/

Aquaria: http://aquaria.ws

Reference:
O’Donoghue S.I., Kenneth S Sabir, Maria Kalemanov, Christian Stolte, Benjamin Wellmann, Vivian Ho, Manfred Roos, Nelson Perdigão, Fabian A Buske, Julian Heinrich & Burkhard Rost (2015). Aquaria: simplifying discovery and insight from protein structures, Nature Methods, 12 (2) 98-99. DOI: http://dx.doi.org/10.1038/nmeth.3258

Video Tip of the Week: Designing proteins, using Rosetta

As often happens, last week’s tip on visualizing structures led me to some more reading and thinking about creating protein structures. And although it’s important for biologists to be able to use more of the information about protein structures and variations in their work from tools like Aquaria or PDB, it’s also important for some researchers to be on the other end of the pipeline and actually making the protein structures. Further, this also leads to the possibility of better designs of novel proteins as therapeutics–for example, making antibodies like the ones that could possibly battle Ebola.

As I looked around for protein design software to highlight for a tip, it was clear to me that the level of complexity of the problems in designing proteins didn’t really lend itself to short videos. There are some introductory seminars and tutorials on the Rosetta tools, but these certainly require a bit of time to explore. Instead, I’ve decided to highlight this really nice overview on the aspects of protein design that you would have to tackle to make customized proteins.

This iBiology “Introduction to Protein Design” by David Baker is really well done. There’s also a second seminar that is more detailed about designing proteins with new functions to solve many problems in biomedical research and environmental challenges.

This seems incredibly important and useful–but certainly daunting to get started. One way to get a head start on this would be to take an intro workshop. I was recently notified about the opportunity to learn from a couple of researchers who are very skilled with the Rosetta tools–Daisuke Kuroda and Jared Adolf-Bryfogle.

I’m including in the references a nice review of the basics of computational design of antibodies by Kuroda et al. And also a paper by Adolf-Bryfogle and team that covers important aspects of the component parts of antibodies that you would need to predict structures and design new ones, which are stored in the database they’ve created. This should give you a sense of the challenges and opportunities. And give you a good foundation for the concepts.

Rosetta software has been a powerhouse of protein design for many years. It’s been a leader in the CASP competitions (Critical Assessment of protein Structure Prediction). It’s got a strong user community: Rosetta Commons. You can obtain and use the software in a variety of ways, including some servers for academic use, and one important stop would be the ROSIE servers, “The Rosetta Online Server that Includes Everyone hosts several servers for combined computer power as a free resource for academic users.”

Quick links:

ROSIE servers: http://rosie.rosettacommons.org (note there’s a specific protocol section there that covers antibodies)

Rosetta Commons: https://www.rosettacommons.org/

PyIgClassify: http://dunbrack2.fccc.edu/PyIgClassify/

Short course:  Designing Antibodies with Rosetta Sunday May 3 2015. Early registration discount ends soon.

References:

Kuroda D., H. Shirai, M. P. Jacobson & H. Nakamura (2012). Computer-aided antibody design, Protein Engineering Design and Selection, 25 (10) 507-522. DOI: http://dx.doi.org/10.1093/protein/gzs024

Adolf-Bryfogle J., Q. Xu, B. North, A. Lehmann & R. L. Dunbrack (2014). PyIgClassify: a database of antibody CDR structural classifications, Nucleic Acids Research, 43 (D1) D432-D438. DOI: http://dx.doi.org/10.1093/nar/gku1106

Rybicki E.P. (2014). Plant-based vaccines against viruses, Virology Journal, 11 (1) 205. DOI: http://dx.doi.org/10.1186/s12985-014-0205-0

Note: OpenHelix is a part of Cambridge Healthtech Institute.

Video Tip of the Week: Aquaria, streamlined access to protein structures for biologists

This week’s Video Tip of the Week is Aquaria, a new resource for exploring protein structures, mutations, and similarities to other proteins. It’s a very well-designed and interactive experience for end users. It is aimed largely at biologists who could benefit from exploring the structural details of their proteins of interest, but are daunted by tools aimed at structural biologists. But for tool developers, you should also look at how this rollout went. It’s one of the best examples of a tool launch I’ve seen in this field. And I’ve seen a lot.

So first, the tool. Aquaria offers users a streamlined way to access and explore protein structures, combining the kinds of information you get from the PDB structure resources with additional details like the UniProt mutations. Currently you start with a basic search by asking for a protein by name, or PDB or UniProt ID. They have pre-calculated the relationships of proteins in PDB and Swiss-Prot to quickly offer you a structure and related proteins. The paper notes: “Currently, Aquaria contains 46 million precalculated sequence-to-structure alignments, resulting in at least one matching structure for 87% of Swiss-Prot proteins and a median of 35 structures per protein….” In addition, it lets you explore other important biological features such as InterPro domains and post-translational modifications, so you can think about how the mutations + structures + functions impact a given protein that you are interested in. As they describe it:

“We have loaded SNP data from Uniprot and Interpro so you can see where the mutations lie on your 3D model. And we have found that you may be pleasantly surprised to find your mutations clustering in 3D space!”

The Aquaria folks provided an intro video to get you started:

Another handy feature they provided is a Quick Reference Card with shortcuts to the functions [PDF]. In addition to this intro, they have a longer video as well. This is more like a typical lecture with the background, the framework, the goals of the project, and more about the underlying database.

Now, this thing about the rollout of this software project. I found it when I was looking over the talks at the upcoming VIZBI conference (Visualizing Biological Data). Every year I find there are awesome ideas that come out of VIZBI, and tools I want to explore. Among them this year is Aquaria. So I went looking for more detail, and found some of the traditional stuff. The paper (below), the press release, etc. And then I found the Reddit discussion. The Aquaria team did a Science AMA on this tool. It engaged a range of folks–some folks just fans of science who had probably never seen protein structures before. That’s fine with me–the more folks who appreciate research and learn about how researchers explore proteins, the better. But others had good technical questions for the team–such as other ways to find proteins of interest with sequence searches, or integration with other tools like UCSC Genome Browser. All the answers are over there. I enjoyed the question about the name of the tool:

It seems you get the ideas we had in mind: using Aquaria lets us observe these fascinating creatures (proteins) from the natural world. Aquaria creates an artificial environment and lighting where we can observe isolated proteins; like aquarium fish, proteins are often beautiful and (usually) live in water.

I asked them about how this played out, and they had ~1000 folks visit their site as a result of this Reddit event. That was really interesting to me, and a very neat route to drive awareness.

They also provided a way to support users with one of my other favorite resources–Biostars. They created a support thread there where users can ask questions and get answers. https://www.biostars.org/t/aquaria/ I so prefer this to mailing lists, and I’m glad to see this easy method to get support. In fact, I asked something that I couldn’t quite figure out yet. (Here’s the protein I was looking at: http://aquaria.ws/P09616/7ahl/A I wanted to see all the subunits in full color; you de-select autofocus to do that. And color by chains for this version.)

Also, for the developer types: they offer a way for you to interact with the Aquaria software to add your own features of interest with their API. Maybe you have new mutations you have found in some sequence you’ve obtained in your lab, for example. They are offering guidance on that here: http://bit.ly/aquaria-features. They touch on this in the longer video (~27min) if you want a bit more explanation. I suspect from the high quality support they are offering, they’d be interested to hear from you and what features you’d like to see applied to these proteins as well.

So kudos to this team for a nifty tool and really serious multi-media outreach efforts. I think it was well done on all counts. I’ll bet you Reddit reached more of the right folks than a press release ever will. PIOs take note–get your scientists on Reddit.

Quick links:

Aquaria site: http://aquaria.ws/

Reddit Science AMA: https://www.reddit.com/r/science/comments/2w2jvw/science_ama_series_we_are_dr_sean_odonoghue_and/

Biostar support thread: https://www.biostars.org/t/aquaria/

Reference:
O’Donoghue S.I., Kenneth S Sabir, Maria Kalemanov, Christian Stolte, Benjamin Wellmann, Vivian Ho, Manfred Roos, Nelson Perdigão, Fabian A Buske, Julian Heinrich & Burkhard Rost (2015). Aquaria: simplifying discovery and insight from protein structures, Nature Methods, 12 (2) 98-99. DOI: http://dx.doi.org/10.1038/nmeth.3258

Video Tip of the Week: Beacon, to locate genome variants of potential clinical significance

This week’s Video Tip of the Week follows on last week’s chatter about the Internet of DNA. As I mentioned then, the Beacon tool we touched on was going to get more coverage. So this week’s video is provided by the Beacon team, part of the larger Global Alliance for Genomics and Health project (GA4GH).

I’ve touched on some of the GA4GH work in the past. I heard more about a very interesting piece of it from David Haussler at the recent TRICON meeting.

D. Haussler, slide from TRICON talk.


The talk was called “Stable Reference Structures for Human Genome Analysis” and it was important for me to see this. I’ve been wrestling with some of the literature (linked below) that describes ways to represent genome variations among massive numbers of humans. It really helped me to hear it described and shown as cartoons on slides that were less like equations. And how this will play out in graphs and visualizations with software tools is of particular interest to me.

So one branch of the Data Working Group of the GA4GH is tasked with how to represent the variations as multiple paths as graphs, instead of the one linear reference genome we think of today. It has to accommodate many types of variations–inversions, deletions, duplications, as well as just SNPs. So, as the kids say today, it’s complicated. But we have to figure it out. Stay tuned, I’m sure we’ll be talking more about this in the years to come.
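To give a feel for what “multiple paths through a graph” means, here is a toy sketch. The node sequences, the path names, and the dictionary layout are all my own invention for illustration; this is not the GA4GH data model or any of the formats discussed in the papers. Each node holds a stretch of sequence, and each genome is a path that walks through the nodes.

```python
# Toy sequence graph: node id -> sequence carried by that node.
nodes = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}

# Directed edges say which node may follow which.
edges = {(1, 2), (1, 3), (2, 4), (3, 4), (1, 4)}

# Each genome is a path through the graph (hypothetical sample names).
paths = {
    "reference": [1, 2, 4],        # the familiar linear reference
    "snp_sample": [1, 3, 4],       # takes node 3 (G) instead of node 2 (A)
    "deletion_sample": [1, 4],     # skips the middle node entirely
}


def spell(path):
    """Return the sequence spelled by a path, checking that every step is an edge."""
    for current, following in zip(path, path[1:]):
        if (current, following) not in edges:
            raise ValueError(f"no edge from node {current} to node {following}")
    return "".join(nodes[node_id] for node_id in path)


for name, path in paths.items():
    print(name, spell(path))
```

Inversions and duplications need more machinery than this (for example, a way to traverse a node in reverse orientation or more than once), which is part of why the real problem is complicated.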


Beacon is like SETI for genome variations.

Another branch of this project is tasked with trying to figure out how to share genomic data among all the international producers of this data. If we can’t share the data, we won’t be able to look at the variations among humans and learn from them, never mind display them. This has additional layers of social and legal complexity we are just beginning to face. As a first pass at sharing this data, a “Beacon” system has been implemented to help researchers locate variations of interest to them.

You should read up on the whole Beacon philosophy and see its current implementation at their site. From what I gather, it is a minimal way to share genome information, without incurring privacy and consent barriers that might be hit if you were pulling down a whole genome. You can query any site that implements a Beacon to ask: do you have a variation at this position? And the Beacons can respond with “yes” or “no”. If there are useful variations, you can then pursue them from there, and if you need access to more you can go through the channels then. But at least you’ve possibly found some needles in some haystacks that you might not have known about otherwise.
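The yes/no idea is simple enough to sketch in a few lines. This is a toy in-memory model, assuming a Beacon only needs to answer whether it holds a given allele at a given position; the class name, method name, and the variant coordinates are invented for illustration and are not the actual GA4GH Beacon interface.

```python
class ToyBeacon:
    """Answers only 'do you have this allele at this position?' and nothing more."""

    def __init__(self, name, variants):
        self.name = name
        # Each variant is a (chromosome, position, allele) tuple.
        self._variants = {(c, p, a.upper()) for c, p, a in variants}

    def has_variant(self, chromosome, position, allele):
        # The response is a bare yes or no; no genotypes or sample details leak out.
        return (chromosome, position, allele.upper()) in self._variants


# Made-up data for a hypothetical site.
beacon = ToyBeacon("example-lab", [("17", 41244000, "T"), ("2", 1000, "g")])

print(beacon.has_variant("17", 41244000, "T"))
print(beacon.has_variant("17", 41244000, "C"))
```

Everything else, such as who carries the variant or what else is in their genome, stays behind the site’s own access channels.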

The Beacon team has done a short video explaining this. It has no audio, just explanatory text with the graphics. Marc Fiume gave me permission to embed it here.

The “Beacon of Beacons” aggregates the query to send it out to all the known Beacons. You can use it today to search for this kind of data. The video also notes that you can cloak the name of the institution to protect patient privacy.
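As a rough sketch of the aggregation idea, the function below fans one question out to several Beacons and collects the yes/no answers, replacing the names of any sites that asked to be cloaked. The site names, the data, and the cloaking scheme (a generic numbered label) are all assumptions of mine for illustration; the real service may handle this quite differently.

```python
def query_all(beacons, chromosome, position, allele, cloaked=()):
    """Ask every beacon the same question and return {label: True/False}.

    beacons: dict mapping a site name to a set of (chromosome, position, allele)
    cloaked: names of sites whose identity should be hidden in the results
    """
    results = {}
    for index, (name, variants) in enumerate(beacons.items(), start=1):
        label = f"beacon-{index}" if name in cloaked else name
        results[label] = (chromosome, position, allele) in variants
    return results


# Hypothetical sites with made-up variants.
known_beacons = {
    "university-a": {("17", 41244000, "T")},
    "hospital-b": {("17", 41244000, "T"), ("2", 1000, "G")},
    "institute-c": set(),
}

answers = query_all(known_beacons, "17", 41244000, "T", cloaked={"hospital-b"})
print(answers)
```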

I have been more acutely concerned about genomic privacy issues than some of my cohorts in this arena. And I fully accept that there will not be privacy–what I want is protection from misuse of the information, which I find lacking in the US legal framework right now. That said, I think that Beacon is a nice work-around for that. If I had a variant of concern, I could ping these other sites to see if others had it. Or vice-versa. But the framework under which the donor of that material provided the data would not be pierced. This makes total sense to me, and I can accept this strategy.

Sharing the genomic data from sequenced individuals is going to be tricky and complex. But I’m keen to see the GA4GH group tackle it. I like several of the directions that I’ve seen so far. But right now–check out Beacon. Implement one if you have this kind of data, and let’s see if it works.

Quick links:

Global Alliance for Genomics and Health: http://genomicsandhealth.org/

Beacon (project details page): http://ga4gh.org/#/beacon

Beacon of Beacons (where you would do a search): http://ga4gh.org/#/beacon/bob

References:

Nguyen N., Glenn Hickey, Daniel R. Zerbino, Brian Raney, Dent Earl, Joel Armstrong, W. James Kent, David Haussler & Benedict Paten (2015). Building a Pan-Genome Reference for a Population, Journal of Computational Biology, 150107093755006. DOI: http://dx.doi.org/10.1089/cmb.2014.0146

During David Haussler’s talk, he also referenced these papers:

Video Tip of the Week: CRISPRdirect for editing tools and off-target information

Great RCSB PDB molecule-of-the-month page on CRISPR


Genome editing strategies are certainly a hot topic of late. We were astonished at the traffic that the animation of the CRISPR/Cas-9 process recently drew to the blog. There’s a huge amount of potential for novel types of studies and interventions in human disease situations–but I’m already seeing applications in agriculture coming along. There’s an edited canola available in Canada already. China has edited wheat for disease resistance. There’s a project underway to remove horns from cattle–by merely snipping out a bit of sequence with TALENs/ZFN strategies. They’ve already created cattle with edited myostatin too.

To accompany this work, new software tools have been developed to help design target sequences and evaluate potential off-target situations. There are tools for TALEN targets and tools for CRISPR targets. For this post I’ll be focusing on just one of the CRISPR tools, but I’ll list a few others as well. Some sites have incorporated both options in their software tools. Some will have a small range of species, some have larger sets. So part of choosing a tool is asking about the genomes it supports. In future Tips we may explore some of the others. There is something of a flood of these tools coming along, and I’ll continue to explore them.

This week’s focus is CRISPRdirect. A Japanese group has created this tool for generating a guide sequence and for evaluating potential off-target activity. This introductory video (with music, and with English annotations to convey the features) will give you an overview of the functions.
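To show the basic idea behind guide design, here is a toy sketch. It assumes the commonly used pattern of a 20-nucleotide target followed by an NGG PAM, scans only the forward strand, and compares sequences by simple mismatch counting. It is not CRISPRdirect’s algorithm, and a real off-target search runs against a whole genome rather than a short string.

```python
import re

# 20-nt target followed by an NGG PAM; the lookahead lets overlapping sites be found.
PAM_SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_guide_candidates(dna):
    """Return (start, target, pam) tuples for forward-strand 20-nt + NGG sites."""
    dna = dna.upper()
    return [(m.start(), m.group(1), m.group(2)) for m in PAM_SITE.finditer(dna)]


def count_mismatches(first, second):
    """Count positions at which two equal-length sequences differ."""
    if len(first) != len(second):
        raise ValueError("sequences must be the same length")
    return sum(a != b for a, b in zip(first, second))


# Made-up sequence for illustration.
example = "GATTACAGATTACAGATTACAGGTT"
candidates = find_guide_candidates(example)
for start, target, pam in candidates:
    print(start, target, pam)
```

A candidate target that has close matches elsewhere in the genome (few mismatches, next to a PAM) is the kind of potential off-target site these tools flag.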

It seems to be an easy-to-use interface, with effective organization of the results. They have a nice range of species to examine–not only some of the mammalian genomes, but fish, chicken, worm, plants, and yeast too. There’s a graphical viewing component and an easy export option as well.

So I’ve come across a few tools in my search, but if you have favorites please feel free to add them below in the comments. I’m going to continue to look into these tools and will be looking to highlight others in the future.

Quick link:

CRISPRdirect: http://crispr.dbcls.jp/

A few links to other tools I’ve been looking at:

E-TALEN: http://www.e-talen.org/E-TALEN/

E-CRISP: http://www.e-crisp.org/E-CRISP/

TAL Effector Nucleotide Targeter 2.0: https://tale-nt.cac.cornell.edu/

Prognos: http://baolab.bme.gatech.edu/Research/BioinformaticTools/prognos.html

ZiFiT Targeter software (TALEN/ZFN/CRISPR support): http://zifit.partners.org/ZiFiT/

COSMID: https://crispr.bme.gatech.edu/

CRISPY (specific for CHO cells): http://staff.biosustain.dtu.dk/laeb/crispy/

Reference:

Naito Y., K. Hino, H. Bono & K. Ui-Tei (2014). CRISPRdirect: software for designing CRISPR/Cas guide RNA with reduced off-target sites, Bioinformatics, DOI: http://dx.doi.org/10.1093/bioinformatics/btu743

Video Tip of the Week: RStudio as an interface for using R

Although typically we focus on databases and algorithms in use in bioinformatics and genomics, there are some other tools that support this work that are crucial as well. The statistical software and computing tools associated with R fall into this category. Increasingly RStudio is being adopted by folks in genomics, and although we talked about R in the past, I hadn’t highlighted the RStudio interface before. But it has really lowered the barrier to entry and changed the way to use R effectively, so it’s time to include it in our Video Tips of the Week.

In a previous tip we highlighted some training on R that was delivered in a webinar, by Heather Merk of Ohio State. So if you need an overall Introduction to R Statistical Software, that’s a good place to start. When you are ready to begin to work with R, though, you should consider trying out RStudio.

This overview video will demonstrate the basics of the interface for RStudio.

RStudio Overview – 1:30 from RStudio, Inc. on Vimeo.

They provide more detail on many of the features of RStudio, and their Vimeo channel has a few more videos. Another thing about using RStudio is that there are increasingly additional types of support coming from that front. A popular tip we did was on Slidify to make slides directly from RStudio.

RStudio is not just for genomics, though–it’s widely used in many fields that engage in statistical analysis. I was surprised to not find a lot of references to it in PubMed yet–some guidance and explainers in biotech, but I know it’s being widely used. You can see a lot of examples in use in Google Scholar. This includes several enthusiastic uses of RStudio in teaching situations: An Attractive Template of a Reproducible Data Analysis Document for an Awesome Class Project; and Teaching precursors to data science in introductory and second courses in statistics. I did find reference to a software review in an economics publication. And you can get a book to help if that’s how you like to learn.

But if you haven’t had a chance to check out RStudio yet, I’d recommend it.

Quick links:

RStudio: http://www.rstudio.com/

R: http://www.r-project.org/

RSeek: an R-specific search engine http://www.rseek.org (hat tip Elana Fertig’s handy intro slide deck)

References:

Gandrud, Christopher. Reproducible Research with R and R Studio. CRC Press, 2013.

Racine J.S. (2011). RStudio: A Platform-Independent IDE for R and Sweave, Journal of Applied Econometrics, 27 (1) 167-172. DOI: http://dx.doi.org/10.1002/jae.1278

Fertig, E. (2012) Getting Started in R.