Category Archives: Tip of the Week

Video Tip of the Week: Beacon, to locate genome variants of potential clinical significance

This week’s Video Tip of the Week follows on last week’s chatter about the Internet of DNA. As I mentioned then, the Beacon tool we touched on was going to get more coverage. So this week’s video is provided by the Beacon team, part of the larger Global Alliance for Genomics and Health project (GA4GH).

I’ve touched on some of the GA4GH work in the past. I heard more about a very interesting piece of it from David Haussler at the recent TRICON meeting.

D. Haussler, slide from TRICON talk.

The talk was called “Stable Reference Structures for Human Genome Analysis” and it was important for me to see this. I’ve been wrestling with some of the literature (linked below) that describes ways to represent genome variations among massive numbers of humans. It really helped me to hear it described and shown as cartoons on slides that were less like equations. And how this will play out in graphs and visualizations with software tools is of particular interest to me.

So one branch of the Data Working Group of the GA4GH is tasked with working out how to represent the variations as multiple paths through a graph, instead of the one linear reference genome we think of today. It has to accommodate many types of variations–inversions, deletions, duplications, as well as just SNPs. So, as the kids say today, it’s complicated. But we have to figure it out. Stay tuned, I’m sure we’ll be talking more about this in the years to come.

Beacon is like SETI for genome variations.

Another branch of this project is tasked with trying to figure out how to share genomic data among all the international producers of this data. If we can’t share the data, we won’t be able to look at the variations among humans and learn from them, never mind display them. This has additional layers of social and legal complexity we are just beginning to face. As a first pass at sharing this data, a “Beacon” system has been implemented to help researchers locate variations of interest to them.

You should read up on the whole Beacon philosophy and see its current implementation at their site. From what I gather, it is a minimal way to share genome information, without incurring privacy and consent barriers that might be hit if you were pulling down a whole genome. You can query any site that implements a Beacon to ask: do you have a variation at this position? And the Beacons can respond with “yes” or “no”. If there are useful variations, you can then pursue them from there, and if you need access to more you can go through the channels then. But at least you’ve possibly found some needles in some haystacks that you might not have known about otherwise.
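To make the yes/no idea concrete, here is a minimal sketch in Python. It is not the GA4GH Beacon API: the `Beacon` class, the `has_variant` method, and the coordinates are all hypothetical, and the "site" is just an in-memory set. The point is only that the caller learns a single boolean and never sees the underlying genomes.

```python
from dataclasses import dataclass, field


@dataclass
class Beacon:
    """Hypothetical in-memory stand-in for one site's Beacon (an assumption, not the real API)."""
    name: str
    # Each known variant is stored as (chromosome, position, alternate allele).
    variants: set = field(default_factory=set)

    def has_variant(self, chrom: str, pos: int, alt: str) -> bool:
        # The only thing that leaves the site is True or False.
        return (chrom, pos, alt.upper()) in self.variants


# Made-up example data: one variant on chromosome 9.
site = Beacon("example-site", {("9", 5073770, "T")})

print(site.has_variant("9", 5073770, "T"))  # True
print(site.has_variant("9", 5073770, "A"))  # False
```

A real Beacon would sit behind a web service and its own access rules, but the shape of the exchange is the same: a position and an allele go in, and "yes" or "no" comes out.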

The Beacon team has done a short video explaining this. It has no audio, just explanatory text with the graphics. Marc Fiume gave me permission to embed it here.

The “Beacon of Beacons” aggregates the query to send it out to all the known Beacons. You can use it today to search for this kind of data. The video also notes that you can cloak the name of the institution to protect patient privacy.
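As a rough illustration of the aggregation step, the sketch below fans one question out to several beacons and collects the yes/no answers, replacing the names of cloaked sites with anonymous labels. Everything here is an assumption made for illustration (`make_beacon`, `query_all`, the site names, the coordinates); it does not reflect how the actual Beacon of Beacons is implemented.

```python
from typing import Callable, Dict, Iterable, Tuple

# A beacon here is any function answering "do you have this allele at this position?"
BeaconFn = Callable[[str, int, str], bool]


def make_beacon(variants: Iterable[Tuple[str, int, str]]) -> BeaconFn:
    """Build a hypothetical beacon from a list of (chrom, pos, alt) tuples."""
    known = set(variants)

    def answer(chrom: str, pos: int, alt: str) -> bool:
        return (chrom, pos, alt) in known

    return answer


def query_all(beacons: Dict[str, BeaconFn], chrom: str, pos: int, alt: str,
              cloaked: frozenset = frozenset()) -> Dict[str, bool]:
    """Send the same query to every beacon; hide the names of cloaked sites."""
    results = {}
    anonymous = 0
    for name, beacon in beacons.items():
        if name in cloaked:
            anonymous += 1
            label = f"anonymous-{anonymous}"
        else:
            label = name
        results[label] = beacon(chrom, pos, alt)
    return results


# Made-up sites and variants.
beacons = {
    "site-a": make_beacon([("9", 5073770, "T")]),
    "site-b": make_beacon([]),
    "site-c": make_beacon([("9", 5073770, "T"), ("1", 100, "G")]),
}

answers = query_all(beacons, "9", 5073770, "T", cloaked=frozenset({"site-c"}))
print(answers)  # {'site-a': True, 'site-b': False, 'anonymous-1': True}
```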

I have been more acutely concerned about genomic privacy issues than some of my cohorts in this arena. And I fully accept that there will not be privacy–what I want is protection from misuse of the information, which I find lacking in the US legal framework right now. That said, I think that Beacon is a nice work-around for that. If I had a variant of concern, I could ping these other sites to see if others had it. Or vice-versa. But the framework under which the donor of that material provided the data would not be pierced. This makes total sense to me, and I can accept this strategy.

Sharing the genomic data from sequenced individuals is going to be tricky and complex. But I’m keen to see the GA4GH group tackle it. I like several of the directions that I’ve seen so far. But right now–check out Beacon. Implement one if you have this kind of data, and let’s see if it works.

Quick links:

Global Alliance for Genomics and Health: http://genomicsandhealth.org/

Beacon (project details page): http://ga4gh.org/#/beacon

Beacon of Beacons (where you would do a search): http://ga4gh.org/#/beacon/bob

References:

Nguyen N., Glenn Hickey, Daniel R. Zerbino, Brian Raney, Dent Earl, Joel Armstrong, W. James Kent, David Haussler & Benedict Paten (2015). Building a Pan-Genome Reference for a Population, Journal of Computational Biology, 150107093755006. DOI: http://dx.doi.org/10.1089/cmb.2014.0146

During David Haussler’s talk, he also referenced these papers:

Video Tip of the Week: CRISPRdirect for editing tools and off-target information

Great RCSB PDB molecule-of-the-month page on CRISPR

Genome editing strategies are certainly a hot topic of late. We were astonished at the traffic that the animation of the CRISPR/Cas9 process recently drew to the blog. There’s a huge amount of potential for novel types of studies and interventions in human disease situations–but I’m already seeing applications in agriculture coming along. There’s an edited canola available in Canada already. China has edited wheat for disease resistance. There’s a project underway to remove horns from cattle–by merely snipping out a bit of sequence with TALEN/ZFN strategies. They’ve already created cattle with edited myostatin too.

To accompany this work, new software tools have been developed to help design target sequences and evaluate potential off-target situations. Tools exist for both TALEN and CRISPR targets. For this post I’ll be focusing on just one of the CRISPR tools, but I’ll list a few others as well. Some sites have incorporated both options in their software tools. Some cover a small range of species, and some have larger sets, so part of choosing a tool is asking about the genomes it supports. There is something of a flood of these tools coming along, and in future Tips we may explore some of the others.

This week’s focus is CRISPRdirect. A Japanese group has created this tool for generating a guide sequence and for evaluating potential off-target activity. This introductory video (with music, and with English annotations to convey the features) will give you an overview of the functions.

It seems to be an easy-to-use interface, with effective organization of the results. They have a nice range of species to examine–not only some of the mammalian genomes, but fish, chicken, worm, plants, and yeast too. There’s a graphical viewing component and an easy export option as well.
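For a feel of what a guide-design tool has to do, here is a toy sketch in Python. It is not CRISPRdirect’s algorithm. It assumes the common SpCas9 case (a 20-nt target immediately followed by an NGG PAM), scans only the forward strand, and counts exact matches in a background string as a very crude off-target check. The function names and sequences are made up for illustration.

```python
import re


def find_guides(seq: str, guide_len: int = 20):
    """Return (start, target, PAM) for each guide_len-mer followed by an NGG PAM.

    Forward strand only; a real tool would also scan the reverse complement.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def count_exact_hits(guide: str, background: str) -> int:
    """Count exact occurrences of the guide in a background sequence."""
    guide, background = guide.upper(), background.upper()
    n = len(guide)
    return sum(1 for i in range(len(background) - n + 1)
               if background[i:i + n] == guide)


# Made-up target: one 20-mer followed by the PAM "TGG".
target = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
guides = find_guides(target)
print(guides)  # [(0, 'ACGTACGTACGTACGTACGT', 'TGG')]

# Made-up "genome" containing the same 20-mer twice.
genome = "TTTT" + "ACGTACGTACGTACGTACGT" + "CCCC" + "ACGTACGTACGTACGTACGT"
print(count_exact_hits(guides[0][1], genome))  # 2
```

A guide that turns up more than once in the genome is a poor choice, which is why tools like this report off-target candidates alongside each design. Real tools also consider mismatched near-matches, which this sketch ignores.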

So I’ve come across a few tools in my search, but if you have favorites please feel free to add them below in the comments. I’m going to continue to look into these tools and will be looking to highlight others in the future.

Quick link:

CRISPRdirect: http://crispr.dbcls.jp/

A few links to other tools I’ve been looking at:

E-TALEN: http://www.e-talen.org/E-TALEN/

E-CRISP: http://www.e-crisp.org/E-CRISP/

TAL Effector Nucleotide Targeter 2.0: https://tale-nt.cac.cornell.edu/

Prognos: http://baolab.bme.gatech.edu/Research/BioinformaticTools/prognos.html

ZiFiT Targeter software (TALEN/ZFN/CRISPR support): http://zifit.partners.org/ZiFiT/

COSMID: https://crispr.bme.gatech.edu/

CRISPY (specific for CHO cells): http://staff.biosustain.dtu.dk/laeb/crispy/

Reference:

Naito Y., K. Hino, H. Bono & K. Ui-Tei (2014). CRISPRdirect: software for designing CRISPR/Cas guide RNA with reduced off-target sites, Bioinformatics, DOI: http://dx.doi.org/10.1093/bioinformatics/btu743

Video Tip of the Week: RStudio as an interface for using R

Although typically we focus on databases and algorithms in use in bioinformatics and genomics, there are some other tools that support this work that are crucial as well. The statistical software and computing tools associated with R fall into this category. Increasingly RStudio is being adopted by folks in genomics, and although we talked about R in the past, I hadn’t highlighted the RStudio interface before. But this really lowered the barrier to entry, and has changed the way to use R effectively, and it’s time to include this in our Video Tips of the Week.

In a previous tip we highlighted some training on R that was delivered in a webinar, by Heather Merk of Ohio State. So if you need an overall Introduction to R Statistical Software, that’s a good place to start. When you are ready to begin to work with R, though, you should consider trying out RStudio.

This overview video will demonstrate the basics of the interface for RStudio.

RStudio Overview – 1:30 from RStudio, Inc. on Vimeo.

They provide more detail on many of the features of RStudio, and their Vimeo channel has a few more videos. Another thing about using RStudio is that more and more types of support are coming from that front. A popular tip we did was on Slidify, to make slides directly from RStudio.

RStudio is not just for genomics, though–it’s widely used in many fields that engage in statistical analysis. I was surprised to not find a lot of references to it in PubMed yet–some guidance and explainers in biotech, but I know it’s being widely used. You can see a lot of examples in use in Google Scholar. This includes several enthusiastic uses of RStudio in teaching situations: An Attractive Template of a Reproducible Data Analysis Document for an Awesome Class Project; and Teaching precursors to data science in introductory and second courses in statistics. I did find reference to a software review in an economics publication. And you can get a book to help, if that’s how you like to learn.

But if you haven’t had a chance to check out RStudio yet, I’d recommend it.

Quick links:

RStudio: http://www.rstudio.com/

R: http://www.r-project.org/

RSeek: an R-specific search engine http://www.rseek.org (hat tip Elana Fertig’s handy intro slide deck)

References:

Gandrud, Christopher. Reproducible Research with R and R Studio. CRC Press, 2013.

Racine J.S. (2011). RStudio: A Platform-Independent IDE for R and Sweave, Journal of Applied Econometrics, 27 (1) 167-172. DOI: http://dx.doi.org/10.1002/jae.1278

Fertig, E. (2012) Getting Started in R.

Video Tip of the Week: IntOGen, for Integrative OncoGenomics

When I’m looking for upcoming Tips of the Week, frequently one tool or paper will lead me to looking at other related tools in that sphere. Last week’s tip on COSMIC got me looking through cancer genomics resources, and one of the others that I came across is IntOGen, “Integrative Onco Genomics”.

Their Nature Methods paper has a nice summary of their goals:

The IntOGen-mutations platform (http://www.intogen.org/mutations/) summarizes somatic mutations, genes and pathways involved in tumorigenesis. It identifies and visualizes cancer drivers, analyzing 4,623 exomes from 13 cancer sites. It provides support to cancer researchers, aids the identification of drivers across tumor cohorts and helps rank mutations for better clinical decision-making.

The paper has a nice description of the features that they have incorporated into their interface. They have different mutation-calling workflows that are run, and then scores are derived from them. There is a searchable component–where data from projects like TCGA and ICGC have been assessed with these tools and you can explore the outcomes with their browser function. In this video from their team you can get a sense of how to search and examine the data [no audio, annotated with text for guidance]:

You can also upload samples that you have to examine and compare in their Mutation Analysis component. There are separate additional videos that they offer to help you to perform this type of analysis at their site. Here’s one of them:

I had actually been at their site recently looking at a blog post about mutation diagrams: How to generate mutation distribution and frequency plots? People who read our blog seem to really gravitate towards tools that help them to visualize various features of the research they are doing. Domain drawing, motif representation, and sets are popular. Recently one of the posts I did highlighting a domain and mutation mapper at the cBio Portal got some traction. According to the blog post, the team from IntOGen decided to create a helpful similar tool that they can incorporate in an upcoming release of IntOGen. So it’s great to see new ideas and new tools coming along as well.

The tools they offer look very nice, check them out. And I’m trying not to be jealous of their location and photos as I continue to shovel snow….

Quick links:

IntOGen: http://www.intogen.org

Reference:
Gonzalez-Perez A., Christian Perez-Llamas, Jordi Deu-Pons, David Tamborero, Michael P Schroeder, Alba Jene-Sanz, Alberto Santos & Nuria Lopez-Bigas (2013). IntOGen-mutations identifies cancer drivers across tumor types, Nature Methods, 10 (11) 1081-1082. DOI: http://dx.doi.org/10.1038/nmeth.2642

Video Tip of the Week: COSMIC, Catalogue Of Somatic Mutations In Cancer

When we do workshops at medical centers, one of the most common questions I get is about locating good resources for cancer data. And we’ve talked about some of the large projects, like the ICGC. We’ve talked about ways to stratify data sets, and one example of this was in cancer, using data from The Cancer Genome Atlas.  Going forward, the ability to rapidly sequence normal vs tumor pairs should help us to even more rapidly understand and target tumors. And this will lead to other cases of entirely new leads in some situations.

But one of the really solid tools that I like to be sure to highlight for people is the COSMIC collection. It’s not new–it’s been around for a decade now. But it’s one of those types of core data resources that people really need to know about. Their long experience, their high quality curation, and their adaptations to new influxes of data volumes and data types, make them a really valuable source of information.

Reading their update paper in the 2015 NAR Database issue, I wanted to go over and refresh my memory of the features I knew, and explore some of the newer features too. There really is some serious depth over there, and I can’t touch on all of the aspects that they have in a blog post like this. But I also discovered that they’ve recently provided a number of videos to help people learn about the various tools and options.

For this week’s Video Tip of the Week, I’ll include their “overview” piece. But you should check out their Tutorials page for additional topics as well.

One feature that I hadn’t realized they offer is a Genome Browser using the JBrowse framework. There’s a separate video with some guidance on how to use that.

Their future directions section in the paper makes it clear they are preparing to be able to handle the incoming data on this topic. And they are evaluating new tools and analyses that may be appropriate. But they commit to maintaining their strong emphasis on curation–which is music to my ears. I think quality hand curation is simultaneously undervalued by end users (and sadly by funders), while being entirely critical to handling all the big data that’s coming. So get familiar with COSMIC for cancer genomics data. It will be worth your time.

Quick link:

COSMIC: http://cancer.sanger.ac.uk/

Reference:

Forbes S.A., D. Beare, P. Gunasekaran, K. Leung, N. Bindal, H. Boutselakis, M. Ding, S. Bamford, C. Cole, S. Ward, C. Y. Kok et al. (2014). COSMIC: exploring the world’s knowledge of somatic mutations in human cancer, Nucleic Acids Research, 43 (D1) D805-D811. DOI: http://dx.doi.org/10.1093/nar/gku1075

Video Tip of the Week: Helium plant pedigree software, because “Plants are weird.”

A lot of people find our blog by searching for “pedigree” tools. We’ve covered them in the past, and we’ve got some training on the Madeline 2.0 web tools that we like. We have groused about the fact that some pedigree tools do not accommodate same-sex families. There are a variety of options, largely focused on human relationships.

Another branch of this type of software is animal colony management software. This can be used to track animals in breeding situations. We’ve highlighted The Jackson Lab’s Mouse Colony Management Software, and we see a lot of people going over to take a look. But there are other types of breeding software out there too.

Plant pedigrees are a special challenge, though. Although I did begin to look into that software at one point, I hadn’t looked again for a while. So when I saw the announcement about an upcoming talk at the  Bio-IT World conference, I thought it was time to look again. Helium was new to me, and I admit I laughed out loud at my first introduction to it:

BioVis 2013: Poster: Evaluation of Helium: Visualization of Large Scale Plant Pedigrees from VGTCommunity on Vimeo.

“Ok, so, plants are weird….” Best poster intro I’ve heard.

But really, the potential complexity of plant breeding pedigrees is much more daunting than even tricky human pedigrees. Their paper on the Helium efforts (linked below) describes some of those aspects in more detail:

Firstly, the named entities in plant pedigrees may, but not always, represent a population of genetically identical individuals, not a single plant. While it is relatively simple to grow many plants from seed, potentially many decades after production, in humans and animals this is understandably not the norm. The generation of these genetically identical (homozygous) varieties is possible through doubled haploidy, inbreeding, or crossing of pairs of inbred lines to achieve what is termed an F1 hybrid. Successive inbreeding by self-pollination of these F1 generation plants leads to individual plants that are close to homozygous across all alleles.
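The “close to homozygous” point in that passage follows from simple Mendelian arithmetic: under idealized assumptions (one locus, no selection, strict self-pollination), a heterozygous plant passes heterozygosity at that locus to half of its selfed offspring on average, so the expected heterozygous fraction halves every generation. A small sketch of that arithmetic, with a hypothetical function name:

```python
def expected_heterozygosity(generations: int, start: float = 1.0) -> float:
    """Expected fraction of heterozygous loci after repeated selfing.

    Idealized assumption: each generation of self-pollination halves
    the heterozygous fraction (no selection, independent loci).
    """
    return start * 0.5 ** generations


# Generation 0 is the F1 hybrid of two inbred lines, heterozygous at every
# locus where the parents differ.
for n in range(0, 9, 2):
    print(n, expected_heterozygosity(n))
# 0 1.0
# 2 0.25
# 4 0.0625
# 6 0.015625
# 8 0.00390625
```

After six to eight generations of selfing, less than 2% of those loci are still expected to be heterozygous, which is why a named variety can stand for a whole population of near-identical plants.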

There are no standards for plant pedigrees yet, I learned from this paper. Zoiks! Well, I guess that gives them free rein to design something that users want. The folks on the Helium project got a bunch of potential users, asked them what they needed, what worked, what didn’t work, and they are building a nice looking tool with the specs they got. Their paper goes on to describe their paper prototyping, the feedback, and other interactions they got further downstream in the process. It’s a nice example of how to get some direction from the likely end users.

Another video offers a bit longer view on their software, but there’s no audio (below). The most detailed video is the one attached to their paper in the supplemental files, but I can’t embed it. Go over there to download and watch that, with captions about what’s happening.

I wasn’t able to find any downloadable software yet to kick the tires myself. And because of the blizzard I’m worried I won’t have power for the next few days to check it out. But from what I can see and read in the paper, it looks promising and I’m eager to try it out at some point. Looking forward to Jessie Kennedy’s talk.

Quick link:

Helium project page: http://ics.hutton.ac.uk/helium/

Best intro video version, with explanation captions: http://www.biomedcentral.com/1471-2105/15/259/additional

This is the item that caught my eye, via email. I’m going to be at Bio-IT World, so I’m hoping to be able to see this presented live.

Dr. Jessie Kennedy to Deliver Keynote Presentation on Visualization Tools Designed for Biologists at 2015 Bio-IT World Conference, as part of the Data Visualization and Exploration Tools Track.

Keynote Presentation: Pedigree Visualization in Genomics
Jessie Kennedy, Ph.D., Professor & Director, Institute for Informatics and Digital Innovation, Edinburgh Napier University

Most visualizations that display pedigree structure for genetic research have been designed to deal with human family trees. Animal and plant breeders study the inheritance of genetic markers in pedigrees to identify regions of the genome that contain genes controlling traits of economic benefit and, ultimately, to improve the quality of animal and plant breeding programs. However, due to the size and nature of plant and animal pedigree structures, human pedigree visualizations tools are unsuitable for use in studying animal and plant genotype data. We discuss two visualization tools, VIPER (designed for cleaning genotyping errors in animal pedigree genotype datasets) and Helium (designed to visualize the transmission of alleles encoding traits and characteristics of agricultural importance in a plant pedigree-based framework), and show how they support the work of biologists.

Reference:

Shaw P.D., Martin Graham, Jessie Kennedy, Iain Milne & David F Marshall (2014). Helium: visualization of large scale plant pedigrees, BMC Bioinformatics, 15 (1) 259. DOI: http://dx.doi.org/10.1186/1471-2105-15-259

Note: OpenHelix is a part of Cambridge Healthtech Institute.

Video Tip of the Week: GWATCH, for flying over chromosomes

Ok, so it’s not *just* for flying over chromosomes. There’s more to it, of course. But that’s the part of GWATCH (Genome-Wide Association Tracks Chromosome Highway) that caught my attention. I’m always looking for different ideas and strategies to visualize data, and this was the first time I drove along the whole length of a human Chromosome 9 highway, seeing the various SNPs along the way.

A post on Google+ pointed me to the GWATCH paper and software, so hat tip to Taras Oleksyk. And I was pleased to see that they’ve done a video explaining their project and demonstrating the software, so that will be this week’s Tip of the Week.

It’s not the first time I’ve seen a 3D representation of SNPs. I remember seeing that from GeneSNPs in the past. But the GeneSNPs visual option was a way to look at the features within a single gene–you could see introns, exons, and choose to view SNPs by features like “non-synonymous”, and you could examine the frequency. It was an interesting way to combine a lot of data, but captured only one limited region. GWATCH goes much wider than that, letting you scan along whole chromosomes for patterns. That said–it would be very cool to have those features, and maybe a pointer to possible promoter regions, along the roadway as well. At first I didn’t notice the gene symbol track–er sidewalk?–along the edge of the view. But it seemed to me you could add more sidewalk, a bike lane….Of course, then I want to add a domain bypass….Anyway–it’s got me thinking about ways to explore.

And I’ve focused on that unusual “moving browser” for this post, but there’s more to the tool than that. There are other ways to slice the data in 2D that can be helpful for your analyses. And it’s not limited to GWAS data either. You can see more about that in both the video and their paper. So explore GWATCH more from their site, where you can load up their sample data and take it for a spin. Go to the site and click on “Active Datasets” to see the ones they’ve provided. Open one, then click on the “Highway Chromosome Browser” to select a chromosome. You can also see the other types of tools they have from there.

Quick links:

GWATCH: http://gen-watch.org/ for taking it for a spin

Reference:

Svitin A., Sergey Malov, Nikolay Cherkasov, Paul Geerts, Mikhail Rotkevich, Pavel Dobrynin, Andrey Shevchenko, Li Guan, Jennifer Troyer, Sher Hendrickson, Holli Dilks et al. (2014). GWATCH: a web platform for automated gene association discovery analysis, GigaScience, 3 (1) 18. DOI: http://dx.doi.org/10.1186/2047-217x-3-18

Video Tip of the Week: Genome assemblers and #Docker

Last fall there was a tip I did on Docker, which was starting to pick up a lot of chatter around the genoscenti. It was starting to look like a good solution for some of the problems of reproducibility and re-use of software in genomics–containerize it. Box it up, hand it off. There’s certainly a lot of interest and appeal in the community, but there are still some issues to resolve with rolling out Docker everywhere. However, my impression is that the Docker team and community seems interested and active in evolving the tools to be as broadly useful as possible.

So when this tweet rolled through the #bioinformatics twitter column on my Tweetdeck, I was excited to see this talk by Michael Barton (who has the best twitter handle in the field: @bioinformatics). It’s a terrific example of how Docker can be aimed at some of the problems in the bioinformatics tool space. It’s not the only option, of course. Some workflow resources like Galaxy can cover other features of genomics researchers’ needs. But as a general solution to the problems of comparing software and distributing complete working containers, Docker seems to be developing into a very useful strategy.

Here’s the video:

Although this is longer than our typical “tips”, I’d recommend that you carve out some time to watch if you are new to the idea of Docker. In case you don’t have time right now for the talk, here’s a summary. For the first 10 minutes, there’s a gentle introduction for non-genomics nerds about what sequencing is like right now. Then Michael describes how the assembler literature works–with competing claims about the “better” assembler as each new paper comes along. This includes a sample of the types of problems that assemblers are trying to tackle with different strategies.

Around 14min, we begin to look at what it’s like to be the researcher who needs to access some assembler software. Then he describes how different lab groups–like remote islands–can instantly ship their sequence data around today. But that biologists are like “longshoremen for data”: they have to unload, unpack, install, try to get all the right pieces together to make it work in a new lab. We are doing “break bulk” science right now. That was a really terrific assessment of the state of play, I thought.

If you are ok with the other pieces, you can skip to around 16min, where we get to a specific example of the benefits of Docker for this type of research. Michael goes on to describe how Docker has helped him to build a system to catalog and evaluate various assemblers: a project called nucleotid.es (pronounced just as “nucleotides”). It offers details about various assemblers, which have been put into containers that are easy to access and to use to compare different software. There are examples of benchmarks, but you can also use these containers for your own assembly purposes. You can explore the site for more detail and a lot of data on the assembler comparisons that they have already. A good overview of the reasons to do this can also be found in the blog post over there: Why use containers for scientific software?

At about 25min, some of the constraints and problems are noted. Fitting Docker into existing infrastructure, and incentivising developers to create Docker containers, can be issues. But the outcomes–having a better strategy than traditional publication for reproducibility, having ongoing access to the software, and the “deduplication of agony”–seem to be worth investigating, for sure. Then Barton describes what the pipeline could look like for a researcher with some new sequence–you can use the data from a variety of assemblers to make decisions about how to proceed, rather than sifting through papers or just using what the lab next door did. And if you have a new assembler, you can use this setup to benchmark it as well.

So if you’ve been hearing about Docker, and have been concerned about access and reproducibility issues around genomics data and software, have a look at this video. It nicely presents the problems we face, and one possible solution, with a concrete example. There may be other useful methods as well–like offering a central portal for users to access multiple tools, as AutoAssemblyD has described–but that’s really for a different subset of users. For the more general problem of software comparisons, benchmarking, and access to bioinformatics tools, Docker seems to offer a useful strategy. I also did a quick PubMed check to see if Docker is percolating through the traditional publication system yet, and found that it is. I found that ballaxy (“a Galaxy-based workflow toolkit for structural bioinformatics”) is offered as a Docker image, which means that having a grasp of Docker going forward may really be useful for software users rather quickly….

Quick links:

nucleotid.es: http://nucleotid.es

Docker: http://www.docker.com

References (and in this case the slide deck):



And other useful and related items from this post:

Automating the Selection Process for a Genome Assembler, JGI Science Highlights. October 17, 2014. http://jgi.doe.gov/automating-selection-process-genome-assembler/

Veras A., Pablo de Sá, Vasco Azevedo, Artur Silva & Rommel Ramos (2013). AutoAssemblyD: a graphical user interface system for several genome assemblers, Bioinformation, 9 (16) 840-841. DOI: http://dx.doi.org/10.6026/97320630009840

Hildebrandt A.K., D. Stockel, N. M. Fischer, L. de la Garza, J. Kruger, S. Nickels, M. Rottig, C. Scharfe, M. Schumann, P. Thiel, H.-P. Lenhof et al. (2014). ballaxy: web services for structural bioinformatics, Bioinformatics, 31 (1) 121-122. DOI: http://dx.doi.org/10.1093/bioinformatics/btu574

Video Tip of the Week: PhosphoSitePlus, protein post-translational modifications

Nucleotide sequence data and analysis command the bulk of my attention on most days. But certainly post-translational modification of proteins has a lot of influence on the ultimate function (or dysfunction, in some cases) of the genes in play in a given situation. A recent paper reminded me of a resource that I’ve known about for a long time, and I was pleased to take a fresh look at it: PhosphoSitePlus. Although it has phosphorylation in the name, it’s broader than that–hence the “Plus”, I imagine.

PhosphoSitePlus® (PSP) is a publicly-accessible web-based portal offering detailed information on post-translational modifications (PTM) of proteins. The PTMs include various types of modifications, not only phosphorylation, but also methylation, acetylation, glycosylation, caspase cleavage, and ubiquitination, among others. There’s a helpful summary on the landing page of the numbers and types of PTMs.

The information comes from high quality curation of literature (low-throughput, LTP) as well as from many high-throughput (HTP) studies. They have been building this resource over many years, and have been refining the collection over that time. In fact, they re-examined some of the older data they had and re-evaluated the quality to improve their collection. So it is actively being built and maintained.

They have a nice video tutorial, which is this week’s Video Tip of the Week. But it’s hosted on their site and I can’t locate an embed feature, so you will have to go over there to have a look. Here’s an image of it below. It has my favorite structure: overview, intro, advanced, and examples. I thought it was a helpful walk through the types of information you can get from their site. You can navigate to different sections with their menu, or you can close that menu out of the way as the material proceeds. In under 20 minutes you’ll have a great grasp of the features of the site.

In the new paper they reference some features that weren’t in their prior tutorial. Mutations and variations that affect modification sites are now included, with special emphasis, in their PTMVar collection. This component offers a look at missense mutations that can impact post-translational modification. This is particularly helpful as we get more sequence information from individuals, and we may come across some variants that affect PTM sites. The new paper provides details on the sources of this information, which include cancer resources and OMIM, as well as UniProt.

I also found their Motif Analysis Tool quite handy. Check out the options in the Downloads and Applications area on the homepage. The tool will let you enter your sequences to analyze and deliver a Sequence Logo if you would like one. Again, there are more details and nice examples of the logos in the paper. There’s also the option of downloading a Cytoscape plug-in.

So check out PhosphoSitePlus for knowledge about post-translational modifications on proteins you are interested in, and further details on motifs and pathways that are involved.

Quick links:

PhosphoSitePlus®: http://www.phosphosite.org/

PhosphoSitePlus® tutorial: http://www.phosphosite.org/staticTrainingTutorial.do

Reference:

Hornbeck P.V., B. Zhang, B. Murray, J. M. Kornhauser, V. Latham & E. Skrzypek (2014). PhosphoSitePlus, 2014: mutations, PTMs and recalibrations, Nucleic Acids Research, DOI: http://dx.doi.org/10.1093/nar/gku1267

Hornbeck P.V.,  J. M. Kornhauser, S. Tkachev, B. Zhang, E. Skrzypek, B. Murray, V. Latham & M. Sullivan (2011). PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse, Nucleic Acids Research, 40 (D1) D261-D270. DOI: http://dx.doi.org/10.1093/nar/gkr1122

Video Tips of the Week, Annual Review 2014 (part 2)

As you may know, we’ve been doing these video tips-of-the-week for seven years now. We have completed or collected around 350 little tidbit introductions to various resources through this past year, 2014. At first we had to do all of our own video intros, but as the movie technology became more accessible and more teams made their own, we were able to find a lot more that were done by the resource providers themselves. So we began to collect those as well. At the end of the year we’ve established a sort of holiday tradition: we are doing a summary post to collect them all. If you have missed any of them it’s a great way to have a quick look at what might be useful to your work.

You can see past years’ tips here: 2008 I, 2008 II, 2009 I, 2009 II, 2010 I, 2010 II, 2011 I, 2011 II, 2012 I, 2012 II, 2013 I, 2013 II, 2014 I.

July
July 2: NCBI Variation Viewer
July 9: Google Genomics, API and GAbrowse
July 16: VectorBase, for invertebrate vectors of human pathogens
July 23: Nowomics, set up alert feeds for new data
July 30: PhenDisco, “phenotype discoverer” for dbGap data

August
August 6: Biodalliance browser with HiSeq X-Ten data
August 13: EpiViz Genome Browsing (and more)
August 20: Immune Epitope DB (IEDB)
August 27: Phenoscape, captures phenotype data across taxa

September
September 3: NIH 3D Print Exchange
September 10: #Docker, shipping containers for software and data
September 17: GOLD, Genomes OnLine Database
September 24: StratomeX for genomic stratification of diseases

October
October 1: MEGA, Molecular Evolutionary Genetics Analysis
October 8: UCSC #Ebola Genome Portal
October 15: MedGen, GTR, and ClinVar
October 22: SeqMonk
October 29: PaleoBioDB, for your paleobiology searches

November
November 5: Genome Browser in a Box
November 12: UpSet about genomics Venn Diagrams?
November 19: GeneFriends
November 26: Thanksgiving week, light posting. One holiday genome (cranberries).

December
December 3: BioLayout Express3D for network visualizations
December 10: “Virtually Immune” computational immune system modeling
December 17: yEd Graph Editor for visualizing pathways and networks
December 24: Video Tips of the Week, Annual Review 2014 (part 1)
December 31: [this post]