Video tip of the week: VarSifter for identifying key sequence variations
Recently many of the bioinformatics tweeps I follow were excited about a tool called VarSifter.
I just had a chance to watch the video, and now I can see why they were impressed! Over the years, the questions people ask in our workshops have come in themed waves. For a while it was lists of genes and microarrays. Then it was known SNP variations. Then it became transcription factor binding sites. Lately it’s been: I have a giant set of sequence data that I need to process to find new variants that might impact genes. How do I do that? This video tip-of-the-week will help you understand how.
This video was part of a day of lectures at the NHGRI about how to deal with exome sequencing data: Next-Gen 101: Video Tutorial on Conducting Whole-Exome Sequencing Research. There is a whole series of video and slide material available from NHGRI’s page, and the one I’m highlighting here is number 3 on that list. Be sure to download the slides if you want to take notes and access the references and URLs that are key to the material.
Jamie Teer gives a terrific talk about dealing with the exome sequence data that next-gen projects are yielding. It starts with just managing and viewing the reads, and he highlights a couple of different ways to do this. These include SAMtools, and he also shows how the reads look in both the UCSC Genome Browser and the Broad’s Integrative Genomics Viewer, IGV. It’s nice to see a comparison of these to illustrate what you might expect to see. Our advanced tutorial can help you understand how to load this kind of data as custom tracks in the UCSC Genome Browser, and you’ll find some nice guidance on what to expect from IGV in the paper listed below in the references area.
The video also describes annotation software that helps you identify where the variations are in the data and what their consequences might be. We have talked about many of these tools in either our tutorials or our other tips-of-the-week.
He also describes how people build pipelines that flow the data through a series of analysis steps. Sometimes these are home-made programs used by a local group. But he also mentions how Galaxy can help to accomplish this now. We’ve been fans of Galaxy for a long time, and we know people are using it in exactly this manner.
You should still have a basic understanding of each tool individually, though, whether you run them one at a time or use workflows that chain them together. It will help you to create better workflows and pipelines. And it also matters that you know what you aren’t seeing or using.
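To make the pipeline idea a bit more concrete, here is a minimal Python sketch. It is not anything shown in the talk, and it is not how Galaxy works internally: it just treats each analysis step as a plain function and runs the steps in order. The step names and the fields in each variant record are hypothetical, made up for illustration.

```python
from functools import reduce

def drop_low_quality(variants, min_quality=20):
    """Keep only variants whose quality score meets the threshold."""
    return [v for v in variants if v["quality"] >= min_quality]

def annotate_type(variants):
    """Add a hypothetical 'is_snv' flag: True when ref and alt are single bases."""
    return [
        dict(v, is_snv=(len(v["ref"]) == 1 and len(v["alt"]) == 1))
        for v in variants
    ]

def run_pipeline(variants, steps):
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda data, step: step(data), steps, variants)

# Made-up variant calls, purely for illustration.
calls = [
    {"chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G", "quality": 50},
    {"chrom": "chr1", "pos": 2000, "ref": "C", "alt": "T", "quality": 10},
    {"chrom": "chr2", "pos": 3000, "ref": "AT", "alt": "A", "quality": 35},
]

result = run_pipeline(calls, [drop_low_quality, annotate_type])
for variant in result:
    print(variant)
```

Knowing what each step does on its own makes it much easier to spot what a pipeline has silently dropped, such as the low-quality call in this toy example.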
Teer closes by introducing the VarSifter software that he’s been involved with creating. This software is freely available for you to download at the VarSifter site. Usually we prefer to highlight web-based interfaces, but there isn’t one for VarSifter, so if you see the utility in it you can try to get a local copy set up for yourself. VarSifter will help you to view, sort, and filter variants in a lot of ways.
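If you want a feel for what "sort and filter" means before you install anything, here is a small Python sketch of the general idea. To be clear, this is not VarSifter's code or its file format: the records, field names, gene names, and helper functions are all hypothetical, and VarSifter itself does this through its own interface.

```python
from operator import itemgetter

# Made-up variant records; the field names are assumptions for illustration.
variants = [
    {"gene": "GENE_A", "chrom": "chr1", "pos": 1500, "effect": "missense", "score": 42},
    {"gene": "GENE_B", "chrom": "chr2", "pos": 300, "effect": "synonymous", "score": 60},
    {"gene": "GENE_C", "chrom": "chr1", "pos": 900, "effect": "nonsense", "score": 55},
    {"gene": "GENE_A", "chrom": "chr1", "pos": 1700, "effect": "missense", "score": 12},
]

def filter_variants(rows, effects, min_score):
    """Keep rows with one of the wanted effects and a score at or above min_score."""
    return [r for r in rows if r["effect"] in effects and r["score"] >= min_score]

def sort_variants(rows, key="score", descending=True):
    """Return a new list ordered by the chosen field."""
    return sorted(rows, key=itemgetter(key), reverse=descending)

# Filter to likely protein-changing variants, then put the best-scoring first.
candidates = sort_variants(
    filter_variants(variants, {"missense", "nonsense"}, min_score=20)
)

for row in candidates:
    print(row["gene"], row["chrom"], row["pos"], row["effect"], row["score"])
```

The point is only that a variant list is a table, and the interesting questions are usually a filter followed by a sort.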
So have a look at this video if you are interested in understanding how these analyses are done, and if you are interested in knowing more about the tools that can be used. It’s worth the 40 minutes–really.
YouTube page: http://www.youtube.com/watch?v=I7azpqTWFuM
VarSifter home page: http://research.nhgri.nih.gov/software/VarSifter/
Exome analysis Talks at NHGRI: http://www.genome.gov/27545880
IGV: Robinson, J., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E., Getz, G., & Mesirov, J. (2011). Integrative genomics viewer. Nature Biotechnology, 29 (1), 24-26. DOI: 10.1038/nbt.1754
UCSC new paper: Dreszer, T., Karolchik, D., Zweig, A., Hinrichs, A., Raney, B., Kuhn, R., Meyer, L., Wong, M., Sloan, C., Rosenbloom, K., Roe, G., Rhead, B., Pohl, A., Malladi, V., Li, C., Learned, K., Kirkup, V., Hsu, F., Harte, R., Guruvadoo, L., Goldman, M., Giardine, B., Fujita, P., Diekhans, M., Cline, M., Clawson, H., Barber, G., Haussler, D., & Kent, W. J. (2011). The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Research. DOI: 10.1093/nar/gkr1055
SAMtools: Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., & 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25 (16), 2078-2079. DOI: 10.1093/bioinformatics/btp352