What’s the Answer? (zoomable browser)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s highlighted question hits on something I’ve mused about before. We could really use some more lightweight browsing tools that are appropriate for consumer-level users. There are some tools that sophisticated end users could work with for various tasks, but there’s still a gap, and I still think filling it would be a great project for some student team.

The extra wrinkle on this question, though, is that it needs to be able to run without an internet connection.

Question: Any good offline zoomable genome browsers?

More as an educational than a research tool, I’d like to give a presentation on human genetics using a human genome browser that can start at the single-base level, and zoom (smoothly if possible) out to the entire chromosome. The catch is that I won’t have internet access when giving the presentation. I’m happy to download reference sequences, annotations, etc. The main things I’d like to display are (at the fine scale) intron/exon positions, and at the wider scale, gene positions, alu/SINE positions, and (if possible) simple gene repeats such as human MW/LW opsins. I like http://chromozoom.org, but it’s hard to get working offline. I find that http://www.biodalliance.org doesn’t have a terribly nice chromosome-wide view. Are there others that people could suggest?

hyanwong

If you have some suggestions, please bring ‘em along. This question pops up on a somewhat regular basis, and we could really use more ideas on this.

OpenHelix to exhibit at TRICON

OpenHelix will be exhibiting at the International Molecular Medicine Tri-Conference (MMTC or TRICON) at booth 129, February 16-18. While onsite at the Tri-Conference, we invite you to demo OpenHelix and not only learn how to enable more effective research, but receive a Starbucks gift card for completing the demo.

The 22nd International Molecular Medicine Tri-Conference is the industry’s preeminent event on molecular medicine, focusing on drug discovery, genomics, diagnostics and information technology. Spanning six days this year, the Tri-Conference has an expanded program with 6 symposia, over 20 short courses, and 17 conference programs.

OpenHelix provides over 100 tutorial suites on popular and powerful bioinformatics and genomics tools. Each tutorial suite includes a 30-60 minute tutorial that highlights and explains the features and functionality needed to start using a resource effectively. The tutorial suites also include PowerPoint slides, handouts and exercises to save time and money in teaching others.

An OpenHelix subscription to the tutorials enables quicker and more effective research at your institution through more efficient use of the publicly available tools to access biological data. Join some of the best universities, research institutions, and biotech companies in training scientists on how to use these critical tools.

To schedule your 5-minute demonstration of the OpenHelix site and tutorial suites (and receive a Starbucks gift card), email Scott Lathe or call (425) 442-0322. We will be at booth 129 in the Tri-Conference Exhibit Hall.

Video Tip of the Week: Helium plant pedigree software, because “Plants are weird.”

A lot of people find our blog by searching for “pedigree” tools. We’ve covered them in the past, and we’ve got some training on the Madeline 2.0 web tools that we like. We have groused about the fact that some pedigree tools do not accommodate same-sex families. There are a variety of options, largely focused on human relationships.

Another branch of this type of software is animal colony management software. This can be used to track animals in breeding situations. We’ve highlighted The Jackson Lab’s Mouse Colony Management Software, and we see a lot of people going over to take a look. But there are other types of breeding software out there too.

Plant pedigrees are a special challenge, though. Although I did begin to look into that software at one point, I hadn’t looked again for a while. So when I saw the announcement about an upcoming talk at the Bio-IT World conference, I thought it was time to look again. Helium was new to me, and I admit I laughed out loud at my first introduction to it:

BioVis 2013: Poster: Evaluation of Helium: Visualization of Large Scale Plant Pedigrees from VGTCommunity on Vimeo.

“Ok, so, plants are weird….” Best poster intro I’ve heard.

But really, the potential complexity of plant breeding pedigrees is much more daunting than even tricky human pedigrees. Their paper on the Helium efforts (linked below) describes some of those aspects in more detail:

Firstly, the named entities in plant pedigrees may, but not always, represent a population of genetically identical individuals, not a single plant. While it is relatively simple to grow many plants from seed, potentially many decades after production, in humans and animals this is understandably not the norm. The generation of these genetically identical (homozygous) varieties is possible through doubled haploidy, inbreeding, or crossing of pairs of inbred lines to achieve what is termed an F1 hybrid. Successive inbreeding by self-pollination of these F1 generation plants leads to individual plants that are close to homozygous across all alleles.
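As a back-of-the-envelope illustration of that last sentence (my own sketch, not something from the Helium paper): under the textbook assumption that a locus heterozygous in the F1 has a one-in-two chance of staying heterozygous after each round of self-pollination, the expected heterozygous fraction halves every generation. The function name and the choice of generations below are just for illustration.

```python
def expected_heterozygosity(generations, initial=1.0):
    """Expected fraction of loci still heterozygous after repeated selfing.

    Assumes simple Mendelian segregation with no selection: each
    generation of self-pollination halves the expected heterozygosity.
    `initial` is the heterozygous fraction in the starting (F1) plant.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return initial * 0.5 ** generations


if __name__ == "__main__":
    # Print the expected heterozygous fraction for a few selfing generations.
    for n in (0, 1, 2, 4, 6, 8):
        print(f"after {n} generations of selfing: {expected_heterozygosity(n):.4f}")
```

After six generations the expected heterozygous fraction is already under 2%, which is why these lines are described as “close to homozygous.”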

There are no standards for plant pedigrees yet, I learned from this paper. Zoiks! Well, I guess that gives them free rein to design something that users want. The folks on the Helium project got a bunch of potential users, asked them what they needed, what worked, what didn’t work, and they are building a nice looking tool with the specs they got. Their paper goes on to describe their paper prototyping, the feedback, and other interactions they got further downstream in the process. It’s a nice example of how to get some direction from the likely end users.

Another video offers a bit longer view on their software, but there’s no audio (below). The most detailed video is the one attached to their paper in the supplemental files, but I can’t embed it. Go over there to download and watch that, with captions about what’s happening.

I wasn’t able to find any downloadable software yet to kick the tires myself. And because of the blizzard I’m worried I won’t have power for the next few days to check it out. But from what I can see and read in the paper, it looks promising and I’m eager to try it out at some point. Looking forward to Jessie Kennedy’s talk.

Quick links:

Helium project page: http://ics.hutton.ac.uk/helium/

Best intro video version, with explanation captions: http://www.biomedcentral.com/1471-2105/15/259/additional

This is the item that caught my eye, via email. I’m going to be at Bio-IT World, so I’m hoping to be able to see this presented live.

Dr. Jessie Kennedy to Deliver Keynote Presentation on Visualization Tools Designed for Biologists at 2015 Bio-IT World Conference, as part of the Data Visualization and Exploration Tools Track.

Keynote Presentation: Pedigree Visualization in Genomics
Jessie Kennedy, Ph.D., Professor & Director, Institute for Informatics and Digital Innovation, Edinburgh Napier University

Most visualizations that display pedigree structure for genetic research have been designed to deal with human family trees. Animal and plant breeders study the inheritance of genetic markers in pedigrees to identify regions of the genome that contain genes controlling traits of economic benefit and, ultimately, to improve the quality of animal and plant breeding programs. However, due to the size and nature of plant and animal pedigree structures, human pedigree visualization tools are unsuitable for use in studying animal and plant genotype data. We discuss two visualization tools, VIPER (designed for cleaning genotyping errors in animal pedigree genotype datasets) and Helium (designed to visualize the transmission of alleles encoding traits and characteristics of agricultural importance in a plant pedigree-based framework), and show how they support the work of biologists.


Reference:

Shaw P.D., Graham M., Kennedy J., Milne I. & Marshall D.F. (2014). Helium: visualization of large scale plant pedigrees, BMC Bioinformatics, 15 (1) 259. DOI: http://dx.doi.org/10.1186/1471-2105-15-259

Note: OpenHelix is a part of Cambridge Healthtech Institute.

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

What’s the Answer? (3D structures with mutations)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s highlighted question was about visualizing variations in linear graphics as well as in 3D structural representations.

Question: Map genetic mutations to protein domain/structure

I am trying to map genetic mutations to protein domain/structure. Ideally, I want to visualize the variants in linear protein domain diagram and 3D protein structure like the attached images. I did research, but I can’t find good tools/databases for such work.

I know similar questions have been asked here like How To Create Mutation Diagram In R Or In Any Tools?. But it is only the protein domain diagram (with no 3D structure), plus the protein domain annotation there seems to be limited.

Thank you all in advance!

[Graphic over there shows an image of what the original poster wants to visualize]

mittjohns

I had recently talked about Mutation Mapper as another answer to a related question. But at that time I didn’t note that you can also get a 3D structure from there. Glad to see someone mention it as a possible answer on this new question.

Video Tip of the Week: GWATCH, for flying over chromosomes

Ok, so it’s not *just* for flying over chromosomes. There’s more to it, of course. But that’s the part of GWATCH (Genome-Wide Association Tracks Chromosome Highway) that caught my attention. I’m always looking for different ideas and strategies to visualize data, and this was the first time I drove along the whole length of a human Chromosome 9 highway, seeing the various SNPs along the way.

A post on Google+ pointed me to the GWATCH paper and software, so hat tip to Taras Oleksyk. And I was pleased to see that they’ve done a video explaining their project and demonstrating the software, so that will be this week’s Tip of the Week.

It’s not the first time I’ve seen a 3D representation of SNPs. I remember seeing that from GeneSNPs in the past. But the GeneSNPs visual option was a way to look at the features within a single gene–you could see introns, exons, and choose to view SNPs by features like “non-synonymous”, and you could examine the frequency. It was an interesting way to combine a lot of data, but captured only one limited region. GWATCH goes much wider than that, letting you scan along whole chromosomes for patterns. That said–it would be very cool to have those features, and maybe a pointer to possible promoter regions, along the roadway as well. At first I didn’t notice the gene symbol track–er, sidewalk?–along the edge of the view. But it seemed to me you could add more sidewalk, a bike lane….Of course, then I want to add a domain bypass….Anyway–it’s got me thinking about ways to explore.

And I’ve focused on that unusual “moving browser” for this post, but there’s more to the tool than that. There are other ways to slice the data in 2D that can be helpful for your analyses. And it’s not limited to GWAS data either. You can see more about that in both the video and their paper. So explore GWATCH more from their site, where you can load up their sample data and take it for a spin. Go to the site and click on “Active Datasets” to see the ones they’ve provided. Open one, then click on the “Highway Chromosome Browser” to select it. You can also see the other types of tools they have from there.

Quick links:

GWATCH: http://gen-watch.org/ for taking it for a spin

Reference:

Svitin A., Malov S., Cherkasov N., Geerts P., Rotkevich M., Dobrynin P., Shevchenko A., Guan L., Troyer J., Hendrickson S., Dilks H. et al. (2014). GWATCH: a web platform for automated gene association discovery analysis, GigaScience, 3 (1) 18. DOI: http://dx.doi.org/10.1186/2047-217x-3-18


What’s the Answer? (Docker, actually…)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

In an amusing pairing with this week’s Tip of the Week on Docker, I was looking out for good questions and discussions at Biostars when I came across this one. The discussion opens with the idea of curating a bunch of “efficient” bioinformatics programs. This is a worthy use of time and resources. The conversation then flows around to include Docker, but also to note that Docker may not be the right thing in every case. So have a look at how the idea percolated around–a good demonstration of crowdsourcing something useful for the community.

News: Pandora’s Toolbox – a collection of bioinformatics programs

Hello everyone,

I am developing a toolbox with collection of source codes of efficient bioinformatics programs.

They are available under the condition that you cite the individual authors and not Pandora’s toolbox.

My goal is to develop additional code so that we can mix and match them to solve various problems efficiently.

———————————–

github page -
https://github.com/homologus/Pandoras-Toolbox/

Blog posts -
Introducing ‘Pandora’s Toolbox’ and ‘Pandora’s Modules’
http://www.homolog.us/blogs/blog/2015/01/05/introducing-pandoras-toolbox-and-pandoras-modules/

An Update on Pandora’s Toolbox
http://www.homolog.us/blogs/blog/2015/01/08/an-update-on-pandoras-toolbox/
The following programs are currently included in the collection.

[list of stuff in there, go have a look over there for that set]

ugly.betty77

So, in short, more ideas for using Docker in the genomics software community. Jess’ sayin. And a nice coincidence for my blogging this week.

Video Tip of the Week: Genome assemblers and #Docker

Last fall there was a tip I did on Docker, which was starting to pick up a lot of chatter around the genoscenti. It was starting to look like a good solution for some of the problems of reproducibility and re-use of software in genomics–containerize it. Box it up, hand it off. There’s certainly a lot of interest and appeal in the community, but there are still some issues to resolve with rolling out Docker everywhere. However, my impression is that the Docker team and community seem interested and active in evolving the tools to be as broadly useful as possible.

So when this tweet rolled through the #bioinformatics twitter column on my Tweetdeck, I was excited to see this talk by Michael Barton (who has the best twitter handle in the field: @bioinformatics). It’s a terrific example of how Docker can be aimed at some of the problems in the bioinformatics tool space. It’s not the only option, of course. Some workflow resources like Galaxy can cover other features of genomics researchers’ needs. But as a general solution to the problems of comparing software and distributing complete working containers, Docker seems to be developing into a very useful strategy.

Here’s the video:

Although this is longer than our typical “tips”, I’d recommend that you carve out some time to watch if you are new to the idea of Docker. In case you don’t have time right now for the talk, here’s a summary. For the first 10 minutes, there’s a gentle introduction for non-genomics nerds about what sequencing is like right now. Then Michael describes how the assembler literature works–with competing claims about the “better” assembler as each new paper comes along. This includes a sample of the types of problems that assemblers are trying to tackle with different strategies.

Around 14min, we begin to look at what it’s like to be the researcher who needs to access some assembler software. Then he describes how different lab groups–like remote islands–can instantly ship their sequence data around today, but biologists are like “longshoremen for data”: they have to unload, unpack, install, and try to get all the right pieces together to make it work in a new lab. We are doing “break bulk” science right now. That was a really terrific assessment of the state of play, I thought.

If you are ok with the other pieces, you can skip to around 16min, where we get to a specific example of the benefits of Docker for this type of research. Michael goes on to describe how Docker has helped him to build a system to catalog and evaluate various assemblers. He developed the project called nucleotid.es (pronounced just as “nucleotides”), which he goes on to describe. It offers details about various assemblers, which have been put into containers that are easy to access and to use to compare different software. There are examples of benchmarks, but you can also use these containers for your own assembly purposes. You can explore the site for more detail and a lot of data on the assembler comparisons that they have already. A good overview of the reasons to do this can also be found in the blog post over there: Why use containers for scientific software?
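To make the “box it up, hand it off” idea a bit more concrete, here is a minimal sketch of how one might wrap a containerized tool from Python. It only builds a `docker run` command line; the image name (`example/assembler`), the mount points (`/input`, `/output`), and the tool arguments are all hypothetical placeholders of mine, not the actual interface of the nucleotid.es containers, so check each image’s own documentation for what it really expects.

```python
from pathlib import Path


def build_docker_command(image, input_dir, output_dir, tool_args=()):
    """Build a `docker run` command that mounts data into a container.

    The image name, the /input and /output mount points, and tool_args
    are illustrative assumptions; a real image defines its own interface.
    """
    in_path = Path(input_dir).resolve()    # docker -v needs absolute host paths
    out_path = Path(output_dir).resolve()
    return [
        "docker", "run",
        "--rm",                             # remove the container when it exits
        "-v", f"{in_path}:/input:ro",       # read-only mount for the input data
        "-v", f"{out_path}:/output",        # writable mount for the results
        image,
        *tool_args,
    ]


if __name__ == "__main__":
    cmd = build_docker_command(
        "example/assembler",                # hypothetical image name
        "reads",
        "assembly",
        ["/input/reads.fastq", "/output"],  # hypothetical tool arguments
    )
    # Print rather than execute, so this runs without Docker installed.
    print(" ".join(cmd))
```

If Docker is installed and the image exists, the resulting list could be handed to `subprocess.run(cmd, check=True)`. The appeal is that swapping in a different tool would mostly mean changing the image name, rather than repeating the whole unpack-and-install routine.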

At about 25min, some of the constraints and problems are noted. Fitting Docker into existing infrastructure, and incentivising developers to create Docker containers, can be issues. But the outcomes–having a better strategy than traditional publication for reproducibility, having ongoing access to the software, and the “deduplication of agony”–seem to be worth investigating, for sure. Then Barton describes what the pipeline could look like for a researcher with some new sequence–you can use the data from a variety of assemblers to make decisions about how to proceed, rather than sifting through papers or just using what the lab next door did. And if you have a new assembler, you can use this setup to benchmark it as well.

So if you’ve been hearing about Docker, and have been concerned about access and reproducibility issues around genomics data and software, have a look at this video. It nicely presents the problems we face, and one possible solution, with a concrete example. There may be other useful methods as well–like offering a central portal for users to access multiple tools, as AutoAssemblyD has described–but that’s really for a different subset of users. For the more general problem of software comparisons, benchmarking, and access to bioinformatics tools, Docker seems to offer a useful strategy. I also did a quick PubMed check to see if Docker is percolating through the traditional publication system yet, and found that it is. I found that ballaxy (“a Galaxy-based workflow toolkit for structural bioinformatics”) is offered as a Docker image, which means that having a grasp of Docker going forward may really be useful for software users rather quickly….

Quick links:

nucleotid.es: http://nucleotid.es

Docker: http://www.docker.com

References (and in this case the slide deck):



And other useful and related items from this post:

Automating the Selection Process for a Genome Assembler, JGI Science Highlights. October 17, 2014. http://jgi.doe.gov/automating-selection-process-genome-assembler/

Veras A., de Sá P., Azevedo V., Silva A. & Ramos R. (2013). AutoAssemblyD: a graphical user interface system for several genome assemblers, Bioinformation, 9 (16) 840-841. DOI: http://dx.doi.org/10.6026/97320630009840

Hildebrandt A.K., Stockel D., Fischer N.M., de la Garza L., Kruger J., Nickels S., Rottig M., Scharfe C., Schumann M., Thiel P., Lenhof H.-P. et al. (2014). ballaxy: web services for structural bioinformatics, Bioinformatics, 31 (1) 121-122. DOI: http://dx.doi.org/10.1093/bioinformatics/btu574

How to nearly derail a woman’s career: Mary-Claire King’s BRCA1 project grant

This item was floating around the twitterz this weekend. I can’t remember who pointed it out first, but I didn’t have a chance to get to it then. I was able to look for it later, and it was worth looking for. Mary-Claire King talks about some career struggles she faced right on the cusp of getting the grant that became the BRCA1 project. It’s a story of some personal agony, persistence, and a nearly magical assist from an unlikely source.