Tag Archives: data visualization

Friday SNPpets

This week’s tips offer some software, including a pre-print from one of my favorite groups–the folks who do great visualizations of sets. I have talked about UpSet before, but now there’s an R package for it. Speaking of great visuals, check out the sponge-microbe symbiosis. And 10 legume genomes. Cannabis as a gateway to plant genomics. And a story of entrepreneurship, and how it ends.

And for kicks, there’s a popular video that I helped script-edit: Are GMOs Good or Bad?


Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…



Friday SNPpets

This week’s SNPpets offer a rather eclectic collection. Visualization with PanViz, LepBase for lepidopterans, a simulated data generator, and a new collection of community-curated phylogenetic estimates. But the big noise was the cancer “moonshot” data commons and the clinical trial for NCI-MATCH and precision medicine. Also newborn genome sequencing. Funniest thing: passive-aggressive bioinformatics. Coolest thing: Paul Simon and CRISPR (scroll to the bottom of the list).





What’s The Answer? (software poster advice)

This week’s highlighted question is from the Bioinformatics subreddit. A simple request for poster design advice turned into quite the conversation about what people want from software posters.


need advice for poster about a bioinformatics software
submitted 7 days ago by Elendol

Hi,

I’m a first year PhD student in Bioinformatics. I’m creating a poster for the symposium of the department to present a software I developed.

There is no algorithm to discuss, no discovery, no findings as it’s mainly a data visualisation tool. The public will be biologists and bioinformaticians

I have problems to organise the abstract and the poster, do you have some advices / examples of posters “only” presenting a software. What are the important things to talk about, the things I don’t have to forget ?

I guess no one is really interested to read something about the implementation, software architecture and other developper’s stuff.

Thank you

The discussion that ensued covered the role of the poster, the role of the presenter, and whether or not to include code. I was actually kind of amused by how strongly people felt about some of these matters. Anyway, I thought it was a fun read. Have a look.

Grinstein on dataviz at VIZBI.

Video Tip of the Week: Weave, Web-based Analysis and Visualization Environment

At the recent Discovery On Target conference, a workshop on data and analytics for drug discovery featured several informative talks. This week’s Video Tip of the Week was inspired by the first speaker in that session, Georges Grinstein. Not only was the software he talked about (Weave) something I wanted to examine right away, but his philosophy on data visualization was so in line with my informal thoughts on the topic that I connected with it immediately. And stay for the “living figures” further down.



Grinstein has been working on dataviz for a long time. And he’s been working with big data since long before big data was trendy. For some of his background and philosophy, check out this talk at a VIZBI conference. Because so many of the problems are the same across big data types, the software that he’s been working on could really be useful for the new issues facing big data in biology. But I don’t know that I’ve heard about it among the genoscenti just yet. (In this talk he also covers RadViz, a radial visualization tool that some folks might find useful. It was also mentioned in the workshop.)
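For readers curious about what RadViz actually computes, here is a minimal Python sketch of the standard radial mapping, written under my own assumptions (equally spaced anchors on a unit circle, min-max scaling of each column). It is not code from Weave or from the Daniels et al. paper, and the function names are hypothetical. Each feature pulls a record toward its anchor in proportion to its scaled value, so a record’s position is a weighted average of the anchor positions.

```python
import math

def min_max_normalize(rows):
    """Scale each column of `rows` to [0, 1]; constant columns become 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def radviz(rows):
    """Project multi-feature records to 2-D with the RadViz spring model.

    Each of the n features gets an anchor on the unit circle; a record sits
    at the average of the anchors weighted by its normalized feature values.
    """
    n = len(rows[0])
    anchors = [(math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n))
               for j in range(n)]
    points = []
    for weights in min_max_normalize(rows):
        total = sum(weights)
        if total == 0:
            points.append((0.0, 0.0))  # every feature at its minimum: center
            continue
        x = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
        y = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
        points.append((x, y))
    return points, anchors

# Four made-up records with three features each.
records = [
    [5.0, 1.0, 2.0],   # high only in feature 0: lands on the first anchor
    [1.0, 4.0, 2.0],
    [5.0, 4.0, 6.0],   # high in every feature: lands near the center
    [1.0, 1.0, 6.0],
]
points, anchors = radviz(records)
```

Because every point is a weighted average of the anchors, all records land inside the circle, and records with balanced feature values pile up near the center. That is worth keeping in mind when reading any RadViz plot.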

One of the key things he wanted us to take away from the workshop was that we need to offer people multiple, interactive visualizations so they can get the most out of the data. This is something I’ve been looking for quite a bit. I fell in love with an early version of the Caleydo stuff for exactly this reason. But I understand that it can be tricky.

Weave, the Web-based Analysis and Visualization Environment, gets closer to this, and with more responsiveness, than anything else I’ve seen. This week’s Video Tip is a short intro to this platform, but I’ll link you below to a longer version that you should watch if you want to dive into this tool. Here you’ll see that just by dragging a CSV file in, you can set up a scatter plot, bar chart, parallel coordinates, a color histogram, and a table. In seconds. Really.
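The magic in that setup is that the views are linked: a selection in one tile shows up in all the others. I don’t know how Weave implements this internally, so here is just a generic Python sketch of the common pattern, a shared selection model that notifies every registered view. The class names and the row key are hypothetical.

```python
class SelectionModel:
    """Holds the current selection and tells every registered view about changes."""

    def __init__(self):
        self.selected = set()
        self._views = []

    def register(self, view):
        self._views.append(view)

    def select(self, keys):
        self.selected = set(keys)
        for view in self._views:
            view.on_selection_changed(self.selected)


class View:
    """Stand-in for a table, histogram, or scatter plot tile."""

    def __init__(self, name, model):
        self.name = name
        self.model = model
        self.highlighted = set()
        model.register(self)

    def click(self, key):
        # A click in any view updates the shared model, not just this view.
        self.model.select({key})

    def on_selection_changed(self, keys):
        # A real view would redraw its marks here; this one just records them.
        self.highlighted = set(keys)


model = SelectionModel()
table = View("table", model)
histogram = View("histogram", model)
table.click("chrX:1000-1300")
print(histogram.highlighted)  # {'chrX:1000-1300'}
```

Keeping the selection in one shared object, rather than in each view, is what lets you add a fifth or sixth tile without wiring it to every existing one.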

This brief intro doesn’t do full justice to this tool, of course. I joined the Weave-users discussion group and found a recent webinar recording that you should watch. You’ll have to grab it from the group, though; it doesn’t appear to be stored on a video platform site (search for the thread called IVPR Update on Weave Monday 3/23). It covers the features in more detail, along with data sharing and the reproducibility of visualizations through the session history options.
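To make the session-history idea concrete, here is a hypothetical Python sketch of one common way to get reproducible visualizations: log every state change and rebuild the view by replaying the log. It illustrates the general technique only; it is not Weave’s session format, and the setting names are made up.

```python
import copy

class Session:
    """Logs every change to a visualization's settings so it can be replayed."""

    def __init__(self, initial_state):
        self._initial = copy.deepcopy(initial_state)
        self.state = copy.deepcopy(initial_state)
        self.history = []  # ordered (setting, value) pairs

    def set(self, setting, value):
        self.history.append((setting, value))
        self.state[setting] = value

    def replay(self, steps=None):
        """Rebuild state from the initial snapshot plus the first `steps`
        changes (all of them when `steps` is None)."""
        state = copy.deepcopy(self._initial)
        for setting, value in self.history[:steps]:
            state[setting] = value
        return state


session = Session({"chart": "scatter", "x": None, "color": "grey"})
session.set("x", "score")
session.set("chart", "histogram")
session.set("color", "by_chromosome")
print(session.replay(1))                  # {'chart': 'scatter', 'x': 'score', 'color': 'grey'}
print(session.replay() == session.state)  # True
```

Because the log, not just the final picture, is what gets saved, a colleague can step back to any earlier view and see exactly how the figure was built.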

I downloaded the Weave Desktop and ran it on my little system. I grabbed some transcription factor score data from the ENCODE project with the UCSC Table Browser, saved it in CSV format, pulled it in, and within seconds was looking over all the data on the X chromosome for the TFBS I was interested in. Clicking an item in my table highlighted it in my histogram. And that was just to kick the tires. According to the video, you could also have had a Cytoscape tile (Weave can integrate with Cytoscape, though I didn’t get that far yet) and checked out interaction data as well. I mention Cytoscape because readers here probably know it, but it’s just one of the linkable tools. R and other stats tools are embedded, and you can modify your scripts right from Weave. Some of these additional features may be part of the Analyst Workstation sub-project; in my early explorations I couldn’t always tell which tool had which features.
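If you want to sanity-check that kind of exploration outside Weave, here is a small Python sketch of the same filter-and-histogram step. It assumes a CSV export with columns named `chrom`, `name`, and `score`; actual Table Browser column names depend on the table you pick, and the rows below are made-up sample data, not ENCODE results.

```python
import csv
import io
from collections import Counter

def tf_scores(csv_text, tf_name, chrom="chrX"):
    """Scores for one transcription factor on one chromosome.

    Assumes hypothetical columns named chrom, name and score.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [int(row["score"]) for row in reader
            if row["chrom"] == chrom and row["name"] == tf_name]

def histogram(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    return Counter((v // bin_width) * bin_width for v in values)

# Made-up rows for illustration only.
sample = """chrom,chromStart,chromEnd,name,score
chrX,1000,1300,CTCF,870
chrX,5000,5300,CTCF,420
chr1,2000,2300,CTCF,990
chrX,9000,9300,MAX,510
chrX,12000,12300,CTCF,455
"""

scores = tf_scores(sample, "CTCF")
print(scores)                                  # [870, 420, 455]
print(sorted(histogram(scores, 100).items()))  # [(400, 2), (800, 1)]
```
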

But if there’s one thing I’d like you to do after reading this post (if you’ve read this far), it’s to look at this paper that just came out. As I was noodling on Weave, I thought to myself that it was PERFECT for creating the kind of “living figures” that I want to see in more papers. Now go see Dynamic Data Visualization with Weave and Brain Choropleths. I don’t care if you aren’t interested in brain choropleths–go look at the figures. In each one, there’s a link to a Weave demo, like this:

Weave demo PLOS

Click on those demos to load them. You can interact with the data on the brain maps, with pre-set Weave tiles showing different features of the data set. Open the gear icons to change the settings. Now imagine this with gene expression maps in C. elegans bodies. Or with transcription factors and scores in mouse embryos. Or Venns with big piles of GO terms (but what I really want there is UpSet anyway). Or any of a dozen other types of data we get in big data papers now that are really impossible to explore in traditional publication format. I want this for genomics papers in the future, okay?

This software has a lot of potential for analysis, visualization, and sharing of data. I can’t cover it all in a brief blog post. The Weave team has thought carefully about sharing with colleagues, reusable templates, and provenance of data, and all this is built right into this tool. If you are analyzing data for others, you can set up dashboards for them to see specific views. See their help and info docs for more details, and check out the longer videos in the forum. I think it would connect with a lot of people–and could benefit the genomics community greatly. Have a look. I think you’ll like it.

Quick links:

Weave: http://iweave.com/

GitHub: https://github.com/WeaveTeam

Weave-users discussion: https://groups.google.com/forum/#!forum/weave-users

Weave desktop: http://info.oicweave.org/projects/weave/wiki/Installer

More videos, Weave IVPR channel: https://www.youtube.com/channel/UCXJrO9cug3c7B7eRJSwZ4vQ

References:

Patterson, D., Hicks, T., Dufilie, A., Grinstein, G., & Plante, E. (2015). Dynamic Data Visualization with Weave and Brain Choropleths. PLOS ONE, 10 (9). DOI: 10.1371/journal.pone.0139453

Daniels, K., Grinstein, G., Russell, A., & Glidden, M. (2012). Properties of normalized radial visualizations. Information Visualization, 11 (4), 273-300. DOI: 10.1177/1473871612439357


Video Tip of the Week: World Tour of Genomics Resources, part II

This week’s tip is not our usual short video. We’ll connect you to our newest tutorial suite, our World Tour of Genomics Resources, part II. Our previous tour was really popular because, as much as bench researchers know about the tools they currently use, everyone realizes there are more tools out there. And many of them don’t realize that there could be some very handy ones for tasks they already have.

This time the tour covers not only tools for which we have full tutorial suites (video, slides, handouts, exercises), but also many of the handy problem-solving tools that we cover in our weekly tips. Things like UpSet for exploring data relationships among sets, which scales much better than Venn diagrams for genomics data sets. Or Slidify, for making slides directly from RStudio. We won’t have full training suites on these, but people will find them really useful in their daily work.

Sometimes we will also add tips about tools for which we have suites, but that have new features. For example, although thousands of people watch our UCSC Genome Browser full trainings, we also have tips that highlight new features or tools that aren’t part of the basic intro–such as new wiggle track features, or the Genome Browser in a Box. This way we help people stay current in the field, even with the tools they already use.

But we still adhere to the philosophy we explained in our paper (below): raising awareness of the tools that are out there, and helping people find and use them effectively. This World Tour illustrates that.


Quick links:

New tutorial suite: http://www.openhelix.com/worldtour2

References:
Williams, J., Mangan, M., Perreault-Micale, C., Lathe, S., Sirohi, N., & Lathe, W. (2010). OpenHelix: bioinformatics education outside of a different box. Briefings in Bioinformatics, 11 (6), 598-609. DOI: 10.1093/bib/bbq026


What’s the Answer? (to Venn or not to Venn)

This week’s highlighted item from Biostars gets back to the visualization challenges that I love to think about. The question asked for help creating an 11-set Venn diagram. What was funny about the responses was the overwhelming consensus: please, no! And alternatives were offered.


Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.


Question: Venn diagram with 11 sets

Hi All,

can anyone recommend me a tool/package, which allows me to create Venn diagrams up to 11 sets? The packages which I have found so far can support to create less sets only.

Many thanks!

szuszmok

The question reflects a frequent problem with many kinds of data sets. You want to find the members of groups that overlap across different conditions, treatments, genes present or absent in different species, whatever. In the most famous case of Venn illustration, the banana genome team created a much-discussed masterpiece showing the genes in gene families shared among banana and other plant species. It was so astonishing to look at that it even got Cory Doctorow’s attention: Just look at that banana genome Venn diagram. But genomics Venn diagrams get around. Here’s one that became fashion:

However, part of the problem with the Venn is that it is so difficult to interpret. As a developer of visualization tools told me later, Venns do not scale well for genomics types of data. He was UpSet about genome folks trying to force the data in, and created the very neat UpSet tool to help with that: Video Tip of the Week: UpSet about genomics Venn Diagrams?
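The scaling problem is easy to quantify: with n sets, a Venn diagram has 2^n - 1 possible regions, so the 11-set request means 2,047 regions to draw. Below is a rough Python sketch of the data behind an UpSet-style view: count elements by the exact combination of sets that contain them, and list only the non-empty combinations, sorted by size. The set and gene names are hypothetical, and this is not the UpSet code itself.

```python
from collections import Counter

def exclusive_intersections(sets):
    """Count elements by the exact combination of sets that contain them.

    Each key is a frozenset of set names, i.e. one Venn region
    (one column in an UpSet-style matrix). Empty regions never appear.
    """
    membership = {}
    for name, members in sets.items():
        for item in members:
            membership.setdefault(item, set()).add(name)
    counts = Counter(frozenset(names) for names in membership.values())
    # Sort by size (largest first), then by set names for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0])))

print(2 ** 11 - 1)  # 2047 possible regions in an 11-set Venn diagram

species = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g2", "g3", "g5"},
    "C": {"g3", "g6"},
}
for combo, size in exclusive_intersections(species):
    print(sorted(combo), size)
```

With real genomics sets, most of those 2,047 regions are tiny or empty; listing only the populated combinations, biggest first, is the core idea that makes the UpSet approach scale.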

So I added that to the responses, but there were other suggestions too. Go have a look at the ensuing discussion.

The PhenogramViz team illustrates how they analyze and visualize gene-phenotype relationships

Video Tip of the Week: PhenogramViz for evaluating phenotypes and CNVs

As I’ve mentioned before, once I start looking over some new tools I’m often led to others in the same arena that offer related but different features. That’s what happened when I looked at the Proband iPad app for human pedigrees. I noted that they are using important community standards, and I decided to follow those threads a bit. That led me to last week’s tip, the Human Phenotype Ontology (HPO).

HPO has been around for a while and I’ve been aware of it, but this recent re-investigation made me realize how mature it has become, and I was impressed by how widely the big projects in the genomics community have adopted it. But it also led me to some new tools that I hadn’t encountered before. This week’s tip highlights PhenogramViz, combining my appreciation for controlled vocabularies, standards, and data visualization.



Here’s how the PhenogramViz team describes their tool:

A tool that automatically analyses and visualizes gene-to-phenotype relations for a set of genes affected by CNV of a patient and a set of HPO-terms representing the symptoms of said patient. The tool makes full use of the cross-species phenotype ontology “uberpheno” (see here).

So if you have a patient with copy-number variation in their genome, you may be able to use this tool to find the genes in that CNV segment that may be responsible for certain phenotypes. The goal, as stated in their paper linked below, is to assist with the clinical interpretation of genome alterations.
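As a simplified picture of that first step (finding the genes in a CNV and checking their phenotype annotations against the patient’s HPO terms), here is a toy Python sketch. The gene names, coordinates, and annotations are hypothetical, coordinates are treated as half-open intervals, and counting exact term matches is a stand-in I chose for illustration. PhenogramViz’s actual scoring is described in their Journal of Medical Genetics paper.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end

def candidate_genes(cnv, genes, patient_terms):
    """Rank genes inside a CNV by how many patient HPO terms they are annotated with."""
    chrom, start, end = cnv
    hits = []
    for gene in genes:
        if gene["chrom"] == chrom and overlaps(start, end, gene["start"], gene["end"]):
            shared = patient_terms & gene["hpo"]
            hits.append((gene["name"], len(shared)))
    # Most shared terms first; ties broken alphabetically.
    return sorted(hits, key=lambda h: (-h[1], h[0]))

# Hypothetical genes; HP:0001250 is Seizure, HP:0001263 Global developmental
# delay, HP:0000252 Microcephaly.
genes = [
    {"name": "GENE_A", "chrom": "chr16", "start": 100, "end": 500,
     "hpo": {"HP:0001250", "HP:0001263"}},
    {"name": "GENE_B", "chrom": "chr16", "start": 800, "end": 900,
     "hpo": {"HP:0000252"}},
    {"name": "GENE_C", "chrom": "chr16", "start": 2000, "end": 2500,
     "hpo": {"HP:0001250"}},
]
cnv = ("chr16", 400, 1000)
patient = {"HP:0001250", "HP:0001263"}
print(candidate_genes(cnv, genes, patient))  # [('GENE_A', 2), ('GENE_B', 0)]
```

GENE_C matches a patient term but sits outside the CNV, so it never makes the list; that interval filter is what ties the phenotype evidence back to the genomic segment.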

The additional layer of this effort that I find useful is that they use another ontology to extend the analysis with supporting information. They employ the “Uberpheno” cross-species phenotype ontology to find further details in model organisms.
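Why does an ontology help here? A patient annotated with a very specific term can still match a model-organism gene annotated with a more general parent term once you include each term’s ancestors. Here is a toy Python sketch of that idea, using a made-up mini-ontology and a simple Jaccard overlap of ancestor sets. The term names are hypothetical, and the Uberpheno-based scoring PhenogramViz really uses is different; see their papers for the details.

```python
def ancestors(term, parents):
    """Return the term plus every ancestor reachable through `parents` (an is-a DAG)."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen

def closure(terms, parents):
    """Union of the ancestor sets of a collection of terms."""
    result = set()
    for t in terms:
        result |= ancestors(t, parents)
    return result

def jaccard(a, b):
    """Shared terms divided by all terms; 0.0 when both sets are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Made-up mini-ontology: child -> list of parents.
parents = {
    "abnormal_heart_shape": ["abnormal_heart"],
    "abnormal_heart": ["abnormal_cardiovascular_system"],
    "abnormal_cardiovascular_system": ["phenotypic_abnormality"],
    "abnormal_limb": ["phenotypic_abnormality"],
}
patient = closure({"abnormal_heart_shape"}, parents)
gene_x = closure({"abnormal_heart"}, parents)  # e.g. a model-organism annotation
gene_y = closure({"abnormal_limb"}, parents)
print(jaccard(patient, gene_x))  # 0.75
print(jaccard(patient, gene_y))  # 0.2
```

Without the ancestor step, neither gene would share a single term with the patient; with it, the heart-related gene clearly scores higher than the limb-related one.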

I’ll let you get a sense of how this works with one of their tutorial videos from their YouTube channel. They have others too, covering everything from installation to analyses. I’ll embed the one that shows how you start with a list of patient symptoms or phenotypes, then load the CNVs or genes, and then click items in the results list for graphical representations of the gene-phenotype relationships. With the Cytoscape tools you can then interact with the “phenograms” in more detail. There’s no sound; you can read the guidance in the callouts.

The videos include some abbreviations, like HPO. That’s why I talked last week about the Human Phenotype Ontology: I was prepping you for this one. In another video (Prioritization of pathogenic CNVs) they reference the scoring strategies, which are explained further in their paper linked below (the Journal of Medical Genetics one). I would spend some time looking over how the scoring and ranking happens to understand what’s shown.

Although the focus of this is using the data for human diagnosis, I think it could also help researchers choose more appropriate animal models for further testing. There are lots of complaints about the unsuitability of animal models for a range of subjects, so refining those choices would be a huge benefit. Saving resources by helping to choose the right animal model would be another worthwhile use of this tool.

Check out PhenogramViz as a bridge between genomic segments and possible phenotypes. You can try it yourself with sample files they have available on their landing page.

Quick links:

PhenogramViz: http://compbio.charite.de/contao/index.php/phenoviz.html

Cytoscape: http://cytoscape.org/

References:

Köhler, S., Doelken, S., Mungall, C., Bauer, S., Firth, H., Bailleul-Forestier, I., Black, G., Brown, D., Brudno, M., Campbell, J., FitzPatrick, D., Eppig, J., Jackson, A., Freson, K., Girdea, M., Helbig, I., Hurst, J., Jahn, J., Jackson, L., Kelly, A., Ledbetter, D., Mansour, S., Martin, C., Moss, C., Mumford, A., Ouwehand, W., Park, S., Riggs, E., Scott, R., Sisodiya, S., Vooren, S., Wapner, R., Wilkie, A., Wright, C., Vulto-van Silfhout, A., Leeuw, N., de Vries, B., Washington, N., Smith, C., Westerfield, M., Schofield, P., Ruef, B., Gkoutos, G., Haendel, M., Smedley, D., Lewis, S., & Robinson, P. (2013). The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Research, 42 (D1). DOI: 10.1093/nar/gkt1026

Köhler, S., Doelken, S.C., Ruef, B.J., Bauer, S., Washington, N., Westerfield, M., Gkoutos, G., Schofield, P., Smedley, D., Lewis, S.E., et al. (2013). Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research. F1000Research. PMID: 24358873

Köhler, S., Schoeneberg, U., Czeschik, J., Doelken, S., Hehir-Kwa, J., Ibn-Salem, J., Mungall, C., Smedley, D., Haendel, M., & Robinson, P. (2014). Clinical interpretation of CNVs with cross-species phenotype data. Journal of Medical Genetics, 51 (11), 766-772. DOI: 10.1136/jmedgenet-2014-102633

Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B., & Ideker, T. (2003). Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research. PMID: 14597658

Video Tip of The Week: Jalview for multiple sequence alignment editing and visualization

The multiple sequence alignment editing question recently featured in our What’s the Answer? series was popular. We have covered MSA editors in the past, and we include a bit on Jalview in our Clustal tutorial, but I hadn’t revisited them lately. While preparing that post I looked over the Jalview site, and I realized that they have recently provided a number of training videos to help people use their tools. So this week’s tip will highlight them.

At the Jalview site, they give this brief description of the features:

Jalview is a free program for multiple sequence alignment editing, visualisation and analysis. Use it to view and edit sequence alignments, analyse them with phylogenetic trees and principal components analysis (PCA) plots and explore molecular structures and annotation.
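Jalview can color an alignment by percentage identity to the column consensus. To show roughly what that kind of calculation involves, here is a minimal Python sketch that computes a consensus residue and percent identity for each column. The gap handling (gaps counted in the denominator, never chosen as consensus) is my assumption, Jalview’s own options may differ, and the sequences are made up.

```python
from collections import Counter

def column_consensus(alignment):
    """For each alignment column, return (consensus residue, percent identity).

    Percent identity here is the share of all sequences (gaps included in the
    denominator) that carry the most common non-gap residue.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        residues = Counter(c for c in column if c not in "-.")
        if not residues:
            result.append(("-", 0.0))  # all-gap column
            continue
        residue, count = residues.most_common(1)[0]
        result.append((residue, 100.0 * count / len(column)))
    return result

# Made-up protein fragments.
alignment = [
    "MKV-LA",
    "MKVILA",
    "MRV-LG",
    "MKAILA",
]
for residue, pid in column_consensus(alignment):
    print(residue, round(pid))
```

A viewer would then shade each residue that matches its column consensus according to that column’s percentage, so well-conserved columns stand out at a glance.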

There are two flavors of Jalview. There is a JalviewLite applet you can demo by simply clicking on some examples at their site. Or you can run the Jalview desktop for more features (you can do this from the web or by downloading a local copy). The description on their About page will tell you more about the distinctions. You may also encounter Jalview incorporated into other tools; there’s a handy list of those on their Community resources page.

On the Jalview online training YouTube channel, they have a number of videos. Some are general overviews; others cover specific tasks. For a general overview of what it does, this intro video will help you decide if it’s a tool that would help you:

If you are ready to try it out, there are some handy tips in this video with more details about actually using the features of the software. It covers basic navigation, understanding the interface layout, working on editing, and good tips for accomplishing things efficiently.

For more of the philosophy and foundations of Jalview, check out their paper (linked below). And check out their other videos to go further.

Quick link:

Jalview: http://www.jalview.org/

Reference:

Waterhouse, A.M., Procter, J.B., Martin, D.M.A., Clamp, M., & Barton, G.J. (2009). Jalview Version 2 – a multiple sequence alignment editor and analysis workbench. Bioinformatics, 25 (9), 1189-1191. DOI: 10.1093/bioinformatics/btp033

Video Tip of the Week: LineUp, data ranking visualization tool

 

Caleydo, from the Institute of Computer Graphics and Vision, is a suite of genomics and biomolecular visualization tools. As the project developers state, its strength is “the visualization of interdependencies between multiple datasets.” The tip of the week this week is a video introducing one of their newest tools: LineUp.

LineUp is an open source, scalable visualization technique for ranking systems that use several disparate ranks. LineUp was developed to

address [the] need to understand the ranking of genes by mutation frequency and other clinical parameters in a group of patients,…It is an ideal tool to create and visualize complex combined scores of bioinformatics algorithms.

Yet it can be used for many different ranking systems, whether that is to view rankings of universities or restaurants, or ranked datasets from various sources. In the video above, the presenters explain how to use LineUp to view and visualize the ranking of universities based on several different rankings, such as student reputation, student-to-faculty ratio and many others. The tool allows users to assign weights to different parameters to create a custom ranking.
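Under the hood, a custom ranking like this boils down to combining normalized attributes with user-chosen weights. Here is a minimal Python sketch of a weighted-sum ranking with min-max normalization; LineUp’s own normalization and combination options may differ, so treat this as an illustration of the idea only. The universities and numbers are made up.

```python
def combined_ranking(items, weights, lower_is_better=()):
    """Rank items by a weighted sum of min-max normalized attributes.

    items: {name: {attribute: value}}
    weights: {attribute: weight}; rescaled so they sum to 1.
    lower_is_better: attributes where a smaller raw value should score higher.
    """
    total_weight = sum(weights.values())
    scores = {name: 0.0 for name in items}
    for attr, weight in weights.items():
        values = [attrs[attr] for attrs in items.values()]
        lo, hi = min(values), max(values)
        if hi == lo:
            continue  # an attribute everyone shares cannot change the order
        for name, attrs in items.items():
            norm = (attrs[attr] - lo) / (hi - lo)
            if attr in lower_is_better:
                norm = 1.0 - norm
            scores[name] += (weight / total_weight) * norm
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up numbers for illustration.
universities = {
    "Univ A": {"reputation": 90, "student_faculty_ratio": 15},
    "Univ B": {"reputation": 70, "student_faculty_ratio": 8},
    "Univ C": {"reputation": 80, "student_faculty_ratio": 12},
}
lower = {"student_faculty_ratio"}

by_reputation = combined_ranking(
    universities, {"reputation": 7, "student_faculty_ratio": 3}, lower)
by_ratio = combined_ranking(
    universities, {"reputation": 3, "student_faculty_ratio": 7}, lower)
print([name for name, _ in by_reputation])  # ['Univ A', 'Univ C', 'Univ B']
print([name for name, _ in by_ratio])       # ['Univ B', 'Univ C', 'Univ A']
```

Shifting the weights flips the order completely, which is exactly what LineUp lets you see interactively instead of rerunning numbers by hand.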

You really need to watch the video to understand the power of the visualization tool and its broad applicability. I immediately saw several uses in research, and even in choosing schools for my children. In San Francisco, school placement is by “lottery,” and you rank the schools by preference. There are so many factors that matter to parents: distance, academic ranking, teacher-to-student ratio, diversity ranking and several more. I could see this tool as a great way to determine the ranking of our choices. The uses are endless.

Quick Links:

Caleydo
LineUp

Reference:

 

Gratzl, S., Lex, A., Gehlenborg, N., Pfister, H., & Streit, M. (2013). LineUp: visual analysis of multi-attribute rankings. IEEE Transactions on Visualization and Computer Graphics, 19 (12), 2277-86. PMID: 24051794

Video Tip of the Week: 1000 Genomes Dataset Browser from NCBI


A recent NCBI Newsletter announced the release of a new resource, the 1000 Genomes Dataset Browser, which is the resource I will be featuring in this tip. It is one of the tools available through the new NCBI Variation resources page, which also features resources such as dbSNP, dbVar, dbGaP and ClinVar (many of which OpenHelix has tutorials for), as well as other variation tools: Variation Reporter (pre-release version), Clinical Remap (beta version) and the Phenotype-Genotype Integrator.

Before I discuss NCBI’s 1000 Genomes Dataset Browser, I’d like to spend a bit of time on the 1000 Genomes project, in order to distinguish what is from NCBI and what is from the project itself. From the 1000 Genomes Pilot paper:

“The aim of the 1000 Genomes Project is to discover, genotype and provide accurate haplotype information on all forms of human DNA polymorphism in multiple human populations. Specifically, the goal is to characterize over 95% of variants that are in genomic regions accessible to current high-throughput sequencing technologies and that have allele frequency of 1% or higher (the classical definition of polymorphism) in each of five major population groups (populations in or with ancestry from Europe, East Asia, South Asia, West Africa and the Americas).”
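The 1% threshold in that definition refers to allele frequency. As a quick illustration of the arithmetic, here is a Python sketch that computes an alternate-allele frequency from diploid genotype calls and applies the 1% minor-allele cutoff. The genotype coding (0, 1, or 2 copies of the alternate allele) and the example data are assumptions for illustration, not the project’s pipeline; per the quote, the cutoff would be applied within each population group.

```python
def alt_allele_frequency(genotypes):
    """Alternate-allele frequency from diploid calls coded as 0, 1 or 2
    copies of the alternate allele; None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def is_polymorphism(genotypes, threshold=0.01):
    """Classical definition: minor allele frequency of at least 1%."""
    freq = alt_allele_frequency(genotypes)
    if freq is None:
        return False
    return min(freq, 1.0 - freq) >= threshold


# Hypothetical population of 100 people (200 alleles).
common = [0] * 97 + [1] * 3   # 3 alternate alleles out of 200
rare = [0] * 99 + [1]         # 1 alternate allele out of 200
print(alt_allele_frequency(common), is_polymorphism(common))  # 0.015 True
print(alt_allele_frequency(rare), is_polymorphism(rare))      # 0.005 False
```
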

You can access the full paper from the link below. The project has now moved past the pilot phase and is releasing new data all the time. You can see announcements and project details, or access that data, through the official 1000 Genomes project site or the official 1000 Genomes version of the Ensembl Browser. As you might imagine for a “big data” project such as this, data has been added to a variety of NCBI databases, including dbSNP, the Sequence Read Archive (SRA) and BioSample. Although you could search for this data through the universal Entrez search system, previously you would have had to view individual results in each separate database. The 1000 Genomes Browser at NCBI was created as a powerful interface for comprehensively searching for, and viewing, 1000 Genomes data contained in NCBI resources on a single page.

In the video tip I will familiarize you with the various areas of the page: the browser is built from a series of widgets, each with its own function. I will not be able to cover all of the features, or demonstrate how users can upload their own variation data to the browser; I’ll leave you to have fun exploring those on your own. Because the tool is so young, bug reports and suggestions are still being actively requested. If you find something, check out the FAQs (which discuss bugs at various stages of being fixed) and then email the team.

Quick Links:
NCBI Newsletter announcement July 20, 2012: http://1.usa.gov/RQu5dR

NCBI Variation page: http://www.ncbi.nlm.nih.gov/variation/

NCBI 1000 Genomes Browser page:
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/

1000 Genomes Project site: http://www.1000genomes.org/home

The 1000 genomes project specific version of the Ensembl Browser:
http://browser.1000genomes.org

Reference:
The 1000 Genomes Project Consortium (2010). A map of human genome variation from population-scale sequencing. Nature, 467, 1061-1073. DOI: 10.1038/nature09534