This week’s tip is a follow-up to the PathWhiz one featured last week. After I had finished writing that one, the second video in the series became available. It has a lot more detail on how to work with the tool.
I’m not going to go over the introduction again here; you can flip back and read the earlier piece for more of the foundation. But if you are ready for the more advanced guidance in this video, it’s worth the time.
“Pathway diagrams are the road maps of biology.” This is how the folks from PathWhiz begin their recent paper. I came across it in the recently announced Nucleic Acids Research web server issue. The NAR database issue in January and the mid-year web server issue are perfectly timed items that I can content mine all year ’round. And I am always drawn to tools that offer better visualizations for data. PathWhiz is offering better road maps, so I definitely wanted to take a look.
They note that historically, pathway data has been artistically rendered for print applications like papers and posters, and those renderings can be quite engaging and attractive. But actually working with pathways in computational settings can be a bit more, um, sterile, I guess is how I would characterize it. Their goal seems to be to merge the two: better options for graphical components, with diagrams that are still machine-readable for further manipulation and exploration. They summarize their goal in this hybrid approach:
PathWhiz is essentially a web server designed for the facile creation of colourful, visually pleasing and biologically accurate pathway diagrams that are machine-readable, interactive and fully web compatible.
The paper goes on to describe a lot of the foundational concepts and the implementation. There are important technical aspects covered about the formats and file types. But the best way to get a feeling for it is their intro video. You can also access that on their tutorial page and I’ll include it here.
Mid-way through this first PathWhiz video (~4:15), they show KEGG, Reactome, and WikiPathways visualizations side by side to give you a sense of the differences. The part II video was not yet available when I started writing this, but it has been posted since.
Look through the “legends” area to see the kinds of handy graphics you might need: molecules, membranes, cellular organelles, or even tissues like brain or liver. Tab through the different types of graphics that are available to get a sense of how your pathways could look rendered in the PathWhiz system. You can try it out easily, too: there’s a “guest” mode where you can just kick the tires. Or you can create a login and build pathways that might be useful for your work, your presentations, and your papers. Those can be saved and locked, but they can also be cloned and expanded on by other people. You can also get a sense of what the more mature diagrams look like by browsing the pathway collection. I thought the 17-alpha-hydroxylase deficiency (CYP17) pathway had nice examples of the tissue (kidney) and organelles involved, which quickly give you a grasp of what’s going on and where. I’ve shown just a small part of it in this image; it’s much more detailed at full size. You can zoom in to see the pathway components. And you can see from here that the details are exportable in a number of ways by clicking the “Downloads” tab.
So if you want representations that are better for humans to view, while still preserving the important functions that computational renderings offer, PathWhiz is worth a look. Go over and try it out.
This week’s tip is quite multi-media. There’s a video, as required. But there’s a traditional published paper, too. And there are also the free training slides and exercises from us, sponsored by the folks who create the UCSC Genome Browser. So if you prefer audio, graphics, or text, we’ve got it all in this week’s tip.
For years we’ve been doing the UCSC Genome Browser online training suites, and those materials are still available for everyone to see. But I know some people prefer to have someone walk through the material in a webinar or workshop. And if you are using our materials yourself to train others, it might help to hear how I present them “live”. For this week’s tip, here’s a snippet of a recent webinar I did, with my most current slide set.
This was a fun paper to write. I like to do the step-by-step series. It forces me to really think like a new user, looking at the menus and the buttons and everything that someone new to the software might face. And if you are in a teaching situation, you could offer this paper to students to let them try these things out. You could pair it with either the webinar or our standard recording. I think this multi-media strategy could be really effective in helping people grasp the concepts and build their confidence with the tools.
I spent some time working through the paper to see if the interface had changed in any serious way since we submitted it a while ago, and there are a few differences. For example, the former “Variations and Repeats” group has become separate “Variations” and “Repeats” track groups. And “Literature” moved into “Phenotype and Literature”. But I don’t think that will trip up most users; use it as a teachable moment about changing interfaces. Also, of course, the version numbers for dbSNP have changed. But again, most people can follow along, or even try the old version to see the differences.
Probably the biggest difference is in the section on evolutionary relationships. Now that the comparison uses 100 species instead of the prior 46, a couple of things about that interface have changed. You now need to check “All species” instead, because vertebrates are no longer separated out the way they used to be.
Another interface change, in the Track Hubs section, could be confusing. As an introduction to hubs, Bob Kuhn wrote this part, which walks you through the basic structure of a hub. All of our text is still OK, but you can’t just copy the URL the way we originally showed. Still, if you type the URL instead of copying it, the section walks you through the structure of hubs without problems.
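For readers who want to see what that hub structure looks like on disk, here is a minimal sketch that writes the three plain-text files a basic UCSC track hub is built from (hub.txt, genomes.txt, and a trackDb.txt per assembly). The hub name, labels, email, track name, and bigBed URL are hypothetical placeholders, and only a handful of settings are shown; check the UCSC track hub documentation for the full set. Once the files are hosted on a web server, the URL of hub.txt is what you would enter on the Track Hubs page.

```python
import tempfile
from pathlib import Path


def write_minimal_hub(root: Path, genome: str, track: str, big_data_url: str) -> Path:
    """Write a hypothetical three-file track hub skeleton; return the hub.txt path."""
    track_dir = root / genome
    track_dir.mkdir(parents=True, exist_ok=True)

    # hub.txt: top-level stanza naming the hub and pointing at genomes.txt
    hub_file = root / "hub.txt"
    hub_file.write_text(
        "hub exampleHub\n"
        "shortLabel Example hub\n"
        "longLabel Example track hub for teaching\n"
        "genomesFile genomes.txt\n"
        "email you@example.org\n"
    )
    # genomes.txt: one stanza per assembly, pointing at that assembly's trackDb.txt
    (root / "genomes.txt").write_text(f"genome {genome}\ntrackDb {genome}/trackDb.txt\n")
    # trackDb.txt: one stanza per track; bigDataUrl locates the indexed data file
    (track_dir / "trackDb.txt").write_text(
        f"track {track}\n"
        f"bigDataUrl {big_data_url}\n"
        f"shortLabel {track}\n"
        f"longLabel {track} (example track)\n"
        "type bigBed\n"
    )
    return hub_file


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and print each file
    with tempfile.TemporaryDirectory() as tmp:
        write_minimal_hub(Path(tmp) / "myHub", "hg19", "myPeaks",
                          "https://example.org/myPeaks.bb")
        for path in sorted(Path(tmp).rglob("*.txt")):
            print(path.relative_to(tmp))
            print(path.read_text())
```

The point of the sketch is the chain of pointers: hub.txt names genomes.txt, genomes.txt names each assembly's trackDb.txt, and each track stanza names its data file.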
So use our materials to teach yourself, or to teach others. We hope this offers different ways that will work for everyone.
Mangan, M. E., Williams, J. M., Kuhn, R. M., & Lathe, W. C. (2014). The UCSC Genome Browser: What Every Molecular Biologist Should Know. Current Protocols in Molecular Biology, 107, Unit 19.9 DOI: 10.1002/0471142727.mb1909s107
Disclosure: These tutorials are freely available because UCSC sponsors us to do training and outreach on the UCSC Genome Browser.
Silos. This is a big problem for us with human genome data from individuals. We’re getting sequences, but they are locked up in various ways. David Haussler’s talk at the recent Global Alliance for Genomics and Health meeting (GA4GH) emphasized this barrier, and also talked about ways they are looking to work around the legal, social, and institutional barriers that we’ve created. He talked about Beacon, which I highlighted recently as a Tip of the Week. But there are other strategies needed to connect physicians and patients with other folks who might help them get to answers. Heidi Rehm’s talk provided information about a possible tool for this: PhenomeCentral.
Unfortunately, the videos aren’t on YouTube; you have to go to the June 10 meeting page and get them there. The one with the information on PhenomeCentral is called “Matchmaker Exchange”.
PhenomeCentral is a repository for secure data sharing targeted to clinicians and scientists working in the rare disorder community. PhenomeCentral encourages global scientific collaboration while respecting the privacy of patients profiled in this centralized database.
Certainly people in bioinformatics are familiar with the crucial information from OMIM and Orphanet. But these are aggregators of information, not patient-specific. They may list the features of a condition, but how those features appear in a given patient might vary.
What this new strategy will do is let doctors and researchers take phenotype and genotype data (you can upload VCF files) and make predictions about the genes involved. It also has ways to “matchmake” possibly similar disease manifestations. This project is part of the larger “MatchMaker Exchange” collection (Note: MME is not a dating site…and it’s still under development). The idea is that with patient details, one could search for matches with other similar patients (depending on the privacy level of the records, of course). It sounded to me like a kind of BLAST for medical conditions (they didn’t call it that). It also has ways to semantically link phenotype concepts, because different evaluating physicians might enter the same underlying issue in different ways. The Human Phenotype Ontology (HPO), which I’ve covered a couple of times lately, enables this.
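To make the semantic-matching idea concrete, here is a minimal sketch under simplifying assumptions: each patient profile is expanded to include every ancestor term in an ontology, and two profiles are scored by the Jaccard overlap of those expanded sets. The ontology is a tiny made-up fragment shaped like HPO (not real HPO IDs or structure), and plain Jaccard is a stand-in chosen for illustration; the post doesn't describe PhenomeCentral's actual scoring.

```python
# A tiny, made-up ontology fragment (child -> parents), loosely shaped like HPO.
# These are illustrative labels, not real HPO term IDs or structure.
TOY_ONTOLOGY = {
    "Phenotypic abnormality": [],
    "Abnormality of the nervous system": ["Phenotypic abnormality"],
    "Seizure": ["Abnormality of the nervous system"],
    "Focal seizure": ["Seizure"],
    "Generalized seizure": ["Seizure"],
    "Abnormality of the heart": ["Phenotypic abnormality"],
    "Atrial septal defect": ["Abnormality of the heart"],
}


def ancestors(term, ontology):
    """Return the term together with every ancestor reachable via parent links."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ontology[current])
    return seen


def profile_similarity(terms_a, terms_b, ontology):
    """Jaccard overlap of two phenotype profiles after expanding each to its ancestors."""
    closure_a = set()
    for term in terms_a:
        closure_a |= ancestors(term, ontology)
    closure_b = set()
    for term in terms_b:
        closure_b |= ancestors(term, ontology)
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0
```

Here `profile_similarity(["Focal seizure"], ["Generalized seizure"], TOY_ONTOLOGY)` returns 0.6 even though the two entries never match as strings, because both expand to “Seizure” and its ancestors. A seizure profile compared with a heart-defect profile shares only the root term and scores much lower.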
They include three levels of privacy: private, matchable (where the record can be found in a search, but it’s not wide open to everyone), and public.
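Here is one way to picture how those three levels could interact with a match search. This is a hypothetical sketch: the record fields, the single-shared-phenotype match rule, and the choice to report a matchable hit while withholding its details are my assumptions for illustration, not PhenomeCentral's implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Privacy(Enum):
    PRIVATE = "private"      # only the owner sees it
    MATCHABLE = "matchable"  # can show up as a match, details withheld
    PUBLIC = "public"        # discoverable and viewable


@dataclass(frozen=True)
class Record:
    record_id: str
    owner: str
    privacy: Privacy
    phenotypes: frozenset


def search(records, query_terms, requester):
    """Return (record_id, phenotypes or None) for records sharing any query phenotype.

    None means a match was found but its details are withheld from this requester.
    """
    query = set(query_terms)
    results = []
    for rec in records:
        if not (rec.phenotypes & query):
            continue  # no shared phenotype, not a match
        if rec.owner == requester or rec.privacy is Privacy.PUBLIC:
            results.append((rec.record_id, sorted(rec.phenotypes)))
        elif rec.privacy is Privacy.MATCHABLE:
            results.append((rec.record_id, None))
        # someone else's PRIVATE record is never reported
    return results
```

In this sketch a “matchable” record tells the searcher that a similar case exists, which is the signal needed to start a conversation, without exposing the record itself.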
So although I used the GA4GH talk as a launching point to learn about the features and concepts behind the PhenomeCentral software, I also came across another webinar that was more specific about the software features (which is what I typically prefer for our tips: the specific software tools). The Genetic Alliance is a patient-centric group interested in answers for genetic and genome-variant medical situations, actively working with advocacy groups and researchers to bridge the needs of both. In their webinar series last year they included PhenomeCentral.
What I didn’t realize from the GA4GH overview was that there are additional tools, including a pedigree tool in the PhenoTips part. A lot of people find our blog by searching for pedigree tools, so I wanted to be sure to mention it specifically. You can try it out by entering fake data in the playground over there and opening the Pedigree Tool from that record. This was also handy for me because I didn’t create a login for the main PhenomeCentral site, due to the privacy issues.
So have a look at PhenomeCentral. From the GA4GH video I also learned that a special journal issue coming this fall will have papers related to these projects. I’ll link to the PhenoTips publication below for now, and when more references for this tool or project become available, I’ll add them. I expect those will include metrics about the algorithms in use and other technical details that are important for fully evaluating the tool.
References: Girdea, M., Dumitriu, S., Fiume, M., Bowdin, S., Boycott, K., Chénier, S., Chitayat, D., Faghfoury, H., Meyn, M., Ray, P., So, J., Stavropoulos, D., & Brudno, M. (2013). PhenoTips: Patient Phenotyping Software for Clinical and Research Use. Human Mutation, 34 (8), 1057-1065 DOI: 10.1002/humu.22347
This week’s Video Tip of the Week covers a different aspect of bioinformatics than some of our other tips. But having been trained as a cell biologist, I consider imaging software an important part of the crucial software ecosystem. Also, since it’s a holiday week and traffic may be light in the US, I thought something really nice to look at was a good plan.
I found out about this software via ResearchBlogging, through The Node’s Anne-Lise Routier-Kierzkowska’s post about the work she and her team have done: MorphoGraphX: A platform for quantifying morphogenesis in 4D. It’s a nice overview of the kinds of things this software can do and where it came from. I really like backstory posts from researchers writing about their own work. Go read that; I’m not going to replicate it here.
On the MorphoGraphX site, the other things they describe as features of their software include:
The introductory video from their team is a nice overview. But you should definitely see their paper, which has additional video figures that show more of the features and the utility. There are several different video figures that are fascinating to watch. Really–go watch the paper–don’t print it. Paper or PDFs wouldn’t cut it for this story.
There’s no audio for this video, just lovely images with some text guidance. I don’t have the computing capacity to try it myself, nor do I have the image stacks I used to have. But there are many nice examples of what it can do, and Anne-Lise’s blog post describes what researchers are doing with it.
Barbier de Reuille, P., Routier-Kierzkowska, A., Kierzkowski, D., Bassel, G., Schüpbach, T., Tauriello, G., Bajpai, N., Strauss, S., Weber, A., Kiss, A., Burian, A., Hofhuis, H., Sapala, A., Lipowczan, M., Heimlicher, M., Robinson, S., Bayer, E., Basler, K., Koumoutsakos, P., Roeder, A., Aegerter-Wilmsen, T., Nakayama, N., Tsiantis, M., Hay, A., Kwiatkowska, D., Xenarios, I., Kuhlemeier, C., & Smith, R. (2015). MorphoGraphX: A platform for quantifying morphogenesis in 4D. eLife, 4 DOI: 10.7554/eLife.05864
This is not a typical tip–where we explore the features and details of bioinformatics tools. But it’s one of those handy little features that may make your life easier. It’s made mine better lately.
I had been using the ScienceSeeker citation generator for creating citations that would then aggregate to either ScienceSeeker or ResearchBlogging. But ScienceSeeker’s model recently changed, and ResearchBlogging’s support and stability are…well, uneven. Still, I would like my posts tagged with appropriate citations and DOIs so they can be found later with other tools and searches.
The helpful folks at ScienceSeeker offered this alternative strategy for quickly grabbing a citation. I’ve already used it a few times now, and I thought other science bloggers, or anyone wanting a quick formatted citation, might also find it handy. Tagging the post with the DOI afterward is simple. (But boy, I wish the formatted citations included DOIs. Maybe I should ask for that.)
I’ve been using Google Scholar a lot lately because the collection is getting better, as the paper below notes, and it is becoming more refined, with fewer nonsense items pulled in. In the past I was really upset to see detritus like “Mr. Happy’s Health News” in there, but when I looked recently, Mr. Happy was gone. There were also some really terrible activist “reports” on biotechnology, loaded with unsourced and incorrect information. I’ve seen less of that too, but I haven’t looked for those specifically of late.
There have been many times I’ve been able to locate a very handy PDF over there. Yet I had never tried that feature for creating citations before. I’m glad it’s available; I just wish there were a version for blog posts with the links done up right. I checked the Altmetric support pages to see what structure a post needs in order to be counted, and here’s the suggested syntax, from “How do I ensure that my blog posts are picked up by Altmetric?”:
2. Always include links to the papers that you reference
If you blog a lot about research, the best way to make sure that your posts get picked up by Altmetric is to include a direct link to a scholarly article.
You can include a link to the journal in a variety of different formats, which include but are not limited to:
I know it’s not that hard to add a DOI URL, but it is an extra step I didn’t have to take with the sciblogging citation generators. And I can’t see an obvious place to offer suggestions or contact the developers. If anyone knows how to reach that team, let me know.
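If you add DOI links often, that extra step is easy to script. Here is a minimal sketch that cleans up a pasted DOI (dropping a resolver or “doi:” prefix), builds a doi.org link, and appends it to a citation copied from a citation generator. It assumes a doi.org link is one of the formats Altmetric accepts (their full list isn’t reproduced here), and the function names are my own.

```python
import html
from urllib.parse import quote

# Prefixes people commonly paste in front of a DOI; compared case-insensitively.
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Strip whitespace and any resolver or 'doi:' prefix, leaving the bare DOI."""
    doi = raw.strip()
    for prefix in _DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return doi


def doi_url(raw: str) -> str:
    """Build a doi.org resolver URL, percent-encoding any unusual characters."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


def cite_with_doi(citation: str, raw_doi: str) -> str:
    """Append an HTML DOI link to a plain-text citation."""
    doi = normalize_doi(raw_doi)
    return f'{html.escape(citation)} DOI: <a href="{doi_url(doi)}">{html.escape(doi)}</a>'
```

For example, `cite_with_doi("Hart et al. (2015). PANDA.", "https://dx.doi.org/10.7717/peerj.970")` produces the citation followed by a link to https://doi.org/10.7717/peerj.970, ready to paste into a post.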
Maybe you’ve heard of the others. ABrowse. BBrowse. CBrowse. [you get the idea] GBrowse has been widely adopted. JBrowse is picking up steam. Into the orderly arrangement we now throw ZBrowse: a new way to look at genome-wide association study data.
Sharing and chatter about ZBrowse for viewing GWAS was abundant when the paper was published recently.
I could see the appeal immediately. One of the first things I check when exploring new software is the species range. See, I’m agnostic on species, and especially like to find tools that support a wide range of species. ZBrowse does this. Right in their paper they provide a chart comparing their features to other tools, and that tidbit jumped right out at me.
Although we usually like to highlight web-based tools, this one was different enough to be worth covering, even though it requires a bit more lifting to install. But they help with that in their videos and instructions. And ultimately it runs in your browser, once you’ve got the right pieces in place. I was able to set it up and run it (after updating my R and RStudio).
I’m going to skip the installation and data loading videos for now, but you should go over and see them when you are ready to try it out. I’ll just give you a look at the features they show in their introductory video for the browser part. That will give you the best idea of why it’s worth trying it out.
It comes loaded with some plant data, but you can use your own. It was very easy to look at the Manhattan plot view and then focus on smaller chosen regions. I really liked how easy it was to see what’s in the neighborhood of a selected item on the annotation tab.
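ZBrowse itself runs on R, and the post doesn’t describe its internals. As a generic illustration of the two ideas behind those views, here is a small Python sketch: placing GWAS hits on a Manhattan plot (chromosomes laid end to end on the x-axis, -log10(p) on the y-axis) and listing genes within a window around a selected hit. The chromosome lengths, hits, gene coordinates, and window size are made-up toy values.

```python
import math


def manhattan_points(hits, chrom_lengths):
    """Map (chrom, pos, p) hits to (x, y) Manhattan-plot points.

    x: position along the genome, with chromosomes laid end to end in the given order;
    y: -log10(p), so smaller p-values plot higher.
    """
    offsets, total = {}, 0
    for chrom, length in chrom_lengths:
        offsets[chrom] = total
        total += length
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in hits]


def genes_near(chrom, pos, genes, window):
    """Names of genes on `chrom` whose [start, end] span overlaps pos +/- window."""
    lo, hi = pos - window, pos + window
    return [name for g_chrom, start, end, name in genes
            if g_chrom == chrom and start <= hi and end >= lo]


if __name__ == "__main__":
    # Toy data, made up for illustration
    lengths = [("1", 10_000), ("2", 8_000)]
    hits = [("1", 100, 1e-8), ("2", 50, 0.01)]
    genes = [("1", 50, 150, "geneA"), ("1", 5_000, 6_000, "geneB"), ("2", 40, 60, "geneC")]
    print(manhattan_points(hits, lengths))
    print(genes_near("1", 100, genes, 200))
```

The window size is the main knob here: widen it and more of the neighborhood shows up, at the cost of pulling in genes that are less likely to be related to the signal.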
It might also be worth looking at this as a software delivery strategy. I was just reading about other folks who offer tools that sit on top of R and RStudio this way (come back tomorrow for another example). Developers who want to let you explore large data sets they are providing are considering this approach.
GenomeConnect is part of the larger ClinGen effort that I began to discuss last week, but this piece is specifically a portal for patients who have (or may get) genetic testing results of various types. The ClinGen team will use this interface to capture the testing data (the genotypes) and the health history (the phenotypes), and they want patients to feel like active partners with doctors and researchers in the use of that information. It also allows patients to connect with others who have similar medical histories or diagnoses. The participation details page notes that all of this is de-identified, and that participants can choose which types of inquiries to respond to later.
This week’s video provides details about the goals of this piece of the project (see screen shot). Adults and children (with guardian consent) can be included [there’s a brochure with that note, PDF]. They specifically note that you don’t need to have a genetic diagnosis yet. They also show examples of the kind of health survey data they will collect from participants. They intend to maintain contact with participants, in case they need to clarify or update the records, or potentially involve them in future studies. You can log back in to change your health data if your health changes over time. Accepted results include single-gene tests, disease panels, whole exome or genome sequencing, karyotypes, chromosomal microarrays, and GWAS. The input data will be curated by trained GenomeConnect professionals, and de-identified data will be shared with other participants if you choose. They say that they currently cannot receive submissions directly from health care providers, but that may be a direction they add in the future.
Getting patients involved, and delivering real benefits to them, will be crucial for acceptance and adoption of genomics. It may also advance research by connecting investigators with families who wish to participate in studies. But there are a lot of barriers. This week I was watching the NHGRI meeting Genomic Medicine Meeting VIII: NHGRI’s Genomic Medicine Portfolio (GM8) #GenomicMed8. I was surprised at how much of the discussion was devoted to getting payers to cover the sequencing and testing. This included both US-style insurance systems and national payer systems (there were representatives from Canada, the EU, etc., there too). Consenting (and re-consenting later for future research) was also discussed, and access to counseling can be a problem. There was a whole segment on these issues on the second day, but they kept coming up in all the other discussions too. GenomeConnect was one of the projects named in an overview of patient-facing tools. Others included Genetics Home Reference, Cancer Genetics PDQ, Genetic Alliance, MedlinePlus, NORD / GARD, and Orphanet. Tools aimed more at explaining results included LabTestsOnline, MyResults, YourGenome, and My46. All of these have different scopes and features, of course, and some are limited to certain research or treatment facilities. But more efforts continue to be developed to involve people in research and in the effective use of genomic information for health.
References: Rehm, H., Berg, J., Brooks, L., Bustamante, C., Evans, J., Landrum, M., Ledbetter, D., Maglott, D., Martin, C., Nussbaum, R., Plon, S., Ramos, E., Sherry, S., & Watson, M. (2015). ClinGen — The Clinical Genome Resource New England Journal of Medicine DOI: 10.1056/NEJMsr1406261
The sequence data tsunami begins to crash into the shore, at the feet of clinicians and patients who want answers and treatment directions. But sometimes the tsunami is washing in debris. As the amount of sequence and variation information grows, some of it comes without clear evaluations of the impacts. Some of it comes with conflicting information. And some of it comes in wrong.
In an effort to wrangle the information into useful understanding and treatments with standardized descriptions, the team building the ClinGen resources published a paper last week that details their work. It describes their history and goals, and how they are working toward useful information for and from patients, their doctors, testing labs, and researchers. Because different groups have different needs, there are several moving parts to the overall ClinGen collection.
In addition to the paper–and several related articles in this NEJM special report–there are videos on their site that tackle different aspects of the ClinGen projects. I’m going to highlight one of them here as the Tip of the Week, but you should also check out the others that are available on their webinars page or their YouTube channel. This video shows the Dosage Sensitivity Map features.
This video provides some of the history and framework for the ClinGen efforts, and then introduces one of the tools they have made available, a dosage sensitivity map. This piece focuses on “evidence based reviews of dosage sensitivity”, covering haploinsufficiency (loss of a region) and triplosensitivity (duplication of a region). They describe a scoring system they use to rank structural variations (CNVs, SVs), and their curation of the evidence supporting or refuting dosage sensitivity. They also note that their process is conservative, and you should keep that in mind as you consider the team’s review of the evidence. But they are definitely open to feedback, and they hope you will contact them if your understanding differs from their posted evaluations.
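To show how a dosage score might be used when looking at a CNV, here is a hypothetical sketch. The score table reflects my understanding of the numeric scale ClinGen uses on its dosage pages (0 to 3 for increasing evidence, with 30 and 40 as separate categorical codes); treat it as an assumption and check the current ClinGen documentation. The gene names and scores in the example are made up. Deletions are checked against haploinsufficiency scores and duplications against triplosensitivity scores.

```python
# Assumed meanings of ClinGen's numeric dosage sensitivity scores; verify against
# the current ClinGen documentation before relying on this table.
DOSAGE_SCORES = {
    0: "No evidence",
    1: "Little evidence",
    2: "Some evidence",
    3: "Sufficient evidence",
    30: "Gene associated with autosomal recessive phenotype",
    40: "Dosage sensitivity unlikely",
}


def flag_cnv(cnv_type, genes_in_cnv, hi_scores, ts_scores, min_score=3):
    """List (gene, label) for genes in a CNV whose relevant score is at least min_score.

    Deletions ("del") use haploinsufficiency scores; duplications ("dup") use
    triplosensitivity scores. Only the 0-3 evidence scale is compared.
    """
    if cnv_type == "del":
        scores = hi_scores
    elif cnv_type == "dup":
        scores = ts_scores
    else:
        raise ValueError("cnv_type must be 'del' or 'dup'")
    flagged = []
    for gene in genes_in_cnv:
        score = scores.get(gene)
        # 30 and 40 are categorical codes, not stronger evidence, so keep to 0-3
        if score is not None and min_score <= score <= 3:
            flagged.append((gene, DOSAGE_SCORES[score]))
    return flagged
```

With hypothetical haploinsufficiency scores of 3, 1, and 40 for GENE_A, GENE_B, and GENE_C, a deletion spanning all three flags only GENE_A at the default threshold. Keeping the default at 3 mirrors the conservative stance the ClinGen team describes.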
To follow along with the video, use this site to explore this part of the ClinGen tool set: http://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/. You can also just click their example genes; for instance, the ZEB2 link shows a typical page with the score information, links to other resources, and a genome viewer right on the page. From there you can also choose to view the region in external browsers at NCBI, Ensembl, or UCSC. I clicked the UCSC Genome Browser link to see how it displayed, and it automatically presents tracks with the relevant ClinGen data loaded.
In other tips I’ll talk about other pieces of the infrastructure that they are building or coordinating with. Some we’ve talked about before: you can see a previous tip that included the ClinVar resource at NCBI, which is foundational to the ClinGen suite and is discussed in their paper as well. They also note the importance of the data from OMIM, and how their mutual efforts provide feedback loops that flag needed updates. ClinGen also uses the Human Phenotype Ontology, which keeps coming up at OpenHelix lately. Another important piece is the standards for interpreting sequence variants that were recently described by the American College of Medical Genetics and Genomics (paper linked below).
ClinGen, and the various component tools within, are worth looking at, and contributing to, as we try to move more and better information to the clinic for patients and doctors to use effectively. Steven Salzberg has a take on the value of ClinGen here: 17% Of Our Genetic Knowledge Is Wrong.
It’s also very possible that some really important things will happen in the database, such as new submissions or changes to the status of a variant, before any papers come out about them. It’s even possible that no paper will ever come out about them. Spend some time learning about the features; I think it will be worth the time.
Rehm, H., Berg, J., Brooks, L., Bustamante, C., Evans, J., Landrum, M., Ledbetter, D., Maglott, D., Martin, C., Nussbaum, R., Plon, S., Ramos, E., Sherry, S., & Watson, M. (2015). ClinGen — The Clinical Genome Resource New England Journal of Medicine DOI: 10.1056/NEJMsr1406261
Richards, S., Aziz, N., Bale, S., Bick, D., Das, S., Gastier-Foster, J., Grody, W., Hegde, M., Lyon, E., Spector, E., Voelkerding, K., & Rehm, H. (2015). Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genetics in Medicine, 17 (5), 405-423 DOI: 10.1038/gim.2015.30
This week’s Video Tip of the Week demonstrates PANDA, a tool for generating and examining the annotations available for a list of genes, and evaluating them in the context of pathways. Two great tastes that taste great together, you know? So have a look at how PANDA can help you and your team annotate lists with both curated and personalized details, and see relevant connections among items in your lists. Note: all the data appears to be human; there are no actual pandas here, except one of the icons available to designate your gene list. If you are looking for actual panda genome stuff, try here.
We have developed PANDA (Pathway AND Annotation) Explorer, a visualization tool that integrates gene-level annotation in the context of biological pathways to help interpret complex data from disparate sources. PANDA is a web-based application that displays data in the context of well-studied pathways like KEGG, BioCarta, and PharmGKB.
Their nice intro video here introduces the basic features. One thing, though: the sample list used in the video has moved to the GitHub repository; it is GENELISTB in the EXAMPLES folder. I copied that into an Excel file, did the same thing as illustrated in the video, and it worked great. [And yes, I know y'all hate Excel, but it works for biologists, and they'll get the annotation thing this way.]
It was really easy to move from my sample gene list into a KEGG pathway, and it was clear which genes in my list were components of the pathway because of the overlay icon that PANDA lets you assign to your data. And you can further overlay that with DGIdb, OMIM, HPO (yes, I mentioned recently that we were going to need to understand this), MalaCards, and PharmGKB links too.
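PANDA does this overlay for you against curated pathways such as KEGG. As a rough illustration of the underlying idea (not PANDA’s code), here is a set-intersection sketch that marks which genes from a list belong to each pathway. The pathway names and memberships are made up for the example, and the `icon` string stands in for the icon PANDA lets you assign.

```python
def overlay(gene_list, pathways, icon="panda"):
    """Map each pathway to {gene: icon} for list genes that are pathway members.

    Gene symbols are compared case-insensitively; pathways with no hits are omitted.
    """
    genes = {g.strip().upper() for g in gene_list if g.strip()}
    result = {}
    for name, members in pathways.items():
        hits = sorted(genes & {m.upper() for m in members})
        if hits:
            result[name] = {gene: icon for gene in hits}
    return result
```

Normalizing case and stripping whitespace matters in practice, since lists pasted from spreadsheets often carry stray spaces or inconsistent capitalization.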
So the idea is whatever kind of -omics data you have, as long as it is tied to a gene, you can upload it and explore the relationships in more detail with these handy mappings and additional details.
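The “tied to a gene” requirement is what makes this work: any measurement keyed by gene symbol can be joined to gene-keyed annotations. Here is a minimal sketch of that join, assuming measurements arrive as a gene-to-value mapping and annotations as a gene-to-set-of-sources mapping. Which sources annotate which gene in the example is made up for illustration.

```python
def annotate(values, annotations):
    """Join gene-keyed measurements with gene-keyed annotation sources.

    values: {gene: measurement} from any -omics experiment.
    annotations: {gene: set of annotation source names}.
    Returns (rows, unmatched): rows are (gene, value, sorted sources),
    and unmatched lists genes that had no annotation entry.
    """
    rows, unmatched = [], []
    for gene, value in sorted(values.items()):
        sources = annotations.get(gene, set())
        if not sources:
            unmatched.append(gene)
        rows.append((gene, value, sorted(sources)))
    return rows, unmatched
```

For example, `annotate({"TP53": -2.1, "GENE_X": 0.4}, {"TP53": {"PharmGKB", "OMIM"}})` keeps both genes in the output but reports GENE_X as unmatched, which is a handy check for symbol typos before you go exploring pathways.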
And there is a way for you to have multiple colleagues access your annotation set and add further details.
But you aren’t limited to the pathways already in the PANDA system; there are also ways to customize what you know about your pathways and annotations. A second video offers more detail on that, using a network with Cytoscape. You can get the other video from the Mayo Bioinformatics YouTube channel or on their site.
Hart, S., Moore, R., Zimmermann, M., Oliver, G., Egan, J., Bryce, A., & Kocher, J. (2015). PANDA: pathway and annotation explorer for visualizing and interpreting gene-centric data. PeerJ, 3 DOI: 10.7717/peerj.970