There are many tools at NCBI, with a huge range of functions: literature, sequence data, variations, protein structure, chemicals and bioassays, and more. It’s hard to keep track of what’s available. Their video tutorials are helping me stay aware of new tools, and of new features within existing tools. For this week’s Tip of the Week, we’ll look at their recent video for ProSplign, a tool that helps you align protein sequences to genomic sequences.
Although the Genome Workbench itself has been around for a while (we first featured it as a tip in 2013), it is constantly under development, and new features appear regularly. This tip focuses on how to use the ProSplign piece, but if you haven’t used the Workbench much, it will also help you understand how a number of its other tools can be accessed. You can also see Splign in the tool list; it’s another NCBI tool for a similar type of process, but with mRNA sequences as the focus.
If you prefer a text-based walk-through, there is a page that will take you through the features (see the quick links below). There are other videos that will help you explore the Genome Workbench features as well, including a handy playlist of just those videos. Subscribe to their YouTube channel for notices of new items.
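Why does aligning a protein to genomic sequence need a special tool? The coding sequence is interrupted by introns, so the genomic DNA only translates into the protein once the exons are spliced together. Here’s a minimal Python sketch of that idea. It is not ProSplign’s algorithm: ProSplign searches for the best spliced alignment itself, while this only checks a given exon model on the forward strand. The function names, the toy sequence, and the exon coordinates are all hypothetical.

```python
# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def splice(genomic, exons):
    """Concatenate exon sequences; exons are 0-based, half-open (start, end) pairs."""
    return "".join(genomic[start:end] for start, end in exons)


def translate(cds):
    """Translate complete codons, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3].upper(), "X")  # X for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def exon_model_matches(genomic, exons, protein):
    """True if the spliced, translated exons reproduce the protein exactly."""
    return translate(splice(genomic, exons)) == protein


# Hypothetical toy gene: exon (ATGGCC), intron (GT...AG), exon (AAATGA).
genomic = "ATGGCC" + "GTAAGTCCCCAG" + "AAATGA"
exons = [(0, 6), (18, 24)]
print(translate(splice(genomic, exons)))        # MAK
print(exon_model_matches(genomic, exons, "MAK"))  # True
print(translate(genomic))                       # MAVSPQK: the intron read as coding
```

Reading straight through the genomic sequence gives the wrong protein, which is exactly the problem a spliced aligner like ProSplign (or Splign, for mRNA) is built to solve.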
This week’s Video Tip of the Week is actually a whole bunch of videos. I’ll highlight one here as our tip, but there are many great talks from the recent JGI Genomics of Energy & Environment meeting. Although we typically focus on specific software tools for our tips, I think this is a nice case for also looking at the type of research done with the tools.
This is a nice example of how to make a meeting accessible for a lot of people as well, using multiple strategies. The video channel, a Storify, dropboxes of slides (below), and the agenda details can help you to decide what might be relevant for your work. For example, we’ve talked about Docker, but you can now see how it’s deployed by the folks who are talking about it here. There’s a talk with Phytozome. And much more.
For today I’ll highlight MetaSUB, one of the projects from the Mason lab. The Mason lab has participated in projects you’ve probably heard about in the media, including swabbing the NYC subway system. You can see that data at PathoMap. MetaSUB, the Metagenomics & Metadesign of Subways and Urban Biomes, is a data collection effort coming up soon: a global swabbing festival of the 10 busiest subways in the world (including my own; I wonder if I can do the station in my neighborhood?), to build more geospatial metagenomics maps, find antimicrobial resistance markers, and look for new biosynthetic gene clusters. It will be held on June 21, 2016, the summer solstice. It will tell us far more about our urban environments than we currently know. Maybe too much. But it’s a great idea, sure to reveal things we don’t yet know about our lived environment.
And here are the slides for the talk, as promised in the video. Mason tweets them:
He seriously did get through those 138 slides in 30 minutes. I was skeptical when I downloaded them before watching the talk, but he really managed it. I was kind of out of breath just watching it.
He also talked about extreme environment sampling, and MetaPhlAn2 and HUMAnN2 analyses, in a later segment. The whole thing is an excellent and breezy discussion of real-world genomics, with a lot of appealing stories that the public would connect with. They are also doing educational outreach with an HTGAA course (How To Grow Almost Anything). There’s some really fun stuff with the Gowanus Canal (seriously), and so much opportunity just hanging around in our cities. But also: what’s growing in space? They are working on space station mold. And astronauts: the NASA twins. They are also sending up a MinION (which they tested to confirm it would work in microgravity; see paper below).
It was a very engaging talk. From an apparently very busy guy.
Afshinnekoo, E., Meydan, C., Chowdhury, S., Jaroudi, D., Boyer, C., Bernstein, N., Maritz, J., Reeves, D., Gandara, J., Chhangawala, S., Ahsanuddin, S., Simmons, A., Nessel, T., Sundaresh, B., Pereira, E., Jorgensen, E., Kolokotronis, S., Kirchberger, N., Garcia, I., Gandara, D., Dhanraj, S., Nawrin, T., Saletore, Y., Alexander, N., Vijay, P., Hénaff, E., Zumbo, P., Walsh, M., O’Mullan, G., Tighe, S., Dudley, J., Dunaif, A., Ennis, S., O’Halloran, E., Magalhaes, T., Boone, B., Jones, A., Muth, T., Paolantonio, K., Alter, E., Schadt, E., Garbarino, J., Prill, R., Carlton, J., Levy, S., & Mason, C. (2015). Geospatial Resolution of Human and Bacterial Diversity with City-Scale Metagenomics. Cell Systems, 1 (1), 72-87 DOI: 10.1016/j.cels.2015.01.001
McIntyre, A. B. R., Rizzardi, L., Yu, A. M., Rosen, G. L., Alexander, N., Botkin, D. J., John, K. K., Castro-Wallace, S. L., Burton, A. S., Feinberg, A., & Mason, C. E. (2015). Nanopore Sequencing in Microgravity. bioRxiv DOI: 10.1101/032342
However, the main gateway page largely kept its familiar look. The gateway, where you begin most text-based or region-based queries for a species, had changed mainly by gaining some additional buttons and options, and an increasingly long list of species to choose from. But now it’s time to look again. The gateway is very different today. You’ll get started faster and more easily when you go to the site, and you’ll have new ways to engage with the data you want to access.
There are additional details in the News area of the UCSC landing page, including credits to the development team involved. Other key changes include relocating some of the previous button options:
Note that a few browser utilities that were previously accessed through links and buttons on the Gateway page have been moved to the top menu bar:
*Browser reset: Genome Browser > Reset All User Settings
*Track search: Genome Browser > Track Search
*Add custom tracks: My Data > Custom Tracks
*Track hubs: My Data > Track Hubs
*Configure tracks and display: Genome Browser > Configure
The UCSC team has created a short intro video to the new look. That is our Video Tip of the Week:
Of course, this means we’ll need to update our slides and exercises. We like things to stabilize a bit after a rollout to be sure things are solid. But soon we’ll include the new navigation in our materials.
The underlying ways to access the particular assembly features you need for a given genome, and the data for your tracks of interest, are unchanged. So those parts of our training materials will still help you get the most out of your searches. We’ll let you know when we’ve updated the materials.
Speir, M., Zweig, A., Rosenbloom, K., Raney, B., Paten, B., Nejad, P., Lee, B., Learned, K., Karolchik, D., Hinrichs, A., Heitner, S., Harte, R., Haeussler, M., Guruvadoo, L., Fujita, P., Eisenhart, C., Diekhans, M., Clawson, H., Casper, J., Barber, G., Haussler, D., Kuhn, R., & Kent, W. (2015). The UCSC Genome Browser database: 2016 update. Nucleic Acids Research DOI: 10.1093/nar/gkv1275
Disclosure: UCSC Genome Browser tutorials are freely available because UCSC sponsors us to do training and outreach on the UCSC Genome Browser.
As I mentioned last week, I am watching a lot of farmers on twitter talk about this year’s North American growing season. To get a taste of that yourself, have a look at #Plant16 + wheat as a search. This is where the rubber of tractor tires and plant genomics hits the…well…rows. And just coincidentally I saw a story about this new plant genomics research tool–actually in the farming media.
expVIP stands for expression Visualization and Integration Platform. Although the emphasis here is plant data, it can be used for any species. A good summary of their project is taken from their paper (linked below):
expVIP takes an input of RNA-seq reads (from single or multiple studies), quantifies expression per gene using the fast pseudoaligner kallisto (Bray et al., 2015) and creates a database containing the expression and sample information.
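That quoted pipeline summarizes expression per gene, while kallisto itself reports abundance per transcript. Here’s a minimal sketch of one common way to get from one to the other: summing transcript TPMs by gene from a kallisto `abundance.tsv` (columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`). I’m assuming a simple transcript-to-gene lookup; expVIP’s own aggregation code may differ, and the transcript names and values below are made up.

```python
import csv
import io
from collections import defaultdict


def gene_tpm(abundance_tsv, tx2gene):
    """Sum transcript-level TPM from a kallisto-style abundance.tsv to gene level."""
    totals = defaultdict(float)
    for row in csv.DictReader(abundance_tsv, delimiter="\t"):
        gene = tx2gene.get(row["target_id"])
        if gene is not None:  # skip transcripts with no gene assignment
            totals[gene] += float(row["tpm"])
    return dict(totals)


# Hypothetical kallisto output: two transcripts of geneA, one of geneB.
example = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "geneA.1\t1500\t1350\t120\t400000\n"
    "geneA.2\t1200\t1050\t30\t100000\n"
    "geneB.1\t900\t750\t75\t500000\n"
)
tx2gene = {"geneA.1": "geneA", "geneA.2": "geneA", "geneB.1": "geneB"}
result = gene_tpm(example, tx2gene)
print(result)  # {'geneA': 500000.0, 'geneB': 500000.0}
```

In a polyploid like wheat, each homoeolog is its own gene ID, so a lookup like this keeps the A, B and D copies separate rather than pooling them.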
And it can handle polyploid species–try that on some of the tools aimed at human genomics! They illustrate this with some wheat samples from a number of different studies. And then they use the metadata about the studies, such as tissues and treatment conditions, to show how it works with some great sorting and filtering options. They created a version of this for you to interact with on the web: Wheat Expression Browser. But you can create your own data collections with their tools, aimed at your species or topics of interest.
This week’s Video Tip of the Week is their demonstration of how the Wheat Expression Browser works. Although you see wheat data here, it’s just an example; the same approach can work with any species you’d like to examine.
I followed along and tried what they were showing in the video, and I found it to be a really slick and impressive way to explore the data. The dynamic filtering and sorting were really nice. You can customise the filtering, sorting, and other settings for the visualizations with the metadata that’s useful to your research. So you could set the tissue types, or treatment conditions, or whatever you want, and filter to look at expression across those. They go on to show that their strategies for comparing genes in different situations seemed to reflect known biology in disease and abiotic stress conditions.
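The metadata-driven filtering described above can be pictured with a tiny Python sketch: pick samples by metadata fields, then summarize expression per group. This is only an illustration of the idea, not expVIP’s code; the sample records, the field names `tissue` and `condition`, and the TPM values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample metadata plus gene-level TPM for one gene of interest.
samples = [
    {"id": "s1", "tissue": "leaf", "condition": "control", "tpm": 10.0},
    {"id": "s2", "tissue": "leaf", "condition": "drought", "tpm": 30.0},
    {"id": "s3", "tissue": "leaf", "condition": "drought", "tpm": 34.0},
    {"id": "s4", "tissue": "root", "condition": "control", "tpm": 5.0},
    {"id": "s5", "tissue": "root", "condition": "drought", "tpm": 7.0},
]


def filter_samples(rows, **criteria):
    """Keep samples whose metadata match every key=value criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def mean_by(rows, key):
    """Average expression per value of one metadata field, sorted high to low."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r["tpm"])
    return sorted(((g, mean(v)) for g, v in groups.items()),
                  key=lambda item: item[1], reverse=True)


leaf = filter_samples(samples, tissue="leaf")
print(mean_by(leaf, "condition"))  # [('drought', 32.0), ('control', 10.0)]
```

Swapping `tissue="leaf"` for any other field and value is the script equivalent of changing a filter in the browser.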
So their pipeline for gene matching, as well as the tools to explore and visualize RNA-Seq data, offer a great way to look at data that you might generate yourself or you could mine from existing submitted data–but that might not be well organized and available in a handy database just yet.
Over the years I’ve started to follow a lot of farmers on twitter. It might sound odd to folks who are immersed in human genomics and disease. But I actually find the plant and animal genomics communities to be pushing tech faster and further into the hands of end-users than a lot of the clinical applications are at this point in time. And as #Plant16 rolls out to feed us, there was a lot of soybean chatter in my twittersphere.
So when SoyBase tweeted a reminder about some of their videos, I thought the timing was great. They have a YouTube channel with videos to help users access the SoyBase data, and one of the tools they illustrate is CMap. Although we’ve touched on CMap a couple of times on the blog and in our training videos, we’ve never featured it. It’s one of the GMOD family members, and it lets you compare different map coordinate data sets. Conceptually, I think it’s a good idea for people to think about physical maps vs sequence mapping data, and this video shows how you can examine these different representations at SoyBase.
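The core idea behind comparing maps is simple: markers placed on two maps become correspondences, and markers whose order disagrees between the maps stand out. CMap draws these as lines between map glyphs; the Python sketch below is not CMap code, just an illustration of that comparison between a genetic map (cM) and a sequence-based map (bp). The marker names and positions are hypothetical.

```python
from itertools import combinations

# Hypothetical marker positions: a genetic map (cM) and a sequence-based map (bp).
genetic_map = {"m1": 0.0, "m2": 12.5, "m3": 20.1, "m4": 31.7, "m5": 45.0}
physical_map = {"m1": 150_000, "m2": 1_200_000, "m4": 2_050_000,
                "m3": 2_900_000, "m6": 3_400_000}


def correspondences(map_a, map_b):
    """Pair up markers found on both maps, ordered by position on map_a."""
    shared = set(map_a) & set(map_b)
    return sorted(((m, map_a[m], map_b[m]) for m in shared), key=lambda t: t[1])


def discordant_pairs(pairs):
    """Marker pairs whose relative order differs between the two maps."""
    return [(a[0], b[0]) for a, b in combinations(pairs, 2)
            if (a[1] < b[1]) != (a[2] < b[2])]


pairs = correspondences(genetic_map, physical_map)
for marker, cm, bp in pairs:
    print(f"{marker}: {cm} cM <-> {bp:,} bp")
print(discordant_pairs(pairs))  # [('m3', 'm4')]
```

Markers on only one map (m5, m6 here) simply have no correspondence line, and the m3/m4 swap is the kind of crossed line you would spot in a comparative map view.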
Besides their software videos, though, SoyBase also links to a lot of other videos that help people understand more about many aspects of soybean cultivation. Check out their wide range of topics on their Video Tutorials page. You never see how to use a two-row harvester on human genomics databases, do you?
I didn’t expect to do another tip on the paths through experiments or data this week. But there must be something in the water cooler lately, and all of these different tools converged on my part of the bioinformatics ecosphere. As I was perusing my tweetdeck columns, a new tool from the folks who do the Caleydo projects offered more paths through data: Pathfinder, Visual Analysis of Paths in Graphs.
This new tool offers another way to look across relationships in data sets. Finding paths through data only gets harder with every new data set, yet it keeps getting more important to capture the characteristics of the alternate routes while still keeping the context of the overall picture. Scaling paths is hard. So the Caleydo team targets several key aspects of the problem with their new Pathfinder tool. The full details are in the paper (cited below), but I’ll list the features they deliver here:
1. Query for paths.
2. Visualize attributes.
3. Visualize group structures in paths.
4. Rank paths.
5. Visualize topology context.
6. Compare paths.
7. Group paths.
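The first and fourth items, querying for paths and ranking them, can be sketched in a few lines of Python. This is a generic depth-first enumeration of simple paths with a length limit, then a ranking by length and by a summed node attribute; it is not Pathfinder’s implementation, and the graph and the per-node scores are hypothetical.

```python
def simple_paths(graph, source, target, max_edges):
    """Depth-first enumeration of simple paths with at most max_edges edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 >= max_edges:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:  # simple paths only: never revisit a node
                stack.append((nxt, path + [nxt]))


def rank_paths(paths, score):
    """Shortest paths first; ties broken by the highest summed node score."""
    return sorted(paths, key=lambda p: (len(p), -sum(score.get(n, 0.0) for n in p)))


# Hypothetical directed graph and per-node attribute (e.g. an expression change).
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["D"]}
score = {"B": 0.4, "C": 1.5, "E": 2.0}
for p in rank_paths(simple_paths(graph, "A", "D", max_edges=3), score):
    print(" -> ".join(p))
# A -> C -> D
# A -> B -> D
# A -> C -> E -> D
```

The length limit matters: the number of simple paths can explode in dense graphs, which is part of why scaling path analysis is hard.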
In addition to clever visualization and query strategies, the team always offers a nice intro video to give you a sense of what the tool can do for you. So the new Pathfinder video is our Video Tip of the Week.
The example used is the sets of authors on publications. But it’s easy to imagine signalling pathways, or some types of sequence variation pathways, or many other kinds of paths researchers need to represent. They have a use case example in the paper of KEGG pathways. In the video, there’s a quick look at a pathway that includes copy number variations and gene expression data as attributes that may be important for understanding the paths.
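To make the authorship example concrete: each paper links its co-authors, and a path between two authors is a chain of shared papers. Here’s a minimal Python sketch that builds that graph and finds one shortest connection with breadth-first search. It only illustrates the kind of graph Pathfinder works on; the author lists are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def coauthor_graph(publications):
    """Undirected graph: authors are nodes, shared papers create edges."""
    graph = defaultdict(set)
    for authors in publications:
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through predecessors
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(graph[node]):  # sorted for reproducible output
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical author lists, one per paper.
papers = [["Ada", "Ben"], ["Ben", "Cy", "Dee"], ["Dee", "Eve"], ["Ada", "Fay"]]
g = coauthor_graph(papers)
print(shortest_path(g, "Ada", "Eve"))  # ['Ada', 'Ben', 'Dee', 'Eve']
```

Swap authors for genes and papers for pathway interactions, and the same structure covers the signalling pathway case.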
Try it out. There’s a demo site available (linked below). Start thinking about how you could use Pathfinder to analyze data relevant to your research directions.
Hat tip to Alexander Lex for the notice of the new tool:
Christian Partl, Samuel Gratzl, Marc Streit, Anne Mai Wassermann, Hanspeter Pfister, Dieter Schmalstieg, & Alexander Lex (2016). Pathfinder: Visual Analysis of Paths in Graphs. Computer Graphics Forum (EuroVis ’16). In press.
Recently I highlighted a decision tree tool for experimental design. EDA, or Experimental Design Assistant, helps you plan your experiment: choose the appropriate groups and numbers you’ll need, set some variables, etc. This week’s video also offers decision trees, but these help you evaluate the data from your studies of interest instead. Branch is a web-based tool to help you test your hypotheses and develop models using data that’s available in a given data collection.
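What does a decision tree actually do with your data? At each node it picks the split that best separates the outcome classes. Here’s a minimal Python sketch of one such step, choosing a threshold on a single feature by weighted Gini impurity. It is a generic illustration, not necessarily how Branch builds its models, and the expression values and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Find the split (value <= t vs > t) that minimizes weighted Gini impurity."""
    best = (None, gini(labels))  # no split at all is the baseline
    for t in sorted(set(values))[:-1]:  # splitting at the max leaves one side empty
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Hypothetical data: expression of one gene and a responder/non-responder label.
expression = [1.2, 2.0, 2.3, 5.1, 6.4, 7.0]
response = ["no", "no", "no", "yes", "yes", "no"]
threshold, impurity = best_threshold(expression, response)
print(threshold)  # 2.3
```

A full tree repeats this over every candidate feature and recurses into each side, which is why a hypothesis like "this gene separates responders" can be checked quickly against a data collection.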
There’s a paper (linked below) with the backstory and information about how the tool works. But they’ve also done a nice series of videos to show you how to interact with the tools. The first one will be this week’s Video Tip of the Week. But be sure to check out the other ones for additional features as well. Each video tackles different aspects of the functionality that will help you to get the most from your explorations.
Try it out. You can use existing examples, or include your own data. You can keep your own data private, or make it available to share with others. Be sure to read their disclaimers, and think carefully before using data sets that raise privacy issues. But there are probably many publicly available data sets that could get you exploring some hypothesis around your topic of interest.
Hat tip to the author whose tweet sent me looking to investigate this:
Gangavarapu, K., Babji, V., Meißner, T., Su, A., & Good, B. (2016). Branch: an interactive, web-based tool for testing hypotheses and developing predictive models. Bioinformatics DOI: 10.1093/bioinformatics/btw117
One of the really persistent issues in genomics is how to get a list of things, handle a list of things, or find the overlap among the things. I think that was one of the most popular topics we dealt with in the early days of OpenHelix, but it’s still an issue that people need to handle in various ways. Some of the most interesting solutions have been various organism Venn diagrams, and the Rat Genome one is a classic, modeled here by Lior Pachter. I’m certain the need to list and organize genome features won’t go away. So when I saw that the RGD folks had another tool offering ways to do this, I put it right on my list of upcoming tips. Then the draft post got buried under a list of other things I had to do. But I wanted to get back to it, so here is their step-by-step guide to the OLGA tool they offer, as this week’s Video Tip of the Week.
OLGA is a straightforward list builder for rat, human and mouse genes or QTLs, or rat strains, using any (or all) of a variety of querying options. The new tutorial video will walk you through the process of querying the RGD database using OLGA, including:
how to perform a simple query in OLGA
how to further expand or filter your result set using additional criteria
how to change your query parameters on the fly to refine your result set
what options OLGA gives for analysis of your list once you have it.
You can get a list of items using various ontologies; if you want a specific type of receptor, for example, you can get a list of them. Or you can quickly create a list of genes in a certain genomic span. You can get the items that fall in a QTL. Or you can start with a list and get annotations. You can also look for overlaps among sets.
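Two of those list operations, "genes in a genomic span" and "overlaps among sets", come down to an interval test and set algebra. Here’s a minimal Python sketch of both. It only illustrates the logic; it doesn’t query RGD, and the gene names, coordinates, and the ontology-derived list are hypothetical.

```python
# Hypothetical gene coordinates (chromosome, start, end), 1-based inclusive.
genes = {
    "geneA": ("chr1", 1_000, 5_000),
    "geneB": ("chr1", 4_500, 9_000),
    "geneC": ("chr1", 20_000, 25_000),
    "geneD": ("chr2", 1_000, 3_000),
}


def genes_in_region(genes, chrom, start, end):
    """Genes overlapping chrom:start-end (inclusive coordinates)."""
    return {name for name, (c, s, e) in genes.items()
            if c == chrom and s <= end and e >= start}


# Hypothetical list from an ontology query (e.g. genes annotated to a receptor term).
receptor_genes = {"geneB", "geneC", "geneD"}

region_genes = genes_in_region(genes, "chr1", 4_000, 21_000)
print(sorted(region_genes & receptor_genes))  # in both: ['geneB', 'geneC']
print(sorted(region_genes | receptor_genes))  # in either: ['geneA', 'geneB', 'geneC', 'geneD']
print(sorted(region_genes - receptor_genes))  # region only: ['geneA']
```

Note that the overlap test keeps genes that only partly fall in the span (geneA and geneC here), which is usually what you want for a region query.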
The video is a nice walk-through of how to construct your query and what you can access. One key feature is that it’s not just rat data as you might expect at RGD. Mouse and human data are also available.
You can create complex and clever queries, and link to all sorts of related data in very easy steps. Have a look at their resources, and their other videos for more help with different aspects of their collections.
Shimoyama, M., De Pons, J., Hayman, G., Laulederkind, S., Liu, W., Nigam, R., Petri, V., Smith, J., Tutaj, M., Wang, S., Worthey, E., Dwinell, M., & Jacob, H. (2014). The Rat Genome Database 2015: genomic, phenotypic and environmental variations and disease. Nucleic Acids Research, 43 (D1) DOI: 10.1093/nar/gku1026
Most of the bioinformatics tools we examine are things that come into play downstream of an experiment. People wish to analyze their data, look at genes that popped up (or dropped down), visualize relationships, etc. So this week’s Video Tip tool is unusual–it’s software that helps people design the upstream pieces of their experiments.
Experimental Design Assistant is targeted at the proper design of animal research studies. Using animals carefully and responsibly includes designing experiments well, because experiments wasted through poor design are something researchers should want to avoid. It’s bad animal welfare practice, and it’s also expensive. The EDA folks describe this very nicely in a background piece linked on their site.
The 13-minute video is a nice overview of how the workflow will guide you. They recommend that you start with one of their templates that might be similar to your research goals, and edit that. In the video they show you how to start with either a blank canvas or a template. They illustrate how you can set up different groups of animals and denote some kind of pharmaceutical intervention or treatment; in the case they show, it’s different light cycles. You can establish doses or other variables as appropriate. Then you move on to a “Measurement” node. They demonstrate that only the right connections can be made in the diagrams, or you get warnings. Then an outcome node can be added. There’s a way to add numerous variables and other experimental details that need to be accounted for.
Other shorter tutorials cover other pieces, like critiquing your experiment, power calculation and randomization sequences, and exporting, importing and sharing the diagrams you create.
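The "only the right connections" behavior is easy to picture as a table of allowed links between node types, checked every time you draw an edge. Here’s a minimal Python sketch of that idea. The node types, the allowed pairs, and the example study are hypothetical stand-ins I chose for illustration; the EDA’s real rules are richer and are not reproduced here.

```python
# Hypothetical node types and allowed links; the EDA's real rules are richer.
ALLOWED_LINKS = {
    ("Experiment", "Group"),
    ("Group", "Intervention"),
    ("Intervention", "Measurement"),
    ("Group", "Measurement"),
    ("Measurement", "Outcome"),
}


def check_diagram(nodes, links):
    """Return a warning for every link whose node types are not an allowed pair."""
    warnings = []
    for src, dst in links:
        pair = (nodes[src], nodes[dst])
        if pair not in ALLOWED_LINKS:
            warnings.append(f"Cannot connect {src} ({pair[0]}) to {dst} ({pair[1]})")
    return warnings


# Hypothetical light-cycle study diagram: node name -> node type.
nodes = {
    "light study": "Experiment",
    "12h light": "Group",
    "24h light": "Group",
    "light cycle": "Intervention",
    "body weight": "Measurement",
    "weight gain": "Outcome",
}
links = [
    ("light study", "12h light"),
    ("light study", "24h light"),
    ("12h light", "light cycle"),
    ("24h light", "light cycle"),
    ("light cycle", "body weight"),
    ("body weight", "weight gain"),
    ("12h light", "weight gain"),  # skips the Measurement step
]
for w in check_diagram(nodes, links):
    print(w)  # Cannot connect 12h light (Group) to weight gain (Outcome)
```

Encoding the rules as data rather than scattered if-statements makes it easy to see, and extend, which design steps must come in which order.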
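For a sense of what a power calculation and a randomization sequence involve, here’s a minimal Python sketch. It uses the textbook normal-approximation sample size for comparing two group means, n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ², and a seeded, balanced shuffle for allocation. The EDA may use a different (for example t-based) calculation and its own randomization method; the animal IDs and group names here are hypothetical.

```python
import math
import random
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Animals per group for a two-sided, two-sample comparison of means,
    using the normal approximation n = 2 * (z_{1-alpha/2} + z_power)^2 * sd^2 / delta^2."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)  # round up: you can't use a fraction of an animal


def randomization_sequence(animal_ids, groups, seed):
    """Balanced random allocation: equal group sizes, order shuffled with a fixed seed."""
    if len(animal_ids) % len(groups):
        raise ValueError("number of animals must be a multiple of the number of groups")
    labels = groups * (len(animal_ids) // len(groups))
    rng = random.Random(seed)  # a fixed seed makes the sequence reproducible
    rng.shuffle(labels)
    return dict(zip(animal_ids, labels))


n = n_per_group(delta=1.0, sd=1.0)  # detect a 1 SD difference
print(n)  # 16
animals = [f"rat{i:02d}" for i in range(1, 2 * n + 1)]
allocation = randomization_sequence(animals, ["control", "treated"], seed=2016)
print(list(allocation.values()).count("treated"))  # 16
```

Halving the effect size you want to detect roughly quadruples the animals needed (63 per group for δ = 0.5 SD here), which is exactly the kind of trade-off worth seeing before the study starts.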
This is a different but really useful kind of biology software tool. I think it could be great in teaching situations as well. You should check it out.
This week’s video tip demonstrates a new feature at the UCSC Genome Browser. I think it’s kind of unusual, and conceptually took me a little while to get used to when I started testing it. So I wanted to go over the basics for you, and give you a couple of tips on things that I had to grok as I got used to this new visualization option.
Have you ever wished you could remove all of the intronic or intergenic regions from the Genome Browser display? Have you ever dreamed of being able to visualize multiple far-flung regions of a genome? Well, now you can with the new “multi-region” option in the Genome Browser!
I should probably start with the first thing that confused me–the name “multi-region”. I thought that I was going to be able to see maybe part of a region on chromosome 1, and something on chromosome 8, maybe at the same time. But that’s not how this works. In this case, you look at multiple regions along the same chromosome, with some of the intervening sequences snipped out. This creates a sort of “virtual chromosome” for you to interact with.
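One way to think about a "virtual chromosome": keep a list of regions, lay them end to end, and translate each genomic position into an offset along that concatenation. Here’s a minimal Python sketch of that coordinate mapping. It is my own illustration, not the Genome Browser’s implementation, and the region coordinates are hypothetical.

```python
from bisect import bisect_right


def build_virtual_chrom(regions):
    """Given sorted, non-overlapping (start, end) regions on one chromosome
    (0-based, half-open), return each region's offset and the total length."""
    offsets, total = [], 0
    for start, end in regions:
        offsets.append(total)
        total += end - start
    return offsets, total


def to_virtual(pos, regions, offsets):
    """Map a genomic position to its virtual coordinate, or None if snipped out."""
    i = bisect_right([s for s, _ in regions], pos) - 1  # last region starting <= pos
    if i >= 0 and regions[i][0] <= pos < regions[i][1]:
        return offsets[i] + (pos - regions[i][0])
    return None


# Hypothetical regions (e.g. exons) on one chromosome.
regions = [(100, 200), (500, 650), (1_000, 1_050)]
offsets, length = build_virtual_chrom(regions)
print(length)                             # 300
print(to_virtual(520, regions, offsets))  # 120
print(to_virtual(300, regions, offsets))  # None: falls in a snipped-out gap
```

The 1,050-bp span shrinks to 300 bp of displayed sequence, which is why the multi-region view can feel disorienting at first: neighboring pixels may be far apart in the real genome.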
In this week’s video, I’ll show you how that looks using the BRCA1 gene. First I show how you can look at all the exons together–with introns clipped out. And then I show how you can see the genes in the neighborhood displayed together, with the non-coding regions clipped out. These are 2 of the separate options for viewing.
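Showing "all the exons together" raises a practical detail: exons from different transcripts of the same gene often overlap, so they have to be merged into one set of non-overlapping regions before the introns can be clipped out. Here’s a minimal Python sketch of that interval merge. The exon coordinates are hypothetical, and the optional `padding` parameter is my own illustrative addition; check the User Guide for how the browser itself builds its regions.

```python
def merge_intervals(intervals, padding=0):
    """Union of (start, end) intervals, 0-based half-open, with optional padding
    on each side. Overlapping or touching intervals are combined into one region."""
    merged = []
    for start, end in sorted((max(0, s - padding), e + padding) for s, e in intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current region
        else:
            merged.append([start, end])  # start a new region
    return [tuple(iv) for iv in merged]


# Hypothetical exons from two transcripts of the same gene; some overlap.
exons = [(100, 200), (150, 260), (500, 650), (1_000, 1_050), (1_040, 1_100)]
print(merge_intervals(exons))              # [(100, 260), (500, 650), (1000, 1100)]
print(merge_intervals(exons, padding=50))  # [(50, 310), (450, 700), (950, 1150)]
```

The same merge works for the gene-neighborhood view: feed in whole gene spans instead of exons, and the intergenic gaps are what get clipped.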
I use the “View” menu option to illustrate this feature. But there is another way to access it–you can use the “multi-region” button in the browser buttons area.
To keep the video short, I didn’t go into every detail on this tool. You should check out the news announcement for it, and the link there to additional details in the User Guide documentation. The new feature is also mentioned briefly in the latest NAR paper on the UCSC Genome Browser (linked below). And you should try it out, of course! That’s the best way to really understand how it might help you visualize the regions of the genome you’re interested in.
As in the news announcement, thanks to the development team. I am always looking for new visualizations, and this was fun to test!
Thank you to Galt Barber, Matthew Speir, and the entire UCSC Genome Browser quality assurance team for all of their efforts in creating these exciting new display modes.
Speir, M., Zweig, A., Rosenbloom, K., Raney, B., Paten, B., Nejad, P., Lee, B., Learned, K., Karolchik, D., Hinrichs, A., Heitner, S., Harte, R., Haeussler, M., Guruvadoo, L., Fujita, P., Eisenhart, C., Diekhans, M., Clawson, H., Casper, J., Barber, G., Haussler, D., Kuhn, R., & Kent, W. (2016). The UCSC Genome Browser database: 2016 update. Nucleic Acids Research, 44 (D1) DOI: 10.1093/nar/gkv1275
Disclosure: UCSC Genome Browser tutorials are freely available because UCSC sponsors us to do training and outreach on the UCSC Genome Browser.