Over the years I’ve started to follow a lot of farmers on Twitter. It might sound odd to folks who are immersed in human genomics and disease. But I actually find the plant and animal genomics communities to be pushing tech faster and further into the hands of end-users than a lot of the clinical applications are at this point in time. And as #Plant16 rolls out to feed us, there was a lot of soybean chatter in my twittersphere.
So when SoyBase tweeted a reminder about some of their videos, I thought the timing was great. They have a YouTube channel for some videos to help users access the SoyBase data. And one of the tools they illustrate is CMap. Although we’ve touched on CMap a couple of times on the blog and in our training videos, we never featured it. It’s one of the GMOD family members that can offer you comparisons of different map coordinate data sets. But conceptually I think it’s a good idea for people to think about physical map vs sequence mapping data. And this video shows how you can examine these different representations at SoyBase.
Besides their software videos, though, SoyBase also links to a lot of other videos that help people to understand more about many aspects of soybean cultivation. Check out their wide range of topics on their Video Tutorials page. You never see how to use a two row harvester at human genomics databases, do you?
I didn’t expect to do another tip on the paths through experiments or data this week. But there must be something in the water cooler lately, and all of these different tools converged on my part of the bioinformatics ecosphere. As I was perusing my tweetdeck columns, a new tool from the folks who do the Caleydo projects offered more paths through data: Pathfinder, Visual Analysis of Paths in Graphs.
This new tool offers another way to look across relationships in data sets. Finding paths through data is only getting harder with every new data set we get, yet it keeps getting more important to pull in the characteristics of the alternate routes while still keeping the context of the overall picture. Scaling paths is hard. So the Caleydo team aims at several key aspects of the problem with their new Pathfinder tool. The full details are in the paper (cited below), but I’ll list the points for the features they deliver here:
1. Query for paths.
2. Visualize attributes.
3. Visualize group structures in paths.
4. Rank paths.
5. Visual topology context.
6. Compare paths.
7. Group paths.
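To make the first and fourth points (query for paths, rank paths) a bit more concrete, here is a minimal Python sketch. It is not how Pathfinder itself is implemented; the graph, the node weights, the `max_len` cutoff, and the function names `find_paths` and `rank_paths` are all made up for illustration. It simply enumerates simple paths between two nodes with a breadth-first search and then sorts them.

```python
from collections import deque


def find_paths(graph, start, end, max_len=4):
    """Return all simple paths from start to end with at most max_len edges."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            paths.append(path)
            continue
        if len(path) - 1 >= max_len:
            continue  # path is already at the edge limit, do not extend it
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                queue.append(path + [neighbor])
    return paths


def rank_paths(paths, weight):
    """Rank paths: fewest nodes first, then highest summed node weight."""
    return sorted(
        paths,
        key=lambda p: (len(p), -sum(weight.get(n, 0) for n in p)),
    )


# Hypothetical undirected graph (think co-authors or pathway members).
graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D", "E"},
    "D": {"B", "C", "F"},
    "E": {"C", "F"},
    "F": {"D", "E"},
}
# Hypothetical node attribute used for ranking (e.g. an expression score).
weight = {"A": 1, "B": 5, "C": 2, "D": 1, "E": 4, "F": 1}

paths = find_paths(graph, "A", "F")
ranked = rank_paths(paths, weight)
for p in ranked:
    print(" -> ".join(p))
```

Even on this toy graph the number of candidate paths grows quickly as `max_len` goes up, which is the scaling problem the ranking and grouping features are meant to tame.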
In addition to clever visualization and query strategies, the team always offers a nice intro video to give you a sense of what the tool can do for you. So the new video on Pathfinder is our Video Tip of the Week.
The example used is the sets of authors on publications. But it’s easy to imagine signalling pathways, or some types of sequence variation pathways, or many other kinds of paths researchers need to represent. They have a use case example in the paper of KEGG pathways. In the video, there’s a quick look at a pathway that includes copy number variations and gene expression data as attributes that may be important for understanding the paths.
Try it out. There’s a demo site available (linked below), so you can start to think about how you could use Pathfinder to analyze data that you are interested in for your research directions.
Hat tip to Alexander Lex for the notice of the new tool:
Christian Partl, Samuel Gratzl, Marc Streit, Anne Mai Wassermann, Hanspeter Pfister, Dieter Schmalstieg, & Alexander Lex (2016). Pathfinder: Visual Analysis of Paths in Graphs Computer Graphics Forum (EuroVis ’16) In press.
Recently I highlighted a decision tree tool for experimental design. EDA, or Experimental Design Assistant, helps you to plan your experiment, choose the appropriate groups and numbers you’ll need, set some variables, etc. This week’s video also offers decision trees, but these help you to evaluate the data from your studies of interest instead. Branch is a web-based tool to help you test your hypotheses and develop models using data that’s available in a given data collection.
There’s a paper (linked below) with the backstory and information about how the tool works. But they’ve also done a nice series of videos to show you how to interact with the tools. The first one will be this week’s Video Tip of the Week. But be sure to check out the other ones for additional features as well. Each video tackles different aspects of the functionality that will help you to get the most from your explorations.
Try it out. You can use existing examples, or include your own data. You can make your own data private, or make it available to share with others. Be sure to read their disclaimers and think carefully if you are using certain data sets that have privacy issues. But there are probably many publicly available data sets that could get you exploring some hypothesis around your topic of interest.
Hat tip to the author whose tweet sent me looking to investigate this:
Gangavarapu, K., Babji, V., Meißner, T., Su, A., & Good, B. (2016). Branch: an interactive, web-based tool for testing hypotheses and developing predictive models Bioinformatics DOI: 10.1093/bioinformatics/btw117
One of the really persistent issues in genomics is how to either get a list of things, or handle a list of things, or the overlap among the things. I think that was one of the most popular topics we dealt with in the early days of OpenHelix, but it’s still an issue that people need to handle in various ways. Some of the most interesting solutions have been various organism Venn diagrams, and the Rat Genome one is a classic, modeled here by Lior Pachter. I’m certain the need to list and organize genome features won’t go away. So when I saw that the RGD folks had another tool to offer ways to do this, I put it right in my list of upcoming tips. And then the draft post got buried under a list of other things I had to do. But I wanted to get back to it, so here is their step-by-step guide to the OLGA tool they offer, as this week’s Video Tip of the Week.
OLGA is a straightforward list builder for rat, human and mouse genes or QTLs, or rat strains, using any (or all) of a variety of querying options. The new tutorial video will walk you through the process of querying the RGD database using OLGA, including
how to perform a simple query in OLGA
how to further expand or filter your result set using additional criteria
how to change your query parameters on the fly to refine your result set
what options OLGA gives for analysis of your list once you have it.
You can get a list of items using various ontologies–maybe you want a specific type of receptor, for example, you can get a list of them. Or you can quickly create a list of genes in a certain genomic span. You can get the items that fall in a QTL. Or you can start with a list and get annotations. You can also look for overlaps among sets.
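If you want to see the underlying list logic spelled out, here is a small Python sketch of two of these operations: pulling the genes that fall in a genomic span, and finding the overlap between two lists. This is not OLGA’s code or the RGD API. The gene names, coordinates, and the `genes_in_span` helper are invented placeholders.

```python
def genes_in_span(genes, chrom, start, end):
    """Return names of genes on chrom that overlap the span [start, end]."""
    return {
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and g_start <= end and g_end >= start
    }


# Invented gene positions: name -> (chromosome, start, end).
genes = {
    "GeneA": ("chr1", 1000, 5000),
    "GeneB": ("chr1", 7000, 9000),
    "GeneC": ("chr2", 1500, 3000),
    "GeneD": ("chr1", 4500, 7500),
}

# List 1: everything in a span (for example, a QTL interval).
span_list = genes_in_span(genes, "chr1", 4000, 8000)

# List 2: an invented result of an ontology query (say, a receptor type).
ontology_list = {"GeneB", "GeneC", "GeneD"}

overlap = span_list & ontology_list        # in both lists
only_in_span = span_list - ontology_list   # in the span, not in the ontology list
combined = span_list | ontology_list       # union of the two lists

print(sorted(overlap))
```

The three set operators map directly onto the regions of a two-circle Venn diagram.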
The video is a nice walk-through of how to construct your query and what you can access. One key feature is that it’s not just rat data as you might expect at RGD. Mouse and human data are also available.
You can create complex and clever queries, and link to all sorts of related data in very easy steps. Have a look at their resources, and their other videos for more help with different aspects of their collections.
Shimoyama, M., De Pons, J., Hayman, G., Laulederkind, S., Liu, W., Nigam, R., Petri, V., Smith, J., Tutaj, M., Wang, S., Worthey, E., Dwinell, M., & Jacob, H. (2014). The Rat Genome Database 2015: genomic, phenotypic and environmental variations and disease Nucleic Acids Research, 43 (D1) DOI: 10.1093/nar/gku1026
Most of the bioinformatics tools we examine are things that come into play downstream of an experiment. People wish to analyze their data, look at genes that popped up (or dropped down), visualize relationships, etc. So this week’s Video Tip tool is unusual–it’s software that helps people design the upstream pieces of their experiments.
Experimental Design Assistant is targeted at the proper design of animal research studies. Using animals carefully and responsibly includes well-designed experiments, because wasted experiments due to poor design are something researchers should want to avoid. It’s bad animal welfare practice, and it’s also expensive. The EDA folks describe this very nicely on a background piece linked on their site.
The 13 minute video is a nice overview of how the workflow will guide you. They recommend that you start with some of their templates that might be similar to your research goals, and edit that. They show you how to start with a blank canvas or a template in the video. They illustrate how you can set up different groups of animals, denote some kind of pharmaceutical intervention or treatment–in the case they show it’s different light cycles. You can establish doses or other variables that are appropriate. Then you move on to a “Measurement” node. They demonstrate that only the right connections in the diagrams can be made, or you get warnings. Then an outcome node can be added. There’s a way to add numerous variables and other experimental details that need to be accounted for.
Other shorter tutorials cover other pieces–like critiquing your experiment, power calculation and randomization sequence, exporting/importing and sharing the diagrams you create.
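To give a feel for what a randomization sequence is, here is a minimal Python sketch that assigns animals to equally sized groups in a reproducible way. This is my own illustration, not the method the EDA uses. The animal IDs, group names, seed, and the `randomize` function are assumptions.

```python
import random
from collections import Counter


def randomize(animal_ids, groups, seed):
    """Assign animals to groups of equal size using a seeded shuffle."""
    if len(animal_ids) % len(groups) != 0:
        raise ValueError("number of animals must be a multiple of the number of groups")
    # Build one label per animal, with every group represented equally.
    labels = list(groups) * (len(animal_ids) // len(groups))
    rng = random.Random(seed)  # fixed seed so the sequence can be reproduced
    rng.shuffle(labels)
    return dict(zip(animal_ids, labels))


# Hypothetical study: 12 animals, three light-cycle conditions.
animals = [f"animal{i:02d}" for i in range(1, 13)]
groups = ["light_cycle_A", "light_cycle_B", "light_cycle_C"]

allocation = randomize(animals, groups, seed=2016)
group_sizes = Counter(allocation.values())

for animal, group in allocation.items():
    print(animal, group)
```

Recording the seed alongside the allocation means someone else can regenerate exactly the same sequence later.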
This is a different but really useful kind of biology software tool. I think it could be great in teaching situations as well. You should check it out.
This week’s video tip demonstrates a new feature at the UCSC Genome Browser. I think it’s kind of unusual, and conceptually took me a little while to get used to when I started testing it. So I wanted to go over the basics for you, and give you a couple of tips on things that I had to grok as I got used to this new visualization option.
Have you ever wished you could remove all of the intronic or intergenic regions from the Genome Browser display? Have you ever dreamed of being able to visualize multiple far-flung regions of a genome? Well, now you can with the new “multi-region” option in the Genome Browser!
I should probably start with the first thing that confused me–the name “multi-region”. I thought that I was going to be able to see maybe part of a region on chromosome 1, and something on chromosome 8, maybe at the same time. But that’s not how this works. In this case, you look at multiple regions along the same chromosome, with some of the intervening sequences snipped out. This creates a sort of “virtual chromosome” for you to interact with.
In this week’s video, I’ll show you how that looks using the BRCA1 gene. First I show how you can look at all the exons together–with introns clipped out. And then I show how you can see the genes in the neighborhood displayed together, with the non-coding regions clipped out. These are 2 of the separate options for viewing.
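The “virtual chromosome” idea is easier to grasp with a little arithmetic. Here is a hedged Python sketch of how exon-only coordinates could be computed. It is not the Genome Browser’s code. The exon coordinates are invented (they are not real BRCA1 positions), the intervals are assumed to be half-open and non-overlapping, and the function names are my own.

```python
def build_virtual_map(exons):
    """Lay exons end to end; return (segments, total_length).

    Each segment is (genomic_start, genomic_end, virtual_offset),
    using half-open intervals and assuming exons do not overlap.
    """
    segments = []
    offset = 0
    for start, end in sorted(exons):
        segments.append((start, end, offset))
        offset += end - start
    return segments, offset


def to_virtual(pos, segments):
    """Map a genomic position to its virtual position, or None if clipped out."""
    for start, end, offset in segments:
        if start <= pos < end:
            return offset + (pos - start)
    return None  # position lies in an intron or outside the gene


# Invented exon coordinates for illustration only.
exons = [(100, 200), (500, 650), (900, 1000)]
segments, total_length = build_virtual_map(exons)

print(total_length)               # combined length of the exons
print(to_virtual(500, segments))  # first base of the second exon
print(to_virtual(300, segments))  # intronic position, so it has no virtual coordinate
```

The span from 100 to 1000 covers 900 bases on the real chromosome but only 350 on the virtual one, which is why the exons suddenly fit on the screen together.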
I use the “View” menu option to illustrate this feature. But there is another way to access it–you can use the “multi-region” button in the browser buttons area.
To keep the video short, I didn’t go into every detail on this tool. You should check out the news announcement for it, and the link to the additional details in the User Guide documentation for more. The new feature is also mentioned briefly in the latest NAR paper on the UCSC Genome Browser (linked below). And you should try it out, of course! That’s the best way to really understand how it might help you to visualize regions of the genome that you might be interested in.
Also as in the news, thanks to the development team. I am always looking for new visualizations, and this was fun to test!
Thank you to Galt Barber, Matthew Speir, and the entire UCSC Genome Browser quality assurance team for all of their efforts in creating these exciting new display modes.
Speir, M., Zweig, A., Rosenbloom, K., Raney, B., Paten, B., Nejad, P., Lee, B., Learned, K., Karolchik, D., Hinrichs, A., Heitner, S., Harte, R., Haeussler, M., Guruvadoo, L., Fujita, P., Eisenhart, C., Diekhans, M., Clawson, H., Casper, J., Barber, G., Haussler, D., Kuhn, R., & Kent, W. (2016). The UCSC Genome Browser database: 2016 update Nucleic Acids Research, 44 (D1) DOI: 10.1093/nar/gkv1275
Disclosure: UCSC Genome Browser tutorials are freely available because UCSC sponsors us to do training and outreach on the UCSC Genome Browser.
The GenomeSpace site has been highlighted before in our “tips of the week”. We appreciated this site that pulled together a lot of different useful types of data sources and analysis strategies. On their site they describe their ethos as “Frictionless connection of bioinformatics tools”. Since that time (2012), it’s continued to grow and provide new features. So I was delighted to see that there was a new orientation video that they offered, and that is this week’s Video Tip of the Week.
Currently there are 20 tools connected in GenomeSpace, many more than when we first looked. These include mining, visualizing, and workflow tools. This intro video focuses on a couple of them, GenePattern for demonstrating workflow, and Cytoscape for visualization. But you can see how the others would help support many types of genomics analyses.
This overview talks about their “recipes” concept, with step-by-step analysis protocols, which can be found here: http://recipes.genomespace.org . And there’s a demo of the recipe resource. There are some “official” recipes from their team, but they definitely want to have people contributing their own as well. Towards the end of the video they describe how to do that (~28min).
The recipe used to illustrate these features includes a narrative description as well as the specific steps that would be employed. It contains the GenePattern and Cytoscape example steps that they use in the demo.
About half-way through, the demo of the analysis starts (~14min). It’s a helpful walk-through of how to use the recipes effectively to reproduce an analysis. Sara Garamszegi, our guide here, completes the pieces of the work that need to be done with GenePattern, and then shows how to pull out the file you generate from GenomeSpace for Cytoscape to use on your desktop.
There’s also a separate video of the question/answer section, so if you had some unresolved issues you might check if they were covered, or you can hear about how others might be considering using the tools. I often learn as much from the questions as from the formal presentation pieces. They have transcribed the issues in their video info section as well so you could just quickly scan them.
Follow them on Twitter for more like this, and you can also follow their YouTube channel:
Qu, K., Garamszegi, S., Wu, F., Thorvaldsdottir, H., Liefeld, T., Ocana, M., Borges-Rivera, D., Pochet, N., Robinson, J., Demchak, B., Hull, T., Ben-Artzi, G., Blankenberg, D., Barber, G., Lee, B., Kuhn, R., Nekrutenko, A., Segal, E., Ideker, T., Reich, M., Regev, A., Chang, H., & Mesirov, J. (2016). Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace Nature Methods, 13 (3), 245-247 DOI: 10.1038/nmeth.3732
The ISB is a professional organization for biocurators
At OpenHelix, we’ve long sung the praises of curators. Some of us have been curators and worked with curation and database development teams. All of us have relied on quality information in the databases for research and teaching. But I think there are a lot of people who don’t understand the value of quality curation, how it’s done, and who curators are. They are widely taken for granted.
A recent talk by Claire O’Donovan of EMBL-EBI helps to explain the roles and the importance of biocurators. So although this talk isn’t a typical software talk, I think understanding this is crucial to everyone’s appreciation of how information you rely on gets into the databases you use. And if you find yourself in situations where you are guiding students, knowing about this career is also worthwhile.
Claire O’Donovan has had a front row seat to the development of this field, and has great enthusiasm for the future. And going forward, in your doctor’s office as precision medicine and treatments become a thing–how much do you want correct information in the databases? Mining data, standardizing language for descriptions of features, and sharing this information is crucial for all of us.
Here’s what’s covered in this video, from the agenda slide:
Introduction to the concept of biocuration.
The different kinds of biocurators, and the skill set needed.
Our community: Biocuration Society and conference.
The future of biocuration and career paths.
Specific examples of what curators do are illustrated (~6:30min). A sample UniProt entry illustrates what kind of information is captured and where it appears. She also touches on their work with Gene Ontology. And a bit about the ecosystem of curation, how teams at different resources help each other but don’t wish to duplicate work, using HGNC nomenclature as an example.
About 8min, the skill sets for biocuration are covered: data basics, curation skills, programming and database concepts, ontologies, and usability of the data collected. This also includes data access and management, as well as dissemination and outreach. This includes user training (yay!) and the concepts of data analysis for users.
There’s no formal degree path for curation practitioners at this point, and different groups will have different needs. But the community is beginning to think about this, and about professional qualifications. She also mentioned a recent report from the National Academy of Sciences press on the topic of the future workforce skills and needs (linked below). This is an alternative career route for people with science training, and it’s important to understand not only the science but computational pieces. And it should be taken seriously as a discipline. There is now a journal that reflects this (also linked below).
Claire also takes a look at the future of biocuration, using the Centre for Therapeutic Target Validation (CTTV) as an example. And she talks about the importance of quality information in medical records as we increasingly have genomic details in diagnosis and treatment situations. If we want precision medicine to work, we have to have the precise and correct information in the databases. So respect and value the curators. They are worth it. And if you know anyone that deserves special recognition, nominate!
Nominate your favorite @biocurator 2016 Career Award for sustained contributions to biocuration, email: email@example.com, due March 20
Committee on Future Career Opportunities and Educational Requirements for Digital Curation (2015). Preparing the Workforce for Digital Curation National Academies Press DOI: 10.17226/18590
Holliday, G., Bairoch, A., Bagos, P., Chatonnet, A., Craik, D., Finn, R., Henrissat, B., Landsman, D., Manning, G., Nagano, N., O’Donovan, C., Pruitt, K., Rawlings, N., Saier, M., Sowdhamini, R., Spedding, M., Srinivasan, N., Vriend, G., Babbitt, P., & Bateman, A. (2015). Key challenges for the creation and maintenance of specialist protein resources Proteins: Structure, Function, and Bioinformatics, 83 (6), 1005-1013 DOI: 10.1002/prot.24803
Gaudet, P., Munoz-Torres, M., Robinson-Rechavi, M., Attwood, T., Bateman, A., Cherry, J., Kania, R., O’Donovan, C., & Yamasaki, C. (2013). DATABASE, The Journal of Biological Databases and Curation, is now the official journal of the International Society for Biocuration Database, 2013 DOI: 10.1093/database/bat077
Lately, though, I’m seeing more resources that are offering JBrowse in addition to their existing GBrowse installations. Increasingly the announcements are coming out that GBrowse support is ending and the move to JBrowse is underway. And sometimes people with new genomes to release are going right to JBrowse, as I noticed with this one not too long ago: YersiniaBase (http://www.ncbi.nlm.nih.gov/pubmed/25591325).
This week, though, for our video tip we’ll have a look at the Araport video that was recently released that examines how to search JBrowse with short sequences, using their installation. If you are new to Araport, you can see a previous tip we did on the basics: Video Tip of the Week: Araport, Arabidopsis Portal. Like the other GMOD tools, a group with a given focus–in this case Arabidopsis, but it could be any species or topic–can take the basic framework for JBrowse and customize it to serve the needs of the research community. So although this is a plant genomics site, the foundations of the JBrowse software will be the same and you can expect similar features at other sites.
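If you are curious what a short-sequence search has to do under the hood, here is a minimal Python sketch that scans a reference for a query on both strands. It is not JBrowse’s implementation. The toy reference, the query, and the function names are my own assumptions, and only exact matches of A, C, G, and T are handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sequence(reference, query):
    """Find exact matches of query on both strands of reference.

    Returns sorted (start, end, strand) tuples with 0-based, half-open
    coordinates on the forward strand. A query that equals its own
    reverse complement is reported once per strand.
    """
    reference = reference.upper()
    query = query.upper()
    hits = []
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        start = reference.find(pattern)
        while start != -1:
            hits.append((start, start + len(pattern), strand))
            start = reference.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


# Toy reference sequence, not real Arabidopsis data.
reference = "TTGACCAGGTCAAT"
hits = find_sequence(reference, "GACC")
print(hits)
```

A real browser also has to cope with whole chromosomes, ambiguity codes, and often pattern syntax, so treat this as the bare idea only.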
So start getting comfortable with JBrowse, as you’ll be seeing more and more of it as new species and research focus groups choose this as their browser. In fact, as I was preparing this tip yesterday, the new LotusBase JBrowse came along:
References: Krishnakumar, V., Hanlon, M., Contrino, S., Ferlanti, E., Karamycheva, S., Kim, M., Rosen, B., Cheng, C., Moreira, W., Mock, S., Stubbs, J., Sullivan, J., Krampis, K., Miller, J., Micklem, G., Vaughn, M., & Town, C. (2014). Araport: the Arabidopsis Information Portal Nucleic Acids Research, 43 (D1) DOI: 10.1093/nar/gku1200
Skinner, M., Uzilov, A., Stein, L., Mungall, C., & Holmes, I. (2009). JBrowse: A next-generation genome browser Genome Research, 19 (9), 1630-1638 DOI: 10.1101/gr.094607.109
For quite a while I’ve been watching the development of ContentMine. There have been a number of different ways to text-mine the scientific literature over the years. Most of the efforts I’m familiar with aim at a specific subset of the literature. This could be species-specific mining, topic-specific (such as interaction data, or a field like cancer or virology), to extract gene-related tidbits, and so on. Sometimes the tools have been limited to abstracts which are publicly available, which would miss much of the knowledge that’s embedded in the actual papers and lately the extraordinary “supplemental” sections–which are making me crazy because much of the key information I need on software tools is buried deep within those. But the philosophy of ContentMine is to go big across the entire realm of scientific publication–as they describe in their “about” page:
To make this a reality we are building software and training resources so that together we can liberate 100,000,000 facts from the scientific literature.
And they want to make all of this available to you, so you can pull out the subset that’s useful to your research. You can learn about their philosophy and strategies from this video, as well as some of the specific tasks that they have been working on to get to the point where people could use their resources and tools to extract information.
One of the things that always worried me about mining was how much of the information in images and tables and supplements wasn’t available. But they are also tackling this, as the video explains.
The reason this floated to the top of my “blog drafts” list, though, was because of this great and current example of using their resources for an emerging public health issue. They’ve got a sample video of accessing information related to the Zika virus that they’ve just released. I think it’s a nice concrete demonstration of how ContentMine can be quickly deployed on a topic to pull out relevant research details.
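At its very simplest, pulling topic-specific details out of a pile of papers starts with spotting terms of interest in text. The Python sketch below shows only that bare idea with a dictionary of terms and regular expressions. It is not ContentMine’s software or pipeline. The two text snippets are placeholders I wrote rather than quotes from any paper, and `count_terms` is a hypothetical helper.

```python
import re
from collections import Counter


def count_terms(documents, terms):
    """Count case-insensitive whole-word matches of each term across documents."""
    counts = Counter()
    for text in documents.values():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            matches = re.findall(pattern, text, flags=re.IGNORECASE)
            if matches:
                counts[term] += len(matches)
    return counts


# Placeholder snippets standing in for article text.
documents = {
    "doc1": "Zika virus was detected in Aedes aegypti mosquitoes. Zika RNA persisted.",
    "doc2": "Aedes aegypti is a vector for several flaviviruses.",
}
terms = ["Zika", "Aedes aegypti", "microcephaly"]

counts = count_terms(documents, terms)
for term in terms:
    print(term, counts[term])
```

Going from term counts like these to reusable facts, and doing it for tables, images, and supplements, is the hard part that their tools are built for.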
So have a look at their project. There are details about specific tools that have also been written about, linked below. And there are more videos from their YouTube and Vimeo collections that can help you to learn more. Some are longer, and some are more specific for a task. There’s a lot more information at their site as well. They are eager to help people get the most out of the literature. You should have a look and see how it can help you, and maybe how you can help them.
Smith-Unna, R., & Murray-Rust, P. (2014). The ContentMine Scraping Stack: Literature-scale Content Mining with Community-maintained Collections of Declarative Scrapers D-Lib Magazine, 20 (11/12) DOI: 10.1045/november14-smith-unna
Murray-Rust, P., Smith-Unna, R., & Mounce, R. (2014). AMI-diagram: Mining Facts from Images D-Lib Magazine, 20 (11/12) DOI: 10.1045/november14-murray-rust