VISTA has added a couple of new features to their great comparative genomics resource: dot plots and a synteny browser. They are excellent additions, but they are not yet easy to find from the homepage. In today’s tip I’m going to show you where they are and take a quick look at what they do. If you want to look at comparative genomics and synteny, you’ll want to check out these features. They are linked from VISTA-Point (which was added early last year), which you can learn more about in the open-access tutorial here. To learn more about what they do and how to use them, check out the VISTA help section linked here.
As you may know, we’ve been doing tips-of-the-week for three years now. We have completed around 150 little tidbit introductions to various resources. At the end of the year we’ve established a sort of holiday tradition: we are doing a summary post to collect them all. If you have missed any of them it’s a great way to have a quick look at what might be useful to your work.
July 7: Mint for Protein Interactions, an introduction to MINT to study protein-protein interactions
July 14: Introduction to Changes to NCBI’s Protein Database, as it states
July 21: 1000 Genome Project Browser, 1000 Genomes project has pilot data out, this is the browser.
July 28: R Genetics at Galaxy, the Galaxy analysis and workflow tool added R genetics analysis tools.
August 4: YeastMine, SGD adds an InterMine capability to their database search.
August 11: Gaggle Genome Browser, a tool to allow for the visualization of genomic data, part of the “gaggle components”
August 18: Brenda, comprehensive enzyme information.
August 25: Mouse Genomic Pathology, unlike other tips, this is not a video but rather a detailed introduction to a new website.
September 1: Galaxy Pages, an introduction to the new community documentation and sharing capability at Galaxy.
September 8: Varitas. A Plaid Database. A resource that integrates human variation data such as SNPs and CNVs.
September 15: CircuitsDB for TF/miRNA/gene regulation networks.
September 21: Pathcase for pathway data.
September 29: Comparative Toxicogenomics Database (CTD), VennViewer. A new tool to create Venn diagrams to compare associated datasets for genes, diseases or chemicals.
October 6: BioExtract Server, a server that allows researchers to store data, analyze data and create workflows of data.
October 13: NCBI Epigenomics, “Beyond the Genome” NCBI’s site for information and data on epigenetics.
October 20: Comparing Microbial Databases including IMG, UCSC Microbial and Archaeal browsers, CMR and others.
October 27: iTOL, interactive tree of life
November 3: VISTA Enhancer Browser explore possible regulatory elements with comparative genomics
November 10: Getting canonical gene info from the UCSC Browser. Need one gene version to ‘rule them all’?
November 17: ENCODE Data in the UCSC Genome Browser, an entire 35-minute tutorial on the ENCODE project.
November 24: FLink. A tool that links items in one NCBI database to another in a meaningful and weighted manner.
LBL servers are down today. If you’ve been trying to use VISTA (the comparative genomics tools), you will get a “server not found” error. We’ve contacted someone over there and were informed they were having hardware issues. The site will be available within a day or so.
In the meantime, you can view VISTA enhancers at the UCSC Genome Browser.
UPDATE (by Mary, Tuesday morning ET): Site appears to be back up for me http://genome.lbl.gov/vista/index.shtml Go compare sequences! Or browse enhancers!
At the recent (and excellent) Beyond the Genome 2010 conference, Len Pennacchio gave a talk about the VISTA Enhancer Browser that reminded me how much I have always liked this project. It’s the kind of project I’d do if I had a lab: it takes the computational data we’ve been accumulating + developmental biology bench techniques = cool new insights into the function of conserved regions of the genome that we previously didn’t know much about.
The foundation of the project is that we’ve got genomic sequences from a number of species that we can compare. The VISTA suite offers a number of ways to perform these types of comparative genomics analyses and provides really nice visualizations of the data (we’ve got a free tutorial sponsored by VISTA that you can watch to see how it works). You can see peaks of high conservation across multiple species, which suggest there’s something important going on in that region. But when they are outside of the gene region per se, it’s not always obvious what the sequence represents. The idea is that they may be cis-regulatory elements. So the Enhancer Browser team clones out those regions and hooks them to reporter constructs. The constructs are placed into mouse oocytes and then put into pseudopregnant mice, and the embryos are examined on day 11 to see if there is an interesting pattern of expression of the reporter construct. Now, these are subject to limitations: they examine only one time point, so earlier or later activity is not known. And it’s possible that integration of the construct has affected expression (in positive or negative ways). But they examine multiple embryos for each construct to work around that location effect.
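To make the "peaks of high conservation" idea concrete, here is a minimal sketch of a sliding-window percent-identity scan over a pairwise alignment. It is only an illustration of the general technique, not VISTA's actual code; the function names, the window size and the 70% threshold are all assumptions chosen for the example.

```python
def window_identity(aligned_a, aligned_b, window=100, step=1):
    """Percent identity in each sliding window of a gapped pairwise alignment.

    Both inputs are rows of the same alignment, so they must have equal
    length. Gap characters ("-") never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(0, len(aligned_a) - window + 1, step):
        pairs = zip(aligned_a[start:start + window],
                    aligned_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        scores.append(100.0 * matches / window)
    return scores


def conserved_windows(scores, threshold=70.0):
    """Indices of windows whose identity meets the (assumed) threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


# Toy alignment: the two ends match, the middle does not.
toy_scores = window_identity("ACGTACGTAC", "ACGTTTTTAC", window=4)
toy_peaks = conserved_windows(toy_scores, threshold=70.0)
```

On the toy alignment the windows at both ends clear the threshold while the diverged middle does not, which is the same shape as a conservation plot with two peaks and a valley between them.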
This data is accumulated and becomes available in the Enhancer Browser. You can search by genes of interest to see if a region near your favorite gene has been examined. Or you can examine them by tissue/localization pattern if there’s a developmental time point you may be interested in. To get a quick sense of the kind of things you can find take a look at the handy Gallery set of images. There are various ways to search or browse the data. That’s what I’ll be introducing in the Tip movie this week.
But they also “enhanced” this project by adding another technique to the process. Beyond the computational identification of conserved regions, they also began to do ChIP-Seq to pull down sequences that are bound to the p300 protein in various tissues of the embryo. That’s illustrated nicely in Figure 1 of the second paper. They obtain the sequence of those pieces and put those into eggs as well, and the rest of the process is similar. So the starting point is different: these are protein-bound sequences to start with, from a given tissue. But it also seems to be identifying working elements that can influence spatial and temporal expression of the reporter constructs. They say it has increased their success in finding working elements by 5x to 16x.
So I think this is a great way to use computational techniques and bench work in a pretty-big-data way. It’s not easy to do the mouse benchwork part so it’s not quite as big as a pure sequencing foray. But it’s exactly the kind of project I’d design if I had access to a lab. I have a different topic I’d be interested in, but the same kinds of strategies would be useful for that as well.
Anyway–explore the Enhancer Browser to learn more about these possible regulatory elements.
Visel, A., Blow, M., Li, Z., Zhang, T., Akiyama, J., Holt, A., Plajzer-Frick, I., Shoukry, M., Wright, C., Chen, F., Afzal, V., Ren, B., Rubin, E., & Pennacchio, L. (2009). ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature, 457 (7231), 854-858. DOI: 10.1038/nature07730
Visel, A., Minovitsky, S., Dubchak, I., & Pennacchio, L. (2007). VISTA Enhancer Browser: a database of tissue-specific human enhancers. Nucleic Acids Research, 35 (Database). DOI: 10.1093/nar/gkl822
Sorry for the sparseness of late. We were all over the place doing UCSC Genome Browser (we do intro + advanced), ENCODE, and Galaxy workshops. At NIH we also did IMG and VISTA (Man, that security at NIH is fierce….). Trey is still on the road, in fact, doing the training in Morocco.
Ok, you couldn’t be there…but all of those trainings are available on our web site right now, except for ENCODE. The same material we do in the online materials is what we do in workshops. The only one of those that requires a subscription is IMG. And you won’t find ENCODE as a stand-alone tutorial yet, but that’s coming. We have now sent the script to the studio and we’ll be assembling that soon.
I do want to mention one thing that we think is interesting, and we see in almost every training we do. Nearly every time, more than half of the attendees at our trainings are female. Based on what you read about women falling out of the pipeline in science, you’d think there would be no way we’d even get 50%. But generally it is more than half women in these trainings. (We have the data if anyone can think of a way we can use that to get a grant.)
Our current theory is that women are more likely to admit they could use the training (something like asking for directions…you know…?). Or do men prefer documentation? We don’t know. What’s your theory?
Free Tutorial Suite available from OpenHelix on the VISTA bioinformatics resource.
Bainbridge Island, WA (PRWEB) March 1, 2010 – The Lawrence Berkeley National Laboratory (LBNL) announces an updated, free OpenHelix™ tutorial suite on the VISTA bioinformatics resources (http://genome.lbl.gov/vista).
VISTA’s focus is comparative genomics, providing a comprehensive suite of programs and databases for biomedical researchers. VISTA includes excellent tools to examine genomic sequences—coding, non-coding, and important regulatory regions like transcription factor binding sites.
VISTA recently added VISTA Point which combines capabilities of the three tools currently available at the site – VISTA Gateway, VISTA Browser, and Text Browser. VISTA Point makes analyzing multiple and pairwise genome alignments and extracting relevant numerical data much more straightforward.
“As we continually update and improve the VISTA suite of tools, it is critical we give users the training they need to use the tools efficiently and effectively,” said Inna Dubchak, Principal Investigator for the VISTA project. “Our history with OpenHelix has proven that their tutorial suite is an excellent and cost-effective method for us to provide that training.”
The online narrated tutorial (www.openhelix.com/vista), which runs in just about any browser, can be viewed from beginning to end or navigated using chapters and forward and backward sliders. The approximately 60-minute tutorial highlights and explains the features and functionality needed to start using VISTA effectively. The tutorial can be used by new users to introduce them to VISTA, or by previous users to view new features and functionality or simply as a reference tool to understand specific features.
In addition to the tutorial, VISTA users can also access useful training materials including the animated PowerPoint slides used as a basis for the tutorial, suggested script for the slides, slide handouts, and exercises. This can save teachers and professors a tremendous amount of time and effort when creating classroom content.
In addition to the VISTA tutorial, OpenHelix offers nearly 90 tutorial suites on some of the most powerful and popular bioinformatics and genomics tools available on the web. Some of the tutorial suites are freely available through support from the resource providers. The whole catalog of tutorial suites is available through a subscription. Users can view the tutorials and download the free materials at www.openhelix.com.
About VISTA and LBNL
The VISTA family of tools has been developed and is hosted at the Genomics Division of Lawrence Berkeley National Laboratory. This project was originally supported by the Programs for Genomic Applications grant from the NHLBI/NIH and is currently supported by the Office of Biological and Environmental Research, Office of Science, US Department of Energy.
Lawrence Berkeley National Laboratory (Berkeley Lab) has been a leader in science and engineering research for more than 70 years. Located on a 200-acre site in the hills above the University of California’s Berkeley campus, adjacent to the San Francisco Bay, Berkeley Lab holds the distinction of being the oldest of the U.S. Department of Energy’s National Laboratories. The Lab is managed by the University of California, operating with an annual budget of more than $500 million and a staff of about 3,800 employees, including more than 500 students.
OpenHelix, LLC, (www.openhelix.com) provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web. The OpenHelix Search portal searches hundreds of resources, tutorial suites and other material to direct researchers to the most relevant resources and OpenHelix training materials for their needs. Researchers and institutions can save time, budget and staff resources by leveraging a subscription to nearly 100 online tutorial suites available through the portal. More efficient use of the most relevant resources means quicker and more effective research.
Our first guest post in our new semi-regular Guest Post series is from Inna Dubchak, principal investigator at the LBNL/JGI group, developers of the VISTA comparative genomics resource (who sponsors a tutorial, free to the users). If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.
I would like to give you a heads up on some new VISTA updates and ongoing development!
Updates: As you probably know from this blog, a new, still free VISTA tutorial is available now. We have introduced a lot of updates to these tools - built new programs, improved the existing ones, and entirely changed the design of the site to make it more up-to-date and convenient.
The main addition to the site, VISTA Point, combines the capabilities of the three tools currently available at the site (VISTA Gateway, VISTA Browser, and Text Browser), which are usually used step by step. VISTA Point makes analyzing multiple and pairwise genome alignments and extracting relevant numerical data much more straightforward, and it is easy to update, expand and add new programs to.
Soon: We are actively working on visualizing synteny at scales ranging from whole-genome alignment to the conservation of individual genes, with seamless navigation across different levels of resolution. In our upcoming VISTA-Dot tool we used the concept of two-dimensional “dot-plots”, historically employed in the analysis of local alignment, and an interactive Google-map-like interface to visualize whole-genome alignments. You will be able to get a display and analyze large-scale duplication in plants in one click! It can also be useful in genome assembly and finishing. Another addition coming in the near future, VISTA Synteny Viewer, presents a novel interface as three cross-navigable panels representing different scales of the alignment.
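For readers who have not met dot plots before, the concept mentioned above can be sketched in a few lines. This is a toy word-match dot plot, not the upcoming VISTA-Dot tool, which works from whole-genome alignments; the function names and the word length are hypothetical choices for the illustration.

```python
def dot_plot(seq_x, seq_y, word=3):
    """Return (i, j) pairs where seq_x[i:i+word] equals seq_y[j:j+word]."""
    # Index every word of seq_y by its start position.
    index = {}
    for j in range(len(seq_y) - word + 1):
        index.setdefault(seq_y[j:j + word], []).append(j)
    # Look up every word of seq_x and record a dot for each shared word.
    dots = []
    for i in range(len(seq_x) - word + 1):
        for j in index.get(seq_x[i:i + word], []):
            dots.append((i, j))
    return dots


def render(dots, width, height):
    """Draw the dots as text: seq_x runs across, seq_y runs down."""
    grid = [["." for _ in range(width)] for _ in range(height)]
    for i, j in dots:
        grid[j][i] = "*"
    return "\n".join("".join(row) for row in grid)


# seq_x contains two copies of seq_y, i.e. a tandem duplication.
toy_dots = dot_plot("ACGTACGT", "ACGT", word=3)
toy_picture = render(toy_dots, width=6, height=2)
```

The duplicated segment shows up as two parallel diagonals in the picture, which is the kind of signature you would look for when scanning a whole-genome plot for large-scale duplication.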
Attention: do not forget to use our whole-genome capabilities: Whole-genome VISTA to align sequences of any quality, from draft to finished, up to 10 Mb long, and Whole Genome rVISTA to evaluate which transcription factor binding sites (TFBS) are over-represented in upstream regions in a group of genes.
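As a rough illustration of what "over-represented" can mean in this setting, one common approach is a one-sided hypergeometric test: given how often a binding site occurs upstream of all genes, how surprising is its count in your gene group? The post does not say which statistic Whole Genome rVISTA uses, so treat the sketch below as an assumption; the function name and the numbers are made up.

```python
from math import comb


def enrichment_p_value(population, population_hits, sample, sample_hits):
    """One-sided hypergeometric tail P(X >= sample_hits).

    population      -- all genes whose upstream regions were scanned
    population_hits -- genes in that set with the binding site upstream
    sample          -- size of the gene group of interest
    sample_hits     -- genes in the group with the binding site upstream
    """
    upper = min(sample, population_hits)
    total = comb(population, sample)
    tail = sum(
        comb(population_hits, k) * comb(population - population_hits, sample - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes overall, 5 carry the site, and all 5 genes
# in the group of interest carry it.
p_all_hits = enrichment_p_value(10, 5, 5, 5)
p_no_hits_needed = enrichment_p_value(10, 5, 5, 0)
```

A small p-value suggests the site turns up in the group more often than chance alone would explain. In practice you would also correct for testing many binding sites at once.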
Greetings! OpenHelix Blog is instituting a new semi-regular feature. Every Wednesday we have our “Tip of the Week,” on Thursdays we have our “What’s Your Problem,” and now on occasional Tuesdays we are going to have our “Provider Guest Post.” These will be posts from providers of genomics tools and databases and will cover opinions, updates and upcoming features of the resource, whatever the provider of the resource would like to convey to users. We have several lined up for the coming weeks, so keep checking back.
Additionally, if you are a developer or provider of a free, publicly available genomics or biological resource, database or analysis tool and would like to post in our guest feature, be it an introduction to your tool, updates or upcoming features or even an opinion about the current state of genomics research and data, please write us at wlathe AT openhelix DOT com. We would love to put you in the queue for the next guest post.
Our first guest post next Tuesday will be from Inna Dubchak, principal investigator at the LBNL/JGI group, developers of the VISTA comparative genomics resource (who sponsors a tutorial, free to the users). She’ll discuss some new tools at VISTA and give you a quick preview of some new upcoming features.
The VISTA comparative genome analysis resource updated their interface a few months ago. Additionally, they’ve added VISTA-Point (which replaces and greatly extends VISTA text browser) which, as the site says, allows the user to:
Access complete data and visual presentation of pairwise and multiple alignments of whole genome assemblies.
The homepage has undergone a very nice redesign. Much of the functionality and use of the underlying VISTA Browser and the other tools is similar (though updated, of course). We also understand that there will be upcoming updates to some tools and the addition of others. Look for that here :D.
Also, we’ve updated our tutorial to reflect the new site and functions. As before, this tutorial is free to users and sponsored by VISTA. Check it out.
So I’m all excited about the genome festival that I’m seeing, related to the publication of the new sequence version of corn. You can access the main paper in Science, and there’s a very neat diagram in figure 1 that is like looking across time at the sequence data and into the corn nebula. But the thing that cracked me up was this line from the abstract:
Nearly 85% of the genome is composed of hundreds of families of transposable elements, dispersed nonuniformly across the genome.
That means 85% of corn isn’t corn!! And what business do those elements have messing with the genomes?? I am told all the time that messing with plant genomes is wrong and unnatural. Heh.
For full coverage of the big news today I’ll point you to James and the Giant Corn (appropriately enough) who seems to be the CNN (Corn News Network) of 24-hour coverage of many aspects of the work.
I spent my morning looking over the PLoS Maize Special Collection papers, including the intriguing appetizer: 10 Reasons to be Tantalized by the B73 Maize Genome. But I spent longer looking at the CNVs and PAVs paper. I’ve been thinking about CNVs a lot lately, and was interested to see this covered in a non-mammalian species.
Figure 1 is a nice example of how to use VISTA for effective displays in comparative genomics. (If you haven’t used VISTA before you might check out our sponsored free tutorial on that–we are currently working with the VISTA team to update that with their new features too.)
There’s a really striking segment of chromosome 6 that appears to be present in one of the strains they examine and absent in the other (illustrated in figure 4). And it looks like it has genes that are expressed and active in the B73 strain. The ongoing investigation of that is pretty intriguing as well.
The structural variations are not evenly distributed across the genomes. Some places have large occurrences, and some are untouched. It’s clear that just in these two strains there’s a lot more structural diversity than in other species that have been examined:
In the human, rat, dog, mouse, macaque and chimpanzee genomes the average number of CNVs between two individuals is between 15 and 75. A high resolution study of eight human genomes revealed only several hundred insertions and deletions, including CNV and PAV sequences, in the comparison of any two human genomes. In contrast, even after very stringent filtering we identified >3,700 CNV or PAV sequences that represent at least 2,000 events between these two maize genomes.
Emphasis mine. Plants are so much more flexible, apparently….
This is going to lead to some neat clues on heterosis (or hybrid vigor) as the research proceeds with these new tools. What a great time to be a plant scientist. There are some very exciting projects coming along with the tools of genomics.
What I couldn’t locate was any reference to a CNV database (like DGV or CHOP CNV) where you can examine the whole set. I’ll dig through the supplement data to see if I can find out more on that. But I wanted to get this post out to celebrate the very nice work and collection of papers on this project. Congrats to the teams involved!
Springer, N., Ying, K., Fu, Y., Ji, T., Yeh, C., Jia, Y., Wu, W., Richmond, T., Kitzman, J., Rosenbaum, H., Iniguez, A., Barbazuk, W., Jeddeloh, J., Nettleton, D., & Schnable, P. (2009). Maize Inbreds Exhibit High Levels of Copy Number Variation (CNV) and Presence/Absence Variation (PAV) in Genome Content. PLoS Genetics, 5 (11). DOI: 10.1371/journal.pgen.1000734
Schnable, P., Ware, D., Fulton, R., Stein, J., Wei, F., Pasternak, S., Liang, C., Zhang, J., Fulton, L., Graves, T., Minx, P., Reily, A., Courtney, L., Kruchowski, S., Tomlinson, C., Strong, C., Delehaunty, K., Fronick, C., Courtney, B., Rock, S., Belter, E., Du, F., Kim, K., Abbott, R., Cotton, M., Levy, A., Marchetto, P., Ochoa, K., Jackson, S., Gillam, B., Chen, W., Yan, L., Higginbotham, J., Cardenas, M., Waligorski, J., Applebaum, E., Phelps, L., Falcone, J., Kanchi, K., Thane, T., Scimone, A., Thane, N., Henke, J., Wang, T., Ruppert, J., Shah, N., Rotter, K., Hodges, J., Ingenthron, E., Cordes, M., Kohlberg, S., Sgro, J., Delgado, B., Mead, K., Chinwalla, A., Leonard, S., Crouse, K., Collura, K., Kudrna, D., Currie, J., He, R., Angelova, A., Rajasekar, S., Mueller, T., Lomeli, R., Scara, G., Ko, A., Delaney, K., Wissotski, M., Lopez, G., Campos, D., Braidotti, M., Ashley, E., Golser, W., Kim, H., Lee, S., Lin, J., Dujmic, Z., Kim, W., Talag, J., Zuccolo, A., Fan, C., Sebastian, A., Kramer, M., Spiegel, L., Nascimento, L., Zutavern, T., Miller, B., Ambroise, C., Muller, S., Spooner, W., Narechania, A., Ren, L., Wei, S., Kumari, S., Faga, B., Levy, M., McMahan, L., Van Buren, P., Vaughn, M., Ying, K., Yeh, C., Emrich, S., Jia, Y., Kalyanaraman, A., Hsia, A., Barbazuk, W., Baucom, R., Brutnell, T., Carpita, N., Chaparro, C., Chia, J., Deragon, J., Estill, J., Fu, Y., Jeddeloh, J., Han, Y., Lee, H., Li, P., Lisch, D., Liu, S., Liu, Z., Nagel, D., McCann, M., SanMiguel, P., Myers, A., Nettleton, D., Nguyen, J., Penning, B., Ponnala, L., Schneider, K., Schwartz, D., Sharma, A., Soderlund, C., Springer, N., Sun, Q., Wang, H., Waterman, M., Westerman, R., Wolfgruber, T., Yang, L., Yu, Y., Zhang, L., Zhou, S., Zhu, Q., Bennetzen, J., Dawe, R., Jiang, J., Jiang, N., Presting, G., Wessler, S., Aluru, S., Martienssen, R., Clifton, S., McCombie, W., Wing, R., & Wilson, R. (2009). 
The B73 Maize Genome: Complexity, Diversity, and Dynamics. Science, 326 (5956), 1112-1115. DOI: 10.1126/science.1178534