I have a soft spot for FlyBase. My Ph.D. work used Drosophila, and I’ve used Drosophila species to teach since then. Something about Dipteran genetics fascinates me. FlyBase is also one of the older genetics and genomics databases, and we’ve got a tutorial on it. Today’s tip is their 12-minute video, FlyBase for Undergrads. I have always believed that the databases and analysis tools we train on and come across in our daily work are excellent places for undergraduates to learn genomics. They hold lots of data and analyses that would make very interesting undergraduate projects and experiences.
Today’s video starts off with a kind of silly live-action sequence :D, but fun silly, and walks through FlyBase on an introductory level. Check it out.
They have a YouTube channel with two additional (and much shorter) videos: one on using TermLink, a controlled vocabulary search tool, and one introducing Fast-Track, a community paper curation tool.
NCBI was created in 1988 and has maintained the GenBank database for years. They also provide many computational resources and data retrieval systems for many types of biological data. As such they know all too well how quickly the data that biologists collect has changed and expanded. As uses for various data types have been developed, it has become obvious that new types of information (such as expanded metadata) need to be collected, and new ways of handling data are required.
NCBI has been adapting to such needs throughout the years and recently has been adapting its genome resources. Today’s tip will be based on some of those changes. My video will focus on the “completely redesigned Genome site”, which was recently rolled out and announced in the most recent NCBI newsletter. I haven’t found a publication describing the changes, but the newsletter goes into some detail and the announcement found at the top of the Genome site (& that I point out in the video) has very helpful details about the changes.
As you will see in the announcement, the Genome resource is not the only related resource to have undergone changes recently, including the redesign of the Genome Project resource into the BioProject resource and the creation of the BioSample resource. I won’t have time to go into detail about those two resources but at the end of my post I will link to two recent NCBI publications that came out in Nucleic Acids Research this month – these are good resources to read for more information on BioProject, BioSample, and on the NCBI as a whole. For a historical perspective I also link to the original Genome reference, which is in Bioinformatics and currently free to access.
Some of the changes are very interesting, including that “Single genome records now represent an organism and not a genome for one isolate.” The NCBI newsletter states that “Major improvements include a more natural organization at the level of the organism for prokaryotic, eukaryotic, and viral genomes. Reports include information about the availability of nuclear or prokaryotic primary genomes as well as organelles and plasmids.” There’s also a note that “Because of the reorganization to a natural classification system, older genome identifiers are no longer valid. Typically these genome identifiers were not exposed in the previous system and were used mainly for programmatic access.” That makes me wonder what changes this will require in other NCBI resources, as well as in external resources. I haven’t seen any announcements on that yet, so I’ll just have to stay tuned & check around often.
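Since the old genome identifiers were mostly used for programmatic access, one way to avoid hard-coding them is to look records up by organism name instead. Here is a minimal sketch that only builds an NCBI E-utilities ESearch URL. The function name `genome_esearch_url` is my own invention, and I’m assuming the Entrez database is named `genome` and accepts an `[Organism]` field tag, so check the E-utilities documentation before relying on it.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def genome_esearch_url(organism, retmax=5):
    """Build an ESearch URL that looks up genome records by organism name.

    Hypothetical helper: the "genome" database name and the [Organism]
    field tag are assumptions, not something taken from the announcement.
    """
    params = {
        "db": "genome",                   # assumed Entrez database name
        "term": f"{organism}[Organism]",  # search by name, not by old ID
        "retmax": retmax,                 # maximum number of IDs returned
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(genome_esearch_url("Drosophila melanogaster"))
```

The sketch stops at building the URL on purpose. Fetching it and parsing the returned ID list is left out, because the response layout is something you should confirm against the live service.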
Enjoy the tip & let us, or NCBI, know what you think of their changes!
Historic Entrez Genome reference: Tatusova, T., Karsch-Mizrachi, I., & Ostell, J. (1999). Complete genomes in WWW Entrez: data representation and analysis. Bioinformatics, 15 (7), 536-543 DOI: 10.1093/bioinformatics/15.7.536
Barrett, T., Clark, K., Gevorgyan, R., Gorelenkov, V., Gribov, E., Karsch-Mizrachi, I., Kimelman, M., Pruitt, K., Resenchuk, S., Tatusova, T., Yaschenko, E., & Ostell, J. (2011). BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata. Nucleic Acids Research DOI: 10.1093/nar/gkr1163
Sayers, E., Barrett, T., Benson, D., Bolton, E., Bryant, S., Canese, K., Chetvernin, V., Church, D., DiCuccio, M., Federhen, S., Feolo, M., Fingerman, I., Geer, L., Helmberg, W., Kapustin, Y., Krasnov, S., Landsman, D., Lipman, D., Lu, Z., Madden, T., Madej, T., Maglott, D., Marchler-Bauer, A., Miller, V., Karsch-Mizrachi, I., Ostell, J., Panchenko, A., Phan, L., Pruitt, K., Schuler, G., Sequeira, E., Sherry, S., Shumway, M., Sirotkin, K., Slotta, D., Souvorov, A., Starchenko, G., Tatusova, T., Wagner, L., Wang, Y., Wilbur, W., Yaschenko, E., & Ye, J. (2011). Database resources of the National Center for Biotechnology Information. Nucleic Acids Research DOI: 10.1093/nar/gkr1184
Today’s tip will revisit the PHOSIDA database and redo a search that was done in the tip from 2009, this time using a protein search instead of a category search.
Gnad, F., Gunawardena, J., & Mann, M. (2010). PHOSIDA 2011: the posttranslational modification database. Nucleic Acids Research, 39 (Database) DOI: 10.1093/nar/gkq1159
MycoCosm is a fungal genomics database and browser at JGI, home of a lot of great resources. This week’s tip comes from their own video tips. MycoCosm includes browsers of annotated data for many fungal genomes, KEGG pathway data, synteny data and much more. Their list of video tips includes an introduction to the resource, a functional annotation browser tip and a protein page intro. Here is the one to get you started on the browser (unfortunately, there is no embed capability, so the image here will take you to the tutorial). These video tips are longer than our usual 5-minute tips, ranging from 6 minutes (intro) to just over 30 (browser).
microRNAs have become a rich area of research, as they probably have a huge effect on gene expression and disease. The human genome may encode over 1,000 miRNAs that target over half of our genes. They might be implicated in a lot of common diseases (which have not yet been picked up in GWAS?). They are a fascinating area of biology that has only come into its own in the last decade. As such, the number of databases cataloging miRNAs is large. Today’s tip is on a new one, RepTar, which is reported in the upcoming NAR database issue. The niche RepTar is attempting to fill is to make miRNA target predictions more comprehensive by including new research in the algorithm. This new research suggests there are more possible target sites than previously thought. As mentioned in the article,
Recently, the miRNA binding options were expanded further with the identification of ‘centered sites’, functional miRNA target sites that lack both perfect seed pairing and 3′-compensatory pairing and instead exhibit pairing with the target along 11–12 contiguous pairs at the center of the miRNA (4). While some algorithms relaxed the evolutionary conservation criterion (5–11) and/or offer also predictions of 3′-compensatory sites [e.g. (6,12,13)], few databases offer predictions of the whole repertoire of miRNA targeting patterns. Furthermore to date, no database lists genome-wide prediction of cellular targets of viral miRNAs. These miRNAs lack significant evolutionary conservation and their targets are not necessarily expected to be evolutionarily conserved. In addition, the few identified viral miRNA targets have shown both conventional seed binding and 3′-compensatory binding [e.g. (3,14)].
Here we present a database of genome-wide miRNA target predictions for mouse and human genes, based on the predictions of our novel target prediction algorithm, RepTar
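To make the ‘centered site’ idea from the quote a bit more concrete, here is a toy sketch. It is not RepTar’s algorithm. It just scans an already aligned miRNA/target pair for a run of contiguous Watson-Crick pairs and asks whether that run is at least 11 pairs long without covering the seed. The function names, the exclusion of G:U wobble pairs, and the definition of the seed as miRNA nucleotides 2-7 are all my assumptions.

```python
# Watson-Crick pairs only (no G:U wobble): a simplifying assumption.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def longest_paired_run(mirna, target_3to5):
    """Return (start, length) of the longest contiguous Watson-Crick run.

    mirna is written 5'->3'; target_3to5 is the target site written 3'->5',
    so index i of one string faces index i of the other.
    start is a 0-based index into the miRNA.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, (m, t) in enumerate(zip(mirna.upper(), target_3to5.upper())):
        if (m, t) in WC_PAIRS:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def looks_like_centered_site(mirna, target_3to5, min_pairs=11):
    """Toy check: a long paired run that does not span the seed (nt 2-7)."""
    start, length = longest_paired_run(mirna, target_3to5)
    # Seed nt 2-7 are 0-based indices 1..6 (assumed seed definition).
    covers_seed = start <= 1 and start + length >= 7
    return length >= min_pairs and not covers_seed
```

A real predictor would also have to search along the whole 3′ UTR, handle bulges and wobble pairs, and score the sites, which is exactly the kind of judgment I’m leaving to the miRNA researchers.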
I’ll leave the predictive value up to miRNA researchers, but I thought I’d introduce the site.
While I’m at it, allow me to list a few other miRNA sites from labs and institutes as far-flung as China, Italy, Israel, Canada and the U.S. Perhaps someday I’ll do a comparison.
Elefant, N., Berger, A., Shein, H., Hofree, M., Margalit, H., & Altuvia, Y. (2010). RepTar: a database of predicted cellular targets of host and viral miRNAs Nucleic Acids Research DOI: 10.1093/nar/gkq1233
Enter the NCBI Image database. This very new database includes over 3 million images that are found in the full-text resources (i.e. PubMed Central) at NCBI. So, I did a search for “drosophila phylogeny” and found some great images and figures. The results will not only pull out the figure, but also the figure legend. I got over 200 results. The links in the search result figure titles take you directly to the figure. Below the legend you can see links to the full text. It’s a great start to searching figures and figure legends.
Of course, as stated, not all articles will have images in the database, only those deposited in PubMed Central. You’ll find a lot of your searches won’t have this image strip because the journal isn’t deposited there. But with 3 million images and more journal articles going to PMC every day, this database and feature of PubMed could prove to be quite useful.
Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…
We had a tip last week on converting genome coordinates using Galaxy. This week I’d like to introduce you to the Galaxy interface. This screencast was actually done by one of the developers of Galaxy and is a quick introduction to the interface. We are currently working with Galaxy on a longer introduction to the tool, but I thought I’d give you a taste of it here. Galaxy is an excellent analysis tool. It’s not a database, but rather a tool for analyzing data you obtain from other sources. It also lets you save your workflows and offers many other tools that help you analyze and collaborate. (If the movie to the left doesn’t load, try this link to view the movie.)
Comprehensive tutorials on the ASTD, Entrez Protein, and MMDB databases enable researchers to quickly and effectively use these invaluable variation resources.
Seattle, WA September 24, 2008 - OpenHelix today announced the availability of new tutorial suites on the Alternative Splicing and Transcript Diversity (ASTD) database, Entrez Protein and the Molecular Modeling Database (MMDB). ASTD is a European Bioinformatics Institute (EBI) resource for alternative splice events and transcripts for the human, mouse, and rat systems. Entrez Protein is a comprehensive database of protein information brought to you by the National Center for Biotechnology Information (NCBI). MMDB is another NCBI resource, which contains an extensive collection of three-dimensional protein structures with detailed annotation that can be used to learn about the structure and function of many proteins. Together these three tutorials give the researcher an excellent set of resources to carry their research from transcript to 3D protein structure.
The tutorial suites, available for single purchase or through a low-priced yearly subscription to all OpenHelix tutorials, contain a narrated, self-run, online tutorial, slides with full script, handouts and exercises. With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. These tutorials will teach users:
to perform Quick and Advanced searches
to navigate gene and transcript report pages
to predict intron/exon boundaries and likely regulatory protein binding sites
to search manually curated data regarding alternate splicing
to perform basic and advanced searches utilizing the many available tools and options
to understand the protein records and exploit the many internal and external links you are provided with
to explore some of the resources provided by the NCBI network of databases, such as “My NCBI”
to search MMDB using both basic and advanced query techniques
to understand the detailed results you obtain
to visualize and manipulate structures using NCBI’s Cn3D structural viewer
OpenHelix, LLC, provides the genomics knowledge you need when you need it. OpenHelix currently provides online self-run tutorials and on-site training for institutions and companies on the most powerful and popular free, web based, publicly accessible bioinformatics resources. In addition, OpenHelix is contracted by resource providers to provide comprehensive, long-term training and outreach programs.
Comprehensive tutorials on the model organism databases ZFIN, SGD and PlantGDB and GBrowse, a model organism genome browser, enable researchers to quickly and effectively use these invaluable resources.
Seattle, WA September 15, 2008 - OpenHelix today announced the availability of new tutorial suites on several model organism resources, including the Zebrafish Information Network (ZFIN), the Saccharomyces Genome Database (SGD) and the Plant Genome Database (PlantGDB), as well as a tutorial on using genome browsers with GBrowse. These four tutorials expand OpenHelix’s model organism database training, which now also includes tutorials on MGI (mouse), FlyBase (Drosophila), Gramene (grasses), RGD (rat) and WormBase, with more to come soon. Model organisms are integral to our understanding of basic biology and modern biomedical research. ZFIN is a collection of data, tools, and resources on the zebrafish (Danio rerio), a popular model organism for developmental biology and genetics research, and SGD is a collection of data, tools and analyses centered around Saccharomyces cerevisiae, commonly known as bakers’ or budding yeast. PlantGDB is the primary resource for plant comparative genomics.
Additionally, OpenHelix has added a tutorial on GBrowse, a web application that allows you to explore genomic sequences together with annotated data. GBrowse is rapidly becoming the genomic browser of choice amongst model organism databases, because the browser is both universal and yet customizable.
The tutorial suites, available for single purchase or through a low-priced yearly subscription to all OpenHelix tutorials, contain a narrated, self-run, online tutorial, slides, handouts and exercises. With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. These tutorials will teach users:
to perform effective searches and understand the displays
to access advanced searches enabling multifaceted queries
to use the various databases of genes and markers, expression data, mutant genotype/phenotype details, ontologies, and more
to investigate many related resources associated with ZFIN
to navigate the SGD site, locate Basic and Advanced Search options, and use the site map to access additional search tools
to perform the two Basic SGD Quick and Text Search types and understand the displays
to navigate the SGD Locus Page and access data from a variety of tools, tabs, and links
to investigate many related resources associated with SGD
to perform quick searches and navigate sequence pages
to conduct BLAST searches across several plant species of your choice
to create exon/intron gene predictions and sequence alignments
to construct tables displaying highly varied information from many datasets
the basic layout and search methods at GBrowse
how to access detailed annotation data tied to genomic sequences
how to select and customize annotations using Tracks
how to upload and incorporate your own data or other external data sources
how to take a tour of different GBrowse installations at model organism databases