Tag Archives: database

Tip of the Week: FlyBase

I have a soft spot for FlyBase. My Ph.D. work used Drosophila, and I’ve used Drosophila species to teach since then. Something about Dipteran genetics fascinates me. FlyBase is also one of the older genetics and genomics databases, and we’ve got a tutorial on it. Today’s tip is their 12-minute video of FlyBase for Undergrads. I have always believed that the databases and analysis tools we train on and come across in our daily work are excellent places for teaching and learning genomics at the undergraduate level. They hold lots of data and analyses that would make very interesting projects and experiences for an undergraduate.

Today’s video starts off with a kind of silly live-action sequence :D, but fun silly, and walks through FlyBase on an introductory level. Check it out.

They have a YouTube channel with two additional (and much shorter) videos: one on using TermLink, a controlled vocabulary search tool, and one introducing Fast-Track, a community paper curation tool.


Video Tip of the Week: Big Changes to NCBI’s Genome Resources

NCBI was created in 1988 and has maintained the GenBank database for years. They also provide many computational resources and data retrieval systems for many types of biological data. As such they know all too well how quickly the data that biologists collect has changed and expanded. As uses for various data types have been developed, it has become obvious that new types of information (such as expanded metadata) need to be collected, and new ways of handling data are required.

NCBI has been adapting to such needs throughout the years and recently has been adapting its genome resources. Today’s tip will be based on some of those changes. My video will focus on the “completely redesigned Genome site”, which was recently rolled out and announced in the most recent NCBI newsletter. I haven’t found a publication describing the changes, but the newsletter goes into some detail and the announcement found at the top of the Genome site (& that I point out in the video) has very helpful details about the changes.

As you will see in the announcement, the Genome resource is not the only related resource to have undergone changes recently, including the redesign of the Genome Project resource into the BioProject resource and the creation of the BioSample resource. I won’t have time to go into detail about those two resources but at the end of my post I will link to two recent NCBI publications that came out in Nucleic Acids Research this month – these are good resources to read for more information on BioProject, BioSample, and on the NCBI as a whole. For a historical perspective I also link to the original Genome reference, which is in Bioinformatics and currently free to access.

Some of the changes are very interesting, including that “Single genome records now represent an organism and not a genome for one isolate.” The NCBI newsletter states that “Major improvements include a more natural organization at the level of the organism for prokaryotic, eukaryotic, and viral genomes. Reports include information about the availability of nuclear or prokaryotic primary genomes as well as organelles and plasmids.” There’s also a note that “Because of the reorganization to a natural classification system, older genome identifiers are no longer valid. Typically these genome identifiers were not exposed in the previous system and were used mainly for programmatic access.” That makes me wonder what changes this will require in other NCBI resources, as well as in external resources. I haven’t seen any announcements on that yet, so I’ll just have to stay tuned & check around often.
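For anyone whose scripts stored the older genome identifiers, one way to look up current records by organism name is NCBI’s E-utilities search service. The snippet below is only a rough sketch that builds such a query URL without sending it. The database name "genome", the [Organism] field tag, and the helper name build_genome_search_url are my assumptions for illustration and are not taken from the NCBI announcement, so check the current E-utilities documentation before relying on them.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_genome_search_url(organism, retmax=5):
    """Return an esearch URL that looks up Genome records by organism name.

    Hypothetical helper. Assumptions: the Entrez database is named
    "genome" and accepts the [Organism] field tag.
    """
    params = {
        "db": "genome",                   # assumed database name
        "term": f"{organism}[Organism]",  # restrict the search to organism names
        "retmode": "json",                # ask for a JSON response
        "retmax": retmax,                 # cap the number of identifiers returned
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Build (but do not send) a query for the fruit fly.
url = build_genome_search_url("Drosophila melanogaster")
print(url)
```

Fetching that URL should return a list of identifiers under the reorganized, organism-level scheme, which could then replace identifiers saved from the old system.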

Enjoy the tip & let us, or NCBI, know what you think of their changes! :)

Quick Links:

NCBI Homepage: http://www.ncbi.nlm.nih.gov/

Entrez Genome Resource Homepage: http://www.ncbi.nlm.nih.gov/genome

BioProject Resource Homepage: http://www.ncbi.nlm.nih.gov/bioproject/


Historic Entrez Genome reference: Tatusova, T., Karsch-Mizrachi, I., & Ostell, J. (1999). Complete genomes in WWW Entrez: data representation and analysis Bioinformatics, 15 (7), 536-543 DOI: 10.1093/bioinformatics/15.7.536

Barrett, T., Clark, K., Gevorgyan, R., Gorelenkov, V., Gribov, E., Karsch-Mizrachi, I., Kimelman, M., Pruitt, K., Resenchuk, S., Tatusova, T., Yaschenko, E., & Ostell, J. (2011). BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata Nucleic Acids Research DOI: 10.1093/nar/gkr1163

Sayers, E., Barrett, T., Benson, D., Bolton, E., Bryant, S., Canese, K., Chetvernin, V., Church, D., DiCuccio, M., Federhen, S., Feolo, M., Fingerman, I., Geer, L., Helmberg, W., Kapustin, Y., Krasnov, S., Landsman, D., Lipman, D., Lu, Z., Madden, T., Madej, T., Maglott, D., Marchler-Bauer, A., Miller, V., Karsch-Mizrachi, I., Ostell, J., Panchenko, A., Phan, L., Pruitt, K., Schuler, G., Sequeira, E., Sherry, S., Shumway, M., Sirotkin, K., Slotta, D., Souvorov, A., Starchenko, G., Tatusova, T., Wagner, L., Wang, Y., Wilbur, W., Yaschenko, E., & Ye, J. (2011). Database resources of the National Center for Biotechnology Information Nucleic Acids Research DOI: 10.1093/nar/gkr1184

Video Tip of the Week: Phosida, a post-translational modification database

Over 2 years ago I did a tip of the week on Phosida (links to Phosida). Phosida is a database of phosphorylation, acetylation, and N-glycosylation data. Since the last tip, Phosida has undergone significant growth and some changes, including the addition of much more data (80,000 phosphorylation, acetylation and N-glycosylated sites from 9 different species) and tools (prediction and motif analysis). You can read more about those changes in this year’s NAR database issue article.


Today’s tip will revisit the database and redo a search that was done in the tip from 2009, this time using a protein search instead of a category search.
Gnad, F., Gunawardena, J., & Mann, M. (2010). PHOSIDA 2011: the posttranslational modification database Nucleic Acids Research, 39 (Database) DOI: 10.1093/nar/gkq1159

Quick link to Phosida: http://www.phosida.com/

Below the fold you’ll find the text of the last tip of the week for more information:


Tip of the Week: MycoCosm

MycoCosm is a fungal genomics database and browser at JGI, home of a lot of great resources. This week’s tip is from their video tips, which are useful. MycoCosm includes browsers of annotated data for many fungal genomes, KEGG pathway data, synteny data and much more. Their list of video tips includes an introduction to the resource, a functional annotation browser tip and a protein page intro. Here is the one to get you started on the browser (unfortunately, there is no embed capability, so the image here will take you to the tutorial). These video tips are longer than our usual 5-minute tips, ranging from 6 minutes (intro) to just over 30 (browser).

Tip of the Week: RepTar, a database of miRNA target sites

microRNAs have become a rich area of research, as they probably have a huge effect on gene expression and disease. The human genome may encode over 1,000 miRNAs that target over half of our genes. They might be implicated in a lot of common diseases (which have not yet been picked up in GWAS studies?). They are a fascinating area of biology that has only come into its own in the last decade. As such, the number of databases that catalog miRNAs is large. Today’s tip is on a new one, RepTar, which is reported in the upcoming NAR database issue. The niche RepTar is attempting to fill is making miRNA target predictions more comprehensive by including new research in the algorithm. This new research suggests there are more possible target sites than previously thought. As mentioned in the article,

Recently, the miRNA binding options were expanded further with the identification of ‘centered sites’, functional miRNA target sites that lack both perfect seed pairing and 3′-compensatory pairing and instead exhibit pairing with the target along 11–12 contiguous pairs at the center of the miRNA (4). While some algorithms relaxed the evolutionary conservation criterion (5–11) and/or offer also predictions of 3′-compensatory sites [e.g. (6,12,13)], few databases offer predictions of the whole repertoire of miRNA targeting patterns. Furthermore to date, no database lists genome-wide prediction of cellular targets of viral miRNAs. These miRNAs lack significant evolutionary conservation and their targets are not necessarily expected to be evolutionarily conserved. In addition, the few identified viral miRNA targets have shown both conventional seed binding and 3′-compensatory binding [e.g. (3,14)].

Here we present a database of genome-wide miRNA target predictions for mouse and human genes, based on the predictions of our novel target prediction algorithm, RepTar

I’ll leave the predictive value up to miRNA researchers, but I thought I’d introduce the site.

While I’m at it, allow me to list a few other miRNA sites from labs and institutes as far-flung as China, Italy, Israel, Canada and the U.S. Perhaps someday I’ll do a comparison.

CircuitsDB, which Jennifer did a great tip of the week tutorial on.

miRBase, which we have a full-length tutorial on.
PicTar, which has an annotation track for the UCSC Genome Browser.
PuTmiR (in relation to transcription factors)

Two lists to catch some others: http://mirnablog.com/microrna-target-prediction-tools/ and http://www.ncrna.org/KnowledgeBase/link-database/mirna_target_database

Elefant, N., Berger, A., Shein, H., Hofree, M., Margalit, H., & Altuvia, Y. (2010). RepTar: a database of predicted cellular targets of host and viral miRNAs Nucleic Acids Research DOI: 10.1093/nar/gkq1233

New NCBI Image Database

Mary brought up a paper just recently about what we are missing when data mining papers: Figures and figure legends.

Enter the NCBI Image database. This very new database includes over 3 million images that are found in the full-text resources (i.e. PubMed Central) at NCBI. So, I did a search for “drosophila phylogeny” and found some great images and figures. The results will not only pull out the figure, but also the figure legend. I got over 200 results. The links in the search result figure titles take you directly to the figure. Below the legend you can see links to the full text. It’s a great start to searching figures and figure legends.

Along with this, PubMed search results are now enhanced with images from this database (if, remember, the article is in the full-text resources, though over time a lot of research published with NIH funding will go there, won’t it?). For example, go to this abstract for the paper “Text mining and manual curation of the chemical-gene-disease networks for the comparative toxicogenomics database.” Scroll down just a bit and you’ll see the figures from this paper, which have been deposited in the NCBI Image database. You can go directly to the link to all the figures or to the papers.

Of course, as stated, not all articles will have images in the database, only those deposited in PubMed Central. You’ll find that a lot of your searches won’t have this image strip because the journal isn’t deposited there. But with 3 million images and more journal articles going to PMC every day, this database and feature of PubMed could prove to be quite useful.

Hattip: APD at CTD :)

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Tip of the Week: Galaxy intro

We had a tip last week on converting genome coordinates using Galaxy. This week I’d like to introduce you to the Galaxy interface. This screencast was actually done by one of the developers of Galaxy and is a quick introduction to the interface. We are currently working with Galaxy on a longer introduction to the tool, but I thought I’d give you a taste of it here. Galaxy is an excellent analysis tool. It’s not a database, but rather a tool for analyzing data you obtain from other sources. It lets you save your workflows and offers many other tools that help you analyze and collaborate. (If the movie to the left doesn’t load, try this link to view the movie.)


New and Updated Online Tutorials for ASTD, Entrez Protein and MMDB

Comprehensive tutorials on the ASTD, Entrez Protein, and MMDB databases enable researchers to quickly and effectively use these invaluable variation resources.

Seattle, WA, September 24, 2008 - OpenHelix today announced the availability of new tutorial suites on the Alternative Splicing and Transcript Diversity (ASTD) database, Entrez Protein and the Molecular Modeling Database (MMDB). ASTD is a European Bioinformatics Institute (EBI) resource for alternative splice events and transcripts for the human, mouse, and rat systems. Entrez Protein is a comprehensive database of protein information brought to you by the National Center for Biotechnology Information (NCBI). MMDB is another NCBI resource, which contains an extensive collection of three-dimensional protein structures with detailed annotation that can be used to learn about the structure and function of many proteins. Together these three tutorials give the researcher an excellent set of resources to carry their research from transcript to 3D protein structure.

The tutorial suites, available for single purchase or through a low-priced yearly subscription to all OpenHelix tutorials, contain a narrated, self-run, online tutorial, slides with full script, handouts and exercises. With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. These tutorials will teach users:


ASTD

  • to perform Quick and Advanced searches
  • to navigate gene and transcript report pages
  • to predict intron/exon boundaries and likely regulatory protein binding sites
  • to search manually curated data regarding alternate splicing

Entrez Protein

  • to perform basic and advanced searches utilizing the many available tools and options
  • to understand the protein records and exploit the many internal and external links you are provided with
  • to explore some of the resources provided by the NCBI network of databases, such as “My NCBI”


MMDB

  • to search MMDB using both basic and advanced query techniques
  • to understand the detailed results you obtain
  • to visualize and manipulate structures using NCBI’s Cn3D structural viewer
  • to locate and view structurally aligned homologs

To find out more about these and other tutorial suites, visit the OpenHelix Tutorial Catalog and OpenHelix, or visit the OpenHelix Blog for up-to-date information on genomics.

About OpenHelix
OpenHelix, LLC, provides the genomics knowledge you need when you need it. OpenHelix currently provides online self-run tutorials and on-site training for institutions and companies on the most powerful and popular free, web based, publicly accessible bioinformatics resources. In addition, OpenHelix is contracted by resource providers to provide comprehensive, long-term training and outreach programs.

New Online Tutorials on ZFIN, SGD, PlantGDB and GBrowse Resources

Comprehensive tutorials on the model organism databases ZFIN, SGD and PlantGDB and GBrowse, a model organism genome browser, enable researchers to quickly and effectively use these invaluable resources.

Seattle, WA, September 15, 2008 - OpenHelix today announced the availability of new tutorial suites on several model organism resources, including the Zebrafish Information Network (ZFIN), the Saccharomyces Genome Database (SGD) and the Plant Genome Database (PlantGDB), along with a tutorial on using genome browsers with GBrowse. These four tutorials expand OpenHelix’s model organism database training, which now also includes tutorials on MGI (mouse), FlyBase (Drosophila), Gramene (grasses), RGD (rat) and WormBase, with more to come soon. Model organisms are integral to our understanding of basic biology and modern biomedical research. ZFIN is a collection of data, tools, and resources on the zebrafish (Danio rerio), a popular model organism for developmental biology and genetics research, and SGD is a collection of data, tools and analyses centered around Saccharomyces cerevisiae, commonly known as bakers’ or budding yeast. PlantGDB is the primary resource for plant comparative genomics.

Additionally, OpenHelix has added a tutorial on GBrowse, a web application that allows you to explore genomic sequences together with annotated data. GBrowse is rapidly becoming the genomic browser of choice amongst model organism databases, because the browser is both universal and yet customizable.

The tutorial suites, available for single purchase or through a low-priced yearly subscription to all OpenHelix tutorials, contain a narrated, self-run, online tutorial, slides, handouts and exercises. With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. These tutorials will teach users:


ZFIN

  • to perform effective searches and understand the displays
  • to access advanced searches enabling multifaceted queries
  • to use the various databases of genes and markers, expression data, mutant genotype/phenotype details, ontologies, and more
  • to investigate many related resources associated with ZFIN


SGD

  • to navigate the SGD site, locate Basic and Advanced Search options, and use the site map to access additional search tools
  • to perform the two basic SGD search types, Quick Search and Text Search, and understand the displays
  • to navigate the SGD Locus Page and access data from a variety of tools, tabs, and links
  • to investigate many related resources associated with SGD


PlantGDB

  • to perform quick searches and navigate sequence pages
  • to conduct BLAST searches across several plant species of your choice
  • to create exon/intron gene predictions and sequence alignments
  • to construct tables displaying highly varied information from many datasets

GBrowse

  • the basic layout and search methods at GBrowse
  • how to access detailed annotation data tied to genomic sequences
  • how to select and customize annotations using Tracks
  • how to upload and incorporate your own data or other external data sources
  • how to tour different GBrowse installations at model organism databases

To find out more about these and other tutorial suites, visit the OpenHelix Tutorial Catalog and OpenHelix, or visit the OpenHelix Blog for up-to-date information on genomics.