Tag Archives: transcription factor

Video Tip of the Week: ENCODE ChIP-Seq Significance Tool

We’ve been doing training and workshops on the UCSC Genome Browser for 10 years now. It’s a tremendous tool that has to be a foundational item in your genomics toolkit. But there may be times when you want to examine some of the data that you can find there in another way, with a different focus or emphasis. It might be possible to craft some clever Table Browser queries that get you what you want. Sometimes, though, someone else has already created a way to query the underlying data for a particular topic, and that could be useful too. Today’s tip of the week is exactly this kind of tool: a web interface to query the ENCODE data that resides in the UCSC Genome Browser, with a focus on finding transcription factors with enriched binding in a region that you might be interested in exploring. Today’s video tip is for the ENCODE ChIP-Seq Significance Tool.

There’s a ton of great data that flowed into the UCSC Genome Browser as part of the ENCODE project. It’s going to provide years of mining for biologists. What would be great is for biomedical researchers who have interest in specific genes–or sets of genes–to take a look at the ENCODE data to see if they can unearth some useful insights about the regulation of these genes or lists of genes. You can use the ChIP-Seq Significance tool to sift through the data.
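If you are wondering what "enriched binding" can mean in practice, here is a minimal sketch of one common way to score it: a hypergeometric test on how many genes in your list are bound by a factor, compared with a background set. This is an illustration only. The function name and all the numbers are hypothetical, and the tool's paper is the place to confirm the statistics and corrections the tool itself applies.

```python
from math import comb

def hypergeom_enrichment_p(background_size, background_bound, list_size, list_bound):
    """P(X >= list_bound) when list_size genes are drawn without replacement
    from background_size genes, of which background_bound are bound by the TF."""
    max_hits = min(list_size, background_bound)
    total = comb(background_size, list_size)
    tail = sum(
        comb(background_bound, k)
        * comb(background_size - background_bound, list_size - k)
        for k in range(list_bound, max_hits + 1)
    )
    return tail / total

# Hypothetical numbers for illustration only: 20000 background genes,
# 1000 of them bound by the factor, and 10 bound genes in a list of 50.
p_value = hypergeom_enrichment_p(20000, 1000, 50, 10)
print(f"enrichment p-value: {p_value:.3g}")
```

When many factors are tested against the same gene list, the resulting p-values would also need a multiple-testing correction, which this sketch leaves out.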

The video that the Butte lab team did is very nice. It gives very specific guidance on how to use their tool: what to choose for the menu options, what the choices are, and what to expect from the results. Here’s their video:

Of course you should read their paper about this tool for the background you need (linked below), and the references that will also help you to understand what this tool offers. You should also read up on the associated ENCODE data. The supplement with the paper is also nicely written in clear language to help you to understand the features.

One of the things I was curious about was whether this might be extended to the mouse data too. One thing that people grouse to me about is that ENCODE is cell line data, and tissue data would really be great. But I saw discussion at Stephen Turner’s blog (read the comments) about the focus on human for now. There was also discussion of the CScan tool, though, which does cover the mouse data. So if this is a tool you are interested in, you might want to explore CScan too.

Hat tip to Stephen Turner for the awareness:

Quick links:

ENCODE ChIP-Seq Significance Tool: http://encodeqt.stanford.edu/

CScan: http://www.beaconlab.it/cscan


Auerbach, R., Chen, B., & Butte, A. (2013). Relating Genes to Function: Identifying Enriched Transcription Factors using the ENCODE ChIP-Seq Significance Tool Bioinformatics DOI: 10.1093/bioinformatics/btt316

Video Tip of the Week: AnimalTFDB for transcription factors

Transcription factor details–and sources of information about them and their binding sites–are definitely among the most common questions we hear in our workshops. There are ways to look at predictions of binding, and for some species evidence of binding, and there are ways to look for binding motifs. But these resources vary in methods and scope, and it’s not easy to obtain collected information about transcription factors in many species.

One group has tried to change this, at least for animal transcription factors, with the AnimalTFDB. They collected and curated information about more than 70 families of TFs from 50 species, and created an interface where you can explore this collection by family, by species, and more.

This was offered as one of the sources of information from a recent query at BioStar. The poster was looking specifically for non-model organism information, and this database was one of the suggestions. But you can explore that question for other details and suggestions too.

The paper from the AnimalTFDB team provides information on other sources of TF information–including bacteria, plants, and various types of related resources too. So if you are looking for non-animal details there could be some guidance for you in there. Some of these you might see in future tips!

In the AnimalTFDB system itself, you’ll find that you have multiple ways to explore the data. From the landing page you can quickly browse to the collected data by TF family, or move right to the data organized by species instead. But there is also a standard search option with a form-based query. You can refine your search in various ways with that search form.

When you get to a transcription factor (or one of the other types that’s curated, transcription co-factors and chromatin remodeling proteins) there will be links to many types of useful additional details. Transcripts, domains, GO terms, and links to multiple related resources and more.

So if you are interested in transcription factors, co-factors, and chromatin remodeling proteins in animal species, check out AnimalTFDB.

Quick link:

AnimalTFDB: http://www.bioguo.org/AnimalTFDB/

Zhang, H.M., Chen, H., Liu, W., Liu, H., Gong, J., Wang, H. & Guo, A.Y. (2011). AnimalTFDB: a comprehensive animal transcription factor database, Nucleic Acids Research, 40 (D1) D149. DOI: 10.1093/nar/gkr965

What’s the answer? (non-model org TFs)

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s question highlights another issue with transcription factor binding site data. Increasingly people are seeking out this data, and this time it’s not for human or a well-studied model organism. As we broaden out with more species sequence data, this will also be another big need.

This week’s question: List of TF and TFBS from a non-model species


How can I find the TF and TFBS from a non-model species (in my case the cow). Maybe is it possible to infer them with the human TF and TFBS ? My goal is to detect the TF from differentially expressed genes. and maybe the differentially expressed TF regulating differentially genes.

Other related question : how to know which gene encode a TF ?

Thanks a lot,



Again, I found a TF resource that was new to me, so I appreciated the answers. Check them out.

Video Tip of the Week: TFBS using Mapper

Need to explore transcription factor binding sites (TFBS)? If you are reading this, you might know already, but just to recap:

Transcription is regulated through the binding of transcription factor proteins to specific cis-level regulatory sites in the DNA. The nature of this regulation depends on the transcription factor. For example, some proteins activate transcription by recruiting RNA polymerase, some repress transcription by suppressing this recruitment, and others insulate proximal regions from the activity of nearby transcriptional activators or repressors. A key characteristic of each transcription factor protein is its DNA binding domain. Each DNA binding domain recognizes and interacts with DNA that matches a specific nucleotide pattern, or motif.
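To make the idea of a motif concrete, here is a minimal sketch of how a binding pattern is often represented and searched for: a small count matrix is turned into log-odds scores, and every window of a sequence is scored against it. The matrix, the threshold, and the function names are all made up for illustration. Real motif collections and the tools discussed here use their own models and cutoffs.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per position).
# Illustration only; not taken from any real database.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (start, window, score) for each window scoring at or above threshold.
    Assumes an uppercase sequence containing only A, C, G and T."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits

PWM = log_odds_matrix(COUNTS)
for start, window, score in scan("TTACGTAACGTG", PWM, threshold=4.0):
    print(start, window, score)
```

Only the forward strand is scanned here; a fuller version would also score the reverse complement.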

Determining these TFBS can help elucidate the regulation of a gene, determine the cause of a disease, and more. There are some very good transcription factor binding site databases and prediction tools available. Two that come to mind are TRANSFAC and JASPAR. There are other databases you might want to take a look at, such as UniPROBE, ORegAnno (which also has a UCSC track), oPOSSUM, hPDI and many others. UCSC Genome Browser has a track of computationally derived conserved (human/mouse/rat) TFBS and ENCODE TFBS determined by ChIP-seq (of which you can find a mega-table here at FactorBook). PAZAR is a compilation of TF data from many small databases. ORegAnno has a page of additional databases and tools for TFBS and regulatory regions. Each of these has different strengths, weaknesses and data. So, get cracking :D.

The database and search tool I will focus on in this tip of the week is Mapper. Mapper uses TFBS from TRANSFAC and JASPAR and maps them to genomic locations for several species. Using “the search power of profile hidden Markov models (HMMs),” Mapper includes a database of pre-computed TFBS locations and an on-the-fly search engine for TFBS. Additionally, there is rSNPs, a handy tool designed to identify SNPs that have a significant effect on the score of a TFBS.
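To give a feel for what "a significant effect on the score of a TFBS" can look like, here is a small sketch that scores a site with the reference allele and again with the alternate allele, then reports the change. It uses a simple additive scoring matrix as a stand-in. Mapper itself is built on profile HMMs, so this is not its method, and the matrix values, sequence, and function names are hypothetical.

```python
# Hypothetical log-odds scores per position for a 4-bp motif (illustration only).
MATRIX = [
    {"A": 1.4, "C": -1.8, "G": -0.8, "T": -0.8},
    {"A": -1.8, "C": 1.5, "G": -1.8, "T": -0.8},
    {"A": -1.8, "C": -1.8, "G": 1.7, "T": -1.8},
    {"A": -0.8, "C": -1.8, "G": -1.8, "T": 1.5},
]

def best_score_covering(sequence, position, matrix):
    """Best motif score among windows that include `position` (0-based).
    Assumes the sequence is at least as long as the motif."""
    width = len(matrix)
    first = max(0, position - width + 1)
    last = min(position, len(sequence) - width)
    return max(
        sum(matrix[i][base] for i, base in enumerate(sequence[start:start + width]))
        for start in range(first, last + 1)
    )

def snp_score_change(sequence, position, alt_allele, matrix):
    """Return (reference score, alternate score, difference) for one SNP."""
    variant = sequence[:position] + alt_allele + sequence[position + 1:]
    ref_score = best_score_covering(sequence, position, matrix)
    alt_score = best_score_covering(variant, position, matrix)
    return ref_score, alt_score, alt_score - ref_score

# Hypothetical SNP: G>A at 0-based position 4 of a made-up sequence.
ref, alt, delta = snp_score_change("TTACGTAA", 4, "A", MATRIX)
print(f"ref={ref:.1f} alt={alt:.1f} change={delta:+.1f}")
```

A large negative change suggests the variant weakens a predicted site, and a large positive change suggests it creates or strengthens one. What counts as "significant" depends on the scoring model and is not something this sketch decides.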

Today’s tip of the week will focus on the database and rSNPs and a basic intro to using these.

Marinescu, V., Kohane, I., & Riva, A. (2005). MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes BMC Bioinformatics, 6 (1) DOI: 10.1186/1471-2105-6-79

(HT to Biostar and answers found here)

Tip of the week: ORegAnno for regulatory annotation

Lately we’re getting a lot of questions about ways to analyze the promoters and other regulatory aspects of genes. And for a while we were mostly pointing to the prediction data that was available in the UCSC Genome Browser’s TFBS Conserved track. TFBS Conserved is a track of computationally predicted transcription factor binding sites (TFBS) which are conserved across human/mouse/rat and based on Transfac v7.0 by BioBase.  As they say in the track description, it’s important to know this:

The data are purely computational, and as such not all binding sites listed here are biologically functional binding sites.

Though this is useful, people have been wanting more evidence based on real binding and/or activity data. Today’s tip will talk about 2 ways to get other data–beyond computational predictions. First we’ll explore ORegAnno so you’ll understand the data sources, and then we’ll also look at that data in the context of the UCSC Genome Browser and some useful data from the ENCODE project.

ORegAnno is the Open Regulatory Annotation Database, a community literature curation project for regulatory information. Anyone can participate in the curation–they provide helpful curation tools and automated cross-linking and checking features that make it easier. You would register, curate, and the data becomes available to anyone. And with the curator tools that are available the data becomes loaded into projects that coordinate with ORegAnno–including the track at the UCSC Genome Browser of ORegAnno data.

In the paper published in NAR 2008, they stated this:

The current release comprises 30 145 records curated from 922 publications and describing regulatory sequences for over 3853 genes and 465 transcription factors from 19 species.

So that’s a nice set with traceable data that’s not just computational predictions. In the tip I’ll show one example of Stat1 binding, in human, near the Il10 gene. If you look at that record, you’ll see several pieces of evidence that support this data and a link to the publication that offers it.

Now, if you look at ORegAnno data over in the UCSC Genome Browser, you could compare it to the computational predictions, or TFBS data from other projects such as the ENCODE data sets with the ChIP-Seq data (Yale TFBS and HAIB, for example; note: you may have to go back an assembly because the ENCODE data is not all on the current assembly at this time). This is what I show in the movie: I take an ORegAnno annotated item, visualize that with the TFBS Conserved predictions and with some ENCODE project data. So you get all 3 types of data with a few clicks.

So there are several ways to look for TFBS data–some of it computational predictions, some literature curation, and some big data stuff from the ENCODE teams. All of them have strengths and caveats. Computational predictions may be genome wide and independent of a given cell or tissue type, but are subject to the constraints of the algorithms. Community literature curation can offer quality evidence, but may be selected by interested groups and not as broadly representative of the genome-wide situation. Big data projects can be genome-wide and have evidence in some cell types, but may be in progress and subject to checking as they are pre-publication data.  But effectively using them all could help you to understand regulation of genes that you might be interested in.

Quick Links:

ORegAnno: http://www.oreganno.org/

Biobase and Transfac: http://www.gene-regulation.com/pub/databases.html

UCSC Genome Browser: http://genome.ucsc.edu/

ENCODE data at UCSC: http://genome.ucsc.edu/ENCODE/

Griffith, O., Montgomery, S., Bernier, B., Chu, B., Kasaian, K., Aerts, S., Mahony, S., Sleumer, M., Bilenky, M., Haeussler, M., Griffith, M., Gallo, S., Giardine, B., Hooghe, B., Van Loo, P., Blanco, E., Ticoll, A., Lithwick, S., Portales-Casamar, E., Donaldson, I., Robertson, G., Wadelius, C., De Bleser, P., Vlieghe, D., Halfon, M., Wasserman, W., Hardison, R., Bergman, C., Jones, S., & The Open Regulatory Annotation Consortium. (2007). ORegAnno: an open-access community-driven resource for regulatory annotation Nucleic Acids Research, 36 (Database) DOI: 10.1093/nar/gkm967

Tip of the Week: CircuitsDB for TF/miRNA/gene Regulation Networks

In this week’s tip I’d like to introduce you to CircuitsDB, which describes itself as:

“…a database where transcriptional and post-transcriptional (miRNA mediated) network information is fused together in order to propose and recognize non trivial regulatory combinations. “

I found out about the database from the BioMed Central article “CircuitsDB: a database of mixed microRNA/transcription factor feed-forward regulatory circuits in human and mouse“, which I cite below. I had already been thinking about miRNAs because I am slated to update our miRBase tutorial in the near future and have been reading/catching up on the latest in the field. The CircuitsDB paper by Olivier Friard et al. does a really nice job of quickly and clearly laying out the background of the project – how transcription factors have long been studied for their transcriptional regulation of protein-coding genes involved in any manner of pathways, including those of disease. It goes on to describe that the study of microRNAs, or miRNAs, is a newer field studying the post-transcriptional regulatory effects of miRNAs on protein-coding genes and their functions. Current efforts are moving to integrate the two areas of research to create more complete, and admittedly more complex, regulatory views of protein-coding genes and the effects on disease and other pathways.

The developers of CircuitsDB also very clearly describe how they have mined, analyzed and connected data from several top databases – many of which we have tutorials on, such as OMIM, miRBase, Ensembl and others – in order to create feed-forward regulatory loops, or FFLs, of TFs, affected miRNAs and ultimately affected protein-encoding genes. The image at the right is from their original paper: “Genome-wide survey of microRNA–transcription factor feed-forward regulatory circuits in human” (cited below), which reported the development of the computational framework for the mixed miRNA/TF Feed-Forward regulatory circuits that are freely available through the  CircuitsDB web interface. This original paper is available for free, with registration to RSC Publishing, and provides a detailed description of their original development, as well as access to several supplemental files.

Essentially networks linking transcription factors and affected genes, miRNAs and affected genes, and transcription factors and miRNAs were painstakingly connected through an ab-initio oligo analysis. Support was then gained for the connections by analyzing enriched GO terms, disease connections, and previously-known connections found in other specialized resources. The CircuitsDB interface offers multiple tools. The main tool (FFL) is what I show in this tip & is the tool that searches for the networks diagrammed above. The MYC FFL is an impressive “curated database of miRNA mediated Feed Forward Loops involving MYC as Master Regulator”, and includes information on the direction of regulation, loop participants, evidence levels and more. The Transcriptional network tool allows a user to search with either a miRNA & find its regulating TF, or search with a TF & find regulated genes or miRNAs. The Post-transcriptional network tool is similar, but allows searches for a miRNA or gene to find regulated genes or regulating miRNA, respectively. So check out the tip & then check out CircuitsDB – enjoy!
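If the feed-forward loop idea is easier to grasp as a data structure, here is a minimal sketch: given three sets of regulatory links (TF to gene, TF to miRNA, miRNA to gene), a loop is any TF that regulates a miRNA where both share a target gene. The function name and the placeholder identifiers are hypothetical, and this is only the basic intersection idea. It is not the CircuitsDB pipeline, which adds the oligo analysis and the supporting evidence described above.

```python
def find_feed_forward_loops(tf_to_genes, tf_to_mirnas, mirna_to_genes):
    """Return (tf, mirna, gene) triples where the TF regulates the miRNA
    and both the TF and the miRNA regulate the same gene."""
    loops = []
    for tf, mirnas in sorted(tf_to_mirnas.items()):
        tf_targets = tf_to_genes.get(tf, set())
        for mirna in sorted(mirnas):
            joint_targets = tf_targets & mirna_to_genes.get(mirna, set())
            loops.extend((tf, mirna, gene) for gene in sorted(joint_targets))
    return loops

# Placeholder identifiers for illustration only; these are not real regulatory links.
tf_to_genes = {"TF_A": {"GENE1", "GENE2"}, "TF_B": {"GENE3"}}
tf_to_mirnas = {"TF_A": {"miR-x"}, "TF_B": {"miR-y"}}
mirna_to_genes = {"miR-x": {"GENE2", "GENE4"}, "miR-y": {"GENE1"}}

for loop in find_feed_forward_loops(tf_to_genes, tf_to_mirnas, mirna_to_genes):
    print(loop)
```

In this toy example only TF_A, miR-x and GENE2 form a loop, because TF_B and miR-y have no target gene in common.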

Friard, O., Re, A., Taverna, D., De Bortoli, M., & Corá, D. (2010). CircuitsDB: a database of mixed microRNA/transcription factor feed-forward regulatory circuits in human and mouse BMC Bioinformatics, 11 (1) DOI: 10.1186/1471-2105-11-435

Re, A., Corá, D., Taverna, D., & Caselle, M. (2009). Genome-wide survey of microRNA–transcription factor feed-forward regulatory circuits in human Molecular BioSystems, 5 (8) DOI: 10.1039/B900177H

Transcription Start Sites, databases and tools

A recent paper in PLoS One finds hundreds of new putative transcription start sites (TSS): PLoS ONE: Genome-Wide Identification of Transcription Start Sites, Promoters and Transcription Factor Binding Sites in E. coli. I found the paper interesting, and a good example of how high-throughput studies and genomics can advance our understanding of biology and work in concert with experimental biology, while at the same time dumping a whole lot of new data in our laps.

I’d like to point out some of the databases and resources that are mentioned and used in this paper. In fact, this is the first semi-weekly installment of ‘what did they use?’ post. I’d like to start citing papers that I find interesting and pull out the software, databases and genomics resources used in them. Might help our readers get an understanding of what is being used out there.

First and foremost, this paper has added a large set of new data to RegulonDB, or to paraphrase their about page:


Video Tip of the Week: modENCODE

We have talked about the ENCODE project before–both the successful pilot project and the current new phase of the ENCODE project that is going genome-wide, beyond the 1% coverage of the pilot project. One thing you may have noticed about the ENCODE data we talked about at the UCSC Genome Browser, though, is that it is very human-centric. But fear not–model organisms are in da house! There is actually a separate aspect of the ENCODE effort that I wanted to introduce today: modENCODE.

NHGRI has funded modENCODE researchers to take the ENCODE-style strategies and tools to some of our favorite model organisms: fly and worm–as you probably guessed from the logo. Already we are seeing data from the project, which you can access at the modENCODE project web site: http://www.modencode.org/

In this 4-minute tip of the week movie we’ll take a quick look at the resources available to examine the modENCODE data.

For the main modENCODE web site: http://www.modencode.org/

For the InterMine query tool for modENCODE data: http://intermine.modencode.org/release-3/begin.do