Tag Archives: Gene Expression


Video Tip of the Week: Digital Expression Explorer for RNA-seq

Coming over my digital transom in a couple of different ways recently, the Digital Expression Explorer caught my attention. I have a soft spot for expression. It’s the business end of what’s going on, not just the archives of genomic info stored up in the nucleus, you know? Heh.

Anyway–I saw the poster via email from ResearchGate (yeah, I know, but a couple of interesting things have come to me from there). It was authored by someone that I’ve been on a paper with, so it notified me. And the volume of expression data and challenges for users definitely hits on a problem that I think about a lot. There’s such great detail underneath the surface in so many of these repositories we have, but few people have the tools to mine them. They specifically note the GEO and SRA in this case. Digital Expression Explorer (DEE) sits on top of them and provides a new way in, after standardizing the data. I think that’s worthwhile.

There’s no need for me to provide more details here–the lead author Mark Ziemann has done a great job in this blog post: Introducing “Digital Expression Explorer”. He provides the background to the problem of re-using the data, and what they’ve done to solve it. And you can see the poster online. But better still, check out this quick intro video from Mark that is this week’s Video Tip of the Week:

So you can see how to pull out some data and begin to explore it with DEE. Then there’s a piece that illustrates taking this newly mined set and evaluating it with Degust. This quickly shows what you can get and what you can do next. It will give you an idea of the benefits (and the speed).

And, may I say, nice job on outreach! Poster, tweets, blog, and video. Many ways to reach potential users. Moar like this, plz, for all you folks with software tools.

Try it out, and see if it saves you time and delivers you more than you might have been able to mine out of the repositories so quickly in the past.

Quick link:

Digital Expression Explorer: http://dee.bakeridi.edu.au/


Ziemann M., Kaspi A., Lazarus R., & El-Osta A. (2015). Digital Expression Explorer: A user-friendly repository of uniformly processed RNA-seq data. ComBio2015. DOI: 10.13140/RG.2.1.1707.5926


Friday SNPpets

This week’s SNPpets include a range of things, from Pardis Sabeti’s recovery from a serious accident to tardigrade genome drama. There are new databases and tools to explore, such as the GMO sequence tracker in the EU, and new uses of tools such as Docker. Reports of a serious BLAST bug. A look at common spreadsheet formatting mistakes and some solutions. It’s not a gene-editing moratorium. And more.

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

What’s the Answer: genes implicated in…

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

Question of the week: This week’s question was specific:

I’m looking for a database for genes implicated in lymphoid and myeloid development

The short answer is that there isn’t one (that I or anyone else can find), but two answers as of this posting do give good methods to find what the questioner is looking for.

We’ve developed a text-mining approach, called GETM (Gene Expression Text Miner) to associate genes with anatomical locations based on tagging of gene names and species-specific anatomy ontologies that might help with your problem.

Tip of the Week: SNPexp, correlation between SNPs & gene expression

SNPexp is a nice simple tool that uses PLINK to calculate the correlation (p-value) between SNPs in a given range of locations in the genome, or alternatively a list of specific SNP rsIDs, and the expression of a gene of interest. It combines the data from these two datasets: the HapMap project and GENEVAR*. It provides a simple web-based interface to allow you to make those calculations and to either download the results in a series of files, or to view the results as a custom track in the UCSC Genome Browser. Today’s tip gives you a quick introduction to using the tool.

*GENEVAR is both a database of “analysis of gene expression variation in the HapMap samples using genome-wide expression arrays (47294 transcripts) from EBV-transformed lymphoblastoid cell lines from the same 270 HapMap individuals” AND a downloadable software tool to allow you to “perform analysis and visualization of associations between sequence variation and gene expression in eQTL studies.” It’s an additional tool that might be of interest, and it provides for more in-depth analysis.
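To make the idea of correlating a SNP with a gene's expression concrete, here is a minimal sketch in plain Python. It is not SNPexp's or PLINK's actual code: it assumes genotypes are coded as minor-allele counts (0, 1 or 2) and fits a simple linear relationship, and the function name and sample values are made up for illustration.

```python
import math


def genotype_expression_association(dosages, expression):
    """Return (Pearson r, regression slope, t statistic) for expression ~ dosage.

    dosages: allele counts per individual (0, 1 or 2), an assumed coding.
    expression: expression values for the same individuals, in the same order.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")

    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or expression")

    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx  # change in expression per extra copy of the allele

    # t statistic with n - 2 degrees of freedom; a p-value would be read
    # from the t distribution, which is left out of this sketch.
    denom = 1 - r * r
    t = math.copysign(math.inf, r) if denom <= 0 else r * math.sqrt((n - 2) / denom)
    return r, slope, t


# Hypothetical data: six individuals, two per genotype class.
r, slope, t = genotype_expression_association(
    [0, 0, 1, 1, 2, 2], [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]
)
```

A large absolute t value (and so a small p-value) suggests that expression shifts with each additional copy of the allele, which is the kind of signal the tool reports per SNP.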

Quick link to SNPexp: http://app3.titan.uio.no/biotools/tool.php?app=snpexp

Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Video Tip of the Week: Caleydo for gene expression and pathway visualization

Recently while watching the #bioinformatics tag on Twitter I saw Khader Shameer mention Caleydo.  I was instantly hooked by the very clever visualization strategy that they are using to provide more surface area for examining the data you are interested in viewing.  Their specific topics are pathways and gene expression, but it got me thinking about various data types that I would like to see connected in this way. This week’s Video Tip of the Week is about this software.

To skip right over to Caleydo and start trying it out, go here: http://www.caleydo.org/

Caleydo delivers a 3D representation of the expression and pathway data.  The main user interface has an area that is a box.  They call it a bucket, but in my head buckets are round, so I think of this as a box.  On the floor of the box you have a graphic.  But because you also have 4 interior surfaces of the box you have 4 more places to display and link the data.  You can have a heat map microarray representation on one side, and various pathways associated with the genes in that microarray on the other sides.

There’s a short systems biology Application Note in Bioinformatics that describes the framework and gives an overview of the tool.  But there’s also a more detailed paper over at their publication site that will get you started (that 2010 paper for the Visualization conference in Taipei).

My computer is a bit underpowered, but I was able to load their webstart version and begin to look around.  They provide some sample data you can select and examine.  For the movie this week, though, I was unable to load that and run the recording software at the same time.  So mostly it’s an introduction to the concept and the site.  You’ll have to go over and load it up yourself to try it out.  If the webstart version doesn’t work for you, there are a couple of other download options for different platforms.

The Caleydo team has also done a YouTube overview of the features that you can examine.


So try out this visualization strategy and see what you think.  I really like the concept.


Streit, M., Lex, A., Kalkusch, M., Zatloukal, K., & Schmalstieg, D. (2009). Caleydo: connecting pathways and gene expression. Bioinformatics, 25 (20), 2760-2761. DOI: 10.1093/bioinformatics/btp432

Liver proteome database resources

So far at OpenHelix we’ve generally been focusing on the broader tools that are of use to the largest cross-section of biomedical researchers.  Pretty much everybody needs genome browsers at some point in their research.  However, I’ve always had a soft spot for tissue-specific resources.  Since my PhD project was on muscle, I have a lot of thoughts on tissue-specific regulation, expression, and splicing that I think are going to be just fascinating as we build on the whole genome sequencing base projects.  A lot of the “hypothetical” and “unknown” predicted sequences are going to fall out of spatial- and temporal-specific expression projects as we move forward.

That appears to be the case in liver according to a new proteomics paper.  There’s a commentary in the Journal of Proteome Research that speaks to this:

Mammoth data set from human liver reported by Quinn Martin Eastman

A total of 6788 proteins were identified, though that number excludes >6000 proteins that had only one peptide match and were eliminated from the final count. The researchers identified proteins corresponding to 60% of all of the protein-encoding genes expressed in liver, which were identified by RNA analysis from the same samples….

Some 3721 of the identified proteins had not been seen in human liver before, though they had been detected in other human organs. Almost 1000 were “hypothetical”—their existence previously inferred from DNA sequence information only.

Emphasis mine. That’s very cool stuff.

The commentary refers to a paper in the same issue from a consortium of researchers, and links to the resulting database for the project.  The database has about the most complicated name I’ve ever seen, visible in the paper title here:

First Insight into the Human Liver Proteome from PROTEOMESKY-LIVERHu 1.0, a Publicly Available Database

Hyphens and superscripts–oh my!

You can check out the corresponding databases associated with this project at 2 URLs:

dbLEP for Liver Expression Profile:  http://dblep.hupo.org.cn

and Liverbase: http://liverbase.hupo.org.cn

I’m looking forward to other tissue collections as well.

Tip of the Week: Gene Expression Data by Condition at ArrayExpress

In today’s tip, I want to show you how to use a great looking beta tool that I just found at EBI’s ArrayExpress Gene Expression repository (AE). The tool’s name is the ArrayExpress Atlas. You may have retrieved expression data from the ArrayExpress Warehouse, which is a carefully curated collection of expression data. The Warehouse is a wonderful resource, and a great way to obtain expression data sets, but the information retrieved is organized by gene name and sample values. The ArrayExpress Atlas appears to be the next generation of the Warehouse and it provides gene expression data as a table, with genes corresponding to rows and experimental conditions corresponding to columns. The tool is easy to use, provides easy-to-interpret results, and looks like its capabilities are growing fast. Check out this tip, check out the Atlas blog spot, check out the tool, and send any feedback for improving the tool to AE.
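The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration of reshaping per-sample values into a gene-by-condition table; averaging replicates is my assumption, not a description of how the Atlas computes its values, and the gene and condition names are invented.

```python
from collections import defaultdict


def pivot_by_condition(records):
    """Reshape (gene, condition, value) records into a gene x condition table.

    Returns (conditions, table), where table maps each gene to a list of
    values aligned with conditions. Replicates are averaged (an assumption
    for this sketch); missing gene/condition pairs become None.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for gene, condition, value in records:
        grouped[gene][condition].append(value)

    # Column order: every condition seen for any gene, sorted by name.
    conditions = sorted({c for per_gene in grouped.values() for c in per_gene})

    table = {}
    for gene in sorted(grouped):
        row = []
        for condition in conditions:
            values = grouped[gene].get(condition)
            row.append(sum(values) / len(values) if values else None)
        table[gene] = row
    return conditions, table


# Hypothetical sample-level records, as a gene-and-sample listing might give them.
records = [
    ("GeneA", "liver", 2.0),
    ("GeneA", "liver", 4.0),
    ("GeneA", "muscle", 1.0),
    ("GeneB", "muscle", 5.0),
]
conditions, table = pivot_by_condition(records)
```

With genes as rows and conditions as columns, it becomes easy to scan one gene across every condition, or one condition across every gene.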

Tip of the Week: Data Downloading from Gene Expression Omnibus (GEO)

There are times when being able to download the data used to create a publication is very useful. Perhaps you want to compare interesting data to some of your own. Or you may want to analyze the data differently than the original author to try and gain additional information. Or perhaps you’d just like to check the data yourself to confirm that the author’s conclusions are valid. For my “tip of the week,” I thought I’d give you a tip for downloading some of that data from NCBI’s GEO (or Gene Expression Omnibus) database. We’ve got a whole introductory tutorial on how to use GEO, but this ~3 minute screencast will show you how to easily get GEO information into an Excel spreadsheet. Hope you find this tip as useful as I did when I learned it!
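If you would rather script the spreadsheet step, here is a rough Python sketch. It is not the method shown in the screencast: it assumes you have already downloaded a GEO series matrix file, which is tab-delimited text with the data table sitting between the `!series_matrix_table_begin` and `!series_matrix_table_end` marker lines. The function name, sample accessions and values below are invented for illustration.

```python
import csv
import io


def series_matrix_to_csv(text):
    """Extract the data table from series-matrix text and return it as CSV."""
    rows = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
        elif line.startswith("!series_matrix_table_end"):
            in_table = False
        elif in_table and line.strip():
            # Fields are tab-separated and may be wrapped in double quotes.
            rows.append(next(csv.reader([line], delimiter="\t")))

    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# A tiny made-up example in the series matrix layout.
SAMPLE = (
    '!Series_title\t"Example series"\n'
    "!series_matrix_table_begin\n"
    '"ID_REF"\t"GSM0001"\t"GSM0002"\n'
    '"probe_1"\t7.1\t6.8\n'
    '"probe_2"\t3.4\t3.9\n'
    "!series_matrix_table_end\n"
)
csv_text = series_matrix_to_csv(SAMPLE)
```

Saving `csv_text` to a file with a `.csv` extension gives you something Excel opens directly, with one row per probe and one column per sample.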