One of the most common questions we get when we are out doing software training is: what do I do with a list of genes? People generate lists from all sorts of biomedical research forays: microarray results, database searches, literature searches, library screens, etc. The source doesn’t matter much; in the end people have a list that they need to analyze, assess, categorize, group, filter, and manage.
We’ve been looking into some tools to accomplish this. We’ve already demonstrated a few of them (Reactome SkyPainter, Gene Ontology Term Enrichment, MatchMiner…). But there are more that I want to explore. What I decided to do was to create a standard list that I’m going to use to explore and evaluate different tools. Today’s tip covers where I got this list and how I created it. I want to be able to refer back to this list in the upcoming “list” tips, and thought it would help to explain that first.
So today’s tip is obtaining a list of disease genes from UniProt. Now, you could just go to UniProt yourself and get this handy list. But I show you how to get there starting from the UniProt homepage, and what I did in Excel to filter this list down to a set of unique gene symbols for disease genes. I end up with ~2500 unique symbols for disease genes that will be the input for upcoming tips.
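If you would rather script the de-duplication step than do it in Excel, here is a minimal Python sketch of the same idea. It is not what I did in the tip, and it rests on several assumptions: that the list was saved as tab-delimited text, that the gene names sit in a column headed `Gene names`, and that the first name in each cell is the primary symbol. The function name `unique_gene_symbols` and the sample rows are made up for illustration, so adjust them to match your own export.

```python
import csv
import io


def unique_gene_symbols(tsv_text, column="Gene names"):
    """Return a sorted list of unique primary gene symbols.

    Assumes tab-delimited text with a header row; the column name is
    an assumption and may differ in a real export.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    symbols = set()
    for row in reader:
        cell = (row.get(column) or "").strip()
        if not cell:
            continue  # skip entries that have no gene name at all
        # Assumption: the first whitespace-separated name is the
        # primary symbol; anything after it is treated as a synonym.
        symbols.add(cell.split()[0])
    return sorted(symbols)


# Made-up sample rows, only to show the shape of the input.
sample = (
    "Entry\tGene names\n"
    "ACC1\tBRCA1 ALIAS1\n"
    "ACC2\tBRCA2 ALIAS2 ALIAS3\n"
    "ACC3\tBRCA1\n"
    "ACC4\t\n"
)

print(unique_gene_symbols(sample))
```

A set does the same job as Excel's remove-duplicates step: each symbol is kept once no matter how many entries mention it, and rows with an empty gene name are dropped.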