Video Tip of the Week: PhenDisco, “phenotype discoverer” for dbGaP data

The dbGaP (database of Genotypes and Phenotypes) repository at NCBI collects information from research projects that link genotype and phenotype information and human variation across many different types of studies, providing leads on variation that may be important for understanding clinical issues. Some of the data is publicly available de-identified patient information, and some of it requires authorization to access. This is valuable information, certainly, but I’ve heard folks grouse about how challenging it can be to locate the specific things you might be interested in, because some aspects of the project details lack standardization.

The developers of PhenDisco were aware of the challenges of extracting information from dbGaP, and they chose to investigate ways to make searches for key data more effective. They looked at requests that had come in to dbGaP. They surveyed researchers who would represent typical users, and found that the way to make mining dbGaP easier would be to standardize many aspects of the project descriptions and data. They thought through use-case scenarios. And once the standardization was completed, a new query interface relying on these new descriptors was made available as well.

For the foundations of the project and how they went about it, you should read their paper (linked below). But for this week’s video tip, I’ll include a couple of things that this group has delivered to help people understand their project and use their site. If you want the short version about how to approach the site, this YouTube video will cover that (erm, and I’m sorry about the actual disco music….):

But if you have time for the longer form, there’s a webinar they delivered that I’ll include here as well. Part of this webinar is the video from YouTube, but the details are easier to see in the YouTube version, so I’d encourage watching it there and skipping that piece of the webinar.

So have a look at PhenDisco if you’ve been finding that searches of dbGaP have been less satisfying than you’d hoped. I think one of the best ways to grasp the standardization is to have a look at their advanced search page to see what types of things are selectable there. Try some searches and see if it’s helpful for your research.

Just wanted to add a link to a slide set from a journal club presentation on PhenDisco as well, in case the videos aren’t ideal for your situation. There is also a separate video of that journal club.


If this is a type of resource you find useful, you might also want to explore the PheGenI (Phenotype-Genotype Integrator) that I covered in a previous Tip of the Week too.

Quick links:

Project overview page:

Search engine main page:

Advanced search page to understand the structure:


Doan S., Lin K.W., Conway M., Ohno-Machado L., Hsieh A., Feupe S.F., Garland A., Ross M.K., Jiang X. & Farzaneh S. (2013). PhenDisco: phenotype discovery system for the database of genotypes and phenotypes. Journal of the American Medical Informatics Association: JAMIA. PMID:

Tryka K.A., Sturcke A., Jin Y., Wang Z.Y., Ziyabari L., Lee M., Popova N., Sharopova N., Kimura M. & Feolo M. (2013). NCBI’s Database of Genotypes and Phenotypes: dbGaP. Nucleic Acids Research, 42 (D1) D975-D979. DOI: