A HuGE database

:) That was fun to write. A recent correspondence in Nature Genetics outlined some changes to the HuGE Navigator. This database has been available in some form since 2001. The basic purpose of the database is to…

navigate and mine the growing scientific literature on human gene-disease associations and related data in human genome epidemiology. As an interconnected system of applications that users can enter by using genes, diseases, or risk factors as the starting point, HuGE Navigator provides a potential bridge between epidemiologic and genetic research domains.

HuGE culls data from PubMed on population-based epidemiological studies of human genes. The team recently replaced their search strategy with a machine-learning approach. The retrieved articles are then curated, assigned a study type and data categories, and indexed by MeSH terms. As reported in the correspondence, HuGE has a suite of tools that let a researcher search this literature in quite a few ways. These include the HuGE Literature Finder (find published literature on human genome epidemiology, including genetic association studies), the HuGE Investigator browser (find investigators or collaborators!), the HuGE watch tool (track the research), HuGEpedia (an online encyclopedia summarizing gene-disease associations), GeneSelectAssist (find candidate genes associated with a subject) and the HuGE Risk Translator (calculate the predictive value of genetic markers).

I am not an epidemiologist, and I admittedly don’t know much about malaria, but I thought I’d try out a few tools, such as GeneSelectAssist. I typed “malaria” into the search box and got 106 genes, based on evidence in the HuGE database and Entrez Gene, as you see in this screenshot. Browsing around, I found some interesting stuff, like this paper suggesting that apoE polymorphisms influence susceptibility to malaria. My immediate reaction, though, was that the genes are not prioritized.

But, as you can see here in this screenshot, you can prioritize this list. One small caveat: because of NCBI usage rules, they can’t run the calculation until after 9pm ET, so we will have to wait to see how it works out.

So, I stuck apoE and malaria into the risk translator:

You also need to enter a few other numbers for your test.

As they say in the ‘about’ section of the translator:

Risk Translator requires the user to enter the following parameters:

If One Risk Genotype option is checked,

Frequency (risk genotype): Proportion of carriers in total population
Disease Risk: Probability of disease in the general population over a specific period of time
Odds Ratio: Ratio of the odds of disease among carriers divided by the odds of disease among non-carriers
If Two Risk Genotype option is checked,

Frequency (risk genotype 1): Proportion of carriers in total population
Frequency (risk genotype 2): Proportion of carriers in total population
Disease Risk: Probability of disease in the general population over a specific period of time
Odds Ratio (risk genotype 1): Ratio of the odds of disease among carriers divided by the odds of disease among non-carriers
Odds Ratio (risk genotype 2): Ratio of the odds of disease among carriers divided by the odds of disease among non-carriers
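The ‘about’ section doesn’t say how the translator turns these inputs into a disease risk for each genotype. One plausible approach, and this is my assumption rather than anything HuGE documents, is to treat each odds ratio as “risk genotype vs. non-carriers,” assume the genotype groups partition the population, and search for the non-carrier (baseline) odds that make the frequency-weighted average of the genotype risks equal the population disease risk. Here is a minimal Python sketch of that idea; the function names and the example numbers are hypothetical, not real apoE/malaria figures.

```python
def risk_from_odds(odds):
    """Convert odds of disease to a probability (risk)."""
    return odds / (1.0 + odds)


def genotype_risks(freqs, odds_ratios, disease_risk):
    """Split a population disease risk into genotype-specific risks.

    freqs: carrier frequency of each risk genotype (one or two entries).
    odds_ratios: odds ratio of each risk genotype vs. the reference genotype.
    disease_risk: probability of disease in the whole population.
    Returns (reference_risk, [risk for each risk genotype]).
    """
    ref_freq = 1.0 - sum(freqs)
    if len(freqs) != len(odds_ratios):
        raise ValueError("need one odds ratio per risk genotype")
    if ref_freq < 0.0 or not 0.0 < disease_risk < 1.0:
        raise ValueError("frequencies must sum to <= 1 and risk must be in (0, 1)")
    if any(o <= 0.0 for o in odds_ratios):
        raise ValueError("odds ratios must be positive")

    def population_risk(ref_odds):
        # Frequency-weighted average of genotype risks for a given baseline odds.
        total = ref_freq * risk_from_odds(ref_odds)
        for f, oratio in zip(freqs, odds_ratios):
            total += f * risk_from_odds(oratio * ref_odds)
        return total

    # population_risk rises monotonically from 0 toward 1 as the baseline
    # odds grow, so bracket the root and then bisect.
    lo, hi = 0.0, 1.0
    while population_risk(hi) < disease_risk:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if population_risk(mid) < disease_risk:
            lo = mid
        else:
            hi = mid
    ref_odds = (lo + hi) / 2.0
    return risk_from_odds(ref_odds), [risk_from_odds(o * ref_odds) for o in odds_ratios]


# Hypothetical one-genotype example: 20% carriers, OR 2.0, 10% population risk.
r0, (r1,) = genotype_risks([0.2], [2.0], 0.10)
print(f"non-carrier risk {r0:.4f}, carrier risk {r1:.4f}")
```

Bisection is used because it handles the one- and two-genotype options the same way; with a single risk genotype the equation can also be solved in closed form as a quadratic.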

Once you enter these, the translator returns a set of calculations:

The HuGE Calculator calculates:
Epidemiological Assessment

Frequency (risk genotype): Proportion of carriers in total population
Frequency (reference genotype): Proportion of non-carriers in total population
Disease risk: Disease risk in total population
Disease risk (risk genotype): Disease risk for carriers
Disease risk (reference genotype): Disease risk for non-carriers
Risk difference: Difference between disease risks of carriers and non-carriers
Relative risk: Ratio of disease risks of carriers and non-carriers
Clinical Validity & Utility

Sensitivity: Proportion of carriers among those who will develop disease
Specificity: Proportion of non-carriers among those who will not develop disease
False Positive Rate: Proportion of carriers among those who will not develop disease
False Negative Rate: Proportion of non-carriers among those who will develop disease
Positive Predictive Value: Proportion of those who will develop disease among carriers
Negative Predictive Value: Proportion of those who will not develop disease among non-carriers
Likelihood ratio (risk genotype): Ratio of the proportion of carriers in those who will develop disease and the proportion of carriers in those who will not develop disease
Likelihood ratio (reference genotype): Ratio of the proportion of non-carriers in those who will develop disease and the proportion of non-carriers in those who will not develop disease
Public Health Impact

Population attributable fraction: Proportion of cases that can be prevented when the negative effect of the genetic risk factor is eliminated
Number Needed to Treat: Number of people needed to treat to prevent one case
Number Needed to Screen: Number of people needed to screen to prevent one case
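To see how these outputs hang together, here is a minimal Python sketch for the one-risk-genotype case. It is my own reconstruction from the definitions quoted above, not HuGE’s code, and the function name and dictionary keys are hypothetical. The Number Needed to Treat and Number Needed to Screen formulas are an assumption: they suppose that “treatment” lowers carriers’ risk to the non-carrier risk, which the translator’s documentation doesn’t spell out, and they only make sense when the odds ratio is above 1.

```python
import math


def risk_translator_sketch(carrier_freq, disease_risk, odds_ratio):
    """Compute the one-risk-genotype outputs from the three inputs."""
    p, big_r, oratio = carrier_freq, disease_risk, odds_ratio
    if not (0.0 < p < 1.0 and 0.0 < big_r < 1.0 and oratio > 0.0):
        raise ValueError("need 0 < frequency < 1, 0 < risk < 1, odds ratio > 0")
    q = 1.0 - p

    # Non-carrier risk r0 solves p*r1 + q*r0 = R with r1 = OR*r0 / (1 + (OR-1)*r0).
    # Clearing the denominator gives a*r0^2 + b*r0 - R = 0; the expression below
    # picks the root in (0, 1) and also covers OR == 1 (where a == 0).
    a = q * (oratio - 1.0)
    b = p * oratio + q - big_r * (oratio - 1.0)
    r0 = 2.0 * big_r / (b + math.sqrt(b * b + 4.0 * a * big_r))
    r1 = oratio * r0 / (1.0 + (oratio - 1.0) * r0)

    sens = p * r1 / big_r                  # carriers among future cases
    spec = q * (1.0 - r0) / (1.0 - big_r)  # non-carriers among non-cases
    rd = r1 - r0
    return {
        "disease_risk_carriers": r1,
        "disease_risk_noncarriers": r0,
        "risk_difference": rd,
        "relative_risk": r1 / r0,
        "sensitivity": sens,
        "specificity": spec,
        "false_positive_rate": 1.0 - spec,
        "false_negative_rate": 1.0 - sens,
        "ppv": r1,
        "npv": 1.0 - r0,
        "lr_risk_genotype": sens / (1.0 - spec),
        "lr_reference_genotype": (1.0 - sens) / spec,
        "paf": (big_r - r0) / big_r,
        # Assumption: treating a carrier reduces their risk from r1 to r0.
        "nnt": 1.0 / rd if rd else math.inf,
        # Screen everyone, treat carriers: p * rd cases prevented per person screened.
        "nns": 1.0 / (p * rd) if rd else math.inf,
    }


# Hypothetical inputs, not real apoE/malaria numbers.
for name, value in risk_translator_sketch(0.2, 0.10, 2.0).items():
    print(f"{name:>26}: {value:.4f}")
```

Note that the positive predictive value is simply the carriers’ disease risk, and the population attributable fraction times the population risk equals the carrier frequency times the risk difference, which is a handy sanity check on any numbers you plug in.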

I didn’t want to pull numbers out of you-know-where, but you get the idea.

The HuGE database looks like a great resource for searching the human genome epidemiology literature and pulling out more than just references.

Check it out and tell us what you think!

Yu, W., Gwinn, M., Clyne, M., Yesupriya, A., Khoury, M.J. (2008). A navigator for human genome epidemiology. Nature Genetics, 40(2), 124-125. DOI: 10.1038/ng0208-124

One thought on “A HuGE database”

  1. Mary

    There are also links from UCSC gene details pages to HuGE if there is corresponding data. For example, here is a BRCA1 page. On the page you can find a link to “CDC HuGE Published Literature: BRCA1”, which will pull up the BRCA1 literature at HuGE.

    When I was looking for information on that gene associated with ALS, it led me to some useful studies. Quite handy.
