Tag Archives: HGMD

Tip of the Week: Transfac (and HGMD, Proteome, etc)

BioBase is a provider of expert-curated biological databases. Two well-known BioBase databases are TransFac and HGMD. Both have publicly available data (see previous links), but if you go to the BioBase site, you’ll find that subscription-based access is also available, with more features. HGMD is the Human Gene Mutation Database and “represents an attempt to collate known (published) gene lesions responsible for human inherited disease.” TransFac, on the other hand, “provides data on eukaryotic transcription factors, their experimentally-proven binding sites, consensus binding sequences (positional weight matrices) and regulated genes.” As you can tell from a search of our blog, HGMD is often cited as a good source of human disease data, as TransFac is for transcription factor binding sites (TFBS).

BioBase has a series of video tutorials for both TransFac and HGMD (and more for the other databases such as Proteome, Genome Trax and ExPlain). For this week’s tip of the week, we’ve embedded two video tutorials.

The first explains MATCH, an analysis tool in TransFac that predicts binding sites for transcription factors in a particular DNA sequence.
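For readers who want a feel for what matrix-based binding site prediction involves, here is a minimal sketch of scanning a DNA sequence with a positional weight matrix. It is not the MATCH algorithm itself, and the count matrix, pseudocount, background frequency, threshold and function names below are all invented for illustration; none of them come from TransFac.

```python
import math

# Hypothetical count matrix for a 4-bp motif, one dict per motif position.
# The numbers are made up for illustration and are NOT TransFac data.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert base counts to log2 odds scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


PWM = log_odds_matrix(COUNTS)
hits = scan("TTACGTGGACGTAA", PWM, threshold=5.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

A real tool also has to scan the reverse strand and choose score cut-offs per matrix, which this sketch leaves out.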



The second video tip is a quick tutorial on how to get started with searching HGMD.


If you are interested in advanced searching of these two databases, or Genome Trax, Proteome or ExPlain, check out the video tutorials from BioBase.

(one) Video Tip of the Week (to hold them all): Variation and Disease Databases

After again reading Daniel MacArthur’s good rundown from last year about the state of databases of human disease-causing variation (One database to hold them all), I thought it might be nice to do a tip comparing several of them. I couldn’t get it under our self-imposed 5 minute limit for our tips (and the technical limit of the software I’m using, but that’s about to change). But as I perused our tips and other sites, I found we and others have quite a list of how-to tips for using these databases. So in today’s tip I’ve gathered video tips for 3 of the databases listed in the linked post. Below those tips I’ll link to other how-to videos for additional human variation and disease databases.

The databases mentioned are OMIM, Human Gene Mutation Database (HGMD), MutaDATABASE and The Human Variome Project. There are video tips for the first three.


Last year OMIM moved to http://www.omim.org and got an entirely new interface. Mary was on top of it and did a tip on the new OMIM interface, with lots of information on the move and on OMIM in the post:

Our full tutorial on the new OMIM is coming soon.

HGMD has a public site and a by-subscription site. The latter includes access to the most current data and some added features. The publicly accessible site is out-of-date by three years. Because of HGMD restrictions, we aren’t able to do a tutorial or a tip on HGMD, but they do have an introduction video to their database:


Additionally, there is a good background page for more information.


Mary did a tip on MutaDatabase last summer:


Another excellent resource is Gen2Phen. The Gen2Phen project “aims to unify human and model organism genetic variation databases towards increasingly holistic views into Genotype-To-Phenotype (G2P) data, and to link this system into other biomedical knowledge sources via genome browser functionality.” In that vein, they have quite an extensive list of locus-specific databases and additional resources.

There are several other resources available for human disease variation, including CGAP, dbGaP, GAD, PhenomicDB and more. We have tutorials on all of those if you wish to check them out.

Of course there’s dbSNP :D, for which we have a tutorial and a tip about searching human variation.

You can find an extensive list of other resources at Human Genome Variation Society (HGVS).

And an oft-asked question on Biostar is what kind of resources are there for this kind of data. You can find answers here, here and here.

What’s the Answer? databases of disease SNPs

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s question is a search for databases of SNPs ‘causal’ for diseases. As the answers point out, the word ‘causal’ should be used with hesitation when talking about SNPs. That said, the answers gave some good suggestions for this perennially asked question. Here’s the first one (the others are as useful, so check them out):

Human Gene Mutation Database


Paid subscription for up-to-date information. Otherwise less up-to-date public version of the database is freely available only to registered users from academic institutions/non-profit organisations.

Personal Genomics, clinical assessment and online resources

The Lancet paper, Clinical assessment incorporating a personal genome, has held my fascination this weekend (yes, I read it at the beach). Mary posted Friday and again Saturday on the paper and the related NPR segment. It feels to me to be a seminal paper, though I do agree with Daniel at Genetic Future that there is a lot there we still don’t know. A large portion of the variation is in non-coding regions, and thus predictions and propensities are hard to come by with the available analysis. In fact, as he pointed out, many of the coding region variations come with little information as to their effect on disease. I would also add that even if we get to that holy grail of $1,000 to sequence a personal genome, this kind of extensive analysis would still be time- and cost-prohibitive for the vast majority of sequenced genomes.

Yet, as with all early steps in science and medicine, there are missing pieces, large gaps and huge efforts (think “space travel,” “computers,” “microwave ovens,” “internet”) that over time become inexpensive and commonplace (ok, so the first isn’t necessarily “inexpensive”). Sequencing genomes will become inexpensive before the analysis does, but both will come. And I think this paper is pointing to that future.

The other hurdle to large-scale personal genomics I see (of course) is the understanding and use of the genomics and data resources. The authors use a large (and excellent, in my opinion) suite of genomics resources to obtain data and do their analysis. I’ll list them here with links in alphabetical order:

dbSNP (T)
HapMap (T)
PubMed (T)
UniProt (T)

All of these resources have a wealth of data, but even so, each tool requires a lot of analysis and familiarization. Each tool does have documentation and tutorials, and of course OpenHelix has tutorials on many of the ones mentioned (those with linked “T”s after the name). Still, this one analysis took a large number of tools and a lot of familiarization.

The paper does have a pretty good figure (figure 1) outlining the analysis process. For example, they SIFTed the genome to find gene-associated, non-synonymous, rare and novel, and disease-associated variations, and then used dbSNP, HGMD, OMIM and PubMed to analyze something like HFE2, which might have an association with haemochromatosis. One of my quibbles with the paper, as is often the case with these papers, is that there isn’t a good methods ‘walk-through’ using something like Galaxy or Taverna, in a history or workflow, that would help reproduce the analysis.
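To make the shape of that filtering step a little more concrete, here is a rough sketch of one way such a triage could look in code. It is only my reading of the process described above, not the authors’ pipeline: the `Variant` fields, the `prioritise` function, the 1% rarity cut-off and the example records are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variant:
    # All fields are hypothetical stand-ins for real annotations.
    gene: Optional[str]                # None if the variant is not gene-associated
    consequence: str                   # e.g. "nonsynonymous", "synonymous"
    allele_frequency: Optional[float]  # None if novel (no known frequency)
    disease_associated: bool           # flagged in some disease database


def prioritise(variants: List[Variant], rare_cutoff: float = 0.01) -> List[Variant]:
    """Keep gene-associated, non-synonymous variants that are novel, rare,
    or already disease-associated. The cut-off is an assumed value."""
    selected = []
    for v in variants:
        if v.gene is None or v.consequence != "nonsynonymous":
            continue
        is_novel = v.allele_frequency is None
        is_rare = (not is_novel) and v.allele_frequency < rare_cutoff
        if is_novel or is_rare or v.disease_associated:
            selected.append(v)
    return selected


# Invented example records; the gene names are placeholders.
example = [
    Variant("GENE_A", "nonsynonymous", 0.002, False),  # rare: kept
    Variant("GENE_B", "nonsynonymous", None, False),   # novel: kept
    Variant("GENE_C", "nonsynonymous", 0.30, True),    # common but flagged: kept
    Variant("GENE_D", "nonsynonymous", 0.30, False),   # common, unflagged: dropped
    Variant("GENE_E", "synonymous", 0.001, True),      # synonymous: dropped
    Variant(None, "nonsynonymous", None, True),        # not in a gene: dropped
]
kept = prioritise(example)
print([v.gene for v in kept])
```

The variants that survive a filter like this are the ones you would then look up by hand in dbSNP, HGMD, OMIM and PubMed.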

We also have a tutorial I’d like to point you to, one that walks through a similar process and teaches users the basics of it. You can find this tutorial here; it’s free and publicly available. The tutorial walks the user through the analysis of a gene variation, in this case in CYP2C9, that affects an individual’s response to warfarin. There is a similar variation (different gene, affects same drug response) in the paper. The tutorial uses the NIEHS SNPs site to get an overview of the variation, including SIFT and PolyPhen predictions, then goes to the UCSC Genome Browser for an overview of the region, walks through the dbSNP information and does a quick tag SNP analysis using GVS. That tutorial is only one very small step in what will have to be an immense education in genomics and genomics resources.

That is all to point out that the paper is a fascinating first step, and as a first step it suggests the gaping holes we will have to fill in bringing personal genomics to medicine.

Ashley, E., Butte, A., Wheeler, M., Chen, R., Klein, T., Dewey, F., Dudley, J., Ormond, K., Pavlovic, A., & Morgan, A. (2010). Clinical assessment incorporating a personal genome. The Lancet, 375 (9725), 1525-1535. DOI: 10.1016/S0140-6736(10)60452-7

Omes and Omics. Oh, please stop the growth of the suffixome…?

Yesterday from the Gene Ontology GoFriends mailing list I got notified about Babelomics. The creators describe Babelomics as:

Babelomics is a suite of interconnected tools oriented to the functional annotation of genome-scale experiments. One of the field for which Babelomics is best suited is the analysis of microarray gene expression experiments. Nevertheless Babelomics is not restricted to this type of data and has been designed for facilitating the interpretation of large-scale experiments.

I think the tools sound terrific. But I just wasn’t sure I could handle another -omic…

And then my email from the HUM-MOLGEN list arrived. I found out about the Variome meeting. The Variome project aims to collect and make available information on human variation that is correct and complete. Another worthy exercise. But another -ome for the collection. I can’t take the growth of the suffixome any more….