Tag Archives: disease

Video Tip of the Week: list of genes associated with a disease

I am currently in Puerto Varas, Chile at an EMBO genomics workshop. The workshop is mainly for grad students and the instructors are, for the most part, alumni of the Bork group. I gave a tutorial on genomics databases.

Anyway, the last two days of the workshop are a challenge: in teams of 3-4, each advised by an instructor, students are to develop a list of genes associated with epilepsy. Obviously, this could be a trivial task: just go to OMIM or GeneCards and grab a list. But this challenge requires them to go beyond that, using the available data to make predictions. My team attempted, at my suggestion, some brainstorming techniques to ensure a more creative solution than they could come up with individually or by just jumping into normal group dynamics. It seemed to work: their solution was quite creative, and we will find out today how well it worked.

That was my long way of saying that, in the process, we came across many databases of gene-disease information. Above you will find a video of rat gene-disease associations from RGD, which are often used, of course, to investigate human gene-disease associations.

Below you will find a list of some excellent databases and resources to find similar lists:

Genetic Association Database http://geneticassociationdb.nih.gov/

G2D http://g2d2.ogic.ca

OMIM http://www.omim.org

Diseases http://diseases.jensenlab.org/

GeneCards http://genecards.org

DisGeNET http://ibi.imim.es/web/DisGeNET/

Several NCBI resources http://www.ncbi.nlm.nih.gov/guide/howto/find-gen-phen/

UCSC Genome Browser’s tracks for disease and phenotype http://genome.ucsc.edu

There are several others, I’m sure. If you have a favorite not on this list, please comment.
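For anyone trying something like the workshop challenge, one simple starting point is to merge gene lists exported from several of the resources above and rank genes by how many sources support them. Here is a minimal sketch, assuming you have already downloaded gene symbols from each source by hand. The function name `rank_by_support`, the source names and the gene symbols are all placeholders I made up for illustration, not real query results from any of these databases.

```python
from collections import Counter


def rank_by_support(gene_lists):
    """Return (gene, number_of_sources) pairs, most widely supported first."""
    support = Counter()
    for genes in gene_lists.values():
        # Normalize case and whitespace; a set makes each source count a gene once.
        support.update({g.strip().upper() for g in genes if g.strip()})
    # Sort by descending support, then alphabetically for a stable order.
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


# Placeholder symbols only: replace with lists exported from real resources.
gene_lists = {
    "source_a": ["GENE1", "GENE2", "GENE3"],
    "source_b": ["gene2", "GENE3", "GENE4"],
    "source_c": ["GENE3", " GENE5 "],
}

ranked = rank_by_support(gene_lists)
for gene, n_sources in ranked:
    print(gene, n_sources)
```

Counting sources is only a crude confidence measure, since many of these databases draw on the same literature, but it is a quick way to see where the lists agree before making any predictions of your own.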

Reference for RGD:
Laulederkind S.J.F., Hayman G.T., Wang S.J., Smith J.R., Lowry T.F., Nigam R., Petri V., de Pons J., Dwinell M.R. & Shimoyama M. (2013). The Rat Genome Database 2013–data, tools and users, Briefings in Bioinformatics, 14 (4) 520-526. DOI:

Video Tip of the Week: PATRIC, Pathosystems Resource Integration Center

PATRIC is an integration portal (as the name implies) for data concerning disease-causing infectious bacteria. Or, to put it in their words:

PATRIC is the Bacterial Bioinformatics Resource Center, an information system designed to support the biomedical research community’s work on bacterial infectious diseases via integration of vital pathogen information with rich data and analysis tools.

We mentioned PATRIC at the beginning of the year in a SNPpets post. Also, I was recently speaking with a threat abatement specialist who was lamenting the lack of coordinated data on infectious bacteria genomes. I was sure there was such a site, so we checked our blog here and voila, sure enough, there was exactly what they needed.

PATRIC indeed coordinates many different types of data from disease-causing infectious bacteria, including data from all NIAID biodefense A/B/C pathogens and hundreds of genomes from many isolation sources. For example, as of this writing there are nearly 500 Escherichia genomes, including 57 complete ones. In addition to genomic data, there are many other types of data, including phylogenetic data, host-pathogen protein-protein interactions, proteins, pathways and more. One interesting feature, of many, is the disease map (for Mycobacterium only right now) that shows local outbreaks and alerts. There are many tools to access and analyze this data, from specialized searches to browsers.

To get a good idea of what is available at PATRIC, check out the quick intro video embedded above from the PATRIC developers. They have two other video tutorials on the feature table and identifying novel proteins you also might want to check out. Also, check out the blog for more databases and resources for infectious disease pathogens.

To cite or learn more about PATRIC, see:

Gillespie, J., Wattam, A., Cammer, S., Gabbard, J., Shukla, M., Dalay, O., Driscoll, T., Hix, D., Mane, S., Mao, C., Nordberg, E., Scott, M., Schulman, J., Snyder, E., Sullivan, D., Wang, C., Warren, A., Williams, K., Xue, T., Seung Yoo, H., Zhang, C., Zhang, Y., Will, R., Kenyon, R., & Sobral, B. (2011). PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species. Infection and Immunity, 79 (11), 4286-4298. DOI: 10.1128/IAI.00207-11

What’s the Answer? databases of disease SNPs

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s question asks for databases of SNPs that are ‘causal’ for diseases. As the answers point out, the word ‘causal’ should be used with hesitation when talking about SNPs. That said, the answers gave some good suggestions for this perennially asked question. Here’s the first one (the others are just as useful, so check them out):

Human Gene Mutation Database


Paid subscription for up-to-date information. Otherwise less up-to-date public version of the database is freely available only to registered users from academic institutions/non-profit organisations.

What’s the Answer? disease causing SNPs

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s highlighted question is….

..which is the best database choice from where i can extract a data set of causative variants and a data set of benign variants (OMIM ,GWAS)…

A perennial favorite. The accepted answer gives a good rundown of how to go about choosing a database. Another answer points to an earlier discussion with a wealth of databases.

What’s the Answer? Open Thread

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

Question of the week:

Can anyone suggest some tool or validated database…where I can get disease associated SNP data ( like diabetes, etc) and the corresponding PMIDs/ the number of cases, controls and population studied…

The answers are excellent. The one below was the second-highest-voted answer. The highest-voted one was a snippet of SQL code to query the UCSC database for just this information. Take a look!

roughly speaking, what you (and lots of people around the world) would like to do is actually the main purpose of the HVP project, which is encouraging the creation of locus specific databases (LSDBs) that would collate disease specific variations. right now, all we can do are just 2 things:


Tip of the Week: PolyPhen

There are several methods that can be used to predict whether a particular non-synonymous SNP is deleterious, SIFT and PolyPhen among others. Which one to use is up to the individual researcher and depends on the strengths and weaknesses of the predictors, though the two mentioned do a pretty good job. Today’s tip will be on the web interface of PolyPhen 2 hosted at the Sunyaev lab. Many tools and databases use PolyPhen to help predict the functional effect of a non-synonymous SNP, including PolyDoms, F-SNP (which I’ve done a tip on before), NIEHS SNPs and SeattleSNPs (which we have free tutorials on), SeattleSeq and more. Today’s tip will focus on simply using the web interface, but you can always download the program and integrate it as you see fit, or use one of the databases. Along with SIFT, it’s arguably one of the most used predictors out there.

From an earlier help section describing PolyPhen:

PolyPhen (=Polymorphism Phenotyping) is an automatic tool for prediction of possible impact of an amino acid substitution on the structure and function of a human protein. This prediction is based on straightforward empirical rules which are applied to the sequence, phylogenetic and structural information characterizing the substitution

To learn more about how PolyPhen works, you can view that page, or you can read some of the references. Next tip I do (early February) will be on SIFT.
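If the term "non-synonymous" is new to you, the sketch below may help. It is not PolyPhen and makes no prediction about whether a change is deleterious; it only checks whether a single codon change alters the encoded amino acid, which is the precondition for handing a variant to a predictor like PolyPhen or SIFT. The function name `classify_substitution` and the category labels are my own choices for illustration, and the table is the standard genetic code.

```python
# Standard genetic code, written compactly: codons are enumerated with each
# position cycling through T, C, A, G, and '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon, alt_codon):
    """Label a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # a stop codon is introduced
    if ref_aa == "*":
        return "stop-lost"  # a stop codon is replaced by an amino acid
    return "missense"       # a different amino acid is encoded


print(classify_substitution("GAG", "GAA"))  # Glu -> Glu
print(classify_substitution("GAG", "GTG"))  # Glu -> Val
print(classify_substitution("TGG", "TGA"))  # Trp -> stop
```

Only the changes that come out as missense are the kind of amino acid substitution the PolyPhen description above is talking about.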

Tip of the Week: RepTar, a database of miRNA target sites

MicroRNAs have become a rich source of research, as they probably have a huge effect on gene expression and disease. The human genome may encode over 1,000 miRNAs that target over half of our genes. They might be implicated in a lot of common diseases (which have not yet been picked up in GWAS?). They are a fascinating area of biology that has only come into its own in the last decade. As such, the number of databases cataloging miRNAs is large. Today’s tip is on a new one, RepTar, which is reported in the upcoming NAR database issue. The niche RepTar is attempting to fill is to make miRNA target predictions more comprehensive by including new research in the algorithm. This new research suggests there are more possible target sites than previously thought. As mentioned in the article,

Recently, the miRNA binding options were expanded further with the identification of ‘centered sites’, functional miRNA target sites that lack both perfect seed pairing and 3′-compensatory pairing and instead exhibit pairing with the target along 11–12 contiguous pairs at the center of the miRNA (4). While some algorithms relaxed the evolutionary conservation criterion (5–11) and/or offer also predictions of 3′-compensatory sites [e.g. (6,12,13)], few databases offer predictions of the whole repertoire of miRNA targeting patterns. Furthermore to date, no database lists genome-wide prediction of cellular targets of viral miRNAs. These miRNAs lack significant evolutionary conservation and their targets are not necessarily expected to be evolutionarily conserved. In addition, the few identified viral miRNA targets have shown both conventional seed binding and 3′-compensatory binding [e.g. (3,14)].

Here we present a database of genome-wide miRNA target predictions for mouse and human genes, based on the predictions of our novel target prediction algorithm, RepTar

I’ll leave the predictive value up to miRNA researchers, but I thought I’d introduce the site.
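For readers new to the terminology in the quoted passage, here is a minimal sketch of what conventional "seed pairing" means: a stretch of the target transcript that is perfectly complementary to nucleotides 2-8 of the miRNA. This is emphatically not the RepTar algorithm, which also covers the other site types described above. The function names are made up for illustration, the miRNA is the let-7a sequence, and the UTR is a toy sequence I invented so that it contains two sites.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_match_sites(mirna, utr, seed_start=2, seed_end=8):
    """Return 0-based UTR positions that perfectly pair with the miRNA seed.

    The seed is taken as miRNA nucleotides seed_start..seed_end (1-based,
    inclusive); G:U wobble pairs and mismatches are not allowed here.
    """
    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement(seed)  # what the seed pairs with on the mRNA
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # let-7a
utr = "AAACUACCUCAGGGCUACCUCUU"    # toy sequence, not a real 3' UTR
print(seed_match_sites(mirna, utr))
```

Real prediction tools layer much more on top of this (site context, conservation, pairing energy), and the centered and 3′-compensatory sites in the quote would be missed entirely by a seed-only search like this one.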

While I’m at it, allow me to list a few other miRNA sites from labs and institutes as far-flung as China, Italy, Israel, Canada and the U.S. Perhaps someday I’ll do a comparison.

CircuitsDB, which Jennifer did a great tip of the week tutorial on.
miRBase, which we have a full-length tutorial on.
PicTar, which has an annotation track for the UCSC Genome Browser.
PuTmiR (in relation to transcription factors).

Two lists to catch some others: http://mirnablog.com/microrna-target-prediction-tools/ and http://www.ncrna.org/KnowledgeBase/link-database/mirna_target_database

Elefant, N., Berger, A., Shein, H., Hofree, M., Margalit, H., & Altuvia, Y. (2010). RepTar: a database of predicted cellular targets of host and viral miRNAs. Nucleic Acids Research. DOI: 10.1093/nar/gkq1233

Tip of the Week: Mouse Genomic Pathology

Ok, so this isn’t the same as our usual tips. But recently I was involved in an animal models project that led me to this resource on genomic pathology. The deeper I got into this animal model project, the more clear it became that a tremendous amount of genomic data is coming that is going to be great–but it will need to be paired with appropriate histology and pathology for a more complete understanding of the genomic biology.

All these model organism projects (knockout mice or rats, mutant mice for cancer studies, inbred lines with specific characteristics and genomic regions like the Collaborative Cross, treated animals) need quality pathology assessments. There are phenotyping projects like Europhenome being done on large sets of animals, and they require not only standardized descriptions and ontologies, but also image samples and evaluations. In an age where we all use software to scan genes and genomic regions, we have to have pathology data as well. And that data will also need to be standardized and stored in appropriate database resources for researchers to find and examine. I recently heard Dr. Robert Cardiff talk about his work on Pathobiology of the Mouse and how crucial it is to capture the information in standardized and searchable ways. He’s one of the drivers of this project, and fully understands the needs in this arena.

More people should be trained in pathology to examine these animals. So during this project I was impressed to find out about an online learning project that could be helpful for people who need to understand the foundations of animal research and be introduced to important aspects of pathology.  This project has won an award for Outstanding Distance Learning (May 25).  So as a public service in genomics I point you to this UC Davis project.

You can have a look at the background and goals for this from the Center for Genomic Pathology site.  From there you can click the navigation for UCD Information Session to get a taste of their course, or click on my image above.  It’s a nice effort.

We have no relationships with UC Davis or this online learning project–we just thought it was a valuable and important component to genomics and wanted to talk about  it.

Updated Online Tutorial for GeneTests

Comprehensive tutorial on the publicly available GeneTests resource enables researchers to quickly and effectively use this invaluable resource.

Seattle, WA (PRWEB) May 25, 2010 – OpenHelix today announced the availability of an updated tutorial suite on GeneTests.

GeneTests is an integrated resource designed to provide access to current genetic testing and other clinical genetics information. The GeneTests resource includes the Laboratory Directory database, an international directory that identifies the location of clinical laboratories offering genetic testing; and GeneReviews, a collection of up-to-date, comprehensive disease-specific overviews which include clinical descriptions, diagnosis, management, molecular genetics, current genetic testing, and genetic counseling. This tutorial, in conjunction with OpenHelix tutorials on OMIM, dbSNP, GVS, HapMap and many others, will give the medical researcher or clinician a set of training resources to help them be efficient and effective at accessing and analyzing genomic variation and biomedical data.

The tutorial suites, available through an annual OpenHelix subscription, contain an online, narrated, multimedia tutorial, which runs in just about any browser connected to the web, along with slides with full script, handouts and exercises. With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. The scripts, handouts and other materials can also be used as a reference or for training others.

This tutorial will teach users:
*to perform disease-specific searches and navigate the GeneTests site
*to understand the GeneReviews and Laboratory Directory Displays
*to access additional searches to query the GeneReviews and Laboratory Directory databases by disease feature, gene and protein specific searches, and more
*to identify U.S. and international laboratories offering molecular genetic testing for specific disorders, use the Clinical Directory to locate genetics professionals and services, and investigate additional educational and other resources

To find out more about these and over 90 other tutorial suites visit the OpenHelix Catalog and OpenHelix. Or visit the OpenHelix Blog for up-to-date information on genomics and genomics resources.

About OpenHelix
OpenHelix, LLC, (www.openhelix.com) provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web. The OpenHelix Search portal searches hundreds of resources, tutorial suites and other material to direct researchers to the most relevant resources and OpenHelix training materials for their needs. Researchers and institutions can save time, budget and staff resources by leveraging a subscription to nearly 100 online tutorial suites available through the portal. More efficient use of the most relevant resources means quicker and more effective research.

Tip of the Week: SwissVar, a New Genotype-phenotype Resource from SIB

Today’s tip is on a new genotype/phenotype resource from the Swiss Institute of Bioinformatics, or SIB. I was already a fan of many SIB tools and resources, and was using one (ENZYME) when I found a notice about SwissVar. SwissVar is described as ‘a portal to Swiss-Prot diseases and variants.’ It includes information about genotype-phenotype relationships for each specific variant, manually annotated from literature. Manual annotation adds a level of quality and believability to this data. The SwissVar portal also contains various pre-computed information that may aid in determining the effect of the variant. Genotype-phenotype searches can begin with either Medical Subject Headings, or MeSH terms (Disease), gene or protein names (General characteristics) or variants (Functional/structural features). There are multiple ways to modify your searches, and results are clean tables of data including gene/protein accessions, names, links to MeSH definitions and links to variation reports.

If your research could benefit from high quality, manually curated genotype/phenotype information, I suggest you watch this tip, and then explore SwissVar according to your own interests.

SwissVar – a Portal to Swiss-Prot Diseases and Variants: http://www.expasy.ch/swissvar/