Tag Archives: snps

Tip of the Week: Gemini, exploration of genetic variation

You Tube:

This week’s tip of the week is on GEMINI, which is short for “GEnome MINIng.” Unlike most of the tips we give every week, this one is on a software package. It does, however, use and integrate with many internet databases such as dbSNP, ENCODE, UCSC, ClinVar and KEGG. It’s also a freely available, open source tool that gives researchers the ability to create quite complex queries based on genotypes, inheritance patterns, etc. The above 12 minute clip is a talk given at a conference that gives an introduction to the science behind the tool.

The abstract from the recent paper from the developers gives a good introduction concerning the functionality of the tool:

Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important, yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command line and graphical tools for common analyses. We demonstrate GEMINI’s utility for exploring variation in personal genomes and family based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility and our goal is to provide researchers with a standard framework for medical genomics.
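To make the idea of composing queries on sample genotypes and inheritance patterns a bit more concrete, here is a minimal sketch in Python using an in-memory SQLite table. Everything in it is an assumption for illustration: the table, the column names, the genotype labels and the four toy variants are made up, and this is not GEMINI's actual schema or interface. The GEMINI documentation is the place to look for the real thing.

```python
import sqlite3

# Hypothetical toy data: one row per variant, with genotypes for a child
# and both parents. None of these names come from GEMINI itself.
TOY_VARIANTS = [
    # chrom, pos, ref, alt, gene, in_dbsnp, clinvar_sig, child, mother, father
    ("chr1", 1000, "A", "G", "GENE_A", 1, None, "het", "het", "hom_ref"),
    ("chr2", 2000, "C", "T", "GENE_B", 0, None, "het", "hom_ref", "hom_ref"),
    ("chr3", 3000, "G", "A", "GENE_C", 1, "pathogenic", "hom_alt", "het", "het"),
    ("chr4", 4000, "T", "C", "GENE_D", 1, None, "hom_ref", "het", "hom_ref"),
]


def build_toy_db():
    """Create an in-memory database holding variants plus annotations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE variants (
            chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, gene TEXT,
            in_dbsnp INTEGER,   -- 1 if the variant is already in dbSNP
            clinvar_sig TEXT,   -- clinical significance label, may be NULL
            gt_child TEXT, gt_mother TEXT, gt_father TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        TOY_VARIANTS,
    )
    return conn


def de_novo_candidates(conn):
    """Novel variants that are heterozygous in the child and absent in both parents."""
    return conn.execute(
        """
        SELECT chrom, pos, gene FROM variants
        WHERE gt_child = 'het'
          AND gt_mother = 'hom_ref' AND gt_father = 'hom_ref'
          AND in_dbsnp = 0
        """
    ).fetchall()


def recessive_candidates(conn):
    """Variants homozygous in the child with both parents as carriers."""
    return conn.execute(
        """
        SELECT chrom, pos, gene FROM variants
        WHERE gt_child = 'hom_alt'
          AND gt_mother = 'het' AND gt_father = 'het'
        """
    ).fetchall()


if __name__ == "__main__":
    db = build_toy_db()
    print("de novo candidates:", de_novo_candidates(db))
    print("recessive candidates:", recessive_candidates(db))
```

The point of the sketch is only that, once variants, annotations and genotypes sit in one table, an inheritance pattern becomes a short filter rather than a custom script.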

If you’d like to learn more, there is some pretty good documentation of the software package here.

While I’m at it, and totally unrelated except it’s human genomics, there is this slideshare presentation of the ‘current’ state of personal genomics. Current is in quotes because the slideshare is actually from 3 years ago, but there is a lot of good information in there. Anyone know of a more up-to-date slide set or extensive intro to the current state of personal genomics science similar to this?

 

Relevant Links:

GEMINI Software package
dbSNP
ENCODE
UCSC Genome Browser
ClinVar
KEGG

(tutorials are linked below for those tools in bold above)

Relevant Reference:

Paila U, Chapman BA, Kirchner R, & Quinlan AR (2013). GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations. PLoS computational biology, 9 (7) PMID: 23874191

(one) Video Tip of the Week (to hold them all): Variation and Disease Databases

After again reading Daniel MacArthur’s good rundown about the state of databases of human disease-causing variation from last year (One database to hold them all), I thought it might be nice to do a tip comparing several of them. I couldn’t get it under our self-imposed 5 minute limit for our tips (and technical limit of software I’m using, but that’s about to change). But as I perused our tips and other sites, I found we and others have quite a list of how-to tips to use these databases. So in today’s tip I’ve gathered video tips for 3 of the databases listed in the linked post. Below those tips I’ll link to other how-to videos for additional human variation and disease.

The databases mentioned are OMIM, Human Gene Mutation Database (HGMD), MutaDATABASE and The Human Variome Project. There are video tips for the first three.

OMIM:

Last year OMIM moved to http://www.omim.org and got an entirely new interface. Mary was on top of it and did a tip on the new OMIM interface, with lots of information on the move and on OMIM in the post:

Our full tutorial on the new OMIM is coming soon.

HGMD:
HGMD has a public site and a by-subscription site. The latter includes access to the most current data and some added features. The publicly accessible site is out-of-date by three years. Because of HGMD restrictions, we aren’t able to do a tutorial or a tip on HGMD, but they do have an introduction video to their database:

 

Additionally, there is a good background page for more information.

MutaDATABASE:

Mary did a tip on MutaDATABASE last summer:

 

Another excellent resource is Gen2Phen. The Gen2Phen project “aims to unify human and model organism genetic variation databases towards increasingly holistic views into Genotype-To-Phenotype (G2P) data, and to link this system into other biomedical knowledge sources via genome browser functionality.”  In that vein, they have quite an extensive list of Locus-specific databases and additional resources.

There are several other resources available for human disease variation, including CGAP, dbGaP, GAD, PhenomicDB and several others. We have tutorials on all of those if you wish to check them out.

Of course there’s dbSNP :D of which we have a tutorial and tip about searching human variation.

You can find an extensive list of other resources at Human Genome Variation Society (HGVS).

And an oft-asked question on Biostar is what kind of resources are there for this kind of data. You can find answers here, here and here.

Tip of the Week: Human SNP-coexpression associations, SNPxGE2


Today’s tip is on a new database based on data from a single interesting paper, SNPxGE2. With a large-scale association study from HapMap data (269 individuals, 4 populations, over 500k SNPs and 15k expression profiles), the research reported:

the computationally predicted human SNP-coexpression associations, that is, the differential co-expression between 2 genes is associated with the genotype of an SNP.

This data is organized in an easily searchable database called SNPxGE2. As the paper only came out 2 months ago, it’s a promising database. It’s interesting and helpful as is, but I can see more data being added over time.
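If "differential co-expression associated with a genotype" sounds abstract, a tiny worked example may help. The sketch below, in Python, groups samples by their genotype at one SNP and computes the Pearson correlation between two genes' expression within each group. The genotype labels and expression values are invented toy numbers, and this is only an illustration of the concept: it is not the statistical method used in the paper, which would also need a formal test and multiple-testing correction.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def coexpression_by_genotype(samples):
    """Correlate two genes' expression separately within each genotype group.

    `samples` is a list of (genotype, expression_gene_1, expression_gene_2).
    """
    groups = {}
    for genotype, expr_1, expr_2 in samples:
        xs, ys = groups.setdefault(genotype, ([], []))
        xs.append(expr_1)
        ys.append(expr_2)
    return {g: pearson(xs, ys) for g, (xs, ys) in groups.items()}


# Invented toy data: the two genes move together in AA samples,
# are unrelated in AG samples, and move in opposite directions in GG samples.
TOY_SAMPLES = [
    ("AA", 1.0, 2.0), ("AA", 2.0, 4.0), ("AA", 3.0, 6.0), ("AA", 4.0, 8.0),
    ("AG", 1.0, 3.0), ("AG", 2.0, 1.0), ("AG", 3.0, 4.0), ("AG", 4.0, 2.0),
    ("GG", 1.0, 8.0), ("GG", 2.0, 6.0), ("GG", 3.0, 4.0), ("GG", 4.0, 2.0),
]

if __name__ == "__main__":
    for genotype, r in sorted(coexpression_by_genotype(TOY_SAMPLES).items()):
        print(genotype, round(r, 3))
```

When the correlation between the two genes changes markedly from one genotype group to another, as in this toy data, that is the kind of SNP-coexpression association the database is about.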

Related Links:

SNPxGE2
HapMap
Tip of the Week on SNPexp (correlation between SNPs and expression)
False Discovery Rate article

Wang, Y., Joseph, S., Liu, X., Kelley, M., & Rekaya, R. (2011). SNPxGE2: a database for human SNP-coexpression associations Bioinformatics, 28 (3), 403-410 DOI: 10.1093/bioinformatics/btr663

Video Tips of the Week: Annual Review IV (first half of 2011)

As you may know, we’ve been doing these video tips-of-the-week for FOUR years now. We have completed around 200 little tidbit introductions to various resources. At the end of the year we’ve established a sort of holiday tradition: we are doing a summary post to collect them all. If you have missed any of them it’s a great way to have a quick look at what might be useful to your work.

You can see past years’ tips here: 2008 I, 2008 II, 2009 I, 2009 II, 2010 I, 2010 II. The summary of the second half of 2011 will be available next week here.

January 2011

January 5: SKIPPY predicting variants w/ splicing effects

January 12: Twitter in Bioinformatics. This one was much more popular than I expected!

January 19: PolyPhen, for predicting the possible effects of mutations in genes

January 26: iRefWeb + protein interaction curation

February 2011

February 2: RCSB PDB Data Distribution Summaries

February 9: SIFT, Sorting (SNPs) Intolerant From Tolerant, another tool for predicting the impact of mutations in genes.

February 16: Melina II for promoter analysis

February 23: SNPTips and viewing personal genome data. This tip is one of the most-watched ones we’ve had. Thousands of views on SciVee!

March 2011

March 2: DAnCER for disease-annotated epigenetics data

March 9: World Tour of Genomics Resources

March 16: Encyclopedia of Life

March 23: ORegAnno for regulatory annotation

March 30: MetaPhOrs, orthology and paralogy predictions

April 2011

April 6: The Taverna Project for workflows

April 13: VirusMINT, the branch of the Molecular Interaction database for viral interactions

April 20: LAMHDI for animal models

April 27: Dot Plots, Synteny at VISTA

May 2011

May 4: MycoCosm

May 11: InterMine for mining “big data”

May 18: Allen Institute’s Brain Explorer

May 25: SciVee, the YouTube of science

June 2011

June 1: New and Improved OMIM®

June 8: Converting Genome Coordinates

June 15: MutaDATABASE, a centralized and standardized DNA variation database

June 22: Update to NCBI’s Cn3D Viewer

June 29: Orphanet for Rare Disease information

What’s the Answer? databases of disease SNPs

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s question is searching for databases of SNPs ‘causal’ for diseases. As the answers point out, the word ‘causal’ should be used with hesitation when talking about SNPs. That said, the answers gave some good suggestions to this perennially asked question. Here’s the first one (others are as useful, so check them out):

Human Gene Mutation Database

http://www.hgmd.cf.ac.uk/ac/index.php

Paid subscription for up-to-date information. Otherwise less up-to-date public version of the database is freely available only to registered users from academic institutions/non-profit organisations.

What’s the Answer? disease causing SNPs

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

This week’s highlighted question is….

..which is the best database choice from where i can extract a data set of causative variants and a data set of benign variants (OMIM ,GWAS)…

A perennial favorite question. The accepted answer gives a good rundown of how to go about choosing a database. Another answer points to an earlier discussion with a wealth of databases.

What’s the Answer? Open Thread (GWAS genotyping)

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

Question of the Week:

How much of the genome is captured by a GWAS?

There are two great answers to this question; here is a quote from the first one. Click the above link for more.

Human genome encodes 1 SNP/100-300bp; ~3GB sequence ~10million SNPs. It is impossible to analyze such a large number of data due to several limiting factors. To deal with this issue we can use Linkage Disequilibrium (LD) mapping (See section on D’, recombination rate), Haplotype, Haplotype blocks and Haplotype Tag SNPs (tagSNPs). (Read about HapMap project here). Instead of genotyping all the 10M SNPs we can genotype tagSNPs in a haplotype block. This is a representative SNP in a given region of genome with high LD. This will enable to find genetic variation without genotyping all the 10M SNPs. Previous studies indicated that genotyping chips with .5M-1M SNPs will be sufficient for a good GWAS.
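The numbers in that quote are easy to check for yourself. The short Python sketch below just redoes the arithmetic: it uses only the figures quoted above (a ~3 Gb genome, one SNP every 100 to 300 bp, chips with 0.5M to 1M SNPs), and the function and constant names are my own, chosen for illustration.

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gb, the figure given in the quoted answer


def estimated_snp_count(genome_bp, bp_per_snp):
    """Rough SNP count if there is one SNP every `bp_per_snp` bases."""
    return genome_bp // bp_per_snp


def chip_fraction(chip_snps, total_snps):
    """Fraction of all SNPs that are directly genotyped on a chip."""
    return chip_snps / total_snps


if __name__ == "__main__":
    # One SNP per 300 bp gives the ~10 million figure used in the quote;
    # one SNP per 100 bp would give three times as many.
    sparse = estimated_snp_count(GENOME_SIZE_BP, 300)
    dense = estimated_snp_count(GENOME_SIZE_BP, 100)
    print("SNPs at 1 per 300 bp:", sparse)
    print("SNPs at 1 per 100 bp:", dense)

    # Share of the ~10M SNPs that a 0.5M or 1M chip genotypes directly.
    for chip_size in (500_000, 1_000_000):
        share = chip_fraction(chip_size, sparse)
        print(f"{chip_size} SNP chip: {share:.0%} of SNPs genotyped directly")
```

So a chip directly genotypes only a small share of the ~10M SNPs, and the rest of the coverage comes from LD between the tagSNPs and their neighbors, which is the point the answer is making.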

Tip of the Week: PolySearch

PolySearch is a tool that allows you to search many different databases at once with a query of the form “given X find all associated Y.” For example, given a pathway, find all associated drugs, or given a gene, find all associated diseases. I’ve found the tool to be quite helpful.

The tool was listed in last week’s tip of the week on “Gene Prioritization Portal” and does that, but does more than that as you can see.

Today’s tip gives an introduction to the site’s documentation and to a search that finds the diseases associated with a single gene.

It’s a great site, and I have found some interesting results. I am having problems with the SNP/PCR Primer and some of the automatic synonym handling (it pushes my gene-disease search to a disease-gene search), and I’ve reported these, but for the most part it works very well.

SNPTips update (1.1)

I did a tip of the week on SNPTips a few months ago (more information there). It’s a great addon to view your genomic data while browsing databases and web sites. They’ve moved to version 1.1. There are two nice new features and some bug fixes. The features are:
*You can now use your deCODEme data, in addition to the 23andme support they started with.
*You can use SNPTips even without raw data to view SNPs on a page.
*and it’s been updated for Firefox 4.x.

You can check out our previous tip here (which still applies :).

SNPTips landing page at 5am Solutions.

What’s the Answer? Open Thread

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

Question of the week:

Can anyone suggest some tool or validated database…where I can get disease associated SNP data ( like diabetes, etc) and the corresponding PMIDs/ the number of caeses,controls and population studied…

The answers are excellent. The one below was the second-highest-voted answer. The highest-voted one was a snippet of SQL code to query the UCSC database for just this answer. Take a look!

roughly speaking, what you (and lots of people around the world) would like to do is actually the main purpose of the HVP project, which is encouraging the creation of locus specific databases (LSDBs) that would collate disease specific variations. right now, all we can do are just 2 things: