Tag Archives: variation

Tip of the Week: Gemini, exploration of genetic variation

You Tube:

This week’s tip of the week is on GEMINI, which is short for “GEnome MINIng.” Unlike most of the tips we give every week, this one is a software package. But it does use and integrate with many internet databases such as dbSNP, ENCODE, UCSC, ClinVar and KEGG. It’s also a freely available, open source tool and quite a useful software package that gives the researcher the ability to create quite complex queries based on genotypes, inheritance patterns, etc. The above 12 minute clip is a talk given at a conference that gives an introduction to the science behind the tool.

The abstract from the recent paper from the developers gives a good introduction concerning the functionality of the tool:

Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important, yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command line and graphical tools for common analyses. We demonstrate GEMINI’s utility for exploring variation in personal genomes and family based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility and our goal is to provide researchers with a standard framework for medical genomics.

If you’d like to learn more, there is some pretty good documentation of the software package here.
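To make the idea of a unified, queryable variant database a bit more concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table layout, column names (gene, impact, in_dbsnp, gt_child and so on) and rows are all invented for illustration; they are not GEMINI’s actual schema or data, so check the documentation for the real thing. The query combines an annotation filter with a simple inheritance pattern: a variant that is heterozygous in the child and absent from both parents (a candidate de novo).

```python
import sqlite3

# Toy, in-memory stand-in for a variant database.
# Every table name, column name and row here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE variants (
           chrom TEXT, pos INTEGER, ref TEXT, alt TEXT,
           gene TEXT, impact TEXT, in_dbsnp INTEGER,
           gt_child TEXT, gt_mother TEXT, gt_father TEXT)"""
)
rows = [
    ("chr1", 1000, "A", "G", "GENE_A", "missense", 1, "0/1", "0/1", "0/0"),
    ("chr2", 2000, "C", "T", "GENE_B", "stop_gained", 0, "0/1", "0/0", "0/0"),
    ("chr3", 3000, "G", "A", "GENE_C", "synonymous", 0, "0/1", "0/0", "0/0"),
]
conn.executemany("INSERT INTO variants VALUES (?,?,?,?,?,?,?,?,?,?)", rows)


def candidate_de_novo(connection):
    """Non-synonymous variants that are het in the child and hom-ref in both parents."""
    query = """
        SELECT chrom, pos, gene, impact
        FROM variants
        WHERE gt_child = '0/1'
          AND gt_mother = '0/0'
          AND gt_father = '0/0'
          AND impact != 'synonymous'
        ORDER BY chrom, pos
    """
    return connection.execute(query).fetchall()


hits = candidate_de_novo(conn)
print(hits)
```

The point of the sketch is only that once genotypes and annotations sit in one database, an inheritance filter and an annotation filter become a single query rather than two separate tools.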

While I’m at it, and totally unrelated except it’s human genomics, there is this slideshare presentation of the ‘current’ state of personal genomics. Current is in quotes because the slideshare is actually from 3 years ago, but there is a lot of good information in there. Anyone know of a more up-to-date slide set or extensive intro to the current state of personal genomics science similar to this?

 

Relevant Links:

GEMINI Software package
dbSNP
ENCODE
UCSC Genome Browser
ClinVar
KEGG

(tutorials are linked below for those tools in bold above)

Relevant Reference:

Paila U, Chapman BA, Kirchner R, & Quinlan AR (2013). GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations. PLoS computational biology, 9 (7) PMID: 23874191

UCSC’s new Variant Annotation Integrator

In case you aren’t on the UCSC announcement mailing list, and you don’t go to the site via their homepage with the posted news–you should know about this new tool at the UCSC Genome Browser. It will take variations that you are exploring and make a prediction about whether the variant is associated with a function, and potentially if it is damaging to a protein. It’s under active development, so try it out. And if there are features you could use, suggest them. See the VAI page for more.

Here are the details via their email, but you can sign up for the “announce” mailing list to get news like this in your inbox if you like:

[Link to the original at the mailing list site]

Hello all,

In order to assist researchers in annotating and prioritizing thousands
of variant calls from sequencing projects, we have developed the Variant
Annotation Integrator (VAI). Given a set of variants uploaded as a
custom track (in either pgSnp or VCF format), the VAI will return the
predicted functional effect (e.g., synonymous, missense, frameshift,
intronic) for each variant. The VAI can optionally add several other
types of relevant information, including: the dbSNP identifier if the
variant is found in dbSNP, protein damage scores for missense variants
from the Database of Non-synonymous Functional Predictions (dbNSFP), and
conservation scores computed from multi-species alignments. The VAI also
offers filters to help narrow down results to the most interesting variants.

Future releases of the VAI will include more input/upload options,
output formats, and annotation options, and a way to add information
from any track in the Genome Browser, including custom tracks.

There are two ways to navigate to the VAI: (1) From the “Tools” menu,
follow the “Variant Annotation Integrator” link. (2) After uploading a
custom track, hit the “go to variant annotation integrator” button. The
user’s guide is at the bottom of the page, under “Using the Variant
Annotation Integrator.”

As always, we welcome questions and feedback on our public mailing list:
genome@soe.ucsc.edu.


Brooke Rhead
UCSC Genome Bioinformatics Group
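
As a rough illustration of the kind of input and output the announcement describes, the sketch below parses the fixed columns of a VCF data line and makes a crude, length-only guess at whether an indel would shift the reading frame. This is not how the VAI works internally: it ignores gene models entirely and simply assumes the variant sits inside a coding exon. The function names and the example line are made up.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its first five fixed columns."""
    chrom, pos, vid, ref, alt = line.rstrip("\n").split("\t")[:5]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alts": alt.split(","),  # ALT may list several alleles
    }


def crude_coding_effect(ref, alt):
    """Length-only guess, ASSUMING the variant lies inside a coding exon."""
    diff = len(alt) - len(ref)
    if diff == 0:
        return "substitution"  # synonymous vs. missense needs the codon context
    if diff % 3 == 0:
        return "inframe_indel"  # whole codons gained or lost
    return "frameshift"


# Hypothetical VCF data line: a 3-base deletion.
record = parse_vcf_line("chr1\t12345\t.\tATCT\tA\t.\tPASS\t.")
effects = [crude_coding_effect(record["ref"], alt) for alt in record["alts"]]
print(record["pos"], effects)
```

A real annotator also needs transcript coordinates, strand and splice-site positions before it can report classes such as intronic or synonymous, which is exactly the work the VAI takes off your hands.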

 

Video Tip of the Week: 1000 Genomes Dataset Browser from NCBI


A recent NCBI Newsletter announced the release of a new resource named the 1000 Genomes Dataset Browser, and that is the resource that I will be featuring in this tip. It is one of the tools available through the new NCBI Variation resources page, which also features resources such as dbSNP, dbVar, dbGaP and ClinVar (many of which OpenHelix has tutorials for) as well as other variation tools – Variation Reporter (pre-release version), Clinical Remap (beta version) and the Phenotype-Genotype Integrator.

Before I discuss NCBI’s 1000 Genomes Dataset Browser, I’d like to spend a bit of time on the 1000 Genomes project, in order to distinguish what is from NCBI and what is from the project itself. From the 1000 Genomes Pilot paper:

“The aim of the 1000 Genomes Project is to discover, genotype and provide accurate haplotype information on all forms of human DNA polymorphism in multiple human populations. Specifically, the goal is to characterize over 95% of variants that are in genomic regions accessible to current high-throughput sequencing technologies and that have allele frequency of 1% or higher (the classical definition of polymorphism) in each of five major population groups (populations in or with ancestry from Europe, East Asia, South Asia, West Africa and the Americas).”

You can access the full paper from the link below. The project has now moved past the pilot phase and is releasing new data all the time. You can see announcements and project details, or access that data, through the official 1000 Genomes project site, or through the official 1000 Genomes version of the Ensembl Browser. As you might imagine for a “big data” project such as this, data has been added to a variety of NCBI databases, including dbSNP, the Sequence Read Archive (SRA) and BioSample. Although you could search for this data through the universal Entrez search system, previously to view the data you would have to view individual results at each separate database. The 1000 Genomes Browser at NCBI has been created as a powerful interface for comprehensively searching for, and viewing, 1000 Genomes data contained in NCBI resources on a single page.
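
The 1% allele frequency threshold in the quote above is easy to make concrete. Here is a minimal sketch that counts alleles in VCF-style genotype strings and applies that cutoff. The function names and the sample genotypes are invented for illustration and have nothing to do with the project’s actual pipelines.

```python
from collections import Counter


def alt_allele_frequency(genotypes):
    """Frequency of allele '1' among called alleles in VCF-style genotype strings."""
    counts = Counter()
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":  # skip missing calls
                counts[allele] += 1
    total = sum(counts.values())
    return counts["1"] / total if total else 0.0


def is_polymorphism(freq, threshold=0.01):
    """Classical definition from the quote: allele frequency of 1% or higher."""
    return freq >= threshold


# Hypothetical population sample: 3 alt alleles among 200 called alleles,
# plus one individual with a missing genotype.
sample = ["0|0"] * 97 + ["0|1"] * 3 + ["./."]
freq = alt_allele_frequency(sample)
print(freq, is_polymorphism(freq))
```

Note that the project’s stated goal applies the threshold within each population group, so in practice you would run a calculation like this per population rather than on one pooled sample.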

In the video tip I will familiarize you with the various areas of the page – the browser is built from a series of widgets, each with its own function. I will not be able to cover all of the features, or demonstrate how users can upload their own variation data to the browser – I’ll leave you the fun of exploring those on your own. Because the tool is so young, bug reports and suggestions/comments are still being actively requested – if you find something, check out the FAQs (which discuss bugs at various stages of being fixed) and then email the team.

Quick Links:
NCBI Newsletter announcement July 20, 2012: http://1.usa.gov/RQu5dR

NCBI Variation page: http://www.ncbi.nlm.nih.gov/variation/

NCBI 1000 Genomes Browser page:
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/

1000 Genomes Project site: http://www.1000genomes.org/home

The 1000 genomes project specific version of the Ensembl Browser:
http://browser.1000genomes.org

Reference:
The 1000 Genomes Project Consortium (2010). A map of human genome variation from population-scale sequencing Nature, 467, 1061-1073 DOI: 10.1038/nature09534

NHGRI human variation discussion live stream n̶o̶w̶ [recording available]

There’s a meeting going on today that people might be interested in following if you are interested in analysis of human variation. Here’s how Chris Gunter described it on G+ last night:

For serious genetics geeks:  the meeting organized by +Daniel MacArthur and myself (with lots of help from colleagues!), Implicating Sequence Variants in Human Disease, is streaming live here. On til 9 tonight and most of tomorrow during working day.

They have been talking about how they do analysis at their sites, the needs for new databases to support what they are finding, better ways to report variation in the literature, and more.

Live stream here: http://www.genome.gov/27549959

Video recording should be available later.

Video Tip of the Week: SNPeffect 4.0


One of the most frequent questions we hear when we do workshops is: how do I find out if this SNP has an effect on my favorite protein? Well, that’s assuming it is a coding SNP. Of course, promoter SNPs and splicing SNPs and other features would be great to assess as well. Right now, though, the most mature tools are those that look at the effects of variation on the coding of the amino acids in proteins.
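
For a coding SNP, the very first step of that question (does the base change alter the amino acid at all?) can be sketched in a few lines using the standard genetic code. This is only a toy, and the function name is my own. Real tools work out the reading frame from transcript annotation and then go much further, predicting how damaging the change is likely to be.

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def coding_effect(ref_codon, offset, alt_base):
    """Classify a single-base change at position `offset` (0-2) of a codon."""
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # a stop codon is introduced
    if ref_aa == "*":
        return "stop_lost"  # the original stop codon is removed
    return "missense"


print(coding_effect("GAG", 1, "T"))  # Glu -> Val
print(coding_effect("GAG", 2, "A"))  # Glu -> Glu
print(coding_effect("TGG", 2, "A"))  # Trp -> stop
```

Knowing that a change is missense is where the interesting work starts, which is the gap that the tools discussed in this post try to fill.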

We’ve talked before about some tools for this, including PolyPhen2 and SIFT. Each of them will offer different algorithms and options that might help you to explore your SNPs. But another tool is available that you should check out as well: SNPeffect 4.0.

SNPeffect isn’t new–this team has been developing it for a while. But their recent paper that describes new features in the 4.0 version spurred me to have a new look at it. There are some foundational things that are important to know about the data collection in their database. It’s not just a re-hash of dbSNP–it actually relies on another source of variation data. They use the UniProt collection of human proteins as the starting point. If you haven’t used UniProt much, you might not be aware of how much protein variation they catalog and store (we cover this in our tutorial*). The SNPeffect team takes those variations and evaluates the impact they have on a protein with a variety of algorithms. Some of the variations will correspond to dbSNP entries–but not all of them do. You may find things here that you won’t find in dbSNP. So I would say it’s worth exploring your proteins of interest here as well.

The algorithms they use provide information on a number of features of the protein. TANGO and WALTZ assess protein aggregation and amyloid formation. LIMBO evaluates chaperone binding. Structural stability is predicted by FoldX (if a suitable structure is available). They also use SMART* and Pfam* to see if the variation occurs within domains of the protein. There are some other tools with more protein features examined as well. Check out the paper for more details.

You can also submit proteins of interest to their analysis suite from the “Submit a new SNPeffect job” links.

A new feature highlighted in their paper is the opportunity to do a Meta-analysis on groups of variations. You can explore the features of sets of variants in this way, using the different algorithms they offer.

This short video examines the pipeline, the basic interface, and a couple of sample pages. But you’ll want to go over and try a lot more to learn about your favorite proteins. There’s a lot of information that can come out of this that you might not have known before. Check it out.

*OpenHelix tutorials for these resources available for individual purchase or through a subscription

Quick links to resources discussed:

SNPeffect 4.0 http://snpeffect.switchlab.org/

PolyPhen 2 http://genetics.bwh.harvard.edu/pph2/

SIFT http://sift.jcvi.org/

Reference:

De Baets, G., Van Durme, J., Reumers, J., Maurer-Stroh, S., Vanhee, P., Dopazo, J., Schymkowitz, J., & Rousseau, F. (2011). SNPeffect 4.0: on-line prediction of molecular and structural effects of protein-coding variants Nucleic Acids Research, 40 (D1) DOI: 10.1093/nar/gkr996

Video Tips of the Week: Annual Review IV, 2nd half

As you may know, we’ve been doing these video tips-of-the-week for FOUR years now. We have completed around 200 little tidbit introductions to various resources as of last year, 2011 (yep, it’s 2012 now). At the end of the year we’ve established a sort of holiday tradition: we do a summary post to collect them all. If you have missed any of them it’s a great way to have a quick look at what might be useful to your work.

You can see past years’ tips here: 2008 I, 2008 II, 2009 I, 2009 II, 2010 I, 2010 II. The summary of the first half of 2011 is available from last week.

July 2011

July 6: Prioritizing genes using the Gene Prioritization Portal

July 13: PolySearch, searching many databases at once

July 20: Human Epigenomics Visualization Hub

July 27: The new SIB Bioinformatics Resource Portal

 

August 2011

August 3: SNPexp, correlation between SNPs and gene expression 

August 10: CompaGB for comparing genome browser software

August 17: CoGe, comparing genomes revisited

August 24: Domain Draw for quick motif diagrams

August 31: From UniProt to the PSI SBKB and back again

 

September 2011

September 7: Plant comparative genomics using Plaza

September 14: phiGENOME for bacteriophage genome exploration

September 21: Getting flanking sequences of genomic locations

September 28: Introduction to R statistical software 

 

October 2011

October 5: VnD resource for genetic variation and drug information

October 12: Track Hubs in UCSC Genome Browser

October 19: Mitochondrial Transcriptome GBrowser 

October 26: Variation data from Ensembl

 

November 2011

November 2: MizBee Synteny Browser

November 9: The new database of genomic variants: DGV2

November 16: MapMi, automated mapping of microRNA loci

November 23: BioMart’s new central portal

November 30: Phosida, a post-translational modification database

December 2011

December 7: VarSifter, for identifying key sequence variations

December 14: Big changes to NCBI’s genome resources

December 21: eggNOG for the Holidays (or to explore orthologous genes)

December 28: Video Tips of the Week: Annual Review IV (first half of 2011)

Video Tip of the Week: Variation Data from Ensembl

Trey introduced me to this “decent collection of video tutorials” from Ensembl, but he and Mary are currently in Morocco teaching a 3-day bioinformatics workshop & then attending the conference (yes, I am envious!). I am therefore creating this week’s tip based on the tutorials that Trey pointed me to. In today’s tip I am going to parallel a tutorial available from Ensembl on SNP information in order to both: 1) show you how you can access variation information from Ensembl and 2) compare doing these steps using Ensembl 64 (here in this video) and using Ensembl 54 (archived) (in the Ensembl video).

Bioscience resources are often continuously being developed and improved & it can be difficult to keep videos and documentation up-to-date. That’s why here at OpenHelix we work continuously to keep our materials up-to-date, with weekly tips on new features and updated tutorials as updated sites become stable.

The Ensembl video (SNPs and other Variations – 1 of 2) is quite nice & provides more detail about the actual Ensembl data than I can in my short movie, but it was done a few years ago on an older version of Ensembl. Since then the resource has been updated, and gone through several new versions of the data. I’m going to follow the same steps that are done in part one of the Ensembl SNP tutorial so that you can see examples of what’s changed & what is pretty much the same. I’d suggest you watch both videos back-to-back to get a good idea of what’s changed, and what types of variation information are available from Ensembl. From that basis I’m sure you’ll be able to watch Ensembl’s second SNP video & apply it to using the current version of Ensembl without much trouble. For more details you can refer to the most recent Ensembl paper in the NAR database issue, which describes not just variation information but Ensembl as a whole.

Quick links:

Ensembl Browser: http://www.ensembl.org/index.html

Legacy Ensembl Browser (release 54): http://may2009.archive.ensembl.org/index.html

Ensembl tutorial, part 1 of 2: http://useast.ensembl.org/Help/Movie?id=208

Ensembl tutorial, part 2 of 2: http://useast.ensembl.org/Help/Movie?id=211

OpenHelix Ensembl tutorial materials: http://www.openhelix.eu/cgi/tutorialInfo.cgi?id=95

Ensembl Tutorial List: http://useast.ensembl.org/common/Help/Movie?db=core

Reference:
Flicek, P., Aken, B., Ballester, B., Beal, K., Bragin, E., Brent, S., Chen, Y., Clapham, P., Coates, G., Fairley, S., Fitzgerald, S., Fernandez-Banet, J., Gordon, L., Graf, S., Haider, S., Hammond, M., Howe, K., Jenkinson, A., Johnson, N., Kahari, A., Keefe, D., Keenan, S., Kinsella, R., Kokocinski, F., Koscielny, G., Kulesha, E., Lawson, D., Longden, I., Massingham, T., McLaren, W., Megy, K., Overduin, B., Pritchard, B., Rios, D., Ruffier, M., Schuster, M., Slater, G., Smedley, D., Spudich, G., Tang, Y., Trevanion, S., Vilella, A., Vogel, J., White, S., Wilder, S., Zadissa, A., Birney, E., Cunningham, F., Dunham, I., Durbin, R., Fernandez-Suarez, X., Herrero, J., Hubbard, T., Parker, A., Proctor, G., Smith, J., & Searle, S. (2009). Ensembl’s 10th year Nucleic Acids Research, 38 (Database) DOI: 10.1093/nar/gkp972

Video Tip of the Week: VnD Resource for Genetic Variation and Drug Information


In today’s tip I am going to feature a resource that I found recently. I’ve been updating our dbSNP tutorial, which Mary & Trey will be presenting at workshops in Morocco, and also our free PDB tutorial, which is sponsored by the RCSB PDB team. I have therefore been thinking about protein structures and small sequence variations a lot lately. As I explored the latest Database issue of NAR looking for resources to do a tip on, I found an article describing the VnD (genetic Variation and Drug) resource, which can also be accessed at the URL www.vandd.org, according to the NAR article. The article is “VnD: a structure-centric database of disease-related SNPs and drugs“, and figure one shows a veritable Who’s Who of protein, variation and disease resources, so I had to investigate.

What I found at VnD made me sure that this was a resource that I wanted to feature in a tip. VnD is from the Korean Bioinformation Center, or KOBIC, which has a list of databases and tools that it provides. I’ll save the rest of the KOBIC resources for another post & concentrate on VnD here. Compiling data from resources such as RefSeq, OMIM, UniProt, PDB, DrugBank, dbSNP, GAD and more might have been cool enough, depending on how it was done, but VnD also does its own structure modeling analysis of how the variation affects the protein structure and drug/ligand binding.

This tip movie isn’t long enough to really show you the breadth of what is available from the VnD, but I hope it will be enough to encourage you to read the NAR article (listed below), and to check out VnD. One thing to note: don’t expect to find every dbSNP rs# over there – one that I’ve been using in our tutorial isn’t over there. They are specifically interested in variations within genes that might affect drug binding. But hey, you can’t query DrugBank with rs#s, and I’ve never seen the structure modeling done like VnD, so it is a worthy resource that you may want to investigate if you are interested in how genetic variations connect with disease and drug therapies.

Quick links:

VnD: Variations and Drugs resource -  http://vnd.kobic.re.kr:8080/VnD/index.jsp

Korean Bioinformation Center (KOBIC) – http://www.kobic.re.kr/

RCSB PDB – http://www.pdb.org

OpenHelix Tutorial on the RCSB PDB – http://www.openhelix.com/pdb

dbSNP: Short Genetic Variations, from NCBI -  http://www.ncbi.nlm.nih.gov/projects/SNP/

OpenHelix Tutorial on NCBI’s dbSNP – http://www.openhelix.com/cgi/tutorialInfo.cgi?id=39

For links to other resources and OpenHelix tutorials mentioned in this post, please see our catalog of resources – http://www.openhelix.com/cgi/tutorials.cgi

Reference:
Yang, J., Oh, S., Ko, G., Park, S., Kim, W., Lee, B., & Lee, S. (2010). VnD: a structure-centric database of disease-related SNPs and drugs Nucleic Acids Research, 39 (Database) DOI: 10.1093/nar/gkq957

dbSNP: no longer single….?

I think this is very interesting–dbSNP has a new logo. dbSNP is no longer “single”. It is keeping dbSNP as a professional name, but it also has a new name for social situations: “Short Genetic Variations”.

I was just checking my twitter feed, and found out something fascinating in the new release.  Here was the item that prompted me to look:

RT @yokofakun: #dbsnp134 has been released: http://www.ncbi.nlm.nih.gov/projects/SNP/docs/build134.txt

Pierre forwarded that notice, and I decided to check out the release notes. Hidden in there is a small piece of information that I think makes a big mental leap for a lot of people….

1) dbSNP logo change (http://www.ncbi.nlm.nih.gov/projects/SNP/)

As there has been confusion about the types of variations dbSNP actually contains, the dbSNP logo text was changed from “Single Nucleotide Polymorphism” to “Short Genetic Variations”. We hope that this change will reflect the wide range of dbSNP’s variation content, and thereby prevent any future misunderstandings.

In spite of its name, dbSNP is not limited to single nucleotide polymorphisms (SNPs), but stores information about multiple small-scale variations that include insertions/deletions, microsatellites, and non-polymorphic variants. dbSNP also stores common and rare variations along with their genotypes and allele frequencies.

Most importantly, dbSNP includes clinically significant variations, and should NOT be assumed to hold only benign polymorphisms.

Some of that stuff will be obvious to a lot of our readers. But you’d be surprised at what we find in the training rooms. Many people are really shocked to see that dbSNP contains a lot more than just single nucleotide polymorphisms. And we make a point of mentioning that the UCSC Genome Browser calls their SNP track “simple nucleotide polymorphisms” to reflect that idea. For many people in our workshops that’s the first time they have processed that knowledge.
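
If it helps to see the “more than just SNPs” point in action, here is a tiny sketch that sorts variants into broad classes using nothing but the lengths of the reference and alternate alleles. The class labels, the use of “-” for an empty allele, and the example allele pairs are my own assumptions for illustration, not dbSNP’s official vocabulary or data. It also cannot recognize microsatellites, which would require looking at repeat structure.

```python
from collections import Counter


def variant_class(ref, alt):
    """Rough class from allele strings; '-' stands for an empty allele (an assumption)."""
    ref = "" if ref == "-" else ref
    alt = "" if alt == "-" else alt
    if len(ref) == 1 and len(alt) == 1:
        return "single nucleotide variant"
    if len(ref) == len(alt):
        return "multi-nucleotide variant"
    return "insertion" if len(alt) > len(ref) else "deletion"


# Hypothetical (ref, alt) allele pairs.
examples = [("A", "G"), ("-", "TT"), ("CAG", "-"), ("AC", "GT"), ("C", "T")]
tally = Counter(variant_class(ref, alt) for ref, alt in examples)
print(dict(tally))
```

Even in a made-up handful like this, single nucleotide changes are only part of the picture, which is the misunderstanding the new logo text is meant to head off.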

In case you are curious, here’s what an old header looked like at dbSNP (I have taken this from our training materials):

I think this is a great move. Subtle, but great. And they must have thought it was important based on that release note piece. dbSNP is no longer single. I feel like I should send a gift….

SNPTips update (1.1)

I did a tip of the week on SNPTips a few months ago (more information there). It’s a great addon to view your genomic data while browsing databases and web sites. They’ve moved to version 1.1. There are two nice new features and some bug fixes. The features are:
*You can now use your deCODEme data, in addition to the 23andme support they started with.
*You can use SNPTips even without raw data to view SNPs on a page.
*and it’s been updated for Firefox 4.x.

You can check out our previous tip here (which still applies :).

SNPTips landing page at 5am Solutions.