Tip of the Week: Genome Variation Tour II

9 June, 2010 (02:28) | Tip of the Week | By: Trey

The last tip of the week I did was Genome Variation Tour I, where we started our journey following one SNP in an individual’s genome through various databases to see what we can find out about that variation. In that tip we started out by looking at a SNP in the CYP4F2 gene in the UCSC Genome Browser and followed it to dbSNP. Today’s tip continues our journey to OMIM to see what information we can find there. We’ll find that this variation is clinically associated with warfarin dosage effects, and specifically that this individual’s C/T heterozygosity indicates an intermediate dosage for effectiveness, if indeed he ever needed this drug. In some ways, your guess is as good as mine as to what we will find and what avenues we will take in the next few tips. I am discovering information as I go along too. I can tell you, though, that the next installment of the genome variation tour will take us to PubMed, perhaps to a few databases that are not particularly well known but are gems, and probably back to the UCSC Genome Browser to expand our look at the interactions of several variations in this individual’s genome.

Comments

Comment from gsgs
Time June 9, 2010 at 11:50 AM

these are all human genomes ?
full 3GB ?
aligned ?
is there a list of differences to some other
genome(s) ?
can we share the task to create these lists ?

Comment from Trey
Time June 9, 2010 at 7:10 PM

Yes.
Yes.
Yes.
See first tip or VISTA
UCSC sessions or Galaxy, if I understand the question. We have tutorials on all three:
http://www.openhelix.com/ucsc
http://www.openhelix.com/galaxy
http://www.openHelix.com/vista

Comment from gsgs
Time June 11, 2010 at 6:49 AM

hmm, do I really need galaxy or can I use my
own programs ? At some point for special tasks
I would presumably have to learn to make my
own programs anyway.

But maybe I can use galaxy also for my flu-sequences ?!?

requires some time to examine…

Comment from Mary
Time June 11, 2010 at 8:36 AM

Hi gsgs–

You can use anything you want, of course. This field is very much the wild west at this point. We are suggesting some things we think would work, but you are certainly not constrained by these.

I think we’ve mentioned in the past that there is a virus implementation of the UCSC Genome Browser, with HIV sequences:
http://www.gsid.org/gsid_hiv_data_browser.html

Comment from gsgs
Time June 11, 2010 at 11:11 AM

for each sequenced human genome, I want a list of the positions where it differs from Venter
then a square table of genomes showing their number of differences
how much % of these 1.58M SNPs are synonymous
how much % in the area of those 20K genes
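
One way to build such a square table, sketched in Python under assumptions not given in this thread: each genome's variant calls have already been loaded as a set of (chromosome, position, allele) tuples relative to the same reference assembly. The function name, genome labels, and variants below are made up for illustration. Two genomes are counted as differing at every call that appears in one set but not in the other.

```python
from itertools import combinations

def difference_table(genomes):
    """Return {(a, b): n}, where n is the number of variant calls
    present in exactly one of genomes a and b."""
    table = {(name, name): 0 for name in genomes}
    for a, b in combinations(sorted(genomes), 2):
        n = len(genomes[a] ^ genomes[b])  # symmetric set difference
        table[(a, b)] = table[(b, a)] = n
    return table

# Hypothetical toy data, not real variant calls.
genomes = {
    "venter": {("chr1", 100, "T"), ("chr2", 50, "G")},
    "watson": {("chr1", 100, "T"), ("chr3", 7, "A")},
    "kim": {("chr2", 50, "G")},
}
table = difference_table(genomes)
names = sorted(genomes)
for a in names:
    # One row of the square table per genome.
    print(a, [table[(a, b)] for b in names])
```

With this counting rule, two different alleles at the same position count as two differences, and positions that were never sequenced in one genome are indistinguishable from positions that match the reference.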

reading wikipedia a bit …

http://en.wikipedia.org/wiki/Human_genome
The completion of the fifth such map was announced in December 2008. The genome mapped was that of a Korean researcher Seong-Jin Kim. Genome maps had previously been completed for

Craig Venter of the U.S. in 2007,
James Watson of the U.S. in April 2008,
Yang Huanming of China in November 2008
Dan Stoicescu in January 2008
Seong-Jin Kim December 2008.

Kim’s genome had 1.58 million SNPs that had never been reported before
Large-scale structural variations are differences in the genome among people that range from a few thousand to a few million DNA bases

http://en.wikipedia.org/wiki/International_HapMap_Project
30 adult-and-both-parents trios from Ibadan, Nigeria (YRI)
30 trios of U.S. residents of northern and western European ancestry (CEU)
44 unrelated individuals from Tokyo, Japan (JPT)
45 unrelated Han Chinese individuals from Beijing, China (CHB)

http://en.wikipedia.org/wiki/DbSNP
data

Comment from Mary
Time June 11, 2010 at 11:41 AM

Have you seen the Personal Genomes track at UCSC? I don’t know if this URL will work for you. If not, go to the Human Mar 2006 assembly, look at the track called Genome Variants.

http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=162143559&c=chrX&g=pgSnp

Comment from gsgs
Time June 11, 2010 at 2:40 PM

yes, that’s it !
I downloaded and looked at
ftp://ftp.hapmap.org/hapmap/jimwatsonsequence/watson_snp.gff.gz
223MB uncompressed , with ~2M SNPs
(SNPs compared with what ?)
Are you writing tools in Galaxy to analyse these files ?
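
A file like this can also be read with a few lines of Python. This is a minimal sketch that assumes only the standard nine tab-separated GFF columns (sequence name, source, feature, start, end, score, strand, frame, attributes). The layout of the attributes column in watson_snp.gff.gz is not assumed, so it is kept as raw text. The function names and sample lines are hypothetical.

```python
import gzip

def read_gff_positions(lines):
    """Yield (seqname, start, end, attributes) for each GFF record."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete nine-column GFF record
        yield fields[0], int(fields[3]), int(fields[4]), fields[8]

def read_gff_gz(path):
    """Stream records from a gzip-compressed GFF file."""
    with gzip.open(path, "rt") as handle:
        yield from read_gff_positions(handle)

# Made-up sample lines for illustration only.
sample = [
    "##gff-version 2\n",
    "chr1\tsrc\tSNP\t100\t100\t.\t+\t.\tnote one\n",
    "chr2\tsrc\tSNP\t2500\t2500\t.\t+\t.\tnote two\n",
]
records = list(read_gff_positions(sample))
print(records)
```

For the downloaded file, `read_gff_gz("watson_snp.gff.gz")` would stream the records without first uncompressing the 223MB to disk.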

Comment from Mary
Time June 11, 2010 at 5:36 PM

No, we don’t do any tool development per se. But lots of folks are working with Galaxy now. Check out my recent post on the developer slides. And there are already a bunch of tools in Galaxy, and some more planned for integration as well.

Comment from gsgs
Time June 12, 2010 at 2:16 AM

this table with some sort of distance-measure,
I haven’t yet figured out what exactly it is
http://www.nature.com/nature/journal/v460/n7258/fig_tab/nature08211_F1.html

I downloaded 5 mutation files (JW,YH,AK1?, 2 from the
1000-project)
non-uniform format, so far I converted JW to
(for me) computer-readable
trying to read/convert(…/analyze) other genomes today
others must have done it already, must have these multi-genome-computer-readable-mutation files
what’s the reference genome ?

can you attach a discussion forum to this blog, so we can
also communicate with others here, the discussion is threaded
and easily searchable and others can also start new topics ?
Or recommend a forum where these things are being discussed.
