BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.
This week’s highlighted question is:
I am trying to collect known disease-causing mutations with full genome coordinates and call information to build a gold standard (and to search the resulting list against my full-genome data). BED format is my target, so that I can use BEDTools or Galaxy on top.
A general comment: why are BED, GFF, or similar shared formats not supported by public databases as a standard download format?
I found, with the help of colleagues, several sources of disease mutations, including:
- OMIM variants extracted by Omicia and provided as a track (OMICIA_auto) on the next release of UCSC tables (http://genome-preview.ucsc.edu/…)
- COSMIC rev54 (rev55 as of a couple of days ago), downloaded as a text table that I had to convert to BED with some Perl magic (ftp://ftp.sanger.ac.uk/pub/CGP/cosmic)
- dbSNP was not an easy catch, and I am still struggling to get the full information from their difficult batch download system (so far only feasible through Ensembl BioMart; tip: the hg18 BioMart is at http://may2009.archive.ensembl.org/biomart/martview/). For dbSNP, I searched for records with a phenotype (thanks to another colleague), which is the only available annotation for picking out disease variants but in fact includes many association results that are far from causative.
Note: As you may have noticed, I still work with hg18 (Build 36), but more recent data would do as well with some liftover. If someone has other sources, it would be great to share them, as this is likely a common request from people wanting to mine patients' full genomes.
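For readers who have not done the kind of text-table-to-BED conversion the question mentions, here is a minimal Python sketch of the idea. It is not the asker's Perl script, and it does not reflect the real COSMIC file layout: the column names (`id`, `chrom`, `start`, `end`) and the sample rows are hypothetical, and the input is assumed to use 1-based, inclusive coordinates. The one detail that matters is that BED uses 0-based starts with an exclusive end, so only the start is shifted by one.

```python
import csv
import io


def table_to_bed(handle, chrom_col="chrom", start_col="start",
                 end_col="end", name_col="id"):
    """Yield BED4 lines from a tab-delimited table.

    Assumes (hypothetically) that the table has a header row and that
    its coordinates are 1-based and inclusive.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        chrom = row[chrom_col]
        # UCSC-style tracks expect a "chr" prefix on chromosome names.
        if not chrom.startswith("chr"):
            chrom = "chr" + chrom
        # BED is 0-based, half-open: shift the start, keep the end.
        start = int(row[start_col]) - 1
        end = int(row[end_col])
        yield f"{chrom}\t{start}\t{end}\t{row[name_col]}"


# Made-up rows standing in for a downloaded mutation table.
example = io.StringIO(
    "id\tchrom\tstart\tend\n"
    "mut1\t12\t100\t100\n"
    "mut2\tchrX\t200\t205\n"
)
bed_lines = list(table_to_bed(example))
print("\n".join(bed_lines))
```

With a real download, you would pass an open file handle instead of the in-memory example and adjust the column names to match the file's header.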
There’s a great collection of links to resources where you can find these kinds of mutations. Some of them you’ll know (or, if you don’t, you can see our training on them; some are sponsored, some are subscription).
Check out the answers at BioStar. And if you have other suggestions, be sure to add them.