In an earlier What’s Your Problem thread, a researcher had hundreds of SNP locations and wanted an easy way to obtain the flanking sequence for each one, without going to every location in the UCSC Genome Browser and reading it off by eye. There are probably a few ways to do this, but I found that Galaxy was a good place to start. So this week’s tip uses two SNP locations on the human genome as an example: it obtains the flanking sequence around each location and returns a file that can be saved as a spreadsheet or text file, or even turned back into a UCSC Genome Browser custom track that can then be uploaded, viewed and searched at UCSC. The process will differ a bit for individual researchers depending on their data and how their Excel worksheet or file is laid out, but hopefully you’ll get the idea. The steps are:
1. Upload your file (tab-delimited text).
2. Convert the file to Galaxy’s ‘interval’ format.
3. Cut out any data columns from the original file and save them for later use.
4. Get the flanking chromosomal locations, then merge the upstream and downstream records into a single record.
5. Get the flanking sequence.
6. Paste the data columns saved in step 3 onto the columns (chromosomal location and sequence) produced in step 5.
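For anyone who would rather script the same idea outside Galaxy, here is a minimal Python sketch of steps 1 to 6. It is not how Galaxy implements these tools. It assumes a hypothetical four-column tab-delimited input (SNP id, chromosome, 1-based position, one extra data column), a toy in-memory "genome" standing in for a real FASTA file, and an arbitrary flank size of 5 bases. Instead of computing upstream and downstream records and merging them, it builds one interval that spans both flanks and the SNP base itself, which gives the same merged result.

```python
import csv
import io

# Hypothetical toy "genome": in practice the sequence would come from a
# reference FASTA file (e.g. the human assembly), not a hard-coded dict.
GENOME = {
    "chr1": "ACGTACGTTAGCCGATCGATCGGATCCAGTACGATCGATGCA",
}

# Hypothetical tab-delimited input: SNP id, chromosome, 1-based position, extra data.
SNP_FILE = "rs0001\tchr1\t10\tcase\nrs0002\tchr1\t30\tcontrol\n"

FLANK = 5  # bases on each side; an assumed value, not taken from the original tip


def flanking_interval(pos_1based, flank, chrom_len):
    """Return a 0-based, half-open (start, end) interval covering the SNP and both flanks."""
    snp_start = pos_1based - 1                    # convert 1-based position to 0-based
    start = max(0, snp_start - flank)             # upstream flank, clipped at chromosome start
    end = min(chrom_len, snp_start + 1 + flank)   # downstream flank, clipped at chromosome end
    return start, end


def annotate(snp_text, genome, flank):
    """Attach location and flanking sequence to each SNP, keeping the extra data column."""
    rows = []
    for snp_id, chrom, pos, note in csv.reader(io.StringIO(snp_text), delimiter="\t"):
        seq = genome[chrom]
        start, end = flanking_interval(int(pos), flank, len(seq))
        # The extra column is carried along and written back next to the
        # location and sequence, mirroring the cut (step 3) and paste (step 6).
        rows.append([snp_id, chrom, start, end, seq[start:end], note])
    return rows


if __name__ == "__main__":
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(annotate(SNP_FILE, GENOME, FLANK))
    print(out.getvalue(), end="")
```

The result is again tab-delimited, so it opens in Excel just like the Galaxy output. If you only want the flanks without the SNP base, you would slice the upstream and downstream pieces separately.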
Voila, now you have a tab-delimited text file that can be opened in Excel, made into a custom track (in Galaxy), and so on.
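The tip builds the custom track in Galaxy, but the format UCSC expects is simple enough to write by hand. Below is a minimal Python sketch, assuming rows shaped as (SNP id, chromosome, 0-based start, end, followed by any other columns). It writes a UCSC `track` line followed by four-column BED records (chrom, chromStart, chromEnd, name). The track name, description and example rows are hypothetical.

```python
# Hypothetical rows: SNP id, chromosome, 0-based start, exclusive end, then other columns.
ROWS = [
    ["rs0001", "chr1", 4, 15, "ACGTTAGCCGA", "case"],
    ["rs0002", "chr1", 24, 35, "TCCAGTACGAT", "control"],
]


def to_custom_track(rows, track_name="SNP_flanks", description="SNP flanking regions"):
    """Build custom-track text: one track line followed by BED4 records."""
    lines = [f'track name={track_name} description="{description}"']
    for snp_id, chrom, start, end, *_rest in rows:
        # BED columns: chrom, chromStart (0-based), chromEnd (exclusive), name.
        # Extra columns such as the sequence are dropped from the track.
        lines.append(f"{chrom}\t{start}\t{end}\t{snp_id}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_custom_track(ROWS), end="")
```

Saved as a text file, this can be uploaded on the UCSC custom track page. BED coordinates are 0-based and half-open, so they differ by one at the start from the 1-based positions the browser displays.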
Any suggestions on other methods for doing this?
(OpenHelix does training on Galaxy and the UCSC Genome Browser.)