What's Your Problem? Open Thread

Welcome to the “What’s Your Problem?” (WYP) open thread. The purpose of this entry is to allow the community to ask questions on the use of genomics resources. Think of us as a virtual help desk. If you have a question about how to access a certain kind of data, or how to use a database, or what kind of resources there are for your particular research problem, just ask in the comments. OpenHelix staff will keep watch on the comment threads and answer those questions to the best of our knowledge. Additionally, we encourage readers to answer questions in the comments too. If you know the answer to another reader’s question, please chime in! The “WYP” thread will be posted every Thursday and remain at the top of the blog for 24 hours. Questions or problems asked on Thursday will be answered on Thursday to the best of our ability. You can leave questions on other days of the week, but the answer might not come that day.

You can keep up with this thread by remembering to check back, by subscribing to the RSS comments feed for this WYP post, or by subscribing to be notified by email of new comments on the post (use the checkbox at the end of the comment form; you can unsubscribe later). If you want to be notified of future WYP posts (every Thursday), you can subscribe to the WYP feed.

10 thoughts on “What's Your Problem? Open Thread”

  1. Rob

    My problem is trying to convert gene symbol formats for various mining programs. For example, from NCBI GI to GO. I found DAVID can do this, but not for all the genes (why? Is there no GO for some genes?). Anyway, with every database using its own unique format, these conversions get worse and worse.

  2. Mary

    Hi Rob–Conversions are a problem, but I would agree that GO is more challenging in general. That would be because GO relies on either computational assignment or curator calls, and those can change over time–and yes, some things are not annotated at all. Also, GO is a major moving target–there are always new terms coming in and getting retired. It will never be as tidy as a stable ID from UniProt, for example. And because of the hierarchy layers, some things will be more deeply annotated, some less so.

    But you might try the GO site itself. I think the batch conversion is kind of hidden over there, so I’ll try to explain the steps:
    1. Go to http://geneontology.org
    2. On the homepage, click the link under the text box that says “Browse the Gene Ontology with AmiGO.” That should take you to this page: http://amigo.geneontology.org/cgi-bin/amigo/browse.cgi
    3. On that new search page, in the Navigation Bar at the top you’ll see “search….browse…GOOSE…” etc. Click that Search link. That should take you to the Advanced Search page.
    4. On the Advanced Search page you can upload big lists and pull terms.

    Anyway, see if that gets you some information that might help.

  3. Trey

    Hi Rob,

    You also might want to try out the UCSC Table Browser. The 3rd exercise of our advanced topics tutorial (here: http://www.openhelix.com/ucsc ) gives you a basic rundown on how to pull out different gene symbols. It gets GO terms, but you could use a similar procedure to get other IDs (Ensembl, etc.).

    This is a common problem. I’ll do some more digging around. We’ve seen several tools for this, so I’ll have to put some together.

  4. John

    Is there a way to get Gene IDs or Entrez IDs for rs SNPs in batch? I have a thousand SNPs with rs numbers. I need to find their Gene, Ensembl, or Entrez IDs.

  5. Trey

    Hi John,

    The following two choices aren’t perfect, in that they pull lots of other data (not just IDs), but you might want to try batch GVS (http://gvsbatch.gs.washington.edu/GVSBatch/) or batch dbSNP (http://www.ncbi.nlm.nih.gov/SNP/dbSNP.cgi?list=rsfile). Both will give you a file of data from rsIDs with gene names and symbols. You’ll have to do a couple of other steps, like putting the output into an Excel or database format and then eliminating the other data, and it might not get you exactly the IDs you are looking for (perhaps the name and not the Ensembl ID or something), but it’s a start.

    Well, now you’ve got me on a quest. There must be a tool that does this more cleanly. I’ll check around.

    Trey

  6. Jennifer

    Hi John,

    Have you tried running a batch dbSNP search and then switching the display from ‘graphical summary’ to ‘flatfile’ or ‘summary’? Those two displays include gene names and locus_ids. You could output that and then write a script to pull rsIDs with locus_ids. Perhaps there is even a Scriptome script already written that will help with that.

    Or perhaps merge a list of your rsIDs with a dbSNP ftp download file using a Scriptome script.
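As a rough illustration of that merge idea, here is a minimal Python sketch. It assumes the downloaded table is tab-delimited with the rsID in the first column and a gene ID in the second; that layout, the function name `map_rsids`, and the sample values are all hypothetical, so adjust the column positions to whatever file you actually download.

```python
import csv
import io


def map_rsids(rsids, table_lines):
    """Join a list of rsIDs against a two-column, tab-delimited table.

    Assumed (hypothetical) layout: column 0 = rsID, column 1 = gene ID.
    Returns ({rsID: sorted gene IDs}, sorted list of rsIDs with no match).
    """
    wanted = {r.strip().lower() for r in rsids if r.strip()}
    found = {}
    for row in csv.reader(table_lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        rsid, gene_id = row[0].strip().lower(), row[1].strip()
        if rsid in wanted and gene_id:
            # One SNP can sit in more than one gene, so collect a set.
            found.setdefault(rsid, set()).add(gene_id)
    mapped = {rsid: sorted(ids) for rsid, ids in found.items()}
    unmatched = sorted(wanted - set(mapped))
    return mapped, unmatched


# Made-up data standing in for a real download file.
table = io.StringIO("rs1\t100\nrs2\t200\nrs2\t201\nrs9\t900\n")
mapped, unmatched = map_rsids(["rs1", "rs2", "rs3"], table)
print(mapped)
print(unmatched)
```

Returning the unmatched rsIDs separately makes it easy to see which SNPs have no gene assignment in the table, rather than having them silently drop out of the result.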

    Entrez’ batch query:
    http://www.ncbi.nlm.nih.gov/sites/batchentrez?db=snp

    Scriptome:
    http://sysbio.harvard.edu/csb/resources/computational/scriptome/UNIX/

    I also found this resource, but didn’t play with it at all:
    http://bio.kuas.edu.tw/snpid-info/introduction.jsp

  7. Trey

    John, all the resources do what the question requires: batch input a large list of rsIDs and return a list of gene names or IDs along with other information. I’ve done some test analyses and all the ones pointed out here get me this list.

    The IDs might not be exactly the ones required (might be accession numbers and not HUGO names for example), or there might be some extraneous data, so a few other steps might be required to get exactly the information in the format that is needed. These tools, with the necessary steps, will do that.
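The “few other steps” can be small. As a sketch only, the Python below trims a tab-delimited export with a header row down to the columns you care about and drops duplicate rows. The column names (`rsID`, `chrom`, `position`, `gene_symbol`) and the helper `keep_columns` are made up for the example; real exports from the tools above will use their own headers.

```python
import csv
import io


def keep_columns(lines, columns):
    """Keep only the named columns from a tab-delimited export with a
    header row, dropping duplicate rows while preserving order."""
    reader = csv.DictReader(lines, delimiter="\t")
    seen = set()
    rows = []
    for record in reader:
        # Missing columns become empty strings instead of raising.
        row = tuple((record.get(name) or "").strip() for name in columns)
        if row not in seen:
            seen.add(row)
            rows.append(row)
    return rows


# Made-up export with extraneous columns and a repeated SNP/gene pair.
export = io.StringIO(
    "rsID\tchrom\tposition\tgene_symbol\n"
    "rs1\t1\t1000\tGENEA\n"
    "rs1\t1\t1000\tGENEA\n"
    "rs2\t2\t2000\tGENEB\n"
)
rows = keep_columns(export, ["rsID", "gene_symbol"])
for row in rows:
    print("\t".join(row))
```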

Comments are closed.