“What’s Your Problem?” Open Thread

Welcome to the “What’s Your Problem?” (WYP) open thread. The purpose of this entry is to allow the community to ask questions on the use of genomics resources. If you have a question about how to access a certain kind of data, or how to use a database, or what kind of resources there are for your particular research problem, just ask in the comments. OpenHelix staff will keep watch on the comment threads and answer those questions to the best of our knowledge. Additionally, we encourage readers to answer questions in the comments too. If you know the answer to another reader’s question, chime in.

The “WYP” thread will be posted every Thursday and remain at the top of the blog for 24 hours.

You can keep up with this thread by checking back, by subscribing to the RSS comments feed for this WYP post, or by subscribing to email notifications of new comments to the post (use the checkbox at the end of the comment form; you can unsubscribe later). If you want to be notified of future WYP posts (every Thursday), you can subscribe to the WYP feed.

9 thoughts on ““What’s Your Problem?” Open Thread”

  1. Pingback: “What’s Your Problem?” Open Thread | The Open Helix Blog

  2. gsgs

    I want all public flu-viruses in one big
    database in unified computer-readable format with updating !

    problems: it is hard to determine which segments belong to the same virus,
    FTP downloads come in a different format, while downloading single records takes long and is hard to process.
    Collection days should be given, not only years. Sequences are withheld in private databases by WHO, CDC, etc. Sequences are uploaded with a delay (often 1-2 years) while the pandemic threat is imminent.

  3. Jennifer

    Hi gsgs,

    I’ve just begun to develop a set of tutorials on microbial and viral databases, but I have found a few ‘influenza’ specific databases, including NCBI’s Influenza Virus Resource http://www.ncbi.nlm.nih.gov/genomes/FLU/Database/select.cgi?go=1 (it looks updated according to the dates on their ftp site) and the IVDB – Influenza Virus Database http://influenza.genomics.org.cn/ . There are also lots of other sources of viral information, but they are not so “flu-centric”.

    And I’d appreciate any comments you have on any viral resources you use currently!

  4. gsgs

    hi Jennifer,
    thanks for replying. I knew genbank but not IVDB. I have lots of partial flu-databases; my biggest has records which look like this:

    genbank code1,genbank code2,name,country,year,species,segment,serotype,virus number,,
    (3-8 for flu.lanl.gov, which does not always match genbank)

    and then the nucleotides, not yet aligned. Some errors corrected, country-names and species-names and virus names partially unified.
    From Mar.2007, not yet updated.
    Plus aligned files for each segment, but actual ones with weaker genbank format.
    Then I use self written programs to manipulate, display mutations,…
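A record header in the comma-separated layout described above could be read along these lines. This is only a minimal sketch: the class name `FluRecord`, the function `parse_header`, and the example values are assumptions made for illustration and are not taken from gsgs's own programs.

```python
import csv
from typing import NamedTuple


class FluRecord(NamedTuple):
    """One header line, using the field order listed in the comment above."""
    genbank_code1: str
    genbank_code2: str
    name: str
    country: str
    year: str
    species: str
    segment: str
    serotype: str
    virus_number: str


FIELD_COUNT = len(FluRecord._fields)


def parse_header(line: str) -> FluRecord:
    """Parse one comma-separated header line into a FluRecord."""
    row = next(csv.reader([line]))
    row = [field.strip() for field in row]
    # Pad short rows with blanks and drop the trailing empty columns (",,").
    row = (row + [""] * FIELD_COUNT)[:FIELD_COUNT]
    return FluRecord(*row)


# Made-up example values, not a real GenBank record.
example = parse_header(
    "XX000001,,A/chicken/Vietnam/1/2004,Vietnam,2004,chicken,4,H5N1,17,,"
)
print(example.name, example.segment, example.serotype)
```

All fields are kept as strings here, so a missing year or virus number simply stays blank instead of raising an error.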

    you can send email to sterten{at}aol.com

  5. Jennifer

    Hi again gsgs,

    If you want to align viral sequences to viral genomes, have you looked into the JGI’s Integrated Microbial Genomes Expert Review (IMG/ER) resource? It is a companion resource to their public IMG database (which has over 1500 viral genomes), but it is password restricted & allows users to add annotation to genomes. I LOVE the IMG resource, but have not used the IMG/ER yet.

    Their self-description of the resource is “The Integrated Microbial Genomes- Expert Review (IMG/ER) system provides support to individual scientists or group of scientists for functional annotation and curation of their microbial genomes of interest. Often such genomes have not been yet deposited into the public genome sequence archives, and therefore access to them is restricted to specific scientists or groups.”

    You can go to http://durian.jgi-psf.org/cgi-bin/img_i_er/main.cgi and request a user name, or more information through their comment link, if you think it might be of use to you.

  6. Mary

    Thanks much, Chris! That’s exactly what we were hoping–that people would bring their information over and contribute the knowledge to these types of questions.

  7. gsgs

    hi Jennifer,

    for alignment I use this:
    also kalign.exe, but kalign doesn’t work so well for
    me with >400 sequences.

    Thanks for the IMG link, I requested an account, let’s see.

    I think, everyone who works with flu-sequences should have his own
    complete database with all available flu-sequences in optimal
    standardized computer-readable form. Plus lots of utilities
    to handle that database. That DB should be ~20MB compressed,
    updated monthly or weekly, and available for download.

    When you download some sequences (e.g. from ftp) for a special
    purpose from genbank, you find the records in non-uniform format.
    (Vietnam and Viet Nam, chicken and Chicken, (A/…) and
    strain A/…, only year not exact date, sex,age)
    This makes it hard for utilities to process the database.
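The spelling differences listed above (Vietnam and Viet Nam, chicken and Chicken, "(A/…)" and "strain A/…") can be smoothed out with a small normalization step before records are compared. The sketch below is illustrative only: the function names and the tiny alias table are assumptions, and a real table would need many more entries.

```python
import re

# Hypothetical alias table; a real one would be much longer.
COUNTRY_ALIASES = {
    "viet nam": "Vietnam",
    "vietnam": "Vietnam",
}


def normalize_country(raw: str) -> str:
    """Map known spelling variants of a country name to one form."""
    cleaned = " ".join(raw.split())
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


def normalize_host(raw: str) -> str:
    """Lower-case host names so 'Chicken' and 'chicken' compare equal."""
    return " ".join(raw.split()).lower()


def normalize_strain(raw: str) -> str:
    """Strip a leading 'strain' label and surrounding parentheses."""
    name = raw.strip()
    name = re.sub(r"^strain\s+", "", name, flags=re.IGNORECASE)
    if name.startswith("(") and name.endswith(")"):
        name = name[1:-1].strip()
    return name


print(normalize_country("Viet Nam"))
print(normalize_host("Chicken"))
print(normalize_strain("(A/chicken/Vietnam/1/2004)"))
```

Unknown country names pass through unchanged (apart from whitespace cleanup), so the alias table can grow gradually as new variants turn up.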

  8. gsgs

    why are there so many different formats, errors, and different spellings and enumerations?

    IMO there should be one complete
    standard database available for download, updated, where everyone can correct errors.
    All fields should be included, but some are maybe not available and left blank.

    accession code,name,country,date/year,upload-date,host species,serotype [+maybe types of all 8 segments],
    sex,age of host,clade,resistances,
    virus number,sequence number,

    and the sequences should be aligned, phylogenetically sorted,
    or sorted by virus number,

    this is just one universal flu-database. Not everyone should be required to build it himself, but once it’s done, it could be shared with others

    and commonly improved,updated
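The field list proposed above could be written down as a single record type, with unavailable fields left blank and the output sorted by virus number. This is a rough sketch under assumptions: the names `UnifiedFluRecord` and `to_csv`, the CSV output format, and the example accessions are all made up for illustration; the optional per-segment types, the alignment, and the phylogenetic sorting are left out.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class UnifiedFluRecord:
    """One sequence record; fields that are not available stay blank."""
    accession: str
    name: str = ""
    country: str = ""
    date: str = ""          # exact collection date, or only the year
    upload_date: str = ""
    host_species: str = ""
    serotype: str = ""
    sex: str = ""
    age: str = ""
    clade: str = ""
    resistances: str = ""
    virus_number: int = 0
    sequence_number: int = 0


def to_csv(records):
    """Write records as CSV text, sorted by virus and sequence number."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f.name for f in fields(UnifiedFluRecord)])
    ordered = sorted(records, key=lambda r: (r.virus_number, r.sequence_number))
    for record in ordered:
        writer.writerow(astuple(record))
    return buffer.getvalue()


# Made-up accessions and values, for illustration only.
records = [
    UnifiedFluRecord("XX000002", virus_number=2, sequence_number=1),
    UnifiedFluRecord("XX000001", name="A/x", virus_number=1, sequence_number=4),
]
print(to_csv(records))
```

Sorting on the pair (virus number, sequence number) keeps all segments of one virus next to each other in the file.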
