Category Archives: What’s the Answer?

What’s The Answer? (GMOD resources training)

BioStars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at BioStars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at BioStars.

I’m going to run this post on Tuesday this week rather than Thursday because the course below has a sign-up deadline, and I want you to have a couple more days to hit it. And I spend much of Thursday on a plane on my way to do workshops anyway….

Are you interested in an online 5-day course in using the awesome GMOD resources for analyzing, organizing, and displaying your data? In the News area at Biostars I posted this announcement that you should check out.

News: GMOD Online Training 2014 coming up

GMOD Online Training

http://gmod.org/wiki/GMOD_Online_Training_2014

19th-23rd May 2014, approx 9am-6pm US Eastern time

The Generic Model Organism Database (GMOD) project is offering an online training course for those interested in learning how to use and deploy GMOD’s free, open-source bioinformatics software. The GMOD project provides interoperable tools for visualising, storing, and disseminating genetic and genomic data.

The course will be held from 19th-23rd May 2014, with tuition and interaction with tutors occurring between (approximately) 9am and 6pm US Eastern time.

[go over to Biostars to see the list of topics covered and for more details]

–Mary

What’s the Answer? (new Biostars interface)

Ok, it’s not so much a question this week, but a heads-up. There have been some changes over at Biostars that you should be aware of. If you haven’t logged in for a while, it can be interesting.

Here’s the post about the new interface, and some details about the status of some features.

Forum: Biostar 2.0 released, by Istvan Albert

Biostar 2.0 is now live.

This is the result of quite a few months of work and we hope to have added valuable improvements. We may have bugs and display oddities so please bear with us while we identify and fix them.

The overarching theme is that of providing better search and  navigation and improved tracking of existing content.

[lots more details over at the post]

There’s another post about someone who logged in but seemed to have a different account, causing some confusion, so just be aware of that aspect. But if you have trouble you can see the release post for how to get help.

When I first logged in to the new setup I was greeted with many cheery badges having been awarded, and new notifications that weren’t really new notifications. But they made me smile, anyway!

I feel appreciated.

Anyway, check out the new look: Biostars.

If you are new to Biostar, and not a regular reader, there’s more background on the philosophy of the site in this paper:

Parnell L.D., Lindenbaum P., Shameer K., Dall’Olio G.M., Swan D.C., Jensen L.J., Cockell S.J., Pedersen B.S., Mangan M.E. & Miller C.A. (2011). BioStar: An Online Question & Answer Resource for the Bioinformatics Community, PLoS Computational Biology, 7 (10) e1002216. DOI:

What’s The Answer? (1000 Genomes signatures)

In addition to the discussion and question sections, Biostar also has a section that allows contributors to publicize their tools and workshops and stuff like that. It’s called “ads” but don’t let that make you squeamish. I don’t think I’ve seen anything for sale–it’s been used to merely highlight the availability of some database or training so far. Outreach to scientists is tricky–it can be perceived as icky, but we really need ways to tell people about the tools out there.

I’ve been meaning to mention this tool, but other topical things kept coming up in the meantime. It’s a custom installation of a UCSC Genome Browser framework that offers new ways to look at variations from the 1000 Genomes project.

Ad: A database of Signatures of Selection in the 1000 Genomes dataset

The 1000 Genomes Selection Browser is a database of Signatures of Selection in the Human Genome, based on the 1000 Genomes Phase I data. It is freely accessible at http://hsb.upf.edu/

The browser, based on a custom UCSC Genome Browser installment, allows to easily navigate the genome and visualize regions that are candidate for having been involved in an event of selection in any of the African, European, or Asian populations. The data can also be easily downloaded for further analysis here.

Our browser includes a total of 17 tests for selection. For each test of selection, we provide a raw score, plus a ranked score which compares each position to the rest of the genome.

[long list of the tests with further details....]
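
To make the idea of a ranked score a bit more concrete, here is a minimal sketch of one way a raw score can be compared to the rest of the genome. It assumes that a higher raw score means a stronger signal and that the ranked score is the negative log10 of the empirical rank fraction. That is a common convention, but I have not checked that it is how the browser computes its own ranked scores, and the function name and toy values are made up for illustration.

```python
import math


def ranked_scores(raw_scores):
    """Return -log10(rank / N) for each raw score (hypothetical helper).

    Assumption: higher raw score = stronger signal, so the highest score
    gets rank 1. Ties are broken by input order, which a real
    implementation would probably want to handle more carefully.
    """
    n = len(raw_scores)
    # Indices of the positions, ordered from highest to lowest raw score.
    order = sorted(range(n), key=lambda i: raw_scores[i], reverse=True)
    ranked = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        ranked[idx] = -math.log10(rank / n)
    return ranked


# Toy raw scores for four genomic positions (invented values).
toy = [0.2, 3.1, 1.5, 0.9]
for raw, ranked in zip(toy, ranked_scores(toy)):
    print(f"raw={raw:.1f}  ranked={ranked:.3f}")
```

The point of a score like this is that it puts tests with very different raw scales on a common footing, so an extreme position stands out the same way no matter which test produced it.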

Go have a look at the details and then go over to the 1000 Genomes Selection Browser and try it out. The paper that describes it is linked below.

Reference:

Pybus M., Dall’Olio G.M., Luisi P., Uzkudun M., Carreno-Torres A., Pavlidis P., Laayouni H., Bertranpetit J. & Engelken J. (2013). 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans, Nucleic Acids Research, 42 (D1) D903-D909. DOI:

What’s the Answer? (PPI images in compartments)

This week’s highlighted question is another one of my favorite types–looking for better visualization tools to represent some data features. This time it’s protein-protein interactions (PPI), but organized on the graphical view to show the different locations where they are occurring. Maybe it’s nuclear and cytoplasm, maybe it’s Golgi, maybe mitochondria, etc. I’ve seen some graphics like this and I get the utility. And it seems IPA (which we assume is the Ingenuity software) can do this–but not everyone will have access to that. An open-source option would be great. But I haven’t tried to make them myself so I was interested to see the suggestions.

Hi,

I am able to create decent PPI network pictures with cytoscape. But I recently noticed that IPA can produce nice visual of networks with membranes, cytoplasm, nucleus i.e. it shows different cellular location of proteins as well as interaction between them. But is there any free software that can produce nice visual of PPI along with such cellular components?

I would also like to know which free software do you use to create stunning visuals of networks.

Any comments will be greatly helpful.

Thanks Diwan

Go have a look at the suggestions. But if you know of some others that would be great too, do share.

What’s the Answer? (pre-defined gene sets)

As I mentioned in last week’s tip, I was spurred to look at the cancer data at ICGC from a Biostar question.

Question: Is there any pre-defined gene sets for different cancers

Hi all,

I’m looking for the pre-defined gene sets for different type of cancers. is there any database that I can dowload this gene sets? for example gene sets for Breast Cancer.

jack

But I thought I ought to also mention that the answers provided other options for finding cancer gene data sets. Go have a look at the question and answers from others. But if you have other ideas please bring them over.

What’s The Answer? (schematic genomic context tree)

This week’s highlighted question just came in late yesterday, so there’s not much chatter yet. But if you have any thoughts on how to create the kind of image that’s desired, it would be great to hear them. I’ve seen questions similar to this before so I think there’s demand for it.

Is there any software out there that can draw a simple, schematic rendering of a gene block across multiple species? The result should be something like: http://imgur.com/iKolcBU

Plenty of such renderings in STRING, FIG and EcoCyc, to name those that I know. Anyone has a pointer to a standalone or a web-based program that does this?

Iddo

This is exactly the kind of small visualization task that would be great for a little coding project for a class. There are a bunch of these types of requests for tools that aren’t heavy lifting analysis, but helpful visualizations, that might be good tasks. If you have any ideas on how to do this right now though, please go answer over there.
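
Just to show how small a class project like this could be, here is a rough sketch that draws one row of gene arrows per species as an SVG string, using only the Python standard library. Everything in it is my own assumption for illustration: the function name, the input layout, and the species and gene names are invented, and it is not based on any of the tools mentioned in the question.

```python
from xml.sax.saxutils import escape


def gene_block_svg(tracks, gene_w=60, gene_h=20, gap=10, label_w=120):
    """Build an SVG string with one row of gene arrows per species.

    tracks: list of (species_name, [(gene_name, strand, color), ...]),
            where strand is "+" or "-". This input format is hypothetical.
    """
    row_h = gene_h + 20
    max_genes = max(len(genes) for _, genes in tracks)
    width = label_w + max_genes * (gene_w + gap)
    height = row_h * len(tracks)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for row, (species, genes) in enumerate(tracks):
        y = row * row_h + 10
        mid = y + gene_h / 2
        # Species label at the left edge of the row.
        parts.append(
            f'<text x="5" y="{mid + 4}" font-size="12">{escape(species)}</text>'
        )
        for col, (name, strand, color) in enumerate(genes):
            x0 = label_w + col * (gene_w + gap)
            x1 = x0 + gene_w
            tip = gene_w / 4
            if strand == "+":
                # Arrow pointing right.
                pts = [(x0, y), (x1 - tip, y), (x1, mid),
                       (x1 - tip, y + gene_h), (x0, y + gene_h)]
            else:
                # Arrow pointing left.
                pts = [(x1, y), (x0 + tip, y), (x0, mid),
                       (x0 + tip, y + gene_h), (x1, y + gene_h)]
            pts_str = " ".join(f"{px},{py}" for px, py in pts)
            parts.append(
                f'<polygon points="{pts_str}" fill="{color}" stroke="black">'
                f'<title>{escape(name)}</title></polygon>'
            )
    parts.append("</svg>")
    return "\n".join(parts)


# Invented example: the same color marks the same gene family across species.
tracks = [
    ("Species A", [("geneA", "+", "tomato"), ("geneB", "+", "gold"),
                   ("geneC", "-", "skyblue")]),
    ("Species B", [("geneB", "+", "gold"), ("geneC", "-", "skyblue")]),
]
svg = gene_block_svg(tracks)
print(svg)
```

Saving the printed text to a file ending in .svg and opening it in a web browser should show the two rows. A real tool would also need to scale arrows by gene length and line up orthologs, which is where the interesting part of the project would be.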

What’s the Answer? (QIIME vs Mothur)

This week’s highlighted question is my favorite kind of question. Trying to assess which tools have which features is sometimes challenging. Documentation varies a lot, and because these tools usually come out of research projects rather than commercial products, it can be hard to get a comparison of different tools and uses from the tool providers’ sites. Having a place to check with folks in the field who aren’t part of one or another software group really helps.

Question: QIIME vs Mothur : Why use one over the other?

In my own “metagenomic” (amplicon based) analysis, I’ve had a preference for using Mothur. My reasons for it are as follows:

  1. Seems like it does a good job (based on analysis with Mock community)
  2. It has a simple SOP that was straight forward to follow and is published

I am very new to community analysis, so Mothur seemed like an easy first step (and I’ve had good support on the forum..so extra bonus). A long the way I’ve gotten questions about using QIIME. After looking into the differences between the two, I still can’t fully grasp what the major differences between the methods are to motivate one person to prefer one method compared to the other. Does one method make the other obsolete?

I would really appreciate if someone who has worked with either (or both) methods could shed some light on why pick one over the other.

amcrisan

Go check out the ensuing discussion. It’s really helpful.

What’s The Answer? (WGS demand estimates)

This week’s highlighted question was prompted by the news that Craig Venter’s new venture will be sequencing a whole lotta genomes soon.

What’s your estimate for the yearly demand of Whole Human Genome Sequencing ?

The advent of the HiSeq X Ten makes sequencing genomes faster and cheaper.

The “$1K”, 30X Whole Human Genome is now available for $1,400 ($1,550 with library prep):

support

I really don’t know. It’s hard to assess a brand new market, and with the ethical dilemmas, lack of legal protection, FDA interest, confusion of insurance companies–demand isn’t the only aspect. But you can see what others are thinking, and add your own thoughts, here.

What’s The Answer? (SNP calling in polyploids)

This week’s question is interesting to me because I’m increasingly aware of the challenges of using tools that focus on humans or mammals on other types of genomes. There are some trickier genomes out there, and some of them are important food crops. But there are also just other types of genome organizations that could be challenging to assess with human-centric tools and assumptions.

Question: Tetraploid SNP calling & SNP filtering

Dear all,

I’m currently working on a project involving SNP calling of resistance genes (R-genes) in 96 potato cultivars. We are interested in identifying SNPs (and INDELs) in the NB domain. The NB domains were enriched using PCR and Illumina paired-end read libraries were created for each cultivar. After quality checking (adapter trimming and read quality trimming) the reads were mapped against the potato DM v4.03 genome using NextGenMap. SNP calling is (to be) performed on the known NB-region coordinates.

And now is it time for my question, how should we do the SNP calling? Potato is a tetraploid organism, so theoretically using samtools mpileup should not cover all SNPs/alleles, because (correct me if I’m wrong) samtools is designed for diploid organisms. After a Google search I found three SNP callers (QualitySNPng, freebayes and UNEAK) which should be able to call SNPs in polyploid organisms. My question to the community is if anyone has experience in polyploid (tetraploid) SNP calling and if there is a recommended SNP caller (or if they all behave similarly), or that we maybe should only call SNPs which are called by multiple SNP callers.

Furthermore, we are uncertain about which parameters to use in SNP calling and filtering. The major problem we face is that we are uncertain when we can actually call a SNP; what allele fraction should we use? Or should we call a SNP if it has at least x (high quality) reads supporting it? And what quality score (QUAL as given in the VCF output format, or any other quality measure) is sufficiently high enough to call a SNP with high confidence?

So far we haven’t been able to find any satisfying answers to these questions and are therefore uncertain how to proceed. Thanks in advance for anyone taking the time to read this and to anyone who is willing to help us with our problems.

pavenhuizen
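
I can’t answer the threshold question, but the shape of the filtering decision being asked about can be sketched in a few lines. In a tetraploid, the alternate allele at a variable site should sit near 1, 2, 3 or 4 copies out of 4, so the expected read fractions are around 0.25, 0.5, 0.75 or 1.0. The sketch below assumes that, and every threshold in it (minimum QUAL, minimum supporting reads, minimum allele fraction) is a placeholder I made up, not a recommendation. The function names are hypothetical too, and it works on plain numbers rather than parsing a real VCF file.

```python
def passes_filters(alt_reads, depth, qual,
                   min_qual=30.0, min_alt_reads=5, min_fraction=0.1):
    """Decide whether a candidate SNP survives three simple filters.

    All default thresholds are illustrative placeholders only.
    """
    if depth == 0:
        return False
    fraction = alt_reads / depth
    return (qual >= min_qual
            and alt_reads >= min_alt_reads
            and fraction >= min_fraction)


def nearest_dosage(alt_reads, depth, ploidy=4):
    """Guess how many of the `ploidy` chromosome copies carry the alt allele.

    This simply rounds the observed allele fraction to the nearest
    multiple of 1/ploidy; real callers use a statistical model instead.
    """
    if depth == 0:
        return 0
    return round(alt_reads / depth * ploidy)


# Invented candidate sites: (alt-supporting reads, total depth, QUAL).
candidates = [(12, 40, 50.0), (2, 40, 50.0), (30, 40, 12.0)]
for alt, depth, qual in candidates:
    keep = passes_filters(alt, depth, qual)
    dosage = nearest_dosage(alt, depth)
    print(f"alt={alt} depth={depth} qual={qual} keep={keep} dosage={dosage}")
```

Writing it out this way makes it clear that the three questions in the post (allele fraction, supporting reads, QUAL) are separate knobs, and that the right settings probably depend on read depth and on how much PCR enrichment skews the allele fractions.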

The discussion has begun with one answer, but if you have other thoughts or know of tools that fit the bill for this, please bring your information over.

What’s The Answer? (ALT_LOCI-aware tools)

This week’s highlighted question is pretty thoughtful and timely. Most people in the field are probably aware of the new human genome reference sequence that’s been generated and will be percolating to tools we use very soon. We talked about it back in April in this post. But one aspect of this new version is that there will be more segments that are ALT_LOCI, or variations from the primary assembly sequence that are important to capture. This new reference assembly puts more emphasis on these alternative sequences, and they will be made available with the reference sequence.

But how will different tools we use handle those? Although this question was aimed at aligners, it is important to think about how any tools we are using that display, compare, or navigate the reference genome will be affected.

Question: GRCh38 and reference alignment

I came across a slide share about the new GRCh38 assembly and its ALT_LOCI assemblies. (http://www.slideshare.net/vaschn/agbt2014schneider)

Question:

  1. Are there any “ALT Loci aware” aligners out there ?
  2. If so, then how are they accommodating these ALT LOCI while aligning reads to the reference genome ? How do they choose between primary assembly and Alternate Loci ?

Some references: This Biostar thread (Applying patches to GRCh assembly) talks about patches/Alts during alignment as well as about the aligner – srprism (ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/srprism)

Nice overview of ALT LOCI in GRCh38 (Will GRC38/HG20 be a multiple sequence reference genome?)

thanks in Advance,

Aniket

Are the tools ready? One answer provides some guidance on aligners that may be ALT-aware (or “multi-allelic reference” aware, as Deanna described them)–but they aren’t published yet. If anyone else has other ALT-aware tools please bring ‘em over. Or if you know of any other helpful details or issues around this, it would be great for the discussion.

I think this is important to think through as we get more and more individuals sequenced and see more sequence diversity in the population too. And it’s important to be aware that the primary assembly is an excellent snapshot, but it doesn’t show you everything. You should be ALT-Loci aware too.