Category Archives: What’s the Answer?


What’s the Answer? (NGS tools you are using the most)

There are a lot of tools in the bioinformatics ecosystem. Some of them you won’t ever need because of your particular research focus. For some of them you may want to know about similar alternatives. But my favorite type of discussion with practitioners of various sorts is about what they are actually using. This week’s highlighted question asks that of folks working with NGS data.


Those working with NGS data – what is your list of most used tools?

submitted  by lc929

Command line, Galaxy browser? Sam tools? Thanks.

It’s really great to know what’s actually in use and working for people. Go have a look at the answers. Contribute some thoughts over there as well if you have favorites.


What’s the Answer? (rewarding good coding behavior)

This was a nice twist on the usual complaints about code and documentation. A way to recognize good behavior and illustrate examples that are well done. I liked it.


Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.


Forum: What are your “model” examples for bioinformatics programming?

We’ve talked a lot about Minimum Standards For Bioinformatics Command Line Tools here, but I’m wondering what you consider to be “gold standard” examples of coding/programming that bioinformaticians should read to improve their own coding and understanding.

I’ll start:

Picard – written in Java

Sickle – written in C++

Cutadapt – written in Python

Dan D

Anyway, I thought it was a good idea to show the kind of behavior you want to encourage. Check out the chatter and add other good examples, if you have some.


What’s the Answer? (which genetic code)

This week’s highlighted question is kind of a basic one, but as more and more sequence data comes along from species that people might not have worked with before, it’s a handy tip to keep in your back pocket.




Question: Database that says what genetic code a particular organism uses?

I’m at ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt with all the genetic codes.  Trying to figure out what genetic code Nostoc punctiforme uses.  Is there a database that has which version they use? I know it’s not standard and they have multiple start codons.

jolespin

The very helpful response from h.mon described where to find it on the NCBI Taxonomy pages and the link to the details. Go click through.
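To see why the choice of genetic code matters in practice, here is a minimal sketch of translation with alternative start codons. It assumes, for illustration only, that the organism uses the bacterial code (NCBI translation table 11), which shares its amino acid assignments with the standard code but allows additional initiation codons. The `ALT_STARTS` set is an illustrative subset, and the names `translate` and `CODON_TABLE` are hypothetical helpers, not part of any library. Check the Taxonomy page and gc.prt for the authoritative table and start list.

```python
from itertools import product

# Amino acid assignments in the compact NCBI layout: bases ordered T, C, A, G,
# with the third codon position varying fastest. This string is the standard
# code; the bacterial code is assumed here to use the same assignments.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustrative subset of alternative initiation codons (an assumption for this
# sketch, not the full list from gc.prt).
ALT_STARTS = frozenset({"ATG", "GTG", "TTG"})


def translate(dna, start_codons=frozenset({"ATG"})):
    """Translate a DNA open reading frame, reading the first codon as Met
    whenever it is one of the allowed start codons."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    protein = []
    for position in range(0, usable, 3):
        codon = dna[position:position + 3]
        if position == 0 and codon in start_codons:
            protein.append("M")
        else:
            protein.append(CODON_TABLE[codon])
    return "".join(protein)


orf = "GTGAAATAA"
standard_only = translate(orf)            # first codon read as valine
with_alt_starts = translate(orf, ALT_STARTS)  # first codon read as Met
```

The same sequence gives a different first residue depending on which start codons are allowed, which is the practical reason to look up the right table before annotating a new genome.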


What’s the Answer? (treasure hunt)

This week’s highlighted question is from the Bioinformatics subreddit. I thought it was a neat idea, something that totally would have engaged me as an undergrad in the summer with time on my hands. So I thought I’d help out with some promotion for this treasure hunt.

How to promote a bioinformatics themed treasure hunt?

I recently released Excision, an adventure and treasure hunt that takes players through a series of introductory computer and bioinformatics puzzles. Players go through the game as a brand new Agent in a near-future society. There’s even a $50 prize for the first person to make it through all the stages.

I’ve gotten some pretty positive feedback from those who have played, but I don’t know anything about marketing and haven’t done a great job of promoting Excision. If anyone has any suggestions on how I might go about getting more visibility, I’d love to hear any ideas.

And, of course, here’s the link for anyone who wants to check it out for themselves!

http://what-is-bioinformatics.blogspot.com/2015/02/base-camp.html

LifeIsBio

Might be a fun teaching tool for some folks too. Isn’t it what we are all doing, hunting for treasures in the genomes? Anyway, I thought it was a cute concept. Check it out.


What’s the Answer? (effective altruists in bioinformatics)

As this week’s lighter holiday fare continues, here’s an interesting and unusual question about career directions related to bioinformatics. I’d love to see how this comes out 10 years from now.




Forum: A Bioinformatics Career for an Effective Altruist?

Hi.

My situation:
- Living in Portugal. But I have no problems in going to other countries.
- Just finishing my Biology’s Bachelor’s Degree
- Know some Haskell and Matlab
- I’m doing the The Odin Project course (http://theodinproject.com/curriculum). I already know some HTML, some CSS, some Ruby on Rails and a good bit of Rails, I think.

What I Want
- I want to be a Effective Altruist (https://en.wikipedia.org/wiki/Effective_altruism), that is, I want to do the most good possible in the world. I could have a very-well-paid job and give a good percentage of it to charity (a strategy called ‘earning to give’). Or I could have a job with a direct impact on people, like biomedicine research on important problems. Or I could mix the two strategies.

Problem

I’m thinking about getting a Masters on Bioinformatics. But I’m afraid that after finishing it I could get a not-very-well-paid-job compared with other informatics-related jobs and that that job would not have a great positive impact on society. Maybe what I would give to charity with the earnings from other informatics-related job would do more good in the world.

On the other hand, I have no formal education on informatics. Could I really get a well-paid job in the area just from self-learning?

Questions

- How much positive impact does bioinformatics research has on the world?
- How a bioinformatics master’s degree would raise my chances of one day have a really-well-paid-job compared with just self-learning?

Thanks. :D

Luís Campos

Ps: There’s a worldwide community of effective altruists, so I think the answers to this question have the potential to have a high impact on some person’s carrers

–luis1212

I enjoyed reading people’s thoughts on this, and added one of my own.


What’s the Answer? (subreddit r/genome)

Some researchers who currently enjoy (or at least tolerate) twitter are concerned about its ongoing support, stability, and effectiveness. Some of them are now adding a reddit section to their social media options. I noted this in our Friday SNPpets, but thought I’d also show you a concrete example of how it’s being used.

Reddit version:

Kaiser Permanente cohort of ~100k individuals with genotypes and phenotypes
submitted by josephpickrell

Kaiser Permanente (an insurance company in California) genotyped around 100,000 people [edit: I should note this was in collaboration with UCSF], and made the data available on dbGAP.

Today I noticed three papers describing aspects of the data:

  1. A description of the genotyping and quality control
  2. A comparison of genetic ancestry to self-reported race/ethnicity in the cohort.
  3. A description of a telomere length assay on all individuals.

My personal experience is that these data are well-organized and accessible, so it’s great to see more details on how they were collected. I’ve only seen a few papers that use the data, for example Loh et al. used the data to calculate genetic correlations between several of the phenotypes.

This was the release of some information that might be helpful to researchers. Instead of being a fleeting notice on twitter, this post in the subreddit stays available at any time. You can also discuss it in longer form and include more links, and your timing doesn’t matter. I can see the value of this. But I know some people aren’t crazy about Reddit. We’ll see how it develops. So far, though, it seems popular.

Visit: https://www.reddit.com/r/genome/


What’s the Answer? (creating a web site to visualize a paper’s results)

This week’s highlighted question is from the Bioinformatics subreddit. It’s a question about displaying your research data for others to access and interact with. I think this will be an issue more and more as huge data sets become available, on a wide range of topics, none of which will fit into traditional publications. Sometimes there will be data repositories (and I hope that when an appropriate repository is available, data will still go there as well). But I can imagine some projects that have more or different features than an existing repository supports. So they may want to provide a place for people to interact with their data above and beyond just download access. That made this discussion interesting to me.

In addition, just yesterday I talked about the ZBrowse tool that sits on top of R and RStudio, a combination that commenters in this thread also discuss for other situations. And if you try out ZBrowse you’ll see some similarity in the software features to the kidney metabolism interface that’s discussed in the replies.

Of course, maintenance over the long run is still an issue. But some of those concerns are possibly being managed better with Docker, Bioboxes, and other strategies for data management and storage that funding agencies are scrutinizing. Or, at least it’s being noticed that these issues need real strategies and oversight.

Creating website for data visualisation?

I’m submitting a paper soon, and we are thinking about providing a website for visualisation of results. (Reading data from a table and then providing graphs and a statistics table for a gene of interest etc)

Is anyone familiar with this process? Any suggestions? And how long will this take to learn and implement.. I’m familiar with bash, R and a bit of python syntax, if that helps

Any advice appreciated!

–willgotskill

Have a look at the discussion threads. I thought it was interesting and worth knowing about how people are solving this.


What’s the Answer? (pan-genome graphs)

This week’s highlighted discussion is about pan-genome graphs, which are ways to represent the variation we find in genomes instead of a linear reference sequence view. I was really struggling with these concepts until I heard a talk at the #TRICON meeting recently. David Haussler had some really helpful visuals. I don’t have an audio link to the talk I heard, but I found a similar one. I think it’s a concept people need to consider, because these are going to be coming to us in the near future.




Forum: Pan-genome graphs make the popular science press

There’s much talk on my twittersphere about this piece in MIT Technology Review: Rebooting the Human Genome. It talks about how the current reference genome concept misses so much of the human variation that we need to capture as we sequence more and more people’s personal genome data.

But there’s also some confusion. I don’t think the concept of the graphs was really well described in there. In an earlier thread here we talked about it a little, but I wasn’t able to find the talk I’d heard about this, which was helpful to me. But I found a similar one, and maybe this will help people to get the idea of the graphs instead of just the current linear view we have of the reference genome.

You can watch the whole thing, of course. But the part about the graph ideas comes up around 52 minutes into this talk.

So the idea is that we have to be able to account for the “bubbles” that don’t match a linear reference string. Some bubbles will be alterations, some insertions, some deletions, some inversions. They are all valid, and we can capture them with graph representations that go beyond our current tools. We need to know and see this variation better.

Anyway, I’m posting because I think it’s important to be aware of. And I think that even researchers in the field aren’t that familiar with the ideas yet.

This paper was also helpful to me to understand the concepts, but unfortunately is not open access: Building a pan-genome reference for a population. doi: 10.1089/cmb.2014.0146 http://www.ncbi.nlm.nih.gov/pubmed/25565268

If anyone else has good introductions to the representations of these variant graph concepts I’d like to see them.

Edit to add: this paper has some of Haussler’s graphs too: http://arxiv.org/abs/1404.5010

Mary

Hat tip to Mike Schatz for leading me to the Tech Review article originally.
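The “bubbles” idea from the post above can be sketched with a toy sequence graph. This is a minimal illustration under stated assumptions, not the representation used in the papers or the talk: the node sequences, the edge list, and the helper `spell_paths` are all made up for this example. Each node carries a short stretch of sequence, and each path from the first node to the last spells out one valid version of the region.

```python
# Toy sequence graph. Node 1 and node 4 are shared flanking sequence.
# Nodes 2 and 3 form a "bubble": two alternative alleles at one site.
# The direct edge 1 -> 4 skips the bubble entirely, modelling a deletion.
NODES = {1: "ACG", 2: "T", 3: "C", 4: "GGA"}
EDGES = {1: [2, 3, 4], 2: [4], 3: [4], 4: []}


def spell_paths(node, sink, nodes, edges):
    """Return the sequence spelled by every path from `node` to `sink`.

    Assumes the graph is acyclic, so the recursion always terminates.
    """
    if node == sink:
        return [nodes[node]]
    sequences = []
    for next_node in edges[node]:
        for tail in spell_paths(next_node, sink, nodes, edges):
            sequences.append(nodes[node] + tail)
    return sequences


haplotypes = spell_paths(1, 4, NODES, EDGES)
for sequence in haplotypes:
    print(sequence)
```

A linear reference would keep only one of these paths. The graph keeps all three, so a read carrying the alternative allele or the deletion still matches something in the reference structure.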

References:

Paten, B., Novak, A., & Haussler, D. (2014). Mapping to a Reference Genome Structure. arXiv: 1404.5010v1

Nguyen, N., Hickey, G., Zerbino, D., Raney, B., Earl, D., Armstrong, J., Kent, W., Haussler, D., & Paten, B. (2015). Building a Pan-Genome Reference for a Population. Journal of Computational Biology, 22 (5), 387-401. DOI: 10.1089/cmb.2014.0146


What’s the Answer? (finding interesting papers)

Keeping current in the field just continues to become more challenging, as the number of publications each week, and the number of big data sets, continues to outpace a normal person’s ability to read. Reddit’s bioinformatics sub is trying out something new to find useful nuggets.

/r/bioinformatics paper discussion thread

Hi all!

This thread is the beginning of hopefully a new tradition in /r/bioinformatics! The idea is to discuss any papers you guys find interesting, helped you solve a problem recently, or whatever reason.

Format

  • We’ll create a thread like this every last friday of the month
  • Let it be sticky for about two weeks
  • People have some time to read it and discuss it
  • Rinse and repeat!

[more details over there…]

I think it’s a nice idea, and I’ll be looking in. Contributing when I can. I added yesterday’s tip paper on ClinGen, because I think the field needs to think a lot about the standards for getting bioinformatics data out to clinical endpoints. Check it out.


What’s the Answer? (to Venn or not to Venn)

This week’s highlighted item from Biostars gets back to the visualization challenges that I love to think about. The question posted asked for help for an 11-set Venn diagram. What was funny about the response was that the overwhelming consensus was: please, no! And alternatives were offered.




Question: Venn diagram with 11 sets

Hi All,

can anyone recommend me a tool/package, which allows me to create Venn diagrams up to 11 sets? The packages which I have found so far can support to create less sets only.

Many thanks!

szuszmok

The question reflects a frequent problem with various data sets. You want to find the members of groups that overlap in different conditions, treatment situations, genes present or absent in different species, whatever. In the most famous case of Venn illustration, the banana genome team created a much-discussed masterpiece showing sets of genes in gene families shared with other plant species. It was so astonishing to look at that it even got Cory Doctorow’s attention: Just look at that banana genome Venn diagram. But genomics Venn diagrams get around. Here’s one that became fashion:

However, part of the problem with that Venn is that it was so difficult to interpret. As a developer of visualization tools told me later, Venns do not scale well for genomics types of data. He was UpSet about genome folks trying to force the data in, and created the very neat UpSet tool to help with that: Video Tip of the Week: UpSet about genomics Venn Diagrams?

So I added that to the responses, but there were other suggestions too. Go have a look at the ensuing discussion.
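For a sense of what the alternatives compute, here is a minimal sketch of the counting behind an UpSet-style plot. It is not the UpSet tool’s own code. The function `exclusive_intersections` and the toy gene sets are hypothetical, made up for illustration. Instead of drawing overlapping shapes, each item is assigned to the exact combination of sets it belongs to, and the combinations are then ranked by size.

```python
from collections import Counter


def exclusive_intersections(named_sets):
    """Count items by the exact combination of sets that contain them.

    Each item is counted once, under the full list of sets it appears in,
    so the counts sum to the number of distinct items.
    """
    membership = {}
    for name, items in named_sets.items():
        for item in items:
            membership.setdefault(item, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Toy data: three hypothetical gene sets. The same function works unchanged
# for 11 sets, where a Venn diagram would need 2**11 - 1 regions.
gene_sets = {
    "A": {"g1", "g2", "g3", "g6"},
    "B": {"g2", "g3", "g4", "g6"},
    "C": {"g3", "g5"},
}

counts = exclusive_intersections(gene_sets)
for combination, size in counts.most_common():
    print(sorted(combination), size)
```

Only the combinations that actually occur show up in the result, which is why this kind of summary stays readable as the number of sets grows.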