Category Archives: What’s the Answer?

What’s the Answer? (effective altruists in bioinformatics)

As this week’s lighter holiday fare continues, here’s an interesting and unusual question about career directions related to bioinformatics. I’d love to see how this comes out 10 years from now.


Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.


Forum: A Bioinformatics Career for an Effective Altruist?

Hi.

My situation:
- Living in Portugal, but I have no problem moving to other countries.
- Just finishing my Bachelor’s Degree in Biology
- Know some Haskell and Matlab
- I’m doing The Odin Project course (http://theodinproject.com/curriculum). I already know some HTML, some CSS, some Ruby on Rails and a good bit of Rails, I think.

What I Want
- I want to be an Effective Altruist (https://en.wikipedia.org/wiki/Effective_altruism), that is, I want to do the most good possible in the world. I could have a very well-paid job and give a good percentage of my income to charity (a strategy called ‘earning to give’). Or I could have a job with a direct impact on people, like biomedical research on important problems. Or I could mix the two strategies.

Problem

I’m thinking about getting a Master’s in Bioinformatics. But I’m afraid that after finishing it I could end up with a job that is not very well paid compared with other informatics-related jobs, and that it would not have a great positive impact on society. Maybe what I could give to charity from the earnings of another informatics-related job would do more good in the world.

On the other hand, I have no formal education in informatics. Could I really get a well-paid job in the area just from self-learning?

Questions

- How much positive impact does bioinformatics research have on the world?
- How much would a bioinformatics master’s degree raise my chances of one day having a really well-paid job, compared with just self-learning?

Thanks. :D

Luís Campos

P.S.: There’s a worldwide community of effective altruists, so I think the answers to this question have the potential to have a high impact on some people’s careers.

–luis1212

I enjoyed reading people’s thoughts on this, and added one of my own.

What’s the Answer? (subreddit r/genome)

Some researchers who currently enjoy (or at least tolerate) Twitter are concerned about its ongoing support, stability, and effectiveness. Some of them are now adding a Reddit section to their social media options. I noted this in our Friday SNPpets, but thought I’d also show you a concrete example of how it’s being used.

Reddit version:

Kaiser Permanente cohort of ~100k individuals with genotypes and phenotypes
submitted by josephpickrell

Kaiser Permanente (an insurance company in California) genotyped around 100,000 people [edit: I should note this was in collaboration with UCSF], and made the data available on dbGaP.

Today I noticed three papers describing aspects of the data:

  1. A description of the genotyping and quality control
  2. A comparison of genetic ancestry to self-reported race/ethnicity in the cohort.
  3. A description of a telomere length assay on all individuals.

My personal experience is that these data are well-organized and accessible, so it’s great to see more details on how they were collected. I’ve only seen a few papers that use the data, for example Loh et al. used the data to calculate genetic correlations between several of the phenotypes.

This post points to a release of information that might be helpful to researchers. And instead of a fleeting notice on Twitter, this post in the subreddit is available at any time. You can also discuss it in longer form, include more links, and your timing doesn’t matter. I can see the value of this. But I know some people aren’t crazy about Reddit. We’ll see how it develops. So far, though, it seems popular.

Visit: https://www.reddit.com/r/genome/

What’s the Answer? (creating a web site to visualize a paper’s results)

This week’s highlighted question is from the Bioinformatics subreddit. It’s a question about displaying your research data for others to access and interact with. I think this will be an issue more and more as huge data sets become available on a wide range of topics, none of which will fit into traditional publications. Sometimes there will be data repositories (and I hope that when an appropriate repository is available, the data will still go there as well). But I can imagine projects that need more or different features than an existing repository offers. So they may want to provide a place for people to interact with their data above and beyond just download access. That’s why this discussion was interesting to me.

In addition, just yesterday I talked about the ZBrowse tool, which sits on top of R and RStudio, an approach the commenters here also discuss for other situations. And if you try out ZBrowse you’ll see some similarity in the software features to the kidney metabolism interface that’s discussed in the replies.

Of course, long-term maintenance is still an issue. But some of those concerns may be managed better with Docker, Bioboxes, and other strategies for data management and storage that funding agencies are scrutinizing. Or at least, people are noticing that these issues need real strategies and oversight.

Creating website for data visualisation?

I’m submitting a paper soon, and we are thinking about providing a website for visualisation of results. (Reading data from a table and then providing graphs and a statistics table for a gene of interest etc)

Is anyone familiar with this process? Any suggestions? And how long will this take to learn and implement? I’m familiar with bash, R and a bit of python syntax, if that helps.

Any advice appreciated!

–willgotskill

Have a look at the discussion threads. I thought it was interesting and worth knowing about how people are solving this.

What’s the Answer? (pan-genome graphs)

This week’s highlighted discussion is about pan-genome graphs, which are ways to represent the variation we find in genomes instead of a linear reference sequence view. I was really struggling with these concepts until I heard a talk at the #TRICON meeting recently. David Haussler had some really helpful visuals. I don’t have an audio link to the talk I heard, but I found a similar one. I think it’s a concept people need to consider, because these graphs are going to be coming to us in the near future.


Forum: Pan-genome graphs make the popular science press

There’s much talk on my twittersphere about this piece in MIT Technology Review: Rebooting the Human Genome. It talks about how the current reference genome concept misses so much of the human variation that we need to capture as we sequence more and more people’s personal genome data.

But there’s also some confusion. I don’t think the concept of the graphs was described very well there. In an earlier thread here we talked about it a little, but I wasn’t able to find the talk I’d heard, which had been helpful to me. But I found a similar one, and maybe this will help people get the idea of the graphs instead of just the current linear view we have of the reference genome.

You can watch the whole thing, of course. But the part about the graph ideas comes in at around 52 minutes.

So the idea is that we have to be able to account for the “bubbles” that don’t match a linear reference string. Some bubbles will be alterations, some insertions, some deletions, some inversions. They are all valid, and we need to know and see this variation better. Graph representations can capture it in ways that go beyond our current tools.
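
To make the “bubble” idea a bit more concrete, here is a minimal sketch that is not taken from the talk, the paper, or any pan-genome tool. It assumes a made-up toy text format: lines starting with N give a node ID and its sequence, and lines starting with E give a directed edge. The hypothetical shell function spell_paths uses awk to walk the graph depth-first and print every sequence the graph encodes. The graph must have no cycles. Inversions would need nodes that can be read in either orientation, so this toy leaves them out.

```shell
#!/usr/bin/env bash
# Minimal sketch of a sequence graph with "bubbles" (toy format, not GFA or any real tool).
# Assumed input: "N <id> <seq>" lines define nodes, "E <from> <to>" lines define directed edges.
# spell_paths is a hypothetical helper: it prints every sequence spelled by a path
# from the start node to the end node. The graph must be acyclic.
set -euo pipefail

spell_paths() {   # usage: spell_paths GRAPH_FILE START_ID END_ID
  awk -v src="$2" -v dst="$3" '
    $1 == "N" { seq[$2] = $3 }
    $1 == "E" { nout[$2]++; succ[$2, nout[$2]] = $3 }
    # Depth-first walk; the extra parameters i and nxt act as awk local variables.
    function walk(node, prefix,   i, nxt) {
      prefix = prefix seq[node]
      if (node == dst) { print prefix; return }
      for (i = 1; i <= nout[node]; i++) {
        nxt = succ[node, i]
        walk(nxt, prefix)
      }
    }
    END { walk(src, "") }
  ' "$1"
}
```

As an example, take a toy graph with nodes ACG, T, C, TG, C, A (IDs 1 to 6) and edges 1→2, 1→3, 2→4, 3→4, 4→5, 4→6, 5→6. It has a T/C substitution bubble and a second bubble where the final C can be skipped (a deletion). Running spell_paths on that file from node 1 to node 6 prints four sequences: ACGTTGCA, ACGTTGA, ACGCTGCA and ACGCTGA. A single linear reference string can hold only one of them.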

Anyway, I’m posting because I think it’s important to be aware of. And I think that even researchers in the field aren’t that familiar with the ideas yet.

This paper also helped me understand the concepts, but unfortunately it is not open access: Building a pan-genome reference for a population. doi: 10.1089/cmb.2014.0146 http://www.ncbi.nlm.nih.gov/pubmed/25565268

If anyone else has good introductions to the representations of these variant graph concepts I’d like to see them.

Edit to add: this paper has some of Haussler’s graphs too: http://arxiv.org/abs/1404.5010

Mary

Hat tip to Mike Schatz for leading me to the Tech Review article originally.

References:

Paten, B., Novak, A., & Haussler, D. (2014). Mapping to a Reference Genome Structure. arXiv: 1404.5010v1

Nguyen, N., Hickey, G., Zerbino, D., Raney, B., Earl, D., Armstrong, J., Kent, W., Haussler, D., & Paten, B. (2015). Building a Pan-Genome Reference for a Population. Journal of Computational Biology, 22 (5), 387-401. DOI: 10.1089/cmb.2014.0146

What’s the Answer? (finding interesting papers)

Keeping current in the field just continues to become more challenging, as the number of publications each week, and the number of big data sets, continue to outpace a normal person’s ability to read. Reddit’s bioinformatics sub is trying out something new to find useful nuggets.

/r/bioinformatics paper discussion thread

Hi all!

This thread is the beginning of hopefully a new tradition in /r/bioinformatics! The idea is to discuss any papers you guys find interesting, that helped you solve a problem recently, or that caught your eye for whatever reason.

Format

  • We’ll create a thread like this on the last Friday of every month
  • Let it be sticky for about two weeks
  • People have some time to read it and discuss it
  • Rinse and repeat!

[more details over there…]

I think it’s a nice idea, and I’ll be looking in and contributing when I can. I added yesterday’s tip paper on ClinGen, because I think the field needs to think a lot about the standards for getting bioinformatics data out to clinical endpoints. Check it out.

What’s the Answer? (to Venn or not to Venn)

This week’s highlighted item from Biostars gets back to the visualization challenges that I love to think about. The question asked for help making an 11-set Venn diagram. What was funny about the responses was that the overwhelming consensus was: please, no! And alternatives were offered.


Question: Venn diagram with 11 sets

Hi All,

Can anyone recommend a tool/package that allows me to create Venn diagrams with up to 11 sets? The packages I have found so far only support fewer sets.

Many thanks!

szuszmok

The question reflects a frequent problem with many kinds of data sets. You want to find the members of groups that overlap across different conditions, treatment situations, genes present or absent in different species, whatever. In the most famous genomics Venn illustration, the banana genome team created a much-discussed masterpiece showing the gene families shared among banana and other plant species. It was so astonishing to look at that it even got Cory Doctorow’s attention: Just look at that banana genome Venn diagram. But genomics Venn diagrams get around; one even became fashion.

However, part of the problem with that Venn diagram was that it was so difficult to interpret. As a developer of visualization tools later told me, Venn diagrams do not scale well for genomics data. He was UpSet about genome folks trying to force the data in, and created the very neat UpSet tool to help with that: Video Tip of the Week: UpSet about genomics Venn Diagrams?

So I added that to the responses, but there were other suggestions too. Go have a look at the ensuing discussion.
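
As a rough sketch of what UpSet-style plots count instead of drawing overlapping circles, here is a small shell function. It assumes one plain-text file per set, with one gene ID per line. The function, set_intersections, is hypothetical and not part of UpSet. It assigns each gene to the exact combination of sets that contain it and counts genes per combination, largest first. These are the exclusive intersection sizes. Plotting them as bars is left to a tool like UpSet. Each file must be non-empty, because a set is registered when its first line is read.

```shell
#!/usr/bin/env bash
# Minimal sketch: count exclusive set intersections, the numbers an UpSet-style
# bar chart displays. set_intersections is a hypothetical helper, not part of UpSet.
# Assumed input: one non-empty file per set, one element (e.g. a gene ID) per line.
set -euo pipefail

set_intersections() {
  awk '
    FNR == 1 { nsets++; name[nsets] = FILENAME }   # first line of each file starts a new set
    NF > 0   { member[$1, nsets] = 1; seen[$1] = 1 }
    END {
      for (e in seen) {
        # Build the label of the exact combination of sets containing element e.
        combo = ""
        for (i = 1; i <= nsets; i++)
          if ((e, i) in member) combo = combo (combo == "" ? "" : "&") name[i]
        count[combo]++
      }
      for (c in count) print count[c] "\t" c
    }
  ' "$@" | LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2   # largest intersections first
}
```

For example, set_intersections genes_A.txt genes_B.txt genes_C.txt (hypothetical file names) prints one tab-separated line per combination, such as a count followed by genes_A.txt&genes_B.txt. With 11 sets there are up to 2^11 − 1 = 2047 possible combinations. Only the non-empty ones are printed, which is part of why a sorted list or bar chart stays readable where an 11-set Venn diagram does not.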

What’s the Answer? (bioinformatics one-liners)

As much of a fan as I am of web-based tools, there are times when the command line can accomplish what you need very quickly. When I looked at this new post at Biostars, it was already hugely popular just 3 hours into the day. So I think it captured some attention from the field, and some of our readers might want to check out these ideas or offer their own. So, this week’s unusual highlighted question is about the command line.


Question: Best bioinfo one-liners?

Although an infinity of efficient tools exists out there, it is sometimes still quicker to achieve simple tasks with a single Linux command. I’ll start by sharing 3 I use quite often.

##1 Get the sequence length distribution from a fastq file using awk
zcat file.fastq.gz | awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print l, lengths[l]}}'

##2 Reverse complement a sequence (I use that a lot when I need to design primers)
echo 'ATTGCTATGCTNNNT' | rev | tr 'ACTG' 'TGAC'

##3 Split a multifasta file into single-sequence files with csplit:
csplit -z -q -n 4 -f sequence_ sequences.fasta '/>/' '{*}'

I may be wrong, but I’ve not found such a list in Biostars.

So, what comes to your mind? I hope this post will yield some gold nuggets ;-)

Manu Prestat

There was a lot of chatter–have a look.
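
For readers who want to reuse the three one-liners above, here is a minimal sketch that wraps them as bash functions. The function names are hypothetical. A few things differ from the originals:

- The length histogram is piped through sort -n, because awk’s for-in loop visits keys in no guaranteed order.
- The reversal and the FASTA splitting are done in awk instead of rev and csplit (whose -z option is GNU-specific), so only gzip, awk, tr and sort are needed.
- The complement also handles lowercase bases.

Like the original, the histogram assumes plain 4-line FASTQ records.

```shell
#!/usr/bin/env bash
# A minimal sketch of the three one-liners as reusable shell functions.
# Assumptions: bash, gzip, awk, tr and sort are available; function names are hypothetical.
set -euo pipefail

# 1. Read-length distribution of a gzipped FASTQ file (gzip -dc is equivalent to zcat).
#    Line 2 of every 4-line record is the sequence. awk's "for (l in lengths)"
#    loop visits keys in no particular order, so sort numerically afterwards.
read_length_hist() {
  gzip -dc "$1" \
    | awk 'NR % 4 == 2 { lengths[length($0)]++ } END { for (l in lengths) print l, lengths[l] }' \
    | sort -n
}

# 2. Reverse complement. The string is reversed in awk (instead of rev) and then
#    complemented with tr; lowercase bases are handled, N passes through unchanged.
revcomp() {
  printf '%s\n' "$1" \
    | awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }' \
    | tr 'ACGTacgt' 'TGCAtgca'
}

# 3. Split a multi-FASTA file into one file per record: PREFIX0001.fa, PREFIX0002.fa, ...
#    Any text before the first ">" header is ignored.
split_fasta() {   # usage: split_fasta FASTA_FILE PREFIX
  awk -v prefix="$2" '
    /^>/ { if (out != "") close(out); out = sprintf("%s%04d.fa", prefix, ++n) }
    out != "" { print > out }
  ' "$1"
}
```

For example, revcomp ATTGCTATGCTNNNT prints ANNNAGCATAGCAAT, the same result as the original rev | tr pipeline. Running split_fasta sequences.fasta sequence_ writes sequence_0001.fa, sequence_0002.fa, and so on.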

What’s the Answer? (brain connectome)

This week’s highlighted item lets you find answers in brains. What do the brain connections look like in 3D? I love 3D brain maps–not in a zombie manner, just in an astonishing complexity manner. And although this is a different type of computational resource than we usually explore, I thought it was interesting.


Tool: Budapest Reference Connectome: a 3D visualization tool to browse connections in the human brain

I am pleased to announce a new tool in neuroscience (connectomics): the Budapest Reference Connectome, a 3D visualization tool to browse connections in the human (consensus) brain.

The connectome has 1015 nodes, but those corresponding to the same larger cerebral area are drawn at the same spot, for a cleaner view. You can use your mouse to zoom and rotate, and if you click a node you can see only the connections of that node.

We welcome any comment, question and suggestion.

[couple of sample images are included over there]

Csaba Kerepesi

The idea of the “human consensus brain” makes me giggle. We could really use that. But as if we could all agree on one! That said, you have the option of using the menus over there to load up a female brain or a male brain as well as the consensus brain. I went over and tried it out.

Another nice thing about their project is that they are using Biostars as an option for support. I think that’s really neat. I’m so over mailing lists. And yet I still have to read them all the time.

Reference:
Szalkai, B., Kerepesi, C., Varga, B., & Grolmusz, V. (2015). The Budapest Reference Connectome Server v2.0. Neuroscience Letters, 595, 60-62. DOI: 10.1016/j.neulet.2015.03.071

What’s the Answer? (user friendly software)

This week’s highlighted chatter is about the never-ending quest for better ways to access and use other people’s software. I don’t think there’s anything new here, but it may be a nice reminder for developers that others want to use the things you are developing, so make it easier for them to do so.

Tips for developing more user friendly bioinformatics software?

This seems to be a recurring theme: I read a cool new bioinformatics paper that develops some method for doing exactly what I want to try out on my data. I try to find the code so I can apply the method to my data. Sometimes the code is not available, so I have to contact the author. Other times, the code is available but so poorly documented that I have to contact the author and ask for clarification. Most frequently, the code is available and reasonably documented, but takes some strange input format that I’m not sure how to massage my data into, and I spend a lot of time just getting everything in the right format.

What are some of your tips, suggestions, or recommendations for developing more user friendly bioinformatics software? There must be industry standards that we can learn and borrow from.

JEFworks

The ensuing discussion was valuable. Good ideas, good techniques. Have a look.

What’s the Answer? (woolly mammoth ORFs)

This week’s highlighted question was interesting to me in a couple of ways. It was a good question about the recent analysis of the woolly mammoth genome, making it a nice example of post-publication discussion. But mostly I just loved the chatter about issues and challenges around extinct organisms and their sequences. We are living in the future now. And that’s so awesome.


Question: Where are the mammoth’s ORFs?

Not sure if anyone from the Swedish Museum of Natural History is on this forum, but does anyone know of any plans to process the BAM files from http://www.ebi.ac.uk/ena/data/view/ERP008929 into something we can actually use for looking at protein evolution? Might Ensembl eventually pick up the data for their pipeline, Emily_Ensembl? And/or NCBI? This is not the first time journal editors have allowed a new genome paper without the genome in question being in any usable form for biologists.

“Complete Genomes Reveal Signatures of Demographic and Genetic Declines in the Woolly Mammoth”

http://www.citeulike.org/user/cdsouthan/article/13590852

cdsouthan

I loved Emily’s explanation, and this part: “…mammoths have no active transcription…”. I thought to myself, well, not yet. The Plan to Turn Elephants Into Woolly Mammoths Is Already Underway. George Church and CRISPR are on the way back to the future already.