Category Archives: What’s the Answer?

What’s The Answer? (Internet of DNA)

This week’s highlighted discussion tackles the “Internet of DNA”, a story I picked up last week in my SNPpets post, which has since bubbled up elsewhere. And Biostar folks look at the more technical implications of “A global network of millions of genomes….”


Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s discussion comes as part of an interesting week on the personalized medicine front. A whole bunch of things are coming together: the US getting a Chief Data Scientist who talks about bioinformatics, the NEJM talking about training physicians to deal with medical genomics issues, and the “Internet of DNA” getting out into popular science media. So have a look at what bioinformatics nerds made of this, and what they think:

Forum: A global network of millions of genomes could be medicine’s next great advance. | Beacon

Internet of DNA

A global network of millions of genomes could be medicine’s next great advance.

Availability: 1-2 years

Noah is a six-year-old suffering from a disorder without a name. This year, his physicians will begin sending his genetic information across the Internet to see if there’s anyone, anywhere, in the world like him.

http://www.technologyreview.com/featuredstory/535016/internet-of-dna/

Do you think this will happen within 2 years?

Edit:

I think this is the technical implementation they are talking about:

The Beacon project is a project to test the willingness of international sites to share genetic data in the simplest of all technical contexts. It is defined as a simple public web service that any institution can implement as a service. The service is designed merely to accept a query of the form “Do you have any genomes with an ‘A’ at position 100,735 on chromosome 3” (or similar data) and responds with one of “Yes” or “No.” A site offering this service is called a “beacon”.

http://ga4gh.org/#/beacon

So it is just a federated query over multiple large genomics (+ phenotype) data sets. Full genomes are not centralized or moved, so privacy is less of a concern.

William

And please, contribute your own thoughts over there. We need to be having this discussion. Also, watch for more on this Beacon….
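To make the Beacon idea a bit more concrete, here is a minimal Python sketch of the yes/no federated query described in the quoted post. It is not the GA4GH Beacon API: the `Beacon` class, the `federated_query` function and the site names are all hypothetical, and each “site” is just an in-memory set of (chromosome, position, allele) tuples standing in for a real web service. The point it illustrates is that only a “Yes” or “No” comes back from each site; the genomes themselves never move.

```python
from dataclasses import dataclass, field


@dataclass
class Beacon:
    """Hypothetical stand-in for one site's beacon service.

    A real beacon is a web service; here the site's data is just a set of
    (chromosome, position, allele) tuples held in memory.
    """
    name: str
    variants: set = field(default_factory=set)

    def query(self, chrom: str, pos: int, allele: str) -> str:
        # Only "Yes" or "No" leaves the site; the genomes themselves stay put.
        return "Yes" if (chrom, pos, allele) in self.variants else "No"


def federated_query(beacons, chrom, pos, allele):
    """Send the same question to every beacon and collect the answers by site."""
    return {b.name: b.query(chrom, pos, allele) for b in beacons}


# Hypothetical sites and variants, for illustration only.
sites = [
    Beacon("site_a", {("3", 100735, "A"), ("7", 5000, "T")}),
    Beacon("site_b", {("1", 12345, "G")}),
    Beacon("site_c", {("3", 100735, "A")}),
]

# "Do you have any genomes with an 'A' at position 100,735 on chromosome 3?"
answers = federated_query(sites, "3", 100735, "A")
print(answers)  # {'site_a': 'Yes', 'site_b': 'No', 'site_c': 'Yes'}
```

In a real deployment each `query` call would be an HTTP request to a separate institution, which is exactly why the privacy story is simpler: the answer carries one bit of information per site.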

What’s the Answer? (RStudio as a game-changer)

This week’s highlighted Biostar item is part of the week’s theme on statistical computing. The post comes from a biologist who remembers what it was like before we had the nice RStudio interface, and he offers some hand-holding to get new users started.

Tutorial: Few words for R beginners

Hi,

As a biologist who started to learn R, I encountered a lot of problems learning the subject. I don’t want to go into them now; I just want to suggest what I think can save you from wasting your time and energy fooling around without getting what you expect.

  1. Install R! Of course!
  2. Install RStudio; it simplifies your life. Note: RStudio should be installed after R (http://www.rstudio.com/). From then on, always open RStudio, not R. R is the actual program, but RStudio gives it the nice interactive interface.
  3. Watch this webinar on R to get familiar with the basics and to see why it’s good to have RStudio. http://bitesizebio.com/webinar/20600/beginners-introduction-to-r-statistical-software/
  4. Coursera offers a very nice course on R. Get the videos from their website and, of course, watch them! (https://www.coursera.org/course/rprog)
  5. While taking the course, practice with swirl (http://www.swirlstats.com). Swirl was the best R teacher for me; it makes you work through R interactively.
  6. Also, https://www.datacamp.com/courses/introduction-to-r, or https://www.datacamp.com in general, is a very good resource for self-learners!
  7. Stuar51XT is a YouTube channel with very nice, comprehensive R courses. Search their videos for “introduction to R programming”: https://www.youtube.com/user/Stuar51XT
  8. Practice and expand bioinformatics-oriented R skills with the “Institute for Integrative Genome Biology” manual: http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual

If I could go back to my pre-R era, I would follow the above. I think it’s a good kick-off for those who want to learn R and start getting familiar with R’s environment. I hope it helps you =)

Cheers!

–Parham

But I also loved this response:

I would add, as someone who started using R around 13 years ago: RStudio has been a complete game-changer. It has made the software far more accessible to more people, brought together a great combination of developers, been responsible for many useful, innovative packages and, all in all, is just A Good Thing. – Neilfws

See, it’s not just me trying to lure you to RStudio. It is A Good Thing. There are some other comments over there too with more tips or chatter. Go have a look.

 

What’s the Answer? (wet lab software)

This week’s highlighted question is about some wet lab software. Typically we look at genomics analysis tools, but with the high-throughput nature of current biology, it seems to me there’s a good opportunity for more of this type of resource management software too. And the software developer is looking for some feedback from other types of researchers.

Tool: StrainControl Laboratory Manager software

Dear all,

I have read some posts regarding lab software, so I thought that maybe ours could be of interest.

Those of us who developed the software, StrainControl Laboratory Manager, all work in science.

Last year StrainControl was released to the research community.

StrainControl is lab software that allows you to store everything in the lab in one place.
Currently about 700 labs are using StrainControl, and they are all satisfied.

Some key functions:
1) Handle strains, cell-lines, oligos, plasmids, chemicals and inventories.
2) Link plasmid data to strains or cell-lines.
3) Ability to rename any field and text to your own needs.
4) Customize which data columns should be visible.
5) User management allowing you to create read, write, administrator accounts etc.
6) Read access from cloud drives (Dropbox, etc.) and network support.
7) Create reports (over 20 formats)

What I’m interested in is whether any other research fields (besides basic research labs) can make use of the software, since any field can be renamed to fit a different research field.

We would be very happy if you could give it a try and comment how the software works for you.

More information: http://www.straincontrol.com

Thank you in advance,
Chris Ericsson, PhD

And some of our most popular blog posts are about colony management software, electronic lab notebooks, and other sorts of routine stuff, not just analysis tools. So have a look and see if this is useful. Or if you have other tools like that which you find essential to lab work, let me know. I’d love to have a look.

What’s the Answer? (free images for science communications)

This week’s highlighted question isn’t about the usual topics–software or tools for genomics per se. Someone recently asked about accessing free images for papers, and this is also something I need for presentations. I mean, I know this isn’t directly related to genomics necessarily–but communicating is pretty useful too. And pretty common in science….

Question: Images — Free common images for articles?

Hello!
I’d like to ask you: is there any open database for downloading free (totally newbish) images, such as “DNA structure”, “protein synthesis” and so on, for placing into an article? NCBI used to have an image search, which now fails to find these basic images. Am I doing something wrong, or is there any other place for such images?

Thank you very much in advance

ldpubsec

The mechanism for finding images by usage rights at Google was new to me, and I thought it might be new to others as well. Go see. I’ve also added a couple of other sources that I use, such as NHGRI and NIH images; I think people will find them handy.

What’s the Answer? (zoomable browser)

This week’s highlighted question hits on something I’ve mused about before. We could really use more lightweight browsing tools that are appropriate for consumer-level use. There are some tools that sophisticated end users could work with to accomplish various things, but there’s still a gap, and I still think filling it would be a great project for a student team.

The extra wrinkle on this question, though, is that it needs to run without an internet connection.

Question: Any good offline zoomable genome browsers?

More as an educational than a research tool, I’d like to give a presentation on human genetics using a human genome browser that can start at the single-base level, and zoom (smoothly if possible) out to the entire chromosome. The catch is that I won’t have internet access when giving the presentation. I’m happy to download reference sequences, annotations, etc. The main things I’d like to display are (at the fine scale) intron/exon positions, and at the wider scale, gene positions, alu/SINE positions, and (if possible) simple gene repeats such as human MW/LW opsins. I like http://chromozoom.org, but it’s hard to get working offline. I find that http://www.biodalliance.org doesn’t have a terribly nice chromosome-wide view. Are there others that people could suggest?

hyanwong

If you have some suggestions, please bring ‘em along. This question pops up on a somewhat regular basis, and we could really use more ideas on this.

What’s the Answer? (3D structures with mutations)

This week’s highlighted question was about visualizing variations in linear graphics as well as in 3D structural representations.

Question: Map genetic mutations to protein domain/structure

I am trying to map genetic mutations to protein domain/structure. Ideally, I want to visualize the variants in linear protein domain diagram and 3D protein structure like the attached images. I did research, but I can’t find good tools/databases for such work.

I know similar questions have been asked here, such as “How To Create Mutation Diagram In R Or In Any Tools?”, but that one covers only the protein domain diagram (with no 3D structure), and the protein domain annotation there seems limited.

Thank you all in advance!

[Graphic over there shows an image of what the original poster wants to visualize]

mittjohns

I had recently talked about Mutation Mapper as another answer to a related question. But at that time I didn’t note that you can also get a 3D structure from there. Glad to see someone mention it as a possible answer on this new question.
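Whatever tool ends up drawing the picture, the step underneath the linear domain diagram is a simple interval lookup: which annotated domain, if any, contains each mutated residue? Here is a minimal Python sketch of just that step. The domain names and coordinates are made up for illustration (real ones would come from a domain annotation resource), `Domain` and `map_to_domains` are hypothetical names, and this is not how Mutation Mapper works internally.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    name: str
    start: int  # first residue, 1-based, inclusive
    end: int    # last residue, 1-based, inclusive


def map_to_domains(positions, domains):
    """Return {residue position: [names of domains that contain it]}."""
    return {pos: [d.name for d in domains if d.start <= pos <= d.end]
            for pos in positions}


# Hypothetical domain annotation and mutated residues, for illustration only.
domains = [Domain("kinase", 50, 300), Domain("SH2", 320, 410)]
mutations = [75, 315, 400]

result = map_to_domains(mutations, domains)
for pos, names in result.items():
    print(pos, names if names else "(outside annotated domains)")
```

The same position-to-domain table could then feed a lollipop-style plot, or be used to pick which residues to highlight on a 3D structure.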

What’s the Answer? (Docker, actually…)

In a kind of amusing pairing with this week’s Tip-of-the-Week on Docker, I came across this one while looking for good questions and discussions at Biostar. The discussion opens with the idea of curating a collection of “efficient” bioinformatics programs, which is a worthy investment of time and resources. The conversation then turns to Docker, but also notes that Docker may not be the right choice in every case. So have a look at how the idea percolated around; it’s a good demonstration of crowdsourcing something useful for the community.

News: Pandora’s Toolbox – a collection of bioinformatics programs

Hello everyone,

I am developing a toolbox with a collection of source code for efficient bioinformatics programs.

They are available under the condition that you cite the individual authors and not Pandora’s toolbox.

My goal is to develop additional code so that we can mix and match them to solve various problems efficiently.

———————————–

github page -
https://github.com/homologus/Pandoras-Toolbox/

Blog posts -
Introducing ‘Pandora’s Toolbox’ and ‘Pandora’s Modules’
http://www.homolog.us/blogs/blog/2015/01/05/introducing-pandoras-toolbox-and-pandoras-modules/

An Update on Pandora’s Toolbox
http://www.homolog.us/blogs/blog/2015/01/08/an-update-on-pandoras-toolbox/
The following programs are currently included in the collection.

[list of stuff in there, go have a look over there for that set]

ugly.betty77

So, in short, more ideas for using Docker in the genomics software community. Jess’ sayin. And a nice coincidence for my blogging this week.

What’s the Answer? (publishing tool papers)

This week’s highlighted forum discussion was really interesting to me. The original post lays out pros and cons of publishing software tool papers, and I think all of them are useful points for discussion. But the comments and replies from others on the topic were also interesting.

Forum: On the utility of publishing a tool paper

I’ve been considering writing an application note for the pyfaidx module (for reading/writing indexed FASTA files), but I’m not sure if the effort involved in authoring and publishing an application note is worth it. Several projects have published their work as application notes, but I’m not sure that a “me too” attitude helps here.

Reasons I can think of for publishing a tool:

  1. Citations. Obviously it’s easier for people to reference your work.
  2. Content discovery. Not everyone knows what they’re searching for, and while GitHub and Google do help here, not everyone is an SEO genius.
  3. Context for usage. Several application notes I’ve seen provide use cases or examples where the tool may provide an advantage.

Downsides:

  1. Time
  2. Publication fees
  3. Danger of producing a stale description of your software. Software development should be motivated by use cases, bugs, and user feedback. All can really change the functionality and interface of software.

Any thoughts about pros/cons of tool publications would help.

Matt Shirley

Go have a look at the discussion in full. As someone who has searched for a lot of software, only to find references to internal personal scripts, broken links, and outdated personal web pages in too many cases, I certainly favor publishing in some findable, archived format somewhere. But I don’t think it has to cost much, and I don’t care whether it’s in a traditional journal format. There are ways around that now that would suffice perfectly for these types of smaller utilities or data sets.
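For anyone wondering what a module like pyfaidx is doing under the hood: indexed FASTA access relies on a samtools-style `.fai` index that records, for each sequence, its length, the byte offset of its first base, the number of bases per line, and the number of bytes per line. With those four numbers you can compute exactly where any base sits in the file and seek straight to it instead of scanning. The sketch below is a pure-Python illustration of that idea, not pyfaidx’s actual code or API; `build_fai`, `byte_offset` and `fetch` are hypothetical names, and it assumes every sequence line except the last has the same length.

```python
import io


def build_fai(handle):
    """Build a samtools-style .fai-like index from a binary FASTA handle.

    For each sequence: length (bases), offset (byte of first base),
    linebases (bases per line) and linewidth (bytes per line, incl. newline).
    Assumes all sequence lines but the last have the same length.
    """
    index, name, pos = {}, None, 0
    for line in handle:
        if line.startswith(b">"):
            name = line[1:].split()[0].decode()
            index[name] = {"length": 0, "offset": pos + len(line),
                           "linebases": 0, "linewidth": 0}
        elif name is not None:
            entry = index[name]
            bases = len(line.rstrip(b"\r\n"))
            if entry["linebases"] == 0:
                entry["linebases"], entry["linewidth"] = bases, len(line)
            entry["length"] += bases
        pos += len(line)
    return index


def byte_offset(entry, p):
    """Byte position of 0-based base p: skip whole lines, then step into the line."""
    full_lines, within = divmod(p, entry["linebases"])
    return entry["offset"] + full_lines * entry["linewidth"] + within


def fetch(handle, index, name, start, end):
    """Return bases [start, end), 0-based half-open, by seeking instead of scanning."""
    entry = index[name]
    end = min(end, entry["length"])
    if start >= end:
        return ""
    first = byte_offset(entry, start)
    last = byte_offset(entry, end - 1) + 1
    handle.seek(first)
    raw = handle.read(last - first)
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


# Toy FASTA held in memory, for illustration only.
fasta = io.BytesIO(b">chr1 toy\nACGTACGTAC\nGGTTAACCGG\nTTT\n>chr2\nNNNNAAAA\n")
fai = build_fai(fasta)
print(fetch(fasta, fai, "chr1", 8, 13))  # ACGGT
print(fetch(fasta, fai, "chr2", 2, 6))   # NNAA
```

The cost of a lookup depends only on the length of the requested slice, not on how far into a multi-gigabase file it sits, which is the whole reason these index files exist.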

What’s the Answer? (FANTOM5 promoter atlas)

This week’s highlighted question at Biostar is something that rang a bell. I’ve been meaning to take a look at this resource, but it got buried on my desk under a pile of other stuff and I forgot to get back to it.

Question: FANTOM5 Promoter Atlas

Hi All,

I went through this paper on the promoter atlas in FANTOM5 http://www.nature.com/nature/journal/v507/n7493/full/nature13182.html#the-fantom5-promoter-atlas. My question is: do they have a cell-wise CAGE dataset, or is it global? I cannot see the cell-wise CAGE expression on their website.

Thanks in advance.

Aishwarya.Kulkarni

The thing that reminded me about that paper was in the answer: Taylor Raborn noted that this can be found in the ZENBU resources. Judging from my drafts folder, I started tip-of-the-week posts about ZENBU in both March and May, but other stuff came up. I really have to get back to it in the new year. If people could stop developing new resources for a while, maybe I could catch up…? Right, that will happen.

Until I have a chance to get back to it (we have our annual special summary posts over the next two weeks and other stuff already in the hopper for early next year), you’ll have to settle for the ZENBU wiki details on their genome browser.

References:
Forrest A.R.R., Kawaji H., Rehli M., Baillie J.K., de Hoon M.J.L., Haberle V., Lassmann T., Kulakovskiy I.V., Lizio M., Itoh M., Andersson R., et al. (2014). A promoter-level mammalian expression atlas, Nature, 507 (7493) 462-470. DOI: http://dx.doi.org/10.1038/nature13182

Severin J., Lizio M., Harshbarger J., Kawaji H., Daub C.O., Hayashizaki Y., Bertin N. & Forrest A.R.R. (2014). Interactive visualization and analysis of large-scale sequencing datasets using ZENBU, Nature Biotechnology, 32 (3) 217-219. DOI: http://dx.doi.org/10.1038/nbt.2840

What’s the Answer? (tidy data format)

This week’s highlighted post at Biostar is about “tidy data”. Ah, quite the concept. The day when data becomes tidy will be one to celebrate. Anyway, I think it’s a worthwhile discussion to have, and I’m looking forward to the comments as this develops. If you have thoughts, please bring them over there too.

Usually I highlight most of the question here, but this time some pieces are too large (examples of format issues), so I’ll just give you the bullet points and send you over to Biostar to read the whole thing.

Forum: Principles of Tidy Data (Hadley Wickham) and the VCF format

Hadley Wickham, the author of ggplot and many other popular R packages, has recently published a very good paper regarding the principles of tidy data. This article introduces a new library called tidyr, and also proposes a standard for formatting and organizing data before data analysis.

I personally think that the principles proposed in the article are very good, and that they help a lot in data analysis. Some of these are already adopted by many ggplot2/plyr users, as you need a data frame in a long format in order to produce most of the plots.

My question is whether it would make sense to apply these principles to bioinformatics. In particular, if we look at the VCF format, it fails at least two of the rules mentioned in the paper:

- “3.1. Column headers are values, not variable names” (because individuals are encoded in distinct columns)

- “3.2. Multiple variables stored in one column” (because each genotype column contains the status of one or more alleles, plus its coverage, etc.)

For example, let’s take the example from the VCF 4.0 specification:

[examples here]

[More discussion of the issues within samples, so go read over there]

What do you think? Will we all convert to tidy VCF in the far future?

–Giovanni M Dall’Olio

So, tidy VCF. What do you think? Some people are already musing about it. Discuss over there.
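What might “tidy VCF” look like in practice? Here is a minimal Python sketch that reshapes the per-sample genotype columns of a tiny VCF into long format: one row per variant per sample, with each FORMAT key (GT, GQ, DP) in its own column. That addresses both rules quoted above: sample IDs become values in a SAMPLE column rather than column headers, and the packed “GT:GQ:DP” string is split into separate variables. The two-sample VCF text is a hypothetical, trimmed example loosely modeled on the VCF 4.0 spec example (INFO simplified to “.”), and `tidy_vcf` is an illustrative name, not an existing library function.

```python
import csv

# Hypothetical two-sample VCF, loosely modeled on the VCF 4.0 spec example
# (INFO simplified to "."; real files carry many more header lines).
VCF_TEXT = (
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA00001\tNA00002\n"
    "20\t14370\trs6054257\tG\tA\t29\tPASS\t.\tGT:GQ:DP\t0|0:48:1\t1|0:48:8\n"
    "20\t17330\t.\tT\tA\t3\tq10\t.\tGT:GQ:DP\t0|0:49:3\t0|1:3:5\n"
)


def tidy_vcf(text):
    """Reshape sample columns into one row per (variant, sample).

    Sample IDs become values in a SAMPLE column (rule 3.1), and the packed
    FORMAT string is split so each key gets its own column (rule 3.2).
    """
    body = [line for line in text.splitlines() if not line.startswith("##")]
    reader = csv.reader(body, delimiter="\t")
    header = next(reader)
    samples = header[9:]  # columns after FORMAT are per-sample
    rows = []
    for rec in reader:
        chrom, pos, var_id, ref, alt = rec[:5]
        keys = rec[8].split(":")  # FORMAT column, e.g. GT:GQ:DP
        for sample, value in zip(samples, rec[9:]):
            row = {"CHROM": chrom, "POS": int(pos), "ID": var_id,
                   "REF": ref, "ALT": alt, "SAMPLE": sample}
            row.update(zip(keys, value.split(":")))
            rows.append(row)
    return rows


rows = tidy_vcf(VCF_TEXT)
for row in rows:
    print(row)
```

Note that GT still packs two alleles and the phasing into one string, so a stricter reading of rule 3.2 would split it further. How far to go is a design choice, and the long format trades compactness for a table that grows with samples times variants.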

Reference:
Wickham H. (2014). Tidy Data, Journal of Statistical Software, 59 (10). http://www.jstatsoft.org/v59/i10