Tag Archives: biostar

Getting help with bioinformatics questions online

There are thousands of bioinformatics databases, servers, algorithms, and apps in the bioinformatics ecosystem. Even though we are immersed in this environment ourselves, it seems that every day there’s something new, and in every workshop we do someone brings us an issue they have which requires some sort of tool that we haven’t explored yet–some species, some data type, or some investigational focus in a field that needs some computational tools.

There are some institutions that have terrific support from a good-sized bioinformatics core group or well-informed science librarians. There are some that have a few go-to folks that can give guidance around their department. But for many people it’s not easy to find what you need locally. In fact, we’ve been stunned sometimes at the lack of support even at some big-name institutions.

Another problem is that the way bioinformatics is growing makes it even more challenging to get appropriate guidance. This is what I mean:

Even when a tool can be identified, documentation and support are frequently missing or obsolete, and often have not anticipated more creative, advanced queries or novel implementations. As an example, the advent of workflow systems like Galaxy and Taverna [1], [2] enables users to weave customized pipelines together that employ multiple and disparate tools and data sources. Because questions and problems relating to workflow systems may cross traditional boundaries and domains of data and software providers, it can be difficult to find guidance from others with relevant experience. Though Taverna’s “MyExperiment” and Galaxy’s “Shared Pages” attempt to address this issue, questions about locating appropriate data, tools, or components beyond the current implementation remain best posed elsewhere. This lack of support can and often does lead to cumbersome bottlenecks during data analysis.

To accomplish some analysis, for example, you might need 5 different databases and tools for mining and running the analysis. But the way that the world is structured you can only get help for one tool at a time at the provider’s site. It’s great to have a tool like Galaxy to access dozens of things you might need. But for each you have to seek out documentation (whose quality may vary) and work it out yourself.

But now there are communities that can offer support across tools and disciplines. One of them is BioStar. We’ve been participating at BioStar for a while now, and each Thursday we highlight a question that might be interesting to our readers.

As BioStar was developing, people were asking how to cite the assistance that they received. And the BioStar participants wanted to also let the wider community know about the site. So an online discussion evolved into a movement to write a paper. A Google doc was provided, and a bunch of us wrote, edited, discussed, generated images, and eventually arrived at a consensus paper. Larry Parnell took the responsibility to work the terrific contributions into a cohesive unit, and voilà!  An article was born. The process was fascinating and fun. Most of us have never met each other, yet we were able to accomplish this in pretty reasonable time.

So have a look at the publication about BioStar to understand how it works and see if it could help you out, or pass it along to students in your classes perhaps. Or if you are a support provider, join us and help us guide others :) .

You might also want to read up on another work that can make your participation in online help communities more effective: Ten Simple Rules for Getting Help from Online Scientific Communities. Some veterans of the forums offer nice advice on how to get the most out of these kinds of interactions.

Visit BioStar: http://biostar.stackexchange.com/

References:
Parnell, L., Lindenbaum, P., Shameer, K., Dall’Olio, G., Swan, D., Jensen, L., Cockell, S., Pedersen, B., Mangan, M., Miller, C., & Albert, I. (2011). BioStar: An Online Question & Answer Resource for the Bioinformatics Community. PLoS Computational Biology, 7(10). DOI: 10.1371/journal.pcbi.1002216

Dall’Olio, G., Marino, J., Schubert, M., Keys, K., Stefan, M., Gillespie, C., Poulain, P., Shameer, K., Sugar, R., Invergo, B., Jensen, L., Bertranpetit, J., & Laayouni, H. (2011). Ten Simple Rules for Getting Help from Online Scientific Communities. PLoS Computational Biology, 7(9). DOI: 10.1371/journal.pcbi.1002202

What’s the answer? Database anomalies

BioStar is a site for asking, answering and discussing bioinformatics questions. We are members of the community and find it very useful. Often questions and answers arise at BioStar that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those questions and answers here in this thread. You can ask questions in this thread, or you can always join in at BioStar.

The question for the week:

Incorrect/unusual entries in main databases (GenBank, UniProt, PDB)? Pierre Poulain asks, “I… advise my students to be cautious with the data they can find in these databases. To illustrate this, I found quite unusual entries in GenBank:…” and he then lists some good ones.

There were several interesting, and funny, answers, including one from our own Mary:

My favorite bizarre database item was a PubMed one. This was long before that NCBI ROFL blog was created. I was searching for genes identified in the transition to gray hair. This was not useful….

http://www.ncbi.nlm.nih.gov/pubmed/12079806

This is the TITLE (note, not the abstract):

I am a 64-year-old man, and I’ve always been proud of my perfect health record. I’ve also been proud of my full head of hair, even after the gray started creeping in. Four months ago I caught pneumonia and spent eight days in the hospital (three in intensive care). It took a while, but I’m finally back to normal – except that my hair is falling out. It comes out in clumps when I shampoo or even comb it, and it’s gotten noticeably thin all over. I remember reading about Propecia in your newsletter but I don’t have the old issue. Should I try the medication?

Check out the other answers for good examples of why researchers should always double-check the data.

What would bioinformatics professionals do with their personal genome? “I simply don’t want to know.”

Over the long holiday weekend I noticed an interesting item in my Twitter feed. A number of people were pointing to the post entitled “My Genome Via E-mail” by David Ewing Duncan. Some of you may be familiar with David’s writing and his big project called “Experimental Man”. He has been exploring all sorts of biomedical tests and investigations about his body, which probably makes him the most deeply studied case in personalized medicine at this point.

Well, he has also taken to genomics as part of this, of course. And now he’s one of the people in the Personal Genome Project and has his full genome sequence in hand. Well, sort of. He has it, but he’s asking for guidance on what to do with it:

This is an appeal: Send me your ideas for how best to interpret my newly sequenced complete genome!

Now, as an exercise over a year ago I thought this through. I have no expectation of having my genome any time soon–but it’s a question people ask me and I thought it was fun to think about. I reviewed that post the other day and I still think that’s what I’d do:

1) Assessment and QC

2) Build a personal genome browser with various tracks, including a literature track for personally curating stuff interesting or relevant to me

3) Look closer at specific medically relevant genes. I know this is looking under the flashlight, but the most knowledge and anything actionable would probably be in this set. I’d also look specifically into family issues (like that allergy/eczema stuff I found in my 23andMe data) and try to learn things there.

But I also thought I’d like to know what some of my peers in bioinformatics/genomics would do. As you may know if you follow this blog, we participate in discussions at BioStar. The participants there are active in genomics research around the world, and they are super-users of the tools of art in this field. Who better to ask? So I posted a question asking what they would do with their personal genome sequence. I offered my skeletal workflow as an example, and expected some thoughts on what they would do.

What would you do with your personal genome data? is my question over there.

To my surprise, the top rated answer at this time says this:

I may be in minority but I’ll say this: right now I simply don’t want to know – Did you ever notice how genomic variation never correlates with good news. It seems there is only bad news. There are no SNPs for happiness, friendship or love….

Um. Ok. That’s one way to approach this. I was surprised, really–I didn’t expect the question to become philosophical. I really wanted a workflow.

For most of the weekend the second rated answer was this:

Whatever you do with it should be up to you to decide, use it for your personalized medicine if you wish. So my sole recommendation is: Keep the data private and well protected and encrypted! Decide in an informed way, whom you grant access to them….

There were real concerns about the security and misuse of this data.

There are a couple of other interesting answers as well. I have to say it was fascinating. It wasn’t what I expected–but it was illuminating for me. I haven’t always been the most enthusiastic participant in the personal genomics debate, as I have real concerns about security and misuse of the information, and the current utility. But it’s certainly coming whether we are ready or not, and I really wanted to know what people would do with it in a concrete way if they had it. And I thought bioinformatics/genomics professionals would have the best leads on this.

I’m considering adding a bounty to my question over there. You can add some of your own points to the question for encouragement to obtain an answer. And I’ll still be the highest ranked identified female over there–so I can afford the points.

If you have some thoughts and want to join BioStar, and if you give me a decent workflow, I may award the points to you!  Anyone? Bueller? Bueller? Anyone?

What’s the Answer: genes implicated in…

Question of the week: This week’s question was specific:

I’m looking for a database for genes implicated in lymphoid and myeloid development

The short answer is that there isn’t one (that I or anyone else can find), but two answers as of this posting do give good methods for finding what the asker is looking for. One of them says:

We’ve developed a text-mining approach, called GETM (Gene Expression Text Miner) to associate genes with anatomical locations based on tagging of gene names and species-specific anatomy ontologies that might help with your problem.

What’s The Answer? Open Thread (Genotype/Penetrance Function)

Question of the Week:

How do I compute the proportion of cases/controls for each genotype from the penetrance function?

The question isn’t actually about a specific resource, but since the answer is clear and we do deal with a lot of genotype resources, I thought this was applicable to this thread.
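
For readers who want to see the calculation spelled out, here is a minimal sketch of one standard way to do it, using Bayes' theorem. It is not necessarily the answer given at BioStar. The function name, the genotype labels, the example numbers, and the assumption of Hardy-Weinberg genotype frequencies are all ours, chosen for illustration only.

```python
def genotype_proportions(risk_allele_freq, penetrance):
    """Return (cases, controls): P(genotype | case) and P(genotype | control).

    penetrance maps each genotype ("aa", "Aa", "AA") to P(disease | genotype).
    Assumption: genotype frequencies in the general population follow
    Hardy-Weinberg equilibrium for a biallelic locus.
    """
    p = risk_allele_freq
    q = 1.0 - p
    genotype_freq = {"aa": q * q, "Aa": 2 * p * q, "AA": p * p}

    # Overall disease prevalence K = sum over genotypes of f_G * P(G)
    prevalence = sum(penetrance[g] * genotype_freq[g] for g in genotype_freq)

    # Bayes' theorem, applied once for cases and once for controls
    cases = {g: penetrance[g] * genotype_freq[g] / prevalence
             for g in genotype_freq}
    controls = {g: (1.0 - penetrance[g]) * genotype_freq[g] / (1.0 - prevalence)
                for g in genotype_freq}
    return cases, controls


# Hypothetical example values, not taken from the BioStar question
example_penetrance = {"aa": 0.01, "Aa": 0.02, "AA": 0.04}
cases, controls = genotype_proportions(0.2, example_penetrance)
for g in ("aa", "Aa", "AA"):
    print(f"{g}: cases={cases[g]:.4f} controls={controls[g]:.4f}")
```

The proportions within cases and within controls each sum to 1, and genotypes with higher penetrance are enriched among cases relative to controls.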

What’s the Answer? Open Thread (GWAS genotyping)

Question of the Week:

How much of the genome is captured by a GWAS?

There are two great answers to this question; here is a quote from the first one. Click the link above for more.

Human genome encodes 1 SNP/100-300bp; ~3GB sequence ~10million SNPs. It is impossible to analyze such a large number of data due to several limiting factors. To deal with this issue we can use Linkage Disequilibrium (LD) mapping (See section on D’, recombination rate), Haplotype, Haplotype blocks and Haplotype Tag SNPs (tagSNPs). (Read about HapMap project here). Instead of genotyping all the 10M SNPs we can genotype tagSNPs in a haplotype block. This is a representative SNP in a given region of genome with high LD. This will enable to find genetic variation without genotyping all the 10M SNPs. Previous studies indicated that genotyping chips with .5M-1M SNPs will be sufficient for a good GWAS.
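
The arithmetic in that answer is easy to check. Here is a rough sketch using only the round numbers quoted above (a ~3 Gb genome, one SNP per 100-300 bp, ~10 million SNPs, chips with 0.5M-1M SNPs). The constant and function names are ours, for illustration.

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gb, the round figure from the quoted answer


def snp_count(bp_per_snp):
    """Approximate number of SNPs if one occurs every bp_per_snp bases."""
    return GENOME_SIZE_BP // bp_per_snp


def chip_fraction(chip_snps, total_snps=10_000_000):
    """Fraction of all SNPs that a chip genotypes directly."""
    return chip_snps / total_snps


# One SNP per 300 bp gives the ~10 million figure; one per 100 bp gives ~30 million
print(snp_count(300), snp_count(100))

# Chips with 0.5M-1M SNPs directly genotype only a small fraction of those SNPs
print(chip_fraction(500_000), chip_fraction(1_000_000))
```

In other words, a chip types only about 5-10% of the ~10 million SNPs directly; the point of the quoted answer is that tagSNPs in regions of high LD stand in for their untyped neighbors.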

What’s the Answer? Open Thread (Galaxy servers)

Question of the Week:

…I was wondering if there are any other known Galaxy servers out there …

The question actually goes on to list several Galaxy servers that are publicly available. There is a growing list in the answers too. Some of them, like the NBIC Galaxy Server (proteomics), have additional tools that you might not be able to find at the main Galaxy server.

What’s the Answer? Open Thread (gene networks)

Question of the week:

I have 10 genes linked to a particular disease (for the sake of example say cancer).
I want to build a gene network for these 10 genes.
Any web based tools available which can do the job?

I immediately thought of GeneMANIA (publicly available OpenHelix tutorial) and STRING (tutorial, by subscription). The best answer beat me to it, with a lot of other excellent tools. Go check it out.

What’s the Answer: Open Thread (NGS Tools)

Question of the week:

Now we are analysing NGS data, and I wonder if you know some collections of bioinformatics tools which can help me (like biopieces).

There were a few good answers suggesting lists of tools for preparing and analyzing next-generation sequencing data. Here’s one answer; click the link above for the rest:

(after some advice about mastering a scripting language and Unix commands…)

  1. learn the most used bioinformatics tools. e.g.:
    • one (or preferably many) short-read aligners
    • samtools
    • an NGS viewer (IGV is a good one to start)
    • Bedtools
    • A means to view and filter your NGS reads
    • and certainly many others depending on your specific focus.
  2. Then start to learn some of the common data repositories. e.g.:

Real bioinformaticians write code, real scientists…

Just over a week ago, Neil Saunders wrote a post I agreed with: Real bioinformaticians write code. The post was in response to a tweet conversation that started:

Many #biostar questions begin “I am looking for a resource..”. The answer is often that you need to code a solution using the data you have.

He’s right, and that’s very true for the bioinformaticists to whom he’s talking. My concern is for the rest of the biological researchers. He states in the post:

In other words: know the data sources, know the right tools and you can always sculpt a solution for your own situation.

This is very true and I wholeheartedly agree. So many solutions exist already in thousands of databases and analysis tools. It’s what we do here at OpenHelix: help experimental biologists, genomics researchers and bioinformaticists find the right data sources and tools and then go and “sculpt a solution for their situation.”

In the last part of my comment, I wrote:

BioMart, UCSC Genome Browser, Galaxy, etc, etc are excellent tools and data sources and could probably answer about 80% of most posed questions :). But my caveat would be that knowing the data sources and right tools can be a bit of a daunting task.

And it is, despite the somewhat dismissive response :). We’ve all seen the graphs: exponentially rising amounts of data over time. It’s an issue, as the Chronicle of Higher Education article title states:

Dumped on by Data: Scientists Say a Deluge is Drowning Research

The journal Science also had an entire 10-article section on the issue. It’s not a problem that will go away.

Along with that deluge of data has come a deluge of databases and data analysis tools (created for the most part by bioinformaticists!), many of which _alone_ make finding the right data and tool within them quite daunting. There are thousands of such databases and tools. I’ve lost count.

Neil Saunders is correct. The solution is out there: find the right tools and data, sculpt a solution. He responds to my comment with “Learning what you need to know in bioinformatics can certainly be daunting. But then, science isn’t for the easily daunted :-).” In other words, “if you are daunted, you aren’t a scientist?”

We give workshops to researchers around the world from Singapore to the US to Morocco and at institutions as varied as Harvard, Stanford, University of Missouri, Mt. Sinai, Stowers and Hudson-Alpha. The researchers we’ve given workshops to and answered questions from were also varied: developmental biologists, evolutionary biologists, medical researchers, bioinformaticists, researchers quite well versed in genomics and those not.

The overriding theme is that finding and knowing the data and the tools is not only daunting, but sometimes not possible. Not because they don’t exist, but because finding and knowing them is a drain on personal and lab resources, considering the sheer size of the growing field of things to find and know. I refer you to the Chronicle article… drowning in data…

They are real scientists not easily daunted, but daunted just the same, by what’s in front of them. And yes, many of those specific questions to specific research needs can be answered by existing tools. We come across many questions on Biostar that a well-crafted database search or analysis step will answer beautifully, without the need for reinventing the wheel with more code (and the answers are often code).

I suspect that most of those scientists out there who call themselves “bioinformaticists” should have a grasp of the tools and databases available to them (but I can tell you, even the brightest of them sometimes don’t). So, the advice and final words of the linked blog post above…

In other words: know the data sources, know the right tools and you can always sculpt a solution for your own situation…. real bioinformaticists write code

Yes, real bioinformaticists write code, but this advice is insufficient for the other 90% of real scientists who don’t. Perhaps Biostar is not the solution (I suspect a lot of the questions he points to are asked by non-bioinformaticists who have only a basic knowledge of coding, if any, and no access to those who do). Perhaps it, or something like it, can be.