Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting but that don’t make it into a blog post. Here they are for your enjoyment…

What’s the Answer? (openly hate R)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

Although I highlighted the original post a couple of weeks back, this bioinformatics nerd “Uses This” series at Biostars has continued to be really informative and sometimes amusing. There are so many now that I can’t excerpt enough of them to do the series justice; you should just go read them all. Not only is it an interesting cross-section of bioinformatics folks working on a bunch of different topics and species, there are also really good tips on software tools you might want to know about.

But I’ll extract this piece from today’s chat with Pablo because I used it in the click-baity title:

Forum: Pablo Cingolani of snpEff uses this

What do you use to create plots and charts?
I use R for stats, plots and charts. Although I openly hate R because I think it’s one of the least intuitive programming languages in the planet (followed closely by Malbolge and BrainF***)

Heh. But they aren’t all wonky tools, either. Some great tips on things like project management and even remote meeting software have come along:

Forum: Hadley Wickham of ggplot and RStudio uses this

What tools/software do not get enough recognition?
Here’s three that I love and not enough people know about:

  • Selectorgadget: if you ever do any web scraping, you will love the way it learns css/xpath selectors based on positive and negative examples.
  • iDoneThis: we use this at RStudio. It’s a great way to keep track of what you’ve achieved, and to see what your colleagues are working on.
  • appear.in: super simple video chat. No logins, just share a link, and the quality is way better than google hangouts.

Really interesting stuff. Go read the “Uses This” posts.

Video Tip of the Week: SeqMonk

Always on the lookout for effective visualization tools, I recently came across a series of videos about the SeqMonk software. It’s not software that I had used before, so I wanted to look at the videos and then try it out. It downloaded quickly, offered me an extensive list of genomes to load up, and right away I was kicking the tires. And I was impressed. It was easy to locate and explore different regions and the different tracks that were available. It also appears to be very straightforward to load up your own data. The video I’ll highlight here is called “Creating Custom Genomes with SeqMonk”, which gives a nice intro to their setup.

But they have a whole BabrahamBioinf channel with helpful videos, including a nice short one on how to export graphical representations to use for presentations and publications and such. This is a request I hear a lot from people, and this is a nice guide.

Then I went looking for references on the software to learn more. The group that developed it, Babraham Bioinformatics, apparently hasn’t published papers specifically on their tools. They are a services and support group for an institution rather than a research group, but they make many of their tools available to the public.

As I’ve noted before, though, I really like to get a sense of how people are using a tool, and who is using it, by looking deeply at the literature. When something has no official citation, that is harder to assess. And as I’ve pointed out, many papers don’t even cite the tools in the main text; sometimes the mention is only in figure legends or supplements.

A lot of folks have found SeqMonk useful, but it took searches at three different sites to figure out how useful. I searched PubMed, PubMedCentral, and Google Scholar, and the results were pretty interesting, actually. A basic search for SeqMonk yielded these differences:

Literature search site | Number of results
PubMed                 | 1
PubMedCentral          | 53
Google Scholar         | 110
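The PubMed and PubMedCentral numbers in the table above came from the sites’ search boxes. If you want to repeat that kind of count programmatically, NCBI’s E-utilities can do it. Here is a minimal Python sketch, assuming the public ESearch endpoint with `retmode=json` and `retmax=0` so that only the total count comes back; the helper names are our own, and any counts you get today will not match the table. Google Scholar has no comparable official API, so it is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term):
    """Build an ESearch request that asks only for the total hit count."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": 0}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_count(payload):
    """Pull the hit count out of an ESearch JSON reply (NCBI sends it as a string)."""
    return int(json.loads(payload)["esearchresult"]["count"])


def count_hits(db, term):
    """Run one live query against NCBI and return the number of matching records."""
    with urlopen(esearch_url(db, term), timeout=30) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

For example, `count_hits("pubmed", "SeqMonk")` and `count_hits("pmc", "SeqMonk")` each send one request. Part of the gap in the table is expected: PubMed searches citations and abstracts, while PMC searches full text, so it can catch mentions buried in methods, figure legends, and supplements. NCBI asks that scripts without an API key stay under about three requests per second.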

The paper in PubMed wasn’t in PubMedCentral, but it was among the 100+ in Google Scholar. Of the 53 in PMC, 2 were absent from Scholar: one had SeqMonk in a figure legend, and one had it in the supplemental procedures. Google Scholar obviously had the biggest range. It also included meeting abstracts, theses, and patent documents, as well as a few false positives (from 1840?, 1929, and a couple of other things I couldn’t figure out). Oddly, the titles sometimes differed between PMC and Scholar, although they appeared to be the same paper. As I’ve noted before, it’s challenging to find out where software is being used, since the way people reference it is so variable. This was another interesting example of that variability.
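Tallying which papers each site found and the others missed gets tedious by hand. Below is a minimal Python sketch of that bookkeeping. It assumes you have exported each site’s hits as a list of DOIs, which is a hypothetical workflow and not how the comparison above was done. Matching on DOIs rather than titles sidesteps the title differences mentioned above, although theses and meeting abstracts often have no DOI and would still need checking by eye. The DOIs in the usage line are placeholders, not the actual SeqMonk results.

```python
# Common ways a DOI gets written; all are stripped so the same paper matches.
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi):
    """Lower-case a DOI and strip common URL/label prefixes so copies match."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):].strip()
    return doi


def compare_sources(results):
    """Map each source name to the sorted DOIs that no other source returned."""
    normalized = {name: {normalize_doi(d) for d in dois}
                  for name, dois in results.items()}
    unique = {}
    for name, dois in normalized.items():
        # Everything the *other* sources found, pooled into one set.
        others = set().union(*(v for k, v in normalized.items() if k != name))
        unique[name] = sorted(dois - others)
    return unique
```

With placeholder input, `compare_sources({"PubMed": ["10.1000/a"], "PMC": ["10.1000/b"], "Scholar": ["https://doi.org/10.1000/A", "10.1000/c"]})` returns `{"PubMed": [], "PMC": ["10.1000/b"], "Scholar": ["10.1000/c"]}`: the PubMed paper is also in Scholar, while each of the other two sites found one paper nobody else did.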

But that aside, I was certainly impressed by the range of data types and species that SeqMonk has supported. The work included archaea, chloroplast genome studies, bacteria, ancient maize, yeast, medicinal mushroom mitochondria, zebrafish, and a lot of mammalian research. It has supported a wide range of explorations and topics: lots of epigenetics, PCR techniques, telomere erosion, methylomes of tumors, and even a comparison of sequence alignment software. Figure 1 of that aligners paper gives you a nice look at SeqMonk in the wild.

So have a look at the features of SeqMonk for visualization, analysis, and display of existing genomes or your own data. It’s a flexible and effective tool for many purposes.

Quick links:

SeqMonk: http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/

Their video channel: http://www.youtube.com/user/BabrahamBioinf

Their training materials: http://www.bioinformatics.babraham.ac.uk/training.html#seqmonk

Follow them on twitter: https://twitter.com/babraham_bioinf

References:

Chatterjee A., P. A. Stockwell, E. J. Rodger & I. M. Morison (2012). Comparison of alignment software for genome-wide bisulphite sequence data, Nucleic Acids Research, 40 (10) e79. DOI: http://dx.doi.org/10.1093/nar/gks150


What’s The Answer? (transmembrane protein dbs)


This week’s highlighted question drew less response than I expected. It’s a good question, and the answer would be of major interest to folks looking for druggable targets. So I figured, yeah, there must be a site that focuses on this. But I couldn’t pull one out of my memory banks, and I was hoping someone else would. Any thoughts?

Question: Are there any specialist transmembrane protein databases?

I am working almost exclusively with transmembrane proteins. Are there any databases that specialise in categorising transmembrane proteins? For example by membrane type, number of membrane spanning regions, number of non-polar helices, whether the protein is functional or structural, et cetera.

Good Gravy

Bring an answer over there if you know of one.
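We can’t point to a specialist database here. But for one narrow part of the question, counting membrane-spanning regions in your own sequences, a classic do-it-yourself approach is a Kyte-Doolittle hydropathy scan: average the hydropathy over a 19-residue window and flag stretches scoring above about 1.6, the window and cutoff Kyte and Doolittle suggested for spotting transmembrane helices. The Python below is a rough sketch under those assumptions (the function names are our own, and unknown residues simply score 0). Dedicated predictors such as TMHMM are far more reliable.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry i covers seq[i:i+window]."""
    seq = seq.upper()
    return [
        sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return 0-based, half-open (start, end) residue spans covered by
    consecutive windows whose mean hydropathy reaches the threshold."""
    profile = hydropathy_profile(seq, window)
    segments, start = [], None
    for i, score in enumerate(profile):
        if score >= threshold and start is None:
            start = i  # first hydrophobic window of a run
        elif score < threshold and start is not None:
            # the previous window (i - 1) was the last one in the run
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:  # run continues to the final window
        segments.append((start, len(profile) - 1 + window))
    return segments
```

`len(candidate_tm_segments(protein_sequence))` then gives a crude count of candidate helices. Because every overlapping window is scored, each reported span is somewhat wider than the hydrophobic stretch itself.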

Video Tip of the Week: MedGen, GTR, and ClinVar

The terrific folks at NCBI have been increasing their outreach with a series of webinars recently. I talked about one of them not too long ago and mentioned that when I found the full recording I’d highlight it. That recording is now available, and if you are interested in using these medical genetics resources, you should check it out.

I was reminded of this webinar by a detailed post over at the NCBI Insights blog: NCBI’s 3 Newest Medical Genetics Resources: GTR, MedGen & ClinVar. There’s no reason for me to repeat all of that, so I’ll conserve the electrons and direct you over there for more details about the features of these tools. There is a lot of information in these resources, and the webinar touches on their features and also describes the relationships and differences among them.

I’ve been catching notices of their webinars by following their Twitter announcements. The next one, on E-Utilities, is coming up on October 15th. Follow them to keep up with the new offerings: @NCBI.

Quick links:

MedGen: http://www.ncbi.nlm.nih.gov/medgen/

GTR, Genetic Testing Registry: http://www.ncbi.nlm.nih.gov/gtr/

ClinVar: http://www.ncbi.nlm.nih.gov/clinvar/

Reference:

Acland A., R. Agarwala, T. Barrett, J. Beck, D. A. Benson, C. Bollin, E. Bolton, S. H. Bryant, K. Canese, D. M. Church, K. Clark et al. (2013). Database resources of the National Center for Biotechnology Information, Nucleic Acids Research, 42 (D1) D7-D17. DOI: http://dx.doi.org/10.1093/nar/gkt1146


What’s The Answer? (what do bioinformatics folks use?)


This week’s highlighted item from Biostars is the first post in a new series. Inspired by the “Uses This” interviews from The Setup, which offer a quick look at what a variety of folks use to do their jobs, Istvan started asking bioinformatics professionals what tools they use for their work, along with a few bonus questions.

The first in the series was Jim Robinson of IGV, but a number of others have been added since then (you can follow them with the tag or see the list underneath the first one). Istvan also welcomes submissions from other folks who want to share what they are up to and how they get there.

Forum: Jim Robinson of the Integrative Genomics Viewer (IGV) uses this

Based on a user suggestion, we are launching a series of posts based on ideas promoted by the Uses This website.

How are the tools that we use every day being developed? What do bioinformaticians with proven track record use to get their work done?

I have sent out a few emails and I will start posting answers as they come in. Feel free to send me candidates (or volunteer) for the interviews.

[The list of questions]

What hardware do you use?

What is your text editor?

What software do you use for your work?

What do you use to create plots and charts?

What do you consider the best language to do bioinformatics with?

What bioinformatics tools/software do not get enough recognition?

[Go over to Biostars to read Jim's answers]

Istvan Albert

Interesting stuff. And more to come. Keep checking.

Video Tip of the Week: UCSC Ebola Genome Portal

Although I had other tips in my queue already, over the last week I’ve seen a lot of talk about the new Ebola virus portal from the UCSC Genome Browser team. And it struck me that researchers who have worked primarily on viral sequences may not be as familiar with the functions of the UCSC tools. So I wanted to do a tip with an overview for new folks who may not have used the browser much before.

There is great urgency in understanding the Ebola virus, examining different isolates, and developing possible interventions to help tackle this killer. Jim Kent learned of the CDC’s concerns from his sister, who edits the CDC’s “Morbidity and Mortality Weekly Report”, according to this story:

“It wasn’t until talking to Charlotte that I realized this one was special,” Jim Kent said. “It had broken out of the containments that had worked previously, and really, if a good response wasn’t made, the entire developing world was at risk.”

Jim Kent redirected his team of 30 genome analysts to devote all resources to the Ebola genome. They worked through the night for a week to develop a map that other scientists could use to determine where on the virus to target treatment.

So the folks at UCSC have created a portal where you can explore the sequence information and variations among different isolated strains, along with annotations about the features of the genes and proteins. They even added a track from the Immune Epitope Database (IEDB, which happened to be a video tip not long ago) marking where antibodies have been shown to bind Ebola protein sequences. The portal also provides links to publications and further research related to these efforts.

The reference sequence that forms the framework for the browser is a sample from Sierra Leone (http://www.ncbi.nlm.nih.gov/nuccore/KM034562.1). It was isolated from a patient this past May, and I don’t see a publication attached to it; the submission is from the Broad’s Viral Hemorrhagic Fever Consortium. You can read more details, including thanks to the Pardis Sabeti lab for the sequence, in the announcement email. So, as we keep seeing, we need access to the data long before publications become available. The work happens in the databases now; we can’t wait for traditional publishing.
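If you want that same reference sequence in your own analysis rather than in the browser, NCBI’s EFetch utility can return the GenBank record as FASTA. A minimal Python sketch, assuming the public EFetch endpoint with `db=nuccore`, `rettype=fasta` and `retmode=text`; the function names are our own, and the small parser only reads the first record.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an EFetch request for one nucleotide record in FASTA format."""
    params = {"db": "nuccore", "id": accession,
              "rettype": "fasta", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def parse_first_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not look like FASTA")
    seq_parts = []
    for line in lines[1:]:
        if line.startswith(">"):  # stop at the next record
            break
        seq_parts.append(line)
    return lines[0][1:], "".join(seq_parts)


def fetch_reference(accession="KM034562.1"):
    """Download the record from NCBI (one network request) and parse it."""
    with urlopen(efetch_fasta_url(accession), timeout=60) as resp:
        return parse_first_fasta(resp.read().decode("utf-8"))
```

Calling `header, seq = fetch_reference()` should give the Sierra Leone isolate’s description line and its genome sequence, roughly 19 kb long, ready for whatever comparison you have in mind.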

As a side note, I also just learned that the NLM (National Library of Medicine) has a disaster response function, and because of the need they now have a special Ebola section: Ebola Outbreak 2014: Information Resources. And for more of Jim Kent’s thoughts on Ebola, check out the blog that the UCSC folks have just started: 2014 Ebola Epidemic.

The goal of this tip was to provide an overview of the layout and features for folks who might be new to the UCSC software ecosystem. If you already know how to use it, it won’t be new to you. But if you are interested in getting the most out of the UCSC tools, you can also explore our longer training videos. UCSC has sponsored us to provide free online training materials on the existing tools, and this portal is based on the same underlying software. So you can go further, including using the Table Browser for queries beyond just browsing, if you learn the basics that we cover in the longer suites.

Quick links:

Ebola virus portal at UCSC: http://www.genome.ucsc.edu/ebolaPortal/

UCSC browser intro training: http://www.openhelix.com/ucscintro

UCSC advanced training: http://www.openhelix.com/ucscadv

Reference:

Karolchik D., G. P. Barber, J. Casper, H. Clawson, M. S. Cline, M. Diekhans, T. R. Dreszer, P. A. Fujita, L. Guruvadoo, M. Haeussler, R. A. Harte et al. (2013). The UCSC Genome Browser database: 2014 update, Nucleic Acids Research, 42 (D1) D764-D770. DOI: http://dx.doi.org/10.1093/nar/gkt1168