Friday SNPpets

Welcome to our Friday feature link collection: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

 

What’s the Answer? (vintage bioinformatics)

Biostars is a site for asking, answering and discussing bioinformatics questions and issues. We are members of the Biostars community and find it very useful. Often questions and answers arise at Biostars that are germane to our readers (end users of genomics resources). Every Thursday we will be highlighting one of those items or discussions here in this thread. You can ask questions in this thread, or you can always join in at Biostars.

This week’s highlighted post is actually a BLAST from the past. Although it doesn’t directly say so, it came along around the same time as the “Old Bioinformaticians” conversation. It’s the kind of thing bioinformatics nerds reminisce about, the way your grandparents talk about walking to school, in the snow, uphill, both ways–you know? And yeah, I contributed.

But it seems that Pierre Lindenbaum turned this into a curation effort to capture some of this history. I think that’s a nice idea. And people will want these kinds of things for talks and papers sometimes, and possibly for teaching the youngsters. So if you have some of these early bioinformatics artifacts, please contribute them over there.

Forum: Vintage / unconventional pictures for Bioinformatics

I’m looking for Vintage / unconventional pictures for Bioinformatics.

Feel free to add an URL to the picture below. If you’re the owner of the picture, tell me if you allow me to upload the picture on wikipedia commons.

Please, don’t upvote my answers.

See also: Bioinformatic Cartoon

PS:  e.g: do you have a picture of a printed version of the GCG manual somewhere ?

–Pierre Lindenbaum

Go dust off your items, and share some photos.

Video Tip of the Week: PaleoBioDB, for your paleobiology searches

Yeah, I know, it’s not genomics–but it’s the history of life on this planet–right?  The Paleobiology Database has been keeping records of this ancient biology for a while now, and they have some really nice tools to explore the fossil records and resources that have become available. It’s also interesting to me to see the informatics needs of this type of project. It has a lot of overlap with databases of more recent biology, like the GOLD one–they need taxonomy for the organisms, they need literature links–but they have other needs to capture both geographical regions and the layers of time as well.

There are a couple of ways to access the data. When you arrive at the main landing page, you have the choice to “Launch PBDB”, or “Launch Navigator”. PBDB is a “classic” interface, with typical search boxes and query results. Since this is the internet, I used that “quick search” and looked for paleo cats, and found a lot of Felis in there. But that’s not the only way to look around. They have a newer graphical access mechanism that’s called the Navigator. You can use the navigator to search the world, filter for specific items or time periods–but my favorite thing is you can reset the planet to be what it looked like eons ago. This is covered in their intro video that is this week’s Tip of the Week:

They have other videos as well, covering both this Navigator interface and help with the classic style. Their “apps” offer other types of searches too. You can even search for insect size. Another way to access information is via R. I began to look around at this because David Bapst on Google+ pointed to their new publication announcement (linked below), offering their R package for accessing their underlying data.
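For readers who would rather script a query than click through the interfaces, here is a minimal sketch of building a request for fossil occurrences. It is in Python rather than R and does not use the paleobioDB package from the reference below. The endpoint path (data service version 1.2) and the parameter names `base_name`, `interval` and `show` are my assumptions about the Paleobiology Database web data service, so check them against the current documentation before relying on them. The helper name `occurrences_url` is made up for this illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint of the PBDB web data service (version 1.2); verify before use.
PBDB_OCCURRENCES = "https://paleobiodb.org/data1.2/occs/list.json"


def occurrences_url(base_name, interval=None):
    """Build a URL asking for occurrences of a taxon and its subtaxa.

    base_name: taxon name, e.g. "Felis" (assumed parameter name).
    interval:  optional geological interval, e.g. "Miocene" (assumed parameter name).
    """
    params = {"base_name": base_name}
    if interval:
        params["interval"] = interval
    # Ask for coordinates as well, so the results could be put on a map.
    params["show"] = "coords"
    return PBDB_OCCURRENCES + "?" + urlencode(params)


# Example: the "paleo cats" search from the quick-search box, as a URL.
print(occurrences_url("Felis"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return JSON if the assumptions above hold.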

According to their publications page, this resource supports a wide range (and copious amount) of research in this field. It was really neat to have a look at a rather different scale of bioinformatics across the time horizon. Check out the Paleobiology Database resources for your fossil needs.

Quick link:

Paleobiology DB: http://paleobiodb.org/

Reference:
Varela S., González-Hernández J., Sgarbi L., Marshall C., Uhen M., Peters S. & McClennen M. (2014). paleobioDB: an R package for downloading, visualizing and processing data from the Paleobiology Database, Ecography, DOI: 10.1111/ecog.01154

What’s the Answer? (openly hate R)

Although I highlighted the original post a couple of weeks back, this bioinformatics nerd “Uses This” series at Biostars has continued to be really informative and sometimes amusing. There are so many now that I can’t even extract them to give a fair look; you should just go read them all. Not only is it an interesting cross-section of bioinformatics folks on a bunch of different topics and species, there are really good tips on software tools you might want to know about.

But I’ll extract this piece from today’s chat with Pablo because I used it in the click-baity title:

Forum: Pablo Cingolani of snpEff uses this

What do you use to create plots and charts?
I use R for stats, plots and charts. Although I openly hate R because I think it’s one of the least intuitive programming languages in the planet (followed closely by Malbolge and BrainF***)

Heh. But they aren’t all wonky tools either–some great tips on tools like project management or even remote meeting software have come along:

Forum: Hadley Wickham of ggplot and RStudio uses this

What tools/software do not get enough recognition?
Here’s three that I love and not enough people know about:

  • Selectorgadget: if you ever do any web scraping, you will love the way it learns css/xpath selectors based on positive and negative examples.
  • iDoneThis: we use this at RStudio. It’s a great way to keep track of what you’ve achieved, and to see what your colleagues are working on.
  • appear.in: super simple video chat. No logins, just share a link, and the quality is way better than google hangouts.

Really interesting stuff. Go read “Uses This” posts.

Video Tip of the Week: SeqMonk

Always on the lookout for effective visualization tools, I recently came across a series of videos about the SeqMonk software. It’s not software that I had used before, so I wanted to look at the videos, and then try it out. It downloaded quickly, offered me an extensive list of genomes to load up, and then right away I was kicking the tires. And I was impressed. It was easy to locate and explore different regions and the different tracks that were available. And it appears to be very straightforward to load up your own data as well. The video I’ll highlight here is called “Creating Custom Genomes with SeqMonk” which gives a nice intro to their setup.

But they have a whole BabrahamBioinf channel with helpful videos, including a nice short one on how to export graphical representations to use for presentations and publications and such. This is a request I hear a lot from people, and this is a nice guide.

Then I went to look for references for the software to learn more. The group that has developed it–Babraham Bioinformatics–hasn’t published papers specifically on their tools, apparently. They are a services and support group for an institution and not a research group. But they make many of their tools available to the public.

As I’ve noted, though, I really like to get a sense of how people are using the tools, and who is using tools, by looking deeply at the literature. When something has no official citation, it’s harder to assess. And as I’ve pointed out, many papers don’t even cite the tools in the main paper; sometimes the mention is in figure legends or supplements.

A lot of folks have found SeqMonk useful. But it took me 3 different site searches to figure out how useful. I searched at PubMed, PubMedCentral, and Google Scholar. The results were pretty interesting, actually. Just a basic search for SeqMonk yields these differences:

Literature search site    Number of results
PubMed                    1
PubMedCentral             53
Google Scholar            110

The paper in PubMed wasn’t in PubMedCentral, but it was among the 100+ in Google Scholar. Of the 53 in PMC, 2 were absent from Scholar–one had SeqMonk in a figure legend, one had SeqMonk in supplemental procedures. Google Scholar obviously had the biggest range–it also included meeting abstracts, theses, and patent documents, and also a few false positives (from 1840?, 1929, and a couple of other things I couldn’t figure out). Oddly, sometimes the titles differed between PMC and Scholar, but they appeared to be the same paper.  As I’ve noted before, it’s challenging to find out where software is being used, since the way people reference it can be so variable. This was another interesting example of this variability.
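I ran these searches by hand in the web interfaces, but the PubMed and PubMedCentral counts could also be pulled with NCBI's E-utilities. What follows is a minimal sketch, not the method used for the numbers above: it calls the ESearch endpoint with `retmax=0` so that only the hit count comes back, and reads the `count` field from the JSON reply. The function names are made up for this illustration, Google Scholar is left out because it has no comparable official API, and the counts returned today will not match the table.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term):
    """Build an ESearch URL that returns the hit count (no IDs) as JSON."""
    params = {"db": db, "term": term, "retmax": 0, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def parse_count(payload):
    """Pull the hit count out of an ESearch JSON response body."""
    return int(json.loads(payload)["esearchresult"]["count"])


def fetch_count(db, term):
    """Run the search over the network and return the number of hits."""
    with urlopen(esearch_url(db, term), timeout=30) as response:
        return parse_count(response.read().decode("utf-8"))


# Usage (needs network access; "pubmed" and "pmc" are the ESearch database names):
#   for db in ("pubmed", "pmc"):
#       print(db, fetch_count(db, "SeqMonk"))
```

A count like this only tells you how many records match. It will not show whether the hit was in the main text, a figure legend, or a supplement, which is the part that took the manual digging.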

But that aside, I was certainly impressed by the various types of data and species that SeqMonk has supported. The variety of species included archaea, chloroplast genome studies, bacteria, ancient maize, yeast, medicinal mushroom mitochondria, zebrafish, and a lot of mammalian research. It has supported a wide range of explorations and topics–lots of epigenetics, PCR techniques, telomere erosion, methylomes of tumors, and even comparison of sequence alignment software. Figure 1 of that aligners paper gives you a nice look at SeqMonk in the wild.

So have a look at the features of SeqMonk for visualization, analysis, and display of existing genomes or your own data. It’s a flexible and effective tool for many purposes.

Quick links:

SeqMonk: http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/

Their video channel: http://www.youtube.com/user/BabrahamBioinf

Their training materials: http://www.bioinformatics.babraham.ac.uk/training.html#seqmonk

Follow them on twitter: https://twitter.com/babraham_bioinf

References:

Chatterjee A., P. A. Stockwell, E. J. Rodger & I. M. Morison (2012). Comparison of alignment software for genome-wide bisulphite sequence data, Nucleic Acids Research, 40 (10) e79-e79. DOI: http://dx.doi.org/10.1093/nar/gks150

What’s The Answer? (transmembrane protein dbs)

This week’s highlighted question drew less response than I expected. It’s a good question, and would be of major interest for folks looking for druggable targets. So I figured–yeah, there must be a site that focuses on this. But I couldn’t pull one out of my memory banks. I was hoping someone else would. Any thoughts?

Question: Are there any specialist transmembrane protein databases?

I am working almost exclusively with transmembrane proteins. Are there any databases that specialise in categorising transmembrane proteins. For example by membrane type, number of membrane spanning regions, number of non-polar helices, whether the protein is functional or structural, et cetera.

–Good Gravy

Bring an answer over there if you know of one.

Video Tip of the Week: MedGen, GTR, and ClinVar

The terrific folks at NCBI have been increasing their outreach with a series of webinars recently. I talked about one of them not too long ago, and I mentioned that when I found the whole webinar I’d highlight that. This recording is now available, and if you are interested in using these medical genetics resources, you should check this out.

I was reminded of this webinar by a detailed post over at the NCBI Insights blog: NCBI’s 3 Newest Medical Genetics Resources: GTR, MedGen & ClinVar. There’s no reason for me to repeat all of that–I’ll conserve the electrons and direct you over there for more details about the features of these various tools. There is a lot of information in these resources, and the webinar touches on these features and also describes the relationships and differences among them.

I’ve been catching the notices of their webinars by following their Twitter announcements. The next one, on E-Utilities, is coming up on October 15th and is announced here. Follow them to keep up with the new offerings: @NCBI.

Quick links:

MedGen: http://www.ncbi.nlm.nih.gov/medgen/

GTR, Genetic Testing Registry: http://www.ncbi.nlm.nih.gov/gtr/

ClinVar: http://www.ncbi.nlm.nih.gov/clinvar/

Reference:

Acland A., R. Agarwala, T. Barrett, J. Beck, D. A. Benson, C. Bollin, E. Bolton, S. H. Bryant, K. Canese, D. M. Church, K. Clark et al. (2013). Database resources of the National Center for Biotechnology Information, Nucleic Acids Research, 42 (D1) D7-D17. DOI: http://dx.doi.org/10.1093/nar/gkt1146