﻿<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The OpenHelix Blog &#187; software testing</title>
	<atom:link href="http://blog.openhelix.eu/?tag=software-testing&#038;feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://blog.openhelix.eu</link>
	<description>at OpenHelix</description>
	<lastBuildDate>Fri, 24 May 2013 13:06:19 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Sage Bioinformatics Advice, But&#8230;</title>
		<link>http://blog.openhelix.eu/?p=7912</link>
		<comments>http://blog.openhelix.eu/?p=7912#comments</comments>
		<pubDate>Wed, 27 Apr 2011 16:43:10 +0000</pubDate>
		<dc:creator>Jennifer</dc:creator>
				<category><![CDATA[General Science]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[protein function]]></category>
		<category><![CDATA[software testing]]></category>
		<category><![CDATA[tutorials]]></category>
		<category><![CDATA[variation]]></category>

		<guid isPermaLink="false">http://blog.openhelix.eu/?p=7912</guid>
		<description><![CDATA[Bioinformatics analysis is a powerful technique applicable to a wide variety of fields, and the subject of many a blog post here at OpenHelix. I&#8217;ve had two particular bioinformatics articles on my desk for a couple of months now, waiting for me to be able to articulate my thoughts on them. They both offer great [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://blog.openhelix.eu/wp-content/uploads/2011/04/sage_advice_3.jpg"><img class="alignleft size-medium wp-image-8181" title="sage_advice_3" src="http://blog.openhelix.eu/wp-content/uploads/2011/04/sage_advice_3-300x217.jpg" alt="" width="300" height="217" /></a>Bioinformatics analysis is a powerful technique applicable to a wide variety of fields, and the subject of many a blog post here at OpenHelix. I&#8217;ve had two particular bioinformatics articles on my desk for a couple of months now, waiting for me to be able to articulate my thoughts on them. They both offer great information about their particular area of interest &#8211; predicting either SNV impacts or protein identities &#8211; and sage bioinformatics advice.</p>
<p>The first article &#8220;<a href="http://bioinformatics.oxfordjournals.org/content/27/4/441" target="_blank">Using bioinformatics to predict the functional impact of SNVs</a>&#8221; is a great review of bioinformatics techniques for picking out functionally important single nucleotide variants (SNVs, also sometimes variously referred to as SNPs or Small, Simple or Single Nucleotide Polymorphisms) from the millions of candidate variants being identified every day. In the introduction the authors do a great job of explaining the many ways in which SNVs can have an impact, as well as how these basic philosophies of impact can be used for bioinformatics analyses. The paper then goes on to describe both classic and bioinformatics techniques for predicting the impact of such variations. It is a phenomenal read for the list of resources alone, with many valuable and important algorithms and resources mentioned.  We&#8217;ve got tutorials (<a href="http://www.openhelix.eu//cgi/tutorialInfo.cgi?id=116" target="_blank">ENCODE</a>, <a href="http://www.openhelix.com/OMIM" target="_blank">OMIM</a>, <a href="http://www.openhelix.eu//cgi/tutorialInfo.cgi?id=27" target="_blank">the UCSC Genome Browser</a>, <a href="http://www.openhelix.eu//cgi/tutorialInfo.cgi?id=77" target="_blank">UniProtKB</a>, <a href="http://www.openhelix.eu//cgi/tutorialInfo.cgi?id=122" target="_blank">Blosum and PAM</a>, <a href="http://www.openhelix.eu//cgi/tutorialInfo.cgi?id=47">HGMD</a>,  <a href="http://www.openhelix.eu//cgi/tutorialInfo.cgi?id=37">JASPAR</a>, <a href="http://www.openhelix.eu//cgi/tutorialInfo.cgi?id=54" target="_blank">Principal Components Analysis</a>, <a href="http://www.openhelix.eu//cgi/tutorialInfo.cgi?id=111" target="_blank">relative entropy</a>, <a href="http://www.openhelix.eu//cgi/tutorialInfo.cgi?id=45" target="_blank">SIFT score</a>, <a href="http://www.openhelix.eu//cgi/tutorialInfo.cgi?id=81">TRANSFAC</a>) and blog posts (the <a href="../?p=670">Catalog of Published 
Genome-Wide  Association Studies</a>) describing many of the same resources. In fact, this paper inspired at least one of our weekly posted tips (<a href="http://blog.openhelix.eu/?p=6658">Tip of the Week: SKIPPY predicting variants w/ splicing affects</a>). The paper then goes on to a &#8220;BUYER BEWARE&#8221; section that offers some sage advice &#8211; know the weaknesses and assumptions of the resources you use for your predictions.</p>
<p>The second article is an open access article from BioTechniques entitled &#8220;<a href="http://www.biotechniques.com/news/biotechniquesNews/biotechniques-312015.html" target="_blank">Mistaken identities in proteomics</a>&#8221;. It offers a romp through the history of mass spectrometry (MS) technology and rising standards for documenting techniques used for protein identification in journals. The article also concludes with sage bioinformatics advice, including this quote:</p>
<blockquote><p>Proteomic researchers should be able to answer key questions, according to Giddings. “What are you actually getting out of a search engine?” she says. “When can you believe it? When do you need to validate?”</p></blockquote>
<p>Both papers suggest that researchers who wish to use bioinformatics resources in their research should investigate the theoretical underpinnings and assumptions of each tool before deciding on a tool to use, and then should go at every analysis with a level of disbelief in the tool results. That just sounds like common sense, and makes good theoretical advice.</p>
<p>HOWEVER, the level of investigation that is required to truly know each tool and algorithm is prohibitively huge. As for me, my &#8220;practical&#8221; suggestion for researchers is a bit of a &#8220;filtering shortcut&#8221;. Before diving into all the publications on all possible tools, just spend a few minutes with some documentation &#8211; the resource&#8217;s FAQ, or an <a href="http://www.openhelix.eu//cgi/tutorials.cgi" target="_blank">intro tutorial</a> &#8211; we&#8217;ve got a few we can offer you <img src='http://blog.openhelix.eu/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  &#8211; to get an idea of what the tool is about &amp; what you might be able to get from it. Once you&#8217;ve got a general idea of how to approach the resource, begin &#8220;banging&#8221; on it lightly. An initial kick-the-tires test of an algorithm, database, or other resource can be as easy as keeping a &#8220;test set&#8221; on hand at all times &amp; running it through any new tool you want to use. Make sure that the set includes a partial list of some very well known proteins/pathways/SNPs/etc. (whatever you work on &amp; will be interested in analyzing) and that it has some of your field&#8217;s &#8216;flukes&#8217;. Think about what you expect to get back from your set. Then run your test set through any new tool you are considering using in your research, and look at your results &#8211; are they what you know they should be? Can the tool handle the flukes, or does it break? As an example, when I approach a new protein interaction resource, I&#8217;ll use a partial parts list for some aspect of the yeast cell cycle, and include one or two of the hyphenated gene names. If the tool is good, I get a completed list with no bogging on the &#8220;weird&#8221; names. If it bogs, I know the resource may not be 100% worked out for yeast &amp; may have issues with other species as well. 
If the full list of interactors comes back with a bunch of space-junk proteins, I begin investigating what data is included in the resource and whether settings can be tweaked to get better answers. Then, if things still look promising with the tool, I am likely to dig deep into the literature, etc. for the tool &#8211; just to be sure &#8211; because the authors of these articles are absolutely right: chasing false leads is expensive, frustrating &amp; time-consuming. It is amazing how many lemons &amp; jalopies you can weed out with a 5-minute bioinformatics tire kick! <img src='http://blog.openhelix.eu/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
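<p>For readers who script their analyses, the tire kick described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: <code>tire_kick</code>, <code>toy_lookup</code>, the gene names, and the expected answers are all made-up placeholders, and <code>toy_lookup</code> stands in for whatever call wraps the real tool you are evaluating.</p>

```python
def tire_kick(lookup, test_set):
    """Run every query in a known test set through a tool and report surprises.

    `lookup` is any callable wrapping the tool under evaluation (hypothetical).
    `test_set` maps each query to the set of answers you expect to get back.
    Returns a dict of query -- problem description; empty means no surprises.
    """
    report = {}
    for query, expected in test_set.items():
        try:
            found = set(lookup(query))
        except Exception as error:  # the tool "bogged" on this input
            report[query] = "error: " + type(error).__name__
            continue
        missing = expected - found
        if missing:
            report[query] = "missing: " + ", ".join(sorted(missing))
    return report


def toy_lookup(name):
    """Stand-in for a real resource: a toy lookup that chokes on hyphens."""
    if "-" in name:
        raise ValueError("cannot parse " + name)
    toy_data = {"CDC28": ["CLN2", "CLB2"]}  # placeholder answers, not curated data
    return toy_data.get(name, [])


# A tiny test set: one well-known case plus one "fluke" (a hyphenated name).
test_set = {
    "CDC28": {"CLN2", "CLB2"},
    "FAKE-1": {"FAKE2"},  # made-up hyphenated placeholder
}

print(tire_kick(toy_lookup, test_set))
```

<p>With the toy lookup above, the well-known query passes quietly and the hyphenated placeholder is reported as an error, which is the kind of early warning the test set is meant to give. Swapping in a real resource only requires replacing <code>toy_lookup</code>.</p>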
<p>I also don&#8217;t think the responsibility should solely be on the back of each end user &#8211; the resource developer does have some responsibility for making their tool rigorous and for accurately representing its capabilities in publications and documentation. Calls for open source code can help improve some bioinformatics tools, so can education &amp; outreach &#8211; but that discussion will have to wait for another day&#8230;</p>
<p><span style="text-decoration: underline;"><strong>References</strong>:</span></p>
<ul>
<li><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Bioinformatics&amp;rft_id=info%3Adoi%2F10.1093%2Fbioinformatics%2Fbtq695&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Using+bioinformatics+to+predict+the+functional+impact+of+SNVs&amp;rft.issn=1367-4803&amp;rft.date=2010&amp;rft.volume=27&amp;rft.issue=4&amp;rft.spage=441&amp;rft.epage=448&amp;rft.artnum=http%3A%2F%2Fwww.bioinformatics.oxfordjournals.org%2Fcgi%2Fdoi%2F10.1093%2Fbioinformatics%2Fbtq695&amp;rft.au=Cline%2C+M.&amp;rft.au=Karchin%2C+R.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CBioinformatics">Cline, M., &amp; Karchin, R. (2010). Using bioinformatics to predict the functional impact of SNVs <span style="font-style: italic;">Bioinformatics, 27</span> (4), 441-448 DOI: <a rev="review" href="http://dx.doi.org/10.1093/bioinformatics/btq695">10.1093/bioinformatics/btq695</a></span></li>
<li><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=BioTechniques&amp;rft_id=info%3Aother%2Fhttp%3A%2F%2Fwww.biotechniques.com%2Fnews%2FbiotechniquesNews%2Fbiotechniques-312015.html&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Mistaken+identities+in+proteomics&amp;rft.issn=&amp;rft.date=2011&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Fwww.biotechniques.com%2Fnews%2FbiotechniquesNews%2Fbiotechniques-312015.html&amp;rft.au=Vincent+Shen&amp;rfe_dat=bpr3.included=1;bpr3.tags=Biology%2CBioinformatics">Vincent Shen (2011). Mistaken identities in proteomics <span style="font-style: italic;">BioTechniques</span> Other: <a rev="review" href="http://www.biotechniques.com/news/biotechniquesNews/biotechniques-312015.html">http://www.biotechniques.com/news/biotechniquesNews/biotechniques-312015.html</a></span></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.openhelix.eu/?feed=rss2&#038;p=7912</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Software testing in bioinformatics</title>
		<link>http://blog.openhelix.eu/?p=33</link>
		<comments>http://blog.openhelix.eu/?p=33#comments</comments>
		<pubDate>Thu, 03 Jan 2008 15:55:31 +0000</pubDate>
		<dc:creator>Mary</dc:creator>
				<category><![CDATA[General Science]]></category>
		<category><![CDATA[Genomics Research]]></category>
		<category><![CDATA[MGI]]></category>
		<category><![CDATA[mouse]]></category>
		<category><![CDATA[software testing]]></category>

		<guid isPermaLink="false">http://www.openhelix.com/blog/?p=33</guid>
		<description><![CDATA[This post at Bioinformatics Zen (Why data testing is important in computational research) got me thinking about the software testing I have done in the past for various databases. I don&#8217;t actually write code but I have worked closely with programmers in various situations. Bringing the knowledge of the biology to the software development team [...]]]></description>
				<content:encoded><![CDATA[<p>This post at <a href="http://www.bioinformaticszen.com/" target="_blank"><font color="#808080"><strong>Bioinformatics Zen</strong></font></a> (<a href="http://www.bioinformaticszen.com/2007/12/why-data-testing-is-important-in-computational-research/" rel="bookmark" title="Permanent link to 'Why data testing is important in computational research'">Why data testing is important in computational research</a>) got me thinking about the software testing I have done in the past for various databases.  I don&#8217;t actually write code but I have worked closely with programmers in various situations.  Bringing the knowledge of the biology to the software development team has been really fun in some cases&#8211;trying to explain <strong><font color="#808080"><em>why </em></font></strong>the data should or shouldn&#8217;t be represented a certain way challenges your own understanding of the data.</p>
<p>I&#8217;m not going to share all of my secrets of the <strong><font color="#808080">software testing wizardry</font></strong> I have done <em>(I think you should hire me to test <img src='http://blog.openhelix.eu/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  it is one of my favorite things to do)</em>, but here&#8217;s one that I have used more than once:</p>
<p><span id="more-33"></span><strong><font color="#808080">Giant lists of stuff </font></strong>are great for finding odd characters and constructions that break software.  One of my favorite sources is here: <a href="ftp://ftp.informatics.jax.org/pub/reports/index.html" target="_blank">MGI Data and Statistical Reports</a>.   The <a href="http://www.informatics.jax.org/" target="_blank">Mouse Genome Informatics</a> team has been building databases for decades and has a wide range of data types available.</p>
<p>Let&#8217;s say, for example, your database has genes in it.  Not that hard to imagine in bioinformatics tools, of course.  But what happens when you search for the nonagouti gene?  What is the symbol for nonagouti?  <a href="http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=markerDetail&amp;key=3" target="_blank">a</a>.  Just <font color="#000000"><strong>a</strong></font>.  Sounds simple.  But it can be remarkably hard to find!</p>
<p>That&#8217;s just one example of the things I think about when testing software.  I also have had to be sure that superscripts can be represented correctly (those knockout mouse strains have rather tricky official nomenclature).   Check out the <a href="ftp://ftp.informatics.jax.org/pub/reports/MGI_PhenotypicAllele.rpt" target="_blank">phenotypic alleles list</a>.  Can your software deal with the dashes, the slashes, the parentheses, the superscripts, and the length of terms like these?</p>
<p><a href="http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=alleleDetail&amp;key=37006" target="_blank"><font class="enhance">A630077B13Rik<sup>Gt(IRESBetageo)653Lex</sup></font></a></p>
<p><a href="http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=alleleDetail&amp;key=6879" target="_blank"><font class="enhance">Aanat<sup>C57BL/6J</sup></font></a></p>
<p><a href="http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=alleleDetail&amp;key=54999" target="_blank"><font class="enhance">Runx2<sup>Tn(pb-ZG-s)1.1Mrc</sup></font></a></p>
<p><a href="http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=alleleDetail&amp;key=3684" target="_blank"><font class="enhance">p<sup>un+4J</sup></font></a></p>
<p>Another favorite trick of mine is to use the huge lists as input to try to break the software.   Or huge genes.  Or huge exons.  Or teeny ones.  I keep a collection of biological oddities in my back pocket for testing situations.  I figure if the software can handle the extreme cases, it can probably handle the average stuff as well.  But handling the average situation does not mean it can handle the uncommon things.</p>
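<p>To make the idea of a pocket collection of oddities concrete, here is a small Python sketch that feeds the allele symbols shown above, plus the one-letter nonagouti symbol, through a check and lists the ones that fail. The checks are stand-ins chosen for illustration (a URL encode/decode round trip using the standard library's <code>urllib.parse</code>), not anything MGI or a particular database does, and storing each symbol as a (base, superscript) pair is likewise an assumption.</p>

```python
from urllib.parse import quote, unquote, unquote_plus

# Symbols from the examples above, stored as (base, superscript) pairs.
# The pair layout is an assumption for this sketch, not MGI's own format.
ODD_ALLELES = [
    ("A630077B13Rik", "Gt(IRESBetageo)653Lex"),
    ("Aanat", "C57BL/6J"),
    ("Runx2", "Tn(pb-ZG-s)1.1Mrc"),
    ("p", "un+4J"),
    ("a", ""),  # nonagouti: the whole symbol is a single letter
]


def survives_round_trip(text):
    """True if the text is unchanged after URL encoding and decoding."""
    return unquote(quote(text, safe="")) == text


def naive_round_trip(text):
    """Deliberately fragile: leaves '+' unescaped, then decodes it as a space."""
    return unquote_plus(quote(text, safe="+")) == text


def find_failures(alleles, check=survives_round_trip):
    """Return the pairs for which `check` fails on either part of the symbol."""
    return [pair for pair in alleles if not all(check(part) for part in pair)]


print(find_failures(ODD_ALLELES))                    # careful handling
print(find_failures(ODD_ALLELES, naive_round_trip))  # fragile handling
```

<p>The careful check reports no failures, while the fragile one trips on the plus sign in the last allele's superscript. In practice you would replace the check with a call into the software being tested, such as a search, an import, or a display step.</p>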
<p>Happy testing!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.openhelix.eu/?feed=rss2&#038;p=33</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
