Category: Guest Posts

Guest Post: SNAP — Andrew Johnson

22 June, 2010 (14:01) | Genomics Research, Genomics Resource News, Guest Posts, New Resource | By: Trey

This next post in our continuing semi-regular Guest Post series is from Andrew Johnson, one of the developers and the concept designer of SNAP, SNP Annotation and Proxy Search which is hosted at the Broad Institute. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com or the contact form (write ‘guest post’ as subject heading). We welcome introductions to your resource, information on updates, highlights of little known gems or opinion pieces on the state of genomic research and databases.

SNAP (http://www.broadinstitute.org/mpg/snap/, Johnson et al. (2008) Bioinformatics 24(24): 2938), “SNP Annotation and Proxy search”, is a flexible, web-based tool that allows anyone in the world to quickly accomplish a range of SNP-related genetics and bioinformatics tasks. This post highlights some common questions and features of SNAP, some more obscure uses, and recent and planned developments.

How did SNAP come about?

The idea for SNAP was originally sparked by GWAS analysts within a large collaborative group (the Framingham Heart Study SHARe project). This was in the pre-imputation era, when GWAS investigators from different groups using different SNP arrays often wanted to find the best proxy SNPs based on HapMap for comparison when they didn’t have common genotyped SNPs across groups. We initially implemented local programs to look up HapMap LD and also consider the presence of query and proxy SNPs on different commercial genotyping arrays. We quickly realized this was a community-wide problem as we received requests from outside collaborators, so we decided it was worth developing a public tool and approached investigators at the Broad Institute. Through collaboration with Paul de Bakker, Bob Handsaker and others at the Broad Institute we were able to add more features like plotting and build a nice, quick and accessible interface. Many people have contributed ideas, testing and improvements to SNAP, and Bob Handsaker and Pei Lin in particular continue to maintain and update SNAP.

What do you use SNAP for the most?

The two most widely used features of SNAP are 1) SNP LD queries, and 2) plotting of LD and association data. There are a number of flexible options for these functions. Beyond these, as a SNP bioinformatics specialist, I often use SNAP to rapidly retrieve information about a list of SNPs for other uses (see specialized queries below).
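As a rough illustration of the proxy lookup idea described above, the following Python sketch selects, for a query SNP, the SNPs whose r² with the query meets a threshold and that are also present on a given genotyping array. The SNP identifiers, r² values and array contents are invented for the example, and the function name `find_proxies` is hypothetical. This is not SNAP's code or interface, only a sketch of the concept.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical pairwise LD table: (query SNP, other SNP) -> r^2.
# All identifiers and values are invented for illustration.
LD_TABLE: Dict[Tuple[str, str], float] = {
    ("rsA", "rsB"): 0.95,
    ("rsA", "rsC"): 0.82,
    ("rsA", "rsD"): 0.40,
    ("rsE", "rsB"): 0.10,
}


def find_proxies(
    query: str,
    ld_table: Dict[Tuple[str, str], float],
    array_snps: Set[str],
    r2_threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return (proxy, r^2) pairs for `query`, best proxy first.

    A SNP qualifies if its r^2 with the query is at least
    `r2_threshold` and it is present on the target array.
    """
    proxies = [
        (other, r2)
        for (snp, other), r2 in ld_table.items()
        if snp == query and r2 >= r2_threshold and other in array_snps
    ]
    return sorted(proxies, key=lambda pair: pair[1], reverse=True)


# Hypothetical array content: rsC is in strong LD with rsA but is not
# on this array, and rsD is on the array but in weak LD.
ARRAY_SNPS = {"rsB", "rsD"}
print(find_proxies("rsA", LD_TABLE, ARRAY_SNPS))  # [('rsB', 0.95)]
```

Keeping the LD threshold and the array filter as separate conditions mirrors the two concerns mentioned above: how well a proxy tags the query SNP, and whether the other group actually genotyped it.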

What are some commonly asked questions from users of SNAP?

Click to continue reading “Guest Post: SNAP — Andrew Johnson”

Guest Post: WAVe – Pedro Lopes

25 May, 2010 (00:02) | Genomics Resource News, Guest Posts, New Resource | By: Guest

This next post in our continuing semi-regular Guest Post series is from Pedro Lopes, developer of WAVe at the University of Aveiro Bioinformatics Group in Aveiro, Portugal. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com or the contact form (write ‘guest post’ as subject heading). We welcome introductions to your resource, information on updates, highlights of little known gems or opinion pieces on the state of genomic research and databases.

I would like to start by thanking Trey Lathe for the opportunity to promote WAVe on this great blog. After his short tip of the week post, I’ll now try to give a more detailed overview of this new application.

What is WAVe?

WAVe stands for Web Analysis of the Variome and is a simple application focused on centralizing access to distributed and heterogeneous locus-specific databases (LSDBs). LSDBs are an emerging type of bioinformatics application, aiming at providing gene-centric information regarding discovered genomic variants. In WAVe, we offer access both to LSDBs and to their variants. Moreover, we also provide access to a comprehensive list of carefully selected external resources. With this, users have, in a single application, access to gene and variation information enriched with a multitude of gene-related resources in a lightweight and easy to use web application.

What are WAVe’s key features?

At this early stage, WAVe’s publicly available features are related to data access. Users can easily browse through available genes, search for genes, view gene info and access each gene’s RSS feed. On WAVe’s entry page, users simply need to start typing a gene’s HGNC-approved symbol and several suggestions will appear: accepting one of them leads directly to the gene view page. Following the view all link, users can browse all available genes or check, for each gene, how many LSDBs and variants are available.

To access the application data, users just need to navigate the gene tree. Each tree node represents a distinct data type and the various leaves provide access to external applications: by clicking a leaf, the destination page is loaded in the main content area. Repeating this process, users can navigate through the dozens of listed links for each gene.

WAVe also offers its core data to other developers. To obtain the gene tree and its links, users just need to add the rss tag to the end of the gene address. This will output an RSS 2.0 feed that can be easily parsed by any application or added to a feed reader.
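To give an idea of how little code such a feed needs on the consumer side, here is a minimal Python sketch that extracts the item titles and links from an RSS 2.0 document using only the standard library. The sample feed, its URLs and the function name `parse_feed` are invented for illustration; the actual structure and content of a WAVe gene feed may differ.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Invented RSS 2.0 sample standing in for a gene feed; the real
# feed's items and URLs are not known here.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>COL3A1</title>
    <link>http://example.org/gene/COL3A1</link>
    <description>Example gene feed</description>
    <item>
      <title>LSDB</title>
      <link>http://example.org/lsdb/COL3A1</link>
    </item>
    <item>
      <title>UniProt</title>
      <link>http://example.org/protein/P02461</link>
    </item>
  </channel>
</rss>
"""


def parse_feed(xml_text: str) -> List[Tuple[str, str]]:
    """Return (title, link) for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        return []
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in channel.findall("item")
    ]


for title, link in parse_feed(SAMPLE_FEED):
    print(title, link)
```

In practice the XML text would be fetched from the gene's feed address rather than taken from a string constant.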

How was WAVe born?

The European GEN2PHEN project is an initiative to link, as deeply as possible, data from genotype features to their phenotype counterparts. The first step consisted of an attempt to improve various genomic variation resource scenarios. This implied normalizing LSDBs (the “LSDB-in-a-box” approach, LOVD) and defining novel data models and formats for data exchanges from and to LSDBs.

In the long term, applying the GEN2PHEN-approved data models will enhance the creation of new services and applications to integrate and interact with the exponentially growing dataset of genomic variation data.

With WAVe we tried a different approach based on three questions: why wait for everyone to adopt these new formats? What will happen to legacy LSDBs that won’t adopt the new formats? How can we have an immediate solution? We have created a lightweight integration architecture, based on links to applications, and adopted a simple (yet familiar) tree-based navigation interaction to deploy a new application that can be used right now and will easily scale to integrate the foreseen data exchange formats. Technical details aside, based on a manually curated LSDB list, we can connect and integrate any kind of LSDB application, whether it is a modern LOVD application or a simple text-based legacy LSDB.

How is it relevant?

To demonstrate WAVe’s efficiency, let’s just try to perform a simple search in our lab: Are there any LSDBs for the COL3A1 gene in the human species? And known variants? And what are the associated proteins and pathways?

In a WAVe-free scenario, to find COL3A1 LSDBs (if any), researchers need to google it (the main COL3A1 LSDB does not appear in the first result page) or, if they are used to it, go to the HGVS site, go to the “Databases & Tools” section, select “Locus-specific Mutation Databases” and then search for the gene in the search box. Now for the variants, researchers just need to browse the last page they’ve just entered. How many clicks (and how much time!) does it take?

For protein information, researchers go to UniProt and search for COL3A1: that gives about 29 results. Add a filter for the human species and there are 5 results, good enough to go directly to P02461 (SwissProt reviewed). However, there is now a new window/tab open. Now for pathway information, a KEGG quick search for COL3A1 lists 14 results. In the end, researchers have about 3 windows/tabs open and have made some 20 mouse clicks to obtain the desired information.

Using WAVe, researchers simply need to access WAVe, start typing the gene’s HGNC symbol, select COL3A1 from the suggestions and open the COL3A1 page. Once on the page, it’s as easy as browsing the tree… Variations? Check the variation node, they’re even grouped according to the change type. UniProt information? Check the protein node where you have direct access to SwissProt, TrEMBL, PDB, Expasy and InterPro. And I guess you get the picture. In the end, one window/tab and about 6/7 mouse clicks.

Other UA.PT Bioinformatics tools

At the University of Aveiro’s Bioinformatics research group we are mainly young and enthusiastic computer science experts, simply trying to make biology easier (at least in terms of computer applications!). Our most relevant web-based tools include MIND (a microarray analysis tool), GeneBrowser (a gene expression tool, useful to process data gathered from systems like MIND) and QuExT (a comprehensive MEDLINE mining application).

-Pedro Lopes

Guest Post: New features at CTD – Allan Peter Davis

18 May, 2010 (00:27) | Guest Posts | By: Guest

This next post in our continuing semi-regular Guest Post series is from Allan Peter Davis, of the Comparative Toxicogenomics Database (CTD) at Mount Desert Island Biological Laboratory (MDIBL). If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.

The Comparative Toxicogenomics Database (CTD) is a free, public resource that promotes understanding about the effects of environmental chemicals on human health.  Since Trey’s original Tip of the Week about CTD, we’ve added many new features we’d like to highlight.

* The redesigned CTD homepage makes navigation easier and more intuitive.  Check out the keyword quick search box on every page, and try the “All” setting to see the scope of information available at CTD.

* A new Data Status page uses tag clouds to display the updated content for that month.

* We are particularly pleased to announce new statistical analyses of CTD data.  Chemical pages now feature enriched Gene Ontology (GO) terms, garnered from the genes that interact with a chemical.  In this release, CTD connects over 5,000 enriched GO terms to more than 4,500 chemicals.  In addition, our inferred chemical-disease relationships are now statistically scored and ranked.  Both new features will help users explore and generate testable hypotheses about the biological effects of chemicals.

* GeneComps and ChemComps discover genes or chemicals with a similar toxicogenomic profile to your molecule of interest.  Learn more about this feature in our recent publication.

* Reactome data are now also included with KEGG, for a more comprehensive view of pathways affected by chemicals.

* VennViewer and MyGeneVenn are new tools that compare datasets for chemicals, diseases, or genes (including your own gene list) using Venn diagrams to discover shared and unique information.  These two visualization tools are a nice accompaniment to our original Batch Query tool for meta-analysis.

* The FAQ section under the “Help” menu provides examples of how to maximize your experience with CTD.

* Download our Resource Guide (pdf link) to keep as a handy reference card for CTD.
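For readers unfamiliar with what "enriched GO terms" means in the list above, the sketch below shows one common way such enrichment is scored: a one-sided hypergeometric test asking how likely it is to see at least the observed number of term-annotated genes among the genes interacting with a chemical. The post does not say which statistic CTD uses, so the choice of test is an assumption, and the function name and all counts are invented for illustration.

```python
from math import comb


def hypergeom_enrichment_p(
    population: int, annotated: int, sample: int, hits: int
) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    population: total number of genes considered
    annotated:  genes in the population carrying the GO term
    sample:     genes interacting with the chemical
    hits:       interacting genes that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Invented example: 20,000 genes, 100 carry the term, the chemical
# interacts with 50 genes, and 5 of those carry the term.
p_value = hypergeom_enrichment_p(20000, 100, 50, 5)
print(f"p = {p_value:.3g}")
```

A small p-value suggests the term is over-represented among the chemical's interacting genes. When many terms are tested, as they would be here, a multiple-testing correction is normally applied on top of the raw p-values.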

From the homepage, you can also subscribe to our monthly email newsletter to keep current with CTD’s growing content and features.  You can always contact us to request curation of your favorite chemical or paper.  And with our new “Author Alert” email program, we’ll even contact you to let you know when we’ve curated data from one of your publications in CTD.

We strive to be the best possible resource of chemical-gene-disease networks for the biological community, so feedback and input from users are of great importance to us.

- Allan Peter Davis

Guest Post: CoGe, The Suite for Comparative Genomics – Eric Lyons

4 May, 2010 (05:01) | Genomics Resource News, Guest Posts | By: Guest

This next post in our continuing semi-regular Guest Post series is from Eric Lyons, of CoGe at the University of California, Berkeley. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.

Thanks both for the prior CoGe post (editor’s note: a tip of the week on CoGe) and the invitation to write a bit about CoGe.  Since most people are probably not familiar with CoGe, let me begin with how it is designed:

CoGe’s architecture and philosophy:  Solve a problem once

CoGe is a web-based platform for comparative genomics and consists of many interconnected web-based tools.  The entire system is hooked up to a database that can store any version of any genome in any state of assembly from any organism (currently ~9000 genomes from ~8000 organisms). Each of CoGe’s tools is designed to do one task (e.g. search and display information about a genome, compare two genomes and generate syntenic dotplots, search any number of genomes for similar sequence, manage a list of genes, etc.), and the tools are linked to one another. This means that there is no predefined analysis workflow. Instead, people can begin exploring a genome of interest, compare it to what they want, find something interesting, explore that, find something else, explore that, and so on. People anywhere in the world can perform computationally intense analyses by clicking a few buttons on a web page and letting our servers crunch away on whatever genomes we have currently loaded in our system.  Since each tool is web-based, links are used to move from tool to tool, which creates an easy way to save an analysis for future work or to send it to a colleague. This also has the benefit that as we develop new tools to solve a specific problem, we can generalize the solution, plug it into CoGe’s database and connect it to the pre-existing tool set. Overall, this gives us an easy way to expand CoGe’s functionality.

Click to continue reading “Guest Post: CoGe, The Suite for Comparative Genomics – Eric Lyons”

Guest Post: CHOP’s new tool, CNV Workshop – Xiaowu Gai

2 March, 2010 (00:01) | Genomics Resource News, Guest Posts, New Resource | By: Guest

This next post in our continuing semi-regular Guest Post series is from Xiaowu Gai, the Bioinformatics Core Director at CHOP. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.

Thanks to Mary for running a Tip of the Week – “CHOP CNV database” a couple of months back. The CHOP CNV database is a high-resolution genome-wide survey of copy number variations in a large number (2,026) of apparently healthy individuals. It is publicly accessible and has been widely used by a large number of research groups worldwide. I am now pleased to announce the public release of the software system behind it: CNV Workshop. CNV Workshop is a suite of software tools that we have developed over the last few years. It provides a comprehensive workflow for analyzing, managing, and visualizing genome copy number variation (CNV) data.

It can be used for almost any CNV research or clinical project by offering the following capabilities for both individual samples and cohort studies:

CNV identification
* Implements a modified circular binary segmentation algorithm that reduces false positives
* Fully configurable parameters for sensitivity/specificity management

Annotation
* Individual locus-specific annotations such as position, type of variation, call metrics, and overlap with CNVs of other data sets, including the Database of Genomic Variants
* Functional gene annotations such as genes affected and known disease associations
* Accepts user-provided annotations

Presentation
* GBrowse-enabled visuals for querying, browsing, interpreting, and reporting CNVs
* Export of results into Excel, XML, CSV, and BED files
* Direct links to public resources such as the UCSC Genome Browser, NCBI Entrez, Entrez Gene, and FABLE

Project and Account Management
* Authentication and permission scheme that is especially useful for clinical diagnostic settings
* Analysis result sharing within and between projects
* Simple Web-based administrative interface
* Remote access and administration enabled
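One of the annotation capabilities listed above is overlap with CNVs from other data sets. The following Python sketch shows one simple way such an overlap check can be expressed: a call is matched to a reference CNV when a minimum fraction of the call's length is covered by it. The `CNV` record, the function names, the 50% default and all coordinates are assumptions made for the example and are not taken from CNV Workshop itself.

```python
from typing import List, NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED-style, assumed here)
    end: int    # exclusive


def overlap_bp(a: CNV, b: CNV) -> int:
    """Number of bases shared by two intervals (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def overlapping_reference(
    call: CNV, reference: List[CNV], min_fraction: float = 0.5
) -> List[CNV]:
    """Reference CNVs covering at least `min_fraction` of the call."""
    length = call.end - call.start
    if length <= 0:
        return []
    return [
        ref for ref in reference
        if overlap_bp(call, ref) / length >= min_fraction
    ]


# Invented coordinates for illustration.
call = CNV("chr1", 1000, 2000)
reference = [
    CNV("chr1", 1500, 3000),  # covers 50% of the call
    CNV("chr1", 1900, 2500),  # covers 10% of the call
    CNV("chr2", 1000, 2000),  # different chromosome
]
print(overlapping_reference(call, reference))
```

A linear scan like this is fine for a handful of intervals; for genome-wide reference sets, sorting by position or using an interval tree would be the more usual choice.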

CNV Workshop currently accepts genotyping array data from Illumina’s 550k, 610- and 660-Quad, and Omni arrays, along with Affymetrix’s 5.0 and 6.0 arrays, and can be easily configured to accept data from other platforms. The package comes preloaded with publicly available reference data from more than 2,000 healthy control subjects (the CHOP CNV Database). CNV Workshop also allows the user to upload already processed CNV calls for annotation and presentation.

The software package is freely available at http://sourceforge.net/projects/cnv/. It is also described in more detail in our recent paper in BMC Bioinformatics.

-Xiaowu Gai

Guest Post: New at VISTA – Inna Dubchak

16 February, 2010 (07:00) | Genomics Resource News, Guest Posts | By: Guest

Our first guest post in our new semi-regular Guest Post series is from Inna Dubchak, principal investigator at the LBNL/JGI group, developers of the VISTA comparative genomics resource (who sponsor a tutorial, free to the users). If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com.

I would like to give you a heads up on some new VISTA updates and ongoing development!

Updates: As you probably know from this blog, a new, still free VISTA tutorial is available now. We have introduced a lot of updates to these tools - built new programs, improved the existing ones, and entirely changed the design of the site to make it more up-to-date and convenient.

The main addition to the site, VISTA Point, combines the capabilities of the three tools currently available at the site (VISTA Gateway, VISTA Browser, and Text Browser), which are usually used step by step. VISTA Point makes analyzing multiple and pairwise genome alignments and extracting relevant numerical data much more straightforward, and it is easy to update, expand and add new programs.

Soon: We are actively working on visualizing synteny at scales ranging from whole-genome alignment to the conservation of individual genes, with seamless navigation across different levels of resolution. In our upcoming VISTA-Dot tool we used the concept of two-dimensional “dot-plots”, historically employed in the analysis of local alignment, and an interactive Google-map-like interface to visualize whole-genome alignments. You will be able to display and analyze large-scale duplications in plants in one click! It can also be useful in genome assembly and finishing. Another addition coming in the near future, VISTA Synteny Viewer, presents a novel interface as three cross-navigable panels representing different scales of the alignment.
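For readers who have not met dot-plots before, the idea can be sketched in a few lines of Python: a dot is placed at position (i, j) whenever the k-mer starting at position i of one sequence equals the k-mer starting at position j of the other, so conserved stretches show up as diagonals and duplications as parallel diagonals. This is only a toy illustration of the general concept with invented sequences and hypothetical function names; it says nothing about how VISTA-Dot is implemented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def dotplot_points(seq_a: str, seq_b: str, k: int = 3) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where seq_a[i:i+k] == seq_b[j:j+k]."""
    # Index every k-mer of seq_b by its start positions.
    index: Dict[str, List[int]] = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    # Look up each k-mer of seq_a in that index.
    points = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            points.append((i, j))
    return points


def render(points: List[Tuple[int, int]], rows: int, cols: int) -> str:
    """Draw the points as a text grid, one row per seq_a position."""
    grid = [["." for _ in range(cols)] for _ in range(rows)]
    for i, j in points:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


# Invented toy sequences; seq_b contains the start of seq_a twice.
seq_a = "ACGTACGGA"
seq_b = "ACGTACTTACGTA"
k = 4
points = dotplot_points(seq_a, seq_b, k)
print(render(points, len(seq_a) - k + 1, len(seq_b) - k + 1))
```

Whole-genome tools work on alignments rather than exact k-mer matches and at vastly larger scales, but the picture they draw is read the same way.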

Attention: do not forget to use our whole-genome capabilities – Whole-genome VISTA to align sequence of any quality, from draft to finished, up to 10MB long, and Whole Genome rVISTA to evaluate which transcription factor binding sites (TFBS) are over-represented in upstream regions in a group of genes.

-Inna Dubchak

Coming up, Guest Posts

12 February, 2010 (11:35) | Guest Posts | By: Trey

Greetings! OpenHelix Blog is instituting a new semi-weekly feature. Every Wednesday we have our “Tip of the Week,” on Thursdays we have our “What’s Your Problem,” and now on occasional Tuesdays we are going to have our “Provider Guest Post.” These will be posts from providers of genomics tools and databases and will be opinions, updates and upcoming features of the resource, whatever the provider of the resource would like to convey to users. We have several lined up for the coming weeks, so keep checking back.

Additionally, if you are a developer or provider of a free, publicly available genomics or biological resource, database or analysis tool and would like to post in our guest feature, be it an introduction to your tool, updates or upcoming features or even an opinion about the current state of genomics research and data, please write us at wlathe AT openhelix DOT com. We would love to put you in the queue for the next guest post.

Our first guest post next Tuesday will be from Inna Dubchak, principal investigator at the LBNL/JGI group, developers of the VISTA comparative genomics resource (who sponsor a tutorial, free to the users). She’ll discuss some new tools at VISTA and give you a quick preview of some new upcoming features.

Happy Memorial Day (and gardening) to you this weekend!

22 May, 2009 (06:35) | General Science, Guest Posts | By: Cyndy

Summer is rapidly approaching and I’m so looking forward to a nice long Memorial Day weekend with outdoor cookouts and plenty of time for gardening. Those of us New Englanders that have endured a long, hard winter really appreciate ending our hibernation and spending time outside in the spring and summer. Gardening is one of my favorite activities, and in this region we are strongly advised to wait until Memorial Day to do the majority of our planting. But after hearing that one of my colleagues had just come down with poison ivy, I began to wonder why these plants so often get in the way of enjoying our short season of outdoor life. Poison ivy, oak and sumac have always been a very annoying part of growing up in New England. They are plants that I never had too many fond thoughts of. Yet, I never really knew much at all about them – other than the itchy, irritating red rash they cause – that is. I decided to do a little digging, reasoning that they must have some redeeming, or at least interesting, biological qualities. After all, it seems that they are only protecting themselves against all of us herbivores. They can’t exactly run away from us, so they have to keep us at bay somehow. Their defense mechanism seems quite clever actually.

A quick check in Wikipedia revealed that poison ivy is a member of the Anacardiaceae family of flowering plants. To my surprise cashew and pistachio plants are also members of this same family. Apparently not all members of this plant family are skin irritants at least! The reaction you get from poison ivy is due to contact with urushiol, a very potent oil found in the sap. In fact, only about 1 nanogram is needed to cause a rash (as little as ¼ of an ounce is said to be necessary to cause a rash on every person on earth). The rash, or Toxicodendron dermatitis, is a result of the immune system’s delayed hypersensitivity response – i.e., the reaction may take hours or days to develop. Interestingly, about 20% of the population is not allergic to urushiol. They can wander through poison ivy indefinitely and have no problems (the genetic variations responsible for this trait are certain to be an interesting topic for future work in the genomics and immunology fields). Another surprising fact was that many animals don’t have any type of allergic reaction to urushiol. Deer, goats, horses and cattle are fine with these poisonous plants. In fact, one of the suggested ways to get rid of poison ivy is to get a goat. This seems to be another very interesting genetics of immunity issue – how and why do some animals manage to not only evade these plants, but thrive on them. As more complete genomes are resolved the genes, SNPs, or genetic variations in general, will be uncovered and we should all be enlightened.

Click to continue reading “Happy Memorial Day (and gardening) to you this weekend!”

Tugging at the public’s heart strings-and wallets!

12 January, 2009 (05:09) | General Science, Genomics News, Guest Posts | By: Cyndy

After the mad rush of the holiday season, I found a little time to catch up on reading, and stumbled across an astounding article in the NY Times: New Genetic Test Asks Which Sport a Child was Born to Play, by Juliet Macur. A company called ATLAS (Athletic Talent Laboratory Analysis System) is offering a genetic test for kids which they claim will determine if your child is likely to be a super-athlete.  Furthermore, they say it can tell you what type of a super-athlete your child may become, i.e., a sprinter or a weightlifter, so that you can begin preparations and training sessions for your child as early as one year old. This test is based on genotyping for an isoform of the muscle protein actinin, ACTN3. Actinins are an important family of actin-binding proteins.

Well, since I am a mom of a seven year old and a scientist that has spent more than fifteen years doing basic muscle research, this really caught my attention. My first thoughts were that this was morally appalling and scientifically impossible. All that I had learned and studied about muscle had led me to believe that muscles just don’t work that way. No doubt that the actinins are crucial to muscle function, but I thought that one muscle protein couldn’t do all of that. My off-the-cuff guess would be that at least 200 genes would contribute to such traits, and probably half would encode non-muscle proteins (genes related to drive, metabolism and many other factors would seem likely to be involved equally here). I wondered if I had missed some major development in the muscle field while enjoying motherhood,

Click to continue reading “Tugging at the public’s heart strings-and wallets!”

Summary of webinar “CNVs vs. SNPs: Understanding Human Structural Variation in Disease”

11 November, 2008 (08:10) | General Science, Genomics Research, Guest Posts | By: Cyndy

Do you still believe that monozygotic, or identical, twins are really genetically identical? Or that we are all 99.9% genetically similar to each other? Well I certainly did, and boy was I wrong! It turns out that CNVs (Copy Number Variations) are causing the “facts” some of us learned in Molecular Biology 101 to be rewritten. If you, like me, thought that what you learned years ago was still true, then there is a great webinar you may want to watch. It is brought to you by Science/AAAS, and it features three prominent experts in genetic variability, Drs. Charles Lee, Lars Feuk and Alexandra Blakemore. The moderator is Dr. Sean Sanders, who is the Commercial Editor of Science. Even those of you that are up to speed on the current research can find many interesting facts and learn about the new techniques used to study CNVs, or just genetic variability in general. It turns out that CNVs are much more prevalent than was previously thought. You hear so much about SNPs that it seems like they are the source of genetic variability that we should be most concerned about, but CNVs are catching up real fast. This new field is rapidly advancing because of major technology breakthroughs.

All of the panelists present a short talk highlighting the prevalence, importance and experimental limitations of studying CNVs and their role in normal human variability, as well as in disease. They present some of their own data and discuss the future direction of this young field. This is followed by a very interesting question and answer session where they allowed listeners to email their questions. It may even turn out that CNVs are the reason that your personality, IQ, height and weight differ from your colleagues, friends and family. So not only is this an exciting new field, but it is certainly one we can all relate to!

Click to continue reading “Summary of webinar “CNVs vs. SNPs: Understanding Human Structural Variation in Disease””