Category: Genomics Resource News

Friday SNPpets

23 July, 2010 (08:46) | Genomics Research, Genomics Resource News, SNPpets | By: Mary

Welcome to our Friday feature link dump: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Get Your Abstract In for Biocuration2010

28 June, 2010 (09:09) | General Science, Genomics News, Genomics Resource News | By: Jennifer

As I’ve posted before, the next Biocuration meeting will be in Tokyo, Oct 11-14th. I recently got notified that they are now accepting abstracts for the meeting:

We cordially invite you to join us for Biocuration2010, the Conference
of International Society for Biocuration (ISB) and the 4th International Biocuration
Conference, is held in Tokyo, Japan, from October 11-14, 2010.

Abstract submission and registration are now open!

* abstract submission
http://hinv.jp/biocuration2010/abstract.html
* registration
http://hinv.jp/biocuration2010/registration.html

Currently, abstracts are being accepted only from June 1 to July 14th, and the organizers say authors will be notified by August 14th whether their abstracts have been accepted for presentation.

OpenHelix’s abstract was selected for presentation & I was really happy with all the comments, feedback and interest that we received – both from the poster presentation and the 15 minute talk we were able to give. If you’ve got a resource that you’d like to present, or get feedback on, from this group I highly suggest that you submit an abstract now!

Guest Post: SNAP — Andrew Johnson

22 June, 2010 (14:01) | Genomics Research, Genomics Resource News, Guest Posts, New Resource | By: Trey

This next post in our continuing semi-regular Guest Post series is from Andrew Johnson, one of the developers and the concept designer of SNAP, SNP Annotation and Proxy Search which is hosted at the Broad Institute. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com or the contact form (write ‘guest post’ as subject heading). We welcome introductions to your resource, information on updates, highlights of little known gems or opinion pieces on the state of genomic research and databases.

SNAP (http://www.broadinstitute.org/mpg/snap/, Johnson et al. (2008) Bioinformatics 24(24): 2938), “SNP Annotation and Proxy search”, is a flexible, web-based tool that allows anyone in the world to quickly accomplish a range of SNP-related genetics and bioinformatics tasks. This post highlights some common questions and features of SNAP, some more obscure uses, and recent and planned developments.

How did SNAP come about?

The idea for SNAP was originally sparked by GWAS analysts within a large collaborative group (the Framingham Heart Study SHARe project). This was in the pre-imputation era, when GWAS investigators from different groups using different SNP arrays often wanted to find the best proxy SNPs based on HapMap for comparison when they didn’t have common genotyped SNPs across groups. We initially implemented local programs to look up HapMap LD and also consider the presence of query and proxy SNPs on different commercial genotyping arrays. We quickly realized this was a community-wide problem as we received requests from outside collaborators, so we decided it was worth developing a public tool and approached investigators at the Broad Institute. Through collaboration with Paul de Bakker, Bob Handsaker and others at the Broad Institute we were able to add more features like plotting and build a nice, quick and accessible interface. Many people have contributed ideas, testing and improvements to SNAP, and Bob Handsaker and Pei Lin in particular continue to maintain and update SNAP.

What do you use SNAP for the most?

The two most widely used features of SNAP are 1) SNP LD queries, and 2) plotting of LD and association data. There are a number of flexible options for these functions. Beyond these, as a SNP bioinformatics specialist, I often use SNAP to rapidly retrieve information about a list of SNPs for other uses (see specialized queries below).

What are some commonly asked questions from users of SNAP?

Click to continue reading “Guest Post: SNAP — Andrew Johnson”

Tip of the Week: BioCatalogue for finding web services

16 June, 2010 (08:45) | Genomics Research, Genomics Resource News, New Resource, Tip of the Week | By: Mary

A couple of years back at a conference I was introduced to BioCatalogue.  It seemed to me to be a really useful idea: locate bioinformatics tools and databases that are web-accessible, and that also have a mechanism to use the web service features to access the tool/server using strategies that don’t require the main web interface of the site.  There are some introductions to the concept of web services out there–some are genuinely introductory, but most are aimed at programmers.  Essentially it is kind of a back door into the tool, and lets you pull the information you need out in ways that you want–not constrained by the main user interface.

BioCatalogue is a curated collection of these web services.  The creators of BioCatalogue provide the framework and perform some of the collection and annotation–but they also enable the user community to bring in web services and annotate them as well.  This means that you can use BioCatalogue to find and learn more about the services, and you can feed back into the system as well if you join the community.  If you are a software provider you can register your service there–so more people can locate you and learn about your project.  Another really nice aspect of BioCatalogue is that they monitor the services.  As we know at OpenHelix–plenty of times a tool you have accessed in the past is suddenly unavailable.  Sometimes they are intermittent server problems, but sometimes they are longer-term issues.  BioCatalogue is regularly checking the status of the tools so you can have confidence that the tool has been up and seems stable.

The Web Server issue (see the 2009 issue here) of Nucleic Acids Research provides a wealth of information about useful servers with bioinformatics tools.  And there’s a paper for the 2010 Server issue about BioCatalogue that will offer more details on the background (linked below).  In this week’s movie I can only briefly introduce the site and the features available.  Check out the paper from the BioCatalogue team, and explore the documentation wiki to learn more about the features and functions that are provided.

Now, these web services are not for everyone.  For many people the main user interface will still be the best mechanism to access a tool. But if you need more advanced or customized queries, or if you want to create inflows into your own tools, or if you want to use some of the cool work flow software that’s out there now (such as Galaxy or Taverna)–web services may be right for you.

Check out BioCatalogue (and remember the -ue spelling!) http://www.biocatalogue.org/

Bhagat, J., Tanoh, F., Nzuobontane, E., Laurent, T., Orlowski, J., Roos, M., Wolstencroft, K., Aleksejevs, S., Stevens, R., Pettifer, S., Lopez, R., & Goble, C. (2010). BioCatalogue: a universal catalogue of web services for the life sciences. Nucleic Acids Research. DOI: 10.1093/nar/gkq394

Galaxy Developer Conference slides available

8 June, 2010 (08:38) | Genomics Resource News | By: Mary

Recently the Galaxy team hosted a developer conference.  As I was following the tweets I was very intrigued by the topics–and the attendees certainly seemed excited by the presentations.

If you want to see what’s going on out there with various installations of Galaxy and the applications that are being employed, check out the slide collection from the conference here:

http://bitbucket.org/galaxy/galaxy-central/wiki/DevConf2010

If you don’t know much about Galaxy yet, check out our introductory tutorial on it, sponsored by the Galaxy folks and freely available: http://www.openhelix.com/galaxy

Tip of the Week: Database of mouse databases

2 June, 2010 (09:15) | Genomics Resource News, New Resource, Tip of the Week | By: Mary


We are acutely aware of the thousands of bioinformatics resources out there, and we are often asked for guidance on finding a particular type of tool for some function or other.  There are some excellent lists out there which attempt to catalog the various tools–the NAR Database Issue and corresponding list, the Resource Collection at the Univ. of Pittsburgh, and others.  But recently we saw one developed with a specific focus, which claims to bring together over 200 resources for the mouse.  The Mouse Resource Browser collects and categorizes a number of different types of things–not just databases, as we’ll see.  Find them here: http://bioit.fleming.gr/mrb

The curated collection of sites that may be of use to mouse researchers has a number of features.  The developers used a questionnaire to elicit some information from the resource providers, and when they don’t have that input they have created some basic information for the records themselves. You can do a basic search for resources with a quick search box. There is an advanced search option.  I found the option of browsing by category (they have 22 categories) the most informative to figure out what kind of resources they had collected.

The data for a given record is organized across a series of tabs:

  • General: description, highlights and subject matter of the resource
  • Ontologies and Standards: if the resource relies on any of the important vocabularies or standards formats in the field, they are listed here
  • Technical: details of implementation, type of database, access methods, if there is a web services component, whether there are downloads or not
  • CASIMIR DDF: this is an interesting tab that assesses some of the features of the resources such as currency/updates, quality control process, versioning, technical documentation, user support, and more.

Although the focus is mouse, you’ll see some broader types of resources in there.  For example, UCSC Genome Browser is listed as there is a mouse database there.  Reactome is listed.  These have a species range and include mouse, but are certainly not focused on mice.  Other types of resources include commercial suppliers such as Charles River. So it isn’t limited just to things like sequence databases and things of that nature–it’s got more aspects that researchers employing mouse as a model system might find useful.

There are some choices they have made that I’m not sure I would have.  They list the MGI mailing list as a separate feature from MGI.  But as I thought more about it, I could see why.  There is good information there, and if you don’t know of it already a pointer might help.  But as I was thinking of the 200+ resources just for mouse, I thought that sort of affected the total.

If you use mouse as your model system, you will probably find some useful databases and other web sites that are handy for your work.  If you don’t work with mice, there are probably still some useful resources for your work as well.  Check out MRB’s site for more information: http://bioit.fleming.gr/mrb

Reference:
Zouberakis, M., Chandras, C., Swertz, M., Smedley, D., Gruenberger, M., Bard, J., Schughart, K., Rosenthal, N., Hancock, J., Schofield, P., Kollias, G., & Aidinis, V. (2010). Mouse Resource Browser–a database of mouse databases. Database, 2010. DOI: 10.1093/database/baq010

Guest Post: WAVe – Pedro Lopes

25 May, 2010 (00:02) | Genomics Resource News, Guest Posts, New Resource | By: Guest

This next post in our continuing semi-regular Guest Post series is from Pedro Lopes, developer of WAVe at the University of Aveiro Bioinformatics Group in Aveiro, Portugal. If you are a provider of a free, publicly available genomics tool, database or resource and would like to convey something to users on our guest post feature, please feel free to contact us at wlathe AT openhelix DOT com or the contact form (write ‘guest post’ as subject heading). We welcome introductions to your resource, information on updates, highlights of little known gems or opinion pieces on the state of genomic research and databases.

I would like to start by thanking Trey Lathe for the opportunity to promote WAVe on this great blog. After his short tip of the week post, I’ll now try to give a more detailed overview of this new application.

What is WAVe?

WAVe stands for Web Analysis of the Variome and is a simple application focused on centralizing access to distributed and heterogeneous locus-specific databases (LSDBs). LSDBs are an emerging type of bioinformatics application, aiming at providing gene-centric information regarding discovered genomic variants. In WAVe, we offer access both to LSDBs and to their variants. Moreover, we also provide access to a comprehensive list of carefully selected external resources. With this, users have, in a single application, access to gene and variation information enriched with a multitude of gene-related resources in a lightweight and easy to use web application.

What are WAVe’s key features?

At this early stage, WAVe’s publicly available features are related to data access. Users can easily browse through available genes, search for genes, view gene info and access each gene’s RSS feed. On WAVe’s entry page, users simply need to start typing a gene’s HGNC-approved symbol and several suggestions will appear: accepting one of them leads directly to the gene view page. Following the view all link, users can browse all available genes or check, for each gene, how many LSDBs and variants are available.

To access the application data, users just need to navigate the gene tree. Each tree node represents a distinct data type and the various leaves provide access to external applications: by clicking a leaf, the destination page is loaded in the main content area. Repeating this process, users can navigate through the dozens of listed links for each gene.

WAVe also offers its core data to other developers. To obtain the gene tree and its links, users just need to add the rss tag to the end of the gene address. This will output an RSS 2.0 feed that can be easily parsed by any application or added to a feed reader.
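
As a rough illustration of what "easily parsed by any application" could look like, here is a minimal Python sketch using only the standard library. The sample feed, its example.org URLs and the function name parse_gene_feed are all hypothetical: they assume a plain RSS 2.0 layout (one channel, several items with a title and a link) and are not copied from real WAVe output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed in plain RSS 2.0 layout; NOT real WAVe output.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>COL3A1</title>
    <link>http://example.org/gene/COL3A1</link>
    <description>Illustrative gene feed</description>
    <item>
      <title>LSDB</title>
      <link>http://example.org/lsdb/COL3A1</link>
    </item>
    <item>
      <title>UniProt</title>
      <link>http://example.org/protein/P02461</link>
    </item>
  </channel>
</rss>"""


def parse_gene_feed(xml_text):
    """Return (channel title, [(item title, item link), ...]) from RSS 2.0 text."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    items = [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in channel.findall("item")
    ]
    return channel.findtext("title", default=""), items


if __name__ == "__main__":
    gene, links = parse_gene_feed(SAMPLE_FEED)
    print(gene)
    for title, link in links:
        print(title, link)
```

In practice the XML text would come from fetching the gene address with the rss tag appended; the exact address format is not given here, so the sketch works on a string instead.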

How was WAVe born?

The European GEN2PHEN project is an initiative to link, as deeply as possible, data from genotype features to its phenotype counterparts. The first step consisted of an attempt to improve various genomic variation resource scenarios. This implied normalizing LSDBs (the “LSDB-in-a-box” approach, LOVD) and defining novel data models and formats for data exchange to and from LSDBs.

In the long term, applying the GEN2PHEN-approved data models will enhance the creation of new services and applications to integrate and interact with the exponentially growing dataset of genomic variation data.

With WAVe we tried a different approach based on three questions: why wait for everyone to adopt these new formats? What will happen to legacy LSDBs that won’t adopt the new formats? How can we have an immediate solution? We have created a lightweight integration architecture, based on links to applications, and adopted a simple (yet familiar) tree-based navigation interaction to deploy a new application that can be used right now and will easily scale to integrate the foreseen data exchange formats. Technical details aside, based on a manually curated LSDB list, we can connect and integrate any kind of LSDB application, whether it is a modern LOVD application or a simple text-based legacy LSDB.

How is it relevant?

To demonstrate WAVe’s efficiency, let’s try a simple search in our lab: Are there any LSDBs for the COL3A1 gene in the human species? Are there known variants? And what are the associated proteins and pathways?

In a WAVe-free scenario, to find COL3A1 LSDBs (if any), researchers need to google the gene (the main COL3A1 LSDB does not appear on the first results page) or, if they are used to it, go to the HGVS site, open the “Databases & Tools” section, select “Locus-specific Mutation Databases” and then search for the gene in the search box. For the variants, researchers just need to browse the page they have just reached. How many clicks (and how much time!) does it take?

For protein information, researchers go to UniProt and search for COL3A1: that gives about 29 results. Adding a filter for the human species leaves 5 results, good enough to go directly to P02461 (SwissProt reviewed). That means, though, that another window/tab is open. Now for pathway information, a KEGG quick search for COL3A1 lists 14 results. In the end, researchers have about 3 windows/tabs open and have made some 20 mouse clicks to obtain the desired information.

Using WAVe, researchers simply need to access WAVe, start typing the gene’s HGNC symbol, select COL3A1 from the suggestions and open the COL3A1 page. Once on the page, it’s as easy as browsing the tree… Variations? Check the variation node, they’re even grouped according to the change type. UniProt information? Check the protein node, where you have direct access to SwissProt, TrEMBL, PDB, Expasy and InterPro. And I guess you get the picture. In the end, one window/tab and about 6 or 7 mouse clicks.

Other UA.PT Bioinformatics tools

At the University of Aveiro’s Bioinformatics research group we are mainly young and enthusiastic computer science experts, simply trying to make biology easier (at least in terms of computer applications!). Our more relevant web-based tools include MIND (a microarray analysis tool), GeneBrowser (a gene expression tool, useful to process data gathered from systems like MIND) and QuExT (a comprehensive MEDLINE mining application).

-Pedro Lopes

Galaxy Developer conference tweets

17 May, 2010 (08:49) | Genomics Research, Genomics Resource News | By: Mary

Sorry, this was mostly over the weekend–but there’s still some more today.  But the Galaxy team had a conference that brought together a lot of people running local installations of Galaxy for various purposes.  You can see the agenda here:

First Galaxy Developer Conference

There are slides, notes, and twitter comments from attendees if you are interested.  The hash tag #gxy will bring up the links. There is still another session today so there may be more activity.

For those of you who don’t know why Galaxy is so cool, check out our introductory training on Galaxy, freely available because it is sponsored by the Galaxy project team.  That will give you the foundation.  What these developers are doing is taking that foundation and customizing it for local needs and the tools their users want to access, and generate workflows, and share analysis, etc.

Or go directly to Galaxy here: http://www.galaxyproject.org

Tip of the Week: Chromhome, for karyotype level comparative genomics

12 May, 2010 (09:15) | Genomics Research, Genomics Resource News, Tip of the Week | By: Mary

Usually when we think about comparative genomics data, we are thinking about genomes that are pretty well sequenced, and we want to look at that data with a variety of tools and algorithms.  But this past week we saw a question about less-well-sequenced genomes, and we thought it was an interesting inquiry.  The question was: is there a web site that displays comparative karyotype data?  So we went looking. And we found Chromhome.

Chromhome has a very straightforward interface.  You choose a target species.  You choose the probe species.  You click paint–and you get a look at the chromosome level homology.  When the painting was performed with actual probes and reported in the literature, that data is provided. At the time the paper was published this consisted of more than 100 data sets.

There is also the opportunity to see inferred painting as well.  I’ll let the Chromhome paper authors describe that strategy:

If species A and species B are mapped on species N, then it is possible to deduce some of the chromosomal arrangements of A on B or B on A with respect to the arrangements of N chromosomes. Many of the species in Chromhome have been mapped on human chromosomes using chromosome painting. It is therefore possible to infer homologies between two species each of which have been hybridized with human probes.
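
To make that inference idea concrete, here is a small Python sketch. It is not Chromhome's actual algorithm or data model: the function name infer_homologies and the toy chromosome names are invented for illustration, and it works only at whole-chromosome level, whereas real painting data describes segments. It simply pairs up chromosomes of A and B that were both mapped to the same chromosome of N, so the output should be read as candidate homologies only.

```python
from collections import defaultdict


def infer_homologies(a_on_n, b_on_n):
    """Pair chromosomes of species A and B that share a reference (N) chromosome.

    Each argument maps a chromosome name to the set of N chromosomes it was
    mapped to. The result maps each A chromosome to the set of B chromosomes
    that hit at least one of the same N chromosomes (candidates only).
    """
    # Index species B by reference chromosome for quick lookup.
    b_by_n = defaultdict(set)
    for b_chr, n_chrs in b_on_n.items():
        for n_chr in n_chrs:
            b_by_n[n_chr].add(b_chr)

    inferred = {}
    for a_chr, n_chrs in a_on_n.items():
        candidates = set()
        for n_chr in n_chrs:
            candidates |= b_by_n.get(n_chr, set())
        inferred[a_chr] = candidates
    return inferred


if __name__ == "__main__":
    # Toy, made-up mappings; not taken from Chromhome.
    species_a = {"A1": {"N1", "N2"}, "A2": {"N3"}}
    species_b = {"B1": {"N1"}, "B2": {"N2", "N3"}, "B3": {"N4"}}
    for a_chr, b_chrs in sorted(infer_homologies(species_a, species_b).items()):
        print(a_chr, sorted(b_chrs))
```

Sharing a reference chromosome does not guarantee that two chromosomes cover the same segment of it, which is one reason the authors say only "some of" the arrangements can be deduced this way.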

So if this type of comparative genomics may be of interest to you, check out Chromhome.

http://www.chromhome.org/

Reference:
Nagarajan, S., Rens, W., Stalker, J., Cox, T., & Ferguson-Smith, M. (2008). Chromhome: A rich internet application for accessing comparative chromosome homology maps. BMC Bioinformatics, 9 (1). DOI: 10.1186/1471-2105-9-168

I can haz outreach? Nobody speaks for the end users.

11 May, 2010 (08:23) | Genomics News, Genomics Research, Genomics Resource News | By: Mary

Recently there was much buzz in the #bioinformatics twittersphere over this blog post by Sean Eddy: The next five years of computational genomics at NHGRI

It is a very nice post about some exciting prospects for the future.  The idea of planning “explicitly for sustainable exponential growth” is wise.  There will be no abatement of the flow of data at this point–it’s no longer a big bolus of one species’ data, or one type of project.  The taps are wide open now, and we just keep adding more taps.

I also love the idea of “democratization”.  In part, it includes:

….To enable individual investigators to make effective use of large datasets, we must create an effective infrastructure of data, hardware, and software. NHGRI has extensive experience in big data, and can lead and catalyze across the NIH….

Now, I know this is a snippet of some thoughts–there may be more to it in the actual planning meetings on this.  But it pushed my buttons because it sounds a lot like what we always hear about big data projects: build it and they will come.

It got a little better in another segment:

Spur better software development. Traditional academia and funding mechanisms do not reward the development of robust, well-documented research software; at the same time, the history of commercial software viability in a narrow, rapidly-moving research area like computational genomics is not at all encouraging….

Well-documented research software.  Sigh.  We probably read more documentation than most people. And even the good documentation can be brutal. Dated. And not particularly effective. But still–if nothing else, please reward time spent on documentation….

But what is missing for me from this–and not just this, but most of these big data types of projects–is a real commitment to outreach and support for end users.  Formal, organized, supported, rewarded, outreach.  Sometimes there is a place to write to with questions.  But we probably send in more questions to projects than most people too–and the success rate for answers varies widely.  But even when we get good answers–that’s not enough.

I know funding is hard.  We can’t fund everything.  Databases and software projects have to struggle to even persist.  Curation is frequently not valued enough.  And often curators are expected to do outreach as just one of their tasks…which pushes outreach even further down the priority list.  But without dedicated outreach–formal, quality, active outreach–databases and software projects won’t have so many users, and not many effective users.   Which will make funding agencies wonder if they should keep supporting them.  Which…well, you can see where this spiral goes….

What bugs me, I guess, is essentially this: Nobody speaks for the end users. There’s really no one in these types of meetings that really speaks for the consumers of this software and this data.  I mean people who aren’t directly attached to the data production and management.   The project teams think they are thinking about the users.  They really want users.  But ur not doin’ it rite.

I would like to see outreach and end user support valued, required, and really done right.  No matter how much hardware and documentation you throw at these projects, if people 1) don’t know it exists, and 2) have no idea how to use it, the project will not yield all the results that it could. A marker paper is nice.  But it’s not sufficient, folks. And it’s nice to have the high-end team members talk at conferences. But that reaches only a tiny subset of the users or potential users.  And another thing about that: a lot of times people are hesitant to ask what sound like naive questions to the high-end representatives of these projects.  I’m jes’ sayin.

Yes, this is fairly self-serving for me to say.  But we see the users when we do outreach.  They crave it.  They love it.  We’ve been lucky to be a part of some great projects that do outreach right.  We have seen it work.  It should be Standard Operating Procedure on software and database projects.  Not an afterthought.