Tag: ensembl

Tip of the Week: 1000 Genomes Project Browser

21 July, 2010 (08:27) | Genomics Research, New Resource, Tip of the Week | By: Mary


You may have been hearing about the 1000 Genomes project. It’s one of the ongoing “big data” projects that is going to yield a great deal of variation information about the human genome. The goal is to sequence well over 1000 genomes to identify “most genetic variants that have frequencies of at least 1% in the populations studied”. They are doing this by sequencing large numbers of samples with 4x coverage. You can read more about their strategy on the About page of their web site. It also lists the anticipated sample populations.

In this week’s Tip of the Week I’m going to take a quick spin through their browser. (You can also download all the data, but I’ll be focusing on the browser.) They have begun to release data now, and there are 6 individual sequences available at this time. These are part of their “pilot” studies. You can get some details on the pilot from their About page, which links to this PDF about the samples.

They are using the Ensembl framework to display their data. So if you are familiar with using Ensembl you’ll have some facility moving around this browser. One thing that isn’t apparent right away from the site is that you can click the Resembl link on the display to turn on a track that puts the read/coverage data on the viewer. I also liked the alignment display of all 6 genomes, but I’m sure that’s going to get challenging to view later with more and more genomes.

In an exchange with their very helpful help desk yesterday, I got this quick summary of the samples you’ll see:

For the high coverage populations, NA12891, NA12892 and NA12878 are the CEU trio; NA19238, NA19239 and NA19240 are the YRI trio. Both are father, mother, child respectively, and both children were daughters.
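If you plan to work with the downloaded data rather than the browser, it can help to keep that sample summary in a small lookup. Here is a minimal Python sketch of the idea. The `Trio` type and the `describe` helper are my own illustrative names, not anything from the 1000 Genomes project, and the sketch only records what the summary above says: the first two IDs in each trio are the parents and the third is the daughter.

```python
from typing import NamedTuple, Tuple


class Trio(NamedTuple):
    """One family trio (hypothetical helper type, for illustration only)."""
    population: str
    parents: Tuple[str, str]
    child: str


# Sample IDs as listed in the help desk summary above.
TRIOS = [
    Trio("CEU", ("NA12891", "NA12892"), "NA12878"),
    Trio("YRI", ("NA19238", "NA19239"), "NA19240"),
]


def describe(sample_id: str) -> str:
    """Return a short description of where a sample sits in the two trios."""
    for trio in TRIOS:
        if sample_id == trio.child:
            return f"{sample_id}: {trio.population} trio, daughter"
        if sample_id in trio.parents:
            return f"{sample_id}: {trio.population} trio, parent"
    return f"{sample_id}: not one of the six samples listed"


if __name__ == "__main__":
    for sample in ("NA12878", "NA19238", "NA00000"):
        print(describe(sample))
```

A lookup like this is only a convenience for labeling tracks or files. As more genomes are released you would want to pull the sample list from the project itself rather than hard-code it.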

If you have questions about their data, be sure to go ask them for help–they were very speedy with answers for me :) .

Some of the project data has also been picked up by UCSC, and you can access the same sequences in the UCSC Genome Browser in the Genome Variants track on the March 2006 human assembly. (You’ll also see Venter, Watson, and some other individual genomes there.)

Quick links:

The Project: http://www.1000genomes.org/

The Browser: http://browser.1000genomes.org/

An article in Science with some background:  A Plan to Capture Human Diversity in 1000 Genomes

Friday SNPpets

28 May, 2010 (00:29) | SNPpets | By: Trey

Welcome to our Friday feature link dump: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

Tip of the Week: WAVe, Web Analysis of the Variome

5 May, 2010 (00:14) | Tip of the Week | By: Trey

Today’s Tip of the Week is a short introduction to WAVe, or the Web Analysis of the Variome. The tool was recently introduced to us, and I’ve found it a welcome addition to the tools available to researchers for analyzing human variation. This is apropos considering the recent paper we’ve been discussing on the clinical assessment of a personal genome (here, here and here) and that paper’s implications for personalized medicine and the use of online variation resources. WAVe has also introduced me to some additional tools that I either wasn’t aware of or haven’t used, and which might be of use, such as LOVD (Leiden Open Variation Database), QuExT (Query Expansion Tool, from the same developers as WAVe), and others. Of course there is also database information pulled in from Ensembl, Reactome, KEGG, InterPro, PDB, UniProt, NCBI and many others. Take some time to check it out.

Choosing a genome browser for your organism…

26 April, 2010 (15:26) | Genomics Research, Genomics Resource News | By: Mary

There are a number of genome browsers out there–we’ve covered that a number of times.  And there are always new ones coming along.  With the onslaught of sequence data we’re about to get from high-throughput sequencing, more and more research groups, communities, and individuals are going to need to choose a genome browser to use to display their data.

One time I stumbled across the survey results for a group that was choosing a new platform to display their community’s data: MaizeGDB. I wrote about it then because I thought it was interesting, and because I know people are facing this pretty regularly now. We get asked. But since that time they have progressed, implemented a browser, and written up their experience. It’s now been published in Database.

It’s a pretty straightforward paper. They describe their needs and their assessment of the resources their community had and used. They surveyed likely users to see what they wanted, and how they felt about the pieces that already existed. One piece they specifically noted: when asked, many users did not say they used Ensembl, but the Ensembl software was the foundation of one of the items they did say they used. MaizeGDB writes:

This result shows that users may not be aware of the underlying browser software that the various web sites use.

Ah, yeah.  Here’s another thing this shows: database end users are definitely not thinking about browser software the same way database developers are.  And I do not mean end users are stupid.  They just do not think about this stuff the way software providers think they do.  We keep trying to tell providers this.  It’s not always well received.

So anyway, they move on to assess the candidates for their new implementation. They focus on Ensembl, GBrowse, Map Viewer, UCSC Genome Browser, and xGDB. They describe the framework, possibilities, and limitations of each for their purposes. I think this is a nice look at the various options that lots of people considering the issue should find useful. They also acknowledge that other browsers have since come along, or may still come along in the future, that could be considered, but at the time these were the focus.

They go on to describe their implementation experience. They seem pleased with it. And they highlight one of their favorite pieces, a Locus Lookup tool, that they have added as well. It sounds like it’s serving their community really nicely.

This is a highly useful paper for the people in the market for genome browsers.  It’s not for everyone, for sure.  Well, at least not yet.  But your day is coming. You’ll need a browser eventually….

You can check out their GBrowse implementation at: http://gbrowse.maizegdb.org/

And if you are interested you can see our free GBrowse training suite here: http://www.openhelix.com/gbrowse

References:
Sen, T., Harper, L., Schaeffer, M., Andorf, C., Seigfried, T., Campbell, D., & Lawrence, C. (2010). Choosing a genome browser for a Model Organism Database: surveying the Maize community. Database, 2010. DOI: 10.1093/database/baq007

Andorf, C., Lawrence, C., Harper, L., Schaeffer, M., Campbell, D., & Sen, T. (2010). The Locus Lookup tool at MaizeGDB: identification of genomic regions in maize by integrating sequence information with physical and genetic maps. Bioinformatics, 26 (3), 434-436. DOI: 10.1093/bioinformatics/btp556

EDIT: added links to a couple of older blog posts, should have had them in before….

New and Updated Online Tutorials for Ensembl Legacy and Overview of Genome Browsers

26 April, 2010 (11:50) | OpenHelix News | By: Trey

Comprehensive tutorials on the publicly available Ensembl and an overview of genome browsers enable researchers to quickly and effectively use these invaluable resources.

Seattle, WA (PRWEB) April 26, 2010 — OpenHelix today announced the availability of a new tutorial on Ensembl, and an updated tutorial suite on the Overview of Genome Browsers.

Ensembl is a genome browser to visualize and analyze the genomes of human and many other species. Though Ensembl recently updated the browser software, many species-specific genome browsers still use the older versions of the browser. OpenHelix has a tutorial on the latest version, and has now created a new tutorial, Ensembl Legacy, to acquaint researchers with the older versions they might encounter. Overview of Genome Browsers is an updated tutorial that introduces researchers to some of the more popular genome browsers including Ensembl, Map Viewer, UCSC Genome Browser, the Integrated Microbial Genomes (IMG) browser and the GBrowse software. These two tutorials, in conjunction with larger, in-depth OpenHelix tutorials on UCSC Genome and Table Browsers, GBrowse, IMG, IMG/M, Ensembl, MapViewer and others, will give you a set of training resources to help you be efficient and effective at accessing and analyzing genome data.

The tutorial suites, available through an annual OpenHelix subscription, contain an online, narrated, multimedia tutorial, which runs in just about any browser connected to the web, along with slides with full script, handouts and exercises. With the tutorials, researchers can quickly learn to effectively and efficiently use these resources. The scripts, handouts and other materials can also be used as a reference or for training others.

These tutorials will teach users:

Ensembl Legacy

*about the Ensembl software and its developers
*how to access older versions of the browser from the Ensembl archive
*the differences and similarities between versions
*about some example installations of Ensembl at other databases

Overview of Genome Browsers

*where to find these 5 useful tools
*an overview of the organization and display features
*some guidance on how or why to choose a given browser for your research needs

To find out more about these and over 85 other tutorial suites visit the OpenHelix Catalog and OpenHelix. Or visit the OpenHelix Blog for up-to-date information on genomics and genomics resources.

About OpenHelix
OpenHelix, LLC, (www.openhelix.com) provides a bioinformatics and genomics search and training portal, giving researchers one place to find and learn how to use resources and databases on the web. The OpenHelix Search portal searches hundreds of resources, tutorial suites and other material to direct researchers to the most relevant resources and OpenHelix training materials for their needs. Researchers and institutions can save time, budget and staff resources by leveraging a subscription to nearly 100 online tutorial suites available through the portal. More efficient use of the most relevant resources means quicker and more effective research.

Quick Reference Cards for teaching and outreach

23 March, 2010 (11:25) | Genomics Research | By: Mary

We know there are a number of different ways that scientists and students become familiar with genomics software. Some of it comes from the traditional publication routes, like the very handy NAR Database issue, or like the Current Protocols papers we’ve done recently. We have these online tutorials that people use in various ways: some teach themselves by watching the video and working the exercises, and some download the matching slide sets and run local workshops. (In our catalog, green icons indicate free/sponsored tutorials; red indicates that a subscription is required.) Librarians are using them to become “embedded” in courses in some cases.

A less-well-known type of material we have is the Quick Reference Card. These are printed cards with URLs, hints, tips, definitions, and shortcuts, for stuff that you may want a quick reminder of: where a feature is located, or how to use it. People who run the local workshops will sometimes write to us to get a set for their courses. They are great to give out at conferences to raise awareness of the software.

We have these cards for several resources for which we also have free sponsored training videos, slides, and exercises: UCSC Genome Browser (2 cards: intro and table browser), Galaxy, and our newest, RCSB PDB and SGKB. You can go to this form and order them, and we’ll send them out.

I bring this up today because we just received word from Ensembl that they have created a card that we can distribute as a PDF.  You can print it up and put it on the wall near the computer as a handy reminder of some features and tools at Ensembl.  Click the image to download the PDF, or go directly to the link below.

Summary:

Order OpenHelix printed cards for resources: http://www.openhelix.com/cgi/qrcOrder.cgi

Ensembl PDF card download: Ensembl_card_march2010.pdf

For good results, try several databases

22 January, 2010 (15:41) | Genomics Research, Genomics Resource News | By: Trey

Just wanted to point out this paper recently published in BMC Genomics: Integrating multiple genome annotation databases i… [BMC Genomics. 2010]

Often in our trainings we are asked which annotations or databases are best. Our stock and, frankly, accurate answer is that it depends on what you are looking for, your personal preferences and more. This paper concludes, at least when it comes to zebrafish transcript data, that pulling annotations from several databases instead of one gives you a more complete set of data. Might sound obvious to some, but it’s always good to see the data, and to point it out.
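To make the idea concrete, here is a toy Python sketch of what "pulling from several databases" buys you. Everything in it is made up for illustration: the database names and transcript IDs are hypothetical placeholders, not data or methods from the paper. It just takes the union of the IDs each source reports and shows which ones you would miss if you relied on a single source.

```python
# Hypothetical transcript IDs reported by three hypothetical annotation sources.
annotations = {
    "database_a": {"tx1", "tx2", "tx3"},
    "database_b": {"tx2", "tx3", "tx4"},
    "database_c": {"tx3", "tx5"},
}


def combined_coverage(sources):
    """Return the union of transcript IDs across all sources."""
    combined = set()
    for transcripts in sources.values():
        combined |= transcripts
    return combined


def unique_contributions(sources):
    """For each source, return the IDs that no other source reports."""
    unique = {}
    for name, transcripts in sources.items():
        others = set()
        for other_name, other_transcripts in sources.items():
            if other_name != name:
                others |= other_transcripts
        unique[name] = transcripts - others
    return unique


if __name__ == "__main__":
    everything = combined_coverage(annotations)
    for name, transcripts in annotations.items():
        print(f"{name}: {len(transcripts)} of {len(everything)} transcripts")
    print("Only found in one source:", unique_contributions(annotations))
```

In this toy data, no single source reports more than three of the five transcripts, and each source has one transcript the others lack. Real annotation sets would also need IDs reconciled across databases before a comparison like this means anything.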

Can you spare a genome browser?

31 July, 2009 (16:11) | Genomics Research | By: Trey

Recently I’ve been coming across more and more requests and need for genome annotation and visualization software. Genomes are being completed left and right and researchers need ways to browse and annotate these genomes. There are a lot of tools out there. This post is a quick attempt to start listing those. It is not exhaustive; right now these are the ones off the top of my head, focused a bit on visualization (though there is annotation). I plan to expand this list (have any to suggest?) and enhance it with more descriptions as time goes forward. I’ll probably make it a page if it becomes useful enough. I’m not listing databases (such as UCSC Genome Browser, RGD, Ensembl, Flybase), but rather software that researchers can use to create such browsable genomes. So, here we go…


RetroDogs

27 July, 2009 (17:58) | Genomics Research, Genomics Resource News | By: Trey

I had a Basset Hound growing up. His name was Useless, Useless S. Grunt. Well, actually it was formally Ulysses S. Grant because the US Kennel Club wouldn’t accept Useless S. Grunt as a name as they felt it was too demeaning. Not sure if they felt it was demeaning to the dog or to the president, but that’s neither here nor there, is it?

So, you ask, what made me think of that long-passed sweet dog that tripped over its too-long ears with its too-short legs? It turns out that researchers have found the genetic cause of those short legs in Basset Hounds (and Dachshunds and other breeds).

As NHGRI’s press release states:

In a study published in the advance online edition of the journal Science, the researchers led by NHGRI’s Elaine Ostrander, Ph.D., examined DNA samples from 835 dogs, including 95 with short legs. Their survey of more than 40,000 markers of DNA variation uncovered a genetic signature exclusive to short-legged breeds. Through follow-up DNA sequencing and computational analyses, the researchers determined the dogs’ disproportionately short limbs can be traced to one mutational event in the canine genome – a DNA insertion – that occurred early in the evolution of domestic dogs.

The insertion turns out to be a retrogene, which of course I also find interesting in that I studied retrotransposable elements. Reverse transcriptase has this habit of reverse transcribing RNA into DNA, which can get reinserted back into the genome (hence processed pseudogenes, of course).

The study is interesting for two reasons (other than because I had a Basset Hound and studied the evolution of retroelements ;) ): it gives us a further clue into evolutionary events that lead to large changes in morphology and the role of retrotranscription, and it gives us a clue into possible human conditions.

For more about the dog genome, you can read our several posts about it, go to NCBI’s dog genome home site (or UCSC or Ensembl and other browsers) and read the paper (needs a subscription of course, it’s in Science). It’s an interesting read so far (I want to find some time to read it more fully; perhaps Useless doesn’t live up to his name, and he didn’t really even then :D ).

Tip of the Week: F-SNP

17 June, 2009 (00:01) | Tip of the Week | By: Trey

There are a lot of databases to search for SNP data: HapMap, dbSNP, SeattleSNPs, Genome Variation Server and many more. I’m going to add one more to your data mining arsenal: F-SNP. F-SNP (described more fully here in the 2008 NAR Database issue),

provides integrated information about the functional effects of SNPs obtained from 16 bioinformatics tools and databases. The functional effects are predicted and indicated at the splicing, transcriptional, translational, and post-translational level. As such, the F-SNP database helps identify and focus on SNPs with potential pathological effect to human health.

…as they say in the introduction. It looks to be a good first stop to find SNPs of functional relevance. The databases they pull from to get their information include several I’ve mentioned above and also the UCSC Genome database, Ensembl, SIFT and PolyPhen predictions and more. I’ve given a quick intro in the tip this week on how to get functional SNP information from F-SNP.