Tag: UCSC Genome Browser

Friday SNPpets

30 July, 2010 (09:18) | General Science, Genomics News, Genomics Research, SNPpets | By: Mary

Welcome to our Friday feature link dump: SNPpets. During the week we come across a lot of links and reads that we think are interesting, but don’t make it to a blog post. Here they are for your enjoyment…

http://genome.hmgc.mcw.edu/
http://genome-mirror.duhs.duke.edu/
http://genome-mirror.bscb.cornell.edu/
http://genome-mirror.binf.ku.dk/
http://genome.qfab.org/

Free Genome Browser workshops available

27 July, 2010 (08:47) | General Science | By: Mary

Just a quick note: the UCSC team has announced the availability of free workshops. You can apply to bring the workshops to your facility. If you go to the UCSC Genome Browser homepage right now and look at the July 21 item in the news section, there’s a link to their survey/form. You can also access it from a Gateway page, in the yellow highlight.

Get a workshop! Tell your bioinformatics support folks. At some sites, librarians are the keepers of training rooms and schedules, so suggest it to them too. It would be really great for your community of researchers.

Tip of the Week: 1000 Genomes Project Browser

21 July, 2010 (08:27) | Genomics Research, New Resource, Tip of the Week | By: Mary


You may have been hearing about the 1000 Genomes project–it’s one of the ongoing “big data” projects that is going to yield a great deal of variation information about the human genome. The goal is to sequence well over 1000 genomes to identify “most genetic variants that have frequencies of at least 1% in the populations studied”. They are doing this by sequencing large numbers of samples with 4x coverage. You can read more about their strategy on the About page of their web site, which also lists the anticipated sample populations.
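For a rough feel of why a sample on the order of 1000 genomes fits the 1% target, here is a minimal back-of-the-envelope sketch in Python. It is an illustrative simplification, not the project's actual power calculation: it assumes diploid genomes with independently sampled chromosomes, and it only asks whether a variant is present in the sample at all, not whether 4x sequencing would actually detect it. The function name is made up for this example.

```python
def prob_variant_in_sample(allele_freq: float, n_genomes: int) -> float:
    """Chance that at least one copy of a variant is carried by the sample.

    Assumes diploid genomes (2 chromosomes per person) drawn independently;
    this is a simplification for illustration only.
    """
    n_chromosomes = 2 * n_genomes
    # P(no chromosome carries it) = (1 - f) ** n; take the complement.
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


if __name__ == "__main__":
    # Compare the 6 pilot genomes with larger sample sizes at a 1% frequency.
    for n in (6, 100, 1000):
        p = prob_variant_in_sample(0.01, n)
        print(f"{n:>5} genomes: P(variant present) = {p:.3f}")
```

Under these assumptions a 1% variant is usually absent from a handful of genomes but almost certainly present somewhere in a sample of 1000, which is the intuition behind sequencing many individuals at modest coverage.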

In this week’s Tip of the Week I’m going to take a quick spin through their browser. (You can also download all the data, but I’ll be focusing on the browser.) They have begun to release data now, and there are 6 individual sequences available at this time.  These are part of their “pilot” studies.  You can get some details on the pilot from their about page, which links to this PDF about the samples.

They are using the Ensembl framework to display their data, so if you are familiar with Ensembl you’ll have some facility moving around this browser. One thing that isn’t apparent right away from the site is that you can click the Resembl link on the display to turn on a track that puts the read/coverage data on the viewer. I also liked the alignment display of all 6 genomes–but I’m sure that’s going to get challenging to view later with more and more genomes.

In an exchange with their very helpful help desk yesterday, I got this quick summary of the samples you’ll see:

For the high coverage populations NA12891, NA12892 and NA12878 are the CEU trio, NA19238, NA19239 and NA19240 are the YRI trio both father, mother, child respectively and both children were daughters.

If you have questions about their data, be sure to go ask them for help–they were very speedy with answers for me :) .

Some of the project data has also been picked up by UCSC and you can access the same sequences in the UCSC Genome Browser in the Genome Variants track on the March 2006 human assembly. (You’ll also see Venter, Watson, and some other individual genomes there).
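If you want to jump straight to that assembly, a link can be built by hand. The sketch below is only an illustration: it assumes the browser's `hgTracks` page with its `db` and `position` URL parameters, and that `hg18` is the database name for the March 2006 human assembly. The helper name and the example position are placeholders, and the Genome Variants track still has to be switched on in the track controls once the page loads.

```python
from urllib.parse import urlencode

# Base address of the UCSC Genome Browser track display page.
UCSC_HGTRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_browser_url(position: str, db: str = "hg18") -> str:
    """Build a browser link for a position on a given assembly.

    `db` defaults to hg18, assumed here to be the March 2006 human assembly.
    """
    query = urlencode({"db": db, "position": position})
    return f"{UCSC_HGTRACKS}?{query}"


if __name__ == "__main__":
    # Arbitrary placeholder region, not a region of special interest.
    print(ucsc_browser_url("chr1:1-100000"))
```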

Quick links:

The Project: http://www.1000genomes.org/

The Browser: http://browser.1000genomes.org/

An article in Science with some background:  A Plan to Capture Human Diversity in 1000 Genomes

Ok, really, I’m going to blog again…

19 July, 2010 (15:07) | General Science | By: Mary

Sorry for the sparseness of late. We were all over the place doing UCSC Genome Browser (we do intro + advanced), ENCODE, and Galaxy workshops. At NIH we also did IMG and VISTA (Man, that security at NIH is fierce….). Trey is still on the road, in fact, doing the training in Morocco.

Ok, you couldn’t be there…but all of those trainings are available on our web site right now, except for ENCODE. The online materials cover the same material we present in the workshops. The only one of those that requires a subscription is IMG. And you won’t find ENCODE as a stand-alone tutorial yet–but that’s coming. We have now sent the script to the studio and we’ll be assembling that soon.

I do want to mention one thing that we think is interesting, and we see in almost every training we do. Nearly every time, more than half of the attendees at our trainings are female. Based on what you read about women falling out of the pipeline in science, you’d think there would be no way we’d even get 50%. But generally it is more than half women in these trainings. (We have the data if anyone can think of a way we can use that to get a grant :) )

Our current theory is that women are more likely to admit they could use the training (something like asking for directions…you know…?).  Or do men prefer documentation? We don’t know. What’s your theory?

Big data specialists…yeah, but…

23 June, 2010 (15:05) | General Science, Genomics Research | By: Mary

There is a great discussion on Big Data today that I found on the twittosphere. Hat tip to Paul Blaser for the tweet that got my attention. I have posted a comment over there, but I decided as I was writing it that I wanted to bring it over here as well. (I also added some links here that I couldn’t add over there, since there is no preview and I hate not being able to test them.)

Deepak has a post up on the blog business|bytes|genes|molecules called The Biological Data Scientist.  It speaks to big data projects, and the need to have specialists in biological data to handle it.

I suspect that we do actually agree on much of the concept.  But like a lot of things, I think more downstream about the implementation of the topic on the ground.  And my thoughts on that are below, which I posted as a comment over there.

+++++++++++++++++++

Hmmm…I certainly agree with large chunks of this. But I don’t agree that this should be the domain of some kind of data scientist. Or–more specifically–it does need to have their hands on it at some point. But I think it still needs to be accessible to the handful-of-genes bench biologists.

The idea of the multi-functional team is terrific, when it is possible.  But we see a lot of people who are not getting that kind of support from their local “bioinformatics” club–for a couple of reasons: if you have some big-data folks on site, they have their own project to worry about. They are not eager to hand-hold others on the way in to the data.  It’s not their job. It’s not what they are supported to do, and it doesn’t help them with their next grant.

If you have some kind of dedicated bioinformatics core support, the quality of the support varies widely: the kinds of things they do, the skills they have, the interest in actual support.

We have seen some great examples. For example, it seems to me the team at CHOP in Philly provides this kind of support: in-house tools to support the researchers, bringing in the right tools to add more support, and training everyone up to some level so they are at least aware of what the tools can do. (Samples of CHOP tools, team, and training.)

On the other hand, we’ve been to some major institutions–many with “big data” projects, who are getting next to zero interaction with anyone who could help them.  You’d be stunned if I told you who these people are.

Then there are those who don’t even have a shot at this: people trying to keep up and write new grants with hot new data, who are on some midwestern campus that really just doesn’t even have someone to ask. I talked to one woman once who needed a really simple thing out of the UCSC Genome Browser. It took me roughly 5 minutes to build the right query, pull the data out of the Table Browser, and hand it to her. I thought she was going to kiss me. She told me she had expected that to take her 6 months of benchwork.

I would hate to see this strategy create a tier of biologists who are nearly locked out of the data. Because it is also still eminently clear that we can throw a lot of big data at a project, but the crucial details require the “small people” to look closely at them. And many of them feel excluded from the club already.

Tip of the Week: Genome Variation Tour II

9 June, 2010 (02:28) | Tip of the Week | By: Trey

The last tip of the week I did was Genome Variation Tour I, where we started our journey following one SNP in an individual’s genome through various databases to see what we can find out about that variation. In that tip we started out by looking at a SNP in the CYP4F2 gene in the UCSC Genome Browser and followed it to dbSNP. Today’s tip will continue our journey to OMIM to see what information we can find there. We’ll find this variation is clinically associated with warfarin dosage effects, and specifically that this individual’s C/T heterozygosity indicates an intermediate dosage for effectiveness, if indeed he ever needed this drug. In some ways, your guess is as good as mine as to what we will find and what avenues we will be taking in the next few tips I’ll be doing. I am discovering information as I go along too. I can tell you, though, that the next installment of the genome variation tour will take us to PubMed, perhaps to a few databases that are not particularly well known but are real gems, and probably back to the UCSC Genome Browser to expand our look at the interactions of several variations in this individual’s genome.

Tip of the Week: Database of mouse databases

2 June, 2010 (09:15) | Genomics Resource News, New Resource, Tip of the Week | By: Mary


We are acutely aware of the thousands of bioinformatics resources out there, and we are often asked for guidance on finding a particular type of tool for some function or other. There are some excellent lists out there which attempt to catalog the various tools–the NAR Database Issue and corresponding list, the Resource Collection at the Univ. of Pittsburgh, and others. But recently we saw one developed with a specific focus, which claims to bring together over 200 resources for the mouse. The Mouse Resource Browser collects and categorizes a number of different types of things–not just databases, as we’ll see. Find them here: http://bioit.fleming.gr/mrb

The curated collection of sites that may be of use to mouse researchers has a number of features. The developers used a questionnaire to elicit some information from the resource providers, and where they don’t have that input they have created some basic information for the records themselves. You can do a basic search for resources with a quick search box, and there is an advanced search option. I found browsing by category (they have 22 categories) the most informative way to figure out what kind of resources they had collected.

The data for a given record is organized across a series of tabs:

  • General: description, highlights and subject matter of the resource
  • Ontologies and Standards: if the resource relies on any of the important vocabularies or standards formats in the field, they are listed here
  • Technical: details of implementation, type of database, access methods, if there is a web services component, whether there are downloads or not
  • CASIMIR DDF: this is an interesting tab that assesses some of the features of the resources such as currency/updates, quality control process, versioning, technical documentation, user support, and more.

Although the focus is mouse, you’ll see some broader types of resources in there. For example, UCSC Genome Browser is listed, as there is a mouse database there. Reactome is listed. These cover a range of species that includes mouse, but are certainly not focused on mouse. Other types of resources include commercial suppliers such as Charles River. So it isn’t limited just to things like sequence databases–it covers more of what researchers employing mouse as a model system might find useful.

There are some choices they have made that I’m not sure I would have. They list the MGI mailing list as a separate entry from MGI. But as I thought more about it, I could see why. There is good information there, and if you don’t know of it already a pointer might help. Still, thinking of the 200+ resources just for mouse, I thought that choices like this sort of inflated the total.

If you use mouse as your model system, you will probably find some useful databases and other web sites that are handy for your work.  If you don’t work with mice, there are probably still some useful resources for your work as well.  Check out MRB’s site for more information: http://bioit.fleming.gr/mrb

Reference:
Zouberakis, M., Chandras, C., Swertz, M., Smedley, D., Gruenberger, M., Bard, J., Schughart, K., Rosenthal, N., Hancock, J., Schofield, P., Kollias, G., & Aidinis, V. (2010). Mouse Resource Browser–a database of mouse databases. Database, 2010. DOI: 10.1093/database/baq010

Tip of the Week: The Cancer Genome Workbench

26 May, 2010 (08:34) | Tip of the Week | By: Jennifer


In today’s tip I’d like to introduce you to the Cancer Genome Workbench, or CGWB. The workbench gathers cancer information from a wide variety of projects including Johns Hopkins University and GlaxoSmithKline Cancer Cell Line Genomic Profiling Data, NCI’s Therapeutically Applicable Research to Generate Effective Treatment (TARGET), NHGRI’s Tumor Sequencing Project (TSP), The Cancer Genome Atlas (TCGA), and the Sanger Center’s COSMIC initiative and presents the cumulative data as high-level summary visualizations. The CGWB’s genome-browser view is built on a UCSC Genome Browser backbone, for power and flexibility.

I noticed an announcement in the May 7th Nature Signaling Gateway Update email that the NCI-Nature Pathway Interaction Database – May Update was featuring a bioinformatics primer on The Cancer Genome Workbench. The primer is great & goes into much more detail about the Cancer Genome Workbench than I will be able to in this quick tip. I strongly recommend that you check out both the primer and the workbench. When I went over to the workbench to explore, I quite honestly was a bit taken aback by the complexity of the displays – the amount of data presented in their summary visualizations is somewhat intense.

I hope that in my tip movie I will be able to convince you that the small investment you will need to make to get acclimated to their images is well worth it for the amount of data you will quickly learn to analyze. The views are so data rich that they take a bit of adjusting to – there is very little labeling (to keep displays as clean as possible) and information is provided via pop-up messages as you scroll over the display. Once I got past the intensity of the displays, I was really amazed by the scope of data visualized in CGWB displays – data on every chromosome & gene over multiple datasets/experiments, in one 2D image. As the NCI primer says, cancer is complex – really complex. Being able to see such ‘big picture’ views as those provided by the Cancer Genome Workbench is a really powerful analysis aid. I for one am impressed with this resource, which is why I’ve chosen to feature it today.

In my 5 minute tip I was only able to show you the briefest of glimpses of the CGWB landscape and heatmap views. I was not able to show you the details of either view, including a hyperlinked list of genes with the highest mutation frequencies. Nor was I able to show you the full scope of other views, which include genome browser views (based on the UCSC Genome Browser, as I mentioned earlier), correlation plots, protein domain views, 3D visualizations, as well as next-gen and trace sequence views. Check out figure 1 of the bioinformatics primer to see examples of those.

I’ve added a citation to the original CGWB publication. It was published in 2007, and so does not cover all the current functions of the workbench. But I think reading it might help give you an idea of the workbench, because it goes into the goals and background of the CGWB in more depth than the primer, which is much more up-to-date and focuses on functionality. In this paper you can also read how the authors utilized the workbench to analyze three public datasets, and see how it expanded their research findings.

All in all, I think the Cancer Genome Workbench is an amazing resource for cancer research. Be sure to check out the tip movie, the primer, the original CGWB publication and especially the CGWB! Thanks for joining us for this week’s tip.

Zhang, J., Finney, R., Rowe, W., Edmonson, M., Yang, S., Dracheva, T., Jen, J., Struewing, J., & Buetow, K. (2007). Systematic analysis of genetic alterations in tumors using Cancer Genome WorkBench (CGWB). Genome Research, 17 (7), 1111-1117. DOI: 10.1101/gr.5963407

Tip of the Week: Genomic Variation Tour I

19 May, 2010 (00:07) | Tip of the Week | By: Trey

Today’s tip of the week is actually the first in a series of tips I will be doing over the next couple of months. A recent paper in The Lancet did a clinical assessment of an individual genome. In doing so, the researchers used various genomic resources to ascertain and interpret the data. We have a free tutorial on NIEHS SNPs that walks through some of these resources, but I thought it might be useful to follow one specific nucleotide variation through a lot more genomic databases to show the user what data is available and how to access it. Each tip I do over the next couple of months (not every week, I do tips every 2-3 weeks) will follow a specific SNP through the databases. In this case, it is rs2108622 in the CYP4F2 gene (cytochrome P450, family 4). These tips aren’t for the genome jockeys and SNP surfers among us; they are more an introductory tour of what’s out there. They will be useful for those just starting to look at genomic variations, medical practitioners, clinicians or those just curious about what is available. Today’s tip will start with the UCSC Genome Browser, find the variation and follow it through to dbSNP. The next tip will look closer at the dbSNP information and then follow the trail to OMIM and GeneTests. In later tips we’ll take the variation to another 4-6 different databases and genomic variation resources, from HapMap and others. In the posts themselves I’ll link to even more variation databases. There is a plethora of them.

Education at NCBI

14 May, 2010 (09:59) | Genomics Research | By: Trey

I’d like to point out the new NCBI Education page. There is a lot there that you might want to check out. NCBI will be, starting this fall, offering a series of two-day training courses they are calling Discovery Workshops. Two years ago they ended the NCBI Field Guide workshops, so this seems to be a welcome change.

There are also webinars. Our research suggests that webinars are not particularly popular, so I’m curious how these turn out. There are also ‘how-to’ guides, documentation, and community and teacher resources. It’s quite a nice site with lots of things to check out.

I’d also like to point out the “recommended links” section. There are lots of links to additional educational resources like Cold Spring Harbor’s Dolan DNA Learning Center and much more. And, incidentally :) , a link to our own free tutorials, which was very nice to see. You might want to check those out; we have over 10, including PDB, SGKB, UCSC Genome Browser, Galaxy, several model organism databases, and more.