Building computational skills for women in science

It’s really crucial for scientists of all stripes to have some computational skills in their toolbelts. In genomics, the deluge of data that needs to be sorted, sifted, and analyzed is not going to stop–so it’s even more urgent that everyone gets some comfort and capability to work with the data, do some scripting to solve the issues they might have, and assess the tools they might want to use.

For various reasons, women in science may not have pursued much coding. But the Software Carpentry team has new efforts under way to bring women researchers up to speed on some of these skills. There’s a workshop coming up in Boston in June that is a great opportunity if you’ve been thinking about tackling some of the basics.

Women in Science and Engineering (Boston): Jun 24-25, 2013

Click that link to access the registration page. It’s only $20 for two days. It’s a bargain.

Bostonians: if you don’t need the skills, send your students. Tell your department. Put up a flyer on the bulletin board.

Rare photo of me in the wild….

Of downtown Boston, at Tufts Medical Center, singing the praises of IMG and the Integrated Microbial Genomes resources.

I love workshops that only require a trip on the Orange Line.

Today we were doing the World Tour of Genomics Resources. Tomorrow it is UCSC Genome Browser (intro + advanced), and Thursday ENCODE. So if you want to workshop vicariously you can check out all of our tutorials on those. The slides, handouts, and exercises are all over there for you to download if you’d like.

As much as I love the online training and webinars and all, you really do get important information about the needs of folks in the room that you just don’t really get from the intertubz, and I do like to do the material live.

UCSC / ENCODE webinar follow up post

We’ll be having our July 24th UCSC – ENCODE webinar today, and we find there are questions to follow up afterwards that are often better handled in discussions on the blog.

If there are questions we didn’t have time to get to–or things we want to expand on with more detail–we can discuss them in this thread.

Or if you have other things you’ve been meaning to ask, let us know.

If you aren’t able to make the webinar today, don’t worry. In the near future, a movie with the exact same material will be available in the tutorial suite, along with the slides and exercises–we’ll announce that on the blog when it’s ready. For now you can download the slides here.

You can also sign up to be informed of future webinars coming up on other topics.

ENCODE data in the UCSC Genome Browser webinar Tuesday

We’ve got a webinar coming up Tuesday July 24 exploring our brand-new “ENCODE in the UCSC Genome Browser” materials. We are preparing the accompanying movie, but it’s still in our studio right now. So this is your first chance to see the new materials.

To register, go here:

This webinar explores how to identify and explore many of the human data types in the UCSC Genome Browser that have come through the ENCODE project. If you aren’t familiar with the ENCODE project–or need a refresher–you can see our ENCODE Foundations tutorial suite. That’s freely available because it was sponsored by the UCSC ENCODE team. It’s not required for the new tutorial, but it would give you some background on the project.

You can access our webinar slides here (zipped):

We’ll announce when the full suite of materials is available (very soon we hope!).

After the webinar, we’ll also field questions here on the blog in a follow-up post, in case we didn’t have time or the answers need links and further detail. And if you go away, try out some things, and run into other issues, you can ask us later.

Galaxy Intro Webinar follow-up post (July 19)

We’ll be having our July 19th Galaxy webinar today, and we find there are questions to follow up afterwards that are often better handled in discussions on the blog.

If there are questions we didn’t have time to get to–or things we want to expand on with more detail–we can discuss them in this thread.

Or if you have other things you’ve been meaning to ask, let us know.

If you have registered for the webinar, the same material will be available in the training movie, slides, and exercises tutorial suite. You can also sign up to be informed of future webinars coming up on topics including UCSC, ENCODE, and others.

Some questions asked in today’s webinar, with answers:

1) Galaxy seems to be downloadable, in addition to being available on the PSU portal and in the cloud at Amazon. How would you choose?

Each has its purposes. From the Galaxy Wiki:

Install your own Galaxy if you want to:

a) Develop it further
b) Add new tools
c) Plug in new datasources
d) Run a local production server for your site because you have
   Sensitive data (e.g., clinical) or
   Large datasets or processing requirements that are too big to be processed on Main

Use the Cloud:

“With sporadic availability of data, individuals and labs may have a need to, over a period of time, process greatly variable amounts of data. Such variability in data volume imposes variable requirements on availability of compute resources used to process given data. Rather than having to purchase and maintain desired compute resources or having to wait a long time for data processing jobs to complete, the Galaxy Team has enabled Galaxy to be instantiated on cloud computing infrastructures”

2) Can I use Galaxy to analyze protein data?

Yes, there are a few tools for analysis on the main instance, but also you can add your own tools to a local instance.

3) What kind of local server? Can you describe the PSU instance as an example? Server size, storage, filesystem, etc.?

Check out this link for needs.

4) Can we use Galaxy to align the whole genome sequences of rice to get SNPs?

This link might help.

5) Is there a link to the toolshed from the Galaxy interface?

Not that I know of, but this is it:

6) How secure is the data we run on galaxy.psu?

From the site (emphasis added in answer):

This is a free, public, internet accessible resource. Data transfer and data storage are not encrypted. If there are restrictions on the way your research data can be stored and used, please consult your local institutional review board or the project PI before uploading it to any public site, including this Galaxy server. If you have protected data, large data storage requirements, or short deadlines you are encouraged to setup your own local Galaxy instance or run Galaxy on the cloud.


UCSC Introduction Webinar follow-up post (May 17)

We’ll be having our May 17th webinar today, and we find there are questions to follow up afterwards that are often better handled in discussions on the blog.

If there are questions we didn’t have time to get to–or things we want to expand on with more detail–we can discuss them in this thread.

Or if you have other things you’ve been meaning to ask, let us know.

If you can’t make the webinar, the same material is covered in the training movie, slides, and exercises that are freely available, sponsored by the UCSC team:


Webinars on how to use UCSC Genome and Table browsers

As we have in the past, we are offering free webinars in the coming weeks on the UCSC Genome Browser and Advanced discovery using the Table Browser and custom tracks. These have been quite popular in the past, so sign up soon!

The Intro to the Genome Browser webinar will be Thursday, May 17th at 10am Pacific time (1pm ET). Check here for your time zone.

The Table Browser and custom tracks webinar will be Thursday, May 24th at 10am Pacific time (1pm ET). Check here for your time zone.

You can register here for the UCSC Genome Browser Intro and register separately here for the Table Browser and custom tracks webinar. You’ll need to register at OpenHelix if you haven’t already. It’s free and you’ll get no emails from us unless you opt in to our most excellent newsletter or ask to be notified of future webinars :D. Registration and attendance are free.

We have several months of free webinars planned, including on ENCODE, Galaxy, PDB, and others. Keep tabs on us here or on the webinars page (or through email notification) to be notified when those free webinars are coming.

“Regulatory and Epigenetic Landscapes of Mammalian Genomes”

In the series of talks from the Current Topics in Genome Analysis course from NHGRI, Laura Elnitski spoke on regulation and epigenetics. I’ll include some of my notes below, but be sure to check out the whole talk when you have a chance–and the slides are available for download from the CTGA page.

Dr. Elnitski frames the talk by noting that we’ve been focusing on the roughly 2% of the genome that consists of protein-coding genes, but that there’s a lot more going on outside of that, and a lot more to learn about other aspects of genome regulation. One of the papers she uses to illustrate this makes it clear how much of the variation we are aware of lies outside of protein-coding regions (Hindorff et al. 2009; around 11 minutes). That paper described the NIH GWAS catalog, analyzed its disease/trait-associated SNPs (TASs), and found that “88% of TASs were intronic (45%) or intergenic (43%)”. And if that’s the case, you need to think about ways to evaluate the effects of these variants differently than you would if a protein change had resulted.

Due to this fact–that it’s not just proteins we need to be looking at–Elnitski says, “So throughout this talk we’ll take a look at functional categories of the genome, to further explain the steps you might consider to ascertain function at these GWAS sites.”

One way to evaluate a region that contains a non-coding variant is to consider its evolutionary relationships. How conserved is this tidbit in other species? Laura describes how phastCons and GERP can help you to analyze that (around 21 minutes). These tools use different approaches to find constrained elements. You can also use knowledge of regions that have accelerated rates of change to suss out interesting features (she used the opposable thumb and foot/ankle region among bipedals as interesting examples of that sort of change; around 26 minutes).

Another type of landscape feature described was enhancer signatures. She offered a nice diagrammatic view of what this looks like around a region to convey possible enhancer function (around 32 minutes). The look at the representation of the histone code could probably help people who are trying to use the ENCODE data tracks at UCSC to visualize that–and in slide 63 she looks at what the pattern of codes in an active promoter might look like, and then after that the key differences of enhancers and what repressed regions look like (around 1hr). I found that really helpful.

One point she stressed though–epigenetic patterns are very cell-type specific–be sure to look at various cell types, and tread carefully with conclusions if your cell type of interest has not been evaluated yet (around 36 minutes). [As a side note, I worry about this particularly as a misuse by cranks of the features of epigenetics--they are already going out and telling people they can fix everything wrong with their health by affecting their epigenetics. Now, let's say you claim to treat diabetes or autism with your detox epi-fix--what is the impact on other cell types exactly??]

She also goes on to explain how these features rely on the 3D structure of the nucleus, looping interactions, and the packing of the chromosomes, with some nice guidance on how to think about that and the types of techniques to assess that. And just after I watched this, a paper came out describing more of this topology with the Hi-C strategy that she referenced.

It’s also important to consider that splicing defects can have consequences that wouldn’t be obvious just from looking at coding sequence per se. Although a substitution might be synonymous and not change an amino acid, it could still affect splicing. The SKIPPY tool that was developed by her group (and that Jennifer highlighted as a Tip of the Week) was suggested as a way to explore this (around 47 minutes).
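To make the "synonymous" idea concrete, here is a minimal Python sketch. It is not from the talk and it is not how SKIPPY works; the function name and the tiny codon table are just illustrative assumptions, using a few entries from the standard genetic code. It shows why a change like GGT to GGC leaves the protein untouched, which is exactly why a splicing effect would be invisible if you only looked at the amino acids.

```python
# A small slice of the standard genetic code: the four glycine codons,
# plus one alanine codon for contrast. (Illustrative subset only.)
CODON_TABLE = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GCT": "Ala",
}


def classify_substitution(ref_codon, alt_codon):
    """Label a codon change by whether the encoded amino acid changes.

    Hypothetical helper for illustration. It only compares amino acids;
    it says nothing about whether the change disrupts splicing.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


if __name__ == "__main__":
    # Third-position change: both codons encode glycine.
    print("GGT -> GGC:", classify_substitution("GGT", "GGC"))
    # Second-position change: glycine becomes alanine.
    print("GGT -> GCT:", classify_substitution("GGT", "GCT"))
```

A check like this would flag the first change as harmless at the protein level, so any effect on splicing has to be assessed separately with a tool built for that purpose.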

This talk was a useful guide to thinking about non-coding genomic features to consider for your research. There were helpful graphics and tools provided. Have a look–it’s worth your time.


Woolfe, A., Mullikin, J., & Elnitski, L. (2010). Genomic features defining exonic variants that modulate splicing. Genome Biology, 11 (2). DOI: 10.1186/gb-2010-11-2-r20

Hindorff, L., Sethupathy, P., Junkins, H., Ramos, E., Mehta, J., Collins, F., & Manolio, T. (2009). Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences, 106 (23), 9362-9367. DOI: 10.1073/pnas.0903103106

So. Cal. UCSC Genome Browser workshop, 3/29/12

Hi folks–

If you are in the Los Angeles area, we wanted to offer you the opportunity to attend one of our UCSC Genome Browser and ENCODE data workshops. Our hosts–led by Yate-Ching Yuan and her terrific team at City of Hope who have hosted us before–have organized a training day and they have a large enough room, so they have offered to invite folks from the wider biomedical community in the region to attend.

It would require you to bring your own laptop. And you’ll have to bring or get your own lunch. But you can participate in all of the workshop sessions with the City of Hope researchers.

The schedule and details of the workshop day are here (PDF).

A registration form is here (PDF). Seats are limited though, and we need to know how many training packets to send, so do fill it out by Friday of this week and send it to the COH team.

Join us!


City of Hope
1500 East Duarte Road
Duarte, California 91010

Map and directions