It’s really crucial for scientists of all stripes to have some computational skills in their toolbelts. In genomics, the deluge of data that needs to be sorted, sifted, and analyzed is not going to stop, so it’s even more urgent that everyone gain some comfort and capability with the data: enough to do some scripting to solve the issues they run into, and to assess the tools they might want to use.
For various reasons, women in science may not have pursued much coding. But the Software Carpentry team has new efforts underway to bring women researchers up to speed on some of these skills. There’s a workshop coming up in Boston in June that is a great opportunity if you’ve been thinking about tackling some of the basics.
I love workshops that only require a trip on the Orange Line.
Today we were doing the World Tour of Genomics Resources. Tomorrow it is UCSC Genome Browser (intro + advanced), and Thursday ENCODE. So if you want to workshop vicariously you can check out all of our tutorials on those. The slides, handouts, and exercises are all over there for you to download if you’d like.
As much as I love the online training and webinars and all, you really do get important information about the needs of folks in the room that you just don’t really get from the intertubz, and I do like to do the material live.
If there are questions we didn’t have time to get to–or things we want to expand on with more detail–we can discuss them in this thread.
Or if you have other things you’ve been meaning to ask, let us know.
If you aren’t able to make the webinar today, don’t worry. In the near future, a movie with the exact same material will be available along with the slides and exercises as a tutorial suite; we’ll announce that on the blog when it’s ready. For now you can download the slides here.
We’ve got a webinar coming up Tuesday, July 24, exploring our brand-new “ENCODE in the UCSC Genome Browser” materials. We are preparing the accompanying movie, but it’s still in our studio right now. So this is your first chance to see the new materials.
This webinar explores how to identify and explore many of the human data types in the UCSC Genome Browser that have come through the ENCODE project. If you aren’t familiar with the ENCODE project–or need a refresher–you can see our ENCODE Foundations tutorial suite. That’s freely available because it was sponsored by the UCSC ENCODE team. It’s not required for the new tutorial, but it would give you some background on the project.
We’ll announce when the full suite of materials is available (very soon we hope!).
After the webinar, we’ll also field questions here on the blog in a follow-up post, whether we didn’t have time for them or the answers need links and further detail. And if you go away, try some things out, and run into other issues, you can ask us later.
a) Develop it further
b) Add new tools
c) Plug in new data sources
d) Run a local production server for your site because you have:
    i) Sensitive data (e.g., clinical), or
    ii) Large datasets or processing requirements that are too big to be processed on Main
“With sporadic availability of data, individuals and labs may have a need to, over a period of time, process greatly variable amounts of data. Such variability in data volume imposes variable requirements on availability of compute resources used to process given data. Rather than having to purchase and maintain desired compute resources or having to wait a long time for data processing jobs to complete, the Galaxy Team has enabled Galaxy to be instantiated on cloud computing infrastructures”
2) Can I use Galaxy to analyze protein data?
Yes, there are a few tools for analysis on the main instance, but also you can add your own tools to a local instance.
3) What kind of local server? Can you describe the PSU instance as an example (server size, storage, filesystem, etc.)?
This is a free, public, internet accessible resource. Data transfer and data storage are not encrypted. If there are restrictions on the way your research data can be stored and used, please consult your local institutional review board or the project PI before uploading it to any public site, including this Galaxy server. If you have protected data, large data storage requirements, or short deadlines you are encouraged to set up your own local Galaxy instance or run Galaxy on the cloud.
We have several months of free webinars planned, including on ENCODE, Galaxy, PDB, and others. Keep tabs on us here or on the webinars page (or through email notification) to be notified when those free webinars are coming.
Dr. Elnitski frames the talk by noting that we’ve been focusing on the roughly 2% of the genome that consists of protein-coding genes, but that a lot more is going on outside of that, and there is much more to learn about other aspects of genome regulation. One of the papers she uses to illustrate this makes it clear how much of the variation we are aware of lies outside protein-coding regions (Hindorff et al. 2009; around 11 minutes). That paper described the NIH GWAS catalog and analyzed the disease/trait-associated SNPs (TASs) in it, finding that “88% of TASs were intronic (45%) or intergenic (43%)”. If that’s the case, you need to think about ways to evaluate the effects of these variants differently than you would for a variant that changes a protein.
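To make the intronic/intergenic split concrete, here is a minimal Python sketch of how you might tally where a set of variant positions falls relative to gene annotations. The `Gene` class, the function names, and the toy coordinates are all hypothetical; a real analysis would pull gene and exon coordinates from something like the UCSC Table Browser, and this version lumps coding exons and UTRs together as “exonic”.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Gene:
    # Hypothetical toy annotation: 0-based, half-open [start, end) coordinates.
    start: int
    end: int
    exons: List[Tuple[int, int]]  # half-open exon intervals inside the gene


def classify(pos: int, genes: List[Gene]) -> str:
    """Label one position as 'exonic', 'intronic', or 'intergenic'."""
    for g in genes:
        if g.start <= pos < g.end:
            if any(s <= pos < e for s, e in g.exons):
                return "exonic"
            return "intronic"  # inside the gene but outside every exon
    return "intergenic"  # not inside any annotated gene


def tally(positions: List[int], genes: List[Gene]) -> Dict[str, float]:
    """Return the fraction of positions falling in each category."""
    counts = {"exonic": 0, "intronic": 0, "intergenic": 0}
    for p in positions:
        counts[classify(p, genes)] += 1
    total = len(positions)
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}


# Toy example: one gene with two exons, and eight made-up variant positions.
genes = [Gene(100, 500, [(100, 150), (400, 500)])]
variants = [120, 200, 300, 450, 600, 700, 800, 900]
print(tally(variants, genes))
```

With these made-up positions the sketch reports a quarter exonic, a quarter intronic, and half intergenic; the point is only that each category then calls for a different way of thinking about function.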
Due to this fact–that it’s not just proteins we need to be looking at–Elnitski says, “So throughout this talk we’ll take a look at functional categories of the genome, to further explain the steps you might consider to ascertain function at these GWAS sites.”
One way to evaluate a region that contains a non-coding variant is to consider its evolutionary relationships. How conserved is this tidbit in other species? Laura describes how phastCons and GERP can help you analyze that (around 21 minutes). These tools use different approaches to find constrained elements. You can also use knowledge of regions that show accelerated rates of change to suss out interesting features (she used the opposable thumb and the foot/ankle region in bipeds as interesting examples of that sort of change; around 26 minutes).
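phastCons (a phylogenetic hidden Markov model) and GERP (which scores “rejected substitutions”) are far more sophisticated than anything that fits in a blog post, but the basic idea of pulling candidate constrained elements out of a per-base conservation track can be sketched naively: find runs of consecutive bases whose score stays at or above a cutoff. This is a hypothetical illustration, not how either tool actually calls elements, and the `threshold` and `min_len` values are arbitrary assumptions.

```python
from typing import List, Tuple


def constrained_runs(scores: List[float], threshold: float = 0.8,
                     min_len: int = 3) -> List[Tuple[int, int]]:
    """Naive sketch (NOT phastCons or GERP): return half-open (start, end)
    index ranges where every per-base score is >= threshold and the run
    is at least min_len bases long."""
    runs = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold:
            if start is None:
                start = i  # a high-scoring run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None  # run broken by a low score
    # Close a run that extends to the end of the track.
    if start is not None and len(scores) - start >= min_len:
        runs.append((start, len(scores)))
    return runs


# Made-up per-base scores in the 0-1 range, like a conservation track.
scores = [0.1, 0.9, 0.95, 0.85, 0.2, 0.9, 0.9, 0.1, 0.99, 0.99, 0.99, 0.99]
print(constrained_runs(scores))
```

Here the two-base run at positions 5 and 6 is dropped for being shorter than `min_len`. Flipping the comparison to look for unusually low scores is the same kind of naive idea behind hunting for accelerated regions.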
Another type of landscape feature she described was enhancer signatures. She offered a nice diagrammatic view of what these look like around a region with possible enhancer function (around 32 minutes). Her representation of the histone code could help people who are trying to visualize it with the ENCODE data tracks at UCSC. In slide 63 she shows what the pattern of histone marks at an active promoter might look like, and after that the key differences at enhancers and what repressed regions look like (around 1 hr). I found that really helpful.
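For anyone trying to read those ENCODE histone tracks, the kind of pattern logic she walks through can be sketched as a handful of rules. The rules below are simplified, commonly cited heuristics (H3K4me3 with H3K27ac at active promoters, H3K4me1 with H3K27ac at active enhancers, H3K4me1 alone at poised or primed enhancers, H3K27me3 at repressed regions). They are my assumption for illustration, not the contents of her slides, and real chromatin-state callers use many more marks and statistical models. In keeping with her cell-type warning, the input marks should all come from a single cell type.

```python
from typing import Set


def classify_region(marks: Set[str]) -> str:
    """Toy chromatin-state call from the set of histone marks detected at one
    region in ONE cell type. Assumed, simplified heuristics; not a validated
    model, and states such as bivalent promoters are not handled."""
    if "H3K27me3" in marks and "H3K27ac" not in marks:
        return "repressed"
    if "H3K4me3" in marks and "H3K27ac" in marks:
        return "active promoter"
    if "H3K4me1" in marks and "H3K27ac" in marks:
        return "active enhancer"
    if "H3K4me1" in marks:
        return "poised/primed enhancer"
    return "unclassified"


print(classify_region({"H3K4me1", "H3K27ac"}))
```

Rule order matters here: a region carrying H3K4me3, H3K4me1, and H3K27ac is called a promoter because that rule is checked first.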
One point she stressed, though: epigenetic patterns are very cell-type specific. Be sure to look at various cell types, and tread carefully with conclusions if your cell type of interest has not been evaluated yet (around 36 minutes). [As a side note, I worry about this particularly because cranks already misuse the features of epigenetics: they are out there telling people they can fix everything wrong with their health by changing their epigenetics. Now, say you claim to treat diabetes or autism with your detox epi-fix: what exactly is the impact on other cell types??]
She also goes on to explain how these features rely on the 3D structure of the nucleus, looping interactions, and the packing of the chromosomes, with some nice guidance on how to think about that and the types of techniques to assess that. And just after I watched this, a paper came out describing more of this topology with the Hi-C strategy that she referenced.
It’s also important to consider that splicing defects can have consequences that wouldn’t be obvious just from looking at coding sequence per se. Although a substitution might be synonymous and not change an amino acid, it could still affect splicing. The SKIPPY tool that was developed by her group (and that Jennifer highlighted as a Tip of the Week) was suggested as a way to explore this (around 47 minutes).
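The first half of that argument, whether a substitution changes the amino acid at all, is easy to check yourself. Here is a minimal Python sketch using the standard genetic code; `is_synonymous` is a hypothetical helper, it assumes an in-frame coding sequence that starts at its first codon, and it says nothing about splicing. A `True` result is exactly the case where a tool like SKIPPY becomes interesting.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); amino acids listed for
# codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...). '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def is_synonymous(cds: str, pos: int, alt: str) -> bool:
    """True if substituting `alt` at 0-based position `pos` of an in-frame
    coding sequence leaves the encoded amino acid unchanged. This checks the
    protein level only; a synonymous change can still disrupt splicing."""
    cds = cds.upper()
    offset = pos % 3                      # position within the codon
    codon_start = pos - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    return CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]


# Toy sequence ATG CTG AAA (Met-Leu-Lys).
print(is_synonymous("ATGCTGAAA", 5, "A"))  # CTG -> CTA, both Leu
print(is_synonymous("ATGCTGAAA", 3, "A"))  # CTG -> ATG, Leu -> Met
```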
This talk was a useful guide to thinking about non-coding genomic features to consider for your research. There were helpful graphics and tools provided. Have a look–it’s worth your time.
Woolfe, A., Mullikin, J., & Elnitski, L. (2010). Genomic features defining exonic variants that modulate splicing. Genome Biology, 11(2). DOI: 10.1186/gb-2010-11-2-r20
Hindorff, L., Sethupathy, P., Junkins, H., Ramos, E., Mehta, J., Collins, F., & Manolio, T. (2009). Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences, 106(23), 9362-9367. DOI: 10.1073/pnas.0903103106
If you are in the Los Angeles area, we wanted to offer you the opportunity to attend one of our UCSC Genome Browser and ENCODE data workshops. Our hosts–led by Yate-Ching Yuan and her terrific team at City of Hope who have hosted us before–have organized a training day and they have a large enough room, so they have offered to invite folks from the wider biomedical community in the region to attend.
It would require you to bring your own laptop. And you’ll have to bring or get your own lunch. But you can participate in all of the workshop sessions with the City of Hope researchers.