Category Archives: Genomics Research

Participate in an NSF “IDEAS LAB” (generate research agendas and proposals)

The short link: IUSE IDEAS LAB:

NSF’s education directorate has a funding opportunity called “Improving Undergraduate STEM Education” (IUSE).

The IUSE program description [PD 14-7513] outlines a broad funding opportunity to support projects that address immediate challenges and opportunities facing undergraduate science, technology, engineering, and math (STEM) education.
To generate research agendas and proposals for this, NSF is holding an… 

Ideas Lab:
Ideas Labs are meetings that bring together researchers, educators and others in an “intensive, interactive and free-thinking environment, where participants immerse themselves in a collaborative dialog in order to construct bold and innovative approaches and develop research projects.” More often than not, these “Ideas Labs” produce new collaborations and research project proposals that often go on to be funded. The Ideas Lab is patterned after the Ideas Factory process.
“to make new connections, which are frequently cross disciplinary, and also generate novel research projects coupled with real-time peer review.”
This NSF Ideas Lab has several purposes, but the one most pertinent to this community is finding new ways, and developing research proposals, to infuse computational thinking, literacy and competency into the core curriculum for undergraduate education.
Individuals apply to the Ideas Lab. The application is a 2-page proposal and is DUE FEBRUARY 4 (next Tuesday). Funding is provided for the trip. These Ideas Labs are excellent ways to meet and discuss genomics, biology and education, to build new collaborations, and to develop new research proposals.
The letter and more information (read the link):
A Dear Colleague Letter on the topic of “Preparing Applications to
Participate in Phase I Ideas Labs on Undergraduate STEM Education” [NSF
14-033] has been posted on the NSF web site.
If you have any questions, you can ask here or by email (wlathe AT ). I am _not_ a project officer at NSF and don’t have all the answers, but I can direct you to the places you might find answers.
PLEASE feel free to disseminate!

The Thanksgiving Genomes

Happy Thanksgiving to those who celebrate. For those of you who don’t, have a nice Thursday.

Light posting this week due to the holiday, but this might be fun for you to keep in your back pocket for dinner discussion–genome information of the traditional foods.

The Genome of Your Thanksgiving Supper

The genetic sequences of the turkey, apple, potato, and other traditional Thanksgiving ingredients are providing bountiful lessons for scientists.

Public service announcement: CAFA2 for protein functional annotations

Just got this email on the Biocurators mailing list, wanted to spread the word:

Announcing CAFA 2: The Second Critical Assessment of protein Function Annotations

Friends and Colleagues,

We are pleased to announce the Second Critical Assessment of protein Function Annotation (CAFA) challenge. The goal of the challenge is to predict functional annotations of genes/proteins. In CAFA, the organizers provide a set of about 100,000 protein sequences, of which most are completely unannotated and some are partially annotated with respect to their function. The participants are asked to predict functional annotation of these proteins before January 15, 2014. At that time, all predictions will be stored and we will wait for 6-12 months until new annotations are available in the biomedical literature and/or major databases. The initial evaluation will be provided in July 2014, during the ISMB conference (Boston, MA). Anyone in the world is welcome to participate.

In brief:

Web site:

Prediction submission deadline: January 15, 2014

Initial evaluation: July 12, 2014 in Boston

All targets can be downloaded from the web site. The web site also contains training data; however, the participants are *not* required to use it and even if they do, they can use any additional data of their choice, including the literature. The CAFA challenge is different from many other similar challenges because not even the organizers know which of the original target sequences will be functionally annotated after the submission deadline.

The CAFA 1 experiment is described in the following paper:

P. Radivojac et al. A large-scale evaluation of computational protein function prediction. Nature Methods (2013) 10(3): 221-227.

A brief introduction to the problem for computer scientists is provided at:

The mission of the Automated Function Prediction Special Interest Group (AFP-SIG) is to bring together computational biologists who are dealing with the important problem of gene and gene product function prediction, to share ideas and create collaborations. We also aim to facilitate interactions with experimental biologists and biocurators.

We hope that AFP-SIG serves an important role in stimulating research in annotation of biological macromolecules, but also related fields.

New in CAFA 2:

In CAFA 2, we would like to evaluate the performance of protein function prediction tools/methods and also expand the CAFA challenge to include prediction of human phenotypes associated with genes and gene products. As the last time, CAFA will be a part of the Automated Function Prediction Special Interest Group (AFP-SIG) meeting that will be held alongside the ISMB conference. AFP-SIG will be held as a two-day meeting in July 2014 in Boston.

About the CAFA experiment:

The problem: There are far too many proteins for which the sequence is known, but the function is not. The gap between what we know and what we do not know is growing. A major challenge in the field of bioinformatics is to predict the function of a protein from its sequence (and all other data one can find). At the same time, how can we judge how well these function prediction algorithms are performing and whether we are making progress over time?

The solution: The Critical Assessment of protein Function Annotation algorithms (CAFA) is an experiment designed to provide a large-scale assessment of computational methods dedicated to predicting protein function. We will evaluate methods in predicting the Gene Ontology (GO) terms in the categories of Molecular Function, Biological Process, and Cellular Component. In addition, predictors may use the Human Phenotype Ontology (HPO) for the human dataset. A set of protein sequences is provided by the organizers, and participants are expected to submit their predictions by the submission deadline, January 15, 2014. The predictions will be evaluated during the Automated Function Prediction (AFP) meeting, which has been approved as a Special Interest Group (SIG) meeting, at the ISMB 2014 conference (Boston, USA).

History: The first CAFA experiment was conducted in 2010-2011. Twenty-three groups submitted fifty-four algorithms for assessment. The results and most methods were published in Nature Methods and in a special supplement in BMC Bioinformatics. CAFA 1 has brought together a large group of computational predictors and, for the first time, provided us with a clear picture of the state of this important field. As with other critical assessment experiments, the aim of CAFA is to improve protein function prediction by continuously challenging groups to develop more accurate methods.

How to participate in CAFA 2?

1. Go to

2. Download target proteins, already available

3. Submit predictions on or before January 15, 2014

4. Join us at the AFP-SIG, July 11-12, 2014 in Boston for the eighth protein function prediction meeting, to hear the CAFA 2 results, to present your work, and to learn about the latest research in computational protein function prediction

More details at:

Confirmed keynote speakers:

Fiona Brinkman, Simon Fraser University, Canada

Mark Gerstein, Yale University, USA

We look forward to hearing from you!

The CAFA organizing Team: Predrag Radivojac, Michal Linial, Sean Mooney and Iddo Friedberg
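To make the evaluation idea in the announcement a little more concrete, here is a minimal Python sketch of one way to score predicted GO terms against annotations that accumulate later: a threshold-swept F-measure. This is my own simplified illustration, not the organizers' scoring code. It ignores the structure of the ontology, and the target names (`T0001`, `T0002`), scores, and choice of GO terms are made up for the example.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Best F-measure over score thresholds (simplified, no ontology propagation).

    predictions: target -> {GO term -> confidence score in (0, 1]}
    truth:       target -> set of GO terms annotated after the deadline
    """
    # Only targets that actually gained annotations can be scored.
    benchmark = {target: terms for target, terms in truth.items() if terms}
    if not benchmark:
        return 0.0

    best = 0.0
    for i in range(1, steps + 1):
        threshold = i / steps
        precisions = []
        recall_sum = 0.0
        for target, true_terms in benchmark.items():
            scored = predictions.get(target, {})
            predicted = {term for term, score in scored.items() if score >= threshold}
            hits = len(predicted & true_terms)
            # Precision is averaged only over targets with a prediction at this threshold.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall is averaged over every benchmark target.
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(benchmark)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical submission for two targets, with confidence scores.
example_predictions = {
    "T0001": {"GO:0003677": 0.9, "GO:0003700": 0.4},
    "T0002": {"GO:0016301": 0.8},
}
# Hypothetical annotations that appeared after the submission deadline.
example_truth = {
    "T0001": {"GO:0003677"},
    "T0002": {"GO:0005524"},
}
example_score = fmax(example_predictions, example_truth)
```

The point of sweeping the threshold is that a method is not penalized for how it calibrates its confidence scores, only for how well it ranks its predictions.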

UCSC’s new Variant Annotation Integrator

In case you aren’t on the UCSC announcement mailing list, and you don’t go to the site via their homepage with the posted news–you should know about this new tool at the UCSC Genome Browser. It will take variations that you are exploring and make a prediction about whether the variant is associated with a function, and potentially if it is damaging to a protein. It’s under active development, so try it out. And if there are features you could use, suggest them. See the VAI page for more.

Here are the details via their email, but sign up for the “announce” mailing list if you’d like to get news like this in your inbox too:

[Link to the original at the mailing list site]

Hello all,

In order to assist researchers in annotating and prioritizing thousands
of variant calls from sequencing projects, we have developed the Variant
Annotation Integrator (VAI). Given a set of variants uploaded as a
custom track (in either pgSnp or VCF format), the VAI will return the
predicted functional effect (e.g., synonymous, missense, frameshift,
intronic) for each variant. The VAI can optionally add several other
types of relevant information, including: the dbSNP identifier if the
variant is found in dbSNP, protein damage scores for missense variants
from the Database of Non-synonymous Functional Predictions (dbNSFP), and
conservation scores computed from multi-species alignments. The VAI also
offers filters to help narrow down results to the most interesting variants.

Future releases of the VAI will include more input/upload options,
output formats, and annotation options, and a way to add information
from any track in the Genome Browser, including custom tracks.

There are two ways to navigate to the VAI: (1) From the “Tools” menu,
follow the “Variant Annotation Integrator” link. (2) After uploading a
custom track, hit the “go to variant annotation integrator” button. The
user’s guide is at the bottom of the page, under “Using the Variant
Annotation Integrator.”

As always, we welcome questions and feedback on our public mailing list:

Brooke Rhead
UCSC Genome Bioinformatics Group
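
If you want to try the VAI on a handful of variants, you first need them in one of the input formats mentioned in the announcement. Here is a minimal Python sketch that writes a bare-bones VCF from a list of variants. The function name `to_vcf` and the example coordinates are hypothetical, and only the fixed VCF columns are filled in. How the file then needs to be loaded as a custom track (pasted directly, or hosted in compressed and indexed form) depends on the Genome Browser's own custom track documentation, which this sketch does not cover.

```python
from typing import Iterable, Tuple

# (chromosome, 1-based position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def to_vcf(variants: Iterable[Variant]) -> str:
    """Return minimal VCF text with only the eight fixed columns filled in."""
    lines = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    # Sort by chromosome name, then position, so records are grouped per chromosome.
    for chrom, pos, ref, alt in sorted(variants, key=lambda v: (v[0], v[1])):
        # "." marks a missing value for ID, QUAL, FILTER and INFO.
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", "."]))
    return "\n".join(lines) + "\n"


# Made-up variants, for illustration only.
example_vcf = to_vcf([
    ("chr2", 200, "A", "G"),
    ("chr1", 100, "C", "T"),
])
```

Printing `example_vcf` (or writing it to a file) gives you something you can inspect before uploading anything.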


“Most viewed” item in figshare is….software training?

So if you go to visit figshare today, and you click the “Browse” link at the top, and then you select to sort by “most viewed” from the menu, what do you get?


Yes, for reasons I cannot explain, work that we’ve created or uploaded appears right at the top–the GenoCAD training we are developing, and a copy of the UCSC Genome Browser intro slides. Honestly–how we are beating “World Beer Consumption and Scientific Productivity” completely stumps me. I am rather pleased to see that the herring transcriptome is ranking so high too though.

I was joking on twitter the other night, though, that a #1 viewed rank and $3 will get me a cup of coffee at Dunkin’ Donuts. I’d love to see if this has any value in a grant situation, but I have no idea if it would. But it does make me wonder how and why this has happened. Is it really reflecting interest, or a need? Or is there some other way to interpret this?

Software training on genomics tools is a curious thing. A lot of people tell us how much they need this, and they appreciate the training which saves them lots of time in their work. We know we improve their awareness of what’s available, and their efficiency. At the last workshop we did at WashU, a woman in the back of the room emitted a huge sigh during Trey’s advanced UCSC section. Trey was worried that he’d confused her, but instead she said that in fact what he had just shown her saved her a ton of work. She was actually just incredibly relieved to learn what we could show her. And we see this a lot. But we have no way to measure that really.

But other times we find–say in grant situations–that software training isn’t scoring very high in the priority list. Yeah, it’s not novel and innovative enough I suppose. The people who need the training have no mechanism to push upwards really and express the need or quantify it. It’s kind of individual–you need what you need, when you need it. But it’s not an organized demand that we can point to in any way. Yet just a couple of weeks ago I attended a Software Carpentry training with 120 women who wanted better knowledge of software tools. Demand is there. I wish it was better recognized how important and useful it is.

I’m gonna go get a cup of coffee. And then make some more training. Go figure.


GenoCAD Tutorials. Mary Mangan, Mandy Wilson, Laura Adam, Jean Peccoud. figshare. Retrieved 16:33, Jul 08, 2013 (GMT).

World beer consumption & scientific productivity. Christopher Lortie. figshare. Retrieved 16:34, Jul 08, 2013 (GMT).

Introduction to the UCSC Genome Browser. Mary Mangan. figshare. Retrieved 16:42, Jul 08, 2013 (GMT).

Who’s your daddy?

A new article in Slate describes a case of non-paternity unearthed as a result of a 23andme scan.

Who’s Your Daddy?

The perils of personal genomics.


I expect a bit of chatter from the genoscenti. I’ll collect responses below if I see them. I agree that the actual studies of non-paternity show values that are all over the map. But I suspect that there are going to be a lot of people affected by this who didn’t see it coming. And many of those stories will be quiet and private, and won’t be widely known. Some will turn into Jerry Springer, perhaps.

But I know of cases where this has already had serious impact, like the woman who was thrown out of her tribe as a result of her DNA test. This is a very heated topic in some circles: Tribal Enrollment and Genetic Testing Resources.

Interesting times.

All I could think of was this:

Decoding Annie Parker: film about the BRCA hunt

I didn’t know that this film was even in the works.

I know there’s controversy over the patents, but you have to acknowledge that the underlying science was really important. And I’m rather pleased to see a woman scientist in film. Looking forward to seeing it somewhere.

Here’s the film website:


Hat tip David Bachinsky via twitter:


EDIT: I just wanted to add some information that’s breaking now about Angelina Jolie’s recent announcement of her double mastectomy due to her BRCA1 testing. And here’s a good piece about some context for that: A Cautionary Perspective On Angelina Jolie’s Double Mastectomy

Video Tip of the Week: My Cancer Genome

There are a lot of cancer database resources out there. Most of the ones we’ve focused on have been the data repository types: TCGA, ICGC, caBIG, COSMIC, Cancer Genome Workbench, UCSC Cancer Genomics Browser, and of course big repositories like GEO. Researchers will need these sources of data to locate key alterations in cancer cells and tissues, and to evaluate changes with treatment conditions. But these are possibly not the most useful places for clinicians faced with a specific sample, or for patients trying to understand their situations. As more and more tumor sampling data becomes available, direct and specific access to actionable pieces of information will be crucial.

The MyCancerGenome site aims to serve that actionable end of the data spectrum. It has been developing for a while, but the recent story in the New York Times reminded me of it: Variations on a Gene, and Tools to Find Them. So for this week’s Video Tip of the Week, I bring you a look at the My Cancer Genome resources. They have a nice intro video that I will include here. It highlights features that I wouldn’t have been able to access–the part that links patient records + mutations + the curated detailed pages about the mutations and relevant studies. The public has access to that last part, but you wouldn’t be able to see the electronic health record part from the public side.

Papers are coming out that describe the deposition of information into the MyCancerGenome site. You can learn more about the philosophy and strategy behind cataloging the clinically relevant somatic mutations in the recent paper about the DIRECT (DNA mutation Inventory to Refine and Enhance Cancer Treatment) project. A tab at that site shows you the initial data associated with that, from non-small cell lung cancer (NSCLC) mutations in the Epidermal Growth Factor Receptor (EGFR). And as more of this data comes along we’ll see it grow, of course. Seems a good step in translational medicine. So have a look at the useful and evidence-based information about specific cancer-related variations they are collecting.

Another feature is a search option to find clinical trials–by disease or by gene. I don’t think I’ve seen a gene-specific search for this kind of information before. This could be useful for people who need access to new treatment options if they have specific mutation data about their own tumors.

Have a look at My Cancer Genome, and think about where we are going with this data. I hope that the new cancer genomics data will really help drive appropriate and effective treatment strategies.

Quick link:

My Cancer Genome site:

NYT article: Variations on a Gene, and Tools to Find Them


Swanton, C. (2012). My Cancer Genome: a unified genomics and clinical trial portal. The Lancet Oncology, 13 (7), 668-669. DOI: 10.1016/S1470-2045(12)70312-1

Yeh, P., Chen, H., Andrews, J., Naser, R., Pao, W., & Horn, L. (2013). DNA-Mutation Inventory to Refine and Enhance Cancer Treatment (DIRECT): A Catalog of Clinically Relevant Cancer Mutations to Enable Genome-Directed Anticancer Therapy. Clinical Cancer Research, 19 (7), 1894-1901. DOI: 10.1158/1078-0432.CCR-12-1894

My Cancer Genome. 2013. (Accessed 4/30/2013).