Category Archives: Genomics News

ENCODE floods the news networks…

My social media is abuzz with ENCODE publications and chatter right now. Some of the things I’d recommend (besides the huge collection of papers and Nature site, of course) or that made me laugh:

ENCODE project team leader Ewan Birney’s insights: ENCODE: My own thoughts

Guardian: Thousands of ‘genes’ found in parts of genome dismissed as junk DNA

Not Rocket Science: ENCODE: the rough guide to the human genome

NPR: Scientists Unveil ‘Google Maps’ For Human Genome

NYT: Far From ‘Junk,’ DNA Dark Matter Plays Crucial Role

NHGRI: RT @genome_gov: ENCODE, a multi-year effort by more than 440 researchers, has yielded astounding genomic insights.

NBC News: New DNA project shows us living beyond our genes

BBC Video: Human genome ‘more active than thought’

Guardian Video: What the Encode project tells us about the human genome and ‘junk DNA’ – video

BBC story: Detailed map of genome function

Science NOW: Human Genome Is Much More Than Just Genes

Ars Technica: Cataloging the controlled chaos of the human genome

Wired: New DNA Encyclopedia Attempts to Map Function of Entire Human Genome

CNN: DNA project interprets ‘book of life’

CBC: ‘Junk DNA’ has a purpose, new map of human genome reveals

The Telegraph: Worldwide army of scientists cracks the ‘junk DNA’ code

LA Times: ENCODE project sheds light on human DNA and disease

The Economist weighs in. (Hm.) The new world of DNA

Wall Street Journal: ‘Junk DNA’ Debunked

Gizmodo: The Human Genome Is Far More Complex Than Scientists Thought

Slashdot: ENCODE DNA Project: Big Data to Solve Genome Mysteries

Cosmos: Decade-long DNA project prompts ‘gene’ redefinition

Most bizarre title spin so far: “Occupy” comes to DNA: A genome for the 99 percent

Snorf: Everything??  Gigantic New Study Changes Everything We Knew About Human Genes

John Timmer (Ars Technica again): Most of what you read was wrong: how press releases rewrote scientific history

Maggie Koerth-Baker: ENCODE, the media, and what we really know about the human genome

Elizabeth Finkel: Aussie geneticist wins wager over junk DNA

Faye Flam: Skeptical Takes on Elevation of Junk DNA and Other Claims from ENCODE Project








Giggle II:

Giggle III:

GenomeTV from NHGRI:


And when you are ready to look around at the data yourself, do come back for our tutorials on ENCODE:

ENCODE Foundations (first tutorial on early ENCODE data):

ENCODE Data Available through the UCSC Genome Browser II:


[I'm going to keep this as an ongoing repository of items I'm seeing. May be edited frequently over the next week or so.]

Enjoying the 2012 NAR Web Server Issue & a Cup of Coffee

In hunting for something to feature for this week’s tip, I noticed that Nucleic Acids Research had released their 2012 Web Server Issue back in July. As many of you might be aware, the Nucleic Acids Research journal is a forum where developers can present computational biology papers that describe the development of biologically relevant algorithms, novel usage of existing algorithms, or that report the development of biological databases & their usage. The web server issue is an annual special issue focused specifically on web-based software resources for analysis and visualization of molecular biology data.

This year marks their 10th web server issue & I decided to check it out. In order to devote full attention to the issue, I began by pouring myself a big cup of coffee in one of my favorite mugs, which somehow makes it taste better. Then I set out to enjoy the issue – every year I always begin by reading the opening editorial & then the article on the bioinformatics links directory. The editorial usually explains special emphasis for the issue (this year it is analysis of next-generation sequencing data), and is written by the executive editor of the issue, Gary Benson. For me, the editorial sets the tone of the issue, so to speak.

Next I consume the directory article, along with a couple of sips of my java. What interests me in the article is multifold. First is the discussion of trends that they see in the development of tools and resources, which is important for us here at OpenHelix. Figure 6 provides an interesting look at the categories and counts of resources from each annual issue – I am curious as to why all but one category decline in 2008. Table 1 also provides interesting data on tool trends.

I am also interested in the content of the list itself – it is a great list being developed by people that we have a lot of respect for. I was especially interested in this sentence from their article:

“The Bioinformatics Links Directory has also initiated active curation of its content, removing dead content and correcting content errors, which has resulted in more accurate although occasionally smaller counts for 2012.”

The emphasis is mine in the quote above. In my opinion this is a very important aspect of any list. If you remember, Mary posted on the idea of “Obituaries for bioinformatics tools” and started a BioStar post to collect this information. The BioStar post generated significant comment &, judging from the comments, looks like it may have helped inspire the Bioinformatics Links Directory team. It makes sense that you need not just to collect information but also to keep maintaining and filtering it so that it remains relevant – I mean if the forest is cluttered with dead wood, the useful “live trees” (ok, resources) are obscured from users, right?

The problem is that keeping any list (or documentation or tutorials, etc.) up-to-date is a hard, labor-intensive activity. Here at OpenHelix we also keep a list of biology-relevant resources that can be searched through for free, without registering, from our homepage. We currently have a summer intern culling through a list of over 5,000 resources and tools that we know of. She is eliminating duplicate entries in our database by finding and collecting alternative URLs – it is amazing how many resources have multiple entryways, each with their own URL. But different doors don’t make a different resource or utility, so we eliminate them from our list. Then we will tackle the dead resources, the listings that just go to a tiny tool internal to a main resource, or to a pre-formatted PubMed search for something.
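To give a feel for the “different doors, same resource” problem, here is a minimal Python sketch of how the most trivial URL variants could be grouped automatically. It is not how our intern actually works (that is careful manual curation); the normalization rules, the function names `normalize_url` and `group_duplicates`, and the example.org entries in the note below are all assumptions made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Default page names that are assumed to be the same "door" as the bare path.
DEFAULT_PAGES = ("/index.html", "/index.htm", "/index.php")


def normalize_url(url):
    """Reduce a URL to a crude comparison key (assumed rules, illustration only).

    Expects a URL with a scheme (http:// or https://); without one,
    urlsplit puts the host name into the path instead of netloc.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    for page in DEFAULT_PAGES:
        if path.lower().endswith(page):
            path = path[: -len(page)]
            break
    # Scheme, query string and fragment are deliberately ignored here.
    return host + path


def group_duplicates(entries):
    """Group (name, url) pairs by normalized URL; keep only keys seen more than once."""
    groups = defaultdict(list)
    for name, url in entries:
        groups[normalize_url(url)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}
```

Fed a list such as `[("Tool A", "http://www.example.org/tool/"), ("Tool A (alt)", "https://example.org/tool/index.html")]`, both entries end up under the key `example.org/tool`. Anything subtler (mirrors, renamed domains, redirects) still needs a human looking at the pages, which is exactly why the job is labor-intensive.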

Creating AND maintaining a high quality list is not a trivial effort. In their paper the Bioinformatics Links Directory team describes remaining current as a “future challenge” and says:

“Although necessary to remain current and to advance the utility of the Bioinformatics Links Directory, these improvements will only prove useful if driven by the community. As a community-driven repository, everyone in the research or bioinformatics community has the opportunity to help make the collection better and more meaningful.”

I truly wish them better luck at “community curation” than many resources have had in the past, & hope they succeed. In our experience it works best with stable, sufficient funding because as they say: “you get what you pay for”.

OK, next post will be on actual resources in the web server issue, I promise! :)

Quick links:

2012 NAR Web Server Issue:

Bioinformatics Links Directory:

OpenHelix Homepage & Search Portal:

Gary Benson (2012). Editorial: NUCLEIC ACIDS RESEARCH ANNUAL WEB SERVER ISSUE IN 2012 Nucleic Acids Research, 40 (W1) DOI: 10.1093/nar/gks607

Michelle D. Brazas, David Yim, Winston Yeung, & B. F. Francis Ouellette (2012). A decade of web server updates at the bioinformatics links directory: 2003–2012 Nucleic Acids Research, 40 (W1) DOI: 10.1093/nar/gks632

Paul Nurse: Family Trees Can Be Dangerous

One of the points that I have always made about the advent of personal genomics was that we are going to find out some family secrets that have been under wraps for a very long time. This may not always be a bad thing. But there are going to be some cases where the participants may not be quite so prepared to handle the information. Here Paul Nurse tells a tale of ancestry and genetics that illustrates some of that complexity. It’s only 10 minutes–and it’s quite funny. Have a look.

Direct to the YouTube in case you want that:

RCSB PDB Webinar follow-up post: further questions?

The February 15th “How to use the RCSB Protein Data Bank” webinar may generate some questions that we won’t have time to cover or there may be further questions that arise. So we’ll leave this blog post open for follow-up on issues that arise from our discussion. Feel free to add your questions as comments to the post.

The slides from that webinar are available here:

RCSB PDB tutorial and training materials

If you didn’t have a chance to fill out the survey, here’s the link:

Questions from the webinar chat:

1.  We were asked about differences in what I was showing in my screenshots, and what people were seeing on RCSB PDB pages. There are two possible reasons that this might be.

The first reason is that not all structures are associated with all types of information. For example, if you compare the 2ara structure summary page to the 2arc structure summary page you will notice that the 2ara page is the apo form of the molecule and does not have a ligand, a “Ligand Chemical Component” area, or an area for External Ligand Annotation information. The 2arc structure is bound to its ligand, alpha-L-arabinose (or ARA) and so it does have both Ligand Chemical Component area and an area for External Ligand Annotation on its page. The information that you see on various RCSB PDB pages will depend on what is available for the specific entry that you are looking at.

The second reason is a bit more trivial, but equally important – RCSB PDB pages are constantly being updated, and so while the screenshots in our training materials do accurately represent the overall organization and features of resource pages, specific numbers, reference lists, etc. may change over time from what our screenshots show. We work closely with the RCSB PDB developers to keep our materials up-to-date with major software updates that would dramatically change functionality of the tools we present.

2. Another great set of questions that we got during the webinar is whether people could use our materials for their students, and where to download our free training materials on the RCSB PDB.

We would be delighted if our materials were used to teach students, or were presented at a group meeting, etc. – really any non-profit use of our freely available materials is fine by us – as long as you don’t remove our copyright notices. We often “train the trainer” in our live seminars or webinars, and those trainers then go out and further the reach of our materials by using them to train others at their institutions. And we’ve heard that our materials are great to use in a classroom because not only do we offer the modular tutorial movie that can be watched online from any browser, we also provide PowerPoint slides with the full script text, handouts for note taking, and an exercise document that can be handed out as a class assignment, or used for a classroom “walk through” discussion.

All that leads to the next part of the question, where can I download OpenHelix’s free training materials on the RCSB PDB? You can download materials free of charge, and without registering at our RCSB PDB tutorial landing page. Just click any of the buttons on the page to 1) launch the movie, 2) download any of our materials, or 3) visit the RCSB PDB resource.


Quick note: Misha Angrist interview on Skeptically Speaking tonight

Hey folks–just a quick reminder of an interview with Misha this Sunday evening (North America). Other times found here: Event Time Announcer. If you can’t make it, the recording goes up a bit later in the week and you can check it out then.

Here’s the blurb from Skeptically Speaking, on Google+ (and you can find me and Misha in that conversation too).

Live, Sunday at 6 pm MT, we’ll discuss DNA, genetics, and personal genomics with Dr. +Misha Angrist, Assistant Professor at the Duke Institute for Genome Sciences & Policy, and author of Here Is a Human Being: At the Dawn of Personal Genomics. Email questions to, or join us live in the chat!

To listen live go here: Skeptically Speaking #143 Here is a Human Being.

My “View from Your Window” photo in Casablanca

Just a fluff post, tangentially related to genomics because we were in Casablanca recently and it’s about a photo I took there :D.

I read Andrew Sullivan often. Every Saturday he has a “View from your window contest” where he posts a photo a reader has submitted that was taken from a window. Readers have till the next Tuesday to send him their guess.  He posts these photos daily and has a book of them, which he gives to the winner of the contest.

Well, on our recent trip to Morocco, I took a photo out of our hotel room window. I took one look at it and thought “VFYW contest!” So, I submitted it. I was pleasantly surprised to find that he used my photo this Saturday. I eagerly awaited the readers’ answers today, and wasn’t disappointed. There were several guesses, several of which got “Casablanca” and the right hotel. The winner actually got the right floor (6th).

Two ‘regrets’… one, in my email to him sent from my phone, it autocorrected “meets” to “moments”, so the sentence “my family moments me tomorrow to take a trip to Marrakech, Fes and the Sahara.” should read “my family MEETS me”. Oh well. And if I had taken that photo with an eye to the contest, I would have shifted a bit to the left so “TEGIC” didn’t show up :D. It was a giveaway.

Though, I can NEVER get these. Sometimes I get the right continent, but his readers are pretty good at plant species, car models, architectural history, Google Earth and a bunch more. I’m amazed at how well they guess every week.

World tour of workshops, recent stop: Morocco, Africa

Trainers & organizers

Last year I had the opportunity to give a workshop in Ifrane, Morocco (UCSC Genome and Table browsers, Galaxy) at Al Akhawayn University. This year, Mary and I returned for a longer 3-day workshop at University Hassan II in Mohammadia. OpenHelix was a co-sponsor of the workshop (donating our time, materials and expertise). The workshop covered a plethora of topics from a world tour of resources (tutorial-free) and introductory UCSC Genome Browser (tutorial-free) and ENCODE (tutorial-free) to genome variation analysis in dbSNP (tutorial-subscription) and analysis using Galaxy (tutorial-subscription). You can see the full schedule of topics here: Mohammadia Workshop Schedule (pdf).

As last year, we were impressed with the students (there were 117 total, about 50/50 gender ratio). English is their 3rd or 4th language in most cases, Moroccan Arabic, French or various African languages being their language of choice. Yet, they were attentive and asked very perceptive and fascinating questions. They were also very enthusiastic learners. It was a delight to teach them.

The workshop students

We’d like to thank Mohammed Bourdi at NIH, who spent large amounts of time and financial resources to organize this (and last year’s) workshop. We hope to repeat and expand these for next year and perhaps years to come. We will be looking for sponsors.

Several questions were asked at the workshop. We’d like to reiterate the answers here and seek some more answers from our readers:

* One student was looking for wheat genome resources for designing primers. The wheat genome is as yet incomplete, but there are some resources to get started:
Wheat Genome Sequencing Consortium
Gramene’s wheat resources
Wheat Genetic and Genomic Resource Center @ Kansas State
Perhaps also COGE for conserved sequences
edited to add:
CerealsDB and
James’ post on the wheat draft sequence might give some insight into that huge genome.
* Another student asked about dotplot tools:
Galaxy offers a large collection of EMBOSS tools including dotplot analysis, as does the EBI EMBOSS tool
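For anyone who wants to see the idea behind a dotplot before reaching for a real tool, here is a tiny Python sketch. It is only the textbook exact-match version, written for illustration: the function name `dotplot` and the `window` parameter are my own choices, and this is not a description of how the EMBOSS programs are implemented.

```python
def dotplot(seq_a, seq_b, window=1):
    """Return a text dotplot as a list of strings.

    Row i, column j holds '*' when the `window`-long word starting at
    position i of seq_b is identical to the word starting at position j
    of seq_a, and '.' otherwise. seq_a runs across, seq_b runs down.
    """
    rows = []
    for i in range(len(seq_b) - window + 1):
        word_b = seq_b[i:i + window]
        row = "".join(
            "*" if seq_a[j:j + window] == word_b else "."
            for j in range(len(seq_a) - window + 1)
        )
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Comparing a sequence with itself puts the main diagonal on display.
    for line in dotplot("GATTACA", "GATTACA"):
        print(line)
```

Shared regions show up as diagonal runs of `*`, and a larger `window` suppresses the background of chance single-letter matches. Real tools add scoring matrices, thresholds and graphics on top of this.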

* Another question concerned finding a ‘dynamic programming’ (optimal solution) multiple sequence alignment tool as opposed to a heuristic one. The difficulty is the size of the search space: an exact dynamic programming solution has to fill a table with one dimension per sequence, so the work grows exponentially with the number of sequences and quickly becomes too computationally intensive. This slide set might help with the understanding, particularly slides 1-5 and 17-22. That said, the student might want to check out MSAProps and this list at Wikipedia.

Do our readers have any other guidance on this?
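To make the scaling argument a bit more concrete, here is a small Python sketch. The first function is ordinary pairwise global alignment by dynamic programming (Needleman-Wunsch), scoring only; the second simply counts how many table cells the same exact approach would need for more sequences. The scoring values (match 1, mismatch -1, gap -1) and both function names are assumptions chosen for illustration.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of two sequences (Needleman-Wunsch).

    Uses a linear gap penalty and keeps only one previous row of the
    (len(a) + 1) x (len(b) + 1) table, since only the score is needed.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def dp_cells(num_sequences, length):
    """Cells in the exact DP table for `num_sequences` sequences of equal length."""
    return (length + 1) ** num_sequences


if __name__ == "__main__":
    print(nw_score("GATTACA", "GATTACA"))
    for n in (2, 3, 5, 10):
        print(n, "sequences of length 100:", dp_cells(n, 100), "cells")
```

Two sequences of length 100 need about ten thousand cells and three need about a million, but ten already need on the order of 10^20, and each cell has to consider more neighbouring cells as the number of sequences grows. That is why practical multiple alignment tools fall back on heuristics.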

Teaching moment

* Another student asked if we know how to find DC-area internships in biological sciences. Another student (a mathematician from Mali) was looking for something in the US in bioinformatics. Any ideas of programs to bring African biology students to the US or Canada?

If our Moroccan students (or anyone else) have any additional questions, please feel free to ask them here!


And a side note. Last year I had all of 3 hours to tour Fes. This year I took advantage of my trip. Mary and I spent a few days in Fes and Marrakech. My family joined us in Marrakech and later my family and I toured for 8 days visiting the Atlas mountains, the Sahara and Fes. Needless to say, it was a trip of a lifetime. Morocco is a fascinating and beautiful place. I look forward to visiting again.

Gates and doors of Fes are beautiful

camel excursion to the Sahara





What would bioinformatics professionals do with their personal genome? “I simply don’t want to know.”

Over the long holiday weekend I noticed an interesting item in my twitter feed. A number of people were pointing to the post entitled: My Genome Via E-mail by David Ewing Duncan. Some of you may be familiar with David’s writing and his big project called “Experimental Man”. He has been exploring all sorts of biomedical tests and investigations about his body, making him probably the case of personalized medicine with the most depth at this point.

Well, he has also taken to genomics as part of this, of course. And now he’s one of the people in the Personal Genome Project and has his full genome sequence in hand. Well, sort of. He has it, but he’s asking for guidance on what to do with it:

This is an appeal: Send me you ideas for how best to interpret my newly sequenced complete genome!

Now, as an exercise over a year ago I thought this through. I have no expectation of having my genome any time soon–but it’s a question people ask me and I thought it was fun to think about. I reviewed that post the other day and I still think that’s what I’d do:

1) Assessment and QC

2) Build a personal genome browser with various tracks, including a literature track for personally curating stuff interesting or relevant to me

3) Look closer at specific medically relevant genes. I know this is looking under the flashlight, but the most knowledge and anything actionable would probably be in this set. I’d also look specifically into family issues (like that allergy/eczema stuff I found in my 23andMe data) and try to learn things there.
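Just to show that steps 1 and 3 can start very small, here is a rough Python sketch. It assumes the genome arrives as variant calls in VCF-style tab-separated lines, which is my assumption and not something David’s post states; the function names, the choice of the transition/transversion ratio as the QC number, and the idea of passing in my own list of regions of interest are all illustrative.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def parse_vcf_snvs(lines):
    """Return (chrom, pos, ref, alt) tuples for simple single-base variants.

    Header lines (starting with '#'), blank lines, indels and
    multi-allelic records are skipped.
    """
    snvs = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        ref, alt = fields[3].upper(), fields[4].upper()
        if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT":
            snvs.append((chrom, pos, ref, alt))
    return snvs


def ti_tv_ratio(snvs):
    """Step 1 (QC): transitions divided by transversions, or None if no transversions."""
    ti = sum(1 for _, _, ref, alt in snvs if (ref, alt) in TRANSITIONS)
    tv = len(snvs) - ti
    return ti / tv if tv else None


def in_regions(snvs, regions):
    """Step 3: keep variants inside (chrom, start, end) regions, ends inclusive."""
    return [
        snv for snv in snvs
        if any(chrom == snv[0] and start <= snv[1] <= end
               for chrom, start, end in regions)
    ]
```

In use, I would read the file with `parse_vcf_snvs(open(path))`, look at `ti_tv_ratio(...)` as a first sanity check on the calls, and then hand `in_regions(...)` the coordinates of the genes I care about. Step 2, the personal genome browser with a literature track, is a much bigger project than a snippet.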

But I also thought I’d like to know what some of my peers in bioinformatics/genomics would do. As you may know if you follow this blog, we participate in discussions at BioStar. The participants here are active in genomics research around the world, and they are super-users of the tools of art in this field. Who better to ask? So I posted a question asking what they would do with their personal genome sequence. I offered my skeletal workflow as an example, and expected some thoughts on what they would do.

What would you do with your personal genome data? is my question over there.

To my surprise, the top rated answer at this time says this:

I may be in minority but I’ll say this: right now I simply don’t want to know – Did you ever notice how genomic variation never correlates with good news. It seems there is only bad news. There are no SNPs for happiness, friendship or love….

Um. Ok. That’s one way to approach this. I was surprised, really–I didn’t expect the question to become philosophical. I really wanted a workflow.

For most of the weekend the second rated answer was this:

Whatever you do with it should be up to you to decide, use it for your personalized medicine if you wish. So my sole recommendation is: Keep the data private and well protected and encrypted! Decide in an informed way, whom you grant access to them….

There were real concerns about the security and misuse of this data.

There are a couple of other interesting answers as well. I have to say it was fascinating. It wasn’t what I expected–but it was illuminating for me. I haven’t always been the most enthusiastic participant in the personal genomics debate, as I have real concerns about security and misuse of the information, and the current utility. But it’s certainly coming whether we are ready or not, and I really wanted to know what people would do with it in a concrete way if they had it. And I thought bioinformatics/genomics professionals would have the best leads on this.

“right now I simply don’t want to know”

I’m considering adding a bounty to my question over there. You can add some of your own points to the question for encouragement to obtain an answer. And I’ll still be the highest ranked identified female over there–so I can afford the points.

If you have some thoughts and want to join BioStar, and if you give me a decent workflow, I may award the points to you!  Anyone? Bueller? Bueller? Anyone?