Category Archives: General Science

Official Launch of the new Database of Genomic Variants (DGV)

The Database of Genomic Variants (DGV) has been working on a new site for a while. It has been available as a beta site so people could get used to it and kick the tires, but now it’s ready for prime time. They are retiring the existing site and moving to the new version.

As a public service announcement, I’ll paste the text of their email notice here. We’ll update our tutorial soon–we like to give new sites a bit of time to work out the bugs, but then we’ll rework our materials as soon as we can.



We would like to announce the official launch of the new Database of
Genomic Variants! The latest release includes a number of updates and
corrections to the current data, and this completes the transition
from the original and the Beta
version to the new Database of
Genomic Variants.

We will now host only one version of the database, and the original
site will be retired. We will continue to provide a track of the
original DGV data in the genome browser (gbrowse) which will be
searchable and include details from the original variant details
page. Any links from third party sites and software which use the
“VariationID” to point to the original DGV genome browser or
variant details page will be automatically redirected to the
corresponding entry in our new database. This will ensure that all
data (new and old) will be fully available to users.

We will work with the various partners and websites that provide
links to the original data to update the content to reflect the
information available in the new site.

With this final update, we have included a total of 53 studies,
representing all of the fully curated and accessioned versions of the
original studies, in addition to 10 new datasets. There are a number
of changes to the content and format, and we have summarized these in
the newsletter, which is available at

We have provided additional details in our FAQ and Training Resources
pages including an updated tutorial which can be found at

As always, we appreciate your feedback, and if you have any questions
or concerns please contact us at for help and

Best regards,

The DGV development team

The Database of Genomic Variants

Tip of the Week: Prezi and other nonlinear presentation methods

Though not specifically about genomics, today’s tip of the week will hopefully help you find alternative methods for presenting science that might at times work better than a straight linear slideshow presentation. I have a quick intro above to Prezi, but also see the Prezi site for Prezi presentations about Prezi use. Prezi (and other similar programs) is not for every presentation, but it does offer an alternative for some types of presentations that would work better in a less linear style than a slide show (Wikipedia article). At the very least it can be a nice change after a slew of slide presentations... something to grab the audience’s attention. Of course, as with slide shows, you don’t want to get their attention because it’s a bad presentation.

What is a Prezi (or similar method)? Basically, a Prezi is somewhere between a slide show and a whiteboard. It’s a canvas that is zoomable. That gives it some advantages (and disadvantages) that slide shows don’t have. First, look at some of the best Prezis of 2012 (as decided by Prezi :) to get an idea of what can be done. Here is one on the 2008 election results.

Prezis are shareable in the cloud (or private, or semi-private) and you can work in groups to edit them. Also, you can present them online or download a file that allows you to present without an internet connection. I’ve found that the file has worked for me regardless of OS or computer.

Prezi is free if all you want to do is create, edit and share presentations, but if you want to create private presentations, use your own logo, etc., it will cost anywhere from 5 to 13 dollars a month.

There are alternatives to Prezi that are free or open source. I personally like Prezi best: it has the most features and is simple to use, but there are some others that are good. My favorite of the alternatives is Impress.js (downloaded software) and the editor Impressionist (can do 3D!). So if you like the idea of a Prezi-like presentation but want a more open alternative, that might be one to try.

Who’s your daddy?

A new article in Slate describes a case of non-paternity unearthed as a result of a 23andme scan.

Who’s Your Daddy?

The perils of personal genomics.


I expect a bit of chatter from the genoscenti. I’ll collect responses below if I see them. I agree that the actual studies of non-paternity show values that are all over the map. But I suspect that there are going to be a lot of people affected by this who didn’t see it coming. And many of those stories will be quiet and private, and won’t be widely known. Some will turn into Jerry Springer, perhaps.

But I know of cases where this has already had serious impact, like the woman who was thrown out of her tribe as a result of her DNA test. This is a very heated topic in some circles: Tribal Enrollment and Genetic Testing Resources.

Interesting times.

All I could think of was this:

Protip: check the genome of your cell line. HeLa cells are “strikingly aberrant”

This is a paper I’ve been waiting for: the analysis of the HeLa genome. I was aware of a lot of issues with the cell lines and missing or duplicated regions from the ENCODE data that was coming along some time ago: Mining the “big data” is…fascinating. And necessary.

People may be familiar with HeLa cells even if they aren’t in biomedical research because of the great book by Rebecca Skloot: The Immortal Life of Henrietta Lacks which explored the history of these cells and the woman whose terrible cancer led to their existence.

But there were many discussions over the years about how different these cells are from actual tissues, and concerns over how representative they are for actual human research issues. Here are some:

So a new paper has been published that explores this–and it’s at the top of my reading list for later today.

Here’s the paper itself: 

Hat tip Ward Plunet via twitter:
RT @WardPlunet: Havoc in biology’s most-used human cell line: Genome of HeLa cells sequenced for the first time .

Update: A piece from one of the paper’s authors:


Landry JJM, Pyl PT, Rausch T, Zichner T, Tekkedil MM, Stütz AM, Jauch A, Aiyar RS, Pau G, Delhomme N, Gagneur J, Korbel JO, Huber W, & Steinmetz LM (2013). The Genomic and Transcriptomic Landscape of a HeLa Cell Line. G3 DOI: 10.1534/g3.113.005777

“The Revisionaries” and the Texas Textbook Massacre

I wrote about this film when I saw it at a local festival, but I wanted to alert you (well, the US readers) that it’s going to be shown on PBS soon.

It will be on the Independent Lens show. More here, trailer, etc:

Check your local listings here: and set your DVR. You have to see how this played out, and watch out for it in your own community.

Here’s the original trailer:

The Revisionaries Trailer from Naked Edge Films on Vimeo.

Hat tip Scott Johnson on G+:

Rare photo of me in the wild….

Of downtown Boston, at Tufts Medical Center, singing the praises of IMG and the Integrated Microbial Genomes resources.

I love workshops that only require a trip on the Orange Line.

Today we were doing the World Tour of Genomics Resources. Tomorrow it is UCSC Genome Browser (intro + advanced), and Thursday ENCODE. So if you want to workshop vicariously you can check out all of our tutorials on those. The slides, handouts, and exercises are all over there for you to download if you’d like.

As much as I love the online training and webinars and all, you really do get important information about the needs of folks in the room that you just don’t really get from the intertubz, and I do like to do the material live.

Enjoying the 2012 NAR Web Server Issue & a Cup of Coffee

In hunting for something to feature for this week’s tip, I noticed that Nucleic Acids Research had released their 2012 Web Server Issue back in July. As many of you might be aware, the Nucleic Acids Research journal is a forum where developers can present computational biology papers that describe the development of biologically relevant algorithms, novel usage of existing algorithms, or that report the development of biological databases & their usage. The web server issue is an annual special issue focused specifically on web-based software resources for analysis and visualization of molecular biology data.

This year marks their 10th web server issue & I decided to check it out. In order to devote full attention to the issue, I began by pouring myself a big cup of coffee in one of my favorite mugs, which somehow makes it taste better. Then I set out to enjoy the issue – every year I always begin by reading the opening editorial & then the article on the bioinformatics links directory. The editorial usually explains special emphasis for the issue (this year it is analysis of next-generation sequencing data), and is written by the executive editor of the issue, Gary Benson. For me, the editorial sets the tone of the issue, so to speak.

Next I consume the directory article, along with a couple of sips of my java. What interests me in the article is multifold. First is the discussion of trends that they see in the development of tools and resources, which is important for us here at OpenHelix. Figure 6 provides an interesting look at the categories and counts of resources from each annual issue – I am curious as to why all but one category decline in 2008. Table 1 also provides interesting data on tool trends.

I am also interested in the content of the list itself – it is a great list being developed by people that we have a lot of respect for. I was especially interested in this sentence from their article:

“The Bioinformatics Links Directory has also initiated active curation of its content, removing dead content and correcting content errors, which has resulted in more accurate although occasionally smaller counts for 2012.”

The emphasis is mine in the quote above. In my opinion this is a very important aspect of any list. If you remember, Mary posted on the idea of “Obituaries for bioinformatics tools” and started a BioStar post to collect this information. The BioStar post generated significant comment & looks like it may have helped inspire the Bioinformatics Links Directory team, judging from the comments. But it makes sense that you need not just to collect information but also to keep maintaining and filtering that data so that it remains relevant – I mean if the forest is cluttered with dead wood, the useful “live trees” (ok, resources) are obscured from users, right?

The problem is that keeping any list (or documentation or tutorials, etc.) up-to-date is a hard, labor-intensive activity. Here at OpenHelix we also keep a list of biology-relevant resources that can be searched through for free, without registering, from our homepage. We currently have a summer intern culling through a list of over 5,000 resources and tools that we know of. She is eliminating duplicate entries in our database by finding and collecting alternative URLs – it is amazing how many resources have multiple entryways, each with their own URL. But different doors don’t make a different resource or utility, so we eliminate them from our list. Then we will tackle the dead resources, the listings that just go to a tiny tool internal to a main resource, or to a pre-formatted PubMed search for something.
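
For the curious, here is a rough Python sketch of how that kind of alternative-URL matching could be partly automated. It is not the process our intern actually uses. The normalization rules (ignoring the scheme, a leading "www.", trailing slashes, and index pages), the function names `normalize_url` and `group_duplicates`, and the example.org addresses are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical list of "front door" page names to ignore when comparing URLs.
INDEX_PAGES = ("/index.html", "/index.htm", "/index.php")


def normalize_url(url: str) -> str:
    """Reduce a URL (with a scheme) to a rough comparison key.

    Assumption: lowercasing the whole URL is acceptable for matching,
    even though paths can be case-sensitive on some servers.
    """
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    for page in INDEX_PAGES:
        if path.endswith(page):
            path = path[: -len(page)]
            break
    return host + path


def group_duplicates(urls):
    """Return only the keys that more than one listed URL maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


listed = [
    "http://www.example.org/",
    "https://example.org/index.html",
    "http://example.org/tool",
]
duplicates = group_duplicates(listed)
```

A pass like this would only flag candidates. Someone still has to look at each group and decide whether the different doors really lead to the same resource.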

Creating AND maintaining a high quality list is not a trivial effort. In their paper the Bioinformatics Links Directory team describes remaining current as a “future challenge” and says:

“Although necessary to remain current and to advance the utility of the Bioinformatics Links Directory, these improvements will only prove useful if driven by the community. As a community-driven repository, everyone in the research or bioinformatics community has the opportunity to help make the collection better and more meaningful.”

I truly wish them better luck at “community curation” than many resources have had in the past, & hope they succeed. In our experience it works best with stable, sufficient funding because as they say: “you get what you pay for”.

OK, next post will be on actual resources in the web server issue, I promise! :)

Quick links:

2012 NAR Web Server Issue:

Bioinformatics Links Directory:

OpenHelix Homepage & Search Portal:

Gary Benson (2012). Editorial: NUCLEIC ACIDS RESEARCH ANNUAL WEB SERVER ISSUE IN 2012 Nucleic Acids Research, 40 (W1) DOI: 10.1093/nar/gks607

Michelle D. Brazas, David Yim, Winston Yeung, & B. F. Francis Ouellette (2012). A decade of web server updates at the bioinformatics links directory: 2003–2012 Nucleic Acids Research, 40 (W1) DOI: 10.1093/nar/gks632