Tip of the Week: File Format conversion

Many of us have worked with DNA and protein sequences in several different formats: GenBank and FASTA are two broadly used ones, but there are many others. Different tools and databases often require different formats. More often than not, converting from one format to another isn’t much of a problem: the database will do it for you, or there will be some help documentation. But this isn’t always the case. There are several ways to convert formats. For example, Galaxy has some limited ability to do this, and some databases let you export sequences in one of several formats, but often you’ll need a bit more help. ReadSeq is a publicly available software package (downloadable from that link) that will do just that. You could download and install it, but EBI also has a web interface for ReadSeq (as do some other services). Today’s tip is for those of you who are somewhat new to sequence formats (or even some of us who aren’t) and need a quick web interface for converting formats.
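If you’d rather see what a conversion actually involves, here is a minimal sketch in plain Python of turning a single GenBank flat-file record into FASTA. This is not how ReadSeq works internally; the function name `genbank_to_fasta` and the example record are made up for illustration, and it assumes one record with a single-line DEFINITION and a standard ORIGIN block. Real files have plenty of wrinkles this ignores, so treat it as a way to understand the formats, not as a replacement for a real converter.

```python
def genbank_to_fasta(genbank_text, width=60):
    """Convert one GenBank flat-file record to FASTA text (illustrative sketch)."""
    name = "unknown"
    description = ""
    sequence_parts = []
    in_origin = False

    for line in genbank_text.splitlines():
        if line.startswith("LOCUS"):
            # The second whitespace-separated field is the locus name.
            fields = line.split()
            if len(fields) > 1:
                name = fields[1]
        elif line.startswith("DEFINITION"):
            # Assumption: the definition fits on one line.
            description = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End-of-record marker.
            in_origin = False
        elif in_origin:
            # Sequence lines look like "        1 gatcctccat ...";
            # keep only the letters, dropping position numbers and spaces.
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))

    sequence = "".join(sequence_parts).upper()
    header = (">" + name + " " + description).rstrip()
    # Wrap the sequence at a fixed width, as FASTA files conventionally do.
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return header + "\n" + "\n".join(wrapped) + "\n"


# A made-up record, just to exercise the function.
example = """LOCUS       TEST0001                  12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Made-up example record.
ORIGIN
        1 gatcctccat at
//
"""

print(genbank_to_fasta(example))
```

Notice how much of the GenBank record (features, references, and so on) simply gets thrown away on the way to FASTA. That is part of why going in the other direction is harder.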

Treasure Hunts

Thought I’d recommend a little fun treasure hunt using GenBank. It’s a fun (if you are a biology and database geek like me) project that will hone some skills using GenBank, introduce you to a nice tool called ‘BLink’ and maybe find some interesting anomalies. It’s all outlined (in several blog posts) by Sandra at her blog “Discovering Biology in a Digital World” (great blog btw) in a post entitled “A general method and good student project for finding interesting anomalies in GenBank.” I’m having fun with it, and I’ll tell you (and her) if I find something.

Speaking of finding things, here are some interesting things I’ve come across randomly lately.

I came across this interesting book the other day. It isn’t about ‘databases’ or ‘genomics’ per se (ok, so not at all really), but it’s a look at what geological footprint humans and human civilization will have left on the Earth some 100 million years from now: what some alien traveler might find from this “Anthropocene” period we are in. Read more about it here and listen to the interview.

PLoS Computational Biology has a paper introducing a new database you might want to check out: mouseNet. As stated at the database site, it is a “functional network for laboratory mouse based on integration of diverse genetic and genomic data…to… predict novel functional assignments and network components.”

Wikification of GenBank

Speaking of GenBank’s 25th, a few weeks ago Science had a news piece, “Proposal to ‘Wikify’ GenBank Meets Stiff Resistance.” Apparently, those in the mycology research community have found many inaccuracies in the GenBank records and wish to see a change that would allow annotations to be made by the community:

a scheme like those used in herbaria and museums, where specimens often have multiple annotations: listing original and new entries side by side. It would be a community operation, like Wikipedia, in which the users themselves update and add information, but not anonymously.

But the idea is meeting resistance from GenBank’s managers:

GenBank's 25th Anniversary (Highlights)

I know the liveblogging is hard to read–it is really a reflection of my notes as the talks were progressing. I’m going to clean them up a bit, but mostly I’m going to leave them in case people need a quick summary of what was discussed. The videocasts for the talks are still going to be available, but they are unfortunately in giant many-hour chunks with no guidance as to what is in there exactly, or where in them you might find it.

So I’m going to highlight a few things here that I found especially interesting (and indicate where in the videocast you might find it). Of course, you may have other areas of interest and find other things you prefer. Feel free to watch all of them! Details of my choices below, and the approximate time on the video. You can move the slider to get to that approximate time point.

Liveblogging the GenBank 25th Anniversary II

I’m preparing to liveblog this event again today, internets permitting:

GenBank: Celebrating 25 years of Service at NCBI: http://www.tech-res.com/GenBank25/ official announcement.

The agenda is here: http://www.tech-res.com/GenBank25/agenda.html

View event:

You will be able to view the event at http://videocast.nih.gov when the event is live.

Will try to update as often as I can, if I have decent wireless and power.

Session Chair: Steven Salzberg.

Liveblogging the GenBank 25th Anniversary

I’m preparing to liveblog this event, internets permitting:

GenBank: Celebrating 25 years of Service at NCBI: http://www.tech-res.com/GenBank25/ official announcement.

The agenda is here: http://www.tech-res.com/GenBank25/agenda.html

Not being a married person, I didn’t know which one this was. I had to look it up. This is Silver. I can’t think of a decent gift, so I’m not bringing one. Maybe they are registered somewhere??

There is a link to a videocast of the event from the Celebration link, supposedly:

View event: You will be able to view the event at http://videocast.nih.gov when the event is live.
Air date: Monday, April 07, 2008, 9:00:00 AM

Will try to update as often as I can, if I have decent wireless and power.

Welcome remarks

Michael Gottesman: GenBank one of the major accomplishments of the NIH. Major reasons for success: 1. timely, visionary idea. Already a protein seq database (Dayhoff), need for nucleotides as well. 2. International cooperation from the beginning. Support from other US organisations as well. Stable foundation at NIH has been important. 3. Contributions of researchers providing the data have been a third key. 4. Technology improvements in sequencing and comparison algorithms. 5. Move from contract basis to NCBI/NLM provided stable support.

Demise of the NCBI Field Guide

For funding reasons, NCBI (home of PubMed, BLAST, dbSNP, OMIM and more) has cut their outreach staff, canceled all onsite training seminars and this has to mean decreased support for online help, documentation and tutorials.

When we wrote our NIH grant, one of the models of success in the bioinformatics training area that we highlighted was the NCBI Field Guide program. For those who may be unfamiliar with it, it is a set of training modules delivered by the outreach team at NCBI. They would come to your site, cover many NCBI tools and do hands-on workshops. Another course (Enhanced Field Guide) drew science librarians and other trainers together to train them, and those folks could go back to their institutions and offer more-and-better searches and training for their constituents. We thought the Guides were a terrific group of people who were interested in people getting their hands on the myriad tools at NCBI and using them effectively. It wasn’t really a competitive situation: their remit was only for NCBI tools, and there were plenty of others out there for us to do. In fact, many people who contacted us for training did so because their local users enjoyed the NCBI training and they wanted similar engagements for other tools.

Recently, though, the calls changed. We found we were getting calls from people who said they weren’t going to be able to get any more Field Guide trainings. NCBI is discontinuing the outreach program. Quite frankly, we were surprised. A sample of the notifications people were getting: http://www.library.uiuc.edu/blog/bicnews/archives/2008/02/ncbi_field_cour.html

Unfortunately, that tremendous training opportunity will NOT occur. Yesterday NCBI Field Guide coordinator, Peter Cooper, sent the following email:

Because of budgetary constraints, NCBI has made reductions in some of its programs, and the education programs are affected. In fact, all outreach education programs (Field Guide, Mini-courses, Structures, PubChem) are terminated effective immediately. At this point we cannot reschedule this course or accept requests for future courses of any kind. This was as much a surprise to me as it is to you. Feel free to contact me if you have questions.

The Field Course, as well as the Mini-Courses and the Structure course, has been tremendously popular and useful (see list of sites where the Field Course has been offered recently), but the NCBI budget situation will not allow NCBI to continue to travel and offer these courses for the foreseeable future.

(emphasis mine)

Here’s a link to a similar letter at another location: http://www.twu.edu/as/bio/NCBI/FieldGuide/

We’ve confirmed this with a number of people directly involved; they have laid off nearly all of the outreach team. Some got reassigned. There can hardly be anyone there to even answer emails to the helpdesk anymore—and they get lots of emails every day.

I’ve been through layoffs before, a few times. It actually feels like a punch to the gut when I hear about it anywhere else—especially among people I know. I expect layoffs at companies, though. But if there was any group that was solidly in place, going to be around for a long time, I would have thought it was the NCBI outreach team. I’m quite sorry to hear that it has been dissolved.

In this time of so many resources & so much need for increased understanding, outreach has become an integral part of a resource’s success – fewer instructional resources are an unfortunate consequence of decreased funding for science.

How well do we know our genes?

Do you have some favorite genes? Well, of course you do–you are probably a researcher who has in the past worked on some specific genes, or you are interested in groups of genes or genomic regions. Or maybe classes of genes. There is a new resource that provides you with a score of how well a given protein coding gene is annotated, and possibly therefore understood. The GCI, or Gene Characterization Index, can tell you. http://cisreg.ca/gci/

I love the idea of this project. The team wanted to look at the gene space and understand how well we knew the human genes. They looked at the growth of our knowledge over time, too–which provides an interesting view of our progress–as shown in this figure from their web site. And they wanted to identify the darkness–where don’t we know enough? Where are some great genes to examine, so that we can learn some really new things?

That’s the kind of project I wanted to do when I was still in academia. I thought you could build a whole lab and crank out students who get assigned an unknown gene, and it is their job over the next few years to analyze and understand the gene. It would be unbiased by a disease area vision, or by the lab director’s preconceptions of what the gene might do. They could try all sorts of techniques to get there. It is probably also entirely unfundable by grant agencies. Alas.

Breaking: Celebrate 25 years of GenBank

Just got this notice from the MGI mailing list:

The National Center for Biotechnology Information (NCBI) at the National Library of Medicine announces a two-day scientific meeting to celebrate the first 25 years of service by GenBank, the NIH nucleotide sequence database. We invite you to attend. The meeting will be held on April 7-8, 2008 in the Natcher Auditorium on the NIH campus in Bethesda, MD. Among the distinguished list of speakers are Rich Roberts, Sydney Brenner, Francis Collins, and Craig Venter.

Although registration is free, seating is limited. Early registration is recommended at: http://www.tech-res.com/GenBank25/

Looks very cool. Lots of people who have had impact on the work we do, and will continue to have impact.