Recently there was much buzz in the #bioinformatics twittersphere over this blog post by Sean Eddy: “The next five years of computational genomics at NHGRI.”
It is a very nice post about some exciting prospects for the future. The idea of planning “explicitly for sustainable exponential growth” is wise. There will be no abatement of the flow of data at this point–it’s no longer a big bolus of one species’ data, or one type of project. The taps are wide open now, and we just keep adding more taps.
I also love the idea of “democratization”. In part, it includes:
….To enable individual investigators to make effective use of large datasets, we must create an effective infrastructure of data, hardware, and software. NHGRI has extensive experience in big data, and can lead and catalyze across the NIH….
Now, I know this is a snippet of some thoughts–there may be more to it in the actual planning meetings on this. But it pushed my buttons because it sounds a lot like what we always hear about big data projects: build it and they will come.
It got a little better in another segment:
Spur better software development. Traditional academia and funding mechanisms do not reward the development of robust, well-documented research software; at the same time, the history of commercial software viability in a narrow, rapidly-moving research area like computational genomics is not at all encouraging….
Well-documented research software. Sigh. We probably read more documentation than most people. And even the good documentation can be brutal. Dated. And not particularly effective. But still–if nothing else, please reward time spent on documentation….
But what is missing for me from this–and not just this, but most of these big data types of projects–is a real commitment to outreach and support for end users. Formal, organized, supported, rewarded, outreach. Sometimes there is a place to write to with questions. But we probably send in more questions to projects than most people too–and the success rate for answers varies widely. But even when we get good answers–that’s not enough.
I know funding is hard. We can’t fund everything. Database and software projects have to struggle to even persist. Curation is frequently not valued enough. And often curators are expected to do outreach as just one of their tasks…which pushes outreach even further down the priority list. But without dedicated outreach–formal, quality, active outreach–databases and software projects won’t have as many users, and even fewer effective users. Which will make funding agencies wonder if they should keep supporting them. Which…well, you can see where this spiral goes….
What bugs me, I guess, is essentially this: Nobody speaks for the end users. There’s no one in these types of meetings who really speaks for the consumers of this software and this data. I mean people who aren’t directly attached to the data production and management. The project teams think they are thinking about the users. They really want users. But ur not doin’ it rite.
I would like to see outreach and end user support valued, required, and really done right. No matter how much hardware and documentation you throw at these projects, if people 1) don’t know it exists, and 2) have no idea how to use it, the project will not yield all the results that it could. A marker paper is nice. But it’s not sufficient, folks. And it’s nice to have the high-end team members talk at conferences. But that reaches only a tiny subset of the users or potential users. And another thing about that: a lot of times people are hesitant to ask what sound like naive questions to the high-end representatives of these projects. I’m jes’ sayin.
Yes, this is fairly self-serving for me to say. But we see the users when we do outreach. They crave it. They love it. We’ve been lucky to be a part of some great projects that do outreach right. We have seen it work. It should be Standard Operating Procedure on software and database projects. Not an afterthought.