There is a great discussion on Big Data today that I found on the twittosphere. Hat tip to Paul Blaser for the tweet that got my attention. I posted a comment over there, but as I was writing it I decided I wanted to bring it over here as well. (I also added some links here that I couldn't add over there, because without a preview I can't test them, and I hate posting links I haven't tested.)
I suspect that we actually agree on much of the concept. But as with a lot of things, I tend to think further downstream, about how the idea gets implemented on the ground. My thoughts on that are below, as I posted them in the comment over there.
Hmmm…I certainly agree with large chunks of this. But I don't agree that this should be the domain of some kind of data scientist. Or, more specifically: it does need their hands at some point, but I think it still needs to be accessible to the handful-of-genes bench biologists.
The idea of the multi-functional team is terrific, when it is possible. But we see a lot of people who are not getting that kind of support from their local "bioinformatics" club, for a couple of reasons. If you have some big-data folks on site, they have their own projects to worry about. They are not eager to hand-hold others on their way into the data. It's not their job. It's not what they are funded to do, and it doesn't help them with their next grant.
If you have some kind of dedicated bioinformatics core, the quality of the support varies widely: the kinds of things they do, the skills they have, and their interest in actually providing support.
We have seen some great examples. It seems to me, for instance, that the team at CHOP in Philly provides this kind of support: in-house tools for their researchers, bringing in the right outside tools to add more support, and training everyone up to some level so they are at least aware of what the tools can do. (Samples of CHOP tools, team, and training.)
On the other hand, we've been to some major institutions, many with "big data" projects, where the researchers are getting next to zero interaction with anyone who could help them. You'd be stunned if I told you who these people are.
Then there are those who don't even have a shot at this: people trying to keep up and write new grants with hot new data, on some Midwestern campus that really doesn't have anyone to ask. I once talked to a woman who needed a really simple thing out of the UCSC Genome Browser. It took me roughly 5 minutes to build the right query, pull the data out of the Table Browser, and hand it to her. I thought she was going to kiss me. She told me she had expected it to take her 6 months of benchwork.
I would hate to see this strategy create a tier of biologists who are nearly locked out of the data. It is also still eminently clear that we can throw a lot of big data at a project, but the crucial details require the "small people" to look closely at them. And many of them already feel excluded from the club.