We train on publicly available databases and resources. For our purposes in deciding when to develop training, the definition of “publicly available” is relatively straightforward: Can the academic researcher access the data without cost or license restriction? If the answer is yes, our next step is to determine whether we can develop training materials based on the resource without cost or license restriction, and to ask the providers specifically for permission to do so. We ask permission for several reasons: to let the developer know what we are doing, to verify the restrictions or lack thereof, to build good relationships, etc.
There has been an attempt to define “open” using Creative Commons licenses.
Science Commons has a FAQ on how a database can apply a Creative Commons license, giving databases and biological resources a single, standardized, more easily understood definition of publicly available. Even that can be a bit complicated, as Ethan Zuckerman of “My Heart’s in Accra” states:
…a wonderfully complex FAQ on applying Creative Commons licenses to databases – the first question read “Can a Creative Commons license be applied to a database?” After a six paragraph answer to that question, the third question read, “So, a Creative Commons license can be applied to a database?”
…the complexities of asking scientists to release their data under Creative Commons licenses was so severe that Science Commons has ended up advocating for data to be released public domain, under the auspices of their protocol, instead.
Science Commons has found that opening the data is not quite that simple and the criteria across databases and resources can be quite different. Their goal is to make it simpler and thus more open.
Melanie Dulong de Rosnay has been doing research in this very area: “how open is the data”. Her work to date can be seen in this Nature Precedings article, which outlines which databases fit the criteria of technically and legally open access by determining the following:
1. The website provides a file transfer protocol or a link to download the whole dataset without registration. The ability to download the whole dataset without registration constitutes the double requirement to be considered as technically accessible.
2. TECHNICAL RESTRICTION: the database can be accessed only through registration, batch or query-based system. Technical accessibility is not achieved.
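To make the technical half of the test concrete, here is a minimal sketch in Python of the two criteria above as I read them. The class, field, and function names are my own invention for illustration, not anything from the article, and the two example entries are made-up placeholders rather than real databases.

```python
from dataclasses import dataclass


@dataclass
class DatabaseAccess:
    """Hypothetical record of how a database lets users get at its data."""
    name: str
    whole_dataset_download: bool   # FTP site or link to download the whole dataset
    requires_registration: bool    # must the user register first?


def is_technically_accessible(db: DatabaseAccess) -> bool:
    """Both conditions must hold (the "double requirement"):
    the whole dataset can be downloaded, and no registration is needed."""
    return db.whole_dataset_download and not db.requires_registration


# Made-up examples, not real databases.
open_db = DatabaseAccess("ExampleOpenDB", whole_dataset_download=True,
                         requires_registration=False)
query_only_db = DatabaseAccess("ExampleQueryDB", whole_dataset_download=False,
                               requires_registration=False)

for db in (open_db, query_only_db):
    status = ("technically accessible" if is_technically_accessible(db)
              else "technical restriction")
    print(f"{db.name}: {status}")
```

Note that this only covers technical access; whether a database is also legally open is a separate question that this sketch does not attempt to answer.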
The list does have some other issues*, but for the most part it is a great start.
Even if Ensembl, GeneID and GOA are incorrectly eliminated as databases that fulfill the protocol (and the jury is out on this; I could very well be reading it incorrectly), this is a great start. I am hopeful that it will lead to more standardization for database openness and get database developers thinking along those lines.
The researchers have their work cut out for them in building and maintaining this list. There are over 2,000 publicly available databases, and these are changing constantly. The databases themselves change, and new ones are born and often fade away. Even if the researchers cover only a small percentage of the available databases, the list is starting the discussion. Already I’ve been looking at several databases, such as the UCSC Genome Browser, to see if they fit the Science Commons Protocol (yes, in my estimation :).
**HUGE hat tip to Bora and Blog around the Clock for pointing this all out.