Recently, I ran into a situation that demonstrated, at least to me, how far we have to go in opening up scientific data. I won’t give any specifics, but it boiled down to this. When asked whether a particular, rather interesting data set could be opened up to the public so that others could probe it with novel algorithms, and perhaps even help clean up the raw data, the project PI essentially said that a lot of money had been spent on the project, money awarded to him, so why should the data be made available? (Eventually they will need to make the data publicly available, since the project is taxpayer funded.) That really got me thinking. The PI has a point: there are certain discoveries his group would like to make, since the data is the result of their hard work, and it’s fine that they get that chance. On the other hand, there is so much data here that, for the good of science, it would be beneficial to make it available to the public, so that others can probe the parts that might not be of interest to the group that holds the grant to collect it.
And here is where we need work, but perhaps more importantly a change in outlook. I don’t think that most people are out to steal someone’s ideas (unfortunately, the reward system does encourage scientific exclusivity, as it were), but there need to be mechanisms in place that help with attribution, which in the end is what the PI cared about most. Attribution not necessarily to just one person, but perhaps to multiple people (data producers, algorithm developers, etc.). But the data needs to be out there. Of that I am convinced. The PI was concerned that making the data available to all would be a mostly meaningless exercise: his people would have to put in the work to release it, but no useful work would be done on the data, since it is not easy to work with. I would argue that while that is always a possibility, there is enough evidence that more eyeballs, especially from those genuinely interested in solving problems, only help in the long run.
The situation is not black and white, especially in this case, since the data collection is non-trivial. But it is an example of where the community could come together on informal (or perhaps semi-formal) guidelines and practices around data sharing and attribution, perhaps in the framework of Creative Commons or some other mechanism, so that we can all become more comfortable moving forward. The encouraging sign was that this reluctance to share data seemed to be generational. Others in similar projects seemed to be far more willing, and in fact rather eager, to share data.
Open Data Commons (http://www.opendatacommons.org/) is leading the charge in providing the legal tools to open up data. These tools are specific to databases, which have some special protection under, e.g., EU law.
Interesting post! I can see both sides, but I believe that information wants to be free. I would be curious whether the funding agency is OK with the data not being released, especially if the funding is based on publishing to help the scientific community at large. The sad part is that if the data is truly helpful, then more time and money will have to be spent generating it again.
Resisting openness