One of my favorite sessions at ScienceOnline’09 was the one on the Semantic Web in Science, moderated by John Wilbanks. In some ways this was the most traditional session at the event, but it worked, since John brings a level of credibility to the subject that few can match, given his background in science, technology and policy. There was a lot of Q&A about the Semantic Web in general, along with discussion of policy. But the meat for me was John’s talk itself. I believe that John has probably done the best job of articulating the role of the public domain for scientific data, and in this talk he brought a new twist to it, the first time, at least, that I’ve heard someone talk about data this way: the concept of download, mirror, fork. It makes so much sense that I am mad I hadn’t thought of it before. Perhaps it’s the success of Github and the Rails/Merb merger that puts this top of mind, but forking is not something data producers and consumers usually think about. What we are missing are the platforms and tools that make this process easier, but it’s clear that we are on our way to really understanding and maximizing the data commons. Recent posts by Mike Driscoll and Paul Miller speak to the path we seem to be heading down. The Semantic Web is one core resource, but the concept of download, mirror, fork is what I will take away from this session.
All opinions on this blog are my own and do not reflect those of my employers, past or present.



4 Comments
Deepak, it's not clear to me what is meant by “fork” in the context of open data. Do you mean, for example, to take raw microarray data under Scientific Commons, normalize it, run a biclustering algorithm on it, identify the most interesting biclusters, and publish that analysis data under Scientific Commons?
Pretty much. Take data sets, modify them, perhaps update them, and then merge them back in. Look at it this way: almost everyone keeps a copy of RefSeq, etc., often with improvements in how they format the data or in the kind of metadata they capture, but those improvements never go back into RefSeq.