download, mirror, fork

One of my favorite sessions at ScienceOnline’09 was the one on the Semantic Web in Science, moderated by John Wilbanks. In some ways this was the most traditional session at the event, but it worked, since John brings a level of credibility to the subject that few can, with his background in science, technology and policy. There was a lot of Q&A related to the Semantic Web in general, along with discussions around policy. But the meat for me was John’s talk itself. I believe that John has probably done the best job of articulating the role of the public domain for scientific data, and he brought a new twist to it in this talk, or at least it was the first time I’ve heard someone talk about data like this. That twist was the concept of download, mirror, fork, which makes so much sense that I am mad I hadn’t thought of it before. Perhaps it’s the success of GitHub and the Rails/Merb merger that has put this top of mind, but forking is not something data producers and consumers think about. What we are missing are some of the platforms and tools that make this process easier, but it’s clear that we are on our way to really understanding and maximizing the data commons. Recent posts by Mike Driscoll and Paul Miller speak to this path we seem to be going down. The Semantic Web is one core resource, but the concept of download, mirror, fork is what I will take away from this session.

Semweb Scionline09

This entry was posted in Big Data, Event, Informatics, Semantic Web.

4 Comments

  1. Posted January 20, 2009 at 02:45 | Permalink

    Deepak, it's not clear to me what is meant by “fork” in the context of open data. Do you mean, for example, to take raw microarray data under Scientific Commons, normalize it, run a biclustering algorithm on it, identify the most interesting biclusters, and publish that analysis data under Scientific Commons?

  2. Posted January 20, 2009 at 04:00 | Permalink

    Pretty much. Take data sets, modify them, perhaps update them, and then merge them back in. Look at it this way: almost everyone keeps a copy of RefSeq, etc., often with improvements in how they format the data or the kind of metadata they capture, but that never goes back into RefSeq.

  • Disclaimer

    All opinions on this blog are my own and do not reflect those of my employers, past or present