The web as platform: A science Data Commons
May 24, 2008
Cameron Neylon has a wonderful post on how we can build a Data Commons for the sciences. Cameron brings together two intricately interwoven concepts: the Data Commons and the tools required to record and process all this scientific information. To a degree it’s not too far from the WWW, where we have simple protocols connecting the pieces and tools (e.g. search engines) that bring it all together. For an open data web, the Semantic Web takes on a level of importance that most people don’t appreciate, but that’s not what this post is about.
Cameron proposes a model in his post. As he notes, repositories already exist for most data types and the majority are open. Where the Googles and Amazons can jump in is to enable these repositories, especially with next-gen sequencing and other data types pushing the scientific community’s knowledge and capabilities. Very rightly, though, he pushes the idea of long tail science, i.e. not repositories for structures, etc., but all the information we are streaming out of our labs. What will be the infrastructure that handles these data? The problem, as Cameron notes, is data capture and, perhaps most important, data re-use, for which capturing the associated metadata is critical, and having tools that allow you to consume the data is even more critical.
There are a lot more details in the post. My preference would be that these are driven by need and intention rather than by formal committees. The internet provides protocol standards, and the Semantic Web stack is essentially complete. In various scientific domains we have efforts on data formats and standards. As we start playing around with the data, the ones that resonate will bubble to the top. The key is to make sure that we as a community come together and realize that this needs to be done. The technology will follow.
Technorati Tags: Open Data, Data Commons, Scifoo, Cameron Neylon



