Web as platform: Bret Taylor on Open Data
April 9, 2008
I think my bullishness for FriendFeed just went up a notch after reading Bret Taylor’s blog. For those who don’t know, Bret is one of the co-founders of FriendFeed and an ex-Googler. The other day he started his first blog, and guess how he did so: his last project at Google was App Engine, and his first project after App Engine was released was to develop a blog platform deployed there (hence the appspot.com address). Anyway, apart from being impressed by his coding skills and experience, I was equally intrigued by his latest post, “We need a Wikipedia for data”. In it, Bret writes (all emphasis mine):
I think all of these barriers to data are holding back innovation at a scale that few people realize. The most important part of an environment that encourages innovation is low barriers to entry. The moment a contract and lawyers are involved, you inherently restrict the set of people who can work on a problem to well-funded companies with a profitable product. Likewise, companies that sell data have to protect their investments, so permitted uses for the data are almost always explicitly enumerated in contracts. The entire system is designed to restrict the data to be used in product categories that already exist.
He continues:
The interesting thing is, almost every internet company would benefit if this data were freely available. Most internet companies have embraced open source operating systems because every company needs an operating system, and no company wants their OS to be a competitive advantage - they just want it to work. I would argue we are all in the same boat with these factual data sources. No one really wants factual data accuracy and completeness to be their competitive advantage; we all want the best data possible to build the best products possible, and discrepancies in data quality are artifacts of the extremely inefficient economy of buying and selling data we currently live in. If everyone had the same, high quality data, all of our products would be better for it.
I could end this post here and say “I rest my case”, but there is one area where I differ. He argues that we should have a Wikipedia for data, a global database of data sources that anyone can use. I disagree. I believe that we should have a web of data (linked data, to be precise), with each data point and data set an addressable resource. In the comments some, including myself, mention Freebase and DBpedia. There is also Swivel, where you can upload datasets. Fellow Sci Foo attendee Aaron Swartz has theinfo.org, a resource for really large datasets. That’s all fine, but do we really want a centralized repository of data? Shouldn’t genomic data stay in GenBank and structural data stay at the PDB? If instead we just made our data open, and in formats that can be slurped up or used for the kinds of innovations that Bret talks about, that would be the ideal situation. Then we could use the data our way, act upon it, apply algorithms, etc.
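As a rough illustration of the "web of data" idea, here is a minimal Python sketch of data points as addressable resources that link to one another while staying in their home repositories. Everything in it is an assumption made for illustration: the example.org URIs, the field names, and the in-memory WEB dictionary that stands in for real HTTP lookups. It is not how GenBank, the PDB, Freebase, or DBpedia actually expose their data.

```python
# Hypothetical stand-in for the web: each URI maps to the record that its
# home repository would serve. All URIs and fields below are invented.
WEB = {
    "https://genes.example.org/gene/G1": {
        "name": "example gene",
        # A link is simply the URI of a resource hosted somewhere else.
        "encodes": "https://structures.example.org/entry/S1",
    },
    "https://structures.example.org/entry/S1": {
        "name": "example protein structure",
        "method": "X-ray diffraction",
    },
}


def dereference(uri):
    """Return the record for a URI (a real client would issue an HTTP GET)."""
    return WEB[uri]


def is_link(value):
    """Treat a string value as a link if it is the URI of a known resource."""
    return isinstance(value, str) and value in WEB


def crawl(start_uri):
    """Follow links breadth-first and return the visited URIs in order."""
    seen = []
    queue = [start_uri]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        for value in dereference(uri).values():
            if is_link(value):
                queue.append(value)
    return seen


if __name__ == "__main__":
    for uri in crawl("https://genes.example.org/gene/G1"):
        print(uri, "->", dereference(uri)["name"])
```

The point of the sketch is that no central copy is needed: the gene record and the structure record live under different (hypothetical) hosts, and a consumer assembles what it needs by following links.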
What do you think? Do we need a Wikipedia of data? Or do you think that the web itself should be our open data commons?



