I have repeatedly talked about the value of data and the need to keep it open. While the analogy to bioinformatics is not direct, recent discussions about OpenSocial are stressing the need to set the data free.
As Tim notes … “we want applications that can use data from multiple social networks”
Similarly, in the life sciences, we want to access information across the spectrum of data sources (data networks?), not through a different app on each network or for each interpretation or data “standard”.
We should keep the data free. We should have the capability to use data from one place and re-use it elsewhere. In the pharma industry, that could mean re-use across different parts of the same organization, especially in globally distributed ones. Tim also talks about “small pieces loosely joined”. That is another perfect use case for putting the power in the hands of those who learn how to harness the data without locking it in.
These are just some of the concepts that I cover in the oft-embedded slideshow on “searching science”.
Further reading:
TechCrunch
Technorati Tags: Open Data, Tim O’Reilly




3 Trackbacks
[...] In the article, Jim goes on to focus on two concepts, deep web mining and web 2.0, both subjects of much interest in these parts. I’ve referred to deep search in the past, and I completely agree that deep search has to be part of the biopharma intelligence effort, both internally and externally. That said, I think there is a lot of information that can be captured by mining the second part of the two concepts, what Jim calls web 2.0, and what I like calling the World Wide Web, and what will one day become known as the Giant Global Graph. Snarky comments notwithstanding, any pharma company not mining the world’s blogs, social bookmarking sites, etc. is doing itself a disservice. Services like Lijit are part of my toolkit as a strategist for a reason. Combined with various mashup tools, one doesn’t have to invest in too much enterprise IT to get tons of information. Of course, if we can combine mining the edges of the web with powerful, smart querying, entity extraction and analytics, then we can be that much smarter about the information we get and the knowledge we derive. [...]
[...] Update: A post by Kaitlin Thaney on the Science Commons blog confirms that Nature chose a cc-by-nc-sa license and suggests that Nature should have gone for an Attribution license. I take back what I said earlier about cc-by-nc-sa making sense. It doesn’t, and it goes against my own mantra that the value lies in the interpretation of the data and not the data itself. One thought (as a content publisher), and a question for people like Peter Suber, Peter Murray-Rust, Kaitlin Thaney and Bill Hooker. Should authors in journals be allowed to choose the license under which the work is published? Why or why not? [...]
[...] I have often talked about the value of data coming from what we do with it, rather than the raw data itself. It would appear that Google sees the incredibly useful 1-800-GOOG-411 service as a resource for gathering data to improve speech recognition and speech-to-text solutions (which most of us pretty much assumed anyway, so I don’t understand the paranoia). [...]