The current meme seems to be “data to rule them all.”
There are a bunch of blog posts out there discussing the importance of data, all pretty much screaming out for open data. First, a couple of posts on Nodalities. In A data-centric view, Zach Beauvais talks about a rather fascinating blog post over at Flickr, where they have used the vast amount of geotagged data to generate mostly accurate contours for various places. The part that Zach homes in on is the fact that the folks at Flickr didn’t go around planning this project. It sort of fell out of the data available to them. Rather, “they re-used data they were already capturing, and brought out something very interesting indeed.”
This is why open data in science is so important. Not everyone gets the same ideas when presented with the same information. We are all capturing data. Why not let others think about that data and do something creative with it? As Zach says:
The good stuff is where the data are
In another post on network effects, Justin Leavesley talks about direct and indirect network effects, especially in the context of cloud computing and Platform-as-a-Service. That’s not the part that I want to talk about, though. The interesting bit was this line:
The data is where the users are, the software is where the developers are
It’s a somewhat obvious statement, but it also speaks to what users care about, and to why people developing software need to think about the data and realize that our users, scientists of all types, like to look at the data in ways that allow them to see it from their own perspective. They don’t necessarily want to spend time generating the data, or using complex tools to come up with the stats, or dynamics trajectories, or predicted structures, or what have you. Rather, they need the ability to make those results their own in a manner that allows them to do what they do best: work out what the data are telling them and how to plan the next experiment (whether in a wet lab or in silico).
All this brings us to Cameron’s post on data, code and the well-posed question. Cameron wonders if the following is appropriate:
A well posed question is one which, given an appropriate dataset, can be answered by easily prepared and comprehensible code
Is it? I think data speaks. Looking at data is a starting point for coming up with all kinds of questions. Perhaps what Cameron means is that when we understand our data, we can pose appropriate questions, which can then be expressed as beautiful code. In that sense, the code expresses the science.
This is hardly the end of this thought process; it only speaks to this fascinating confluence of theory and data and how they feed each other. Watch this space (and many others).