Data to rule them all

The current meme seems to be data to rule them all

There are a bunch of blog posts out there discussing the importance of data, all pretty much screaming out for open data. First, a couple of posts on Nodalities. In A data-centric view, Zach Beauvais talks about a rather fascinating blog post over at Flickr, where they have used the vast amount of geotagged data to generate mostly accurate contours for various places. The part that Zach homes in on is the fact that the folks at Flickr didn’t go around planning this project. It sort of fell out of the data available to them. Rather, “they re-used data they were already capturing, and brought out something very interesting indeed.”

This is why open data in science is so important. Not everyone gets the same ideas when presented with the same information. We are all capturing data. Why not let others think about that data and do something creative with it? As Zach says:

The good stuff is where the data are

In another post on network effects, Justin Leavesley talks about direct and indirect network effects, especially in the context of cloud computing and Platform-as-a-Service. That’s not the part that I want to talk about, though. The interesting bit was this line:

The data is where the users are, the software is where the developers are

It’s a somewhat obvious statement, but it also speaks to what users care about, and to why people developing software need to think about the data. Our users, scientists of all types, like to look at the data in ways that allow them to see it from their own perspective. They don’t necessarily want to spend time generating the data, or using complex tools to come up with the stats, dynamics trajectories, predicted structures, or what have you. Rather, they need the ability to make those results their own in a manner that allows them to do what they do best: work out what the data are telling them and how to plan the next experiment (whether in a wet lab or in silico).

All this brings us to Cameron’s post on data, code and the well-posed question. Cameron wonders if the following is appropriate:

A well posed question is one which, given an appropriate dataset, can be answered by easily prepared and comprehensible code

Is it? I think data speaks. Looking at data is a starting point to come up with all kinds of questions. Perhaps what Cameron means is that when we understand our data, we can pose appropriate questions, which can then be expressed as beautiful code. In that sense, the code expresses the science.

This is hardly the end of this thought process; it only speaks to this fascinating confluence of theory and data and how they feed each other. Watch this space (and many others).

This entry was posted in Informatics, Infotech, Open Science, Programming.

  • Disclaimer

    All opinions on this blog are my own and do not reflect those of my employers, past or present