Matt’s manifesto for a science data platform

There are a select few people whose every word I try to absorb and chew on because I have great respect for their thinking and intelligence. Matt Wood is one of those people, and today he decided to tweet a manifesto. The whole series started with:

I’m starting a manifesto. There are no technical, political or funding reasons why an open data platform for science couldn’t excel

He then followed that up with five tweets (Matt’s Twitter stream). I don’t know if that’s the entire manifesto, but I reproduce those tweets below, a series entitled “Towards a science data platform”:

  • Easy, flexible retrieval and reuse above all else
  • A laser sharp focus on scientific productivity and progress
  • Scalability and speed are not mutually exclusive
  • Well designed, high quality programming interfaces are a prerequisite
  • Be effortless to do the right thing: provenance capture, reproducibility, portability

This is definitely the developer view, and the one I can relate to. Just earlier today, I was thinking about the lack of innovation in scientific software, not necessarily on the algorithm side but on the pure software and platform side, where only a limited number of platforms exist. This came up while I was searching for cool tools to launch and manage clusters of EC2 instances (and other computational resources), and it speaks to Matt’s fourth point (well-designed interfaces). The part I agree with most: there are no reasons anymore not to have a scientific data platform, just excuses.

I hope Matt starts a website that will add to this train of thought. Would love to participate.

This entry was posted in Big Data, Bytes.

4 Comments

  1. Posted October 29, 2009 at 04:22 | Permalink

    One reason for the lack of innovation in scientific software is the lack of software development skills among scientists: a small minority are as good as anyone in the world, but as we found last year [1,2], the vast majority are too busy learning and doing science to pick these skills up on their own, much less to create the “well designed, high quality programming interfaces” that Matt feels we need. I think this is the biggest roadblock to wider adoption of cloud computing and anything with “peta-scale” in its name.

    [1] http://www.cs.utoronto.ca/~gvwilson/articles/am...
    [2] http://www.cs.utoronto.ca/~gvwilson/articles/ho...

  2. jamcquay
    Posted October 30, 2009 at 06:28 | Permalink

    There is a lot of work going on in this area at the moment. I'm currently a member of a working group at NIST developing a new data format for analytical data – AnIML (Analytical Information Markup Language).

    One piece of advice when it comes to scientific data formats: look for existing efforts and contribute your skills there. In the analytical data world we have ANDI, JCAMP, MzData etc… the AnIML project's goal is to take the good points from these standards while correcting their shortfalls.

    At the moment we have gained the attention of most of the big instrumentation manufacturers and are in the process of wrapping up version 1.0 of the standard.

    This data format will be an ASTM standard when completed. The standard will be free to use (i.e. “open”); the license is still pending but will likely be LGPL.

    If you forge out on your own to start a new format, please (please!) at least get in touch with the existing groups. A lot of legwork has been done in this field and most people are willing to share.

  3. Jason Morrison
    Posted October 30, 2009 at 15:57 | Permalink

    Hey folks,

    I'm really thrilled to see discussion coming together on this topic, and am trying to come up to speed on all the existing technologies, projects, and ontologies. Jamie, thanks very much for the links and information about AnIML – looks fantastic, and I'll get in touch after I do some reading.

    I'm *particularly* pumped about this statement: “At the moment we have gained the attention of most of the big instrumentation manufacturers and are in the process of wrapping up version 1.0 of the standard.”

    I was also quite excited to read Cameron Neylon's “Head in the clouds: Re-imagining the experimental laboratory record for the web-based networked world” at http://www.aejournal.net/content/1/1/3 – thoughts?

  • Disclaimer

    All opinions on this blog are my own and do not reflect those of my employers, past or present