The Canonical model for scientific software

This is one of those blog posts inspired by something I heard in a podcast. It is also one of those posts that makes me wish there were an easy way to link to a specific piece of audio, and just that piece.

Earlier this morning I was listening to episode 77 of Floss Weekly, an interview with Jono Bacon about Ubuntu. It was a great episode, and one number that Jono, who is the Ubuntu community manager, mentioned was the number of active contributors to Ubuntu: somewhere between 50,000 and 100,000. That is a large number, more than I had expected, and it got me thinking about the role Canonical plays in the Ubuntu world.

If there is one area where I can realistically stand up with a straight face and say I have some authority, it’s scientific software. It’s what I have spent almost all my career (except for the past year) doing. One question I have always asked myself is which models result in sustainable, high-quality software that is accessible to people who want to do good science. It has been a tough question to answer, because there are so many variables, so many user types and so many needs, coupled with tremendous pricing pressures and, in most cases, a small addressable market (in pharma, usually a shrinking one). But the podcast really got me thinking about some possible models.

First, to put some biases in play. From my years as a product manager, and from subsequently observing different development models, I believe that you get the most success either from small, very focussed teams or from distributed teams (the latter being the Ubuntu approach). For some scientific projects, I believe the distributed approach has some interesting twists based on the Canonical model.

Now to be clear, the Linux operating system and scientific software are not the same thing, not even close. But what if there were pieces of software (R comes to mind, and actually seems to use many of those principles) for which one could get a few thousand active contributors, or a type of software with a decently large user base (many hundreds or thousands)? What if there were an entity, or entities, whose role was not too dissimilar to the one Canonical has taken on for Ubuntu, with its goal of developing a best-of-breed Linux distro? The reason I say this is that a lot of good scientific software is open source, or should be, with a core group of developers and access to the wider body of scientific programmers with an interest in the field. The entity can fund and focus on the key aspects, e.g. scalability, and provide value-adds like customization and turnkey solutions. It could even be a non-profit organization (an idea that a former colleague was really intrigued by) focussed on solving some of the core challenges.

Regardless of the business model, the concept I am trying to focus on is the ability of an organization to facilitate and foster open source development, and to build quality, sustainable software. Many science projects suffer from a lack of coordination, since they are managed by a single lab and a postdoc or two (this seems to be changing a little). Or you have software where you can get the source for a fee, but the code base is heavily fragmented and there is no sustained effort to solve problems that affect the whole community, since most scientists write code only for the parts that solve a problem particular to them. What makes this especially interesting is the rise of distributed version control systems, which make the whole process of managing source and projects, forking code, etc., that much easier to handle.

I don’t think this model works for all scientific projects. In fact, it works for just a few. But there are enough classes of software where, if you had an organization or company fostering open source software, managing and sustaining it, IMO we’d get some good quality software out there, and less fragmentation (we don’t need 100 alignment algorithms or MM apps that do exactly the same thing). How the business aspects would work is something that needs to be analyzed, but it’s not as if scientific software companies are laughing all the way to the bank. Already you are seeing examples of companies pushing and supporting the development of open source software systems, as Agilent just did with Cytoscape (subscription required) through its philanthropic arm. If you are smart, you know that in the end that is likely to help your own cause.

Essentially, what I seem to be gravitating towards is a departure from the traditional in-licensing model of scientific software companies (if you are doing your own development, that falls into the small, focussed effort category) towards a model where the company fills the role Canonical does for Ubuntu. You could have a couple of companies out there in that role, each finding the appropriate balance between supporting open source and building a business, or you could go the foundation route. A topic to be revisited.

This entry was posted in Innovation, Open Source, Software & Internet.

2 Comments

  1. martijnvaniersel
    Posted July 23, 2009 at 12:50 | Permalink

    It's an interesting question: how can the quality of scientific software be increased? I think it has something to do with increasing the cohesion of projects. It's something that R/Bioconductor does particularly well, with a centrally managed site and easy software installation. Cytoscape does it too, with its plugin ecosystem. There is something of a network effect there: once a few good plugins arise, they attract the attention of other plugin developers. R/Bioconductor is not funded by a commercial company AFAIK, and Cytoscape was already doing well before the backing of Agilent. So I don't think commercial backing is a requirement.

    Compared to Ubuntu, there is also simply a matter of scale. The Ubuntu ecosystem is potentially much larger, being much broader in appeal. Probably many Ubuntu projects are low quality, but their long tail is much longer…

  2. Posted July 23, 2009 at 14:04 | Permalink

    I think one way to improve the quality is just adopting good software development practices. In the cases that work, like R, a strong core team and appropriate practices ensure that is the case. On the other hand, in a lot of cases you get spaghetti code, or the main driving postdoc moves on, and you are left with pretty darn bad code.

    Commercial support is not a requirement, but it helps. It need not be commercial, but it should be an organization that identifies key challenges and weaknesses, perhaps the kinds of things an academic group just does not have enough incentive to tackle. Obviously, Linux and a scientific software package are different beasts, but this approach beats the traditional “let’s just in-license the software, package it up and sell it” model.

  • Disclaimer

    All opinions on this blog are my own and do not reflect those of my employers, past or present