Bioinformatics and software development – yet again

The struggle between trying to please your committee and provide adequate support for the software you create remains. Is this a problem with the focus of graduate studies, the funding bodies, or the expectation of users?

These words come from a post by Nils Homer on Anthony Fejes’ blog. Nils talks about some of the challenges of being a software developer in the world of bioinformatics. In his post he quotes a friend:

… the software engineering and implementation of several of the methods consumed significantly more time and energy than the original research and paper writing. This is an important but less recognized component of methods development, as it prevents the work from remaining just interesting ideas, but puts them into practice

For reasons Nils articulates so well, research code is often brittle, poorly documented and not sustainable at all. There are always exceptions, but that is the norm. The world is full of aligners, scripts and small apps that have no documentation and whose authors have long since moved on and cannot provide any help. Nils asks “what are the benefits of creating usable software and to support users who are not the ones provide funding?”

Here I will deviate from the post and provide my views, some of which do match Nils’. The number one benefit is much the same as for any experimental procedure, protocol or technique. Software is how computational thoughts and ideas are implemented. Being able to capture, optimize and share such ideas and protocols is good for science. Good, well-documented, well-supported and well-understood software might also result in less software bloat and fewer repeated implementations of exactly the same piece of software. In a perfect world, the software would be open sourced, and a community would develop around it, resulting in improvements to the software over time. For all that I complain about the sometimes closed communities around molecular dynamics code, they do benefit the overall functionality and direction of the codes.

But I also think there is a place in the life science world for the professional software developer: someone who can implement algorithms robustly, think about things like database optimization and multi-tenancy, write good UIs, etc. I’ve seen too many examples of this working, and I strongly believe that while all bioinformaticians today should be good programmers and write robust algorithms, you also need software developers, or at least folks whose focus is not on cranking out papers, to develop applications, data management systems, robust pipelines, etc. Those are as much a part of modern science as the sequence search algo.

It’s good to see programming and code get more attention these days. In a recent blog post following an NHGRI workshop, Sean Eddy wrote:

A program that funds developers in much the same way that HHMI or the NIH Pioneer Awards fund people not projects. NHGRI could allocate stable long-term funding to a small but influential number of individual developers. The history of the field is that the best software in the field is often an unplanned labor of love from a single investigator; the history of software development shows that the disparity between the best developers and average ones is enormous, so business studies recommend models that enable highly skilled developers to focus on what they do best; and the best developers are often quirky people who don’t write grants well.

This is the kind of effort we need. As Steve Ballmer once (in)famously chanted, “Developers, Developers, Developers.” We need more, and better ones, or perhaps simply better-appreciated ones.

3 Comments

  1. anon
    Posted May 22, 2010 at 09:16 | Permalink

    Outsource the boring software development and at least some if not all of the maintenance?! even at the RA level…

  2. Posted May 22, 2010 at 10:01 | Permalink

    Maintenance is actually the key. Most researchers don't write maintainable software and unfortunately that's rarely a focus.

  3. Posted May 23, 2010 at 06:54 | Permalink

    I write both research code and application interfaces. It takes much longer to write a good interface than it does to write the core algorithm. However, the time is worth it if an algorithm is going to be used more than once.

    I have observed that much research code is not well-written, -connected, -tested, and -documented. Much of the code is ad hoc, and there is also the occasional manual step (data cleaning, sorting, manipulation in Excel files, etc.). I suspect that if the analysis on the data were performed again, the analysis would generate different results because the original analysis could not be repeated step by step.

2 Trackbacks

  1. By Friday SNPpets | The OpenHelix Blog on May 27, 2010 at 21:30

    [...] Singh blogs about the need for developers and more support and appreciation for developers and the software they develop. (can I add… documentation, training and user support? which I [...]

  2. By Planning for the short term on May 31, 2010 at 23:33

    [...] Bioinformatics and software development – yet again (mndoci.com) [...]

  • Disclaimer

    All opinions on this blog are my own and do not reflect those of my employers, past or present