A middle-out web services infrastructure for systems biology

One of the thoughts I’ve always had over the years in life science software is the need to adjust to constantly changing needs and technologies. In the world of “omics” software, where the technologies generating the data and the data types themselves keep evolving, it gets pretty darn complicated to build solid software projects. So a paper on Systems biology driven software design for the research enterprise naturally caught my interest. The paper, from John Boyle et al., discusses the informatics infrastructure being developed at the Institute for Systems Biology. What struck me as I was going through the first part was that there was no mention of RESTful architectures, at least explicitly, which was somewhat of a disappointment.

The architecture they use is a Service Oriented Architecture, pretty much an essential in such systems. There are other examples of service-based systems in informatics, e.g. CARMEN and, as I recently found out, EColiHub. Both I3 (the ISB system) and CARMEN use SOAP-based web services. EColiHub uses REST (not sure if that’s in production yet), so you know which one I am biased towards. It should be noted that in a lot of such systems Taverna and BioMoby are being used or are on the cards, so there is a definite move towards a “let’s not reinvent the wheel” mindset, a very good thing. In addition to these, there are workflow systems, like Pipeline Pilot, that are ideally suited to developing and deploying service oriented architectures (also usually SOAP-based).

The paper definitely highlights many of the challenges that any such system needs to address. It’s a constant challenge, especially when a lot of the underlying scientific methods and technologies are not at the same level of robustness. In a research environment, you can’t really lock down best practices either, i.e. there needs to be flexibility to explore and to allow people to do things their way. The philosophy seems to be, as is the preference these days, to focus on the middle layer: allow people to develop their own tools and methods, and provide them with a common integration environment. Now here’s the part I like. The authors have chosen to use an LSID-based system, and it’s pretty easy to see the system being used as a Semantic Web platform.
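For anyone who hasn’t run into LSIDs: a Life Science Identifier is a URN of the form urn:lsid:authority:namespace:object, with an optional revision at the end. The snippet below is only a minimal sketch of splitting one into its parts; the example identifier and the names parse_lsid and LSID are made up for illustration and have nothing to do with how I3 actually handles its identifiers.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parts of a Life Science Identifier (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts.

    This is an illustrative parser, not a full validator of the LSID spec.
    """
    parts = text.split(":")
    # Five fields without a revision, six with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The "urn" and "lsid" prefixes are matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty field: {text!r}")
    # An empty trailing revision is treated the same as no revision.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


if __name__ == "__main__":
    # Made-up identifier; example.org is a placeholder authority.
    print(parse_lsid("urn:lsid:example.org:experiment:12345:2"))
```

The appeal for something like the Semantic Web is that the authority, namespace and object fields give every piece of data a stable, location-independent name that other services can refer to.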

I am no guru on architecture, and this is a somewhat formal paper, so I won’t necessarily go there. The key aspect for me is seeing a trend, even in academia, to at least think about formal software development and about developing architectures and deployment environments that can evolve and be maintained over a period of time. On the other hand, there is the danger of making things too formal and resource-heavy. Is this how I might have done it? Given that the architecture is stateless, I still wish it had been developed as a resource oriented architecture. While I do like workflows, I am not convinced that they are the right paradigm in an academic setting, and I am not sure they should go there in the future.

The take-home message is that there is a place for deployment frameworks even in a research setting, and for using good, pragmatic software development practices. Matt Wood could probably give you a long lecture on the latter.

How do you think software resources in research settings should be made available?


  • Scott Tabar
    Good find! I enjoyed the read of both your blog and the original PDF too.

    One thing that I want to point out is that REST and SOAP are not mutually exclusive technologies, but can play together quite well. As a matter of fact, I3 (Systems biology driven software design for the research enterprise, page 11) uses REST combined with SOAP as one mechanism for browsing and querying an image repository, or LSID can be used for more direct access to the data: "Once the experiment information is available… it can be browsed and queried using two different mechanisms: a SOAP/REST based interface …; and an LSID endpoint … ."

    Just as John Boyle et al. point out that life sciences is a highly dynamic and evolving field, I tend to feel the same way about computer science as a whole, with its evolving technology and associated methodologies. Given a few years, hot buzzwords (and the technologies to which they refer) tend to fall by the wayside, either through complete obsolescence or through acceptance to the point where newer technologies build upon them. If they are lucky, they may become the de facto standard, but even if they do not, their contribution to the newer technology can be upstaged or even obscured altogether.

    The interesting point is that even though a project may adopt a bottom-up or middle-out approach, which may cause some waterfall purists to cringe, the final application can be very stable. I feel that the stability of these projects generally arises from building blocks that have their roots in mature and stable open source projects and/or a clearly defined and adopted set of standards. These two key components, open source and standards, give the final application a stability that even a top-down approach 3 to 5 years ago would have had great difficulty in matching. On the whole, I feel that open source and the wide adoption of standards have contributed to the functional reality of bottom-up and middle-out programming models.

    With all this said, the next major challenge in computer science and life science application development will be the move to support parallel processing, for which the industry as a whole is scrambling to find a solution that will scale nicely and insulate applications from developers' limitations. The search for the new parallel holy grail will most likely lead to a hybrid blend of current parallel programming techniques along with novel advancements from both the software and hardware camps. Many corporations and organizations project that in the next 3 to 5 years multi-core processors will require a new approach just so that tomorrow's software can keep up with Moore's Law. This is evident both in the flood of announcements of these new endeavors and in the millions of dollars that have gone into research within just the last six months. This new paradigm may cause yet another shift in programming in ways that we can neither foresee nor predict today, but it will undoubtedly lead to the continuing evolution of both hardware and software.