Another post that I started writing a while ago, but never quite managed to pin down. Well, a post by Neil Saunders expressing frustration with the state of bioinformatics services was just the inspiration required.
Historically, centralized data repositories like the NCBI, EBI, PDB, etc. have been sources of data, but they have also provided the most commonly used search interfaces and web services for accessing that data. A number of services built on local copies of the data have been developed, often for internal use at companies (and I’ve been part of some fantastic ones). While APIs are available, the practice of providing documented, usable APIs, pervasive in the tech world these days, is not quite the norm in the life sciences. Assuming that we have excellent public data repositories, with rich APIs and data structures, it would be nice if a mix of application developers, designers and data geeks could start developing visual experiences and web services that enhance the utility of these sites. Unfortunately, as Neil’s and Hari’s experiences have shown, that is simply not the case.
In my own experience, from conferences, etc., it is clear that the world of bioinformatics (all life science informatics, actually) faces a major problem: too much time is spent moving data back and forth, formatting and reformatting it, and doing what I would call “grunt work”. A decade ago that might have been somewhat acceptable, as the field was still young, but not when bioinformatics is becoming a core part of research. It is critical that the various biological resources do a better job of allowing their customers (and I use the word deliberately) to use those resources more effectively. One of the best comments about Pipeline Pilot came from the head of informatics at a pharma company. He said that using it had made it possible for his informaticians to focus on developing new methods and deploying them to other scientists, since Pipeline Pilot did such a good job of gluing things together. We need to make this process even simpler, and allow the Neils of the world to focus on data analysis, software development and methodology, not data munging.
Let me take this thought one step further. I believe that there is a business model to be explored here as well. Philosophically, I believe that knowledge lies in what can be done with data, rather than the data itself. If everyone has equal access to the data, monetizing processes that generate useful information from the data is perfectly fair and square. The one caveat, and perhaps someone can share their thoughts on this, is whether the data producers should be compensated somehow, or is that addressed by the funding, etc. they get? Alternatively, data producers are well placed to develop services on top of the data, as they have intimate scientific knowledge. And I am not just talking about the AJAX-ification of genome browsers. It is a well-known fact that Google and others have built their empires on top of open source software. Others have leveraged services and APIs to provide useful services, e.g. Lijit uses Google Custom Search and one of the genome browsers mentioned above uses the Google Maps API. Would it be appropriate to take publicly available services and, using them as a backend, develop commercial services? If yes, what are the kinds of businesses that can be built on top of that? What kind of licensing policies would be prevalent? Food for thought and the subject of another post some day.
Technorati Tags: Workflows, Bioinformatics, Data Munging, API, Business Model



4 Comments
I couldn’t agree more. Would suggest folks interested in open, standardized APIs check out some of the cancer Biomedical Informatics Grid (caBIG) software tools. caBIG has established conventions for creating standardized object-oriented APIs for any system, regardless of what kind of life science data it is hosting. The tools that currently implement these APIs are listed at https://cabig.nci.nih.gov/guidelines_documentation/Silver_Review/#silver.
I’ll second Peter’s opinion, and tell you that our company realized from its inception the value of an API and integration with public resources such as NCBI and Ensembl. As someone who straddles both worlds (molecular oncology/biomedical informatics), I find it imperative that scientists have good access to BI/IT resources and that tools such as standardized APIs actually become... standard. We’ve all done our fair share of “grunt work”, however, and I am not sure it is going to go away anytime soon; standardization in life sciences in general lags behind a lot of other industries. If you are interested in checking out some good data management software tools to help standardize data collection for translational research/biobanking/biomarker discovery, look us up at http://www.biofortis.com.
4 Trackbacks
[...] The modern web is programmable. There are those who would say it’s always been that way, but I beg to differ. The biggest development in recent years has been our ability to use the entire web as our platform through all the mechanisms available to us: REST APIs, RSS delivery, microformats, etc. I tend to look at the life science web as an independent subsystem living within the WWW. I just wish it were more programmable. The tide is beginning to change (the NCBI Universal Resource Locator would be one example), but as Duncan mentioned last year and I lamented earlier, the general trend, where available, is towards big web services. Given the amount of changes seen in life science data, WS-* interfaces are likely to break and are usually just too “heavy” to be readily programmable. In other words, while there is a small group of bioinformaticians who are harnessing the web the way it should be harnessed, by and large, we are still limited due to the lack of simple REST APIs. With all due respect to IBM and the lovers of service-oriented architectures, we could do things much faster, cheaper and more effectively if we could harness data and focus on deriving knowledge from it. [...]
[...] It’s a sign of things to come and the kind of interoperability that makes web apps as powerful as they are. I can see all kinds of interesting paradigms, e.g. linking your account on NCBI with a bioinformatics web app, so that you can run on-the-fly BLAST queries, or something along those lines. It’s a very rich experience (you have to try Picnick). Remember what I said about APIs. [...]
[...] Further reading Learning from tech – Better APIs The value of feature extraction Building a global tagspace [...]
[...] Further reading Programmable Web – Better APIs [...]