
Collective Intelligence in the life sciences

Here is the table of contents of Programming Collective Intelligence (a great book, by the way):

  • Introduction to Collective Intelligence
  • Making Recommendations
  • Discovering Groups
  • Searching and Ranking
  • Optimization
  • Document Filtering
  • Modeling with Decision Trees
  • Building Price Models
  • Advanced Classification: Kernel Methods and SVMs
  • Finding Independent Features
  • Evolving Intelligence
  • Algorithm Summary
  • Third-Party Libraries
  • Mathematical Formulas

I was going through this the other day, and it occurred to me that bioinformaticians and computational biologists program collective intelligence every day. The collective intelligence comes from evolution and the intricate evolutionary relationships that drive our knowledge about the world around us; the knowledge built into biology. We use many of the same methods listed above, and many of the same paradigms. In a way, you can say that bioinformaticians are way ahead of the web 2.0 crowd (although we still don’t know how to write intuitive interfaces). The description of the book on Amazon says:

This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you’ve found it.

Over the years, life scientists have generated tons of data and deposited it on the internet. While we do a great job of trying to leverage evolutionary relationships with cutting-edge algorithms, we do a relatively poor job of harnessing the collective intelligence from the biological data distributed around the web.

The modern web is programmable. There are those who would say it’s always been that way, but I beg to differ. The biggest development in recent years has been our ability to use the entire web as our platform through all the mechanisms available to us: REST APIs, RSS delivery, microformats, etc. I tend to look at the life science web as an independent subsystem living within the WWW. I just wish it were more programmable. The tide is beginning to change (the NCBI Universal Resource Locator would be one example), but as Duncan mentioned last year and I lamented earlier, the general trend, where available, is towards big web services. Given the amount of change seen in life science data, WS-* interfaces are likely to break and are usually just too “heavy” to be readily programmable. In other words, while there is a small group of bioinformaticians who are harnessing the web the way it should be harnessed, by and large we are still limited by the lack of simple REST APIs. With all due respect to IBM and the lovers of service-oriented architectures (which are primarily WS-* based), we could do things much faster, cheaper and more effectively if we could harness data and focus on deriving knowledge from it.
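To make "simple REST API" a little more concrete, here is a minimal sketch in Python. It assumes NCBI's E-utilities `esearch` endpoint as the example service and assumes its JSON response carries the matching IDs under `esearchresult` / `idlist`; the function names `build_search_url` and `fetch_ids` are hypothetical helpers invented for this illustration, not part of any library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: NCBI E-utilities esearch endpoint, used here only as an
# example of a plain HTTP interface (no SOAP envelope, no WSDL).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, db="pubmed", max_results=5):
    """Return the URL for a search; the whole request is just this URL."""
    params = {
        "db": db,                # which database to search
        "term": term,            # the query string
        "retmax": max_results,   # cap on the number of IDs returned
        "retmode": "json",       # ask for JSON instead of XML
    }
    return EUTILS_BASE + "?" + urlencode(params)


def fetch_ids(term, db="pubmed", max_results=5):
    """Run the search over HTTP and return the list of matching IDs.

    Assumes the response layout {"esearchresult": {"idlist": [...]}}.
    """
    with urlopen(build_search_url(term, db, max_results)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]
```

Calling `fetch_ids("collective intelligence")` would issue a single GET request. The point of the sketch is that the entire "client" is a URL plus a JSON parse, which is the kind of lightness the WS-* stack lacks.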

So let’s move towards a new form of collective intelligence, where we can harness the power of evolution and of the network of public and private resources that are available via simple APIs, microformats and what have you. Let us empower the creative developers out there to make the life science web even more powerful.

Further reading
Will data be our undoing
Biology, computing and web services
Biological content, access and monetization


This entry was posted in BioIT, Informatics, Software & Internet.

4 Comments

  1. Posted October 12, 2007 at 16:24 | Permalink

    Perhaps you will be pleasantly unsurprised to learn that the author, Toby Segaran, has been the Director of Software Development at Genstruct, an AI-based biotech.

  2. Posted October 12, 2007 at 17:55 | Permalink

    I didn’t know. Funny thing is that I gave a presentation on biosimulation companies just last week which included a mention of Genstruct and their methodology :)

  3. Posted October 24, 2007 at 06:55 | Permalink

    Hi Deepak, I agree wholeheartedly with simple APIs. SOAP-and-WSDL seem inappropriate for a lot of bioinformatics. This O’Reilly book on REST is a great how-to. I’ve been passing it around the lab here; it’s a good read if you’re into that sort of thing. Cheers. Duncan

  4. Posted October 24, 2007 at 07:17 | Permalink

    The RESTful book, which Andrew Walkingshaw recommended to me, happens to be sitting just inches away :) . It’s a must-read actually if you’re even remotely into that sort of thing.

2 Trackbacks

  1. [...] What does this imply about collective intelligence? Doesn’t the concept of collective intelligence contradict Tim and myself? I am not sure it does. Methods that leverage publicly available datasets are in essence capturing the collective intelligence in those data sets, but mining them for information. In an expert setting, like the life sciences, the intelligence in the myriads of publications is available to us. Those who learn to glean maximal information from those resources then have a choice. To keep it out in the public domain, or keep it private. In either case there are business models to be explored. [...]

  2. [...] Collective Intelligence in the life sciences [...]

  • Disclaimer

    All opinions on this blog are my own and do not reflect those of my employers, past or present