Biological content: Access and monetization

I spend almost every evening going through all my non-science-related feeds in Google Reader. The bulk of these feeds follow the tech industry or the business of the tech industry. As any self-respecting reader of techmeme will tell you, a fair bit of the chatter on the web these days is about user-generated content, something at the heart of Web 2.0. A related subject is the monetization of that content. The success of MySpace and YouTube, and the likely billion-dollar buyout of Facebook, are just examples of the commercial value of online content.

On many occasions, I find my thoughts wandering to the question of scientific content. To me that is the ultimate form of user-generated content, which makes it somewhat disconcerting that there has been no systematic effort to organize that data, at least none that really draws the community together to make this happen. A good chunk of biological data rightfully belongs in the public domain. The challenge remains: How do we organize all the data in a form that can be queried, mined, and monetized appropriately? Right now, the data are distributed all over the web, without any real organization or framework.

All this always brings me back to the W3C. Sir Tim Berners-Lee and colleagues have had a ton of experience and success at developing a framework that enables both content providers and consumers to develop the relationships that have made the Web as successful as it has been. Yes, semantic standards haven’t quite become the norm, but blogs, content management systems, etc. are paving the way for a web where information is increasingly being semantically linked. It is a world that people at Nature Publishing Group have recognized, as demonstrated by their recent effort in publishing standards. It would be interesting if some organization, perhaps the Life Science group at the W3C in conjunction with organizations like NCBI, EBI, RCSB, MGED, HUPO, etc., could spearhead such an effort. The W3C life sciences group has not gained much traction, at least to my knowledge, and perhaps the reason is that the community-wide dialogue is not happening. Maybe a Google could get involved? I am sure they would love to index the “biological web”.

Which brings us back to monetization. Some would argue that biological information belongs in the public domain and should be freely accessible. I agree, but it depends on how one defines information. Gene sequences, protein structures, etc., do belong in the public domain, but using that information to make decisions, build products, and draw all kinds of conclusions is where the fun lies. Some of these aspects are monetizable and should be. If there were a framework that allowed people to upload biological content to the web, and then search, mine, and analyze that content, then I am certain that reasonable monetization models could be developed. That can include discoveries that could remain proprietary (a new drug for instance, or a process to perturb an interaction network, etc.). I have a few ideas on the kind of monetization models that could succeed, but with the framework being so far away, that is somewhat moot at this point.

I might completely change my mind on this subject in a few months, but an article by Eric K. Neumann in the October 2006 issue of Bio-IT World treads the same waters. In that article Eric talks about a drug safety commons and uses some similar arguments, or so I think, for making that data available in a Commons. He argues that while generating the information does not come for free, the lack of re-usability of that information only increases drug costs (both time and money). I couldn’t agree more. This is an era of knowledge, and it is how that knowledge is used that defines success, rather than the knowledge itself.

So what do those in the community think? I realize that the view presented above is a little utopian, but bloggers are allowed their flights of fancy. If you perchance read this, I would be more than interested in your opinion, preferably in the form of a comment.

Update: I had totally forgotten about Science Commons, but thanks to a new post in my feed reader, I would like to point people there. The latest post is about an interview with David de Graaf, director of systems biology at Pfizer. I would like to point out that Teranode (where Eric works) is involved with Science Commons as well.

Further Reading
The advantages of a drug safety commons

This entry was posted in Admin, BioIT, Informatics, Infotech, Innovation, Life Science, Omics, Search, Software & Internet.

15 Comments

  1. Posted November 2, 2006 at 02:28 | Permalink

    I think that the lack of top-down organization in scientific data dissemination is to a large extent a good thing and something that will continue to evolve in this way because of the low barriers to doing so. A primary reason that I am reading your blog and engaging in a dialogue with you is that you did not seek the blessings of a third party to review your content and organize it into a standard format. I came across it either through a recommendation of a like-minded information producer or through some type of general purpose search engine for the blogosphere. In the same way, someone searching for a protocol to distill phenylacetaldehyde yesterday came across a report from my lab via Google that was still being perfected but contained hopefully useful cautionary information: http://usefulchem.wikispaces.com/Exp037
    If someone creates a database to organize some of this information then let's duplicate it, but there is no point in waiting for the organizational system to publish and share. Now monetizing is an interesting question in this environment – I suspect that people are going to be less and less willing to pay just for information.

  3. Posted November 2, 2006 at 08:34 | Permalink

    Your point is well taken. However, the reason we are able to communicate via this blog, and the fact that you could come across it, is that there is a framework which allows you to do so. Information itself should be distributed across the web and certainly not centralized. Nor should anyone need permission to publish. However, making the kind of access possible that today's web provides requires the development of some standards and vocabularies. It is the latter that is needed, but it can’t happen unless some of the key organizations come together to agree on and maintain those standards.

    On the issue of monetization, people are already less willing to pay for information itself.

  4. Posted November 2, 2006 at 23:55 | Permalink

    Deepak – the framework that you use for your blog is the same that we use for our blog to disseminate scientific information. But what I meant by a lack of top down organization is that you have complete freedom to communicate any way you wish. If you had to wait for a standard vocabulary to express your thoughts you would not have been able to do so.

  6. Posted November 3, 2006 at 00:04 | Permalink

    I don’t quite agree. The framework provided by my blog software and the WWW, i.e. XHTML, CSS, PHP, is what makes this blog possible. The content is mine, but to disseminate and share this content, there is a framework. My argument is not about the content, but about a means to allow the content to be indexed and searched. I am open to ideas on other ways to do this, but I haven’t been able to think of any. To some extent, we seem to be talking semantics here. Perhaps our definition of framework is not the same or something along those lines.

  7. Posted November 7, 2006 at 06:31 | Permalink

    This conversation reminds me of one I had yesterday. My colleague was very excited about a $500k product that would make “IT support unnecessary!” I asked – is it a software-as-a-service platform over the web? He said, oh no, it’s a server preloaded with a proprietary algorithm. I asked who would be taking the data from the box and integrating it into our user platform, who would be doing maintenance and service, upgrades, backups and all that on the “box.” He sort of sheepishly went “Oh….” A lot of the technological framework of today’s web, Web 2.0 and the emerging semantic web is built on technologies largely invisible to users (thankfully!). But somewhere, there are W3C committees, GooglePlexes, Cisco engineers making the whole thing possible. Here’s to the Plumbers of the Internet and may they be semantically empowered someday soon!

  8. Posted November 7, 2006 at 06:52 | Permalink

    I was hoping you’d read this post. I am convinced that the future of data is some form of loose structure, which will allow indexing services and semantic engines to query the web. In addition, to give a service like postgenomic a plug, memes can be tracked more easily, and the like. In fact the latter is an excellent example of why we need some structure in our data (microformats).

  9. Posted November 8, 2006 at 06:34 | Permalink

    Did you see the article in BioIT: Shaping a Web That Better Serves Humanity by Catherine Varmazis? It describes the formation of a Web Sciences Research Initiative out of MIT to do research into web technologies but also the cognitive/user aspects of how to mine the web for information. I agree with you that this is probably the “value add” part of the web’s aggregate content.

  10. Posted November 8, 2006 at 08:56 | Permalink

    I read other versions of that article. I presume you are referring to the Tim Berners-Lee led initiative.

    We are a long way from maximizing what we can do with loosely distributed networks of information. Google and the like were just the start. The “search” is only the beginning :)

  12. Posted November 9, 2006 at 08:44 | Permalink

    I am compiling data on known phosphorylation sites in human proteins. Yesterday I went through a paper that had in its supplementary materials a list of phosphorylated sites linked to proteins via IDs. They used 3 different ID types in the same field, which I had to parse and map to my own favorite human ID. I then downloaded a database that was at least well formatted (in XML), but I still had to match their own ID to the one I was interested in by BLASTing the protein sequences … I still have at least two more databases to go.

    How I wish that there were a more standardized form of communicating scientific data. It is more important that we spend resources thinking of what to do with the information than on how to get it all together.

  14. Posted November 9, 2006 at 17:22 | Permalink

    Pedro

    Your last sentence should be framed and sent to every life scientist in the world. If we don’t do something about it soon, we will only be hurting ourselves.

11 Trackbacks

  1. [...] However, the overarching theme of the article is in the right ballpark. Scientists, in an increasingly cross-disciplinary world, must collaborate more, and the publish or perish system must change. The whole outlook towards IP needs a rethink as well. Where does the intrinsic value of scientific discovery lie? That is a question I would love to debate with a panel of my blogging peers some day and I am not sure that there is an easy answer. Efforts such as Innocentive (scientific crowdsourcing), Mechanical Turk (not science per se, but you can see the possibilities), OpenWetWare, Science Commons, PLoS One and efforts by Nature Publishing Group are only the start of what I hope is a new, open era in science. We have to embrace it, understand it and, as has been stated previously, learn to monetize it in a reasonable manner as well. [...]

  2. By business|bytes|genes|molecules on January 22, 2007 at 20:21

    [...] Further reading: An open scientific future; Biology, search and Udell; Biological content: Access and monetization [...]

  3. By business|bytes|genes|molecules on January 22, 2007 at 20:40

    [...] The world of biological information is not too different from that. Content is generated by the terabytes, by labs all over the world, using all kinds of methods that are often not well documented. It was for some of these reasons that the PDB stopped including theoretical models in its main database. When I was working as an informatician, getting confidence scores on all the data that we had was very important, and sometimes rather challenging. This is going to be the biggest challenge for the biological community going forward. As access to biological data gets more loosely distributed to the edges, and accessing those edges becomes one of our favorite pursuits, an equal amount of time will have to be spent making sure that data actually means something. Perhaps when I talked about monetizing biological data what I was really alluding to was the ability of companies/people to develop algorithms/platforms that can successfully mine the vast expanse of data and find what is really needed without compromising quality. In addition to all the other wonderful research going on out there, I wonder what kind of innovations are being made in assessing the quality of random biological information? I haven’t seen too much, but admittedly, I haven’t been looking. [...]

  4. By business|bytes|genes|molecules on February 25, 2007 at 02:41

    [...] Further reading: It’s not just the EBI; Biology, search and Udell; Biological content: Access and monetization [...]

  5. By business|bytes|genes|molecules on July 13, 2007 at 08:14

    [...] Further reading: Biological content: Access and monetization [...]

  6. [...] Further reading Will data be our undoing Biology, computing and web services Biological content, access and monetization [...]

  7. [...] About a year ago, I wrote a post on access and monetization of biological content. In that post, I talked about the value of scientific content and the possibilities of monetizing that content. Which brings us back to monetization? Some would argue that biological information belongs in the public domain and should be accessible freely. I agree, but it depends on how one defines information. Gene sequences, protein structures, etc, do belong in the public domain, but using that information to make decisions, products, and come to all kinds of conclusions is where the fun lies. [...]

  8. [...] And yes, this is an opportunity for the life sciences to explore new revenue models [...]

  9. [...] months ago, I started talking about the monetization of biological data, a theme that’s been present throughout the history of bbgm. In general, I have maintained [...]

  11. [...] term monetization prospects? Actually I think that’s the easy part, and I’ve covered it many times [...]

  • Disclaimer

    All opinions on this blog are my own and do not reflect those of my employers, past or present