
Calais, Reuters and the changing value of data

As mentioned a few times in the past, I really enjoy listening to the interviews that Paul Miller and Danny Ayers conduct for Talis. One I heard recently was with Barak Pridor, CEO of ClearForest (whose Gnosis Firefox plugin has been covered in the past). In the interview Barak talks about Calais, a web service that automatically attaches rich semantic metadata to submitted content. I am planning to try it out (it returns RDF), starting with content from bbgm. The hope is to create a graph connecting the people and organizations listed on bbgm. Anyway, in the interview Barak says something that resonated quite a bit (not too surprisingly). His words were along the lines of

Value is shifting from raw data/content to analysis and tools built on top of the underlying content

This is one of the central tenets of the bbgm philosophy. It’s good to hear an organization like Reuters (ClearForest was acquired by Reuters last year) adhere to that philosophy.


This entry was posted in Semantic Web, Software & Internet.
  • Decided to leave a video comment about Freebase :)

  • Yes, having had a look at the Calais site, it seems to be a beefed-up version of the ClearForest web service. What seems to me to be missing is the ability to extract partially structured data from web sites: not recognising people or organisations, but recognising things that have been structured as key and value. We have a lot of structured information in the lab blog, but extracting it is not straightforward, and then presenting it is a whole other problem.
  • Twine is not a developer platform (at least not yet) and is more consumer focussed. Calais, on the other hand, is a text analytics platform for entity extraction. I am not sure it works for the kind of information you want to pull out, but you should check it out.

    Of course, you could always try Freebase as well.
  • Hmm, now this looks interesting, and possibly closer to what we want than Twine for our purposes. Our big issue at the moment is how to take our 'semi-semantic' (we desperately need a better term than that) material from the LaBLog and actually capture a snapshot of the embedded information in a more formal framework. This looks like it might do the job, but we will have to look closer.
  • Disclaimer

    All opinions on this blog are my own and do not reflect those of my employers, past or present