Big things come in small packages

May 30, 2007

I fly a lot, and what do I miss most on a long flight? Access to my feed reader. Well, that just changed. I can’t wait to fly to Indianapolis this weekend with a working offline feed reader, thanks to Google Gears.

Further reading:
Google blog


When is web 2.0 really that?

May 30, 2007

Interesting article in Bio-IT World on some of the impact that “Web 2.0” applications can have on storage requirements at life science companies. As someone who shakes his head at how little life science companies take advantage of modern internet technologies and trends, I was a little skeptical of the premise of the article, and at the end of it, I must admit to being a little annoyed.

The article talks about how web 2.0 applications, e.g. the practice of tagging image data, are requiring companies to plan their storage needs better. To some extent I can picture how this process works. In a very early life at a bioinformatics company, way before anything became 2.0, we used to capture a lot of metadata and do a lot of manual annotation of results. That in itself led to the need for storage capacity. That doesn’t seem so different from what is described in the article. Improved semantic annotation just makes it easier to crawl through files and let the machines do a better job of the heavy lifting than 6-7 years ago (thinking back to some of those Perl scripts to parse data sets brings back memories).

The term “web 2.0” in the article is a little jargony, and frankly at times confused with the semantic web (where life science companies do a better job than most). After all, if people are generating images, solving structures, etc., it is all “user generated” content and always has been. The difference is that now we have better metadata, and a lot of it. People are beginning to realize that they can capture it, store it, mine it and use it to generate information and relationships. Where current internet trends might come in is that, thanks to more interactive and collaborative web sites and applications, more people might look at any data set and add their own annotations and interpretations, not just the original data generators/collectors.


Open Source Strategies For Science

May 28, 2007

Found a couple of interesting videos on the Berkman Center channel at blip.tv.


Things I noticed #28

May 27, 2007

Issue #28 of Things I Noticed. Here you go.

Evotec launches an innovation center for Fragment-based drug design

Lee Hood takes the guest chair on Futures in Biotech

Rick begins reviewing life science websites

A cool new blog on open data and data sharing. Check it out

Interesting paper on visualizing Wikipedia (via New Florence. New Renaissance.). Some of the more popular subjects shouldn’t come as a surprise.

Second Brain looks very interesting. Sounds like something I could use along the lines of Adaptive Blue

Bioclipse as a disruptive technology

Bad Science at the BBC


A thesaurus, wikis and text mining

May 25, 2007

From Bioinform (sub reqd) we learn about a proposal from Aaron Cohen and William Hersh of the Oregon Health and Science University’s Department of Medical Informatics and Clinical Epidemiology to develop a wiki-style thesaurus that would contain disease names, types, and variations. According to the scientists, the thesaurus would be based on a relational data model that connects all the data types.

The goal of the project is to discover relationships and associations and eventually therapeutic mechanisms. The hope is to bring as many as 20,000 bioinformaticians and researchers into the project. From where I stand, just limiting the proposal to a relationship-oriented wiki might be too narrow. Here is what would be really cool:

1. A wiki-style system as proposed, built using a Freebase-style structured model (maybe they can use the Freebase API or something like that). One can use existing ontologies as a starting point, the way many sites start with Wikipedia content
2. A Natural Language processor to identify and mine relationships
3. Killer visualization which would draw from the above two resources
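
To make the first two points a little more concrete, here is a rough sketch of how a structured thesaurus and a relationship miner might fit together. Everything in it is hypothetical: the `Thesaurus` class and its methods are made up for illustration, it does not use the Freebase API, and simple sentence-level co-occurrence stands in for a real natural language processor.

```python
import re
from collections import defaultdict
from itertools import combinations


class Thesaurus:
    """Toy structured thesaurus (hypothetical): typed concepts with name variants."""

    def __init__(self):
        self.concept_type = {}             # canonical name -> type, e.g. "disease"
        self.variant_to_concept = {}       # lowercased variant -> canonical name
        self.relations = defaultdict(int)  # (concept_a, concept_b) -> co-mention count

    def add_concept(self, name, concept_type, variants=()):
        """Register a concept under its canonical name plus any name variants."""
        self.concept_type[name] = concept_type
        for variant in (name, *variants):
            self.variant_to_concept[variant.lower()] = name

    def find_concepts(self, sentence):
        """Return canonical concepts whose variants appear as whole words or phrases."""
        text = sentence.lower()
        found = set()
        for variant, concept in self.variant_to_concept.items():
            if re.search(r"\b" + re.escape(variant) + r"\b", text):
                found.add(concept)
        return found

    def mine(self, sentences):
        """Count concept pairs mentioned in the same sentence (a crude stand-in for NLP)."""
        for sentence in sentences:
            concepts = sorted(self.find_concepts(sentence))
            for a, b in combinations(concepts, 2):
                self.relations[(a, b)] += 1


thesaurus = Thesaurus()
thesaurus.add_concept("breast cancer", "disease",
                      variants=["breast carcinoma", "mammary carcinoma"])
thesaurus.add_concept("BRCA1", "gene", variants=["BRCA-1"])

thesaurus.mine([
    "Mutations in BRCA1 are linked to breast carcinoma.",
    "BRCA-1 status matters in mammary carcinoma patients.",
])
```

Because every variant maps back to one canonical concept, both sentences count toward the same gene-disease pair even though they use different names. The resulting `relations` table is the kind of thing a visualization layer (point 3) could draw from.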

The second point is already on the minds of OHSU, since they “plan to use the thesauri generated from this collaborative project to extend thesauri that we license, while building completely new thesauri as needed for use in Linguamatics I2E, Inforsense KDE and other unnamed analytics and internal text analytics projects and internal text search engines”. Including a built-in natural language processor that can be enhanced by access to others would be a great idea. (Aside … the number of bioinformatics text mining tools that are available now is a little mind-boggling and somewhat self-defeating. I wonder how good they all are?)

With all the data we are collecting these days, being able to develop these kinds of resources is becoming critical. My only request: keep the resource open, and with an open API. Projects like this one and the Encyclopedia of Life (EoL) are ambitious and difficult. Closed projects at this scale will be difficult to sustain and are likely to result in a glut of “me too” projects. Hopefully the OHSU project will get sufficient support, because it doesn’t quite have the visibility of EoL. It sure deserves it.

Further reading
Freebase
Encyclopedia of life
Wikipedia and science

