A thesaurus, wikis and text mining
May 25, 2007
From Bioinform (sub reqd) we learn about a proposal from Aaron Cohen and William Hersh of the Oregon Health and Science University’s Department of Medical Informatics and Clinical Epidemiology to develop a wiki-style thesaurus that would contain disease names, types, and variations. According to the scientists, the thesaurus would be based on a relational data model that connects all the data types.
The goal of the project is to discover relationships and associations and eventually therapeutic mechanisms. The hope is to bring as many as 20,000 bioinformaticians and researchers into the project. From where I stand, just limiting the proposal to a relationship-oriented wiki might be too narrow. Here is what would be really cool:
1. A wiki-style system as proposed, built using a Freebase-style structured model (maybe they can use the Freebase API or something like that). One can use existing ontologies as a starting point, the way many sites start with Wikipedia content
2. A natural language processor to identify and mine relationships
3. Killer visualization which would draw from the above two resources
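To make the first two points a little more concrete, here is a minimal Python sketch, not anything OHSU or Freebase actually provides. It assumes a tiny made-up thesaurus (canonical concept mapped to name variants) and a few made-up sentences, and stands in for a real natural language processor with plain phrase matching plus co-occurrence counting. The names `THESAURUS`, `build_index`, `find_concepts` and `cooccurrences` are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical thesaurus entries: canonical concept -> known name variants.
THESAURUS = {
    "diabetes mellitus type 2": {"type 2 diabetes", "T2DM", "adult-onset diabetes"},
    "metformin": {"metformin", "Glucophage"},
    "hypertension": {"hypertension", "high blood pressure"},
}


def build_index(thesaurus):
    """Map every lower-cased variant (and the canonical name) to its concept."""
    index = {}
    for concept, variants in thesaurus.items():
        for name in variants | {concept}:
            index[name.lower()] = concept
    return index


def find_concepts(sentence, index):
    """Return the concepts whose variants occur as whole phrases in the sentence."""
    text = sentence.lower()
    found = set()
    for name, concept in index.items():
        # Require non-word characters (or string edges) around the phrase.
        if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", text):
            found.add(concept)
    return found


def cooccurrences(sentences, index):
    """Count how often each pair of concepts is mentioned in the same sentence."""
    counts = defaultdict(int)
    for sentence in sentences:
        concepts = sorted(find_concepts(sentence, index))
        for a, b in combinations(concepts, 2):
            counts[(a, b)] += 1
    return dict(counts)


# Made-up example sentences, for illustration only.
SENTENCES = [
    "Glucophage is widely prescribed for T2DM.",
    "Patients with type 2 diabetes often have high blood pressure.",
    "Metformin lowered glucose in adult-onset diabetes.",
]

INDEX = build_index(THESAURUS)
PAIRS = cooccurrences(SENTENCES, INDEX)
for pair, count in sorted(PAIRS.items()):
    print(pair, count)
```

The thesaurus does the heavy lifting here: because "Glucophage" and "T2DM" resolve to the same concepts as "metformin" and "adult-onset diabetes", the two sentences count toward one relationship. A table of pair counts like this is also the kind of thing a visualization layer could draw from.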
The second point is already on the minds of OHSU, since they “plan to use the thesauri generated from this collaborative project to extend thesauri that we license, while building completely new thesauri as needed for use in Linguamatics I2E, Inforsense KDE and other unnamed analytics and internal text analytics projects and internal text search engines”. Including a built-in natural language processor that can be enhanced by access to others would be a great idea. (Aside: the number of bioinformatics text mining tools that are available now is a little mind-boggling and somewhat self-defeating. I wonder how good they all are?)
With all the data we are collecting these days, being able to develop these kinds of resources is becoming critical. My only request: keep the resource open, and with an open API. Projects like this one and the Encyclopedia of Life (EoL) are ambitious and difficult. Closed projects at this scale will be difficult to sustain and are likely to result in a glut of “me too” projects. Hopefully the OHSU project will get sufficient support, because it doesn’t quite have the visibility of EoL. It sure deserves it.
Further reading
Freebase
Encyclopedia of Life
Wikipedia and science
Technorati Tags: OHSU, Wiki, Thesaurus, Bioinformatics


