From Bioinform (subscription required) we learn about a proposal from Aaron Cohen and William Hersh of the Oregon Health and Science University’s Department of Medical Informatics and Clinical Epidemiology to develop a wiki-style thesaurus that would contain disease names, types, and variations. According to the scientists, the thesaurus would be based on a relational data model that connects all the data types.
The goal of the project is to discover relationships and associations and eventually therapeutic mechanisms. The hope is to bring as many as 20,000 bioinformaticians and researchers into the project. From where I stand, just limiting the proposal to a relationship-oriented wiki might be too narrow. Here is what would be really cool:
1. A wiki-style system as proposed, built using a Freebase-style structured model (maybe they can use the Freebase API or something like that). One can use existing ontologies as a starting point, the way many sites start with Wikipedia content
2. A Natural Language processor to identify and mine relationships
3. Killer visualization which would draw from the above two resources
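To make the first two points a little more concrete, here is a minimal Python sketch of how a thesaurus of name variants could drive a very naive relationship miner. Everything in it is an assumption for illustration: the `THESAURUS` entries, the function names, and the choice of sentence-level co-occurrence counting are mine and are not part of the OHSU proposal or of any tool mentioned here.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical thesaurus entries: canonical term -> set of name variants.
THESAURUS = {
    "type 2 diabetes": {"type 2 diabetes", "t2dm", "adult-onset diabetes"},
    "obesity": {"obesity", "adiposity"},
    "hypertension": {"hypertension", "high blood pressure"},
}


def find_terms(sentence, thesaurus=THESAURUS):
    """Return the canonical terms whose variants appear in the sentence."""
    text = sentence.lower()
    found = set()
    for canonical, variants in thesaurus.items():
        for variant in variants:
            # Word-boundary match so a variant is not matched inside a longer token.
            if re.search(r"\b" + re.escape(variant) + r"\b", text):
                found.add(canonical)
                break
    return found


def mine_cooccurrences(sentences, thesaurus=THESAURUS):
    """Count how often each pair of canonical terms shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        terms = sorted(find_terms(sentence, thesaurus))
        for pair in combinations(terms, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Toy sentences, made up for this example.
    corpus = [
        "Obesity is a risk factor for T2DM.",
        "High blood pressure often accompanies adiposity.",
        "Adult-onset diabetes and obesity are linked.",
    ]
    for (a, b), n in mine_cooccurrences(corpus).most_common():
        print(f"{a} <-> {b}: {n}")
```

Co-occurrence is only a stand-in for real natural language processing, but it shows why the thesaurus matters: without the variant lists, "T2DM" and "adult-onset diabetes" would be counted as unrelated terms.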
The second point is already on the minds of OHSU, since they “plan to use the thesauri generated from this collaborative project to extend thesauri that we license, while building completely new thesauri as needed for use in Linguamatics I2E, Inforsense KDE and other unnamed analytics and internal text analytics projects and internal text search engines”. Including a built-in natural language processor that others can access and enhance would be a great idea. (Aside … the number of bioinformatics text mining tools that are available now is a little mind-boggling and somewhat self-defeating. I wonder how good they all are?)
With all the data we are collecting these days, being able to develop these kinds of resources is becoming critical. My only request: keep the resource open, and with an open API. Projects like this one and the Encyclopedia of Life (EoL) are ambitious and difficult. Closed projects at this scale will be difficult to sustain and are likely to result in a glut of “me too” projects. Hopefully the OHSU project will get sufficient support, because it doesn’t quite have the visibility of EoL. It sure deserves it.
Further reading
Freebase
Encyclopedia of Life
Wikipedia and science



5 Comments
Yet another medical wiki:
http://scienceoftheinvisible.blogspot.com/2007/03/askdrwiki.html
You’re quite right, this project will not be successful, especially since it deviates from the current open source ethos.
Rather than a Freebase approach, which is also proprietary, why not use Semantic MediaWiki? This is an open source application that adds a semantic layer to the MediaWiki application that runs Wikipedia. We have had great success using it as a system on which people build and collaborate around their learning plans. The RDF can be exported and used for other purposes in more robust systems. We are experimenting with using RDF feeds from one of our semantic wikis to build OWL models.
I use Freebase as an example since I am somewhat familiar with it and quite like the model that it uses. In general, you can replace “Freebase” with an approach that allows the average researcher/scientist with domain knowledge to add structure on top of data and then use this structure to determine additional relationships (which is what SemanticMediaWiki seems to do as well). I will definitely check it out.