Curating annotations for biological informatics is hard. There are companies that employ many people to search through the literature for protein-protein interactions and similar relationships. A large part of what makes some databases more attractive is their level of curation, manual or otherwise. What if we could change the way this goal is achieved, perhaps via some of the same resources that we are using in the tech world? Many of us have talked about Freebase and its possibilities. Earlier today I was reading up a little on Phrasetrain, a local Seattle company that is using peer production to create a natural language technology. I am not quite sure how it works, although I have signed up for the beta. From their About page (emphasis mine):
Phrasetrain is a small new technology company in Seattle, Washington. Our vision is to draw on the power of peer production to create a new kind of natural language technology, and to use that technology to improve text search on the web. Some natural language companies claim to create artificial intelligence. We make no such claim. One of our core principles is that the genius of language is in people, not machines. We want to aggregate the linguistic intelligence of our users in a way that benefits all of them.
Now, I am not sure this would work at all, but combining the two concepts mentioned above (Freebase and peer production) with the knowledge of the life science community could help us create community-powered resources, open to all, with some level of quality. Perhaps something like this exists, but not quite like what I have in mind. In the comments to Pierre’s post on Freebase, Pedro remarks “Most of the boring job in bioinformatics is getting hold of the data, …”. What if we could put the data in Freebase, or something similar, and then combine the semantic power of that system with a community-generated approach, our collective intelligence as it were, to curate and annotate the data? It wouldn’t be perfect, but it might just work.
Normally, I am not the biggest supporter of the wisdom of crowds, and perhaps I am too vain, naive or both, but if there were a collective wisdom, it might just be found in the scientific community.
Further reading:
Bringing the wisdom of crowds to peer review
Chris Anderson on the “Wisdom of Crowds”
Freebase – The scientists perspective



4 Comments
We’ve already had some (fairly) large-scale loads of bio-oriented information. From one of our users, Dan Ruderman:
“I’ve uploaded data from the human genome project and some annotations. These include the genes and their locations on the genome (when known). The Gene Ontology groups and hierarchy are also now online, with membership info for human genes and evidence codes. Better links for citations will be added soon (e.g. links out to public web pages for genes, Pubmed for publications).”
If anybody would like an invitation to Freebase, post a reply (if it’s OK with the host) and I’ll send you one.
Robert
First of all, I am looking forward to meeting you in person at Scifoo in a couple of weeks. I didn’t know that Dan had uploaded all that data. I need to go back in and start playing.
If you want Robert to send you an invite, please leave a comment here.
Also feel free to email me directly if you want to get an invite. I have a few spare.
I was about to write that I would love to consider Freebase for some life science data, but did not have access… maybe someone could float an invite my way? ryan.raaum (at) gmail