The web as platform: WikiProteins
May 28, 2008
WikiProteins is all over the web, including BoingBoing and Ars Technica (and of course all over my FriendFeed). This is the first project by WikiProfessional, essentially a Wikipedia for specific content (not unlike the idea of Wikipedia focussing on scientific topics at a high level and pointing to other sites for more technical, domain-specific detail). Sound familiar? WikiProfessional shares some of its ideas with Google’s Knol project (where is that?). The idea is to build a concept web of knowlets. To achieve that, MediaWiki has been extended so that some of those underlying relationships can be captured. What I think is missing (and I am not 100% sure about this) is a true RDF backend, which would really make this phenomenal. The cool part is that the current Concept Web, as they call it, is all about the life sciences.
To a great degree, this is what the web and science should be all about: pulling in data from different sources to build a new resource. WikiProteins pulls in data from other sources, e.g. PubMed. This is why, IMO, every biological content site should have a RESTful API. Let me go one step further: every biological content site should provide access to its data in RDF. Then we can truly say we have a linked data web.
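To make the RDF point a bit more concrete, here is a minimal Python sketch of what exposing a record as linked data could look like: a few (subject, predicate, object) statements serialized as N-Triples. This is not how WikiProteins or PubMed actually publish anything; the `example.org` URIs and the `mentions` predicate are made-up placeholders, and only the Dublin Core `title` term comes from a real vocabulary.

```python
# Minimal N-Triples serializer (sketch). URIs under example.org are
# hypothetical placeholders, not real WikiProteins or PubMed identifiers.
# Subjects and predicates are assumed to be URIs; objects may be literals.

def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def term(t: str) -> str:
    """Render URIs in angle brackets and anything else as a plain literal."""
    if t.startswith(("http://", "https://")):
        return f"<{t}>"
    return f'"{escape_literal(t)}"'


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


if __name__ == "__main__":
    paper = "http://example.org/pubmed/12345"  # hypothetical record URI
    triples = [
        (paper, "http://purl.org/dc/terms/title", "A hypothetical bacteriorhodopsin paper"),
        (paper, "http://example.org/vocab/mentions",
         "http://example.org/concept/bacteriorhodopsin"),
    ]
    print(to_ntriples(triples))
```

Once data looks like this, anyone can merge statements from different sites simply by matching URIs, which is the whole point of a linked data web.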
WikiProteins comes from some heavyweights. Any time your PI is Amos Bairoch and Jimmy Wales is a co-author, you know this is serious stuff, and I really like what they’ve done. In some ways, this is better than the Encyclopedia of Life, at least when it comes to making things accessible and available. Here is the abstract of the paper in Genome Biology:
WikiProteins enables community annotation in a Wiki-based system. Extracts of major data sources have been fused into an editable environment that links out to the original sources. Data from community edits create automatic copies of the original data. Semantic technology captures concepts co-occurring in one sentence and thus potential factual statements. In addition, indirect associations between concepts have been calculated. We call on a ‘million minds’ to annotate a ‘million concepts’ and to collect facts from the literature with the reward of collaborative knowledge discovery. The system is available for beta testing at http://www.wikiprofessional.org.
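The abstract mentions two mechanisms: concepts co-occurring in one sentence, and indirect associations between concepts. The following is not the method from the paper, just a toy Python sketch under my own assumptions: concepts are matched as whole words, case-insensitively; sentences are split naively on terminal punctuation; and an "indirect association" is simply two concepts that never share a sentence but do share at least one co-occurring neighbour.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_cooccurrence(text, concepts):
    """Count how often each pair of concepts appears in the same sentence."""
    # Whole-word, case-insensitive matching, so "rhodopsin" does not match
    # inside "bacteriorhodopsin".
    patterns = {
        c: re.compile(r"\b" + re.escape(c) + r"\b", re.IGNORECASE) for c in concepts
    }
    pairs = Counter()
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        found = sorted(c for c, p in patterns.items() if p.search(sentence))
        for a, b in combinations(found, 2):
            pairs[(a, b)] += 1
    return pairs


def indirect_associations(pairs):
    """Return {(a, c): shared_neighbour_count} for pairs that never co-occur
    directly but are both linked to at least one common concept."""
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    indirect = {}
    for a, c in combinations(sorted(neighbours), 2):
        if c in neighbours[a]:
            continue  # they co-occur directly, so this is not indirect
        shared = neighbours[a] & neighbours[c]
        if shared:
            indirect[(a, c)] = len(shared)
    return indirect


if __name__ == "__main__":
    text = "Bacteriorhodopsin binds retinal. Retinal absorbs light."
    pairs = sentence_cooccurrence(text, ["bacteriorhodopsin", "retinal", "light"])
    print(pairs)
    print(indirect_associations(pairs))  # {('bacteriorhodopsin', 'light'): 1}
```

Here bacteriorhodopsin and light never appear in the same sentence, but both co-occur with retinal, which is exactly the kind of "potential" link the abstract hints at.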
Sounds just like what the doctor ordered in some ways, especially for a protein person like yours truly. A search for one of my favorite proteins, bacteriorhodopsin, yields a knowlet already populated with a ton of info (note that the information was added automatically rather than manually, but once it is there, “experts” can edit it). The knowlet is information-rich, although it is sorely missing structural information. The publications chosen are also not necessarily the first ones that come to mind. I wonder how they determine relevance. There is a nice visual histogram that lets you select the concepts extracted from the underlying data and also classifies them (as predictive, factual or co-occurrence).

This is probably a good time to describe knowlets and concepts. From the paper:
In WikiProteins each concept can be edited by the community. Each concept page is hyperlinked to the Knowlets of all concepts mentioned in that page. A Knowlet stores relationships between a given source concept and individual target concepts. The various relationships (F, C and A) between two concepts are computed into a single composite value, named the ‘semantic association’. The technology allows the coupling of all Knowlets into a larger, dynamic ontology called the ‘concept space’.
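Here is how I picture a Knowlet as a data structure, as a small Python sketch. I am assuming F, C and A stand for factual, co-occurrence and associative relationships, and the weights used to collapse them into one "semantic association" value are invented purely for illustration; the quoted passage does not say how the composite is actually computed.

```python
from dataclasses import dataclass, field

# Hypothetical weights for factual (F), co-occurrence (C) and associative (A)
# evidence; the real weighting used by WikiProteins is not given in the quote.
WEIGHTS = {"F": 1.0, "C": 0.5, "A": 0.25}


@dataclass
class Knowlet:
    """Relationships from one source concept to many target concepts."""

    source: str
    # target concept -> {"F" | "C" | "A": score}
    relations: dict[str, dict[str, float]] = field(default_factory=dict)

    def add(self, target: str, kind: str, score: float) -> None:
        """Record one relationship score of a given kind for a target concept."""
        if kind not in WEIGHTS:
            raise ValueError(f"unknown relationship type: {kind}")
        self.relations.setdefault(target, {})[kind] = score

    def semantic_association(self, target: str) -> float:
        """Collapse the F/C/A scores for one target into a single weighted sum."""
        scores = self.relations.get(target, {})
        return sum(WEIGHTS[kind] * s for kind, s in scores.items())


def concept_space(knowlets: list[Knowlet]) -> dict[str, dict[str, float]]:
    """Couple Knowlets into one weighted graph: source -> {target: association}."""
    return {
        k.source: {t: k.semantic_association(t) for t in k.relations}
        for k in knowlets
    }


if __name__ == "__main__":
    k = Knowlet("bacteriorhodopsin")
    k.add("retinal", "F", 1.0)
    k.add("retinal", "C", 0.8)
    k.add("light", "A", 0.4)
    print(concept_space([k]))
```

A weighted sum is just the simplest composite I could think of; the useful property is that every pair of concepts ends up with one number, so the whole concept space becomes a single weighted graph that can be recalculated as new evidence arrives.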
The paper has a nice figure showing how they arrive at these concepts. The next section, though, is what really gets me excited (emphasis mine):
Knowlets and their connections can be exported into standard ontology and web languages such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL). Therefore, any application using these languages will enable the use of Knowlet output for reasoning and querying with programmes such as the SPARQL Protocol and RDF Query Language. The concept space is provided in open access. The system performs a recalculation of the semantic relationships in the entire biomedical concept space at regular intervals.
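Because the export is plain RDF, any standard tool should be able to query it. To show what a SPARQL basic graph pattern actually does, here is a toy Python matcher over in-memory triples. It is a stand-in for illustration, not a SPARQL engine; in practice you would load the exported RDF into a triple store or a library such as rdflib. The `ex:` names are hypothetical.

```python
# Toy evaluator for a conjunction of triple patterns, the core of a SPARQL
# basic graph pattern. Terms starting with "?" are variables.
# Roughly equivalent SPARQL:
#   SELECT ?x ?y WHERE { ex:bacteriorhodopsin ex:associatedWith ?x .
#                        ?x ex:associatedWith ?y . }

def match(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.get(p, t) != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def query(triples, patterns):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    triples = [
        ("ex:bacteriorhodopsin", "ex:associatedWith", "ex:retinal"),
        ("ex:retinal", "ex:associatedWith", "ex:light"),
        ("ex:rhodopsin", "ex:associatedWith", "ex:retinal"),
    ]
    patterns = [
        ("ex:bacteriorhodopsin", "ex:associatedWith", "?x"),
        ("?x", "ex:associatedWith", "?y"),
    ]
    print(query(triples, patterns))  # [{'?x': 'ex:retinal', '?y': 'ex:light'}]
```

The shared variable `?x` is what joins the two patterns; that kind of multi-hop question across the concept space is exactly what becomes easy once the data is available as RDF.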
Also take a look at the linker, which adds concept web capabilities to a number of resources, including PubMed.
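I haven't looked at how the linker works internally, but the basic idea of decorating existing text with concept links can be sketched as dictionary-based tagging. In this Python sketch the concept dictionary, the concept IDs and the base URL are all hypothetical; longer names are tried first, so at any position the longest matching concept name wins.

```python
import html
import re


def link_concepts(text, concepts, base_url="http://example.org/concept/"):
    """Wrap whole-word concept mentions in HTML links.

    `concepts` maps a concept name to a (hypothetical) concept ID. Matching is
    case-insensitive, and longer names are tried before shorter ones.
    """
    if not concepts:
        return html.escape(text)
    lookup = {name.lower(): cid for name, cid in concepts.items()}
    names = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b", re.IGNORECASE
    )
    parts, last = [], 0
    for m in pattern.finditer(text):
        parts.append(html.escape(text[last:m.start()]))  # plain text between hits
        url = html.escape(base_url + lookup[m.group(0).lower()])
        parts.append(f'<a href="{url}">{html.escape(m.group(0))}</a>')
        last = m.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    terms = {"bacteriorhodopsin": "C001", "retinal": "C002"}
    print(link_concepts("Bacteriorhodopsin binds retinal.", terms))
```

The text between matches is HTML-escaped so the output stays well-formed even if the abstract contains characters like `<`.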
That’s all I have time for right now. More later, after I’ve had a chance to play.
Technorati Tags: WikiProteins, WikiProfessional, Semantic Web, Proteins, Ontologies