
Mining for images in bioscience publications

The latest issue of Bioinform has an article about a team at UW-Milwaukee that is developing methods to retrieve images from bioscience publications via text-based queries (subscription required).
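To make "retrieving images via text-based queries" a little more concrete, here is a minimal Python sketch of one simple, generic approach: ranking figures by TF-IDF scores computed over their captions. This is an assumed technique chosen for illustration, not the method described in the article. The `FIGURES` records and the helper names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical figure records; real ones would be parsed from published articles.
FIGURES = {
    "fig1": "Western blot of p53 expression in treated HeLa cells",
    "fig2": "Phylogenetic tree of kinase domains",
    "fig3": "Confocal image of p53 localization in the nucleus",
}

def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(captions):
    """Return per-figure term counts and a smoothed inverse document frequency."""
    docs = {fig: Counter(tokenize(cap)) for fig, cap in captions.items()}
    n = len(docs)
    df = Counter(term for counts in docs.values() for term in counts)
    idf = {term: math.log(n / df_t) + 1.0 for term, df_t in df.items()}
    return docs, idf

def search(query, docs, idf):
    """Rank figures by the summed tf-idf of query terms found in each caption."""
    terms = tokenize(query)
    scores = {}
    for fig, counts in docs.items():
        score = sum(counts[t] * idf.get(t, 0.0) for t in terms)
        if score > 0:
            scores[fig] = score
    # Highest score first; ties broken by figure ID for a stable order.
    return sorted(scores, key=lambda f: (-scores[f], f))

docs, idf = build_index(FIGURES)
print(search("p53 nucleus", docs, idf))  # ['fig3', 'fig1']
```

Here `fig3` ranks ahead of `fig1` because "nucleus" appears only in its caption. Richer captions and structured metadata from publishers would give a search like this much more to work with.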

According to the article, others like Phil Bourne are engaged in similar activities (bbgm has written about the BioText project in the past). Projects like these are clearly needed right now, since the historical online literature lacks the structure and metadata that would make the task easier. But it is time that publishers thought ahead about the advantages of online publishing. Whether via Semantic Web methods or simpler markup, we can capture structure, metadata and connectivity, and use them to derive relationships between images, image captions, and text. Before we even get there, however, journals need to start treating the web as their primary publishing platform (e.g. PLoS).

We have to think of the scientific literature as a web of linked data. In fact, the more I think about it, the more I see scientific publishing as a formalized, distributed wiki. In other words, we should be able to update the original source material when we find additional information that has an impact on it. That way science always stays up to date. Perhaps that is not practical, and it removes the ability to publish multiple papers, but until we treat the web of science as a web of linked data, we are hurting our own ability to get the most out of scientific research. Especially in the life sciences, where the connections between proteins, genes, functions and drugs are often published in different papers, being able to construct a graph of those relationships without hiring people to curate all that data by hand is not only desirable but, in my book, required.
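As a rough illustration of why machine-readable statements matter, here is a minimal Python sketch. It assumes, hypothetically, that journals exported each article's claims as simple (source, subject, predicate, object) statements. It merges statements from several papers into one graph and uses plain breadth-first search to find a chain of relationships linking a drug to a gene. All identifiers below are made up. A real linked-data setup would more likely use RDF and SPARQL than hand-rolled tuples.

```python
from collections import defaultdict, deque

# Hypothetical statements, each tagged with the (made-up) paper it came from.
STATEMENTS = [
    ("paper-A", "GENE1", "encodes", "PROT1"),
    ("paper-B", "PROT1", "interacts_with", "PROT2"),
    ("paper-C", "DRUG1", "inhibits", "PROT2"),
]

def build_graph(statements):
    """Undirected adjacency list; each edge keeps its original statement."""
    graph = defaultdict(list)
    for stmt in statements:
        _source, subj, _pred, obj = stmt
        graph[subj].append((obj, stmt))
        graph[obj].append((subj, stmt))
    return graph

def find_path(graph, start, goal):
    """Breadth-first search; returns the statements linking start to goal, or None."""
    came_from = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbor, stmt in graph.get(node, []):
            if neighbor not in came_from:
                came_from[neighbor] = (node, stmt)
                queue.append(neighbor)
    if goal not in came_from:
        return None
    # Walk back from the goal to the start, collecting the statements used.
    path = []
    node = goal
    while came_from[node] is not None:
        prev, stmt = came_from[node]
        path.append(stmt)
        node = prev
    path.reverse()
    return path

graph = build_graph(STATEMENTS)
for source, subj, pred, obj in find_path(graph, "DRUG1", "GENE1"):
    print(f"{subj} {pred} {obj}  [{source}]")
```

The chain it prints spans three different papers. No single paper states it, yet it falls out automatically once the statements are published in a structured form.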

Take-home message: Journals had better adopt publishing platforms rich in structure and metadata, and publish for the web first and for print second. We are almost in 2010; it is time to start thinking like that.

Picture via Eskimoblood under a CC license


This entry was posted in Open Science, Semantic Web.

Disclaimer: All opinions on this blog are my own and do not reflect those of my employers, past or present.