Bio::Blogs #16 in all its pumpkin glory
October 31, 2007
A new host and Pawel does a fine job. Did I mention that his header is a surface rendering
Technorati Tags: Bio::Blogs
A Bio::Blog-y Halloween
October 30, 2007
I should come up with a bio-themed pumpkin, but I am way too lazy to do so.
Bio::Blogs #16 is due by the end of the day, October 31, and will be hosted at Freelancing Science (Do they have Halloween in Poland?)
Technorati Tags: Bio::Blogs
Some random titbits
October 30, 2007
Hey it’s not going to be all long form posts and video.
Oh so true (emphasis mine)
Even though the buzz about e-science often focuses on massive hardware, user interfaces, storage capacity and other technical issues, in the end, the ability of e-science to serve the needs of scientific research teams boils down to people: the ability of the builders of the infrastructure to communicate with its users and understand their needs and the realities of their work cultures.
Maybe this is what the UMPC was destined to be
WIRED Science has a Twitter feed
By way of Sandra Porter, I learned about the Research Blogging icon
Technorati Tags: Research Blogging, Wired Science, UMPC, e-science
The value of feature extraction
October 30, 2007
I haven’t mentioned Twine in this space, which may come as a shock to some, none more so than myself. Well, there are good reasons for everything. For one, I haven’t had the time to sit down and take stock. Secondly, I haven’t been able to elucidate my thoughts in any coherent manner. So let us do this the hard way, by taking a step back.
Semantic analysis has been a staple of bbgm. I am a strong believer in machine readable data and the power it can provide. Whether it be a system like Freebase that allows you to add structure to data, or the kinds of entity extraction and data contexts that Jon Udell and Jeff Jonas talk about, making your data more “intelligent” is something that we should strive for, whether it be in the world of business, or the world of life science (or as Andrew Walkingshaw might point out, materials science). My old colleagues at Scitegic have a motto - “Ask more of your data”. That has always resonated with me. So let’s take yet another look at how to make your data smarter, especially when the semantic web seems to be inching towards more mainstream acceptance.
Let’s start with a quote from a talk on Ambient Findability
- For every search on cancer.gov, there are over 100 cancer-related searches on public search engines.
- Of these searches, 70% are on specific types of cancer.
There is another statement of interest in the same talk
… the ability to find anyone or anything from anywhere at anytime
The above statements bring to mind the subject of context. Let us agree that “data finds the data”. In that case we must also agree that data must be found in the correct context. Don’t believe me? Just ask Jeff Jonas. In my mind, if machines are to do this, semantic markup of some sort is the only way. Extracting information from documents, regardless of format, whether they be text, images, or video, is one of the key challenges of our times. In the life sciences, right now, I don’t really know of any way (if someone knows of one, let me know) to extract the meta-data from an image or a video, correlate it to meta-data in a set of text files, and automatically come to a conclusion about the potential context of the two observations. I talked about Persistent Context for the life sciences in the past. Let me steal another of Jeff’s ideas, that of Sequence Neutrality. Essentially, “context engines must constantly be on the lookout for new observations that change earlier assertions – and if a new observation provides such evidence – the invalidated assertions from the past must be remedied.” Context and feature extraction together make a very powerful mix, which can help pharma companies find better, safer drugs faster. This is especially critical in the kind of healthcare environment taking root today, with an emphasis on pharmacovigilance, early safety assessment, etc. If we can continuously update our safety databases based on new data, we are likely to identify adverse events faster, and essentially could carry out constant meta-analysis.
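To make the Sequence Neutrality idea a little more concrete, here is a minimal Python sketch of a safety "context" that re-evaluates its assertion every time a new observation arrives. Everything in it is an assumption made up for illustration: the `Observation` and `SafetyContext` names, the drug name, the numbers, and the 5% threshold. It is not how any real pharmacovigilance system or Jeff Jonas's engines work, only the shape of the idea.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    """One hypothetical safety report: a cohort size and its adverse events."""
    drug: str
    patients: int
    adverse_events: int


@dataclass
class SafetyContext:
    """Toy context engine: keeps running totals and a set of flagged drugs."""
    threshold: float = 0.05  # assumed adverse-event rate that triggers a flag
    totals: dict = field(default_factory=dict)
    flagged: set = field(default_factory=set)

    def observe(self, obs: Observation) -> None:
        patients, events = self.totals.get(obs.drug, (0, 0))
        patients += obs.patients
        events += obs.adverse_events
        self.totals[obs.drug] = (patients, events)

        # Re-evaluate the earlier assertion against ALL evidence so far.
        # A flag raised by an earlier observation is withdrawn if the
        # pooled data no longer supports it, and vice versa.
        if patients > 0 and events / patients > self.threshold:
            self.flagged.add(obs.drug)
        else:
            self.flagged.discard(obs.drug)


def final_flags(observations, threshold=0.05):
    """Feed observations in the given order and return the final flag set."""
    ctx = SafetyContext(threshold=threshold)
    for obs in observations:
        ctx.observe(obs)
    return ctx.flagged
```

Because the assertion is always recomputed from the pooled totals, the final set of flags depends only on which observations have been seen, not on the order in which they arrived. That is the property the quote is asking for, and it is also what a continuously updated meta-analysis would need.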
Jon Udell, in a post commenting on Tim O’Reilly’s review of Twine, talks about entity extraction and a Firefox plugin called Gnosis. I had heard about Gnosis before, but only looked at it askance. However, Jon’s post made me take a second look, and all I can say is WOW. Take a look at the screenshot below. It shows the features that Gnosis extracted from my blog post on pharma futurology. The interesting thing is not the actual results, but the concept. If you could do the Freebase thing, and add additional information which gets stored in a dictionary somewhere, you have that much power available to you. Just as a note, for more complex pages, Gnosis is not always accurate, but the potential is obvious. You can also perform additional queries based on the extracted features. There will come a time when the options available will be that much more powerful. Adaptive Blue’s BlueOrganizer also takes a similar approach, recognizing books, websites, etc.
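For readers who have not seen entity extraction up close, the simplest possible version is a dictionary lookup. The sketch below is only that, a toy: the `GAZETTEER` contents and the `extract_entities` function are hypothetical, and tools like Gnosis almost certainly do something far more sophisticated than matching a word list.

```python
import re

# Hypothetical dictionary of known terms and their types. In the
# "Freebase thing" scenario, users could keep adding entries to it.
GAZETTEER = {
    "british telecom": "Company",
    "freebase": "Product",
    "liverpool": "Place",
    "personalized medicine": "Concept",
}


def extract_entities(text, gazetteer=GAZETTEER):
    """Return (matched text, type) pairs in order of appearance."""
    found = []
    for term, kind in gazetteer.items():
        # Whole-word, case-insensitive match of each dictionary term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.start(), match.group(0), kind))
    # Sort by position so results follow the reading order of the text.
    return [(surface, kind) for _, surface, kind in sorted(found)]


if __name__ == "__main__":
    sample = "British Telecom ran a tele-care pilot in Liverpool."
    for surface, kind in extract_entities(sample):
        print(f"{kind}: {surface}")
```

The limits are obvious (no disambiguation, no handling of terms it has never seen), which is where dictionaries curated by many users start to matter.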
Which brings us to Twine, Radar Networks’ soon-to-be-released semantic web solution. Announced at the Web 2.0 Summit, Twine is at heart an information manager, but with the potential to be a lot more. Like Gnosis, Twine performs entity extraction on documents. Unlike Gnosis, at least as far as I can make out, the power of Twine comes from the crowd effects. Once a group of people have collected a large set of documents, you can use the tags associated with a document to screen through everything else that your collection, or twine, might contain about that tag. Twine also bases its extraction on your browsing behavior and can handle various media types. Of course, the devil lies in the details. Without testing Twine, I am not sure how useful it can really be, but it is definitely promising. Since it uses natural language processing, machine learning and network effects, I almost think about it in the same way as I might look at systems from Linguamatics or Biowisdom, which perform text analytics, except that Twine does so much more. Another thing I like about Twine is that it is queryable using web standards (RDF, SPARQL, OWL, etc).
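The "use the tags on one document to screen the rest of the collection" idea can be sketched as a plain inverted index. To be clear, this is a guess at the general shape, not Twine’s design, which I have not seen; the `SharedCollection` class and the document ids and tags are invented for illustration.

```python
from collections import defaultdict


class SharedCollection:
    """Toy shared collection: documents, their tags, and a tag index."""

    def __init__(self):
        self._tags_by_doc = {}
        self._docs_by_tag = defaultdict(set)

    def add(self, doc_id, tags):
        # Normalize tags so "Twine" and "twine " count as the same tag.
        # Note: re-adding an existing doc_id is not handled in this sketch.
        normalized = {tag.strip().lower() for tag in tags}
        self._tags_by_doc[doc_id] = normalized
        for tag in normalized:
            self._docs_by_tag[tag].add(doc_id)

    def related(self, doc_id):
        """Other documents sharing at least one tag, most shared tags first."""
        shared = defaultdict(int)
        for tag in self._tags_by_doc[doc_id]:
            for other in self._docs_by_tag[tag]:
                if other != doc_id:
                    shared[other] += 1
        # Ties are broken alphabetically to keep the order deterministic.
        return sorted(shared, key=lambda d: (-shared[d], d))
```

The crowd effect comes from the index: every document and tag that anyone adds makes `related` more useful for everyone else using the same collection.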
I see a lot of potential for Twine. In a RRW post, Nova Spivack of Radar Networks mentions that content providers have expressed interest in Twine. I presume Twine can become aware of the various life science ontologies, the markup in PLoS journals, OTMI, etc. In fact both Gnosis and Twine have a lot of potential in the publication area. A semantic plugin that allows you to find papers by an author, similar articles, or use a keyword to seed a search could be quite useful.
Collaborative research, finding data types and relationships across an organization, etc. become very real and fairly simple possibilities. Now Twine is just an example. Conceptually, this is easy to grok. The success of these applications lies in the implementation and in how they can be made accessible to a wider (read non-developer) audience.
One of the problems I see is that there are several models out there. Freebase is a queryable datastore, on top of which you can build structure. Twine is a smart entity extraction and behavioral analysis system. At Web 2.0, Nova Spivack suggested that Twine would tie into Freebase, and that I think is very important. Left by themselves, all we end up with are silos again. If Twine, Freebase, Gnosis, etc can somehow be linked together, we have a lot of power available to us, especially if we throw text analytics packages into the mix. I am inclined to agree with Spivack (and disagree with Danny Hillis) here; the web is the platform. Platforms like Freebase should only be built on top of it.
Further reading:
Freebase - The scientists perspective
The semantic web goes mainstream
New era of semantic apps
When web sites become web services
Technorati Tags: Twine, Freebase, Semantic Web, Jeff Jonas, Jon Udell
Trendspotting: The future of biopharma
October 29, 2007
George Laszlo has a post on his blog about a report on the future of the biopharmaceutical industry by BT … yes, BT. So what is the report all about? To try and come up with an unbiased view, I actually didn’t read Laszlo’s post, but decided to read the report and voice my own opinions.
First of all, the report is definitely an interesting read, and if you were wondering why BT would be interested in commissioning such a study, the reasons become obvious fairly early. BT is betting that telecommunication is going to play a major role in our electronic/digital healthcare futures, and they want to understand what their role might be. Secondly, there is a lot in the report, enough that it is likely to give rise to follow-up posts on a bunch of subjects. Here I want to do two things: (a) provide a sense of the assumptions and conclusions in the report, and (b) continue with my experiments in using new ways to communicate thoughts and ideas. Let’s see how it all works out.
The report is entitled Pharma Futurology. Joined up healthcare. As the name suggests, the report is definitely futuristic. Just to define the term (from the report):
Futurology – the study of the future – involves critical reasoning about the way things will develop based on observations of the present, while considering the path that development has taken to get to this point.
In this case, the future is 2016. Like many reports of this nature, this report also tends to overestimate the pace at which new technologies are making an impact on healthcare. The focus of the report is actually one that makes a lot of sense. Pharma is in trouble, for many reasons ranging from a lack of public trust and a lack of pipelines to changes in the way healthcare is likely to be administered in the future. The core argument of the report is that the biopharma industry needs to change (something I believe most people will agree on) by becoming one part of the healthcare ecosystem, one driven by communication technologies and the increased digitization of patient information.
While I tend to agree with many of the premises, and even the solutions, there are some words of caution. In silico modeling and simulation will make a huge impact on bringing drugs to market faster some day, but that day is not a decade away. At least not at a level where a significant chunk of our drugs will be developed with biosimulation and other computational methods playing significant roles in reducing trial costs and safety testing. Too much has to change for that to happen, especially in a world where it still takes a decade to bring a drug to market.
I do believe that communication technologies, EMRs, PHRs, etc will start having an impact in this timespan. Not everywhere, but the use will be at a level that the average consumer will start feeling it. In a way there are two challenges here. While we will have a more educated patient population, one that hopefully will be better positioned to make decisions, we must not forget about the doctors. In an environment where patients will demand more participation in their treatment regimens, where sensors and diagnostics will help in improved decision making, physicians will need to redefine their roles. I still worry about what this role might be. The biggest fear is that patients, perhaps being marketed to directly, will “think” they have the answers, but will lack sufficient medical knowledge to come up with the right decisions. Call me quaint, but I still believe that the medical process needs to become a conversation between the patient and the physician about the best options, not one where the role of the doctor is just figuring out dosage and making sure that any diagnostics are being interpreted properly.
As expected, personalized medicine is prominent throughout the report. I have been on record as saying that the pharma industry has to think outside the blockbuster model. The report seems to agree that a portfolio of drugs treating the same disease for different subpopulations is the inevitable way to go (perhaps slightly different formulations or compounds). My only disagreement with that hypothesis is the timeline. I think true personalized treatment regimens are at least 20 years away from being the majority of drugs being released. However, during that time we will be able to refine our methods, and gain experience from understanding drug response and getting used to the idea that a drug that treats 30% of the population need not be a failure. The achievement will be to reduce the cost of developing targeted drugs to make the economics more attractive (it’s pointless spending the kind of money being spent on drugs today to end up with one that’s only going to work on a third of the potential market).
Perhaps the most interesting idea in the entire report was that of outcomes-based payment. In other words, a drug that does not work will not be reimbursed. It’s something I haven’t thought about sufficiently to comment further, but hope to do so sometime in the future.
In the end, we are faced with the reality that healthcare will change. I do not necessarily buy into the idea of nationalized healthcare or a single payer system, but the ecosystem of healthcare, one where discovery, development, trials, treatment, and post-market intelligence are inextricably linked, does not fit into the current payment model, nor do personalized medicine or targeted treatments. The take-home message, that the pharma industry cannot afford to become isolated from this healthcare network, also rings true.
I encourage everyone to read the report. If nothing else, it provides an insight into a future, a future that is almost certainly further away than 2016, but closer than some of us might realize. There are very interesting sections on implantable sensors, and sensors in clothing, etc, as well as discussions on data mining and semantic analysis. There are discussions around IP, open data, etc. There are also some amusing moments like the following
incentivising patients to remain on their schedule, e.g. with a free ringtone or a free game download for children
Somehow, I think that is not going to work.
There are a lot of interesting anecdotes and insets in the report. In the following video, I talk about a tele-care pilot in Liverpool, the “cloud”, big brother and why it brought targeted advertising to my mind.
Direct Link for RSS Readers
Future discussions around this report will focus on specific trends in tele-medicine, forecasting, and personalized medicine and patient participation.
Further reading:
The future of scientific computing
Microsoft 2020 report
IBM Healthcare 2015
IBM Pharma 2010 - Silicon reality
Technorati Tags: British Telecom, Pharma, Futurology, Trendspotting, Healthcare



