Electronic notebooks are cool, and so is RDF

July 31, 2008

Had a conversation earlier today, all about RDF and linked data. I am a big believer, which is why posts like this one by Cameron Neylon on A new way of looking at science? bring a smile.

Andrew Milsted, a PhD student, enabled an RDF dump of the content in the lab notebook used by Cameron’s group (and others, I suspect). The result is a graph that shows each post in the notebook as a node and links between posts as edges. It is a universe of the work going on in the lab, and how that work interacts. It would be interesting to see the dynamics of this graph evolve, and to explore other ways of visualizing the underlying data and relationships. It would also be cool to put this up on the web as linked data and link it to data outside Cameron’s lab. That might even lead to some very interesting observations and relationships.

This is a simple example, but it highlights why it is so important to be able to put data into machine-readable formats. RDF is a naturally good model, since it highlights relationships within the underlying data.
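To make the "posts as nodes, links as edges" idea a little more concrete, here is a minimal sketch in plain Python. It is not the actual notebook dump or Andrew's code: the post identifiers, the `linksTo` predicate and the helper names are all made up for illustration. The point is only that once the data is a list of subject/predicate/object triples, the graph falls out almost for free.

```python
# Hypothetical (subject, predicate, object) triples. The post IDs and the
# "linksTo"/"title" predicates are invented for illustration and are not
# taken from the real lab notebook dump.
TRIPLES = [
    ("post:1", "linksTo", "post:2"),
    ("post:1", "linksTo", "post:3"),
    ("post:2", "linksTo", "post:3"),
    ("post:3", "title", "PCR result"),
]


def build_link_graph(triples, predicate="linksTo"):
    """Return an adjacency map {post: set of posts it links to}."""
    graph = {}
    for subject, pred, obj in triples:
        if pred == predicate:
            graph.setdefault(subject, set()).add(obj)
            # Make sure link targets show up as nodes even if they link nowhere.
            graph.setdefault(obj, set())
    return graph


def inbound_counts(graph):
    """Count how many posts link to each post."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


if __name__ == "__main__":
    graph = build_link_graph(TRIPLES)
    for post, count in sorted(inbound_counts(graph).items()):
        print(post, "is linked from", count, "post(s)")
```

Triples that are not links (the `title` one here) are simply ignored when building the graph, which is part of the appeal: the same dump can answer different questions depending on which relationships you choose to follow.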


Industry watching: The serendipitous (and lean) future of pharma

July 31, 2008

My graduate advisor’s favorite word, or at least one of the more popular ones, was serendipity. He was a firm believer in the role of serendipity in science, and personally I believe that serendipity plays a big role in discovery, of any kind. So when Richard Jones pointed to a story in the Financial Times entitled Drug Research Needs Serendipity, it naturally piqued my interest.

The article has an ominous start:

The molecular revolution was supposed to enable drug discovery to evolve from chance observation into rational design, yet dwindling pipelines threaten the survival of the pharmaceutical industry. What went wrong?

Lest you think that this is some journalist writing a story, look at who wrote the piece. David Shaywitz (a physician and well-known writer covering health and medicine) and Nassim Nicholas Taleb are not lightweights. The high-level answer comes right away:

The answer, we suggest, is the mismeasure of uncertainty, as academic researchers underestimated the fragility of their scientific knowledge while pharmaceuticals executives overestimated their ability to domesticate scientific research.

Much as I criticize Singularitarians and others for underestimating how limited our knowledge of human biology is, the same is true of the scientific community, although I will add that a big chunk of scientists, perhaps most of them, realize how little we really know. Unfortunately, in our eagerness to get grants, venture capital and improved stock prices, many tend to look past a reality that I suspect we are well aware of.

I forget who it was, either Pedro or Jonathan Eisen, who noted once that we have done a great job developing analytic technologies that generate data, but need to do a better job making sense of all that data. I’ll add that we need to do a better job not only of making sense of the data, but also of converting all that data into actionable information.

And this is where I take umbrage at some of the language in the article and call out the authors for the fragility of their own understanding. They write (emphasis mine):

Medical research is particularly hampered by the scarcity of good animal models for most human disease, as well as by the tendency of academic science to focus on the “bits and pieces” of life – DNA, proteins, cultured cells – rather than on the integrative analysis of entire organisms, which can be more difficult to study.

The authors note that the “integrative analysis of entire organisms” is difficult. What they fail to note is that academic scientists are not simply focused on the bits and pieces, and haven’t been for a while. A good chunk of the community is trying to understand biology at a much more holistic level, but it is HARD. What is the level at which we need to study entire organisms when there are so many gaps in our knowledge? I would argue that people do “integrative analysis” just for the sake of it from time to time, without really understanding why they want to or need to do it. Contrary to what Chris Anderson might think, a lot of data does not lead to better science.

I also take umbrage at the slighting of scientists in the pharma industry. I have met many. Are pharma companies perfect? Hardly. Are they too slow to change? Absolutely. But there are some brilliant scientists working there, and many companies are trying their level best to figure out the best way to utilize this knowledge. I would like to remind the authors of what they say themselves: it is hard. The risks and costs associated with drug development leave the management of these companies hesitant to throw the kitchen sink at developing drugs exclusively using pharmacogenomics or other techniques enabled by the glut of data and the scientific advances of the past decade. I will agree on one thing though: pharma has to move away from rigid planning faster, and towards the proof of concept/first in man approaches that many are now beginning to take.

I really liked the second part of the article, where, among other things, the authors advise the industry to embrace serendipity. I completely agree that a number of companies have taken the wrong approach. The declining productivity will not be solved by increased efficiency. That would imply that inefficiency is the major cause of the decline. It is a cause, a symptom of bloat, but not the cause. Better drugs will be the result of better science and changes to the models by which the industry, academia and the drug development ecosystem function.

The pharma industry is ripe for disruptive innovation. I continue to believe that the role of big pharma will increasingly become akin to that of a system integrator, with small biotechs, service providers and external, distributed research forming the backbone of the system. Perhaps we need more thinking along the lines of Robin Spencer.

While I might not agree with some of the statements made by Shaywitz and Taleb, I agree with the overall message and some of the directions the biopharma industry needs to take. Will it happen before the industry falls down under its own weight?


The confusion over data rights

July 30, 2008

As a side note, I talked to a colleague who got harassed at the Ichs and Herps meeting for… gasp… downloading sequences from GenBank and using them without asking the author’s permission! Good lord, what is the world coming to? I’m surprised to hear of such active resistance to public availability of information.

Paulo Nuin pointed to a blog post on Phylota on FriendFeed earlier today. The post is interesting in itself, but the paragraph above, which was an aside in the post, blew my mind. There was a time when I had the naive opinion that academics were all about the open dissemination of science, especially the sharing of basic scientific data. Alas, it turns out that for some the public domain is not exactly that. I suppose that this is a minority opinion, but it is clear that the confusion about scientific data and ownership needs to be resolved, and fast. It should be obvious, but it isn’t, and even those of us who should know better get confused. In the above case, a complaint about a paper where the data source had not been cited properly would be understandable, but about downloading and using sequences? Yowza!!!

There is a distinction between data and content/information. Too many people have trouble making the distinction, and as a result there is confusion about the ownership rights around the two. Anyway, this issue isn’t going away anytime soon, it seems.


The accelerated world of molecular simulation

July 28, 2008

It’s getting pretty clear that GPUs (and the Cell processor) have a big place in the future of molecular simulation. NAMD, SimTK, Gromacs via Vijay Pande’s Folding@Home, etc. are all pursuing acceleration as a core part of their efforts. I have had informal discussions with a few people actively running GPU-based simulations.

For the first time since MM-PBSA came around and people started applying MD to study large scale molecular motions (just see work by Benoit Roux and Georgios Archontis), I am really encouraged about the state of molecular simulation. GPUs, acceleration and large scale distributed computing, coupled with special purpose machines, are poised to give the field a shot in the arm. I have always been annoyed by the fragmentation in the field, but I really like what I have been seeing over the past several months: a focus on building good engines, writing good software and emphasizing usability and performance. I still maintain that molecular simulation is the scientific field that I have enjoyed the most, but the lack of innovation had made me a little cynical. While I don’t see the impact on drug discovery/design quite yet, we are getting close to that point in time. Will be watching the space like a hawk.

Disclaimer: My day job is very much all about large-scale distributed computing.


Getting geeky with Ruby and Python on Friendfeed

July 27, 2008

FriendFeed might be a subject of silly Silicon Valley blogger debates, but for the life science community it’s a lot more. The Life Scientists is a poster child for microcommunity, the ISMB2008 room was a wonderful example of a group of people coming together to make a conference come to life, and there are other rooms as well, e.g. for BioBarCamp.

Now, something a little different, with the launch of two new rooms, Ruby for Bioinformatics and Python for Bioinformatics. More and more Bioinformatics/molecular modeling types are using those two languages. I am busy trying to recast myself as a wannabe Ruby hacker, so the opportunity to learn from people like Matt Wood is something I am looking forward to.

I quite like Ruby. It’s got some great functionality, a number of web frameworks and other associated utilities, and a wonderful community. While in the end, I don’t really think languages are religion, it would be nice to see more Ruby and associated frameworks in the life sciences. Perhaps a room like this will expose more people to the language.

Someone just pointed out that there is a room for R for Bioinformatics as well, so all you stats types, head there.

