Will data be our undoing?
July 27, 2006
No, not our favorite android from Star Trek: TNG, but all those letters and numbers (and, increasingly, images) that so dominate the life of anyone involved with the biological sciences, whether as a scientist or on the business end.
In recent days there have been a number of posts around the blogosphere on data in bioinformatics. The one that especially caught my attention was Duncan’s post on nodalpoint, where he discusses the questions Peter Norvig asked Tim Berners-Lee at the AAAI meeting (I confess to being somewhat envious of those who got to watch this Q&A session). Peter is not a big fan of the semantic web and cited a number of reasons for his negative opinion (for more, read Duncan’s post). Those objections, Sir Tim’s response, a couple of other posts on nodalpoint and notes from the biomass (good to see you back, Roland) got me thinking about data, information interchange, and standards.
It is no secret that many scientists and software developers have been driven batty every so often by the inability to read certain file formats in their favorite application. Since the sequencing of the human genome, the number and variety of databases have proliferated. Unfortunately, the main result has been a mushrooming of data formats, and of databases holding similar or complementary data types that don’t really speak the same language. I would wager that a number of the life science projects undertaken by integration service providers like IBM and SAIC deal with data integration issues. Similarly, novel data pipelining applications have found a lot of use in integrating disparate data sources in real time. The question that keeps coming up is … why?
It would seem rather obvious that using open data and communication standards (I still like the concept of semantic web approaches) would make bioinformaticians (and cheminformaticians) the world over a lot more productive. The value of data lies in the interpretations one can make, and by not adopting standards, I feel that the entire field is selling itself short. As we add more “omics” to our lexicon, it becomes even more important to develop a universal ontology for biological (and chemical) data (or a small set of core ontologies). Is there room for a non-profit, non-partisan organization that takes over the responsibility of maintaining these standards for the field at large? I have heard some ideas being tossed around for a while, but nothing has really caught on. In some cases, a format becomes the de facto standard, e.g. the FASTA format. In other cases, organizations agree to merge standards, or some of the leaders in the field form an organization to facilitate sharing and establish standards. There have also been cases where the standard an organization wants to adopt runs into difficulty. As the PDB has found out, moving from an existing, familiar standard to a new one (pdb to mmCIF) is not as simple as it sounds, since in the end it is the user community that decides to support one format over the other.
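Part of the reason FASTA became a de facto standard is that it is trivial to read: a header line starting with “>” followed by one or more lines of sequence. A minimal Python sketch of a reader, assuming plain modern FASTA (the function name parse_fasta is hypothetical, and it ignores older conventions such as “;” comment lines):

```python
from typing import Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the final record


records = list(parse_fasta(">seq1 demo protein\nMKTAYIA\nKQRQ\n>seq2\nGGC\n"))
print(records)  # [('seq1 demo protein', 'MKTAYIAKQRQ'), ('seq2', 'GGC')]
```

The simplicity cuts both ways: the header line carries no agreed structure, which is exactly the kind of gap a shared ontology or schema is meant to close.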
I fear that the future will only be more confusing. Biosimulation, pharmacogenomics, systems biology, etc., are only going to muddy the waters further. What does this mean? The time is now. Data standards and communication are obviously on the community’s radar, but the field is still a long way from making life easy for its own members. For our own sakes, we need to sit down as a community and decide to take action to make communication between different data, experiment types and applications as easy and logical as possible.
My personal biases are towards XML-based formats and approaches, which can be developed in collaboration with the W3C. Biology is becoming increasingly web-based and communicating in the language of the web would appear to be the most logical approach. Whatever we do, we should try our best to avoid binary data formats.
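To illustrate what an XML-based approach buys, here is a small sketch that serializes sequence records with Python’s standard xml.etree.ElementTree module. The element and attribute names (sequenceSet, sequence, id, length) are a made-up schema for this example only, not an established standard; a real community effort would define and agree on its vocabulary, for instance with an XML Schema.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Serialize (identifier, sequence) pairs into a hypothetical XML schema."""
    root = ET.Element("sequenceSet")
    for ident, seq in records:
        # Metadata goes into attributes, the residues into the element text.
        el = ET.SubElement(root, "sequence", attrib={"id": ident, "length": str(len(seq))})
        el.text = seq
    return ET.tostring(root, encoding="unicode")


xml_text = records_to_xml([("seq1", "MKTAYIA")])
print(xml_text)  # <sequenceSet><sequence id="seq1" length="7">MKTAYIA</sequence></sequenceSet>
```

Because the output is self-describing text, any XML-aware tool can read it back; for example, ET.fromstring(xml_text).find("sequence").get("id") returns "seq1".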
Footnote: I have a few words to add to the whole document vs. database discussion. I agree with one of the comments on nodalpoint that part of the problem is an unfamiliarity with databases. Working with a database requires a different mindset, which might be easier to reach with more usable tools. With my business hat on, I find that I work best when I query a database to return the rows and columns I want, which I can then pull into Excel for further analysis. More recently, I have automated some of these processes, but it is not too much of a leap to imagine how intimidating all this might be for many people.
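For anyone curious what “query a database, then pull the result into Excel” looks like once it is automated, here is a minimal sketch using Python’s standard sqlite3 and csv modules. The assays table, its columns and its values are hypothetical, purely for illustration; the pattern carries over to other DB-API drivers, although the placeholder syntax varies between them.

```python
import csv
import io
import sqlite3


def query_to_csv(conn, sql, params=()):
    """Run a query and return the rows as CSV text, header row first."""
    cur = conn.execute(sql, params)
    headers = [col[0] for col in cur.description]  # column names from the cursor
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(headers)
    writer.writerows(cur.fetchall())
    return buf.getvalue()


# Hypothetical example data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assays (compound TEXT, target TEXT, ic50_nm REAL)")
conn.executemany(
    "INSERT INTO assays VALUES (?, ?, ?)",
    [("cmpd-1", "kinase-A", 12.5), ("cmpd-2", "kinase-A", 480.0), ("cmpd-3", "kinase-B", 35.0)],
)

# Only the rows and columns we want, in the order we want them.
csv_text = query_to_csv(
    conn,
    "SELECT compound, ic50_nm FROM assays WHERE target = ? ORDER BY ic50_nm",
    ("kinase-A",),
)
print(csv_text)
```

To hand the result to Excel, write csv_text to a file opened with newline="" (csv.writer already emits \r\n line endings).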
Further Reading:
The Gene Ontology
An ontology for macromolecular structure
Are the current ontologies in biology good ontologies?
An ontology for bioinformatics applications
Technorati Tags: Bioinformatics, Ontology, Data, Data Standards, XML, Semantic Web, Science
Protein simulation: At a crossroads
July 22, 2006
Almost ten years ago, I ran my first molecular dynamics simulation of a protein (bacteriorhodopsin). In these ten years the field has changed a lot. Back then, simulations were routinely run for a few hundred picoseconds, in vacuo. Today, a few nanoseconds in a box of water molecules is typical, using methods such as Particle Mesh Ewald for improved treatment of the electrostatics.
There are many reasons why this has happened: new features, better programming, and competition between programs like CHARMM, AMBER, NAMD, GROMACS, etc. The most significant change, however, has been the commoditization of high performance computing. Moore’s law, steadily decreasing costs, and the Linux cluster have allowed scientists to pursue new methods and theories that would have been far too costly to run in the past. Yet despite all these advances, I sometimes wonder if the field has stagnated.
While academics today are running ever larger simulations to understand the structure and function of biomolecules, many scientists in industry are still far too comfortable using lower levels of theory to get results faster. There has been some acceptance of higher-order methods, but not to the extent that one might have expected. The field of structure-based drug design is fertile ground for developing new theories and methods, and to some extent this has happened: a rich body of work has been published in recent years on “physics-based” methods for evaluating protein-ligand interactions. However, this has not translated into success at the industrial level, where such methods could really make a difference. Strangely enough, I think this is because computers are still not fast enough. The throughput pharma requires forces shortcuts that compromise the quality of the results. This means that the best techniques are still not being developed, and more approximate methods are being pursued instead. Are these methods useful? Absolutely!!!! I spent quite a bit of time looking at MM-PB(GB)SA and LIE, two of the more commonly used techniques. These methods only scratch the surface of what higher-order methods can offer (they take a number of shortcuts), but they demonstrate how more expensive methods can improve the results of in silico approaches, if used appropriately. To change the name of the game, however, we need to rethink how we take advantage of modern hardware. Multicore CPUs, FPGAs, and efforts like Blue Gene or the computer being built by D.E. Shaw are but steps toward bringing a new generation of scientific computing to researchers. These are not commodity machines, but perhaps it is necessary to spend some money on special resources to get special results.
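LIE is compact enough to show in a few lines. In its commonly cited form, the binding free energy is estimated from how the ensemble-averaged van der Waals and electrostatic ligand-surroundings interaction energies change between a simulation of the bound ligand and one of the ligand free in water: ΔG ≈ α(⟨V_vdW⟩_bound − ⟨V_vdW⟩_free) + β(⟨V_el⟩_bound − ⟨V_el⟩_free) + γ. The Python sketch below assumes you already have per-frame interaction energies from your MD engine. The function name and the toy energies are hypothetical, and α, β and γ are empirical coefficients that must be fitted or taken from the literature for your system (β near 1/2 follows from linear response theory; the values here are placeholders).

```python
from statistics import mean


def lie_binding_free_energy(vdw_bound, vdw_free, elec_bound, elec_free,
                            alpha, beta, gamma=0.0):
    """Linear interaction energy (LIE) estimate of a binding free energy.

    Each *_bound / *_free argument is a sequence of per-frame
    ligand-surroundings interaction energies (e.g. kcal/mol) from the
    protein-bound and free-in-water simulations. alpha, beta and gamma
    are empirical coefficients supplied by the caller.
    """
    d_vdw = mean(vdw_bound) - mean(vdw_free)     # change in <V_vdW> on binding
    d_elec = mean(elec_bound) - mean(elec_free)  # change in <V_el> on binding
    return alpha * d_vdw + beta * d_elec + gamma


# Toy per-frame energies (hypothetical) and placeholder coefficients.
dg = lie_binding_free_energy(
    vdw_bound=[-40.0, -42.0], vdw_free=[-30.0, -32.0],
    elec_bound=[-27.0, -29.0], elec_free=[-25.0, -27.0],
    alpha=0.18, beta=0.5,
)
print(round(dg, 2))  # -2.8
```

The shortcut is visible in the code: only two end-state simulations and a linear fit replace the many intermediate states a rigorous free energy calculation would need, which is why LIE is cheap enough to be practical yet still more physical than a scoring function.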
Researchers, developers and hardware vendors need to work hand-in-hand to identify core needs and develop the appropriate hardware and software. The costs that the market can bear will be critical to these developments, which leads me to believe that there will be two “camps”:
Ultra-specialized hardware: Machines like Blue Gene, and machines from Fujitsu and D.E. Shaw, come to mind. All have specialized or modified software that takes advantage of the architecture of the machines. These machines are (or, in the case of the Shaw machine, will be) expensive, but will find a niche in special projects, especially if on-demand computing catches on in the community. This would be something of a return to the expensive supercomputers of the 90’s, where users had to purchase time to run longer simulations.
Plug n’ play hardware: This term applies to the kinds of hardware manufactured by companies like ClearSpeed, and the MDGRAPE card from RIKEN. While these have yet to find common use, perhaps it is time that such hardware became more prevalent, since it can be combined with commodity hardware to create superfast machines for specific applications. Of course, I am still waiting for someone to figure out how to use graphics cards for MD simulations.
Technorati Tags: Scientific Computing, High Performance Computer, Hardware, Software, Architectures, Molecular Dynamics, Protein Simulation
The new rules: A pathway to success in the biotech industry?
July 21, 2006
By now, I am sure almost everyone has seen the article in Fortune about tearing up the Jack Welch playbook (found via the Business Innovation Insider). Like many others, I have admired Jack Welch and read some of his books, but over the past few years I have become an even bigger believer in what is now often referred to as The Long Tail, the term coined by Chris Anderson to describe business models that focus on niche markets. If you look carefully at the Fortune article, it may come as no surprise that the new rules are very much in keeping with Long Tail principles.
Which brings us to the world I live in, the wonderfully complex and diverse life science industry. In a landscape characterized by a few large pharmaceutical companies, some established biotechs and a number of small biotech and pharma startups, can these new business rules work? Or perhaps the question that should be posed is how small companies can apply these new rules to building successful businesses. My advice, for whatever it is worth, is to look at rules #2, #4 and #5. It is also very important for companies to know who they are. All too often a company loses its identity as it begins to grow, leaving the people working there confused and uncertain, and often potential customers as well. Good leadership should ensure that this does not happen, especially in a VC-driven industry where investors are always looking for short-term wins. The biggest risk is in jumping onto a bandwagon, something VCs are apt to support. While this may sound like a contradiction of rule #4, I think the context is very different. A company’s identity is defined by its leadership and the people who work there. There should be a common goal or spirit that the people working there can identify with, perhaps something as vague as “we know more about diabetes than any other company on earth.”
In the end a young life science company (any company) should be able to answer these questions:
1. What makes us different? In the late 90’s there were so many companies offering similar, genomics-based target discovery services that it was difficult to tell them apart. I think easy VC money and a failure to look outward (i.e. an emphasis on the technology itself rather than on customer impact) were big reasons for many of the failures we saw.
2. Do we have a long-term growth strategy? At some point a company’s core technology will no longer be sustainable and organic growth will no longer be sufficient. Rather than following a predictable path, a company should look at providing unique solutions/products. This requires keeping a finger on the pulse of the customer, recognizing needs, and filling a new or expanded niche. In other words, the growth strategy should include the ability to innovate and satisfy market needs. Sometimes a company can be so innovative as to define a market, but such situations are very rare.
3. Are we willing to recognize that a chosen path might be the wrong one, and adapt? This is best done early in the process, and recognizing such errors is part of a company’s growth and success.
I am sure others can suggest many more questions, and I would love to hear a debate on whether niche solutions will work in the crowded life science space, where the barrier to entry is often low (and too much innovation is of the “me too” variety).
Further Reading
Business Innovation Insider
Deal Architect
Technorati Tags: Business, Management, Innovation, Business Models, Long Tail, Jack Welch, Fortune Magazine, Biotechnology, Biotech, Life Science
Non-profit organizations can be original
July 17, 2006
If you had told me that American Apparel had opened a virtual retail store within Second Life, I would not have been surprised in the least (and of course they have). However, I did not expect the American Cancer Society to tap Second Life for fundraising (Source: Micro Persuasion). I have only briefly flirted with the virtual world of Second Life, but people are definitely using it in some rather interesting ways, and if you are hip enough to get a cover story in BusinessWeek, Second Life clearly can’t be treated as just another MMORPG.
Technorati Tags: Second Life, Advertising, MMORPG, Business 2.0, Communication, Cancer
It’s all about the mistakes, stupid!!!
July 14, 2006
One of the recent cover stories in BusinessWeek focused on how the best companies embrace failure. That got me thinking about failure and science. I am sure everyone who has ever spent time doing any sort of research knows the taste of failure. If I counted the successful and the failed experiments from my graduate career, I suspect the failures would outnumber the successes, especially in the beginning. I would go so far as to argue that one learns more from scientific failures than from successes. Failure in the scientific context means going back to the drawing board and trying to understand what went wrong, and it is that in-depth understanding that begets success. Maybe more scientists should be captains of industry, since for us, understanding and examining past failures is part of who we are.
As in every situation, there are caveats. I can remember times when I should have realized that rethinking my approach was the appropriate path, but my stubborn desire to make something work ended up costing me a couple of months that could have been better spent. However, that lesson learned in graduate school has served me well in my professional life. There is nothing wrong with failure ... just fail as early as possible (and don’t make the same mistake twice!!!)
Technorati Tags: Science, Success, Businessweek, Research


