Strides in protein design
March 9, 2008
Tech Review has an article highlighting work by David Baker and colleagues, who have designed de novo a handful of non-natural enzymes that successfully catalyze a specific chemical reaction, one for which no naturally occurring enzyme is known. The work, published in Science, uses a pretty elegant two-step procedure. I will update this post once I get a chance to grab the paper tomorrow.
1. First, a putative active site was designed using geometric and chemical constraints.
2. Following active site design, a set of proteins incorporating that functionality was designed (the Baker group’s claim to fame). The proteins were then scored and ranked on their ability to accommodate the reactants.
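To make the second step a little more concrete, here is a toy sketch of geometric matching. It assumes the active site from step 1 can be boiled down to three target distances between catalytic positions; each hypothetical scaffold is searched for the triple of positions that best reproduces those distances, and scaffolds are ranked by the RMS deviation. This is my own simplification for illustration, not the Baker group’s procedure: the target distances, scaffold coordinates and scoring function are all made up, and the real work used far richer geometric, chemical and energetic criteria.

```python
import itertools
import math

# Hypothetical "theozyme": ideal distances (in Å) between three catalytic
# positions, standing in for the geometric constraints of step 1.
# These numbers are invented for illustration.
TARGET = {(0, 1): 6.0, (0, 2): 8.0, (1, 2): 10.0}


def match_score(positions, triple):
    """RMS deviation (Å) of three scaffold positions from the target distances."""
    devs = [math.dist(positions[triple[a]], positions[triple[b]]) - d
            for (a, b), d in TARGET.items()]
    return math.sqrt(sum(x * x for x in devs) / len(devs))


def best_site(positions):
    """Try every ordered triple of positions; return (score, triple) of the best fit."""
    return min((match_score(positions, t), t)
               for t in itertools.permutations(range(len(positions)), 3))


def rank_scaffolds(scaffolds):
    """Rank scaffolds (name -> list of xyz positions) by how well their best
    triple reproduces the target geometry; lowest deviation first."""
    results = [(name, *best_site(pos)) for name, pos in scaffolds.items()]
    return sorted(results, key=lambda r: r[1])


# Hypothetical scaffolds: candidate positions (think C-alpha atoms), in Å.
scaffolds = {
    "scaffold_A": [(0, 0, 0), (6, 0, 0), (0, 8, 0), (20, 20, 20)],
    "scaffold_B": [(0, 0, 0), (5, 0, 0), (0, 9, 0), (3, 3, 3)],
}
for name, score, triple in rank_scaffolds(scaffolds):
    print(f"{name}: best triple {triple}, RMS deviation {score:.2f} Å")
```

Exhaustive enumeration is fine for a handful of positions; a real search over whole protein backbones would need something much smarter.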
The group followed up by synthesizing 72 of the designed enzymes and found that 32 of them catalyzed the reaction, the best giving a 10^4-fold rate enhancement. While that is far below the rate enhancements of naturally occurring enzymes, the fact that the Baker group was able to do it at all, and with a pretty decent success rate (very decent in my book), says something. Like Baker, I think a combination of directed evolution and in silico prediction might help design more effective proteins. In addition, at least for this particular example, computational methods will need to account for electronic structure in some way to be really successful.
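Two quick numbers help put this in perspective. 32 active designs out of 72 is a hit rate of about 44%, and if you assume simple transition-state theory at 298 K (my assumptions, not figures from the paper), a 10^4-fold rate enhancement corresponds to lowering the activation free energy by ΔΔG‡ = RT ln(10^4), roughly 5.5 kcal/mol:

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def hit_rate(active, tested):
    """Fraction of tested designs that showed activity."""
    return active / tested


def barrier_lowering(rate_enhancement, temperature=298.15):
    """Activation free-energy reduction (kcal/mol) implied by k_cat/k_uncat,
    assuming simple transition-state theory: ddG = RT * ln(k_cat/k_uncat).
    The default temperature is an assumption."""
    return R_KCAL * temperature * math.log(rate_enhancement)


print(f"hit rate: {hit_rate(32, 72):.1%}")               # hit rate: 44.4%
print(f"ddG: {barrier_lowering(1e4):.1f} kcal/mol")      # ddG: 5.5 kcal/mol
```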
It’s good to see the field take some steps forward after being relatively stagnant in recent years. We need to attack problems in two ways: de novo design, and modifying or enhancing existing proteins.
Technorati Tags: David Baker, Protein Design
Ligand docking memories
February 29, 2008
While I was at Accelrys, one of the things I tried to push was greater use of force fields, MD and other physics-based approaches for molecular recognition problems like ligand docking. It’s always good to see some of those thoughts and early proofs-of-concept become reality.
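For anyone who hasn’t seen one, here is roughly what a force-field-style docking score looks like: a minimal sketch that sums a 12-6 Lennard-Jones term and a Coulomb term (with a distance-dependent dielectric of 4r) over protein-ligand atom pairs. The functional form is the generic textbook one; the parameters, cutoff and dielectric model are assumptions for illustration, not CHARMM’s actual parameter set or any Accelrys implementation.

```python
import math
from dataclasses import dataclass

COULOMB_K = 332.06  # kcal*Å/(mol*e^2), approximate electrostatic conversion constant


@dataclass
class Atom:
    xyz: tuple        # Cartesian coordinates, Å
    charge: float     # partial charge, e
    eps: float        # Lennard-Jones well depth, kcal/mol
    rmin_half: float  # half the LJ minimum-energy distance, Å


def pair_energy(a, b):
    """Lennard-Jones + Coulomb energy (kcal/mol) of one atom pair."""
    r = math.dist(a.xyz, b.xyz)
    eps = math.sqrt(a.eps * b.eps)    # geometric-mean combining rule
    rmin = a.rmin_half + b.rmin_half  # arithmetic combining rule
    x6 = (rmin / r) ** 6
    lj = eps * (x6 * x6 - 2.0 * x6)   # minimum of -eps at r = rmin
    coulomb = COULOMB_K * a.charge * b.charge / (4.0 * r * r)  # dielectric eps(r) = 4r
    return lj + coulomb


def interaction_energy(protein, ligand, cutoff=12.0):
    """Sum pair energies over all protein-ligand atom pairs within `cutoff` Å.
    A plain double loop: fine for a sketch, far too slow for real docking."""
    total = 0.0
    for p in protein:
        for q in ligand:
            if math.dist(p.xyz, q.xyz) <= cutoff:
                total += pair_energy(p, q)
    return total


# Hypothetical two-atom "pocket" and one-atom "ligand"; all parameters are invented.
pocket = [Atom((0.0, 0.0, 0.0), -0.5, 0.15, 1.8), Atom((5.0, 0.0, 0.0), 0.0, 0.1, 2.0)]
ligand = [Atom((2.5, 3.0, 0.0), 0.4, 0.12, 1.9)]
print(f"{interaction_energy(pocket, ligand):.2f} kcal/mol")
```

In flexible docking, a score like this would be evaluated over many ligand poses (and, ideally, receptor conformations from MD) rather than a single fixed geometry.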

Technorati Tags: CHARMM, Ligand Docking, Flexible Docking
Using sparse structural data for better structures
February 7, 2008
Over the years, people have tried, with varying degrees of success, to use sparse structural data as constraints for better MD simulations or to improve structure prediction. It’s definitely an interesting area of research, with lots of potential. I have also long been interested in low-resolution modeling (given that I got my start in industry doing that, it’s not surprising). So I found this recent paper by Schröder, Brunger and Levitt (a nice lineup of authors), “Combining Efficient Conformational Sampling with a Deformable Elastic Network Model Facilitates Structure Refinement at Low Resolution”, rather intriguing.
So what’s the paper about? The authors have developed a general geometry-based algorithm to sample conformational space under constraints imposed by low-resolution density maps obtained from electron microscopy or X-ray crystallography experiments. A deformable elastic network (DEN) is used to restrain the sampling to prior knowledge of an approximate structure. The goal is essentially to make up for the information that low-resolution experiments cannot provide and still arrive at a reasonable structure. The conformational sampling explores the conformations that fit an experimental low-resolution density map and thus yields a whole ensemble of possible solutions. Combining the sampling algorithm with the DEN keeps over-fitted structures out of that ensemble; over-fitting often happens in more traditional approaches, where only certain degrees of freedom, e.g. bond lengths and angles, are restrained.
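For readers who haven’t run into elastic network restraints, here is a minimal sketch of the general idea: harmonic restraints between random atom pairs that are close together in a reference model, with equilibrium distances that are allowed to drift toward the current model during refinement, so the network deforms rather than pinning the structure to its starting point. The pair selection, the update rule and the parameter names `kappa` and `gamma` are my assumptions for illustration and may not match the paper’s exact formulation.

```python
import math
import random


def build_network(ref, cutoff=8.0, n_pairs=50, seed=0):
    """Choose random atom pairs closer than `cutoff` Å in the reference model.
    Their reference distances become the initial equilibrium distances d0."""
    rng = random.Random(seed)
    candidates = [(i, j) for i in range(len(ref)) for j in range(i + 1, len(ref))
                  if math.dist(ref[i], ref[j]) < cutoff]
    pairs = rng.sample(candidates, min(n_pairs, len(candidates)))
    return {(i, j): math.dist(ref[i], ref[j]) for i, j in pairs}


def network_energy(coords, d0, k=1.0):
    """Harmonic elastic-network restraint energy: sum of k * (d_ij - d0_ij)^2."""
    return sum(k * (math.dist(coords[i], coords[j]) - d) ** 2
               for (i, j), d in d0.items())


def deform(d0, coords, ref, kappa=0.1, gamma=0.5):
    """Assumed update rule: move each equilibrium distance a fraction `kappa`
    toward a blend of the current model (weight gamma) and the reference
    (weight 1 - gamma). This is what lets the network 'deform'."""
    updated = {}
    for (i, j), d in d0.items():
        d_cur = math.dist(coords[i], coords[j])
        d_ref = math.dist(ref[i], ref[j])
        updated[(i, j)] = d + kappa * (gamma * (d_cur - d) + (1.0 - gamma) * (d_ref - d))
    return updated


# Toy example: a straight 10-atom chain with 3.8 Å spacing as the reference,
# and a hypothetical "current model" in which atom 5 has moved 2 Å sideways.
ref = [(3.8 * i, 0.0, 0.0) for i in range(10)]
current = [list(p) for p in ref]
current[5][1] += 2.0
d0 = build_network(ref)
before = network_energy(current, d0)
after = network_energy(current, deform(d0, current, ref, kappa=0.5, gamma=1.0))
print(f"restraint energy before: {before:.3f}, after one deformation step: {after:.3f}")
```

With `gamma=1.0` the equilibrium distances follow the current model and the restraint energy drops (by a factor of four here, since `kappa=0.5` halves every deviation); with `gamma=0.0` they relax toward the reference instead.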
The approach in the paper (and the applications used) is definitely from a crystallographer’s point of view, but it doesn’t take a leap of faith to look at it from a slightly different angle. For example, it would be interesting to see a retrospective study of homology models: could you take a homology model and make it more native-like by applying some sparse experimental constraints and doing simulated annealing, or just long MD simulations? Back when I tried that with some NMR restraints, things didn’t really work, although that was a long time ago.
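Here is a toy version of the experiment I have in mind, as a sketch under heavy assumptions: a handful of points stands in for a homology model, a single sparse NOE-style distance restraint (flat-bottom, with no penalty between its lower and upper bounds) supplies the “experimental” information, a weak tether keeps the model near its starting coordinates, and Metropolis Monte Carlo with geometric cooling plays the role of simulated annealing. The energy terms, weights and cooling schedule are all hypothetical; a real study would use a proper force field (which, unlike this toy, keeps the chain intact) and MD-based annealing.

```python
import math
import random


def flat_bottom(d, lower, upper, k=10.0):
    """NOE-style restraint: no penalty inside [lower, upper], harmonic outside."""
    if d < lower:
        return k * (lower - d) ** 2
    if d > upper:
        return k * (d - upper) ** 2
    return 0.0


def energy(coords, restraints, start, w_tether=0.1):
    """Restraint violations plus a weak harmonic tether to the starting model."""
    e = sum(flat_bottom(math.dist(coords[i], coords[j]), lo, hi)
            for i, j, lo, hi in restraints)
    e += w_tether * sum(math.dist(a, b) ** 2 for a, b in zip(coords, start))
    return e


def anneal(start, restraints, steps=5000, t_start=5.0, t_end=0.01, max_step=0.3, seed=1):
    """Metropolis Monte Carlo with geometric cooling; returns the best model seen."""
    rng = random.Random(seed)
    cur = [list(p) for p in start]
    e_cur = energy(cur, restraints, start)
    best, e_best = [p[:] for p in cur], e_cur
    for n in range(steps):
        temp = t_start * (t_end / t_start) ** (n / steps)
        i = rng.randrange(len(cur))
        old = cur[i]
        cur[i] = [x + rng.uniform(-max_step, max_step) for x in old]
        e_new = energy(cur, restraints, start)
        if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / temp):
            e_cur = e_new
            if e_cur < e_best:
                best, e_best = [p[:] for p in cur], e_cur
        else:
            cur[i] = old  # reject the move
    return best, e_best


# Hypothetical "homology model": four atoms on a line, 3.8 Å apart, plus one
# sparse restraint saying atoms 0 and 3 should be 4-6 Å apart.
start = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
restraints = [(0, 3, 4.0, 6.0)]
model, e = anneal(start, restraints)
print(f"start energy {energy(start, restraints, start):.1f}, final {e:.2f}, "
      f"d(0,3) = {math.dist(model[0], model[3]):.2f} Å")
```

The retrospective version of this would start from real homology models, use restraints derived from the known native structures, and check whether the annealed models actually move closer to native.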
I’ll add a rant to the end of this post, something I’ve said before. Structural biology and protein structure prediction seem to have stagnated. There haven’t been any quantum leaps in capability for a while, and a lot of the improvements have come from access to more compute power rather than from new methodology.
Technorati Tags: elastic network models, low-resolution modeling, protein structure prediction
Is visualization science?
February 6, 2008
Is visualization science? I suspect Pawel will come hunt me down for even putting that question out there, but it’s one I have asked myself before. Not quite in those words, but in trying to understand whether a pretty picture really adds anything to knowledge. Please note that you are talking to someone who for years steadfastly refused to run any kind of computation from a GUI, unless you count a Unix shell as one.
Perhaps in the early days of visualization, images of proteins and other molecules were just pretty pictures, meant for publications but rarely used on the research side unless you were a crystallographer or something. But we’re in a different world now. The compute power we have access to and the GPUs around us make “visual science” a critical component of our understanding of systems. In fact, I would argue that we have not really tapped into that aspect yet.
The Epiphany
A few years ago I was at IBM’s Watson research lab, looking at the results of a simulation of rhodopsin: a long, massive simulation performed on a Blue Gene. The result was an animation of an MD trajectory, the kind I recall someone describing as “wiggling molecules”. Only this one was far more than a movie. With the compute power of a Blue Gene behind it, the animation took on a life of its own. What you could see was a model of an actual process, on realistic timescales. I stood transfixed. I was similarly transfixed when I saw someone at RIKEN do real-time docking using a joystick with haptic feedback, on a machine running a GRAPE card. Visualization has come a long way from prettying up research papers and PowerPoint presentations.
Code in pictures
Say you want to automate a homology modeling procedure: pull in homologous sequences, align them, create a consensus profile, build a homology model and then evaluate it, perhaps launching RasMol or something at the end to visualize what you ended up with. I’ve done it. It required a lot of badly written Perl, some Fortran, and a folder full of scripts and I/O directories. Show that to a non-specialist and you won’t get much of a response.
Now do the same thing in Taverna or Pipeline Pilot. You can visualize your code, your workflow, and it’s not in a flowchart or an abstract architectural diagram; it’s right there in front of you. In a pharma or similar setting, being able to develop and define a data pipeline or workflow takes on added importance. And with that visual tangibility, your scientific workflows take on a life of their own.
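The consensus-profile step of the homology modeling pipeline described above is simple enough to sketch. The version below assumes the homologs have already been aligned (equal-length strings, “-” for gaps), computes per-column symbol frequencies and derives a majority-rule consensus. Counting gaps as a symbol, the 0.5 threshold and the tie-breaking rule are my own choices for illustration, not what any particular package does.

```python
from collections import Counter


def column_profile(alignment):
    """Per-column symbol frequencies for equal-length aligned sequences.
    Gaps ('-') are counted as a symbol in their own right."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    n = len(alignment)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        profile.append({sym: c / n for sym, c in counts.items()})
    return profile


def consensus(alignment, min_freq=0.5):
    """Majority-rule consensus: the most frequent symbol in each column if its
    frequency is at least `min_freq`, otherwise 'X'. Ties go to the symbol
    that sorts first (so a gap wins a tie against a residue)."""
    out = []
    for freqs in column_profile(alignment):
        sym, freq = min(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
        out.append(sym if freq >= min_freq else "X")
    return "".join(out)


# Hypothetical pre-aligned homologs; in the pipeline these would come from
# the search-and-align steps that precede this one.
aln = ["MKV-LA", "MKI-LA", "MRLGLS"]
print(consensus(aln))  # MKX-LA
```

In a workflow tool, each of these functions would be one node, which is exactly the kind of structure that becomes obvious once it is drawn on screen.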
Visualizing connectivity
The last example is Cytoscape, a fantastic open source tool for visualizing and analyzing network data. Those network graphs and connectivities wouldn’t make half as much sense without the ability to explore the interactions visually and interactively. In other words, the visualization is science, not just a tool to help decipher scientific data.
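Part of Cytoscape’s appeal is how little it takes to get data in. As a small sketch, assuming your interactions sit in a plain list of (source, interaction, target) tuples, the code below counts node degrees to flag candidate hubs and writes the network in Cytoscape’s simple interaction format (SIF), one tab-separated source/interaction/target line per edge. The node names and “pp” (protein-protein) labels are made up, and the hub cutoff is arbitrary.

```python
from collections import Counter


def degrees(edges):
    """Undirected degree of every node in a list of (source, interaction, target) edges."""
    deg = Counter()
    for src, _, tgt in edges:
        deg[src] += 1
        deg[tgt] += 1
    return deg


def hubs(edges, min_degree=3):
    """Nodes with degree >= min_degree, most connected first (cutoff is arbitrary)."""
    return [node for node, d in degrees(edges).most_common() if d >= min_degree]


def to_sif(edges):
    """Serialize edges as SIF lines: source<TAB>interaction<TAB>target."""
    return "".join(f"{s}\t{i}\t{t}\n" for s, i, t in edges)


# Hypothetical protein-protein ("pp") interactions between made-up nodes.
edges = [("P1", "pp", "P2"), ("P1", "pp", "P3"),
         ("P1", "pp", "P4"), ("P2", "pp", "P5")]
print(hubs(edges))  # ['P1']
print(to_sif(edges), end="")  # save as e.g. network.sif to load into Cytoscape
```

Counting degrees tells you P1 is a hub; seeing it sit in the middle of the rendered network is what makes that number mean something.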
Those are but some examples of how the role of visualization has been enhanced by access to fast computers and quality graphics cards. And it’s only going to get better and more accessible.
Credits
Haemoglobin picture: Paweł Szczęsny under a CC license
Other images from the SciTegic and Cytoscape sites
Technorati Tags: Science, Visualization, Graphics
Is 57% enough?
January 5, 2008
Interesting paper in J. Chem. Inf. Model., “Predicting Key Example Compounds in Competitors’ Patent Applications Using Structural Information Alone”. That’s actually a pretty cool concept, and something many pharma companies are either already doing or would be quite interested in.
The authors’ method is based on the assumption that medicinal chemists usually carry out extensive structure-activity relationship (SAR) studies around key compounds. Using that assumption, the method identifies compounds located at the centers of densely populated regions in the chemical space of patent examples, represented as Extended Connectivity Fingerprints (ECFPs). The authors had a success rate of 57% on their test sets. Call me old and jaded, but percentages like that just don’t impress me anymore. I think the entire informatics/computational modeling field needs to understand that for methods to be applicable and accepted, they need to radically reduce time, error rates, or both. During the days when I was evangelizing physics-based approaches for protein-ligand evaluation, I remember someone from a big pharma company telling me that you had to achieve throughput better than X compounds a week, because otherwise they were just going to make them. You had to be significantly better than their med chemists, WITH sufficient quality.
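The core idea, picking compounds that sit at the center of densely populated regions of chemical space, fits in a few lines. This is not the authors’ algorithm, just a sketch under simple assumptions: fingerprints are modeled as Python sets of feature IDs (a stand-in for real ECFPs, which you would generate with a cheminformatics toolkit), similarity is Tanimoto, and a compound’s “density” is simply the number of other patent examples above a similarity threshold, with both the threshold and the tie-breaking rule chosen arbitrarily.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of feature IDs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def density(fps, threshold=0.5):
    """For each compound, count the other compounds with similarity >= threshold."""
    names = list(fps)
    counts = {n: 0 for n in names}
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if tanimoto(fps[x], fps[y]) >= threshold:
                counts[x] += 1
                counts[y] += 1
    return counts


def key_compound_candidates(fps, threshold=0.5, top=1):
    """Rank compounds by local density; ties broken by mean similarity to the set."""
    counts = density(fps, threshold)

    def mean_sim(n):
        others = [m for m in fps if m != n]
        return sum(tanimoto(fps[n], fps[m]) for m in others) / max(len(others), 1)

    ranked = sorted(fps, key=lambda n: (-counts[n], -mean_sim(n), n))
    return ranked[:top]


# Hypothetical "patent examples": a small analog series plus one outlier.
fps = {
    "ex1": {1, 2, 3, 4, 5},
    "ex2": {1, 2, 3, 4, 6},
    "ex3": {1, 2, 3, 5, 7},
    "ex4": {1, 2, 4, 5, 8},
    "ex5": {10, 11, 12, 13},
}
print(key_compound_candidates(fps))  # ['ex1']
```

In this toy series, ex1 is the compound the others look like variations of, which is the intuition behind the SAR assumption; whether real patents cooperate is exactly what the 57% figure measures.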
Technorati Tags: Cheminformatics, Drug Discovery





