Structure prediction has a long way to go - The PDB says “no” to computational models

August 27, 2006

I first read this in Bioinform, which reported that from October 15, 2006, the PDB will no longer accept theoretical models. The decision came out of a workshop held last fall, the results of which have been published in Structure.

As someone long involved with protein structure prediction, I should be appalled at this slight, but I am not. Kevin Karplus states the obvious when he says that the PDB should focus on quality and not on quantity. The emphasis on quantity is somewhat of an offshoot of the various genome sequencing projects and the Protein Structure Initiative, where homology modeling became an excellent tool for modeling vast numbers of newly sequenced proteins with unknown structure. While structure prediction has made great leaps in the past decade, with the ability to model very distant homologs, the development of threading techniques, and the wonderful work done by David Baker on ab initio structure prediction, the quality of predicted structures is a far cry from that required for many applications. Fold prediction is a good method for classification, and structure prediction is useful for developing functional hypotheses. The field would benefit greatly from improved methods for loop and sidechain prediction, the former being a serious issue.

It’s clear that computation has played a role in structural and functional biology and will play an even greater one, especially as we try to understand allosteric effects, design better drugs, and evaluate protein-protein interactions. It is therefore important that the community come together, not in the competitive framework of CASP, but in a more congenial, cooperative framework, where the discussion focuses on standards, quality assessment criteria, and a centralized resource for computationally derived structures.

Where do I think the field will go? The late 90s saw a huge jump in the quality and performance of protein structure predictions. Another quantum jump is required to take the field to the next level. A peer-reviewed knowledgebase linking computed structures to the PDB and other relevant classifications would be a good first step, and it looks like the Protein Structure Initiative might take the lead on this issue. I don’t have the paper, so I can’t confirm how much of the workshop focussed on methods. In terms of methodology, the focus will shift (is shifting) from throughput to quality, with an emphasis on molecular interactions and an increased use of physical potentials for improving structural quality. Improved treatment of the microenvironment of sidechains and methods to generate more native-like conformations will gradually begin to hit the mainstream. Perhaps novel search strategies and algorithms that take advantage of new computer architectures and performance will become a focal point for some research groups. Computers are going to get faster and better. Our methods should adjust accordingly, and perhaps it makes sense to take a tiered approach, with the choice of method depending on the application.

Edit: I forgot to add this part. One telling sign of the lack of trust in computational models is that whenever people create non-redundant structure libraries for comparative modeling, one of the first things tossed out as part of the QC procedure is any PDB entry that is computational.
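
For anyone curious what that QC step can look like, here is a minimal sketch, not anyone's actual pipeline. It assumes entries are available as legacy PDB-format text and that theoretical models are flagged by the phrase "THEORETICAL MODEL" in the EXPDTA record. The function names and the entry IDs in the demo are made up for illustration.

```python
# Hypothetical sketch: drop theoretical models when building a structure library.
# Assumption: entries are legacy PDB-format text, where the EXPDTA record
# names the technique used to obtain the coordinates.


def experimental_method(pdb_text):
    """Return the text of the EXPDTA record(s), or '' if none is present."""
    parts = []
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            # The technique starts at column 11; columns 9-10 are reserved
            # for a continuation number on follow-on lines.
            parts.append(line[10:].strip())
    return " ".join(parts)


def is_theoretical_model(pdb_text):
    """True if the EXPDTA record marks the entry as a theoretical model."""
    return "THEORETICAL MODEL" in experimental_method(pdb_text).upper()


def filter_experimental(entries):
    """Keep only entries not flagged as theoretical models.

    entries: dict mapping an entry ID to its PDB-format text.
    """
    return {
        entry_id: text
        for entry_id, text in entries.items()
        if not is_theoretical_model(text)
    }


if __name__ == "__main__":
    # Made-up entries, reduced to the one record that matters here.
    demo = {
        "xray_entry": "HEADER    EXAMPLE\nEXPDTA    X-RAY DIFFRACTION\n",
        "model_entry": "HEADER    EXAMPLE\nEXPDTA    THEORETICAL MODEL\n",
    }
    print(sorted(filter_experimental(demo)))
```

A real library-building pipeline would follow this with other filters (resolution cutoffs, sequence redundancy), but the point stands: the model filter is cheap and comes first.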

Further Reading
Managing structural genomics data


Comments

6 Responses to “Structure prediction has a long way to go - The PDB says “no” to computational models”

  1. Protein models and the PDB « memomics on August 28th, 2006 7:42 pm

    […] A post by Deepak alerts that the PDB will no longer be accepting computational models of protein structures as of this October, and notes the pressing need for a peer-reviewed knowledgebase of protein models to exist alongside the PDB. The field of “structural genomics” has envisaged a marriage of experimental and theoretical determinations since its beginnings in 1998, but since then one gets the impression that the US-funded Protein Structure Initiative (PSI) has been sidetracked by the difficulties involved in experimentally determining protein structures in a number that approaches its initial somewhat optimistic forecasts. It is encouraging to see via these press releases that computational modeling is still within the realm of the PSI. Large scale funding for protein structure modeling such as this may deliver the kind of quality protein model databases that are needed in the light of the decision of the PDB. […]

  2. Steve_B on August 31st, 2006 6:47 am

    When the rubber meets the road, a model is a model and an experimentally determined structure is a “real” structure — with all the inherent caveats that come with concentrating a protein to the extent required for structure determination.

    I’ve used models, I LIKE models — they’re helpful and thought-provoking. But if the PDB’s mission is as a structure database, it shouldn’t be littered (and I do mean littered) with models.

    What’s wrong with a separate db of homology models? Would researchers turn to it if Their Favorite Protein had not been determined experimentally? I think so, but no one wants to populate the second banana database — just ask the BMRB folks.

  3. Deepak Singh on August 31st, 2006 7:01 am

    Strictly speaking, even a crystal structure is a model, since it fits the structure to experimental observations, i.e. the electron density, and therein lies the problem. Take space travel, for instance. All the calculations that figure out how we get to the moon, go around it, etc. are “models”, but the underlying physics is very well defined. When it gets down to molecular-level detail, life gets a lot more complicated. We are some way away from theory that can describe molecular systems the size of proteins at the desired level of accuracy.

  4. Quality Control at business|bytes|genes|molecules on December 16th, 2006 2:48 pm

    […] The world of biological information is not too different from that. Content is generated by the terabytes, by labs all over the world, using all kinds of methods that are often not well documented. It was for some of these reasons that the PDB stopped including theoretical models in its main database. When I was working as an informatician, getting confidence scores on all the data that we had was very important, and sometimes rather challenging. This is going to be the biggest challenge for the biological community going forward. As access to biological data gets more loosely distributed to the edges, and accessing those edges becomes one of our favorite pursuits, an equal amount of time will have to be spent making sure that data actually means something. Perhaps when I talked about monetizing biological data what I was really alluding to was the ability of companies/people to develop algorithms/platforms that can successfully mine the vast expanse of data and find what is really needed without compromising quality. In addition to all the other wonderful research going on out there, I wonder what kind of innovations are being made in assessing the quality of random biological information? I haven’t seen too much, but admittedly, I haven’t been looking. […]

  5. TechBizMedia » Blog Archive » Bhageerath: Structure prediction pipeline on January 8th, 2007 9:26 pm

    […] Further reading: A new model for CASP Structure prediction has a long way to go […]

  6. » Animals and drug development » business|bytes|genes|molecules on December 28th, 2007 11:20 am

    […] Further reading Computing where art thou Structure prediction has a long way to go 2015 or 2025 […]
