Interesting paper in J. Chem. Inf. Model. on “Predicting Key Example Compounds in Competitors’ Patent Applications Using Structural Information Alone.” That’s actually a pretty cool concept, and something many pharma companies are either already doing or would be quite interested in.
The authors’ method is based on the assumption that medicinal chemists usually carry out extensive structure-activity relationship (SAR) studies around key compounds. Using that assumption, the method identifies compounds located at the centers of densely populated regions in the chemical space of patent examples, represented by Extended Connectivity Fingerprints (ECFPs). The authors had a success rate of 57% on their test sets. Call me old and jaded, but percentages like that just don’t impress me anymore. I think the entire informatics/computational modeling field needs to understand that for methods to be applicable and accepted, they need to radically reduce time, error rates, or both. Back when I was evangelizing physics-based approaches for protein-ligand evaluation, I remember someone from a big pharma company telling me: “You have to achieve throughput better than X compounds a week, because otherwise we’re just going to make them.” You had to be significantly better than their med chemists, WITH sufficient quality.
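The paper’s exact scoring function isn’t described in this post, so the following is only a minimal sketch of the general idea of picking compounds at the center of a dense region. It assumes fingerprints are already computed and stored as sets of on-bit indices (as an ECFP would give you), and it uses a hypothetical Tanimoto threshold of 0.6 to define a compound’s “neighbors”. The names `tanimoto` and `rank_by_density` are illustrative, not taken from the paper.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_density(fps: dict, threshold: float = 0.6) -> list:
    """Rank compound IDs by a crude local-density score (hypothetical scheme).

    Primary key: number of other examples with Tanimoto similarity >= threshold.
    Tie-breaker: mean similarity to all other examples.
    Compounds near the center of a dense cluster end up first.
    """
    ids = list(fps)
    neighbors = {cid: 0 for cid in ids}
    sim_sums = {cid: 0.0 for cid in ids}
    for x, y in combinations(ids, 2):  # every unordered pair once, O(n^2)
        s = tanimoto(fps[x], fps[y])
        sim_sums[x] += s
        sim_sums[y] += s
        if s >= threshold:
            neighbors[x] += 1
            neighbors[y] += 1
    n_other = max(len(ids) - 1, 1)
    return sorted(
        ids,
        key=lambda cid: (neighbors[cid], sim_sums[cid] / n_other),
        reverse=True,
    )


# Toy "patent": four close analogues plus one unrelated compound (made-up bit sets).
patent_examples = {
    "ex1": frozenset({1, 2, 3, 4, 5}),
    "ex2": frozenset({1, 2, 3, 4, 6}),
    "ex3": frozenset({1, 2, 3, 5, 6}),
    "ex4": frozenset({1, 2, 3, 4, 5, 6}),
    "ex5": frozenset({40, 41, 42}),
}
print(rank_by_density(patent_examples))
```

In this toy set, ex4 shares most of its bits with every other analogue and comes out on top, while the unrelated ex5 comes last. A real run would generate ECFPs from structures with a cheminformatics toolkit, and the threshold (arbitrary here) would need tuning.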



Comments
What happened in the other 43%? Were the predicted structures way off, or actually quite similar to the actual hit? Say, if the hit were within the best three compounds found by their method 95% of the time, then that would be quite impressive. In other words, a ROC curve would say much more than the single number of 57%, but I am too lazy to look up the paper right now.
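To make the point concrete: a single success-rate figure hides how close the misses were. Below is a minimal sketch of the two alternatives mentioned, a top-k success rate and a per-patent ROC AUC. It assumes each patent yields a ranked list of its example compounds and that the true key compound is known; the data and the function names `top_k_success_rate` and `roc_auc` are hypothetical, not from the paper.

```python
def top_k_success_rate(rankings: dict, key_compounds: dict, k: int = 3) -> float:
    """Fraction of patents whose known key compound appears in the top k of the ranking."""
    hits = sum(1 for pid, ranked in rankings.items() if key_compounds[pid] in ranked[:k])
    return hits / len(rankings)


def roc_auc(ranked: list, positives: set) -> float:
    """ROC AUC of a strict ranking: the probability that a randomly chosen
    positive is ranked above a randomly chosen negative."""
    n_pos = sum(1 for c in ranked if c in positives)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    negatives_seen = 0
    correctly_ordered = 0
    for c in ranked:
        if c in positives:
            # every negative not yet seen is ranked below this positive
            correctly_ordered += n_neg - negatives_seen
        else:
            negatives_seen += 1
    return correctly_ordered / (n_pos * n_neg)


# Hypothetical rankings for two patents and their (known) key compounds.
rankings = {
    "patentA": ["c1", "c2", "c3", "c4"],
    "patentB": ["d1", "d2", "d3", "d4"],
}
key_compounds = {"patentA": "c2", "patentB": "d4"}

for k in (1, 3, 4):
    print(k, top_k_success_rate(rankings, key_compounds, k))
print(roc_auc(rankings["patentA"], {key_compounds["patentA"]}))
```

Reporting the success rate as a function of k (or the full ROC curve) shows whether a “miss” was ranked second or dead last, which is exactly the information a single percentage throws away.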
Anyway, the article does little more than work around the lack of OpenData using a bit of molecular similarity.
The nice thing, however, is that apart from pharma starting to use this approach to guess what others are doing, they will also start using it to ensure that their own best hits are *not* among the top five it returns.
More cold-pharma-war: more money spent on legal departments, more money spent defending sensitive information. Consequently, less money for actual research, less time for chemoinformaticians to do other research, and longer waits for articles to clear the legal department before they can be sent off to a journal…
Especially that last part.
I should have mentioned ROC curves, since I agree that they are probably the best approach. I don’t have access to the paper from home, so I was limited to the abstract.
My point is that even in academia (I would say especially in academia, and even more so in areas like structural bioinformatics), people keep publishing methods that don’t really improve the state of the art. That really bugs me, especially in a field that’s been around a while.
Though I like the term ‘cold-pharma-war’, I am not sure that a general ‘open data’ concept goes hand in hand with the idea of protecting intellectual property by filing a patent (invention).
Again, I think that ‘open standards’, including for patents, are more important.
Joerg, it’s the routing of money that really worries me. Every penny not spent on (virtual) synthesis is a penny lost. It adds inefficiency. Surely, that penny has to be earned too; I know that. The problem is that not every American or African can afford every medicine right now. So the more pennies saved, the cheaper medicine gets, and the more people can be cured. And yes, I do think OpenData helps here. Or, better put: ClosedData is most certainly not helping, if only because academia, which could improve existing practices, does not have enough data to properly evaluate new approaches. It’s the lack of OpenData that makes so much of the current literature mediocre, which bugs Duncan and me, and bugged you too, at least some time ago…
I don’t ask pharma to tell us every secret; I would only like to ask them to share the things they are not interested in, the things that have no value to them, or no longer do. It’s an investment: they get better OpenSource in return.
Egon
You are quite right about some of the issues (OpenData being one of them). It’s one reason there is a move towards sharing safety and toxicology information. I also agree that we would like to see more virtual synthesis, because it cuts down costs. However, it’s not quite that simple. From my experience, the reasons are:
1. A lack of trust in virtual models. This is much better than it used to be, but just look at enrichment factors and results from virtual high-throughput screening (vHTS), and it’s clear that the methods are just not there yet.
2. The real cost comes in time. If computational methods help you save any time, you are reducing costs. And these costs can only be measured over the life of a lead candidate. If virtual methods lead you down the wrong path, then you have to go back.
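For readers outside the field, the enrichment factor mentioned above is commonly defined as the hit rate among the top-ranked fraction of a screened library divided by the hit rate of the whole library, so an EF of 1 means no better than random. A minimal sketch, assuming a ranked list of known active/inactive labels (the data and the function name `enrichment_factor` are hypothetical):

```python
def enrichment_factor(ranked_is_active: list, fraction: float = 0.01) -> float:
    """EF at `fraction`: hit rate in the top `fraction` of a ranked library
    divided by the hit rate of the whole library."""
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("library must contain at least one active")
    # size of the top slice; at least one compound
    n_selected = max(1, round(fraction * n_total))
    hits_selected = sum(ranked_is_active[:n_selected])
    return (hits_selected / n_selected) / (n_actives / n_total)


# Hypothetical vHTS run: 1,000 compounds, 10 known actives,
# 5 of which land in the top 1% (the first 10 positions).
ranked = [True] * 5 + [False] * 5 + [True] * 5 + [False] * 985
print(enrichment_factor(ranked, fraction=0.01))
```

Here half the actives land in the top 1%, giving an EF of 50; putting all ten actives in the top ten would give the maximum possible EF of 100 for this library.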
My take is this: we’ll only really get a clear picture over the next decade. The problem is that companies are not willing to take the risks that would really help them find out whether virtual methods are resulting in better drugs at a lower cost.