In prior articles on grid computing, I have voiced my concerns about the potential applications of grid computing for pharma projects. Nodalpoint has an article on grid computing for life science that references articles by Tim Bray from Sun and Jim Gray from Microsoft. At the risk of sounding like a broken record, I maintain that while the grid computing economics that Jim Gray talks about work for certain cases, there are a number of cases where the economics break down. If you are doing routine crunching of genomes on an ongoing basis (annotation, etc.) and essentially performing data collection, then grids make a lot of sense, at least loosely distributed ones. The microarray data analysis that Duncan mentions is an ideal candidate for grid-based deployment. On the other hand, I am still not convinced that all in silico experiments are better suited to grids than to clusters. Clusters give you more control and more reliability, and in the end they probably help you achieve your goals faster.
From personal experience, one aspect of grid computing never gets enough thought: what do you do with all the data that grid computing efforts routinely create? Just sifting through it can become a nightmare. That is why I think grid deployments work best for routine data generation projects: the scientist can focus on data analysis while the grid continuously generates data.



2 Comments
I agree with your conclusion about Jim Gray’s paper. While I appreciate a lot of what he’s done, he misses a key variable here: how many times do you use that data after you move it?
His conclusion was “BLAST, FASTA … are mobile in the rare case of a 40 CPU-day computation.” But all of his calculations seem to be based upon the premise that each piece of data is only used once. It may not make sense to distribute a human genome, then do one search on it.
If you’re going to do thousands of searches on that data after you’ve moved it, moving it makes more sense with each search you run.
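The reuse argument above can be made concrete with a little arithmetic. What follows is a minimal sketch in Python, not anything taken from Jim Gray's paper: the function names are hypothetical, and the cost figures in the usage note are made-up units chosen only to show the shape of the trade-off. It assumes a one-time cost to move the data and a fixed cost per search at each location.

```python
import math


def amortized_cost_per_search(transfer_cost, compute_cost_per_search, n_searches):
    """Cost of one search when the one-time transfer is spread over n_searches.

    All costs are in the same arbitrary unit (an assumption of this sketch).
    """
    if n_searches < 1:
        raise ValueError("n_searches must be at least 1")
    return transfer_cost / n_searches + compute_cost_per_search


def break_even_searches(transfer_cost, grid_cost_per_search, local_cost_per_search):
    """Smallest number of searches at which moving the data costs no more
    than running every search locally.

    Returns None when a grid search is not cheaper than a local one, because
    then no amount of reuse ever pays back the transfer.
    """
    saving_per_search = local_cost_per_search - grid_cost_per_search
    if saving_per_search <= 0:
        return None
    return max(1, math.ceil(transfer_cost / saving_per_search))
```

With illustrative values of 300 units to move the data, 1 unit per grid search and 5 units per local search, `break_even_searches(300, 1, 5)` gives 75: a single search costs 301 units on the grid against 5 locally, but by the 75th search the amortized grid cost has dropped to the local figure, and every search after that is cheaper.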
One Trackback
[...] I answered the first earlier. I have also been known to be skeptical of grid computing in pharma companies. However, I do believe that one day we will do a lot of computing in the cloud, and through grids. In the meantime, I agree wholeheartedly that screensaver projects like the ones described above are quite useful, as the success of Folding@home has shown. [...]