Nodalpoint on grid computing for bioinformatics

In prior articles on grid computing, I have voiced my concerns about the potential applications of grid computing for pharma projects. Nodalpoint has an article on grid computing for life science that references articles by Tim Bray from Sun and Jim Gray from Microsoft. At the risk of sounding like a broken record, I maintain that while the grid computing economics that Jim Gray talks about work for certain cases, there are a number of cases where the economics break down. If you are doing routine crunching of genomes on an ongoing basis (annotation, etc.) and essentially performing data collection, then grids make a lot of sense, at least loosely distributed ones. The microarray data analysis that Duncan mentions is an ideal candidate for grid-based deployment. On the other hand, I am still not convinced that all in silico experiments are better suited to grids than to clusters. The latter give you more control and more reliability, and in the end probably help you achieve your goals faster.

From personal experience, one aspect of grid computing never gets enough thought: what do you do with all the data that grid computing efforts routinely create? Just sifting through it can become a nightmare. That is why I think grid deployments work best for routine data generation projects: the scientist can focus on data analysis and let the grid continuously generate data.

This entry was posted in Admin, BioIT, Computing, Informatics, Infotech, Innovation, Life Science.

1 Comment

  1. Posted June 13, 2006 at 14:22 | Permalink

    I agree with your conclusion about Jim Gray’s paper. While I appreciate lots of what he’s done, he misses a key variable here–how many times do you use that data after you move it?

    His conclusion was “BLAST, FASTA … are mobile in the rare case of a 40 CPU-day computation.” But all of his calculations seem to be based upon the premise that each piece of data is only used once. It may not make sense to distribute a human genome, then do one search on it.

    If you’re going to do thousands of searches on that data after you’ve moved it, it makes increasing sense with each search you do.
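
The reuse argument in the comment above comes down to amortization: a one-time transfer cost gets spread over every search run on the moved data. Here is a minimal sketch of that arithmetic. The function names and all the numbers (costs in cents) are made up for illustration and are not taken from Jim Gray's paper.

```python
import math


def cost_per_search(transfer_cost, compute_cost, n_searches):
    """Average cost of one search when the data is moved once and reused.

    transfer_cost: one-time cost of moving the data set (hypothetical units)
    compute_cost:  cost of a single search after the data has been moved
    n_searches:    number of searches run against the moved data
    """
    if n_searches < 1:
        raise ValueError("n_searches must be at least 1")
    return transfer_cost / n_searches + compute_cost


def break_even_searches(transfer_cost, local_cost, remote_cost):
    """Smallest number of searches at which moving the data is no more
    expensive than leaving it in place.

    local_cost:  per-search cost where the data already lives
    remote_cost: per-search cost on the machines the data is moved to
    Returns None when the remote per-search cost is not lower, because
    the transfer cost can then never be recovered.
    """
    saving_per_search = local_cost - remote_cost
    if saving_per_search <= 0:
        return None
    return math.ceil(transfer_cost / saving_per_search)


if __name__ == "__main__":
    # Assumed figures: 300 cents to move the data, 10 cents per search
    # in place, 2 cents per search after the move.
    for n in (1, 10, 100, 1000):
        print(n, cost_per_search(300, 2, n))
    print("break-even:", break_even_searches(300, 10, 2))
```

With a single search the transfer cost dominates, which matches the "used once" premise the comment questions. As the number of searches grows, the per-search cost approaches the pure compute cost.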

One Trackback

  1. [...] I answered the first earlier. I have also been known to be skeptical of grid computing in pharma companies. However, I do believe that one day we will do a lot of computing in the cloud, and through grids. In the meantime, I agree wholeheartedly that screensaver projects like the ones described about are quite useful, as the success of Folding@home has shown. [...]

  • Disclaimer

    All opinions on this blog are my own and do not reflect those of my employers, past or present