Programming HPC for the domain

May 11, 2008

In a new post at Computing at Scale, Bill McColl writes about domain-specific parallel programming. At Accelrys, a lot of the software I managed was in-licensed from academia. That approach allowed the company to tap into the intellectual resources of some of the smartest academic researchers in the world, but it also created problems. One was the difference in software development practices; some of the academic code barely had version control. But that’s the obvious one. Translating code parallelized for an academic setting, often under the assumption that huge clusters might be available, to an industrial setting where scaling and fault tolerance become critical, where resource availability varies widely, and where speed matters, is always a challenge. This is especially true when you’re trying to shrink-wrap software and build interactive interfaces.

So in an era with more scale available, clouds to tap into, accelerators, and new data and distribution models, are we going to see a shift? I still feel that the underlying scientific research has to come from academia. Academics have the resources, time and incentive to do it, but I think industrial think tanks and expertise can contribute back by working with academia on advanced problems of relevance, e.g. in the area of computing. Will we, as a scientific community, tap into some of the new domain-specific development being done today? It can’t be done by one side or the other; rather, we need to identify approaches as a community and understand what works best, without duplicating efforts. Of course, we also need people who understand these new methods and paradigms to implement them.

There will always be a tension between academic research efforts and commercial needs. In the life sciences it is especially tough, from an economic point of view, to develop industry-specific apps, which is why I believe it will have to be a joint effort.

Your thoughts?

Gamers, get your folding on

May 9, 2008

Technology Review was the first place I saw it, then someone put it up on FriendFeed, and now Andrew Perry has a great post on Foldit. Foldit comes out of the lab of a bbgm favorite, David Baker, right here at the University of Washington.

Foldit combines gaming with protein structure prediction. It’s an interesting approach to getting scientific problems in front of more people. Folding@home built upon the success of SETI@home and the geek cred of running on gaming consoles, and has built quite a following. Foldit presents a simple, fun interface to get people interested in protein structure, and the existence of Folding@home makes the idea somewhat familiar to geeks everywhere. Will it be an example of how we can leverage crowdsourcing? Andrew makes some interesting points (which I agree with) on weighting crowdsourced contributions. That’s always a hard thing to do, but I’d like to see karma and the like come into play here.

It’s good to see protein structure getting some attention and the field continuing to be creative. It’s always been my favorite scientific subject. The field lends itself to “pretty pictures”, so getting non-experts involved is a real possibility.

The site and server have had connectivity issues whenever I’ve tried, so perhaps they need help with web resources, because lots of people seem to be interested.

Here is a list of people supporting the project: UW Animation Research Labs, UW Baker Lab, DARPA, Microsoft, and Adobe. Nice list.

HPC and structure-based drug design

May 5, 2008

Here is the abstract of a paper in Hypertension entitled “Structure-based identification of small-molecule angiotensin-converting enzyme 2 activators as novel antihypertensive agents”:

Angiotensin-converting enzyme 2 (ACE2) is a key renin-angiotensin system enzyme involved in balancing the adverse effects of angiotensin II on the cardiovascular system, and its overexpression by gene transfer is beneficial in cardiovascular disease. Therefore, our objectives were 2-fold: to identify compounds that enhance ACE2 activity using a novel conformation-based rational drug discovery strategy and to evaluate whether such compounds reverse hypertension-induced pathophysiologies. We used a unique virtual screening approach. In vitro assays revealed 2 compounds (a xanthenone and resorcinolnaphthalein) that enhanced ACE2 activity in a dose-dependent manner. Acute in vivo administration of the xanthenone resulted in a dose-dependent transient and robust decrease in blood pressure (at 10 mg/kg, spontaneously hypertensive rats decreased 71+/-9 mm Hg and Wistar-Kyoto rats decreased 21+/-8 mm Hg; P<0.05). Chronic infusion of the xanthenone (120 microg/day) resulted in a modest decrease in the spontaneously hypertensive rat blood pressure (17 mm Hg; 2-way ANOVA; P<0.05), whereas it had no effect in Wistar-Kyoto rats. Strikingly, the decrease in blood pressure was also associated with improvements in cardiac function and reversal of myocardial, perivascular, and renal fibrosis in the spontaneously hypertensive rats. We conclude that structure-based screening can help identify compounds that activate ACE2, decrease blood pressure, and reverse tissue remodeling. Administration of ACE2 activators may be a valid strategy for antihypertensive therapy.

Here’s the HPCwire story, which really doesn’t tell me much other than that this was really high-throughput docking, but they use words like:

That in itself is a significant accomplishment because no one has ever specifically identified a compound that enhances the activity of an enzyme using a rational structure-based approach

Anyone have a subscription to Hypertension? I am really curious, because nothing I read screams “unique” to me. Of course, I can just wait till tomorrow and try to get to the paper from work.

Update: Got the paper, and still don’t get the fuss. It’s an elegant virtual screening strategy, but I wouldn’t say it’s revolutionary. I was hoping to see something more advanced, e.g. protein flexibility, better energy functions, etc.

Bio-IT World day 2 - iPhones, Virtualization, EC2 and the Semantic Web

April 30, 2008

A quick report on Day 2 of Bio-IT World.

The day started with a keynote by Josh Boger, founder and CEO of Vertex. His talk spanned several real-world examples and offered some food for thought. Highlights:

  • Vertex has made active use of a MedChem ELN, which has been extended to their entire MedChem community, including external partners. In his own words, the goal was “enabling the virtual research organization”.
  • The metric of success was user adoption, and there were some good analytics supporting uptake.
  • He spoke at length about the HCV program, where they have used extensive predictive modeling and simulation.
  • Clinical data has backed up their predictive modeling (they’re in Phase III now).
  • They have skipped some experiments that their models suggested they avoid (in one case, experiments that competitors did carry out).
  • He ended by talking a lot about communication and how technology can impact the healthcare system. Much of this section of his talk was about the iPhone: for example, how it can be used to track RFID-tagged pill bottles, follow patient exercise regimens, carry patient records, monitor weight, etc. They’re actually implementing some of these ideas.

There were many other talks to attend, and I won’t bore you with the details, but I will single out one: a talk by Chris Dagdigian of The BioTeam, a small boutique consulting shop, which readers of this blog will know via mentions of Michael Cariaso. Chris spent a lot of his talk discussing the economics of storage, the kinds of storage available these days, and trends in storage and computing. Perhaps it shows how much of a geek I am, but this was a dream talk, one full of hardware specs, pictures of data centers, etc. It is clear that virtualization is big, with Chris’ preference being Xen. There was a cool slide on meta-virtualization (a virtual machine inside a virtual machine inside a virtual machine).

Three thoughts really resonated with me. The first was his distaste for classical Grid Computing, which I have long considered impractical for most companies. The second was his strong support for Amazon Web Services, especially EC2. Apparently, every single BioTeam consultant has independently deployed an EC2 solution, i.e. they’ve all come to the same conclusion. Can’t wait to see this talk next year to find out where they’ve gone with AWS. The third was his talk of the death of the small cluster: today and in the future, we will either have multicore (8-16 cores) on our desktops or dial up cloud resources.

His slides will be available somewhere. Can’t wait to get my hands on them. This was a GREAT talk.

One of the highlights for me was attending the W3C Semantic Web HCLSIG lunch. I got to meet people I know (Eric Neumann), people I have interacted with online (Vipul Kashyap) and followed (John Wilbanks from Science Commons). And I got to say hello to Sir Tim Berners-Lee, who needs no introduction.

Another highlight for me: I finally got to meet Joe Landman, whose JackRabbit got a good plug in the BioTeam talk as well. It was great to meet Joe, with whom I’ve been having a conversation via our respective blogs for quite a while now.

Met several former colleagues and customers as well. Bio-IT World has definitely been one of the better conferences I have had a chance to attend in terms of interest and people.

The CluE Initiative

April 24, 2008

The emergence of extremely large datasets, well beyond the capacity of almost any single computer, has challenged traditional and contemporary methods of analysis in the research world. While a simple spreadsheet or modest database remains sufficient for some research, problems in the domain of “computational science,” which explores mathematical models via computational simulation, require systems that provide huge amounts of data storage and computer processing (current research areas in computational science include climate modeling, gene sequencing, protein mapping, materials science and many more). As an added hurdle, this level of computational infrastructure is often not affordable to research teams, who usually work with significant budgetary restrictions.

Those words introduce the news that the CluE Initiative, a joint Google-IBM-NSF project, posted its official solicitation last week. The goal of the initiative “is to encourage the understanding, further refinement and –importantly– targeted application of the latest distributed computing technology and methods across many academic disciplines.”

Earlier today I posted the transcript of a discussion around academic funding for distributed, utility computing. Perhaps the NSF/Google/IBM effort is the first wave of such funding efforts. It will be interesting to see how this gets generalized and whether other compute resources, e.g. Amazon’s, become accessible via traditional funding agencies. The blog post calls this a “pervasive, technological shift”. I can’t help but agree. It’s time for those of us with an interest in and need for computing resources to solve scientific problems to get access to these, not just for solving those problems, but also to come up with new workflows and algorithms that leverage these new computing paradigms.
