Gamers, get your folding on
May 9, 2008
Technology Review was the first place I saw it, then someone put it up on Friendfeed and now Andrew Perry has a great post on Foldit. Foldit comes out of the lab of a bbgm favorite, David Baker, right here at the University of Washington.
Foldit combines gaming with protein structure prediction. It’s an interesting approach to getting scientific problems in front of more people. Folding@home built on the success of SETI@home, picked up some geek cred by running on gaming consoles, and has gathered quite a following. Foldit presents a simple, fun interface to get people interested in protein structure, and the existence of Folding@home makes the idea somewhat familiar to geeks everywhere. Will it be an example of how we can leverage crowdsourcing? Andrew makes some interesting points (which I agree with) on weighting crowdsourced contributions. That’s always a hard thing to do, but I’d like to see karma and similar mechanisms come into play here.
It’s good to see protein structure getting some attention and the field continuing to be creative. It’s always been my favorite scientific subject. The field lends itself to “pretty pictures”, so getting non-experts involved is a real possibility.
The site and server have had connectivity issues ever since I started trying, so perhaps they need help with web resources, because lots of people seem to be interested.
Here is a list of people supporting the project: UW Animation Research Labs, UW Baker Lab, DARPA, Microsoft, and Adobe. Nice list.
Technorati Tags: David Baker, Foldit, Protein Folding, Protein Structure Prediction, Gaming, Crowdsourcing
Tons of data everywhere. Do we need life science CDNs?
May 3, 2008
This week’s Bio-IT World meeting was all about data storage. Driven by the need to integrate complex, heterogeneous data, and most of all by next-gen sequencing, the life sciences are generating an amazing amount of data, and we are poorly prepared for it. I won’t mention names, but there are places where the data is hitting petabytes AFTER most of it has been thrown away. How do you access this data? How do you back it up? What kind of data centers do you need? What kind of power do you need? When people are worried about whether the city can handle their power needs, there is cause for concern.
It is also why I think we need to approach the future of scientific data generation the way Google and others approach data, infrastructure and data access. What if we had a Bigtable-like distributed storage system that all this data could be uploaded to? What data would be uploaded there? How would we access it? Ideally, data from public genome projects would be made available as Open Data, available to everyone for downstream analysis under a CC0 or similar license. Of course, there is a lot more to these data than just whole genome sequencing. There is also the challenge of the pipes that the data needs to travel through. These are really large files.
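To put rough numbers on the pipe problem, here is a back-of-the-envelope sketch in Python. The data sizes and link speeds are illustrative assumptions of mine, not figures from the meeting, and `transfer_days` is a hypothetical helper that only divides size by bandwidth, ignoring protocol overhead, retries and disk speed.

```python
def transfer_days(size_bytes: float, link_bits_per_sec: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link.

    efficiency is the usable fraction of the nominal link rate
    (an assumption; real links rarely sustain 100%).
    """
    if link_bits_per_sec <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link rate must be positive and efficiency in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_sec * efficiency)
    return seconds / 86_400  # seconds per day


TERABYTE = 10**12  # decimal units, in bytes
PETABYTE = 10**15
MEGABIT = 10**6    # bits per second
GIGABIT = 10**9

if __name__ == "__main__":
    # Illustrative (assumed) scenarios, not measurements.
    scenarios = [
        ("1 TB over 100 Mbit/s", TERABYTE, 100 * MEGABIT),
        ("1 TB over 1 Gbit/s", TERABYTE, GIGABIT),
        ("1 PB over 1 Gbit/s", PETABYTE, GIGABIT),
        ("1 PB over 10 Gbit/s", PETABYTE, 10 * GIGABIT),
    ]
    for label, size, rate in scenarios:
        print(f"{label}: {transfer_days(size, rate):.2f} days")
```

Even with a dedicated gigabit link running flat out, a petabyte takes roughly three months to move, which is why where the data lives matters as much as how it is stored.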
Whatever the solution(s), next-gen sequencing and the resultant data glut were top of mind. And this is just the start. Personally, I think that those in small labs who want sequencers of their own really need to reconsider. Utilization rates in a small lab are unlikely to justify the cost, and such labs will almost certainly run into data storage, access and archiving issues, especially when something like PacBio comes online. A utility model works best here: one where people get time on machines or access to machines hosted at core facilities.
The more I think about these issues, the more I am convinced that the life sciences really need to embrace something like CDNs. With the sheer volume and variety of data, we need people who can step up to the plate and provide the infrastructure, instead of depending on a few people who aren’t necessarily thinking about data the way Google or Yahoo do on a daily basis (although, by the look of it, some of them are doing just that). I am especially worried about smaller groups and labs who might get left out if we don’t develop the appropriate ecosystem.
The economics of all this? That’s another issue for another day.
Further reading
Chris Dwan’s Bio-IT World presentation
The DNA Data Deluge
Technorati Tags: Data, Open Data, Next Gen Sequencing, CDN, BioIT
Tranche in the news: More wins for Open Data
May 2, 2008
Proteome Commons Tranche is one of the cooler resources on the web. Ever since I met Jayson Falkner, I have liked their approach to open data and their early support for CC0. Looks like Tranche has hit the big time with the announcement that the resource has been chosen to host all mouse model proteomics data collected by the National Cancer Institute. From the press release (which you can read in its entirety here):
The innovative scientific file sharing network and data repository, Tranche, has been chosen to host all Mouse Models proteomics data collected by the National Cancer Institute (NCI) Mouse Proteomic Technologies Initiative (MPTI) for public release.
In collaboration with Dr. Philip Andrews, University of Michigan, Department of Biological Chemistry and the Tranche team, the NCI MPTI project consortia deposited their mass spectrometry data sets into the Tranche data repository for storage and secure data sharing among participating research labs. See details about the MPTI projects below.
The mouse model data sets are already available on Tranche.
Further reading
MPTI
Science Commons blog
Technorati Tags: Proteome Commons Tranche, Jayson Falkner, Open Data, Open Science, MPTI, NCI
Biobootcamp 2008
May 1, 2008
Perhaps I was premature in bemoaning the lack of a startup school for life scientists. Adam Rubenstein points to biobootcamp 2008. Not exactly what I had in mind, but knowing some of the people involved, I suspect it will be quite useful to people.
Technorati Tags: biobootcamp, entrepreneurship
Bio-IT World day 2 - iPhones, Virtualization, EC2 and the Semantic Web
April 30, 2008
A quick report on Day 2 of Bio-IT World.
The day started with a keynote by Josh Boger, founder and CEO of Vertex. His talk spanned several real-world examples and offered some food for thought. Highlights:
- Vertex has made active use of a MedChem ELN, which has been extended to their entire MedChem community, including external partners. In his own words, the goal was “enabling the virtual research organization”
- Metric of success was user adoption and there were some good analytics supporting uptake
- He spoke at length about the HCV program, where they have used extensive predictive modeling and simulation
- Clinical data has backed up their predictive modeling (they’re in Phase III now)
- They have avoided some experiments (carried out by competitors in one case) that their models suggested they avoid
- He ended by talking a lot about communication and how technology can impact the healthcare system. Much of this section of his talk was about the iPhone, for example how it can be used to track RFID-tagged pill bottles, follow patient exercise regimens, carry patient records, monitor weight, etc. They’re actually implementing some of these ideas
There were many other talks to attend, and I won’t bore you with the details, but I will mention one: a talk by Chris Dagdigian of The BioTeam, a small boutique consulting shop that readers of this blog will know via mentions of Michael Cariaso. Chris spent a lot of his talk discussing the economics of storage, the kinds of storage available these days, and trends in storage and computing. Perhaps it shows how much of a geek I am, but this was a dream talk, one full of hardware specs, pictures of data centers, etc. It is clear that virtualization is big, with Chris’ preference being Xen. There was a cool slide on meta-virtualization (a virtual machine inside a virtual machine inside a virtual machine).
Two thoughts really resonated with me. The first was his distaste for classical Grid Computing, which I have long considered impractical for most companies. The second was his strong support for Amazon Web Services, especially EC2. Apparently, every single BioTeam consultant has independently deployed an EC2 solution, i.e. they’ve all come to the same conclusion. Can’t wait to see this talk next year to find out where they’ve gone with AWS. His point about the death of the small cluster also resonated: today and in the future, we will either have multicore (8-16 cores) on our desktops or dial up cloud resources. His slides will be available somewhere. Can’t wait to get my hands on them. This was a GREAT talk.
One of the highlights for me was attending the W3C Semantic Web HCLSIG lunch. I got to meet people I know (Eric Neumann), people I have interacted with online (Vipul Kashyap) and followed (John Wilbanks from Science Commons). And I got to say hello to Sir Tim Berners-Lee, who needs no introduction.
Another highlight for me: I finally got to meet Joe Landman, whose JackRabbit got a good plug in the BioTeam talk as well. It was great to meet Joe, with whom I’ve been having a conversation via our respective blogs for quite a while now.
Met several former colleagues and customers as well. Bio-IT World has definitely been one of the better conferences I have had a chance to attend in terms of interest and people.
Technorati Tags: Bio-IT World, W3C, BioTeam, BioIT






