Bio-IT World Day 3 - Linda Avey Keynote

April 30, 2008

Bio-IT World day 2 - iPhones, Virtualization, EC2 and the Semantic Web

April 30, 2008

A quick report on Day 2 of Bio-IT World.

The day started with a keynote by Josh Boger, founder and CEO of Vertex. His talk spanned several real-world examples and offered some food for thought. Highlights:

  • Vertex has made active use of a MedChem ELN, which has been extended to their entire MedChem community, including external partners. In his own words, the goal was “enabling the virtual research organization”
  • The metric of success was user adoption, and there were some good analytics supporting uptake
  • He spoke at length about the HCV program, where they have used extensive predictive modeling and simulation
  • Clinical data has backed up their predictive modeling (they’re in Phase III now)
  • They have skipped some experiments that their models suggested they avoid (in one case, competitors went ahead and carried them out)
  • He ended by talking a lot about communication and how technology can impact the healthcare system. Much of this section of his talk was around the iPhone. For example, the iPhone can be used to track RFID tagged pill bottles, follow patient exercise regimens, carry patient records, monitor weight, etc. They’re actually implementing some of these ideas

There were many other talks to attend, and I won’t bore you with all the details, but I will single out one: a talk by Chris Dagdigian of The BioTeam, a small boutique consulting shop that readers of this blog will know via mentions of Michael Cariaso. Chris spent a lot of his talk discussing the economics of storage, the kinds of storage available these days, and trends in storage and computing. Perhaps it shows how much of a geek I am, but this was a dream talk, full of hardware specs, pictures of data centers, etc. It is clear that virtualization is big, Chris’ preference being Xen. There was a cool slide on meta-virtualization (a virtual machine inside a virtual machine inside a virtual machine).

Three thoughts really resonated with me. The first was his distaste for classical Grid Computing, which I have long considered impractical for most companies. The second was his strong support for Amazon Web Services, especially EC2. Apparently, every single BioTeam consultant has independently deployed an EC2 solution, i.e. they’ve all come to the same conclusion. Can’t wait to see this talk next year to find out where they’ve gone with AWS. The third was the death of the small cluster: today and in the future, we will either have multicore (8-16 cores) on our desktops or dial up cloud resources.

His slides will be available somewhere. Can’t wait to get my hands on them. This was a GREAT talk.

One of the highlights for me was attending the W3C Semantic Web HCLSIG lunch. I got to meet people I know (Eric Neumann), people I have interacted with online (Vipul Kashyap) and followed (John Wilbanks from Science Commons). And I got to say hello to Sir Tim Berners-Lee, who needs no introduction.

Another highlight for me: I finally got to meet Joe Landman, whose JackRabbit got a good plug in the BioTeam talk as well. It was great to meet Joe, with whom I’ve been having a conversation via our respective blogs for quite a while now.

Met several former colleagues and customers as well. Bio-IT World has definitely been one of the better conferences I have had a chance to attend in terms of interest and people.

Bio-IT World Day 1 - Visualization, the cloud and people

April 29, 2008

Detailed blog posts will follow when I have some additional cycles, but I thought I’d share some quick thoughts on Day 1 of Bio-IT World. My conference started with a workshop on data visualization, which was mostly about the importance of visualization for making sense of multidimensional data sets and what kinds of visualizations could be done. My takeaways from the talks:

  • There was a distinction made between statistical methods and data mining and presenting information to humans.
  • Life science data is inherently multiscalar and reducing dimensions without losing information or creating artifacts is not trivial
  • It is important to create systems that help scientists go through a workflow, predict useful visualizations, and guide the user to the most appropriate visualization for the relevant questions
  • APIs are important for Pfizer. If a full API is not available, they are not interested in a visualization package
  • And last but not least, as I Twittered during the workshop, they need to invite Ben Fry to give a talk on visualization. I am sure he would have a lot to contribute

Perhaps the highlight was the keynote by John Reynders from Johnson and Johnson PRD. He gave us a tour of his career, including his time at Los Alamos. The talk did not go into any great depth, but I left it very encouraged: encouraged that the head of an IT organization at a large pharma company understood the value of collaboration, understood that innovation happens everywhere and needs to be tapped appropriately, and understood that a lot of information is pre-competitive and should be shared across companies. Other things he talked about:

  1. The cloud :). There was a slide on how to dial up storage and cycles, with AWS prominently mentioned
  2. Collective intelligence. He spent a lot of time on this, from knowledge and innovation networks to connecting people internally and finding new ways to make tools available. There was a suitable amount of web 2.0 jargon and frequent mention of the Semantic Web as essential to the life sciences.
  3. We have the compute power, but the gap comes from the software.
  4. He also warned about getting too caught up in the technology and losing sight of the problem

Would have been nice to have open data mentioned explicitly, but he clearly said that pharma needs to appreciate data and information sharing.

Bio-IT World means meeting old friends, especially from my Accelrys days, as well as finally meeting people I admire from my online life, with a special shoutout to Michael Cariaso.

On tap on Day 2 - Electronic Data Capture, high throughput data management, supercomputing and a W3C lunch

New business models for life science content

April 28, 2008

Let me start off by pointing everyone to the standard disclaimer.

Now to the good stuff. I have blogged about NextBio in the past. A couple of weeks ago I was on the site and noticed that I could use the search engine without having to log in and get some pretty interesting results fast (well presented, well laid out, etc). I also registered and got an account for enhancements to the search experience. I then got an advance copy of a press release announcing the formal public launch of the NextBio search engine. From the release:

Using NextBio, any researcher or clinician can search the world’s public life sciences data and literature - over 10,000 experiments, 16 million articles, and literally billions of data points. Moreover, users can import their own experimental data into the NextBio search engine, share it with the community, and collaborate with others as never before

The release offers more details. There are over a billion data points, tens of thousands of study results and millions of scientific articles. There is a really neat autocomplete feature. Perhaps most importantly one can make correlations across six species, comparing animal models to human data.

Here are some screenshots. What I like most about the service is just the look and feel, very “Googley” if I might say.

NextBio autocomplete

BRCA2 - NextBio

For me the more interesting part is the business model. The NextBio model is essentially the freemium model that so many have advocated. They offer a quality free search engine, but revenues are going to be driven by commercial services, both hosted search and local installs. Transinsight, with GoPubMed, is doing something similar albeit not quite at this scale.

I like the direction life science content is taking. It’s only going to be better for science and for the companies working in this space.

Hopefully I will get a chance to see the presentation tomorrow here at Bio-IT World. Check the site out; I would love to hear what all of you think.

Further reading
Searching biological information at NextBio

Rethinking software access

April 26, 2008

So today, I tried to download MODELLER, which is free for academics and $$$ for commercial users via Accelrys (Full Disclosure: While I did not directly manage MODELLER at Accelrys, I had indirect responsibilities). I completely understand that part. The problem is that the MODELLER license does not seem to address what I want to do: hobby science. So I had to wait for my request to be approved, which it wasn’t.

Two thoughts arise from this exercise, or maybe three. First, it’s clear that when the MODELLER license was written, personal research use was not considered. It harks back to the assumption that “real science” was done either in academia or at companies. Well folks, that might have been true some years ago, but today the assumption is a bit of a problem. I completely understand that they are trying to prevent the system from being gamed, but in my mind the old model (free for academic, $$$ for commercial via another entity) does not work as classically constructed in this case, for multiple reasons. The whole licensing model does not work for bursty science either, especially when one or more non-academics are involved (this is a question that I took a hard look at once for MD programs).

Which leads me to thought #2. I come from an era when modeling software was local, either on a workstation or on a cluster somewhere. That’s how I always ran CHARMM, MODELLER, WHATIF, various threading packages, MOPAC, GAUSSIAN, and various other QM packages. That is how most people run those codes today. Now think about a project that you might want to do, a bursty project spanning geeks across countries and continents. Yeah, modeling doesn’t live well on the programmable web. There are servers out there, especially for structure prediction, sequence alignment, etc, but they seem to belong to a different era of the web.

First, we need to start thinking about the source hosted model, at least for academic code. Source code licenses target developers and power users who like tinkering with the code, but that’s better done by hosting all academic code at SourceForge, Google Code or GitHub, so that collective intelligence comes into play, rather than people developing their own forks which no one else gets to see.

Second, applications should be available on the web, ideally with APIs that make it possible to mash up solutions. Now, automating these tasks is not always trivial, and neither is setup. All of us with hundreds of utility scripts know that, but let’s think about the web when we develop code: not just providing a web server, but making that server a powerful resource rather than just a job submission and result retrieval backend. I’d love to be able to get access to a NAMD server, run a series of utility tasks and then launch a compute job, where I could dial up a set of servers, etc. It’s also possible to attach utility based licensing and pricing to such a service.
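To make the idea a bit more concrete, here is a rough Python sketch of what talking to such a hosted modeling service might feel like. Everything in it is made up for illustration: `ModelingService`, the task names, and the price per core-hour are assumptions, not the API of NAMD or of any real server, and the "service" is just an in-memory stand-in so the example runs without a network.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    task: str        # hypothetical task name, e.g. "minimize" or "md_run"
    cores: int
    hours: float
    status: str = "queued"


@dataclass
class ModelingService:
    """In-memory stand-in for a hypothetical hosted modeling server."""
    rate_per_core_hour: float = 0.10   # assumed price, for illustration only
    jobs: dict = field(default_factory=dict)

    def submit(self, task: str, cores: int = 1, hours: float = 0.1) -> Job:
        # Register a job; a real service would queue it on remote hardware.
        job = Job(len(self.jobs) + 1, task, cores, hours)
        self.jobs[job.job_id] = job
        return job

    def run_all(self) -> None:
        # A real service would schedule work; here jobs simply complete.
        for job in self.jobs.values():
            job.status = "done"

    def bill(self) -> float:
        # Utility pricing: pay only for the core-hours of finished jobs.
        core_hours = sum(j.cores * j.hours
                         for j in self.jobs.values() if j.status == "done")
        return core_hours * self.rate_per_core_hour


# A couple of small utility tasks, then a big compute job on many cores.
service = ModelingService()
service.submit("prepare_structure")
service.submit("minimize", cores=4, hours=0.5)
big = service.submit("md_run", cores=64, hours=10)
service.run_all()
total = service.bill()
print(f"{len(service.jobs)} jobs, total cost ${total:.2f}")
```

The point of the sketch is the shape of the interaction: small utility tasks and a large dialed-up job go through the same interface, and the bill follows usage rather than a per-seat license.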

What I am arguing for is new ways to think about how we make software available, and how it is used. This can’t be done at the individual group level, but there is an opportunity here for universities and funding agencies to figure out how they can help facilitate this, and even companies that might want to commercialize some of these packages.

Comments on this? What would you like to see? How might you access such tools? Would you want mashup APIs?
