Science Big, Science Connected
November 14, 2008
The first attempt at distilling some of my thoughts on Big Data and the Networked Future of Science. Thanks to Chris Lasher for the invite to speak at VA Tech. I had fun, although in my jetlagged, uber-caffeinated state I spoke at 200 mph.
Big data and a big blogger
September 30, 2008
It’s kind of nice having people whose research you’ve followed for years start blogging. I believe Shirley has something to do with this, which means she gets thanks for the fact that Russ Altman is a blogger, even if he finds it hard.
Anyway, in recent days/months the whole concept of big data has been top of mind, partly due to personal interest and partly driven by professional interest at both past and current places of work. So it was nice to see Russ cover the subject in a blog post. In calling big data an informatician’s best friend, he talks about what big data is and the impact it will have on informatics. Specifically, he points to the need to collect all the data we can and, equally importantly, to what will make the data useful and valuable. I think we aren’t there yet, but we’ll get there.
He also talks about the market that big data will generate for informatics tools, algorithms, and solutions from the computer industry. I remember sitting in a talk by Lee Hood some years ago where he talked about how the mathematics for deriving useful information from the millions of data points collected over a variety of analytes across various high-throughput technologies wasn’t there yet. That’s the really hard part. Even with our current methods, we can really push the boundaries. Most importantly though, I believe Big Data in the life sciences will really make us think about data collection, data management, data analysis and data distribution at an industrial scale. This is even more true for the derived data. The days of hacked-out code, a server on a grad student’s computer, and thinking about instruments as personal lab property are gone. We need to think about capacity, content delivery, knowledge management and a lot of topics that so far only a few have had to worry about, and we need to do so as a community.
Of course, computing will play a big role in all this, which is one reason I made the move out of the life sciences into the heart of virtualized computing. I can’t wait to see the life science community, both industry and academia, begin to take computing more seriously, from both the programming and the architecture points of view. We can’t just be casual consumers anymore; we have to be active about leveraging the technologies and paradigms of data-intensive computing that the web has spawned, and add to them the compute-intensive needs and requirements often unique to science.
So where was I … oh yes, just read Russ’ darn post and ignore the paragraphs above.


