Big data and a big blogger

It’s kind of nice when people whose research you’ve followed for years start blogging. I believe Shirley has something to do with this, which means she gets thanks for the fact that Russ Altman is a blogger, even if he finds it hard.

Anyway, in recent days/months the whole concept of big data has been top of mind, partly due to personal interest and partly driven by professional interest at both past and current places of work. So it was nice to see Russ cover the subject in a blog post. In calling big data an informatician’s best friend, he talks about what big data is and the impact it will have on informatics. Specifically, he points to the need to collect all the data we can and, equally important, to what will make that data useful and valuable. I think we aren’t there yet, but we’ll get there.

He also talks about the market that big data will generate for informatics tools, algorithms, and solutions from the computer industry. I remember sitting in a talk by Lee Hood some years ago where he talked about how the mathematics for deriving useful information from the millions of data points collected over a variety of analytes across various high-throughput technologies wasn’t there yet. That’s the really hard part. Even with our current methods, we can really push the boundaries. Most importantly though, I believe big data in the life sciences will really make us think about data collection, data management, data analysis and data distribution at an industrial scale. This is even more true for the derived data. The days of hacked-out code, a server on a grad student’s computer, and thinking about instruments as personal lab property are gone. We need to think about capacity, content delivery, knowledge management and a lot of topics that so far only a few have had to worry about, and we need to do so as a community.

Of course, computing will play a big role in all this, which is one reason I made the move out of the life sciences into the heart of virtualized computing. I can’t wait to see the life science community, both industry and academia, begin to take computing more seriously, from both the programming and the architecture points of view. We can’t just be casual consumers anymore; we have to be active about leveraging the technologies and paradigms of data-intensive computing that the web has spawned, and add to them the compute-intensive needs and requirements often unique to science.

So where was I … oh yes, just read Russ’ darn post and ignore the paragraphs above.

This entry was posted in Big Data, Informatics, Modeling & Simulation.

  • Disclaimer

    All opinions on this blog are my own and do not reflect those of my employers, past or present