I just returned from a hectic trip to New York City. The highlight of that trip, and its primary purpose, was Hadoop World. I had always planned to attend and had registered a long time ago, but I also ended up giving a talk on Hadoop for Bioinformatics (slides below).
Many of the discussions I had were work-related, so I won’t talk about them. What I would like to talk about are some general trends that are relevant to the life science informatics community. It’s pretty clear to me that Hadoop is here to stay. It’s not just the number of people using Hadoop, or the size of the data sets, but rather the variety of data-intensive computing that people are doing, and the thriving ecosystem. Those two tend to go together, and it bodes well for the future.
There is a definite trend toward making Hadoop more accessible and usable. In my talk I touched upon the development of the ecosystem and higher-level abstractions as key to seeing Hadoop usage grow in the life science community. It’s still early days, but you’re beginning to see a little bit of traction, and as the community adopts more tooling, you are going to see even more adoption beyond the traditional Silicon Valley, web-data-centric world of Hadoop. Perhaps my favorite talks at the event were the ones by Stuart Sierra on Clojure and Pete Skomoroch on Trending Topics. Those talks did a great job of showing how you can use Hadoop in conjunction with higher-level tools (Hive, Ruby on Rails, Clojure, etc.) to achieve some very interesting results.
One talk I missed, but wish I had seen, was the one by Jake Hofman on social network analysis (slides). Some key points from his slides that are very relevant to the life sciences:
Dynamic, data-rich social networks exceed memory limits and require considerable storage
MapReduce convenient for parallelizing individual node/edge-level calculations
Higher-order calculations more difficult when network exceeds memory constraints, but can be adapted to MapReduce framework
These ideas are relevant for analyzing biological networks and relationships, and hopefully, once we are done taking care of some of the core needs (alignments, assembly, etc.), we can start applying Hadoop and other distributed computing frameworks to solve more downstream problems.
I met a lot of great people too, some of whom I had only known online. Pete Skomoroch is someone I’ve long followed, and meeting him in person was wonderful. I had a brief conversation with Jeff Hammerbacher, whose chapter in Beautiful Data has inspired many thoughts and ideas at this end. Others whom I had interacted with before via Twitter or Friendfeed and finally got to meet were Shiran and Chris Baglieri, who just happened to be the other life science geeks in the crowd.
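To make the node/edge-level point a little more concrete, here is a minimal sketch of my own (not taken from Jake's slides) of how a per-node calculation, the degree of each node, splits into map and reduce steps. It runs in a single process in plain Python, the `shuffle` function only stands in for the grouping a framework like Hadoop would do between the two phases, and the edge list and gene names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_edge(edge):
    """Map step: emit (node, 1) for each endpoint of an undirected edge."""
    a, b = edge
    return [(a, 1), (b, 1)]


def shuffle(pairs):
    """Group values by key; a stand-in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_degree(node, counts):
    """Reduce step: sum the emitted counts to get the node's degree."""
    return node, sum(counts)


def node_degrees(edges):
    """Run map, shuffle, and reduce over an edge list, all in memory."""
    mapped = chain.from_iterable(map_edge(edge) for edge in edges)
    grouped = shuffle(mapped)
    return dict(reduce_degree(node, counts) for node, counts in grouped.items())


# Hypothetical interaction network; the names are placeholders, not real data.
edges = [("geneA", "geneB"), ("geneA", "geneC"), ("geneC", "geneD")]
degrees = node_degrees(edges)
print(degrees)
```

Each edge is processed on its own in the map step, which is why this kind of calculation parallelizes so easily. Higher-order calculations that need a view of a node's wider neighborhood generally take more than one such pass.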
A couple of people I got to meet all too briefly or not at all were the aforementioned Jake Hofman and Hilary Mason (many Friendfeeders will really appreciate her blog). Hilary also blogged about her post-Hadoop World thoughts.
I am pretty sure I am missing many others, but I can always blame it on jetlag.
Thanks for the kind words Deepak! It was great to meet you as well, and your words have certainly sent me off on a few intellectual escapades over the last few years. Looking forward to running into you more often soon.
Post Hadoop World thoughts