Category Archives: Big Data
TrendingTopics.org: A reference site for data analytics in Hadoop and Hive
In episode 21 of Coast to Coast Bio (not yet released) I talk about Hive. For those who may not know, Hive is a data warehouse infrastructure built on top of Hadoop.
One of the most recent Amazon Public Data Sets is a sample of Wikipedia page statistics by Peter Skomoroch. The full data [...]
Also posted in Computing, Informatics, Life Science
The future of big compute for big science
As readers of bbgm know, computing is one of the subjects that interests me the most. I am hardly a guru, but I’ve been around the world of computing long enough, and close enough to it, to notice various trends in this space. Perhaps the most recent trend, and [...]
Also posted in BioIT, Computing
Bursting on to a cloud
OK, cheesy title, but this one pleases me at multiple levels. It was work done on EC2. It is one of the first examples of a MapReduce implementation of something many people will find useful in the world of bioinformatics. And it is one of the sample apps for Elastic MapReduce.
Now, it’s a [...]
Also posted in Informatics, Software & Internet
Data produced, analyzed and consumed. The impact of big science
When genome centers have to start thinking about large-scale data center operations, you know something is different. In Science Big, Science Connected, I talked about how the availability of high-throughput instruments has fundamentally changed our approach to science. On Coast to Coast Bio, Hari and I often argue about whether this [...]
Also posted in Informatics, Life Science
Freerisk – An open platform for risk modeling