Notes from the 2009 Hadoop Summit West

I just got back from Santa Clara where Yahoo and Cloudera were hosting the 2009 Hadoop Summit West on Wednesday followed by a training on Thursday. My interest was one of a prospective user — to gauge how real and mature hadoop is.

The turn-out was more than decent, in the hundreds; a number from Yahoo, running the largest clusters so far, a few folks from Amazon, Facebook, some local universities and a fair number of small companies that have deployed their own clusters (or are running on EC2).

The good news first, hadoop is real and it’s getting real use. It’s clearly a promising platform with active use and development. The scaling model is fairly simple: buy more machines. The current sweet spot is dual-quad hosts with 4x1TB drives and 16GB or so of ECC RAM. Decoupling storage from a central system (à la SAN) is the way to go. Some folks have tried to hook up Thumpers to Niagara chips that run a lot of threads in parallel with some success but the TCO question is unclear.

Hence we can start with a handful of cheap machines and go from there. A few things to watch for: the secondary name node for instance, is there here for backup but to persist the DFS layout structures that exist in RAM on the primary name node. It could have been implemented in a more robust fashion using a sql database rather than requiring a re-implementation of redo logs and data files.

That’s overall the negative point: applications built on the platform (such as hive, hbase and pig) are still pretty much works in progress, somewhat duplication functionality. There is an air of Not Invented Here that still pervades but it’s a sign that the whole thing is still young. A vocal user base that meets regularly should help the project focus on the pieces that truly do not exist yet.


About alq

Devops entrepreneur
This entry was posted in infrastructure, scalability, storage. Bookmark the permalink.

One Response to Notes from the 2009 Hadoop Summit West

  1. Pingback: BotchagalupeMarks for June 12th - 23:51 | IT Management and Cloud Blog

Leave a Reply