Catching up on Velocity 09

29 06 2009

This year I could not attend Velocity, so I decided to catch up via http://velocityconference.blip.tv. Here are a few notes on the sessions I have been able to see so far.

John Allspaw (Ops) & Paul Hammond (Dev): 10+ Deploys per day: Dev/Ops coöperation at Flickr

This is a topic dear to my heart: changing the culture shared (or not) by dev and ops.

  • Contrary to popular wisdom, ops’ real mission is not to keep the service stable per se, but to enable the business.
  • Business requires change
  • Build the tools and the culture that allow repeated change with minimal uncertainty.
  • Automate your infrastructure
  • Use one shared source control system between devs and ops, so that everyone on the team knows where to look
  • Reduce all manual steps down to one, that of deciding to build and deploy
  • Small, frequent changes are better than fewer large changes
  • Use “feature flags”, i.e., use code to enable features, rather than branches
  • Ship TRUNK so that everyone knows what gets released
  • Feature flags allow for private betas and reduce uncertainty
  • Dark launches: enable the feature to exercise the data path but don’t present the results to the end-user
  • Metrics, metrics, metrics
  • Add context to the metrics, such as the last time something was deployed
  • We use IRC and IM bots to bring system updates into the conversation between dev and ops in real time, then push the logs into a search engine
  • Develop respect and trust between devs and ops
  • Have a healthy attitude toward failure (don’t blame, fix the problem first)
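
The feature-flag and dark-launch points above can be illustrated with a minimal sketch. It is not Flickr's implementation: the flag table, the flag names and the `old_search`/`new_search` functions are all hypothetical, chosen only to show a flag gating a private beta and a dark launch exercising a new code path without showing its results.

```python
# Hypothetical flag table; names and rollout rules are illustrative only.
FLAGS = {
    "new_photo_page": {"enabled": True, "beta_users": {"alice"}},  # private beta
    "new_search": {"enabled": True, "dark": True},                 # dark launch
}


def is_dark(flag):
    """True when the feature should run but its results stay hidden."""
    cfg = FLAGS.get(flag, {})
    return cfg.get("enabled", False) and cfg.get("dark", False)


def is_visible(flag, user):
    """True when this user should actually see the feature."""
    cfg = FLAGS.get(flag, {})
    if not cfg.get("enabled", False) or cfg.get("dark", False):
        return False
    beta_users = cfg.get("beta_users")
    return beta_users is None or user in beta_users


def old_search(query):
    return [f"old:{query}"]


def new_search(query):
    return [f"new:{query}"]


def search(query, user, log):
    """Serve the old results; exercise the new path when it is dark-launched."""
    results = old_search(query)
    if is_dark("new_search"):
        try:
            shadow = new_search(query)  # exercise the data path
            log.append(("new_search", "ok", len(shadow)))
        except Exception as exc:        # a dark feature must never break the page
            log.append(("new_search", "error", repr(exc)))
    elif is_visible("new_search", user):
        results = new_search(query)
    return results
```

Because everything ships from trunk, turning a feature on for everyone is a change to the flag table rather than a merge.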




Started a friendfeed webops public group

29 06 2009

Feel free to join: http://friendfeed.com/web-ops





#structure09 Hosting on commodity hardware

25 06 2009

I just got out of the panel on commodity hardware and did not get a chance to participate, so here’s my take on it.

The panel started with an opening question: Google, Amazon and the like run at huge scale on commodity hardware, so why do enterprise vendors still push customized hardware, and expensive hardware at that?

To me the answer is pretty obvious: enterprise hardware is, for the most part, sold to people who don’t know how to architect and design software on a commoditized stack. And by stack I mean everything from the server all the way up to the application code. Let’s be honest, look at most “enterprise” hardware/software literature: it’s just noise and a waste of both the writer’s and the reader’s time.

If you constrain yourself to buying servers that cost no more than $5k, buying high-end database software makes little sense. Rather, you recognize that low-end compute is how you get economies of scale, and you apply the same reasoning to your networking gear, storage systems, database software, load balancing software, etc.

Google, from its earlier papers, seems to be the first to have understood that, rejecting the usual marketing garbage from large vendors. And for that we should be grateful.





I love Amazon Web Services open pricing

17 06 2009

I’ve just spent 2 hours crafting a spreadsheet to compare how much it would cost to set up a decent platform to deliver the kind of data services I manage, vs. the same on EC2. Easy access to pricing is a key variable that’s often hard to get from vendors without being subjected to the “custom solution” time-waste. Technology vendors, your customers, more often than not, know what they want. When I ask for a price list, don’t try to second-guess whether I’ve done my homework, just give me the price list. If I have questions regarding the “solution” I’ll be more than happy to ask.





How about sub-second queries in Hadoop?

16 06 2009

Two observations from talking and listening to people during the Hadoop Summit. First, Hadoop is used quite often to process clickstream data (in all fairness, I missed the talk about Hadoop used for genomics). Second, and a corollary of the first, sub-second queries in Hive or Pig are not quite there yet. Since a Hive query translates into map and reduce tasks, the scheduling of those tasks, in addition to the sheer volume of data, determines response time. Undoubtedly, pre-computing aggregates is a natural way to go, much like what is done for data warehouses.

Where these aggregates should be stored for consumption is a problem that could lead to hybrid solutions: process the data with Hadoop, then export it to Postgres or Infobright to enjoy a more mature (but less scalable) run-time environment. You get multi-terabyte daily processing and sub-second analytics, all of it open source.

If you’ve done something like that, I’d be interested to know before I embark on a route where others have failed before.
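
To make the hybrid idea concrete, here is a minimal sketch under loud assumptions: an in-process roll-up stands in for the Hadoop job, and Python's built-in `sqlite3` stands in for Postgres or Infobright. The `daily_clicks` table and the function names are hypothetical. The point is only that queries hit a small pre-aggregated table instead of the raw clickstream.

```python
import sqlite3
from collections import Counter


def rollup(events):
    """Batch step (stand-in for a Hadoop job): count clicks per (day, page)."""
    counts = Counter()
    for day, page in events:       # "map": key each event by (day, page)
        counts[(day, page)] += 1   # "reduce": sum the events for each key
    return counts


def export(counts, conn):
    """Load the aggregates into a relational table keyed for fast lookups."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_clicks ("
        "day TEXT, page TEXT, clicks INTEGER, PRIMARY KEY (day, page))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_clicks VALUES (?, ?, ?)",
        [(day, page, n) for (day, page), n in counts.items()],
    )
    conn.commit()


def clicks_for(conn, day, page):
    """Serving step: a primary-key lookup, independent of raw data volume."""
    row = conn.execute(
        "SELECT clicks FROM daily_clicks WHERE day = ? AND page = ?",
        (day, page),
    ).fetchone()
    return row[0] if row else 0


events = [("2009-06-16", "/home"), ("2009-06-16", "/home"), ("2009-06-16", "/about")]
conn = sqlite3.connect(":memory:")
export(rollup(events), conn)
print(clicks_for(conn, "2009-06-16", "/home"))
```

The trade-off is the usual one for pre-aggregation: only the questions anticipated by the roll-up get fast answers, and anything else goes back to the batch layer.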





Notes from the 2009 Hadoop Summit West

13 06 2009

I just got back from Santa Clara, where Yahoo and Cloudera hosted the 2009 Hadoop Summit West on Wednesday, followed by a training session on Thursday. My interest was that of a prospective user: to gauge how real and mature Hadoop is.

The turn-out was more than decent, in the hundreds: a number of people from Yahoo, which runs the largest clusters so far, a few folks from Amazon and Facebook, some from local universities, and a fair number from small companies that have deployed their own clusters (or are running on EC2).

The good news first: Hadoop is real and it’s getting real use. It’s clearly a promising platform with active use and development. The scaling model is fairly simple: buy more machines. The current sweet spot is dual quad-core hosts with 4×1TB drives and 16GB or so of ECC RAM. Decoupling storage from a central system (à la SAN) is the way to go. Some folks have tried, with some success, to hook up Thumpers to Niagara chips, which run a lot of threads in parallel, but the TCO is unclear.

Hence we can start with a handful of cheap machines and go from there. A few things to watch for: the secondary name node, for instance, is not there for backup but to persist the DFS layout structures that exist in RAM on the primary name node. It could have been implemented in a more robust fashion using a SQL database rather than requiring a re-implementation of redo logs and data files.
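
The redo-log-plus-data-file idea mentioned above can be shown with a toy model. This is not Hadoop's code: the namespace is reduced to a set of paths, and the `create`/`delete` operations and function names are made up for illustration. It only shows why a periodic checkpoint (merging the logged edits into a snapshot) keeps the log short without changing what recovery produces.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (a set of paths)."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def recover(image, edit_log):
    """Rebuild the in-memory state: load the snapshot, then replay the log."""
    namespace = set(image)
    for edit in edit_log:
        apply_edit(namespace, edit)
    return namespace


def checkpoint(image, edit_log):
    """Fold the logged edits into a new snapshot and start an empty log."""
    return frozenset(recover(image, edit_log)), []


image = frozenset({"/a"})
edit_log = [("create", "/b"), ("delete", "/a")]
new_image, new_log = checkpoint(image, edit_log)
# Recovery gives the same state before and after the checkpoint.
assert recover(image, edit_log) == recover(new_image, new_log)
```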

That’s the overall negative point: applications built on the platform (such as Hive, HBase and Pig) are still very much works in progress, and they somewhat duplicate functionality. There is an air of Not Invented Here that still pervades, but it’s a sign that the whole thing is still young. A vocal user base that meets regularly should help the project focus on the pieces that truly do not exist yet.





Very interesting talk about SmugMug

4 06 2009

A few key points: 2 ops people, automatic scaling, 1000s of cores on EC2, PBs of storage on S3.

http://mysqlconf.blip.tv/file/2037101





Tokyo train map meets Internet powerhouses

16 04 2009


Web Trend Map 4 Final Beta

Originally uploaded by formforce

Cute.





A sensible approach to source code branching

8 04 2009

Source code branching is one of the most contentious activities you can engage in at a software company. For some reason that eludes me, I keep hearing the same arguments over and over again about why we should not use branches and about how branching is hard. It’s not, neither conceptually nor practically. It simply requires being methodical and overcoming a visceral fear of the *Merge*. It works more or less with all current tools, with CVS probably the hardest to deal with and the latest batch of distributed source control tools the easiest.

One of the primary problems that Feature Crews address is the difficulty of maintaining the integrity of very large code bases under development (imagine 1000 developers coding against a 10,000,000 line system). FC poses the problem as the tension between a) keeping the main branch as current as possible, and b) keeping the main branch as robust as possible. The FC solution is to make features an atomic transaction. A feature is either 0% complete or 100% complete, and a feature is not 100% complete until it can be demonstrated that it satisfies the same quality criteria as the rest of the main branch.

Here’s an excerpt from Lean Software Engineering. FC in this context means “feature crew”.

“Features-in-process are not allowed on the main branch. The FC alternative is branch-by-feature. A crew takes a branch when it takes possession of the feature kanban. The crew is responsible for forward-integrating any changes that are checked into main while their feature is in process. That is, if another crew integrates and breaks your feature-in-process, it’s your responsibility, not theirs. When your feature is finally complete AND you have integrated with all changes on main AND you pass all of the quality gates, THEN you can reverse integrate your feature into the main branch, and everybody else will have to forward integrate your changes.”

Here it is: use branches extensively, merge back and forth. It takes some time, a bit of practice, but it puts to rest these endless discussions about whether we should branch, when and what for.
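
The rule in the excerpt can be written down as a small check. This is only a sketch of the policy, not of any real source control tool: `Main`, `FeatureBranch` and their fields are hypothetical, revisions are plain integers, and I am assuming that a forward integration invalidates earlier quality-gate results, which the excerpt implies but does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Main:
    revision: int = 0
    features: list = field(default_factory=list)


@dataclass
class FeatureBranch:
    name: str
    base: int                   # revision of main last forward-integrated
    complete: bool = False      # the feature is 100% done
    gates_passed: bool = False  # quality gates passed on the current code


def forward_integrate(branch, main):
    """Pull main's changes into the branch; the crew must re-verify afterwards."""
    branch.base = main.revision
    branch.gates_passed = False  # assumption: merged code needs a fresh gate run


def can_reverse_integrate(branch, main):
    """Complete AND integrated with all of main AND all gates passed."""
    return branch.complete and branch.base == main.revision and branch.gates_passed


def reverse_integrate(branch, main):
    """Land the feature on main as one atomic step, or refuse."""
    if not can_reverse_integrate(branch, main):
        raise RuntimeError(f"{branch.name} is not ready to land on main")
    main.features.append(branch.name)
    main.revision += 1
```

Once one crew lands, every other branch's `base` falls behind `main.revision`, which is exactly the "everybody else will have to forward integrate" consequence described in the excerpt.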





Robots

5 04 2009

Somewhat unrelated to the topic of data crunching and computing, I wanted to mention an eye-opening book about robots: Wired for War by P.W. Singer.