Velocity: some perf. tools used at Google

23 06 2008

Grinder, JMeter and some Windows tool whose name I did not catch.





Velocity: Luiz Barroso @google, efficient energy ops

23 06 2008

Hypothetical extrapolation of energy costs: 5 years from now, hardware could be only 20-50% of total costs.

Efficiency is defined as computing speed divided by power. It can be broken down further: (computing speed / power provided to chip) x (power provided to chip / power provided to server) x (power provided to server / power provided to data center).

  • Data center efficiency, PUE around 1.83, worse if data center is underutilized
  • Server energy efficiency, 25% dissipated by power supply
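To see how the two figures above compound, here is a minimal sketch. It assumes the standard definition of PUE (total facility power divided by power delivered to IT equipment) and treats the 25% power supply loss as the only loss inside the server, which is a simplification of mine, not something stated in the talk. The function name is hypothetical.

```python
def end_to_end_fraction(pue: float, psu_loss: float) -> float:
    """Fraction of facility power that gets past the server power supply.

    Assumption: PUE = facility power / IT power, and the power supply is
    the only loss considered inside the server.
    """
    facility_to_server = 1.0 / pue          # share of facility power reaching servers
    server_to_components = 1.0 - psu_loss   # share surviving the power supply
    return facility_to_server * server_to_components


if __name__ == "__main__":
    fraction = end_to_end_fraction(pue=1.83, psu_loss=0.25)
    print(f"Power reaching server components: {fraction:.0%}")
```

With the numbers from the talk, roughly 41% of the power entering the data center gets past the power supply, before counting any further losses on the way to the chip.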

From the Uptime Institute: 10-year energy costs are $9/W for consumption and $10-22/W for data center build-out.

Rough cost breakdown: 50% on hardware, 22% on energy, 28% on data center (assumptions: dual-socket x86, 4-year depreciation, 70% load at peak).

How to be more efficient:

  1. consolidate workloads
  2. measure actual power usage rather than rely on nameplates
  3. investigate oversubscription

Oversubscription potential rises as the number of machines grows, so oversubscribe at the data center level. Also mix workloads and be ready to kill instances if you get close to the limit.
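A toy simulation illustrates why the potential grows with the number of machines. It assumes each machine draws an independent, uniformly random share (50% to 100%) of its nameplate power at every sample; both the distribution and the independence are assumptions made for illustration, and real workloads are often correlated, which shrinks the effect.

```python
import random


def peak_fraction(n_machines: int, n_samples: int = 200, seed: int = 0) -> float:
    """Observed peak of the aggregate draw, as a fraction of total nameplate power.

    Assumption: each machine independently draws uniform(0.5, 1.0) of its
    nameplate power at every sample.
    """
    rng = random.Random(seed)
    peak = 0.0
    for _ in range(n_samples):
        total = sum(rng.uniform(0.5, 1.0) for _ in range(n_machines))
        peak = max(peak, total)
    return peak / n_machines


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n:5d} machines: peak at {peak_fraction(n):.1%} of nameplate")
```

The larger the group, the closer the observed peak sits to the average draw, which leaves more provisioned power unused and therefore more room to oversubscribe.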

Source: Energy-proportional computing

Consider a data center as a single device (5,000 machines): the utilization distribution has 2 peaks, one at 5% utilization, another around 30%.

Power efficiency of a typical server: a machine running at a load of 0.3 is at 60% power efficiency, while a fully loaded machine is at 100% power efficiency. Sadly, data centers are very rarely at 100%, as seen before.
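One way to reason about these figures is a linear power model, where a server draws a fixed idle share of its peak power plus a part proportional to load. This model and the `idle_fraction` parameter are my assumptions, not something presented in the talk.

```python
def relative_efficiency(load: float, idle_fraction: float) -> float:
    """Efficiency relative to a fully loaded machine under a linear power model.

    Assumption: power (as a share of peak) = idle_fraction + (1 - idle_fraction) * load.
    Efficiency is work done (load) divided by that power share.
    """
    if not 0.0 < load <= 1.0:
        raise ValueError("load must be in (0, 1]")
    power = idle_fraction + (1.0 - idle_fraction) * load
    return load / power


if __name__ == "__main__":
    for idle in (0.5, 2 / 7, 0.0):
        print(f"idle share {idle:.2f}: efficiency at load 0.3 = "
              f"{relative_efficiency(0.3, idle):.0%}")
```

Under this model, the 60% figure at a load of 0.3 corresponds to a machine idling at about 29% of peak power; a machine idling at half of peak power would be at about 46%, and a perfectly energy-proportional machine (idle share of zero) would be at 100% at any load.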

The idea behind energy-proportional computing: a generally proportional relation between work and power. Idleness in a server is scarce. Proportionality should be achieved in the electronics because in software it’s much harder (think of the kernel getting interrupts all the time).

If you break down power by component, you find that the CPU is much more proportional than the rest of the components, so even when powering down the CPU the total savings are still only between 10% and 20% of power.

Still, CPUs have 2 important power-usage features:

  1. wide dynamic power range (RAM, disks and network devices remain in a much narrower power range)
  2. active low-power modes, where the CPU can still do work

People, who average around 120W, have a 20x dynamic power range, compared with 2x for a PC.

In conclusion, write fast code (biggest contribution to energy efficiency), consider reduction of all energy-related costs (provisioning), and demand energy-proportionality from equipment manufacturers.

Plug: http://climatesaverscomputing.org





Velocity: John Fowler (Sun), Innovation That Drives Opportunity for the Web Infrastructure

23 06 2008

John is responsible for hardware @Sun.

The web is built on a new software stack (Varnish, Rails, memcached, Hadoop, etc.)

Trends:

  1. 16 cores per socket for 2009, Sun, AMD and Intel on the same track. Clock rates will remain the same.
  2. Application memory capacity increasing, working to get 1TB of RAM at commodity prices
  3. ZFS and SSD: enterprise SSD at $0.08 per IOPS, compared with $2.43 per IOPS for HDD

[Sun is clearly attacking the storage market by pushing for commoditization of software, as opposed to proprietary systems such as 3PAR, EMC, etc.] Sun is building something like the x4500, using an x4450 with one 32GB ZIL SSD, one 80GB SSD and 5 slow SATA drives: same capex, 3 times the throughput.





Velocity: Artur Bergman

23 06 2008

Artur works for Wikia. WoWWikia is the 2nd largest wiki around.

Value of performance and reliability is around

WoW: $520 MM of profit per year, 99% reliable but users expect it, so it’s really about setting expectations.

Operations is about using resources efficiently and reliably, and has to be measured against revenue from users and the value of downtime (which must be computed): e.g. cost per page served is vital to guide decisions.

Example from wikia: 20% of all wiki pages went up from 200ms to 15s to load, 35% of pages were slow [per session] but that led to a 15% reduction of “fast” pages viewed, which has a clear cost.

Launched a project with 3 engineers for 4 weeks to improve page performance. It yielded good results, but the ad network is slowing the whole thing down. Since ads use document.write, the wiki overrides it to allow pages to load without waiting for ads to finish loading. This led to more pageviews, but about 20% of ads are not even loaded (network time-outs, users clicking away).





Velocity: Jiffy, open-source performance measurement

23 06 2008

Scott Ruthfield, Whitepages.com, a people search company with 2 bn searches per year, 500 requests/s at peak.

Initial analysis: 8s to return results, sub-second to actually get the data. What’s the source of the slow-down? Possible candidates:

  1. Ads
  2. Microsoft Virtual Earth
  3. Content generated from the results

The toolset (Gomez networks) is not good enough because of poor sampling (20 samples per hour, compared to 1.3 MM requests) [should quantify error margin here, presumably high assuming a normal distribution]
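A rough way to quantify that error margin, under the same normal-distribution assumption and assuming the samples are drawn at random: the 95% margin of error on the mean is about 1.96 standard deviations divided by the square root of the sample size. Since the standard deviation of the load times was not given, the sketch below expresses the margin as a multiple of it.

```python
import math


def margin_in_sigmas(n_samples: int, z: float = 1.96) -> float:
    """Margin of error on the mean, in units of the (unknown) standard deviation.

    Assumptions: random sampling and an approximately normal sampling
    distribution; z = 1.96 corresponds to a 95% confidence level.
    """
    return z / math.sqrt(n_samples)


if __name__ == "__main__":
    for n in (20, 1_300_000):
        print(f"n = {n:>9,}: +/- {margin_in_sigmas(n):.4f} standard deviations")
```

With 20 samples the mean is only known to within about 0.44 standard deviations, and page load times tend to be skewed, so the normal assumption probably flatters the result.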

Introducing Jiffy. Objectives: measure anything, with little impact on page performance. The architecture starts with a jiffy.js that generates logs, which are then loaded into a DB and rolled up.

Basic tenet: mark and measure. One mark, multiple measures.
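The mark-and-measure idea can be sketched in a few lines. This is not Jiffy's actual API (which is JavaScript and which I have not reproduced here); the class and method names below are hypothetical and only illustrate one mark serving as the reference point for several measures.

```python
import time
from typing import Callable, Dict, List, Tuple


class MarkAndMeasure:
    """Record named marks, then measure elapsed time from a mark to later events."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._marks: Dict[str, float] = {}
        # Each entry is (mark name, event name, elapsed seconds).
        self.measures: List[Tuple[str, str, float]] = []

    def mark(self, name: str) -> None:
        """Store the current time under the given mark name."""
        self._marks[name] = self._clock()

    def measure(self, event: str, mark_name: str) -> float:
        """Record and return the time elapsed since the named mark."""
        elapsed = self._clock() - self._marks[mark_name]
        self.measures.append((mark_name, event, elapsed))
        return elapsed


if __name__ == "__main__":
    timer = MarkAndMeasure()
    timer.mark("page_start")
    timer.measure("results_rendered", "page_start")
    timer.measure("ads_loaded", "page_start")
    print(timer.measures)
```

Collecting the measures in a list mirrors the batch-submit idea: the entries can be sent in one go rather than one request per measurement.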

Miscellaneous features: immediate or batch submits (to not overload measurement system), default browser event measurements (onload, etc.)

Bill Scott @netflix put together a Firebug plug-in to capture client-side data.

Source: code.whitepages.com





Velocity: KITE from keynote systems

23 06 2008

This is a bit of an infomercial: http://kite.keynote.com/, a new testing ground for web applications.





Velocity: Green Data Centers by Bill Coleman

23 06 2008

Notes taken from the floor

Problem with current data centers: rising energy costs, increased complexity. Current solution: automate further, which just pushes the problem back.

What’s started to happen is an “inflexion point” [I'm not sure I see why this mathematical term has been chosen]: the ability of anyone in the world to be connected to anything in the world. We’re getting there.

The current cloud: 1.0, IT-centric, used to build proprietary applications. 2.0, store everything on the cloud, with security but still proprietary. Commoditization is unstoppable and is happening in the next decade.

How do I get started with green data centers? First, you can shut down servers as soon as you’ve figured out which ones to turn off. The problem is to find the dependencies and shut down the right servers. Why? To save money, but the overarching goal is to drive automation by policy [presumably requiring an ontology to let systems know about themselves]. Average utilization across 50,000 virtual machines is barely 20%, which is quite low compared with mainframe utilization figures (80%).





At O’Reilly Velocity next week

20 06 2008

Looking forward to attending sessions on scalability and operations.





Erlang is at long last getting the break it deserves.

17 05 2008

Facebook chat is a heavy Erlang user (so is SimpleDB). Erlang is one of those languages that open your eyes to a new way of programming. Eight years ago, shortly after it was open-sourced, I used it to build a reliable message passing system for a small start-up (that never quite made it). I remember being in awe of 3 features:

  1. The explicit inclusion of time in the language; this is probably the killer feature. You can write elegant programs that expect events to happen within a certain timeframe and react if no events show up. Because it is an integral part of the language, you have to think about failure and how to handle it.
  2. Hot code upgrades; it was hot in 1999, it’s still hot. Build systems with the aim of zero-downtime, even for releases. With share-nothing architecture this might be less relevant now, but there is often a little shared state that creeps in and requires high-availability of a core component.
  3. Service dependency; a service is built out of components that must be functional for the service to be rendered properly. One often ends up slapping an external monitoring layer on top of the whole thing, with kludgy scripts to restart components as best one can based on the data available to that monitoring layer. With Erlang, it’s all in the box, no tools required.
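For readers who have not seen the first feature, Erlang's `receive ... after` construct lets a process wait for a message and fall through to a timeout clause if none arrives. The sketch below is only a rough Python analogue using a standard-library queue, not Erlang code; the function name and return strings are made up for illustration.

```python
import queue


def wait_for_event(inbox: "queue.Queue[str]", timeout_s: float) -> str:
    """Wait for one event; take the timeout branch if nothing arrives in time."""
    try:
        event = inbox.get(timeout=timeout_s)
    except queue.Empty:
        # Nothing showed up within the deadline: this is the failure path
        # the caller is forced to think about.
        return "timeout"
    return f"handled {event}"


if __name__ == "__main__":
    inbox: "queue.Queue[str]" = queue.Queue()
    print(wait_for_event(inbox, timeout_s=0.1))
    inbox.put("ping")
    print(wait_for_event(inbox, timeout_s=0.1))
```

In Python the timeout is an optional argument to a library call, whereas in Erlang it is part of the receive syntax itself, which is the point being made above.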

Nice to see a great piece of engineering (you have to read the book “Programming in Erlang”) getting the exposure it deserves.





Cookie crumbles

22 04 2008

On Monday’s front page of the Financial Times one could read “Google resolve crumbles on ‘cookies’ pledge”, an interesting piece on how earlier inquiries about the role of cookies in “behavioural targeting” had been gently pushed aside after the acquisition of DoubleClick had started, with the apparent benediction or at least indifference of regulatory bodies. As the paper puts it,

Some Google insiders say that as the company’s understanding of “behavioural targeting” has grown, some of its earlier fears about cookies have turned out to seem simplistic, and it has become less clear that the practice raises big privacy concerns.

As much as I like Google’s services and applications, I find it disconcerting, to say the least, that the assessment about privacy cannot be clearly and publicly stated (and I doubt, though it is possible, that the paper would not have cited its sources if it could). And more importantly, that this much-needed assessment could not be conducted by an independent body. Protection of trade secrets, I’m told.

It is also for the sake of trade secrets that the “market” for online advertising is run without any real auditing of any kind. In other industries, even with “independent” auditors, quite a few irregularities manage to sneak through (see Enron, Countrywide, etc.), so I can only imagine what skeletons we will find in the closet of a company that won’t let anyone look at how its main inventory is assessed, counted and verified. It is a true instance of self-regulation, back to the meaning of self. But hey, who can argue against a license to print a few billion dollars per quarter? Might is right, right?