“The illusion of unlimited supply…”

Berkeley’s Reliable Adaptive Distributed Systems Laboratory (RAD Lab) has produced a nice synthesis of the current technological underpinnings of cloud computing in a paper called “Above the Clouds”. StorageMojo and Perspectives have done a fine exegesis of the paper, so I thought I’d focus on a claim that caught my attention.

The claim

That claim is the fundamental premise of perceived, unlimited compute and storage elasticity: “The illusion of infinite computing resources available on demand, thereby eliminating the need for Cloud Computing users to plan far ahead for provisioning”.

I would argue that this illusion has to be dispelled for the clouds to develop in a stable, long-term fashion. Because cloud computing is still in its infancy, demand is fairly limited (Amazon claims 400,000 registered AWS customers), so as a customer I can operate on the assumption that no individual demand significantly affects supply.

To make sure that individual demand does not rock the boat, a provider such as Amazon rations the amount of resources available to any one customer, so that fluctuations in demand can be absorbed by its resource allocator. Going beyond that limit requires a longer discussion with the provider, so that it can ensure that its resource allocator will handle the peaks and troughs of demand. This is a classic production/price control scheme.

Amazon Reserved instances

Recently Amazon introduced the concept of “reserved instances”: a one-time payment per instance allows for lower per-hour charges. Presumably that one-time payment, while still profitable, allows the provider to better predict future sustained demand.

Firstly, it is only worth getting a reserved instance if you plan to use it:

  • more than 193 full days in a year, or
  • more than 99 full days per year for 3 years.

193 full days amounts to roughly 12 hours a day, every day of the year, which is what you would run if your business is clustered around a few timezones. A three-year contract gives you 660 days of free run-time compared to signing up for 3 consecutive one-year contracts.
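
The break-even figures above can be reproduced with a little arithmetic. The sketch below is illustrative only: the prices are assumptions (they are not quoted in this post), chosen to resemble a small Linux instance at the time, namely $0.10 per hour on demand, $0.03 per hour reserved, and one-time fees of $325 for one year and $500 for three years. The function names are made up for the example. Plug in the current price list before drawing any conclusion.

```python
# Assumed prices in USD (hypothetical inputs, not quoted in the post).
ON_DEMAND_HOURLY = 0.10
RESERVED_HOURLY = 0.03
FEE_1_YEAR = 325.0
FEE_3_YEARS = 500.0


def break_even_days(fee, on_demand_hourly, reserved_hourly):
    """Days of continuous use after which the one-time fee has paid for itself."""
    saving_per_hour = on_demand_hourly - reserved_hourly
    return fee / saving_per_hour / 24


def prepaid_days(amount, reserved_hourly):
    """Days of run-time that `amount` buys at the reserved hourly rate."""
    return amount / reserved_hourly / 24


one_year = break_even_days(FEE_1_YEAR, ON_DEMAND_HOURLY, RESERVED_HOURLY)
three_years = break_even_days(FEE_3_YEARS, ON_DEMAND_HOURLY, RESERVED_HOURLY)

# Fee saved by one 3-year contract versus three 1-year contracts,
# expressed as run-time at the reserved hourly rate.
fee_saved = 3 * FEE_1_YEAR - FEE_3_YEARS
bonus_days = prepaid_days(fee_saved, RESERVED_HOURLY)

print(f"1-year break-even: {one_year:.0f} days")
print(f"3-year break-even: {three_years / 3:.0f} days per year")
print(f"3-year vs 3 x 1-year: {bonus_days:.0f} days of reserved run-time")
```

Under those assumed prices the script lands on about 193 days, 99 days per year and 660 days, which is consistent with the figures quoted in this post.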

It’s a clever move. Let the customer pre-pay all or part of the fully-burdened marginal cost of an instance, yet retain a variable part so that it does not turn into an all-you-can-eat feast that would devour margins.

Also consider that in the Amazon Web Services™ Customer Agreement, “reserved instances” are really about “reserved instance pricing”, not about Amazon reserving capacity to serve these instances, its Service Level Agreement notwithstanding.

This should allow Amazon to achieve at least 3 objectives:

  1. better plan for capacity, with at least part of the marginal cost of instances pre-paid,
  2. give corporate customers a greater perception of control and security, however tenuous it is in reality,
  3. differentiate corporate customers (usage patterns, utilization) and tweak the resource allocator to that purpose.

As a side note, I have not found a clause restricting the reselling of reserved instances, though margins would be low and eaten away by the payment and billing system used by the reseller (e.g. Amazon FPS).

Resource allocation

For the provider this is an interesting and crucial problem to solve: how to properly oversubscribe its actual physical resources (physical CPUs, physical drives, physical network pipes). Oversubscribing physical resources through virtualization at a large scale is, after all, what the cloud is about. It’s also the name of the game in traditional banking: lend 9 times more money than you have in trusted assets.

So as a provider, using the circulated figure of a “best-case” 30% compute utilization, with 100 physical compute units I should be able to lease roughly 300 units (100 / 0.30 ≈ 333, less some headroom). This is of course simplistic, and there is a wealth of literature on the topic of oversubscribing commodities, be it money, phone minutes, bandwidth or plane seats.
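
As a back-of-the-envelope sketch of that arithmetic, the snippet below computes the ceiling on leasable units from an average utilization figure. The function name and the `headroom` parameter are hypothetical, introduced only for illustration; the model assumes every leased unit consumes the same average share of a physical unit, which real workloads do not.

```python
def max_leasable_units(physical_units, avg_utilization, headroom=0.0):
    """Upper bound on leasable units if each leased unit uses, on average,
    `avg_utilization` of a physical unit. `headroom` is the fraction of
    physical capacity kept in reserve (a hypothetical safety margin)."""
    if not 0 < avg_utilization <= 1:
        raise ValueError("avg_utilization must be in (0, 1]")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = physical_units * (1 - headroom)
    return usable / avg_utilization


no_margin = max_leasable_units(100, 0.30)
with_margin = max_leasable_units(100, 0.30, headroom=0.10)
print(f"no headroom:  {no_margin:.0f} units")
print(f"10% headroom: {with_margin:.0f} units")
```

With no reserve the ceiling is about 333 units; holding back an assumed 10% of capacity brings it to about 300.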

All providers being for-profit enterprises, we can foresee that they are going to drive their oversubscription to the limit, and the winners are likely going to be determined by 2 factors:

  • the marginal cost of physical resources (compute, storage, network) and management overhead for each,
  • the sophistication of the resource allocator.

The first factor, the marginal cost of physical resources, varies with the inverse of scale, as the authors of the paper point out. The bigger you get, the cheaper it becomes to operate your data centers; the figures currently documented point to a 5x-7x reduction in cost when you reach internet scale. Clearly this is not going to leave a lot of breathing room for mom-and-pop operations.

The second factor is, by and large, independent of scale. How high you can drive your data center utilization as a cloud computing provider, without running out of resources, is going to make a significant difference once you’ve reached internet scale.
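
To get a feel for the trade-off between utilization and running out of resources, here is a minimal sketch under a deliberately naive assumption: each leased unit is busy independently of the others with a fixed probability, so concurrent demand follows a binomial distribution. Real demand is correlated (everyone spikes at once), so this understates the risk. The function name and parameters are hypothetical.

```python
from math import comb


def overflow_probability(leased_units, physical_units, p_active):
    """P(more than `physical_units` leased units are busy at once), assuming
    each leased unit is busy independently with probability `p_active`
    (a binomial model; correlated demand would make the risk higher)."""
    if leased_units <= physical_units:
        return 0.0
    return sum(
        comb(leased_units, k) * p_active**k * (1 - p_active) ** (leased_units - k)
        for k in range(physical_units + 1, leased_units + 1)
    )


for leased in (250, 300, 333):
    risk = overflow_probability(leased, physical_units=100, p_active=0.30)
    print(f"{leased} leased on 100 physical: overflow risk {risk:.1%}")
```

Even in this optimistic model the overflow risk climbs steeply as the number of leased units approaches the theoretical ceiling, which is why the quality of the allocator matters more than the raw ratio.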

The crystal ball

Since this market is still young and because the barrier to entry for reaching a profitable zone is high, I suspect that the smaller players that do not offer much added value on top of computing infrastructure leasing will have to drive their resource allocator very aggressively; too aggressively, which means that they’ll end up losing customers because in the end oversubscription will reach unsustainable levels.

Once the small operators are out of the picture, the 3 major players will either cartelize the market or contend over price by deploying a better resource allocator.

In the former case we would be looking at a situation analogous to the oil market or the U.S. ISP market: the illusion of competition shrouding an illicit price-fixing agreement.

In the latter case, assuming a greater shift to cloud computing, the best resource allocator will have to buffer demand spikes and maintain high utilization. Make switching from one provider to another swift, public and reversible, and you have a public market where prices are driven not only by supply and demand (who in 2009 can still believe that?) but also by trust and perception.

Without more transparency to let the public get a sense of inventories (e.g. through a clearinghouse), who’s to say we won’t see another bubble?

About alq

Devops entrepreneur