Differential Power Analysis is a neat way to cryptanalyze smart cards, and it triggered an interesting countermeasure: keeping power consumption constant regardless of the computation performed. Moving to a bigger scale, and assuming low cloud compute costs, one could hide sensitive data processing in one VM by running ninety-nine others on slightly different data, whose results would be silently discarded.
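As a toy illustration of the idea, here is a minimal sketch that hides one real job among 99 decoys and keeps only the real result. The function names and the `perturb` decoy generator are hypothetical. In a real deployment each job would run in its own VM, and the decoys would have to be statistically indistinguishable from the real input, which this sketch does not attempt to guarantee.

```python
import random
import secrets


def perturb(values):
    """Hypothetical decoy generator: the real input with small random noise added."""
    return [v + random.randint(-5, 5) for v in values]


def run_with_decoys(process, real_input, make_decoy, n_decoys=99):
    """Run `process` on the real input hidden among `n_decoys` decoy inputs.

    Every job is processed identically; only the real result is kept.
    """
    jobs = [make_decoy(real_input) for _ in range(n_decoys)]
    position = secrets.randbelow(n_decoys + 1)  # secret slot for the real job
    jobs.insert(position, real_input)
    results = [process(job) for job in jobs]    # in the cloud: one VM per job
    return results[position]                    # decoy results are silently discarded
```

Choosing the slot with `secrets` rather than `random` keeps the position of the real job unpredictable to anyone observing the decoy generator.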
Fun but not practical: cloud computing steganography
4 04 2009
Tags: impractical, random
Categories : architecture, cloud computing
#Cloudcamp: storing sensitive data in a public cloud
2 04 2009
Yesterday at CloudCamp, a few of us discussed methods to store and use sensitive data on a public cloud, where you presumably do not have strong assurances that your data are for your eyes only. For structured data (e.g. relational data) with an easy key, a pattern emerged among participants:
- Store all non-identifiable data in the cloud, keyed by an arbitrary identifier.
- Keep actual identifiable data on-premises with the mapping to that arbitrary identifier.
- Let the client device resolve the mapping locally.
For instance, if you store transactions, this would require at a minimum scrambling transaction details such as item name and party name. That mitigates simple analysis of the data, though statistical methods may still derive information from it; depending on the data, that can be acceptable. Of course everyone remembers the AOL search term fiasco, where people could be identified based on their search terms, which is why this scheme should work best when the data are highly structured.
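The three steps above could be sketched as follows, assuming transactions are plain dictionaries and that `item` and `party` are the identifiable fields. All class and function names are hypothetical, and the dict standing in for the cloud store is only a placeholder.

```python
import secrets


class OnPremMapping:
    """Kept on-premises: identifiable values <-> arbitrary identifiers."""

    def __init__(self):
        self._to_id = {}
        self._from_id = {}

    def pseudonym(self, value):
        if value not in self._to_id:
            token = secrets.token_hex(8)  # random, carries no information about value
            self._to_id[value] = token
            self._from_id[token] = value
        return self._to_id[value]

    def resolve(self, token):
        return self._from_id[token]


SENSITIVE = ("item", "party")  # assumed identifiable fields


def to_cloud_record(txn, mapping):
    """Replace identifiable fields with identifiers before the record leaves the premises."""
    return {k: mapping.pseudonym(v) if k in SENSITIVE else v for k, v in txn.items()}


def resolve_locally(record, mapping):
    """Client side: turn identifiers fetched from the cloud back into real values."""
    return {k: mapping.resolve(v) if k in SENSITIVE else v for k, v in record.items()}


cloud_store = {}  # placeholder for the public cloud, keyed by an arbitrary identifier
```

The same value always maps to the same identifier, which keeps joins working in the cloud but also preserves frequencies; that is exactly the opening statistical analysis can exploit.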
Tags: cloudcamp, newyork
Categories : architecture, cloud computing, conference
Open cloud manifesto, not much radicalism here
31 03 2009
The manifesto triggered copious traffic thanks to the backroom-smoke-filled air of its inception. I wish the same could be said about its less-than-radical contents. If you were expecting a stated vision of the cloud as the substrate of all future computing that’s not a mobile phone or nettop, there is no such thing there. It sounds more like the cries of small players about to be crushed by the non-signatory parties, i.e. Amazon, Google and Microsoft.
Tags: cloud computing, manifesto
Categories : cloud computing, distributed computing, document
Posting from emacs
30 03 2009
Tiny post about using weblogger.el; I already feel so l33t.
Categories : Uncategorized
Interesting data growth factoid
20 03 2009
From http://aws.amazon.com/publicdatasets/
“United States demographic data from the 1980 (approximately 2 GB), 1990 (approximately 50 GB), and 2000 US Censuses (approximately 200GB)”
Should we expect a 400GB volume for the 2010 census, or 2TB in 2020 and 200TB in 2040? Probably the latter once you start adding biodata.
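The guesses in that question imply quite different growth rates, and a few lines make the arithmetic explicit. Only the 1980, 1990 and 2000 sizes come from the quote; the per-decade factors below are simply what each guess would require, not a forecast.

```python
# Census dataset sizes in GB, as quoted from the AWS public datasets page.
SIZES_GB = {1980: 2, 1990: 50, 2000: 200}


def decade_factors(sizes):
    """Growth factor between consecutive censuses, keyed by the later year."""
    years = sorted(sizes)
    return {b: sizes[b] / sizes[a] for a, b in zip(years, years[1:])}


def project(size_gb, factor_per_decade, decades):
    """Size after `decades` decades of constant per-decade growth."""
    return size_gb * factor_per_decade ** decades


# Observed: 25x from 1980 to 1990, then 4x from 1990 to 2000.
# 400 GB in 2010 means growth keeps slowing, to 2x per decade.
# 2 TB in 2020 means about 3.2x per decade (10x over 20 years).
# 200 TB in 2040 from 2 TB in 2020 means 10x per decade.
```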
Tags: census, data, growth, scale, storage
Categories : Uncategorized
Thinking about IT Operations and Kanban
18 03 2009
As our developers are transitioning to an agile methodology, we have been figuring out how to adapt our operational processes to the more regular schedule that fixed-length, one-month sprints are going to entail. So far we have worked in a more waterfall fashion with a high level of interrupts: taking on projects, doing an upfront analysis to break the work into small chunks, and piping those chunks through FogBugz to track progress. Recently the team has been reading about kanban as a way to formalize flow and make under-capacity visible. While I believe we have already adopted an informal pull-driven process, now is the time to formalize it so that we can properly communicate whether and when infrastructure projects can be delivered.
The first round of experiments is taking shape; more to follow shortly.
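For readers new to kanban, here is a minimal sketch of the two mechanics mentioned above: work-in-progress (WIP) limits that make work move only when pulled, and free slots that make under-capacity visible. The class and the column names are hypothetical and are not a description of our actual board, which lives on stickies.

```python
from collections import deque


class KanbanBoard:
    """Columns with WIP limits; cards move only when a downstream column pulls them."""

    def __init__(self, limits):
        # limits: ordered mapping of column name -> WIP limit (None = unlimited)
        self.columns = list(limits)
        self.limits = dict(limits)
        self.cards = {name: deque() for name in self.columns}

    def add(self, card):
        self.cards[self.columns[0]].append(card)

    def has_capacity(self, column):
        limit = self.limits[column]
        return limit is None or len(self.cards[column]) < limit

    def pull(self, column):
        """Pull the oldest card from the previous column into `column`, if allowed."""
        i = self.columns.index(column)
        if i == 0:
            raise ValueError("cannot pull into the first column")
        upstream = self.cards[self.columns[i - 1]]
        if not upstream or not self.has_capacity(column):
            return None
        card = upstream.popleft()
        self.cards[column].append(card)
        return card

    def idle_capacity(self):
        """Free slots per WIP-limited column, i.e. visible under-capacity."""
        return {c: self.limits[c] - len(self.cards[c])
                for c in self.columns if self.limits[c] is not None}
```

A pull that returns `None` signals either a starved column (nothing upstream) or a full one, which is the information an operations team needs to say whether and when a project can be picked up.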
Tags: it, kanban, operations, stickies
Categories : Uncategorized
“The illusion of unlimited supply…”
18 03 2009
Berkeley’s Reliable, Adaptable, Distributed Systems Lab has produced a nice synthesis of the current technological underpinnings of cloud computing, in a paper called “Above the Clouds”. StorageMojo and Perspectives have done a fine exegesis of the paper, so I thought I’d focus on a claim that has caught my attention.
The claim
That claim is the fundamental premise of perceived, unlimited compute and storage elasticity: “The illusion of infinite computing resources available on demand, thereby eliminating the need for Cloud Computing users to plan far ahead for provisioning”.
I would argue that this illusion has to be dispelled for the clouds to develop in a stable, long-term way. Because the market is in its infancy, demand is still fairly limited (Amazon claims 400,000 registered AWS customers), so as a customer I can operate on the assumption that my individual demand does not significantly affect supply.
To make sure that individual demand does not rock the boat, a provider such as Amazon rations the amount of resources available to any customer, so that fluctuations in demand can be absorbed by its resource allocator. Going beyond that limit requires a longer discussion with the provider, so that it can ensure its resource allocator will handle the peaks and troughs in demand. This is a classic production/price control scheme.
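The rationing described above boils down to a per-customer quota with negotiated exceptions. A minimal sketch, assuming a single default cap plus per-customer overrides; the class name and the numbers in the usage example are hypothetical, not Amazon’s actual limits.

```python
class QuotaAllocator:
    """Per-customer resource rationing with negotiated overrides."""

    def __init__(self, default_limit):
        self.default_limit = default_limit  # hypothetical cap for every customer
        self.overrides = {}                 # raised limits, granted after a discussion
        self.usage = {}                     # units currently held per customer

    def limit_for(self, customer):
        return self.overrides.get(customer, self.default_limit)

    def request(self, customer, n):
        """Grant n units if the customer stays within its limit; otherwise refuse."""
        used = self.usage.get(customer, 0)
        if used + n > self.limit_for(customer):
            return False
        self.usage[customer] = used + n
        return True

    def release(self, customer, n):
        self.usage[customer] = max(0, self.usage.get(customer, 0) - n)
```

For example, with `QuotaAllocator(20)` a customer can take 15 units but a further request for 10 is refused until an override is recorded in `overrides`.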
Amazon Reserved instances
Recently Amazon has introduced the concept of “reserved instances”: a one-time payment per instance allows for lower per-hour charges. Presumably that one-time payment, while still profitable, allows the provider to better predict future sustained demand.
Firstly, it is only worth getting a reserved instance if you plan to use it:
- more than 193 straight days for a year or,
- more than 99 days per year for 3 years.
193 straight days amounts to about 12.7 hours a day, every day of the year, which is what you would see if your business is clustered around a few time zones. A three-year contract gives you 660 days of free run time compared to signing up for three consecutive one-year contracts.
It’s a clever move: let the customer pre-pay all or part of the fully-burdened marginal cost of an instance, yet retain a variable part so that it does not turn into an all-you-can-eat feast that would devour margins.
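The figures above follow from a simple break-even calculation. The post does not quote prices, so the ones below are an assumption: the 2009 small-instance Linux pricing as I recall it ($0.10/h on demand, $0.03/h reserved, $325 up front for one year, $500 for three years). They are used because they reproduce the 193-day, 99-day and 660-day figures.

```python
# ASSUMED 2009-era small Linux instance prices; not stated in the post.
ON_DEMAND = 0.10    # $/hour without a reservation
RESERVED = 0.03     # $/hour with a reservation
UPFRONT_1Y = 325.0  # $ one-time fee, 1-year term
UPFRONT_3Y = 500.0  # $ one-time fee, 3-year term


def break_even_hours(upfront, on_demand=ON_DEMAND, reserved=RESERVED):
    """Hours of use after which the lower hourly rate has paid back the upfront fee."""
    return upfront / (on_demand - reserved)


days_1y = break_even_hours(UPFRONT_1Y) / 24                # about 193.5 days
days_3y_per_year = break_even_hours(UPFRONT_3Y) / 24 / 3   # about 99.2 days per year
hours_per_day = break_even_hours(UPFRONT_1Y) / 365         # about 12.7 hours, every day
# Three 1-year reservations cost 3 * 325 = 975; one 3-year reservation costs 500.
# The 475 saved pays for this many days at the reserved hourly rate:
free_days_3y = (3 * UPFRONT_1Y - UPFRONT_3Y) / RESERVED / 24  # about 660 days
```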
Also consider that in the Amazon Web Services ™ Customer Agreement, “reserved instances” are really about “reserved instance pricing”, not about Amazon reserving capacity to serve these instances, their Service Level Agreement notwithstanding.
This should allow Amazon to achieve at least 3 objectives:
- plan capacity better, with at least part of the marginal cost of instances pre-paid,
- give corporate customers a greater perception of control and security, however tenuous it is in reality,
- differentiate corporate customers (usage patterns, utilization) and tweak the resource allocator to that purpose.
As a side note, I have not found a clause restricting the resale of reserved instances, though margins would be low and eaten away by the payment and billing system used by the reseller (e.g. Amazon FPS).
Resource allocation
For the provider, the interesting and crucial problem to solve is properly oversubscribing actual physical resources (physical CPUs, physical drives, physical network pipes). Oversubscribing physical resources through virtualization at a large scale is, after all, what the cloud is about. It’s also the name of the game in traditional banking: lend 9 times more money than you hold in trusted assets.
So as a provider, using the circulated figure of a “best-case” 30% compute utilization, with 100 physical compute units I should be able to lease at most about 333 units (100 / 0.3), or roughly 300 in round numbers. This is of course simplistic, and there is a wealth of literature on the topic of oversubscribing commodities, be it money, phone minutes, bandwidth or plane seats.
All providers being for-profit enterprises, we can foresee that they are going to push oversubscription to the limit, and the winners are likely to be decided by 2 factors:
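The 30% figure is an average, and the interesting question is how often simultaneous demand exceeds the physical pool. A minimal sketch under a deliberately naive assumption: each leased unit is independently busy with probability equal to the utilization (a binomial model; real demand is correlated and bursty, which makes things worse). The function names are hypothetical.

```python
from math import comb


def overflow_probability(leased, physical, utilization):
    """P(more than `physical` of `leased` units are busy at the same moment).

    ASSUMPTION: units are independently busy with probability `utilization`.
    """
    return sum(
        comb(leased, k) * utilization ** k * (1 - utilization) ** (leased - k)
        for k in range(physical + 1, leased + 1)
    )


def max_lease(physical, utilization, risk):
    """Largest number of units that can be leased while keeping overflow risk <= risk."""
    leased = physical
    while overflow_probability(leased + 1, physical, utilization) <= risk:
        leased += 1
    return leased
```

Under this model, leasing 300 units on 100 physical ones overflows on the order of 10% of the time, and 333 units is close to a coin flip; a 1% risk target pushes the lease count well below 300. How far an allocator can safely push that number is the sophistication discussed next.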
- the marginal cost of physical resources (compute, storage, network) and management overhead for each,
- the sophistication of the resource allocator.
The first factor, the marginal cost of physical resources, varies, as the authors of the paper point out, with the inverse of scale: the bigger you get, the cheaper it becomes to operate your data centers, and the currently documented figures call for a 5x-7x reduction in cost once you reach internet scale. Clearly this is not going to leave a lot of breathing room for mom-and-pop operations.
The second factor is, by and large, independent of scale. How high you can drive your data center utilization as a cloud computing provider, without running out of resources, is going to make a significant difference once you’ve reached internet scale.
The crystal ball
Since this market is still young, and because the barrier to entry for reaching a profitable zone is high, I suspect that the smaller players that do not offer much added value on top of computing infrastructure leasing will have to drive their resource allocators very aggressively. Too aggressively, in fact: oversubscription will eventually reach unsustainable levels, and they will end up losing customers.
Once the small operators are out of the picture, the 3 major players will either cartelize the market or contend over price by deploying a better resource allocator.
In the former case we would be looking at a situation analogous to the oil market or the U.S. ISP market; the illusion of competition shrouding an illicit price fixing agreement.
In the latter case, assuming a greater shift to cloud computing, the best resource allocator will have to buffer demand spikes while maintaining high utilization. Make switching from one provider to another swift, public and reversible, and you have a public market where prices are driven not only by supply and demand (who in 2009 can still believe that?) but also by trust and perception.
Without more transparency to let the public get a sense of inventories (e.g. through a clearinghouse), who’s to say we won’t see another bubble?
Tags: cloud computing, economics, market
Categories : architecture, cloud computing
Why is eucalyptus written in C?
16 03 2009
Today I perused the source code of eucalyptus 1.4 and was a bit surprised to find that it’s written in C, even though most of the system interaction is done through libvirt. I have to say this makes me a bit queasy. Python/Ruby/(insert your favorite high-level language) would have excelled at that game.
Tags: c, ec2, eucalyptus, question
Categories : cloud computing, infrastructure
IPSec throughput tests between 2 Soekris 5501 running pfsense
11 02 2009
Late last year I was playing with pfsense in order to replace ssh+vtund connections between sites with a cleaner IPsec rig. To that effect I set up 2 Soekris 5501 boxes with HiFn crypto accelerators, directly connected via a Cat-6 Ethernet cable, both running pfsense 1.2 (I forget which release candidate), and was able to pipe 20 Mb/s using 256-bit AES ESP (note the lowercase b: bits, not bytes). I ruled out the Ethernet link as the bottleneck by sending 8x-10x as much data over the same link without IPsec.
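The post doesn’t say which tool generated the traffic. As a hypothetical stand-in, here is a minimal Python sketch of a single-stream TCP throughput measurement, mostly to make the bits-versus-bytes arithmetic explicit: 20 Mb/s is only 2.5 MB/s. For self-containment both ends run on loopback; across the tunnel, the receiving `sink` would run on the far box instead.

```python
import socket
import threading
import time


def measure_throughput_mbps(host="127.0.0.1", total_bytes=50_000_000, chunk=64 * 1024):
    """Push about `total_bytes` over one TCP connection; return megabits per second."""
    server = socket.create_server((host, 0))  # port 0: let the OS pick a free port
    port = server.getsockname()[1]
    received = 0

    def sink():
        nonlocal received
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(chunk)
                if not data:  # sender closed the connection
                    break
                received += len(data)

    receiver = threading.Thread(target=sink)
    receiver.start()
    payload = b"\x00" * chunk
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        sent = 0
        while sent < total_bytes:
            client.sendall(payload)
            sent += len(payload)
    receiver.join()  # stop the clock only once everything has been received
    elapsed = time.perf_counter() - start
    server.close()
    return received * 8 / elapsed / 1e6  # 8 bits per byte: Mb/s, not MB/s
```

Running the same measurement with and without IPsec over the same cable is what separates crypto overhead from link limits.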
Tags: 5501, benchmark, crypto, hifn, network, soekris, vpn
Categories : hardware, infrastructure
Fun with R and wifi art
8 02 2009
On Friday I was having a frustrating experience with our wifi system, a nifty setup from Extricom. Ping round-trip times were varying wildly, always swinging between about 1 s and under 10 ms. I ran ping for a while, pushed the data through R and was extremely surprised by the results. Since this particular setup uses a controller that allows seamless hand-over from one access point to another, I suppose that my bits are being transmitted by different access points at least every second.
A pleasant graph nonetheless.

Latency (ms) of sequential 1-second pings
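The R analysis behind the graph isn’t shown. As a hypothetical Python equivalent of its first step, here is a sketch that extracts round-trip times from ping output and counts how often consecutive pings flip between the fast (under 10 ms) and slow regimes, which is the pattern the access-point hand-over hypothesis would explain. The 10 ms threshold comes from the text; the function names are assumptions.

```python
import re
import statistics

# Matches "time=0.981 ms" (Linux/BSD) as well as "time<1ms" (Windows).
PING_TIME = re.compile(r"time[=<]([\d.]+) ?ms")


def ping_latencies(lines):
    """Extract round-trip times in ms from raw ping output lines."""
    return [float(m.group(1)) for line in lines if (m := PING_TIME.search(line))]


def summarize(latencies, fast_ms=10.0):
    """Summarize a bimodal latency series: how often it is fast, and how often it flips."""
    is_fast = [x < fast_ms for x in latencies]
    return {
        "count": len(latencies),
        "fast_fraction": sum(is_fast) / len(latencies),
        "median_ms": statistics.median(latencies),
        "max_ms": max(latencies),
        # consecutive pings that switch between the fast and slow regimes
        "flips": sum(a != b for a, b in zip(is_fast, is_fast[1:])),
    }
```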
Tags: art, extricom, jitter, ping, R, wifi, wlan
Categories : R, hardware, pdf