I’ve just spent 2 hours crafting a spreadsheet to compare how much it would cost to set up a decent platform to deliver the kind of data services I manage, vs. the same on EC2. Easy access to pricing is a key variable that’s often hard to get from vendors without being subjected to the “custom solution” time-waste. Technology vendors, your customers, more often than not, know what they want. When I ask for a price list, don’t try to second-guess whether I’ve done my homework, just give me the price list. If I have questions regarding the “solution” I’ll be more than happy to ask.
I love Amazon Web Services open pricing
17 06 2009
Tags: price, pricing, rant
Categories : Uncategorized
Tokyo train map meets Internet powerhouses
16 04 2009
Cute.
Categories : architecture
Robots
5 04 2009
Somewhat unrelated to the topic of data crunching and computing, I wanted to mention an eye-opening book about robots: Wired for War by P.W. Singer.
Categories : Uncategorized
Posting from emacs
30 03 2009
Tiny post about using weblogger.el; I already feel so l33t.
Categories : Uncategorized
Interesting data growth factoid
20 03 2009
From http://aws.amazon.com/publicdatasets/:
“United States demographic data from the 1980 (approximately 2 GB), 1990 (approximately 50 GB), and 2000 US Censuses (approximately 200GB)”
Should we expect a 400GB volume for the 2010 census, or 2TB in 2020 and 200TB in 2040? Probably the latter once you start adding biodata.
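As a rough sanity check on those guesses, here is a minimal sketch that computes the decade-over-decade growth factors from the three quoted sizes and extrapolates using their geometric mean. The constant-growth assumption and the function names are mine, not anything from the AWS page.

```python
import math

# Census sizes in GB, as quoted from the AWS public data sets page.
census_gb = {1980: 2, 1990: 50, 2000: 200}


def decade_growth_factors(sizes):
    """Return the growth factor between each pair of consecutive censuses."""
    years = sorted(sizes)
    return [sizes[later] / sizes[earlier] for earlier, later in zip(years, years[1:])]


def extrapolate(sizes, target_year):
    """Naive extrapolation: assume every future decade grows by the
    geometric mean of the observed factors (an assumption, not a forecast)."""
    factors = decade_growth_factors(sizes)
    mean_factor = math.prod(factors) ** (1 / len(factors))
    last_year = max(sizes)
    decades = (target_year - last_year) // 10
    return sizes[last_year] * mean_factor ** decades


factors = decade_growth_factors(census_gb)
print("observed factors per decade:", factors)
for year in (2010, 2020, 2040):
    print(year, f"{extrapolate(census_gb, year):,.0f} GB")
```

The two observed factors differ a lot (25x, then 4x), so any extrapolation from three points is little more than a guess; the sketch only shows how sensitive the answer is to the growth rate you pick.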
Tags: census, data, growth, scale, storage
Categories : Uncategorized
Thinking about IT Operations and Kanban
18 03 2009
As our developers transition to an agile methodology, we have been figuring out how to adapt our operational processes to the more regular schedule that fixed-length, 1-month-long sprints are going to entail. So far we have worked in a more waterfall-like fashion with a high level of interrupts: taking on projects, doing an upfront analysis to break work into small chunks, and piping those chunks through FogBugz to track progress. Recently the team has been reading about kanban as a way to formalize flow and make under-capacity visible. While I believe we have already adopted an informal pull-driven process, now is the time to formalize all this so as to properly communicate whether and when infrastructure projects can be delivered.
The first round of experiments is taking shape; more to follow shortly.
Tags: it, kanban, operations, stickies
Categories : Uncategorized
Velocity: Panel, a survival guide
24 06 2008
Panelists: presented by Adam Jacob (HJK Solutions), Shayan Zadeh (Zoosk, Inc.), Brian Moon (dealnews.com), Don MacAskill (SmugMug), John Allspaw (Flickr (Yahoo!)), Michael Halligan (BitPusher, LLC) and a gentleman (Fotolog)
Don MacAskill: Rafael Nadal started to win Roland-Garros and his fan club was there. He won the Open, which created a huge spike. Comments had to be turned off for the site to survive. The next year, he won again and stats had to be turned off. For his third victory, the servers did not collapse. This year he won and we did not even register.
John Allspaw: code gets pushed 20 to 30 times a day… Major events triggered traffic spikes.
Don would love to not operate a data center anymore, despite their expertise.
John: DB problems are hard [everyone in agreement, myself included]
[Discussion follows on scalability: do not optimize for scale too early]
Don: EC2 is not worth it for servers that run around the clock, but it can be if you’re good at shutting down instances that you don’t need.
Tags: ec2, flickr, panel, smugmug, velocity
Categories : Uncategorized
Velocity: Sean Quinlan @google, Storage at scale
24 06 2008
Strategy: buy lots of commodity hardware, because problems tend to be too big for their problem space. Highly reliable hardware is not that useful either, because it’s expensive.
[Showing the same pictures over and over again, someone from Google PR, please authorize the release of newer pictures]
[A GFS description follows, nothing new so far, read the papers on the topic]
[A BigTable description follows, same deal]
I wish this talk had some new information…
Tags: google, velocity
Categories : Uncategorized
Velocity: Brent Chapman @great circle, what can IT professionals learn from emergency services?
23 06 2008
Example: a car hits a fire hydrant. Lots of agencies involved (fire dept., ambulance, police, electrical company). How do they coördinate all that?
Incident Command System is the protocol used in pretty much all emergency situations (courses available here).
I’ll put a pointer to the slides; the example used in the talk is good. The Wikipedia article is supposedly good and this article from ham radio operators is a good introduction.
Tags: emergency, ics, velocity
Categories : Uncategorized
Velocity: Luiz Barroso @google, efficient energy ops
23 06 2008
Hypothetical energy cost extrapolations: 5 years from now, hardware could be only 20-50% of the total energy costs.
Efficiency is defined as computing speed divided by power. It can be broken down further: (computing speed / power provided to chip) x (power provided to chip / power provided to server) x (power provided to server / power provided to data center).
- Data center efficiency, PUE around 1.83, worse if data center is underutilized
- Server energy efficiency, 25% dissipated by power supply
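To see how the last two factors of that breakdown compound, here is a minimal sketch using the two figures from the talk. It assumes PUE means total facility power divided by power delivered to IT equipment, and reads "25% dissipated by power supply" as a 75% efficient supply; the function name is hypothetical, and losses between the power supply and the chip are ignored.

```python
def delivered_fraction(pue, psu_loss):
    """Fraction of facility power that makes it past the server power supply.

    pue      -- facility power / IT equipment power (1.83 in the talk)
    psu_loss -- share of server input power dissipated by the power supply
                (0.25 in the talk; interpreting it this way is an assumption)
    """
    facility_to_server = 1.0 / pue
    server_to_components = 1.0 - psu_loss
    return facility_to_server * server_to_components


fraction = delivered_fraction(pue=1.83, psu_loss=0.25)
print(f"about {fraction:.0%} of facility power reaches the components")
```

With these numbers, well under half of the power entering the building ends up behind the server power supply, before any computing has happened.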
From the Uptime Institute: 10-year energy costs are $9/W for consumption and $10-22/W for data center build-out.
Rough cost breakdown: 50% on hardware, 22% on energy, 28% on data center (assumptions: dual-socket x86, 4-year depreciation, 70% load at peak).
How to be more efficient:
- consolidate workloads
- measure actual power usage rather than rely on nameplates
- investigate oversubscription
Oversubscription potential rises as the number of machines grows, so oversubscribe at the data center level. Also mix workloads and be ready to kill instances if you get close to the limit.
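A toy simulation can illustrate why a larger pool leaves more room to oversubscribe. It is only a sketch under strong assumptions that are mine, not the speaker's: every machine draws between 50% and 100% of its nameplate power, uniformly at random and independently of the others. Real workloads are correlated, which is exactly why mixing workloads matters.

```python
import random


def peak_to_nameplate_ratio(n_machines, n_samples=500, idle_fraction=0.5, seed=0):
    """Highest observed aggregate power, as a fraction of the summed nameplates.

    Each machine has a nameplate of 1.0 and draws idle_fraction plus a
    uniformly random share of the remaining range (hypothetical model).
    """
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_samples):
        total = sum(
            idle_fraction + (1.0 - idle_fraction) * rng.random()
            for _ in range(n_machines)
        )
        worst = max(worst, total)
    return worst / n_machines


sizes = (1, 10, 100, 1000)
ratios = [peak_to_nameplate_ratio(n) for n in sizes]
for n, ratio in zip(sizes, ratios):
    print(f"{n:>5} machines: observed peak is {ratio:.0%} of nameplate")
```

In this model the observed peak of a single machine sits close to its nameplate, while the aggregate peak of a large group stays much nearer the average draw, which is the headroom that oversubscription exploits.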
Source: Energy-proportional computing
Consider a data center as a device (5,000 machines): its utilization distribution has 2 peaks, one at 5% utilization, another around 30%.
For a typical server, a machine running at a load of 0.3 is at 60% power efficiency, while a fully loaded machine is at 100% power efficiency. Sadly, data centers are very rarely at 100%, as seen before.
The idea behind energy-proportional computing: a generally proportional relation between work and power. Idleness in a server is scarce. Proportionality should happen in the electronics, because in software it’s much harder (think of the kernel getting interrupts all the time).
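The relation between load and power efficiency can be sketched with a simple linear power model. The model and the `idle_fraction` parameter (power drawn at zero load, as a share of peak power) are assumptions for illustration, not something presented in the talk.

```python
def relative_power(load, idle_fraction):
    """Power draw as a fraction of peak, assuming it grows linearly with load."""
    return idle_fraction + (1.0 - idle_fraction) * load


def power_efficiency(load, idle_fraction):
    """Work done per unit of power, relative to a fully loaded machine."""
    if load == 0:
        return 0.0  # no work is done, whatever the power draw
    return load / relative_power(load, idle_fraction)


for idle_fraction in (0.5, 2 / 7, 0.0):
    row = ", ".join(
        f"load {load:.1f}: {power_efficiency(load, idle_fraction):.0%}"
        for load in (0.1, 0.3, 0.5, 1.0)
    )
    print(f"idle at {idle_fraction:.0%} of peak -> {row}")
```

In this model, the 60% efficiency at a load of 0.3 quoted above corresponds to a machine idling at roughly 29% of its peak power. An energy-proportional machine (idle fraction of 0) would stay at 100% efficiency at any non-zero load.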
If you break down power by component, you find that the CPU is much more proportional than the rest of the components, so even when powering down the CPU, total power savings are still only between 10% and 20%.
Still CPUs have 2 important power-usage features:
- wide dynamic power range (RAM, disks and network devices remain in a much narrower power range)
- active low-power modes, where the CPU can still do work
People, who average around 120 W, have a 20x dynamic power range, compared to 2x for a PC.
In conclusion, write fast code (biggest contribution to energy efficiency), consider reduction of all energy-related costs (provisioning), and demand energy-proportionality from equipment manufacturers.
Plug: http://climatesaverscomputing.org
Tags: datacenter, energy, google, power, talk, velocity
Categories : Uncategorized
