Looking into system performance of an Oracle data warehouse

9 10 2009

Introduction

This is the start of an ongoing investigation into the system performance of an Oracle 10.2 data warehouse while it is being loaded. The database server has 2 real storage volumes (called dw-clear and dw-encrypt) and 1 virtual one (dw-encrypt-u) used to decrypt data on the fly. Most of the data and the I/O are on the dw-clear volume.

System performance data has been collected via sadc -d to capture per-device statistics. The data is then extracted using sadf -d filename -- -d -b -d. The summary is available here as a CSV. It’s a large table of block I/O stats, CPU stats and per-device I/O stats, suitable for import into R.

The system characteristics are as follows.

  • Sun x4150 64GB RAM, 2×4 x5450, 1 4Gb/s QL2462 HBA with 2 ports.
  • 3 device-mapper devices, 2 using a round-robin multipath (v1, v2), 1 using an on-the-fly cipher to decode encrypted data (v3).
  • 3PAR S400 with 10k drives and 4Gb/s HBAs.
  • Out of the 64GB, 8GB are set aside as HugePages to serve as memory pages for the SGA.

The goal of this investigation is to understand what the bottleneck is in the processing and what can be done to remove it.

Let’s start with cpu utilization.

Distribution of CPU time spent in userland when not idle


Not terribly loaded (I’m filtering out the long idle portions by keeping only samples with user > 5). How about I/O?

% of CPU spent waiting on IO


Interesting, iowait is not negligible. Is it correlated with anything in particular? First of all, let’s see how iowait varies with device utilization of v1.

IOWait against v1 device utilization

v1 is slowly but surely bringing iowait higher, to the point that more than one processor ends up waiting on I/O.
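A rough way to put a number on that relationship is to drop the idle samples and compute a correlation coefficient between device utilization and iowait. The sketch below uses Python rather than R, and the column names (user, iowait, v1_util) and the sample values are hypothetical stand-ins, not the layout or contents of the actual CSV.

```python
import csv
import io
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def iowait_vs_util(rows, min_user=5.0):
    """Keep samples with user > min_user, then correlate v1_util with iowait."""
    busy = [r for r in rows if float(r["user"]) > min_user]
    utils = [float(r["v1_util"]) for r in busy]
    iowaits = [float(r["iowait"]) for r in busy]
    return len(busy), pearson(utils, iowaits)


# Made-up samples; the column names are assumptions, not the real CSV header.
SAMPLE = """user,iowait,v1_util
1.0,0.2,3.0
12.0,2.0,20.0
15.0,4.0,40.0
20.0,6.0,60.0
18.0,8.0,80.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept, r = iowait_vs_util(rows)
print(f"{kept} busy samples, r = {r:.3f}")
```

A coefficient close to 1 would support the reading of the plot, though it says nothing about which of the two drives the other.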

To be continued…





Blog battle on the storage appliance front

3 09 2009

Backblaze has started an interesting conversation by detailing how they get to $117,000 per PB, down to the type and number of SATA cards used in their design. A great PR move for a company in the crowded personal backup space. Of course, publishing comparisons with Dell, Sun, NetApp and EMC at 8x, 10x, 30x the price is a sure way to stir people’s emotions. The first to publish a lengthy response (that StorageMojo could find) is Joerg Moellenkamp in a blog post. It is laudable in pointing out design flaws, but the two designs target fundamentally different markets. Sure, Sun’s hardware is a great piece of engineering, squarely aimed at the enterprise market. Which, incidentally, is not buying in droves, and Sun’s financials clearly reflect that. Backblaze took the Google route for storage and it’s hard to see, given the competitive pressure, how they would be better off spending their margin on Sun hardware. The era of gold-plated hardware is slowly drawing to a close and I can’t say I oppose that change.





Netflix describes its culture

4 08 2009





Catching up on Velocity 09

29 06 2009

This year I could not attend Velocity so I decided to catch up via http://velocityconference.blip.tv. Here are a few notes on the sessions I have been able to see so far.

John Allspaw (Ops) & Paul Hammond (Dev): 10+ Deploys per day: Dev/Ops coöperation at Flickr

This is a topic dear to my heart: changing the culture shared (or not) by dev and ops.

  • Contrary to popular wisdom, ops’ real mission is not to keep the service stable per se, but to enable the business.
  • Business requires change
  • Build the tools and the culture that allow repeated change with minimal uncertainty.
  • Automate your infrastructure
  • Use one shared source control system between devs and ops, so that everyone on the team knows where to look
  • Reduce all manual steps down to one, that of deciding to build and deploy
  • Small frequent changes better than fewer large changes
  • Use “feature flags”, i.e., use code to enable features, rather than branches
  • Ship TRUNK so that everyone knows what gets released
  • Feature flags allow for private betas and reduce uncertainty
  • Dark launches: enable the feature to exercise the data path but don’t present the results to the end-user
  • Metrics, metrics, metrics
  • Add context to it, such as the last time something was deployed
  • We use IRC and IM bots to bring system updates into the conversation between dev and ops in real time, then push the logs into a search engine
  • Develop respect and trust between devs and ops
  • Have a healthy attitude toward failure (don’t blame, fix the problem first)
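
To make the feature flag and dark launch points concrete, here is a minimal sketch in Python. It is not Flickr’s implementation: the flag name, the three states (off, dark, on) and the two search functions are all hypothetical, chosen only to show the shape of the idea.

```python
# Hypothetical flag store; in practice this would live in shared configuration.
FLAGS = {"new_search": "dark"}  # one of "off", "dark", "on"

# Results of dark-launched calls are recorded here instead of being shown.
dark_log = []


def old_search(query):
    """Stand-in for the code path currently serving users."""
    return [f"old:{query}"]


def new_search(query):
    """Stand-in for the new code path hidden behind the flag."""
    return [f"new:{query}"]


def search(query, flags=FLAGS):
    """Serve the old or the new path depending on the flag state."""
    state = flags.get("new_search", "off")
    if state == "on":
        return new_search(query)
    if state == "dark":
        # Dark launch: exercise the new data path, keep the result for
        # inspection, and never let a failure reach the end user.
        try:
            dark_log.append(new_search(query))
        except Exception as exc:
            dark_log.append(exc)
    return old_search(query)


result = search("cats")
print(result, dark_log)
```

Both code paths ship from trunk; changing the flag value, not merging a branch, is what exposes the feature.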




Started a friendfeed webops public group

29 06 2009

Feel free to join: http://friendfeed.com/web-ops





#structure09 Hosting on commodity hardware

25 06 2009

I just got out of the panel on commodity hardware and did not get a chance to participate so here’s my take on it.

The panel started with an opening question: Google, Amazon and the like run at a huge scale on commodity hardware, so why do enterprise vendors still push customized hardware, and expensive at that?

To me the answer is pretty obvious: enterprise hardware is for the most part sold to people who don’t know how to architect and design software on a commoditized stack. Let’s be honest, look at most “enterprise” hardware/software literature: it’s just noise and a waste of both the writer’s and the reader’s time. And by stack I mean from the server all the way up to the application code.

If you constrain yourself to buy servers that cost no more than $5k, buying high-end database software makes little sense. Rather you recognize that low-end compute is how you get economies of scale and you apply the same reasoning to your networking gear, storage systems, database software, load balancing software, etc.

Google, from its earlier papers, seems to be the first to have understood that, rejecting the usual marketing garbage from large vendors. And for that we should be grateful.





I love Amazon Web Services open pricing

17 06 2009

I’ve just spent 2 hours crafting a spreadsheet to compare how much it would cost to set up a decent platform to deliver the kind of data services I manage, vs. the same on EC2. Easy access to pricing is a key variable that’s often hard to get from vendors without being subjected to the “custom solution” time-waste. Technology vendors, your customers, more often than not, know what they want. When I ask for a price list, don’t try to second-guess whether I’ve done my homework, just give me the price list. If I have questions regarding the “solution” I’ll be more than happy to ask.





How about sub-second queries in Hadoop?

16 06 2009

Two observations from talking and listening to people during the Hadoop summit. Firstly, hadoop is used quite often to process clickstream data (in all fairness, I missed the talk about hadoop used for genomics). Secondly, and as a corollary of the first, sub-second queries in hive or pig are not quite there yet. Since a hive query translates into map and reduce tasks, their scheduling, in addition to the sheer volume of data, determines response time. Pre-computing aggregates is a natural way to go, much like what is done for data warehouses.

Where these aggregates should be stored for consumption is a problem that could lead to hybrid solutions: process data with hadoop, then export it to postgres or infobright to enjoy a more mature (but less scalable) run-time environment. Get multi-terabyte daily processing and sub-second analytics, all of it open source.
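
The hybrid idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a Counter stands in for the hadoop job, an in-memory SQLite database stands in for postgres or infobright, and the table name daily_hits and the sample clicks are made up.

```python
import sqlite3
from collections import Counter

# Made-up clickstream events as (day, page) pairs.
clicks = [
    ("2009-06-15", "/home"),
    ("2009-06-15", "/home"),
    ("2009-06-15", "/pricing"),
    ("2009-06-16", "/home"),
]


def aggregate(events):
    """Batch step (stand-in for the hadoop job): count hits per (day, page)."""
    return Counter(events)


def export(conn, counts):
    """Load the pre-computed aggregates into a small SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_hits ("
        "day TEXT, page TEXT, hits INTEGER, PRIMARY KEY (day, page))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_hits VALUES (?, ?, ?)",
        [(day, page, n) for (day, page), n in counts.items()],
    )
    conn.commit()


def hits(conn, day, page):
    """Serving step: a primary-key lookup on the aggregate, not a raw scan."""
    row = conn.execute(
        "SELECT hits FROM daily_hits WHERE day = ? AND page = ?", (day, page)
    ).fetchone()
    return row[0] if row else 0


conn = sqlite3.connect(":memory:")  # stand-in for the serving database
export(conn, aggregate(clicks))
print(hits(conn, "2009-06-15", "/home"))
```

The batch side absorbs the volume; the serving side only ever sees the much smaller aggregate table, which is what makes sub-second responses plausible.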

If you’ve done something like that, I’d be interested to know before I embark on a route where others have failed before.





Notes from the 2009 Hadoop Summit West

13 06 2009

I just got back from Santa Clara where Yahoo and Cloudera were hosting the 2009 Hadoop Summit West on Wednesday followed by a training on Thursday. My interest was one of a prospective user — to gauge how real and mature hadoop is.

The turn-out was more than decent, in the hundreds; a number from Yahoo, running the largest clusters so far, a few folks from Amazon, Facebook, some local universities and a fair number of small companies that have deployed their own clusters (or are running on EC2).

The good news first: hadoop is real and it’s getting real use. It’s clearly a promising platform with active use and development. The scaling model is fairly simple: buy more machines. The current sweet spot is dual-quad hosts with 4×1TB drives and 16GB or so of ECC RAM. Decoupling storage from a central system (à la SAN) is the way to go. Some folks have tried, with some success, to hook up Thumpers to Niagara chips that run a lot of threads in parallel, but the TCO question is unclear.

Hence we can start with a handful of cheap machines and go from there. A few things to watch for: the secondary name node, for instance, is not there for backup but to persist the DFS layout structures that exist in RAM on the primary name node. It could have been implemented in a more robust fashion using a SQL database rather than requiring a re-implementation of redo logs and data files.

That’s the overall negative point: applications built on the platform (such as hive, hbase and pig) are still pretty much works in progress, somewhat duplicating functionality. There is an air of Not Invented Here that still pervades, but it’s a sign that the whole thing is still young. A vocal user base that meets regularly should help the project focus on the pieces that truly do not exist yet.





Very interesting talk about SmugMug

4 06 2009

A few key points: 2 ops people, automatic scaling, 1000s of cores on EC2, PBs of storage on S3.

http://mysqlconf.blip.tv/file/2037101