Yesterday at CloudCamp, a few of us discussed ways to store and use sensitive data on a public cloud, where you presumably have no strong assurance that your data are for your eyes only. For structured data (e.g. relational data), and assuming your data have a simple key, a pattern emerged among participants:
- Store all non-identifiable data in the cloud, keyed by an arbitrary identifier.
- Keep the actual identifiable data on-premises, together with its mapping to that arbitrary identifier.
- Let the client device resolve the mapping locally.
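A minimal sketch of that split, assuming in-memory dictionaries as stand-ins for the on-premises store and the cloud store. The class names `OnPremVault` and `CloudStore` and the function `client_view` are hypothetical, and a random UUID serves as the arbitrary identifier. The cloud side only ever holds the identifier and the non-identifiable attributes; the join happens on the client.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class OnPremVault:
    """Hypothetical on-premises store: arbitrary ID -> identifiable data."""
    _by_id: dict = field(default_factory=dict)

    def register(self, identity: dict) -> str:
        # The identifier is random, so it reveals nothing about the person.
        record_id = uuid.uuid4().hex
        self._by_id[record_id] = identity
        return record_id

    def resolve(self, record_id: str) -> dict:
        return self._by_id[record_id]


@dataclass
class CloudStore:
    """Hypothetical stand-in for the public cloud: non-identifiable data only."""
    _rows: dict = field(default_factory=dict)

    def put(self, record_id: str, data: dict) -> None:
        self._rows[record_id] = data

    def get(self, record_id: str) -> dict:
        return self._rows[record_id]


def client_view(record_id: str, vault: OnPremVault, cloud: CloudStore) -> dict:
    # The client merges both halves locally; the cloud never sees the identity.
    return {**vault.resolve(record_id), **cloud.get(record_id)}


vault, cloud = OnPremVault(), CloudStore()
rid = vault.register({"name": "Alice Example", "email": "alice@example.com"})
cloud.put(rid, {"balance": 120, "tier": "gold"})
print(client_view(rid, vault, cloud))
# prints {'name': 'Alice Example', 'email': 'alice@example.com', 'balance': 120, 'tier': 'gold'}
```

In a real deployment the vault would be an on-premises database and the cloud store a hosted one, but the shape of the data flow stays the same.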
For instance, suppose you store transactions: at a minimum, this would require scrambling transaction details such as the item name and the party name. That mitigates simple analysis of the data, although statistical methods could still be used to derive information; that residual risk can be acceptable. Of course, everyone remembers the AOL search term fiasco, where people could be identified from their search terms. This is why the scheme should work best when the data are highly structured.
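To illustrate the transaction case, here is a sketch that scrambles the item name and party name before a row goes to the cloud. The technique is my choice for illustration, not something settled at CloudCamp: a keyed HMAC-SHA256 token, with the key and the token-to-value map (`TOKEN_KEY`, `reverse_map`, both hypothetical) kept on-premises. Other fields such as `amount` pass through unchanged.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key; in this sketch it never leaves the on-premises side.
TOKEN_KEY = secrets.token_bytes(32)

# Fields treated as identifying in this example.
SENSITIVE_FIELDS = ("item_name", "party_name")


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    # Deterministic keyed hash: the same input always yields the same token.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def scramble_transaction(txn: dict, reverse_map: dict) -> dict:
    """Return a cloud-safe copy of txn and record token -> original on-premises."""
    safe = dict(txn)
    for name in SENSITIVE_FIELDS:
        token = tokenize(txn[name])
        reverse_map[token] = txn[name]
        safe[name] = token
    return safe


reverse_map = {}  # stays on-premises
txns = [
    {"item_name": "Widget", "party_name": "ACME Corp", "amount": 250},
    {"item_name": "Widget", "party_name": "Globex", "amount": 90},
]
cloud_rows = [scramble_transaction(t, reverse_map) for t in txns]
```

Because the tokens are deterministic, both rows carry the same `item_name` token. That keeps grouping and joins working in the cloud, but it also preserves value frequencies, which is exactly what statistical analysis can exploit. Random per-row tokens would hide those frequencies at the cost of losing grouping.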