It is often assumed that the problem of data retention is about how to back up data and then restore it quickly and accurately after a security event or system crash.
But there are important cases where the best data retention strategy is not to back up the data at all.
The process of backup is fairly well understood today, and there are technologies for every scale, from personal backups on flash drives and Internet backup services to robotic tape libraries for backing up terabytes of data.
Restoring data from backup is also nominally a fairly straightforward exercise, although I suspect that most businesses with well-oiled backup procedures generally don’t bother testing their backup media to see if they can actually restore the data.
But there is another dimension to data retention besides backup and restore, and that is minimizing the threat surface of sensitive data: PII (personally identifiable information) and ePHI (protected health information stored in an electronic format).
Let’s take the case of a typical business that has customer data, commercial information and intellectual property related to a development and/or manufacturing process. What is more important in our data retention strategy: backup and restore of customer data, backup and restore of contracts, or backup and restore of source code? The only way to answer this question is to understand how much these assets are worth to the company and how much damage would be incurred if there were a data breach.
For the purpose of asset valuation, we distinguish between customer data without PII and customer data that may have PII. Let’s consider 4 key assets of a company that designs and manufactures widgets and sells them over the Internet.
1. Customer data that may have some personal identifiers. The company may not deliberately accept and process customer data with attributes that would enable a third party to identify end users, but such data may be collected in the course of marketing campaigns or pilot programs and stored on company computers. At the end of the marketing campaign, was the data removed? Probably not. In the case of a data breach of PII, it does not matter what the original intent was; the liability is there. The company will pay the cost of the disclosure all the way from investigative audit through possible litigation.
2. Customer data with no personal identifiers. Best practice is not to store data with PII at all. If the business needs numerical data for statistics, price analysis, trend analysis of sales or simulations for new products, the analysis can be done on raw data without any PII. The best security control for PCI DSS and HIPAA is not to store PII at all.
3. Company reputation. If there is a data breach, chances are the company’s reputation will be tarnished for a while, but notoriety is a form of publicity that can always be spun to the company’s advantage.
4. Intellectual property, for example chemical recipes, algorithms, software engineering and domain expertise. The damage of IP data loss can be sizable for a business, especially for an SME. Here the data retention strategy should focus on highly reliable backup and restore, with data loss prevention to block leakage of sensitive digital assets. There is also an ethical component to protecting IP, and that means making sure that your employees and contractors understand the importance of protecting the business IP.
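To make item 2 concrete, here is a minimal Python sketch of dropping personal identifiers before any analysis is run. The field names in `PII_FIELDS` and the sample orders are hypothetical and only illustrate the idea. In a real system the list of identifying attributes would have to come from your own data inventory.

```python
import statistics

# Hypothetical field names; replace with the identifiers your data actually carries.
PII_FIELDS = {"name", "email", "phone", "address"}

def strip_pii(record):
    """Return a copy of the record with every PII field removed."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}

# Made-up sample data for illustration only.
orders = [
    {"name": "Alice Example", "email": "alice@example.com", "sku": "W-100", "price": 19.5},
    {"name": "Bob Example", "email": "bob@example.com", "sku": "W-200", "price": 30.5},
]

# Only the stripped records are kept for statistics; the originals can be discarded.
anonymous_orders = [strip_pii(order) for order in orders]
average_price = statistics.mean(order["price"] for order in anonymous_orders)
```

The price analysis works exactly the same on the stripped records, which is the point: the numbers the business needs do not depend on knowing who the customer was.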
Note that in the life cycle of a customer data breach, damage first accrues from attacks on the data assets, followed by reputational damage as the company gets drawn deeper into damage control, investigation and litigation.
But what about the customer data?
How do you minimize the customer data security threat surface?
In 3 words, your data retention strategy is very simple:
Don’t store PII.
Decide now that sensitive data will be removed from servers and workstations. Make sure that customer data with PII is not backed up.
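One way to act on this is to check files for personal identifiers before they enter the backup set. The sketch below is only an illustration under loud assumptions: it looks at `.csv` files only, and it treats an email-like string as the sole sign of PII. The function names `contains_pii` and `split_backup_set` are made up for this example, and a real deployment would need a far more thorough detector.

```python
import re
from pathlib import Path

# Deliberately simple email-like pattern; an assumption for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def contains_pii(text):
    """Return True if the text contains something that looks like an email address."""
    return EMAIL_RE.search(text) is not None

def split_backup_set(root):
    """Partition the .csv files under root into (safe_to_back_up, holds_pii)."""
    safe, flagged = [], []
    for path in sorted(Path(root).rglob("*.csv")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        (flagged if contains_pii(text) else safe).append(path)
    return safe, flagged
```

Files in the second list are candidates for removal from servers and workstations, and they should be left out of the backup job altogether.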