Data Protection Part 1: What Does it Mean to Lose Data?
I recently attended the Storage Networking Industry Association (SNIA) Storage Developer Conference (SDC). It was very much a going-home event. I hadn’t thought about locks and leases in the context of SMB (Server Message Block) and NFS (Network File System) in a while. There was a lot of discussion about data loss and data protection. I realized that even within the storage industry, there’s a lack of clarity about the terms, which results in confusion and splintered approaches to protecting data. Having recently joined a network solutions organization, I realize the broader IT approach to protecting data is even more stovepiped.
The public press makes its own contribution to the confusion. Data breaches are too often subjects of news and commentary. A data breach becomes stolen financial records, emails, credit card numbers, personal identities, etc. Read far enough or listen long enough, and eventually someone will say that a company has lost said records, emails, etc. This kind of phraseology can cause confusion for business leaders. Anytime a leader says they’re concerned about losing data, one needs to ask, “What does it mean to you to lose data?”
There’s an industry in IT/business technology called Data Loss Prevention (DLP). TechTarget defines DLP as “a strategy for ensuring that end users do not send sensitive or critical information outside the corporate network.” Actually, emailing data doesn’t cause the data to be lost. Emailing data makes hundreds, thousands, maybe millions of copies of the content of the message. A simple way to protect a document is to email it to yourself. It will be preserved in your Sent folder and also stored in your company’s email archive, available for recovery at a later time.
An attribute of the digital age is that a lot of value in society is derived from data rather than physical products. Look at the music industry as an example. I’ve always been a music fan. I was an avid album collector in the 70s and 80s. I crank up music while I work (Black Sabbath’s Iron Man is blasting at the moment). I still have a turntable and boxes of vinyl, and I had a cassette player in my ’79 Hurst Olds Cutlass (which I wrote about in my SDN blog post). I made copies of some albums so I could listen while driving. It was a laborious process. It took 45 minutes to create a copy of an album, plus I had to stick around to flip the album. I can’t say I never gave away a copy of an album, but I didn’t have a significant impact on my favorite bands’ incomes.
This was the beginning of the transformation of music from a physical product (vinyl album) to digital (CD). Though the value of the album was always the human experience of music, the monetization of the music was captured in the physical media of an album. I could loan my album to someone, but while it was gone, I received no value from it. Once I could copy the music and give the copy to someone, the value of the music to me was not diminished (ignore my personal moral degradation). However, the digital transformation of the music industry enabled someone to rip the music from a CD in minutes and easily share it with friends (or the whole world) without losing any of the value of their own music.
Music digitization made it much easier to protect one’s music collection, but also unbound the value of music from the physical media, changing the music industry forever. (For more information on the effect of sharing on the music industry, see a summary about the impact of Napster.) If I broke the album (before making a tape copy) then I had lost the value of the music contained in the album. If I wanted the value of the music, I was going to have to spend additional resources to get it back.
Now, let’s apply this to IT data loss. Losing data means you can no longer derive the value the data once had. Examples:
- Media failure, such as a disk drive failure, that makes it impossible to retrieve the data in a way that can be converted back into information (the spinning door stop).
- Data corruption, where the 1s and 0s are no longer arranged in such a way that the upper-level applications can understand the data (bit flips, application errors). Lost encryption keys fall into this category, though the encryption software vendor will claim all of the data is there. By the way, an efficient way to provide certifiable data erasure on a disk drive is to encrypt the data and “shred” the key.
- Accidental data erasure. While Microsoft calls it a Recycle Bin and Apple refers to it as Trash, the feature has saved many a user’s sanity. An IT help desk manager in a large company told me the most frequent help desk end user request is file-level or message-level recovery.
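To make the data corruption example in the list above concrete, here is a minimal Python sketch of how a single flipped bit can be detected by comparing checksums. The helper names (`checksum`, `flip_bit`, `is_intact`) and the sample record are hypothetical, chosen only for illustration; real storage systems use their own integrity mechanisms, and this is not a description of any particular product.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of the data with exactly one bit inverted (simulated corruption)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def is_intact(data: bytes, expected_checksum: str) -> bool:
    """True if the data still matches the checksum recorded when it was written."""
    return checksum(data) == expected_checksum


# Hypothetical sample record; the checksum is recorded at write time.
original = b"quarterly financial records"
stored_checksum = checksum(original)

# Simulate a single bit flip on the storage media.
corrupted = flip_bit(original, 3)

print(is_intact(original, stored_checksum))
print(is_intact(corrupted, stored_checksum))
```

A checksum like this only tells you that the data changed. Getting the value back still requires a good copy from somewhere else, which is where the protection techniques discussed next come in.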
The storage industry has created several solutions to address these data loss problems, including backups, snapshots, continuous data protection, and remote replication that can be applied to data sets based on allowable data loss, recovery times, etc. If you have data you absolutely can’t afford to lose, post it to every social media site you can find, every cloud service available, and email it to all your connections on LinkedIn.
However, if the value of your information would be reduced if someone else knew about it, this is not a great data loss prevention scheme. In fact, many data protection schemes increase data exposure, because every additional copy is another place the data can be read. That is why backup tapes are encrypted: if they come into the possession of the wrong person, the data will not be accessible.
In part 2, we’ll explore when data is ripped off.