Privacy Enabling Technologies: De-identification

Privacy Enabling Technologies, or "PETs," are technologies that provide options for using the personally identifiable information ("PII") of individuals. These individuals can be customers, employees or patients. Your company may collect such information in a regulated industry, such as financial services (GLBA) or healthcare (HIPAA), or from customers online through a website. PETs are most commonly thought of as a way to secure such information against unauthorized disclosure. Such protective PETs include encryption, access controls, integrity controls and secure destruction technology. However, some PETs not only safeguard privacy but also enable expanded use of such information. De-identification is one such PET.

De-identification is the process of either rendering PII unidentifiable or modifying a PII data set so that it has a very low risk of re-identification. Many discussions of de-identification involve health information. This is primarily because, of the laws in place, the Health Insurance Portability and Accountability Act ("HIPAA") has the most robust standards for de-identification. Although HIPAA does include privacy and security requirements for covered entities, many people forget that HIPAA was first implemented to encourage the digitization of health information and, in doing so, to make that information easier to use and share. Furthermore, because of this standard and the 10+ years since its enactment, HIPAA has also provided the most opportunity for testing and implementing de-identification. Therefore, in my opinion, we know what we know about de-identification because of the work being done under HIPAA. This is the case not just in the U.S., but globally, as other countries use HIPAA as the model for their approaches to de-identification. Thus, HIPAA really is the standard by which most technologies and processes for de-identification are evaluated, regardless of the type of information in use.

To understand de-identification, and I would argue to better understand privacy in the Internet Age, you have to consider personal identifiers, both direct and indirect. "Scot Ganow" is a direct identifier. My name identifies me, or isolates me from a broader population. "Faruki Ireland & Cox P.L.L.", my employer, is an indirect identifier. By itself, "FI&C" does not identify me. But, used with other indirect identifiers such as gender, age, or law school attended, "FI&C" could be used to distinguish me from others who work here and from the larger population. To de-identify data properly, a process or technology must account for these direct and indirect identifiers, as well as how the resulting data set will be used. Under HIPAA, there are two means by which you can render data de-identified.
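The way indirect identifiers combine can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records, the field names, and the helper `count_matches` are invented, and the values do not describe any real person. The sketch simply counts how many records remain consistent with what an observer already knows.

```python
# Hypothetical records; every value below is invented for illustration.
RECORDS = [
    {"employer": "FI&C", "gender": "M", "age": 45, "law_school": "School A"},
    {"employer": "FI&C", "gender": "M", "age": 45, "law_school": "School B"},
    {"employer": "FI&C", "gender": "F", "age": 45, "law_school": "School A"},
    {"employer": "FI&C", "gender": "M", "age": 52, "law_school": "School A"},
    {"employer": "Other Firm", "gender": "M", "age": 45, "law_school": "School A"},
]


def count_matches(records, **known):
    """Count the records consistent with every attribute the observer knows."""
    return sum(
        all(rec.get(field) == value for field, value in known.items())
        for rec in records
    )


# One indirect identifier alone leaves several candidates.
employer_only = count_matches(RECORDS, employer="FI&C")

# Several indirect identifiers together narrow the candidates sharply.
combined = count_matches(
    RECORDS, employer="FI&C", gender="M", age=45, law_school="School A"
)

print(employer_only, combined)
```

In this invented data set the employer alone matches four records, while the four attributes together match only one, which is the sense in which a set of indirect identifiers can act like a direct one.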

One is the safe harbor method, by which a specified list of 18 types of direct and indirect (or quasi) identifiers is removed. While this conservative method is secure, it strips out a great deal of information that could have value for a variety of reasons. The second method is the use of an expert statistician, and by extension an approved technology, to assess any method or use of limited data sets that may include direct and indirect identifiers. Any assessment would consider the combination of identifiers and a variety of other factors (population, industry, party with whom the information is shared) to determine whether the resulting data set has a very low risk of re-identification.
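The trade-off between the two methods can be sketched roughly in code. This is only an illustration under stated assumptions: the field lists `DIRECT` and `QUASI` are hypothetical and are not HIPAA's enumerated identifiers, the patient records are invented, and the "smallest group size" measure is a simple k-anonymity-style stand-in for one of the things an expert might look at, not the expert determination itself.

```python
from collections import Counter

# Hypothetical field lists; NOT HIPAA's actual enumeration of identifiers.
DIRECT = ("name", "email")
QUASI = ("zip", "birth_year", "gender")

# Invented records for illustration only.
PATIENTS = [
    {"name": "A", "email": "a@example.com", "zip": "45402", "birth_year": 1975, "gender": "M", "diagnosis": "flu"},
    {"name": "B", "email": "b@example.com", "zip": "45403", "birth_year": 1975, "gender": "M", "diagnosis": "asthma"},
    {"name": "C", "email": "c@example.com", "zip": "45409", "birth_year": 1980, "gender": "F", "diagnosis": "flu"},
    {"name": "D", "email": "d@example.com", "zip": "45410", "birth_year": 1980, "gender": "F", "diagnosis": "diabetes"},
]


def strip_fields(records, fields):
    """Return copies of the records with the listed fields removed."""
    drop = set(fields)
    return [{k: v for k, v in rec.items() if k not in drop} for rec in records]


def coarsen_zip(records, digits=3):
    """Generalize ZIP codes by keeping only their leading digits."""
    return [{**rec, "zip": rec["zip"][:digits]} for rec in records]


def smallest_group(records, quasi_fields):
    """Size of the smallest set of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(rec.get(f) for f in quasi_fields) for rec in records)
    return min(groups.values())


# Removal-style approach: drop every listed identifier, leaving little detail.
stripped = strip_fields(PATIENTS, DIRECT + QUASI)

# Assessment-style approach: keep quasi-identifiers, then measure and reduce risk.
retained = strip_fields(PATIENTS, DIRECT)
risk_before = smallest_group(retained, QUASI)
risk_after = smallest_group(coarsen_zip(retained), QUASI)

print(stripped[0], risk_before, risk_after)
```

With this invented data, stripping everything leaves only the diagnosis, while retaining the quasi-identifiers leaves each record unique until the ZIP codes are coarsened, at which point every record shares its values with one other. A real assessment would also weigh the factors named above, such as the population and the party receiving the data.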

Generally, regardless of the method used, if completed properly, the data set is no longer considered identifiable and can be used outside the constraints of regulation. This transformation opens the door to numerous benefits, including enhanced marketing, the sale of such data, and improved research through the combining or linking of de-identified data sets. Indeed, the Centers for Disease Control uses such information to track outbreaks in the U.S. Another benefit is that, because de-identified data is not identifiable, a breach or unauthorized disclosure of de-identified information does not trigger data breach responsibilities, including notice under various state and federal laws. Lastly, as I stated before, I think the use of PETs like de-identification not only provides enhanced information use without compromising privacy, but also improves an organization's approach to privacy, because it forces you to think about the interplay of data elements, data collection, and data sharing, all things implicit in privacy today.

About The Author

Scot Ganow