Identifying & Protecting Overlooked Sources of PII
By Duncan Bradley | April 3, 2018
Identifying, monitoring, and protecting Personally Identifiable Information (PII) is crucial to meeting compliance standards, mitigating risk, and maintaining the trust of customers and shareholders. With the arrival of the General Data Protection Regulation (GDPR) in Europe—and the likelihood of future tightening of regulations around the world—many organizations are adopting stronger PII security compliance policies and procedures. However, even organizations with robust PII protection policies are challenged when it comes to identifying every single source of PII in their systems.
Keep reading for a deep-dive into the importance of PII protection beyond just compliance, and how organizations can uncover and protect a few commonly overlooked sources of PII.
PII protection as part of overall data security
If we can learn anything from the well-publicized data breaches in recent years, it’s that maintaining PII data security is more than just a compliance matter. In reality, there is no single, perfect security solution; no company has impenetrable walls when it comes to data security. Enterprises need sound perimeter defenses, but they also need to minimize the risk to their clients if a breach occurs. Instituting strong PII access and management protocols is instrumental in meeting this objective.
If companies don’t identify and protect their PII, they could face considerable costs. In the case of Equifax, for instance, a 2017 data breach resulted in the exposure of the Social Security numbers of 145.4 million Americans. A Home Depot breach in 2014 resulted in the theft of 56 million credit card numbers and cost the company an estimated $3 billion.
To mitigate these risks as much as possible, organizations must cull their PII so that only personal data relevant to ongoing business processes is retained. Everything else must be identified and then deleted, masked, or encrypted so that personal customer data isn’t at risk if a breach occurs. Before the culling and remediation can be done, however, all of the company’s PII must first be located. There are several reasons why rooting out all PII records can be challenging for an organization.
The PII proliferation problem
One of the biggest problems when it comes to locating and protecting PII is that PII tends to proliferate into locations that are easily overlooked. After the initial intake of customer records (which is generally handled in a very orderly and structured way), PII tends to get duplicated and stored in systems outside the main customer service platform. Different lines of business or teams may use PII for purposes like research projects or analytics and end up duplicating and storing client data in fileshares or enterprise content management (ECM) systems. Because of this replication, most companies have vast volumes of PII lurking in unseen locations. In fact, it’s unlikely that any system within a company is completely clear of PII.
PII content can be non-searchable
To further complicate the challenge of identifying and protecting the entirety of a company’s PII data, much of the content is often unstructured and non-searchable. A customer support team working on a research project, for example, may duplicate PII and then share it with their team by email. Emails are difficult to search because a message is often not a single, self-contained unit (with a header, body, and footer). It can be a threaded chain or contain nested emails, and it may carry attachments that aren’t searchable (TIFFs, PowerPoint files, etc.). In a sense, when the PII in these emails becomes non-searchable, the files go “off the map,” and organizations may not even be aware it has happened.
An extra degree of search sophistication is needed to root out and identify these unstructured forms of PII. For instance, to determine if there is PII in an image-only PDF attached to an email, the system must be able to locate the records and query them to see if they’re searchable. If they aren’t, the records must be converted to a searchable format.
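As a rough illustration of that locate-and-query step, the sketch below walks an email, including any nested email attached to it, and sorts each part into "searchable," "inspect," or "convert." It uses only Python's standard email package. The three content-type groupings and the function name classify_parts are assumptions made for this example, not a description of any particular product; a real system would also need to open each PDF to check for a text layer and run OCR on image-only files.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Assumption: content types we treat as directly text-searchable.
SEARCHABLE_TYPES = {"text/plain", "text/html", "text/csv"}
# Assumption: types that may or may not contain a text layer.
NEEDS_INSPECTION = {"application/pdf"}


def classify_parts(raw: bytes):
    """Return (name, content_type, status) for every leaf part of an email.

    walk() also descends into nested message/rfc822 attachments, so
    forwarded or embedded emails are covered too.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    results = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        ctype = part.get_content_type()
        name = part.get_filename() or "(body)"
        if ctype in SEARCHABLE_TYPES:
            status = "searchable"
        elif ctype in NEEDS_INSPECTION:
            status = "inspect"   # e.g. check the PDF for a text layer
        else:
            status = "convert"   # e.g. OCR a TIFF before searching
        results.append((name, ctype, status))
    return results


# Demo: an outer email carrying a nested email with a TIFF scan attached.
inner = EmailMessage()
inner["Subject"] = "Customer scan"
inner.set_content("Scan attached.")
inner.add_attachment(b"II*\x00placeholder", maintype="image",
                     subtype="tiff", filename="scan.tiff")

outer = EmailMessage()
outer["Subject"] = "FW: Customer scan"
outer.set_content("Forwarding for the research project.")
outer.add_attachment(inner)  # attached as message/rfc822

results = classify_parts(outer.as_bytes())
for row in results:
    print(row)
```

Parts marked "convert" or "inspect" are the ones most likely to go "off the map" unless they are converted to a searchable format first.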
PII data complexity
PII doesn’t just get overlooked because it’s duplicated and stored in unlikely locations or because it gets put into non-searchable formats. It can also be difficult to locate because there is often a great deal of variance and complexity in the way the data is structured.
Even among relatively structured data records, there is a tremendous amount of variability. If a company does business in the US, it will have to deal with differences in the way each state formats its driver’s license numbers. Credit card number formats differ from issuer to issuer. The result is that general searches will not turn up every PII match.
And then there’s PII that is contained in strings that are proprietary to a company—variables like claim, policy, and account numbers—all of which may be organized in unique ways. Effectively searching all of these different types of variables requires the creation of hundreds of different search strings—one for each variant of a piece of information. And that requires a truly advanced file analytics capability. Otherwise, many of those PII variants will go “dark,” unable to be located.
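To make the idea of one search string per variant concrete, here is a minimal sketch of a small pattern library. The three patterns are illustrative assumptions rather than a complete or production-grade rule set: the Social Security number pattern accepts dashes, spaces, or no separator; the card pattern accepts 13 to 19 digits and is paired with a Luhn checksum to reduce false positives; and the POL-######## policy number format is entirely hypothetical, standing in for whatever proprietary format a company uses.

```python
import re


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to weed out digit runs that are not card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# One pattern per PII variant; a real deployment would need many more.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b"),
    "card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
    # Hypothetical proprietary format, for illustration only.
    "policy_number": re.compile(r"\bPOL-\d{8}\b"),
}


def find_pii(text: str):
    """Return a list of (label, matched_text) pairs found in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            # Drop card-shaped digit runs that fail the checksum.
            if label == "card" and not luhn_ok(re.sub(r"\D", "", value)):
                continue
            hits.append((label, value))
    return hits


sample = ("SSN 123-45-6789; card 4111 1111 1111 1111; "
          "policy POL-00012345; order 1234 5678 9012 3456")
hits = find_pii(sample)
for hit in hits:
    print(hit)
```

Even this toy version shows why the number of patterns grows quickly: every new separator style, issuer format, or internal identifier scheme needs its own rule, and each rule needs its own validation to keep false positives manageable.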
PII needs to be identified, protected, and managed everywhere it’s found within an organization. If not necessary for business operations, it must be deleted, remediated, or masked. However, PII often proliferates due to duplication, is frequently saved in non-searchable formats, and exists within complex data structures. These factors make it difficult for many enterprises to root out and remediate PII. To achieve full compliance and protect clients’ personal information, organizations must develop and implement robust PII protection policies and use advanced analysis tools to uncover every overlooked source of PII data.