A recent survey shows that organizations are still working toward GDPR compliance in 2018, and for many of them the initial step is getting a full view of their entire information chain. That view is the first pillar of the data management best practices that businesses need to put in place as they prepare for the new regulation.
To be fully prepared, organizations need to know where their privacy-related data is today: where it comes from, where it goes, how it is processed and who is consuming it. In today's increasingly demanding data management environment, navigating this data requires an increasingly sophisticated mapping capability.
To use an analogy, before the advent of sophisticated mobile computing, people bought paper-based maps when they visited a new region. It was the only way to find places they had never been before – but this kind of map was static. It quickly became outdated, as it lacked dynamic context – in other words, there was no way of gauging roadworks, traffic problems, newly-built roads, etc. Making a change required redoing the whole map.
There was also a lack of transparency: without any way to track and trace, for example, passengers couldn't tell whether their taxi driver was taking the fastest route to their destination. The advent of GPS changed everything, giving travelers a more accurate and dynamic view of a region, with traffic, road and weather conditions constantly updated. Today, people often use GPS even on familiar trips because it can instantly alert them to any issues they may encounter along the way.
We see a similar shift taking place in data management today. In the past, many businesses had no need for dynamic data mapping; a high-level, paper-based view of their landscape was sufficient. That is changing with the explosion of data, and the change is accelerating further as data privacy regulations such as GDPR come into force. Organisations now need a more precise and up-to-date view of their data: not just where it is stored but also its overall context, a dynamic, real-time view of where all the data is located. They also need to provide transparency for "the rights of the data subject", such as the right to be forgotten (erasure), the right of access and the right to rectification.
Navigating In-depth Data Management
This explains the rationale behind metadata management. Having established that 360° umbrella view, let's now drill down to see how the concept of in-depth data management applies in this new world. Once again, a recurring theme emerges: drawing connections between disparate data sets is key to the kind of data management that GDPR demands.
One of the biggest impacts of GDPR is that businesses must take a more holistic view of their private data and how it is managed. In the past, organizations managed privacy-related data and possibly processed opt-ins, but this was typically done in a specific context and limited to one department. If the marketing department was responsible for managing a list of customers that potentially contained private data, for example, it might have had to declare that list to the local data protection authority. Equally, the HR department would take exclusive responsibility for the privacy of employee data.
That has all changed. Today, with GDPR in the offing, businesses need a comprehensive view of the private data they are managing. One business may know an individual in many different contexts. If the person has bought its products or services, the business knows them as a customer, and their details are stored in the CRM system. If they have also signed a contract, they will appear in the financial system as well; if they have taken out a subscription, their details will be stored by the support department; and if they use digital products or services, such as connected objects in the internet of things, everything they do might be tracked somewhere.
This highlights the broader view of data that GDPR compliance will require. The emphasis can no longer be on a single department, such as marketing, managing data for its own requirements. Instead, the focus must be on managing all the private data relating to a customer or an employee across the entire enterprise. That is clearly a complex undertaking, so how can businesses best go about it?
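To make the "one individual, many systems" problem concrete, here is a minimal sketch that groups records from several systems under one normalized identifier. The system names, field names and the use of an email address as the matching key are all hypothetical assumptions; real identity resolution is usually far less tidy than an exact key match.

```python
from collections import defaultdict

# Hypothetical extracts from four systems. The system names, the fields and the
# use of email as the matching key are illustrative assumptions only.
SYSTEMS = {
    "crm": [{"email": "ana@example.com", "name": "Ana Diaz"}],
    "finance": [{"email": "Ana@Example.com", "contract_id": "C-17"}],
    "support": [{"email": "ana@example.com", "subscription": "premium"}],
    "iot": [{"email": "bo@example.com", "device_id": "D-9"}],
}


def build_person_index(systems):
    """Group records from every system under one normalized identifier, so an
    individual's footprint across the enterprise can be seen in one place."""
    index = defaultdict(lambda: defaultdict(list))
    for system, records in systems.items():
        for record in records:
            key = record["email"].strip().lower()  # naive identity matching
            index[key][system].append(record)
    return {person: dict(footprint) for person, footprint in index.items()}


if __name__ == "__main__":
    for person, footprint in build_person_index(SYSTEMS).items():
        print(person, "->", sorted(footprint))
```

Run as a script, this shows the same customer appearing in the CRM, finance and support systems even though the finance extract spells the address with different capitalization.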
Gaining a Holistic Data View
The first stage of the process is to create a full segmentation of your data, in other words a data taxonomy. At this point, the focus should be on creating a high-level view of the private data that needs to be managed. In the case of GDPR, that is likely to include some data related to customers and some related to employees. Drilling down into employee data, that is likely to include information about performance, salary, benefits and even health or family data. Business-facing tools, such as a business glossary, may be needed to complete this task.
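A business glossary is essentially a tree of business terms. The sketch below models one, assuming a simple `personal_data` flag per term; the term names mirror the examples above, but the `GlossaryTerm` class and its fields are hypothetical and do not represent any particular product's data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """One node of a hypothetical business glossary: a business concept,
    optionally flagged as personal data, with narrower terms beneath it."""
    name: str
    personal_data: bool = False
    children: list[GlossaryTerm] = field(default_factory=list)

    def add(self, child: GlossaryTerm) -> GlossaryTerm:
        self.children.append(child)
        return child

    def personal_paths(self, prefix: tuple[str, ...] = ()):
        """Yield the full path of every term flagged as personal data."""
        path = prefix + (self.name,)
        if self.personal_data:
            yield " / ".join(path)
        for child in self.children:
            yield from child.personal_paths(path)


def build_gdpr_taxonomy() -> GlossaryTerm:
    """Build an illustrative taxonomy from the examples in the text."""
    root = GlossaryTerm("Private data")
    customer = root.add(GlossaryTerm("Customer"))
    customer.add(GlossaryTerm("Contact details", personal_data=True))
    employee = root.add(GlossaryTerm("Employee"))
    for area in ("Performance", "Salary", "Benefits", "Health", "Family"):
        employee.add(GlossaryTerm(area, personal_data=True))
    return root


if __name__ == "__main__":
    for path in build_gdpr_taxonomy().personal_paths():
        print(path)
```

Keeping the taxonomy as a tree means later steps (ownership, retention, stitching) can attach to any level, from "Employee" down to "Health".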
The next stage for the business is to assign responsibility for the different data areas. This involves deciding who takes care of employees’ health data, for example, or who looks after their performance details. In parallel, organizations can start to define the foundations of their approach to data policy, something which typically includes outlining their data retention strategy – how long they need to keep certain types of data before archiving or deleting it.
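Ownership and retention can be recorded alongside each data area. The following is a minimal sketch, assuming each area has one accountable owner and two thresholds (archive, then delete) measured in days; the area keys, owners and periods are illustrative placeholders, not recommended values.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical policy for one data area: the accountable owner and the
    number of days before records are archived and later deleted."""
    area: str
    owner: str
    archive_after_days: int
    delete_after_days: int


# Owners and periods are illustrative placeholders, not recommended values.
POLICIES = {
    policy.area: policy
    for policy in (
        DataPolicy("employee.health", "HR medical officer", 365, 5 * 365),
        DataPolicy("employee.performance", "HR business partner", 2 * 365, 6 * 365),
    )
}


def retention_action(area: str, created: date, today: date) -> str:
    """Return 'keep', 'archive' or 'delete' for a record in `area`,
    based on how many days old the record is."""
    policy = POLICIES[area]
    age_days = (today - created).days
    if age_days >= policy.delete_after_days:
        return "delete"
    if age_days >= policy.archive_after_days:
        return "archive"
    return "keep"
```

For example, `retention_action("employee.health", date(2017, 1, 1), date(2018, 6, 1))` returns `"archive"`, because the record is older than one year but younger than five.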
Businesses will, of course, also need to drill further into the data. If they are dealing specifically with identity data, for example, they will need to identify and process all of its critical data elements. For identity, that may mean the individual's passport number, date of birth and gender, how many children they have and whether they are married.
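The identity elements listed above can be captured as explicit critical data element (CDE) definitions. This sketch assumes a three-level sensitivity scale; the levels, data types and field names are hypothetical labels for illustration and are not GDPR's legal categories.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    """Hypothetical internal sensitivity scale (not a GDPR legal category)."""
    INTERNAL = 1
    PERSONAL = 2
    SENSITIVE = 3


@dataclass(frozen=True)
class CriticalDataElement:
    """Definition of one critical data element for the identity domain."""
    name: str
    datatype: str
    sensitivity: Sensitivity


IDENTITY_CDES = [
    CriticalDataElement("passport_number", "string", Sensitivity.SENSITIVE),
    CriticalDataElement("date_of_birth", "date", Sensitivity.PERSONAL),
    CriticalDataElement("gender", "string", Sensitivity.PERSONAL),
    CriticalDataElement("number_of_children", "integer", Sensitivity.PERSONAL),
    CriticalDataElement("marital_status", "string", Sensitivity.PERSONAL),
]


def elements_at_least(cdes, level: Sensitivity):
    """Names of the elements whose sensitivity is at or above `level`."""
    return [cde.name for cde in cdes if cde.sensitivity.value >= level.value]
```

With these definitions, `elements_at_least(IDENTITY_CDES, Sensitivity.SENSITIVE)` narrows the list to the passport number, which is the kind of filter a team might use to decide where to apply the strictest controls first.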
Once this whole process has been undertaken, the business will understand the datasets it needs to control in this context. It doesn’t necessarily know where all this data resides, but it does at least understand what information needs to be managed and what data will need to be considered when a customer asks for information to be changed or deleted. The business may also need to implement technology to connect to the data in order to maintain its quality and ensure it is kept consistently accurate and up-to-date.
Making the Right Connections
When it comes to connecting to the data, businesses will have to apply a metadata management technique known as 'stitching', which involves connecting each data element to the physical system that manages it. If the organization is looking at identity data specifically, it should connect to the HR system and possibly also the payroll system. Beyond that, it may need to consider that identity data will also be in the recruitment system, because before the person became an employee, they were a candidate. It should also consider the travel and expense management system, which might hold sensitive information such as credit card numbers.
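At its simplest, stitching produces a catalog that links each logical element to the physical fields holding it. Here is a minimal in-memory sketch; the `StitchingCatalog` class and every system, table and column name are hypothetical, chosen to mirror the HR, payroll, recruitment and travel-and-expense example above.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class PhysicalField(NamedTuple):
    system: str
    table: str
    column: str


class StitchingCatalog:
    """Minimal in-memory catalog linking logical data elements to the
    physical fields that hold them."""

    def __init__(self) -> None:
        self._links: dict[str, set[PhysicalField]] = defaultdict(set)

    def stitch(self, element: str, location: PhysicalField) -> None:
        self._links[element].add(location)

    def locations(self, element: str) -> list[PhysicalField]:
        return sorted(self._links.get(element, set()))

    def systems_holding(self, element: str) -> list[str]:
        return sorted({loc.system for loc in self._links.get(element, set())})


def build_identity_catalog() -> StitchingCatalog:
    """Illustrative links for the identity example; all names are made up."""
    catalog = StitchingCatalog()
    catalog.stitch("date_of_birth", PhysicalField("hr", "employee", "birth_date"))
    catalog.stitch("date_of_birth", PhysicalField("payroll", "staff", "dob"))
    catalog.stitch("date_of_birth", PhysicalField("recruitment", "candidate", "dob"))
    catalog.stitch("credit_card_number", PhysicalField("travel_expense", "card", "pan"))
    return catalog
```

Once such a catalog exists, "where is date of birth held?" becomes a lookup, `build_identity_catalog().systems_holding("date_of_birth")`, rather than a round of emails between departments.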
To ensure compliance, businesses will need to carry out this stitching process; in other words, they will need to make a physical connection to the actual data they are managing. Some tools now emerging enable this process to be carried out semi-automatically. Starting from the high-level definitions described earlier, they can map a logical attribute directly to a physical field when the attribute names are identical, or otherwise link the two through correspondence rules that connect the logical, high-level data to the physical data. This also means that when a candidate becomes an employee, a data integration job can be run to move the candidate's data from the recruitment system into the HR system, and that job helps to draw the lineage between the different pieces of data.
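The semi-automatic matching described above can be sketched in a few lines: exact name matches first, then a correspondence table, and anything left over goes to a person for review. The correspondence entries and column names here are hypothetical, and real tools typically use richer heuristics than plain string equality.

```python
# Hypothetical correspondence table: physical column names that a data
# steward has declared equivalent to a logical glossary attribute.
CORRESPONDENCES = {
    "date_of_birth": {"dob", "birth_date", "birthdate"},
    "passport_number": {"passport_no", "pp_num"},
}


def match_columns(logical_attrs, physical_columns, correspondences=CORRESPONDENCES):
    """Propose logical-to-physical links.

    A column is linked directly when its name equals a logical attribute
    (case-insensitively); otherwise the correspondence table is consulted.
    Columns that match nothing are returned for manual review, which is
    what keeps the process semi-automatic rather than fully automatic.
    """
    links, unmatched = {}, []
    for column in physical_columns:
        name = column.lower()
        if name in logical_attrs:
            links[column] = (name, "exact")
            continue
        for attr in logical_attrs:
            if name in correspondences.get(attr, set()):
                links[column] = (attr, "correspondence")
                break
        else:  # no correspondence found for this column
            unmatched.append(column)
    return links, unmatched
```

For a recruitment table with columns `Gender`, `DOB`, `pp_num` and `hire_date`, this links the first three (one exactly, two through correspondences) and leaves `hire_date` for a data steward to decide on.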
Foundations in Place
At this point, the business has come a long way in its metadata management journey. It has developed the kind of dynamic mapping described in the earlier GPS analogy. All the finer-grained data elements have been defined and linked to all the systems that use them, and the dependencies between those systems have been established. This solid mapping foundation makes it easier to make adjustments further down the line. If the business needs to change the format of its data, for example by using four digits for the year instead of two, the change is far easier to achieve. It can get answers to questions such as: where is this data first captured? If I change it in the HR system, what is the impact elsewhere? Should I change the data integration job that takes the data from the recruitment system, or should I simply propagate the four-digit year down to the HR application?
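The questions above map onto two graph walks over lineage: upstream to find where a field originates, and downstream to find everything a change would affect. This sketch assumes lineage is stored as simple (source, target, job) edges; the field paths and job names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source field, target field, job that moves it).
LINEAGE = [
    ("recruitment.candidate.dob", "hr.employee.birth_date", "job_hire_candidate"),
    ("hr.employee.birth_date", "payroll.staff.dob", "job_sync_payroll"),
    ("hr.employee.birth_date", "benefits.member.dob", "job_sync_benefits"),
]


def downstream_impact(field, edges=LINEAGE):
    """Breadth-first walk from `field`, returning every downstream field
    together with the integration job that feeds it."""
    graph = defaultdict(list)
    for source, target, job in edges:
        graph[source].append((target, job))
    impacted, seen, queue = [], {field}, deque([field])
    while queue:
        current = queue.popleft()
        for target, job in graph[current]:
            if target not in seen:
                seen.add(target)
                impacted.append((target, job))
                queue.append(target)
    return impacted


def origins(field, edges=LINEAGE):
    """Walk upstream to the fields with no further source, i.e. where the
    data is first captured."""
    parents = defaultdict(list)
    for source, target, _ in edges:
        parents[target].append(source)
    result, stack, seen = [], [field], {field}
    while stack:
        current = stack.pop()
        if not parents[current]:
            result.append(current)
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return result
```

Here `origins("payroll.staff.dob")` points back to the recruitment system, and `downstream_impact("hr.employee.birth_date")` lists the payroll and benefits fields and the jobs that would need reviewing if the year format changed in HR.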
The same principles apply to data masking. The organization can leverage its mapping and data integration capabilities to start applying guidelines to the data. It might want to disguise the exact birth date of a given individual within a system, for example, or mask date of birth information completely in the recruitment system to avoid segregating younger and older candidates.
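The two masking options just described, partial and complete, can be expressed as per-system rules. In this minimal sketch, the rule table, the `dob` field name and the output formats are hypothetical; production masking would normally run inside the integration jobs rather than on Python dictionaries.

```python
from __future__ import annotations

from datetime import date


def generalize_to_year(dob: date) -> str:
    """Partial masking: keep only the birth year, hiding the exact date."""
    return f"{dob.year}-XX-XX"


def redact(_: date) -> str:
    """Full masking: hide the value entirely."""
    return "REDACTED"


# Hypothetical per-system rules: HR keeps the year only, while recruitment
# hides date of birth completely so candidates are not segregated by age.
MASKING_RULES = {
    "hr": generalize_to_year,
    "recruitment": redact,
}


def apply_masking(system: str, record: dict, field: str = "dob") -> dict:
    """Return a copy of `record` with `field` masked according to the
    system's rule; systems without a rule get an unchanged copy back."""
    masked = dict(record)
    rule = MASKING_RULES.get(system)
    if rule is not None and field in masked:
        masked[field] = rule(masked[field])
    return masked
```

Because the rule is looked up by system, the same record can be exposed with the year only in HR and not at all in recruitment, while the original record is never modified.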
As we have seen, good metadata management is about having a dynamic view of the data. To use the GPS analogy once again, you need to be able to see the route to your customer, roughly where their offices are and how long it will take you to drive there. But you also need to be able to act whenever an exception occurs. Metadata management is not simply about mapping and visualizing the data; it is also about knowing how to act when there is a problem, and about guiding that action. After all, today's GPS systems don't just tell you that there is a traffic jam; they also suggest another route. Businesses can gain the same kind of benefit from metadata management: whenever a change or a new regulation is introduced, the metadata management tool should guide the business toward the right action on its data.
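The "suggest another route" idea can be sketched as a small playbook that turns a change event into concrete actions on every stitched location. Everything here is hypothetical: the event kinds, the playbook wording and the location list stand in for what a metadata management tool would derive from its own catalog and lineage.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    element: str  # logical data element affected
    kind: str     # e.g. "format_change", "new_masking_rule", "erasure_request"


# Hypothetical playbook mapping each event kind to an action template.
PLAYBOOK = {
    "format_change": "update schema and integration jobs for {loc}",
    "new_masking_rule": "apply masking rule to {loc}",
    "erasure_request": "delete the subject's value in {loc}",
}

# Hypothetical stitching results: logical element -> physical locations.
LOCATIONS = {
    "date_of_birth": ["hr.employee.birth_date", "recruitment.candidate.dob"],
}


def recommend(event: ChangeEvent) -> list[str]:
    """Suggest one action per physical location of the affected element,
    or escalate when no playbook entry covers the event."""
    template = PLAYBOOK.get(event.kind)
    if template is None:
        return [f"no playbook for '{event.kind}': escalate to data owner"]
    return [template.format(loc=loc) for loc in LOCATIONS.get(event.element, [])]
```

The design choice is deliberately simple: the tool does not act on its own, it proposes location-specific actions and falls back to the data owner assigned earlier when a situation is not covered.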
Technology Whose Time Has Come
In the past, despite the rapid growth in data volumes affecting multiple industry sectors, the market for metadata management remained largely restricted to banking, financial services and other highly regulated industries. The advent of GDPR, and the demands it places on companies of all sizes and types, has moved metadata management up the priority list for all businesses.
What this means will vary from company to company. Some businesses will use existing software to document their data and then focus on keeping records accurate and up-to-date by evolving their systems over time. However, as time goes by and the importance of this approach becomes increasingly clear, more and more companies will opt to commit themselves fully to a metadata management approach and to the growing portfolio of technologies being brought in to support it.
The post Data Mapping Essentials in the GDPR Era appeared first on Talend Real-Time Open Source Data Integration Software.