Protect Your Data / Protect Your Company

Published by admin on

Data privacy is a serious topic around the globe.  As data scientists, we treat data security as a top priority.

For sensitive data, proper protection such as encryption and masking is vital.  We would like to share some of the solutions available on the market.

  1. Data Masking Tools
  2. Data Virtualization

Sensitive personal data needs masking in many forms, from Excel files to relational databases and even Big Data platforms such as Hadoop.  We help online and offline retailers protect personal data, such as contact information, with these solutions.

Data Masking Tools

Many security tools provide data encryption features.  For instance, IRI (a US software company with 40 years of history) produces masking products such as FieldShield, which de-identifies data subject to CIPSEA, DPA, FERPA, GDPR, GLBA, HIPAA, PCI, POPI, etc.  In Hong Kong, it is a good exercise to map a tool's capabilities to the requirements of the Hong Kong PDPO.  FieldShield is also a good fit for protecting data at the column level.

Figure 1. Data Masking Example

In short, data masking encrypts or otherwise obscures sensitive data so that others cannot understand it.  The IRI tool can now mask not only database columns but also Excel files.  IRI's Excel column encryption feature is still unique on the market (as of 13 Aug 2019).
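To make the idea of column-level masking concrete, here is a minimal Python sketch.  It is not how FieldShield or any other product works internally; the column names, the sample records, the key, and the two masking functions (`pseudonymize` and `redact_keep_last`) are all assumptions made up for illustration.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would come from a key store.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(value: str) -> str:
    """Replace a value with a keyed, deterministic token (truncated HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def redact_keep_last(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide every character except the last `visible` ones."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


# Column name -> masking function; columns not listed pass through unchanged.
MASKING_RULES = {
    "email": pseudonymize,
    "phone": redact_keep_last,
}


def mask_row(row: dict) -> dict:
    """Apply the per-column masking rules to one record."""
    return {col: MASKING_RULES.get(col, lambda v: v)(val) for col, val in row.items()}


# Made-up sample records.
customers = [
    {"id": "1001", "email": "amy@example.com", "phone": "91234567"},
    {"id": "1002", "email": "bob@example.com", "phone": "61239876"},
]
masked = [mask_row(r) for r in customers]
```

Because the token is deterministic, the same email always maps to the same token, so masked tables can still be joined on that column.  The phone rule keeps only the last four digits readable.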

Data Virtualization

Data virtualization is becoming increasingly popular.  It is a very good choice for solving the data silo problem.  However, because a data virtualization layer gives access to many sources from one place, data access control is one of its key areas.  For the CDS team, data virtualization is recommended when more than 3 different data sources are involved.

There are several vital areas:

  1. Multi-level access controls at the database, view, row, column, and cell levels.
  2. Policy-based security and Workload Management (enforcement of customer policies for query execution according to security & workload considerations)
  3. Acting as a single point of access to all information, avoiding point-to-point connections to sources.
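The first and third points can be sketched in a few lines of Python: one function acts as the single access point and applies a per-role policy at row and column level.  This is only an illustration under assumed names; the roles, the `Policy` class, the `query_view` function, and the sample rows are hypothetical and do not reflect any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    columns: frozenset                  # columns the role may see
    row_filter: Callable[[dict], bool]  # predicate selecting the rows the role may see


# Hypothetical roles and rules.
POLICIES = {
    "analyst": Policy(frozenset({"id", "region", "spend"}), lambda r: True),
    "hk_support": Policy(frozenset({"id", "region", "phone"}), lambda r: r["region"] == "HK"),
}


def query_view(role: str, rows: list) -> list:
    """Single access point: apply the role's row filter, then drop hidden columns."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"no policy for role {role!r}")
    return [
        {col: val for col, val in row.items() if col in policy.columns}
        for row in rows
        if policy.row_filter(row)
    ]


# Made-up rows standing in for a combined view over several sources.
ROWS = [
    {"id": 1, "region": "HK", "phone": "91234567", "spend": 120.0},
    {"id": 2, "region": "SG", "phone": "81234567", "spend": 80.0},
]
```

Here `query_view("analyst", ROWS)` returns both rows without the phone column, while `query_view("hk_support", ROWS)` returns only the Hong Kong row, without the spend column.  A role with no policy is rejected outright.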

Figure 2. Data Virtualization Example (Denodo)

Other Security Solutions available

Apart from data masking and data virtualization, there are additional tools to protect data, such as enterprise database security and compliance products.  We suggest placing a “firewall” or “proxy” style security application in front of vital databases or other repositories such as HBase.

Such an application should have the features below:

  • Control database access for defined users and groups (on top of the database engine)
  • Approve SQL activity (statement and workflow) using black / white lists
  • Monitor DB systems with / without having to connect to the DB
  • Log, report, and dashboard PII access, activity and alert details
  • Perform selective dynamic data masking (DDM)
  • Detect and protect against log alteration/forgery
  • Comply with Hong Kong and international data privacy laws
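Two of these features, SQL approval with black / white lists and detection of log alteration, can be sketched in Python as below.  This is a simplified illustration, not a description of any product: the allowed statement types, the blocked `hkid` column, and the `is_approved`, `AuditLog`, and `handle` names are all assumptions.  The approval check only looks at the first keyword and a pattern match, so a real proxy would need a proper SQL parser.

```python
import hashlib
import json
import re

# White list of statement types and black list of patterns (both hypothetical).
ALLOWED_STATEMENTS = {"SELECT"}
BLOCKED_PATTERNS = [re.compile(r"\bhkid\b", re.IGNORECASE)]


def is_approved(sql: str) -> bool:
    """Approve only allowed statement types that match no blocked pattern."""
    words = sql.strip().split(None, 1)
    if not words or words[0].upper() not in ALLOWED_STATEMENTS:
        return False
    return not any(p.search(sql) for p in BLOCKED_PATTERNS)


class AuditLog:
    """Append-only log in which each entry's hash covers the previous entry's hash."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash: str, record: dict) -> str:
        payload = json.dumps(record, sort_keys=True) + prev_hash
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, user: str, sql: str, approved: bool) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        record = {"user": user, "sql": sql, "approved": approved}
        self.entries.append({"record": record, "hash": self._digest(prev_hash, record)})

    def verify(self) -> bool:
        """Recompute the chain; an edited record no longer matches its stored hash."""
        prev_hash = ""
        for entry in self.entries:
            if entry["hash"] != self._digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True


def handle(log: AuditLog, user: str, sql: str) -> bool:
    """Decide on a statement and record the decision in the audit log."""
    approved = is_approved(sql)
    log.append(user, sql, approved)
    return approved
```

Every decision, approved or not, is logged.  If someone later edits a stored record, `verify()` returns `False` because the recomputed hash no longer matches.  An attacker who can rewrite every hash in the chain would still go unnoticed, so in practice the latest hash would also need to be stored or signed somewhere outside the log.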

Figure 3. Database Proxy Example (IRI)

Conclusion

To sum up, protecting huge volumes of data during the analytic process is now a difficult challenge.  Some types of data, such as personal identity data and individual patients' medical information, raise greater security concerns.  It is therefore better to protect your company's data collection today than to see your company in the newspaper after a data breach.