Data Governance and Protection Using Microsoft Purview

By Kevin Dillaway on October 26, 2022

How to make the most of your Microsoft licensing to secure your environment – Part 3

In this multi-part blog series, we are walking you through the best ways to take on common challenges and succeed in implementing your M365 licensing while increasing your ability to secure the environment. Over the past couple of months, we have covered:

How to Start
Identity

Over the next few months, we will cover:

Dealing with Data Protection and Visibility (this post)
Managing the Endpoints
Deploying Defender solutions
Extending security across SaaS Applications

While Part 1 (How to Start) focused on the foundational components of tenant configuration and Part 2 (Identity) covered the tactical Microsoft components for identity, Part 3 dives into Microsoft capabilities for securing and governing data across the environment. Data protection and identity are the underpinnings for securing both your Microsoft cloud environment and those of other providers.

Take a Phased Approach

The biggest issue companies have when trying to protect data is that they try to do everything at once. Spyglass has found that a more programmatic approach, phasing in pieces of an overall solution over time, works better than trying to implement everything at once for all kinds of data. It also allows more time to tune each solution as additional data becomes available. The topics below outline the order in which many customers have had success.

Discovery

The first step in any project should be a discovery or assessment of what is actually in the environment. The output of the discovery should prioritize data by answering two questions:

What are the most critical types of data in the environment?
- How do I identify that data?
- Where does and should that data reside?
- Who has access to that data?
- Are there any specific regulatory or corporate requirements for the data?
What are the specific use cases that users have for both internal and external sharing of data?

Based on the answers to these questions, a plan can be put in place to make sure that the highest priority components of the overall Data Governance and Protection strategy are taken care of first, and that they target the right people, the right data, and have the right controls on them.

App Protection Policies

App Protection Policies are a component of Mobile Application Management (MAM), which will be discussed in further detail in the endpoint management part of this blog series. The policies focus on containerizing the data associated with corporate credentials in mobile applications. This prevents a user’s personal data from mixing with their corporate data within a specific app. For example, consider a user who uses Outlook to access both personal and work email. Once a policy has been configured to cover Outlook, the two accounts within the app are separated, and controls can be put in place to restrict what can be done with the corporate data.

The policies can also require the user to authenticate to the app with a PIN or biometrics, so that even if the phone is unlocked, the user still needs to sign in to the application. Once the policy applies, corporate data is often blocked from being printed or from being copied anywhere other than another corporate location or application. Additionally, the policies can require a minimum OS version, detect jailbroken or rooted devices, and automatically wipe corporate app data, among other capabilities.

The key difference between MAM and Mobile Device Management (MDM) is that MAM does not require the device to be enrolled. This means it can be deployed more quickly and requires less oversight of the user’s device. Because MAM’s capabilities are so broad, and because it starts covering corporate data in the targeted apps as soon as the policy is applied, it is often one of the first places to start.
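To make the launch checks described above concrete, here is a minimal Python sketch of how such a policy might evaluate a device when the corporate account is opened, plus the "corporate data only flows to corporate apps" rule. The class names, fields, and action strings are hypothetical and are not the Intune or Microsoft Graph schema; real policies offer many more settings.

```python
from dataclasses import dataclass

# Hypothetical policy model; these names are NOT the Intune/Graph schema.
@dataclass
class AppProtectionPolicy:
    min_os_version: tuple            # e.g. (16, 0)
    require_pin: bool = True         # PIN or biometric prompt on app launch
    wipe_on_jailbreak: bool = True   # removes corporate app data only

@dataclass
class DeviceState:
    os_version: tuple
    jailbroken: bool
    app_pin_verified: bool

def evaluate_launch(policy: AppProtectionPolicy, device: DeviceState) -> str:
    """Return the action taken when the corporate account is opened in the app."""
    if device.jailbroken:
        return "wipe_corporate_data" if policy.wipe_on_jailbreak else "block"
    if device.os_version < policy.min_os_version:
        return "block"
    if policy.require_pin and not device.app_pin_verified:
        return "prompt_for_pin"
    return "allow"

def can_paste(source_is_managed: bool, target_is_managed: bool) -> bool:
    """Corporate data may only flow into another managed (corporate) app."""
    return target_is_managed or not source_is_managed

policy = AppProtectionPolicy(min_os_version=(16, 0))
print(evaluate_launch(policy, DeviceState((17, 2), False, False)))  # prompt_for_pin
```

Note that no device enrollment appears anywhere in the model: everything is evaluated per app, which is the property that makes MAM quick to roll out.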

Data Loss Prevention

Once data on mobile devices is protected, focus should shift to identifying the types of sensitive information in the environment and how widely they have spread. To do this, Data Loss Prevention (DLP) can be used for two overarching purposes:

Provide visibility into where and how often sensitive information is interacted with or exchanged in the environment.
Attach controls, alerts, and policy tips to matches of Sensitive Information Types (SITs).

DLP starts with SITs. Once the SITs that need to be tracked are defined, policies can be created with logic to find that data. The policies are then targeted to the specific locations and/or users that interact with the SITs. SITs can be based on:

Templates provided by Microsoft (Financial, Medical, Privacy, etc.)
Custom templates based on a combination of:
- Functions from Microsoft
- Keyword lists
- Keyword dictionaries
- Regular Expressions (RegEx)
Exact Data Matching (EDM)
Trainable Classifiers
Document Fingerprinting
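The following minimal sketch shows the idea behind a custom SIT built from a regular expression plus a keyword list: the regex finds candidates, and a nearby keyword raises confidence. The employee ID format, keyword list, 300-character window, and confidence values are all assumptions for illustration, loosely modeled on how custom SITs pair a primary element with supporting evidence; this is not Purview's matching engine.

```python
import re
from dataclasses import dataclass

# Hypothetical custom SIT: an internal employee ID such as "EMP-123456".
PRIMARY = re.compile(r"\bEMP-\d{6}\b")              # regular expression element
KEYWORDS = ("employee", "staff id", "personnel")   # keyword list (supporting evidence)
PROXIMITY = 300                                    # characters searched around each hit

@dataclass
class SitMatch:
    value: str
    confidence: int  # higher when supporting evidence is found nearby

def find_employee_ids(text: str) -> list:
    """Return every regex hit, scored by whether a keyword appears nearby."""
    lowered = text.lower()
    results = []
    for m in PRIMARY.finditer(text):
        window = lowered[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        corroborated = any(keyword in window for keyword in KEYWORDS)
        results.append(SitMatch(m.group(), 85 if corroborated else 65))
    return results

print(find_employee_ids("Employee record for EMP-123456"))
# [SitMatch(value='EMP-123456', confidence=85)]
```

Requiring supporting keywords for the higher confidence level is what keeps a bare regex from flooding DLP with false positives.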

Once the SITs are defined, DLP policies use rules that set the number of matches required to trigger and what happens once the rule is triggered. These actions could include:

Encryption
Blocking
Alerting
Notifying the user that the content may contain a SIT

When policies and rules are first created, they should be run in “Test Mode” so that they report what is being triggered and discovered without affecting users. Use this information to fine-tune the policies until the number of matches is something a helpdesk or other support organization can readily handle. Once the policies are tuned, they can be enabled for users.
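Rule thresholds, actions, and Test Mode can be sketched together in Python. This is a hypothetical model, not the Purview policy schema: `DlpRule`, `DlpPolicy`, and the "High volume"/"Low volume" split are illustrative assumptions. The input is the list of confidence scores for SIT matches found in one item.

```python
from dataclasses import dataclass, field
from enum import Enum

class Action(Enum):
    NOTIFY_USER = "show policy tip"
    ALERT = "alert administrators"
    ENCRYPT = "encrypt"
    BLOCK = "block"

@dataclass
class DlpRule:
    name: str
    min_count: int        # matches required to trigger the rule
    min_confidence: int   # matches below this confidence are ignored
    actions: list

@dataclass
class DlpPolicy:
    rules: list               # checked in order; most restrictive rule first
    test_mode: bool = True    # report what would happen without enforcing it
    report: list = field(default_factory=list)

    def evaluate(self, item: str, confidences: list) -> list:
        """Return the actions to enforce for one item's SIT match confidences."""
        for rule in self.rules:
            hits = sum(1 for c in confidences if c >= rule.min_confidence)
            if hits >= rule.min_count:
                self.report.append((item, rule.name, hits))  # always recorded
                return [] if self.test_mode else rule.actions
        return []

policy = DlpPolicy(rules=[
    DlpRule("High volume", 10, 85, [Action.BLOCK, Action.ALERT]),
    DlpRule("Low volume", 1, 75, [Action.NOTIFY_USER]),
])
```

Checking the high-volume rule first means an item with many matches gets the stricter response rather than only a policy tip. Flipping `test_mode` to `False` stands in for enabling the policy once the report shows a manageable volume.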

Retention

The second phase of governance, after you know what sensitive data is out there, is to start limiting how long the data remains in the environment. In some cases, this will focus on making sure the data remains searchable for specific amounts of time for regulatory reasons, while in other cases, the goal is to make sure the data does not stay longer than needed and is purged from the environment.

Part of the governance plan is to take the types of data in the environment and determine the appropriate lifecycle for each. That lifecycle could be based on classification, content, location, or other attributes of the data. As these lifecycles are documented, retention policies can be created to put those timelines in place.

Retention settings can also be applied through a retention label. A typical retention policy applies only to a location, so when a file is moved to a new location, it becomes subject to the retention policy of the new location. A label, by contrast, stays with the item as it moves within Microsoft 365, so the item is retained for the dictated period wherever it goes. Additionally, a retention label can be used to declare a document a “Record”. Once this occurs, the document is locked against changes except by certain highly permissioned roles and the owner of the document.
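The difference between location-based policies and labels, and how a retention period resolves to an end date, can be sketched as follows. The class and field names are hypothetical; the end-of-period options mirror those offered for retention labels. When several settings apply to the same item, Purview resolves the conflict with its own rules, which this sketch does not model; it only lists which settings are in force.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional

class EndAction(Enum):
    DELETE = "delete automatically"
    DO_NOTHING = "do nothing"
    REVIEW = "start disposition review"

@dataclass(frozen=True)
class RetentionSetting:
    name: str
    retain_days: int
    start_from: str       # "created" or "modified" (hypothetical field)
    at_end: EndAction

def outcome(setting: RetentionSetting, created: date, modified: date, today: date):
    """Return (end of retention period, what happens on `today`)."""
    start = created if setting.start_from == "created" else modified
    expires = start + timedelta(days=setting.retain_days)
    return expires, ("retain" if today < expires else setting.at_end.value)

def applicable(label: Optional[RetentionSetting], location_policies: dict,
               location: str) -> list:
    """Settings in force for an item: its label (if any) plus its location's policy."""
    settings = [label] if label is not None else []   # the label moves with the item
    if location in location_policies:
        settings.append(location_policies[location])  # changes when the item moves
    return settings
```

Moving an item between locations changes only the second entry returned by `applicable`; the label entry stays put, which is the property the paragraph above describes.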

Classification (Labelling)

Building on both DLP and retention policies, classification (sensitivity) labels add metadata to data objects. This metadata can be used to create new SITs or to trigger specific retention policies. Because the labels are associated directly with the file, they can also apply additional protections and markings to it, such as:

Headers or Footers in certain kinds of documents
Watermarks in certain kinds of documents
Encryption
Permissions
Offline usage

Once the labels are created and configured, they can be published to users through a label policy. Users can then manually apply the appropriate label. Once a file is labelled, it can be searched for by that label, and policies outside the label itself can be used to control what happens to the file. Any controls within the file itself, such as encryption and permissions, stay with the file even when it leaves the company.

With the proper licensing, labels can also be applied automatically (auto-labelling), using a defined SIT (built-in or custom) to determine which label should be applied.
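Auto-labelling can be sketched as mapping SIT match counts to the most sensitive qualifying label, with each label carrying the markings and encryption flag listed above. The label names, markings, thresholds, and the "Employee ID" SIT are hypothetical, and the two behaviors marked in the comments (keeping a manually applied label, never downgrading an existing one) are assumptions of this sketch rather than documented Purview behavior.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SensitivityLabel:
    name: str
    priority: int                    # higher = more sensitive (assumed ordering)
    header: Optional[str] = None     # visual marking added to documents
    watermark: Optional[str] = None
    encrypt: bool = False

GENERAL = SensitivityLabel("General", 0)
CONFIDENTIAL = SensitivityLabel("Confidential", 1, header="CONFIDENTIAL")
HIGHLY_CONFIDENTIAL = SensitivityLabel(
    "Highly Confidential", 2, header="HIGHLY CONFIDENTIAL",
    watermark="Do not distribute", encrypt=True)

# Hypothetical auto-labelling conditions: SIT name -> (minimum matches, label).
AUTO_RULES = {
    "Credit Card Number": (1, HIGHLY_CONFIDENTIAL),
    "Employee ID": (5, CONFIDENTIAL),  # hypothetical custom SIT
}

def auto_label(sit_counts: dict, current: Optional[SensitivityLabel] = None,
               manual: bool = False) -> Optional[SensitivityLabel]:
    """Pick the most sensitive label whose SIT condition is met."""
    if manual and current is not None:
        return current               # assumption: a user's own choice is kept
    candidates = [label for sit, (minimum, label) in AUTO_RULES.items()
                  if sit_counts.get(sit, 0) >= minimum]
    if current is not None:
        candidates.append(current)   # assumption: never downgrade an existing label
    return max(candidates, key=lambda label: label.priority, default=None)
```

For example, a file with six employee IDs and one card number resolves to "Highly Confidential", so it receives the header, watermark, and encryption of that label.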

Extending to On-Premises and SaaS

One of the benefits of the more advanced licensing is the ability to extend the data governance and protection capabilities of Purview beyond the M365 suite. The two main areas are classification and DLP.

On-premises, DLP is handled locally on end-user devices by the Microsoft Defender for Endpoint engine. The DLP engine focuses on two things:

Scanning files and documents locally for anything that matches the published DLP rules and policies.
Applying any endpoint-specific policies created within the Compliance Portal. These policies control things like:
1. URLs that SITs should not be uploaded to
2. Bluetooth apps that are not allowed
3. Browser controls
4. File path exclusions
5. Auditing file activities

Microsoft continues to add new features to Endpoint DLP, and it is becoming an integral part of the DLP ecosystem.
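The endpoint settings listed above can be illustrated with a small Python model of an upload check and a Bluetooth check on a Windows device. The domain list, path exclusions, and function names are hypothetical, and in a real deployment these settings are configured in the Compliance Portal rather than written as code.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical endpoint DLP settings echoing the list above.
BLOCKED_DOMAINS = ["*.dropbox.com", "pastebin.com"]    # no SIT uploads to these
UNALLOWED_BLUETOOTH_APPS = {"fsquirt.exe"}            # example process name
EXCLUDED_PATHS = [r"c:\windows\*", r"c:\build\temp\*"]  # never inspected

def is_excluded(path: str) -> bool:
    return any(fnmatchcase(path.lower(), pattern) for pattern in EXCLUDED_PATHS)

def check_upload(path: str, has_sit: bool, url: str, audit: list) -> str:
    """Decide whether a browser upload of `path` to `url` is allowed."""
    if is_excluded(path) or not has_sit:
        return "allow"
    host = (urlparse(url).hostname or "").lower()
    if any(fnmatchcase(host, domain) for domain in BLOCKED_DOMAINS):
        audit.append(("blocked", path, host))
        return "block"
    audit.append(("uploaded", path, host))  # allowed, but the activity is audited
    return "allow"

def check_bluetooth(app: str, has_sit: bool) -> str:
    """Block Bluetooth transfers of sensitive files through unallowed apps."""
    return "block" if has_sit and app.lower() in UNALLOWED_BLUETOOTH_APPS else "allow"
```

Path exclusions are checked first, so files under an excluded folder are never evaluated or audited; that is why exclusions should be kept as narrow as possible.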

Classification can also be extended on-premises by using scanners, which crawl file shares and auto-label files based on the same criteria set for cloud-based documents. This works for any SMB-based file share in the environment.

Microsoft also extends DLP and labelling controls into SaaS applications through Defender for Cloud Apps. This solution can inspect traffic going to and from SaaS applications and look for either SITs or labels. When a policy is triggered, it can be configured to take certain actions, such as blocking the activity, only auditing it, or something in between. This will be expanded upon in a later post centered on Defender for Cloud Apps capabilities.

Structured vs. Unstructured Data

Purview contains a solution called Purview Data Map, which greatly expands the capabilities of Purview into other types of data. Historically, Purview has focused on the governance and protection of unstructured data (file shares, email, document libraries, chats, messages, Teams files, SaaS applications, etc.), but with Purview Data Map it is now possible to include structured data such as databases. A number of supported connectors are currently available for data sources in Azure, AWS, Google, Oracle, SAP, Salesforce, and other solutions. Once a data source is added, Data Map can scan it and, where supported, allow for:

Glossary terms to be applied
Classification of the data
Lineage information on how the data is related
Labelling of the data

Not all data sources support all capabilities, but the support is continuing to improve. The expansion into this space allows for a single set of data governance policies and controls to be leveraged across all data sources, not just the unstructured data.
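To illustrate how a scan might classify columns in a structured source, the sketch below samples each column of a SQLite table and assigns a classification when enough sampled values match a pattern. The rules, the sample size, and the 60% threshold are assumptions chosen for the example; Data Map's own classification rules and scan settings differ, and glossary terms and lineage are not modeled.

```python
import re
import sqlite3

# Hypothetical classification rules; a value counts only if it matches in full.
RULES = {
    "Email Address": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "U.S. ZIP Code": re.compile(r"\d{5}(-\d{4})?"),
}
SAMPLE_SIZE = 100   # rows sampled per column (assumption)
THRESHOLD = 0.6     # share of sampled non-null values that must match (assumption)

def classify_table(conn: sqlite3.Connection, table: str) -> dict:
    """Return {column: [classification, ...]} based on a sample of each column."""
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    result = {}
    for col in columns:
        rows = conn.execute(
            f'SELECT "{col}" FROM "{table}" WHERE "{col}" IS NOT NULL LIMIT ?',
            (SAMPLE_SIZE,))
        values = [str(value) for (value,) in rows]
        result[col] = [
            name for name, pattern in RULES.items()
            if values and sum(bool(pattern.fullmatch(v)) for v in values) / len(values) >= THRESHOLD
        ]
    return result

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, zip TEXT, notes TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
    (1, "a@contoso.com", "98052", "call back"),
    (2, "b@fabrikam.org", "10001-1234", None),
    (3, "not-an-email", "ABCDE", "vip"),
])
print(classify_table(conn, "customers"))
# {'id': [], 'email': ['Email Address'], 'zip': ['U.S. ZIP Code'], 'notes': []}
```

Using a match threshold rather than requiring every value to match lets a column with a few malformed entries still be classified.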

Insider Risk Management

Once all the other controls and policies are in place, the focus should shift from protecting data against outside threats to addressing threats from the inside. This includes creating DLP policies that target internal traffic, but it can also include the Microsoft solution called Insider Risk Management (IRM). IRM uses telemetry from the controls already set up to create baselines of how users work. When it sees deviations from those baselines, even in something like how much profanity a user uses in chat messages, it adjusts the user’s risk score. When the risk score reaches a set threshold, it notifies an administrator to perform an investigation.

IRM can also leverage additional sources of information, including HR systems and badge access systems that track where a user goes within a building. Both sources can be used to set an initial baseline for a user.
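The baseline-and-threshold idea can be sketched as follows. This is a conceptual illustration only: IRM's actual scoring model is not described here, and the signals, weights, HR multiplier, and alert threshold are hypothetical. Each signal is scored by how many standard deviations today's count sits above the user's own history.

```python
from statistics import mean, pstdev

# Hypothetical signals and weights; NOT IRM's real scoring model.
WEIGHTS = {"files_downloaded": 1.0, "files_to_usb": 2.0, "profanity_in_chat": 0.5}
ALERT_THRESHOLD = 10.0
HR_EVENT_MULTIPLIER = 1.5   # e.g. a resignation reported by an HR system (assumption)

def deviation(history: list, today: float) -> float:
    """Standard deviations above the user's baseline (0 if at or below it)."""
    sd = pstdev(history) or 1.0   # avoid dividing by zero on a flat history
    return max(0.0, (today - mean(history)) / sd)

def risk_score(baseline: dict, today: dict, hr_event: bool = False) -> float:
    score = sum(weight * deviation(baseline[signal], today.get(signal, 0.0))
                for signal, weight in WEIGHTS.items())
    return score * (HR_EVENT_MULTIPLIER if hr_event else 1.0)

def should_alert(score: float) -> bool:
    """Notify an administrator to investigate once the threshold is reached."""
    return score >= ALERT_THRESHOLD
```

Because each user is compared with their own history, the same activity can be routine for one person and anomalous for another, and an HR signal can push an otherwise borderline score over the threshold.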

Although companies would like to trust all people within the organization, IRM helps to determine if that trust is warranted. The reality is that the people already in the company are often the biggest threat to sensitive data and data exfiltration. This could be for a number of reasons, but it is a growing industry concern and should be the next major focus for data governance efforts.