Limiting Microsoft Copilot Data Exposure

Microsoft Copilot has been called one of the most powerful productivity tools on the planet.

Copilot is an AI assistant that lives inside each of your Microsoft 365 apps: Word, Excel, PowerPoint, Teams, Outlook, and more. Microsoft’s goal is to take the drudgery out of day-to-day work and let people focus on being creative problem solvers.

What makes Copilot a different beast from ChatGPT and other AI tools is that it has access to everything you’ve ever worked on in Microsoft 365. Copilot can quickly search and compile data from across your documents, presentations, email, calendar, notes, and contacts.

And therein lies the problem for information security teams. Copilot can access all the sensitive data a user has access to, which is often too much. On average, 10% of a company’s M365 data is open to all employees.

Copilot can also rapidly generate net-new sensitive data that must be protected. Even before the AI revolution, people’s ability to create and share data far outstripped the capacity to protect it. Just look at data breach trends. Generative AI pours kerosene on this fire.

There’s a lot to unpack when it comes to generative AI in general: model poisoning, hallucination, deepfakes, and more. In this post, however, I will focus specifically on data security and how your team can ensure a secure Copilot rollout.

Microsoft 365 Copilot use cases

The use cases of generative AI with a collaboration suite like M365 are limitless. It’s easy to see why many IT and security teams are clamoring to get early access and prepare their launch plans. Productivity increases can be huge.

For example, you can open a blank Word document and ask Copilot to draft a proposal for a client based on a target data set that may include OneNote pages, PowerPoint decks, and other Office documents. In a few seconds, you have a complete proposal.

Here are some more examples Microsoft gave at its launch event:

  • Copilot can join your Teams meetings and summarize in real time what was discussed, capture action items, and tell you which questions went unresolved in the meeting.
  • Copilot in Outlook can help you triage your inbox, prioritize emails, summarize threads, and generate replies for you.
  • Copilot in Excel can analyze raw data and provide you with insights, trends, and suggestions.

How Microsoft 365 Copilot works

Here’s a simple overview of how a Copilot prompt is processed:

  • A user inputs a prompt into an app such as Word, Outlook, or PowerPoint.
  • Microsoft gathers the user’s business context based on their M365 permissions.
  • The prompt is sent to the LLM (such as GPT-4) to generate a response.
  • Microsoft performs post-processing responsible AI checks.
  • Microsoft returns the response to the M365 app.
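To make these steps concrete, here is a toy Python model of the flow. Everything in it is hypothetical: the `Document` and `Tenant` classes, `gather_context`, and the stubbed `call_llm` and `post_process` functions are illustrative stand-ins, not Microsoft APIs, and no real model is called. What it is meant to show is step 2: grounding context is filtered only by what the user can already view, so an overshared file flows straight into the response.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    content: str
    viewers: set[str]  # users who can view this document (hypothetical model)


@dataclass
class Tenant:
    documents: list[Document] = field(default_factory=list)


def gather_context(tenant: Tenant, user: str) -> list[Document]:
    # Step 2: only documents the user can already view are used as grounding.
    return [doc for doc in tenant.documents if user in doc.viewers]


def call_llm(prompt: str, context: list[Document]) -> str:
    # Step 3: stub standing in for the real LLM call.
    sources = ", ".join(doc.name for doc in context)
    return f"Draft for '{prompt}' using: {sources}"


def post_process(response: str) -> str:
    # Step 4: placeholder for responsible AI checks; returns the text unchanged.
    return response


def handle_prompt(tenant: Tenant, user: str, prompt: str) -> str:
    # Step 1 is the user typing `prompt`; step 5 is handing the result back to the app.
    context = gather_context(tenant, user)
    return post_process(call_llm(prompt, context))


if __name__ == "__main__":
    tenant = Tenant([
        Document("q3-plan.docx", "...", {"alice", "bob"}),
        Document("salaries.xlsx", "...", {"hr-admin"}),
    ])
    print(handle_prompt(tenant, "alice", "client proposal"))
    # Draft for 'client proposal' using: q3-plan.docx

    # Oversharing the spreadsheet puts it straight into Alice's context.
    tenant.documents[1].viewers.add("alice")
    print(handle_prompt(tenant, "alice", "client proposal"))
    # Draft for 'client proposal' using: q3-plan.docx, salaries.xlsx
```

Nothing in the pipeline asks whether Alice *should* see the spreadsheet; it only asks whether she *can*.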

The security model of Microsoft 365 Copilot

With Microsoft, there is always a great tension between productivity and security.

This was demonstrated during the COVID-19 pandemic, when IT teams rushed to deploy Microsoft Teams without first fully understanding how the underlying security model worked or what state their organization’s M365 permissions, groups, and link policies were in.

The good news:

  • Tenant isolation. Copilot only uses data from the current user’s M365 tenant. The AI tool will not surface data from other tenants in which the user may be a guest, or from any tenants that may be set up with cross-tenant sync.
  • Training boundaries. Copilot does not use any of your business data to train the foundational LLMs that Copilot uses for all tenants. You shouldn’t have to worry about your proprietary data showing up in responses to other users in other tenants.

The bad news:

  • Permissions. Copilot surfaces all organizational data to which individual users have at least view permissions.
  • Labels. Copilot-generated content does not inherit the MPIP labels of the files Copilot sourced its response from.
  • People. Copilot responses are not guaranteed to be 100% accurate or secure; humans should take responsibility for reviewing AI-generated content.

Let’s look at the bad news one item at a time.

Permissions

Giving Copilot access to only what a user can access would be an excellent idea if companies could easily enforce least privilege in Microsoft 365.

Microsoft states in its Copilot data security documentation:

“It’s important that you use the permission models available in Microsoft 365 services, such as SharePoint, to help ensure that the right users or groups have the right access to the right content within your organization.”

Source: Data, Privacy, and Security for Microsoft 365 Copilot

We know empirically, however, that most organizations are about as far from least privilege as they can be. Just look at some of the statistics from Microsoft’s own State of Cloud Permissions Risk report.

This picture matches what Varonis sees when we perform thousands of Data Risk Assessments for companies using Microsoft 365 every year. In our report, The Big SaaS Data Exposure, we found that a typical M365 tenant has:

  • 40+ million unique permissions
  • 113K+ sensitive records shared with the public
  • 27K+ sharing links

Why does this happen? Microsoft 365 permissions are extremely complex. Just think of all the ways a user can gain access to data:

  • Direct user permissions
  • Microsoft 365 group permissions
  • SharePoint local permissions (with custom levels)
  • Guest access
  • External access
  • Public access
  • Link access (anyone, org-wide, direct, guest)

Even worse, permissions are often in the hands of end users, not the IT or security team.
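This matters for Copilot because effective access is a union: a single grant on any one path is enough. The simplified Python model below covers just four of the paths listed above (direct user permissions, group membership, an org-wide link, and an anyone link). It treats every non-guest user as covered by an org-wide link, which is a simplifying assumption rather than how SharePoint actually evaluates links, and all class and field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset[str] = frozenset()
    is_guest: bool = False


@dataclass
class FileGrants:
    # Hypothetical, simplified record of how a single file is shared.
    direct_users: set[str] = field(default_factory=set)  # direct user permissions
    groups: set[str] = field(default_factory=set)        # M365 / SharePoint groups
    org_wide_link: bool = False                          # "people in your organization" link
    anyone_link: bool = False                            # "anyone" link


def access_paths(user: User, grants: FileGrants) -> list[str]:
    """List every path through which this user can open the file."""
    paths = []
    if user.name in grants.direct_users:
        paths.append("direct")
    if user.groups & grants.groups:
        paths.append("group")
    # Simplifying assumption: an org-wide link covers every non-guest user.
    if grants.org_wide_link and not user.is_guest:
        paths.append("org-wide link")
    if grants.anyone_link:
        paths.append("anyone link")
    return paths


def can_view(user: User, grants: FileGrants) -> bool:
    # Effective access is the union of all paths: any single grant is enough.
    return bool(access_paths(user, grants))
```

In this model, removing a user from a group does not revoke access while an org-wide link still exists, which is the kind of hidden path that makes least privilege so hard to reach.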


Labels

Microsoft relies heavily on sensitivity labels to enforce DLP policies, apply encryption, and broadly prevent data leaks. In practice, however, getting labels to work is hard, especially when you rely on people to apply sensitivity labels.

Microsoft paints a rosy picture of labeling and blocking as the ultimate safety net for your data. The reality reveals a much darker scenario. As people create data, labeling often lags behind or becomes obsolete.

Blocking or encrypting data can add friction to workflows, and labeling technologies are limited to specific file types. The more labels an organization has, the more confusing it is for users. This is especially severe for large organizations.

The effectiveness of label-based data protection will surely degrade once AI is generating orders of magnitude more data, all of which needs accurate, automatically updated labels.
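One way to picture automatically updated labels is to have AI-generated content take the most restrictive label among the files it was built from. The sketch below assumes a hypothetical four-label taxonomy ordered from least to most restrictive; Microsoft Purview defines its own label priority order, and `inherited_label` is an illustration, not a Purview API.

```python
from __future__ import annotations

# Hypothetical label taxonomy, ordered from least to most restrictive.
LABEL_PRIORITY = ["Public", "General", "Confidential", "Highly Confidential"]


def inherited_label(source_labels: list[str | None]) -> str | None:
    """Most restrictive label among the source files, or None if none are labeled."""
    labeled = [label for label in source_labels if label is not None]
    for label in labeled:
        if label not in LABEL_PRIORITY:
            raise ValueError(f"unknown label: {label!r}")
    if not labeled:
        return None
    # A higher index in LABEL_PRIORITY means a more restrictive label.
    return max(labeled, key=LABEL_PRIORITY.index)
```

A proposal drafted from a "General" deck and a "Highly Confidential" pricing sheet would come out "Highly Confidential"; a result of `None` means no source was labeled, which is itself worth flagging.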

Are my labels okay?

Varonis can validate and improve an organization’s Microsoft sensitivity labeling by scanning, discovering, and remediating:

  • Sensitive files without labels
  • Sensitive files with incorrect labels
  • Non-sensitive files with a sensitive label
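The three failure cases above can be expressed as a small audit function. This is a minimal sketch under loud assumptions: the label taxonomy, the `Confidential` floor for sensitive content, and the single naive US Social Security number regex standing in for real classification are all hypothetical, and it does not describe how Varonis scans content.

```python
from __future__ import annotations

import re

# Hypothetical taxonomy (least to most restrictive) and minimum label for sensitive data.
LABEL_PRIORITY = ["Public", "General", "Confidential", "Highly Confidential"]
SENSITIVE_FLOOR = "Confidential"

# Naive stand-in for real classification: matches strings shaped like 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def is_sensitive(text: str) -> bool:
    return SSN_PATTERN.search(text) is not None


def audit(text: str, label: str | None) -> str:
    """Classify a file into one of the three problem cases, or "ok"."""
    sensitive = is_sensitive(text)
    if label is None:
        return "sensitive, unlabeled" if sensitive else "ok"
    rank = LABEL_PRIORITY.index(label)  # raises ValueError for unknown labels
    floor = LABEL_PRIORITY.index(SENSITIVE_FLOOR)
    if sensitive and rank < floor:
        return "sensitive, incorrect label"
    if not sensitive and rank >= floor:
        return "non-sensitive, sensitive label"
    return "ok"
```

Real classifiers use many detectors and confidence thresholds; the structure of the check (detected sensitivity compared against the applied label) is the part this sketch is meant to show.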


People

AI can make people lazy. Content generated by LLMs like GPT-4 isn’t just good; it’s great. In many cases, its speed and quality exceed what a single person can produce. As a result, people start trusting AI to produce safe and accurate responses.

We’ve seen real-world scenarios where Copilot drafts a proposal for a client and includes sensitive data belonging to a completely different client. The user hits “send” after a quick look (or no look), and now you have a privacy or data breach scenario on your hands.

Prepare your tenant security for Copilot

It’s important to have a sense of your data security posture before your Copilot rollout. Now that Copilot is generally available, it’s a good time to get your security controls in place.

Varonis protects thousands of Microsoft 365 customers with our Data Security Platform, which provides a real-time view of risk and the ability to automatically enforce least privilege.

We can help you solve Copilot’s biggest security risks with virtually no manual effort. With Varonis for Microsoft 365, you can:

  • Automatically discover and classify all sensitive AI-generated content.
  • Automatically ensure that MPIP labels are used correctly.
  • Automatically enforce least privilege permissions.
  • Continuously monitor sensitive data on M365 and alert and respond to abnormal behavior.
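As a rough illustration of where least-privilege enforcement starts, the sketch below flags items that are both sensitive and reachable through an anyone or org-wide sharing link. It assumes each item’s `permissions` list mimics Microsoft Graph permission objects, where a sharing link carries a `link.scope` of `anonymous`, `organization`, or `users`. How the permissions are fetched and how the `sensitive` flag is set are left out, the item record shape is hypothetical, and this is not how Varonis implements the feature.

```python
from __future__ import annotations

# Link scopes that reach beyond specifically named people.
BROAD_SCOPES = {"anonymous", "organization"}


def broad_links(permissions: list[dict]) -> list[str]:
    """Scopes of any sharing links broader than named people."""
    scopes = []
    for perm in permissions:
        link = perm.get("link") or {}  # direct grants have no "link" property
        scope = link.get("scope")
        if scope in BROAD_SCOPES:
            scopes.append(scope)
    return scopes


def overexposed(items: list[dict]) -> list[str]:
    """Names of sensitive items that also carry a broad sharing link.

    Each item is a hypothetical record:
    {"name": str, "sensitive": bool, "permissions": [Graph-like permission dicts]}
    """
    return [
        item["name"]
        for item in items
        if item.get("sensitive") and broad_links(item.get("permissions", []))
    ]
```

Sorting the result by sensitivity or by link scope is a natural next step, since an anyone link on a sensitive file is usually the first thing to remove.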

The best way to start is with a free risk assessment. It takes minutes to set up, and within a day or two, you’ll have a real-time view of your sensitive data risk.

This article originally appeared on the Varonis blog.
