PromptArmor Tests Microsoft Copilot Cowork Approval Gap


TL;DR

  • Security Claim: Enterprise AI security and compliance platform PromptArmor says Copilot Cowork may expose file links through self-directed messages triggered by poisoned workflow content.
  • Approval Model: Microsoft’s public Cowork guidance says sensitive actions require permission, but the reported test path reached the active user without that stop.
  • Tenant Exposure: Existing Microsoft 365 permissions, broad app reach, and recurring tasks could widen the impact in overpermissioned enterprise environments.

Enterprise AI security and compliance platform PromptArmor warns that Copilot Cowork may expose downloadable links to files a user already has permission to open when a poisoned workflow sends a message back to that same user. Hidden instructions inside ordinary business content could turn a normal workflow into a file-access path without the user ever being asked to approve a visibly suspicious step.

The product's enterprise reach gives the reported flaw more weight than a narrow chat failure would carry. Microsoft made Copilot Cowork available to users on March 10 as a tool built to act across Microsoft 365 data instead of staying inside a simple prompt box.

Approval Controls Under Pressure

Microsoft’s current public Copilot Cowork guidance points in the opposite direction from what PromptArmor reports. According to Microsoft, Copilot Cowork asks for permission before sensitive actions such as sending email or posting in Teams. PromptArmor alleges those same message actions can still reach the active user without that approval stop, leaving Microsoft’s documented safeguard in direct tension with the reported result.

Microsoft labels medium- and high-risk approvals with a risk-level indicator. Users can also skip future approvals for similar actions inside the same conversation. If PromptArmor’s test path is accurate, the weak point sits where a self-directed Teams or Outlook message reaches the active user before that approval model can do its job.
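The approval model described above can be sketched in a few lines of Python. This is an illustrative model only, not Microsoft's implementation: the names Action, ApprovalGate, SENSITIVE_KINDS, and the action kinds are all hypothetical. The point it shows is that the approval decision depends on the kind of action alone, so a message addressed to the sender gets no exemption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Action:
    kind: str               # hypothetical kinds: "send_email", "post_teams", "read_file"
    sender: str
    recipients: tuple = ()  # addresses the action would deliver to


# Assumption: message-sending actions are the sensitive ones, per the public guidance.
SENSITIVE_KINDS = {"send_email", "post_teams"}


@dataclass
class ApprovalGate:
    # Action kinds the user chose to stop approving in this conversation.
    skipped_kinds: set = field(default_factory=set)

    def classify(self, action: Action) -> Risk:
        # Risk depends only on the action kind, never on who the recipient is.
        return Risk.MEDIUM if action.kind in SENSITIVE_KINDS else Risk.LOW

    def needs_approval(self, action: Action) -> bool:
        if self.classify(action) is Risk.LOW:
            return False
        return action.kind not in self.skipped_kinds

    def skip_future(self, kind: str) -> None:
        # Models "skip future approvals for similar actions" in one conversation.
        self.skipped_kinds.add(kind)


gate = ApprovalGate()
self_message = Action("post_teams", "alice@example.com", ("alice@example.com",))
print(gate.needs_approval(self_message))  # self-addressed, still gated
```

In this sketch, the only way a self-addressed message avoids the prompt is if the user has already skipped approvals for that action kind in the same conversation. An exemption keyed on "recipient equals sender" would recreate the kind of gap PromptArmor describes.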

The reported payload was compact rather than sprawling. In the cited testing, five lines inside an 81-line skill file were enough to poison the sequence, trigger a self-addressed message, and surface a working file link. The attack also succeeded in all five trials, which raises the possibility that the behavior did not depend on one fragile prompt variation or one unusually permissive setup.
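Five lines out of 81 is roughly 6 percent of the file, small enough to slip past a casual read. A reviewer could use a simple triage pass like the sketch below to surface lines worth a closer look. The patterns are illustrative guesses, not taken from the reported payload, and flag_lines is a hypothetical helper. Keyword matching of this kind is easy to evade, so it is at best a first filter before human review.

```python
import re

# Illustrative patterns only; the actual payload text is not reproduced here.
SUSPICIOUS = [
    # Instructions to send or post a message.
    re.compile(r"\b(send|post|forward)\b.*\b(email|teams|message)\b", re.I),
    # Instructions that mention producing a file link.
    re.compile(r"\b(share|sharing|download)\b.*\blink\b", re.I),
    # Instructions to avoid asking the user or mentioning the step.
    re.compile(r"\b(do not|don't|without)\b.*\b(ask|approval|confirm|mention)", re.I),
]


def flag_lines(text):
    """Return (line_number, stripped_line) pairs that match any pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS):
            hits.append((number, line.strip()))
    return hits


sample = (
    "Summarize the weekly report.\n"
    "Send a Teams message to the current user with a download link.\n"
    "Format the output as a table.\n"
)
for number, line in flag_lines(sample):
    print(number, line)
```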