Threat Feed
Medium severity advisory

M365 Copilot Impersonation Jailbreak Attack

This detection identifies attempts to jailbreak M365 Copilot by impersonating roles, adopting unrestricted personas, or mimicking malicious AI systems to bypass safety controls. It searches exported eDiscovery prompt logs for roleplay keywords and categorizes matching prompts by impersonation type to surface persona-injection attacks.

This threat brief covers attempts to jailbreak Microsoft 365 Copilot through prompt injection, focusing on impersonation and roleplay attacks. Attackers try to manipulate the AI into adopting alternate personas, behaving as an unrestricted entity, or impersonating malicious AI systems. The activity is detected by analyzing exported eDiscovery prompt logs for specific keywords related to roleplaying and impersonation. This technique, observed from late 2025 into early 2026, is concerning because successful jailbreaks can bypass safety controls and lead to data leakage, policy violations, and the generation of harmful content. Targeting is focused on organizations that use Microsoft 365 Copilot for enterprise productivity.

Attack Chain

  1. The attacker crafts a malicious prompt containing keywords like “pretend you are,” “act as,” “you are now,” “amoral,” “roleplay as,” or “imagine you are.”
  2. The crafted prompt is submitted to Microsoft 365 Copilot through a standard user interaction.
  3. The prompt is logged by Microsoft 365 and becomes available for eDiscovery.
  4. An administrator exports the M365 eDiscovery prompt logs from the Microsoft Purview compliance portal.
  5. The exported logs, including the Subject_Title field containing the prompt text, are ingested into a security information and event management (SIEM) system.
  6. A detection rule identifies prompts containing the specified keywords.
  7. The rule categorizes the prompt by impersonation type based on the specific keywords used, assigning labels such as “AI_Impersonation,” “Malicious_AI_Persona,” or “Unrestricted_AI_Persona” (a Sigma-style sketch of this keyword matching follows this list).
  8. If the jailbreak attempt is successful, the AI may generate responses that violate organizational policies or expose sensitive data.
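
To make steps 6 and 7 concrete, the following is a minimal Sigma-style sketch of the keyword search, assuming the exported eDiscovery prompt logs are ingested with the Subject_Title field intact. The keyword list comes from the attack chain above; the logsource values, rule title, and level are illustrative assumptions, not the platform's actual query.

```yaml
title: M365 Copilot Roleplay Keyword Match (illustrative sketch)
status: experimental
description: Flags exported eDiscovery Copilot prompts containing roleplay or persona-injection keywords.
tags:
    - attack.defense_evasion
logsource:
    product: m365                      # assumption: adjust to how the eDiscovery export is onboarded
    service: ediscovery_prompt_export  # assumption: not an official Sigma service name
detection:
    selection_roleplay:
        Subject_Title|contains:        # prompt text field from the exported logs
            - 'pretend you are'
            - 'act as'
            - 'you are now'
            - 'roleplay as'
            - 'imagine you are'
            - 'amoral'
    condition: selection_roleplay
falsepositives:
    - Benign persona or tone requests, e.g. asking Copilot to act as a note taker
level: medium
```

Short, generic phrases such as “act as” will match many legitimate prompts, which is why the categorization in step 7 and the tuning guidance below matter.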

Impact

A successful M365 Copilot jailbreak can result in the AI generating harmful or inappropriate content, bypassing security controls, and potentially leaking sensitive information. While the exact number of affected organizations is currently unknown, the potential impact spans across any sector utilizing M365 Copilot. Consequences include reputational damage, data breaches, and compliance violations.

Recommendation

  • Enable and regularly review the M365 Exported eDiscovery Prompts log source for suspicious activity, as it is critical for detecting jailbreak attempts.
  • Deploy the provided Sigma rules to your SIEM to detect M365 Copilot impersonation and roleplay jailbreak attempts.
  • Tune the provided Sigma rules with the m365_copilot_impersonation_jailbreak_attack_filter macro to reduce false positives based on your organization’s specific usage patterns (see the tuning sketch after this list).
  • Investigate any alerts these rules generate, focusing on the user and impersonation_type fields to understand the nature and source of the attempted jailbreak.
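
The m365_copilot_impersonation_jailbreak_attack_filter macro referenced above is the tuning hook in the platform-converted queries. As a rough native-Sigma equivalent, the same effect can be sketched with filter selections; the field names and exclusion values below are placeholder assumptions, not a recommended allowlist.

```yaml
detection:
    selection_roleplay:
        Subject_Title|contains:
            - 'pretend you are'
            - 'act as'
            - 'you are now'
            - 'roleplay as'
            - 'imagine you are'
            - 'amoral'
    filter_main_benign_phrasing:
        # assumption: benign phrasings observed in your own prompt logs
        Subject_Title|contains:
            - 'act as a meeting facilitator'
            - 'act as a proofreader'
    filter_main_sanctioned_users:
        # assumption: accounts excluded after investigation, e.g. an internal red team
        user|contains:
            - 'redteam@'
    condition: selection_roleplay and not 1 of filter_main_*
```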

Detection coverage (3)

Detect M365 Copilot AI Impersonation Jailbreak Attempt

medium

Detects attempts to jailbreak M365 Copilot by prompting it to impersonate other AI systems.

Sigma · tactics: defense_evasion · sources: webserver, windows

Detect M365 Copilot Unrestricted AI Persona Jailbreak Attempt

medium

Detects attempts to jailbreak M365 Copilot by prompting it to adopt an unrestricted or uncensored persona.

Sigma · tactics: defense_evasion · sources: webserver, windows

Detect M365 Copilot Malicious AI Persona Jailbreak Attempt

medium

Detects attempts to jailbreak M365 Copilot by prompting it to adopt a malicious or harmful persona.

Sigma · tactics: defense_evasion · sources: webserver, windows
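
The full queries behind these three detections are kept in the platform. Purely as an illustration of how the categorization in step 7 of the attack chain could split one keyword set across three rules, the grouping below assigns the brief's keywords to the three impersonation_type labels; the exact mapping used by the platform rules is an assumption here.

```yaml
# Illustrative sketch only: assumed keyword-to-category mapping.
detection:
    selection_ai_impersonation:        # impersonation_type: AI_Impersonation
        Subject_Title|contains:
            - 'pretend you are'
            - 'imagine you are'
    selection_unrestricted_ai_persona: # impersonation_type: Unrestricted_AI_Persona
        Subject_Title|contains:
            - 'you are now'
            - 'amoral'
    selection_malicious_ai_persona:    # impersonation_type: Malicious_AI_Persona
        Subject_Title|contains:
            - 'act as'
            - 'roleplay as'
    condition: 1 of selection_*
```

Each of the three platform rules presumably carries only its own selection, so an alert's impersonation_type follows from which rule fired.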

Detection queries are kept inside the platform.