Safety Instruction Displacement

Displace safety instructions from the active context to remove guardrails

Tactic

Privilege Escalation · Stage 4

Escalate capabilities beyond declared scope or bypass authorization

Attack class

SOUL-DRIFT

Gradually displacing safety instructions from the active context through conversation manipulation

Evidence grade

Validated

Reproduced in a controlled lab environment (DVAA) with documented steps.

DVAA validation

L2-07

Reproductions in Damn Vulnerable AI Agent, the OpenA2A intentionally-broken agent for kill-chain validation.

Honeypot

AgentPwn coverage

Live

context-windowagentpwn.com/learn ↗

An AgentPwn trap page produces a payload tagged with this technique class. Following the AgentPwn taxonomy of trap pages shows what an agent encounters.

Instruction-displacement tiers evict safety instructions from active context.

Detect

Detection · HackMyAgent

Live2 live · 0 queued

PROMPT-001PROMPT-002

npx hackmyagent secure --ciLive = implemented in hackmyagent; queued = declared

Defend

Defense · OASB controls

Live5 live · 0 queued

OASB 2.1 OASB 2.2 OASB 2.3 OASB 2.4 OASB 2.5

Live = documented at oasb.ai; queued = declared

Reference

How to cite

AI Agent Threat Matrix T-4006 (Safety Instruction Displacement). OpenA2A, 2026. https://threats.opena2a.org/techniques/T-4006

← Back to the matrix