System Prompt Boundary Bypass

Exploit weak boundaries between system and user prompts to override system-level instructions

Tactic

Initial Access · Stage 2

Gain control over agent behavior through prompt manipulation or input exploitation

Attack class

SOUL-INJECT

Directly manipulating or overriding the agent's system-level instructions and behavioral boundaries

Evidence grade

Validated

Reproduced in a controlled lab environment (DVAA) with documented steps.

DVAA validation

L3-04

Reproductions in Damn Vulnerable AI Agent, the OpenA2A intentionally-broken agent for kill-chain validation.

Honeypot

AgentPwn coverage

Live

prompt-injectionagentpwn.com/learn ↗

An AgentPwn trap page produces a payload tagged with this technique class. Following the AgentPwn taxonomy of trap pages shows what an agent encounters.

Delimiter-escape tiers exploit weak system/user prompt boundaries.

Detect

Detection · HackMyAgent

Live1 live · 0 queued

SOUL-OVERRIDE-001

npx hackmyagent secure --ciLive = implemented in hackmyagent; queued = declared

Defend

Defense · OASB controls

Live5 live · 0 queued

OASB 3.1 OASB 3.2 OASB 3.3 OASB 3.4 OASB 3.5

Live = documented at oasb.ai; queued = declared

Reference

How to cite

AI Agent Threat Matrix T-2008 (System Prompt Boundary Bypass). OpenA2A, 2026. https://threats.opena2a.org/techniques/T-2008

← Back to the matrix