AI Agent Threat Matrix

Attack Classes

40 attack classes group related techniques by vulnerability pattern. Each class represents a distinct category of threat to AI agent systems.

Governance (11 classes)

ASSEMBLY-INJECT

0 techniques

Attacks on the system prompt assembly process, where separate components combine into an exploitable injection

PHANTOM-SOUL

1 technique

Agent deployed with zero behavioral constraints — no SOUL.md, no system prompt, no governance

T-1006

Detection: SOUL-HB-001, SOUL-HB-002

SOUL-BOUNDARY

1 technique

Exploiting ambiguous or incomplete constraint definitions to find unguarded actions

T-2008

Detection: SOUL-CB-001, SOUL-CB-002

SOUL-DELEGATE

1 technique

Exploiting delegation and capability transfer mechanisms to exceed authorized scope

T-4001

Detection: SOUL-DH-001, SOUL-DH-002

SOUL-DRIFT

2 techniques

Gradually displacing safety instructions from the active context through conversation manipulation

T-2004 T-4006

Detection: SOUL-TH-003, SOUL-TH-004
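One way to picture SOUL-DRIFT: a context trimmer that simply drops the oldest messages will eventually drop the system instructions too, once an attacker pads the conversation. The Python sketch below is illustrative only. It assumes a chat-style message list where each message has a "role" field; trim_context and max_messages are hypothetical names and are not part of the detection rules listed above. It pins system messages so that trimming cannot displace them.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def trim_context(messages: List[Message], max_messages: int) -> List[Message]:
    """Keep every system message; drop the oldest non-system messages first."""
    pinned = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    # Whatever budget remains after pinning goes to the most recent turns.
    budget = max(max_messages - len(pinned), 0)
    kept = rest[-budget:] if budget else []  # rest[-0:] would keep everything
    return pinned + kept
```

A naive trimmer (messages[-max_messages:]) would lose the system message as soon as the conversation grows past the limit, which is the displacement this class describes.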

SOUL-FORK

1 technique

Agent behaves differently under evaluation than in production: it passes safety tests but deviates once deployed

T-5002

Detection: SOUL-AS-001, SOUL-AS-002, SOUL-HT-001, SOUL-HT-002

SOUL-HIJACK

2 techniques

External content achieving full override of agent behavioral constitution

T-4002 T-5002

Detection: SOUL-HO-001, SOUL-HO-002

SOUL-HV

2 techniques

Techniques to bypass agent harm avoidance constraints (4 sub-types)

T-2001 T-2003

Detection: SOUL-HV-001, SOUL-HV-002, SOUL-HV-003, SOUL-HV-004

SOUL-IMPERSONATE

1 technique

False capability claims exceeding actual authorization level

T-4002

Detection: SOUL-TH-005

SOUL-INJECT

5 techniques

Directly manipulating or overriding the agent's system-level instructions and behavioral boundaries

T-1003 T-2001 T-2003 T-2007 T-2008

Detection: SOUL-IH-001, SOUL-IH-002, PROMPT-001, PROMPT-002, PROMPT-003 +2 more

SOUL-POISON

4 techniques

Malicious instructions injected into governance files at write-time

T-2001 T-2003 T-2007 T-2008

Detection: SOUL-TH-001, SOUL-TH-002

Supply Chain (11 classes)

FAKETOOL-INJECT

0 techniques

MCP tool impersonation, squatting, and schema poisoning attacks

HEARTBEAT-RCE

2 techniques

Exploiting scheduled task or heartbeat mechanisms to achieve persistent code execution

T-6005 T-9003

Detection: HEARTBEAT-001, HEARTBEAT-002, HEARTBEAT-003, HEARTBEAT-004, HEARTBEAT-005 +2 more

MEM-POISON

5 techniques

Injecting malicious entries into agent persistent memory to maintain control across sessions

T-2002 T-3004 T-6001 T-6002 T-7004

Detection: MEM-001, MEM-002, MEM-003, MEM-004, MEM-005 +1 more

ORG-SKILL-SPREAD

4 techniques

Propagating malicious capabilities across an organization's agent fleet through shared skills and registries

T-9002 T-9004 T-9005 T-9006

Detection: SUPPLY-001, SUPPLY-002, SUPPLY-003, SUPPLY-004, DEP-001 +3 more

PERSIST-STATE

0 techniques

Cross-session persistence via memory poisoning, state tampering, and cached context injection

RAG-POISON

2 techniques

Injecting malicious content into retrieval-augmented generation data sources

T-2002 T-7006

Detection: RAG-001, RAG-002, RAG-003, RAG-004

SKILL-EXFIL

6 techniques

Using legitimate tool capabilities for unauthorized data transfer

T-5001 T-7003 T-8001 T-8002 T-8004 T-8006

Detection: SKILL-006, NET-001, NET-002, NET-003

SKILL-FRONTMATTER

3 techniques

Embedding malicious instructions in skill or plugin metadata and description fields

T-2005 T-6004 T-6006

Detection: SKILL-001, SKILL-002, SKILL-004, SKILL-005, SKILL-007 +5 more

SKILL-MEM-AMP

1 technique

Skill plants payload in agent memory that survives skill uninstall — cross-session persistence

T-3004

Detection: SKILL-MEM-001

SUPPLY-CHAIN-INSTALL

1 technique

Unsigned installation scripts executed without integrity verification — curl|sh without checksum

T-9003

Detection: INSTALL-001

UNICODE-STEGO

2 techniques

Using invisible Unicode characters, homoglyphs, and encoding tricks to bypass filters

T-2006 T-4005

Detection: UNICODE-STEGO-001, UNICODE-STEGO-002, UNICODE-STEGO-003, UNICODE-STEGO-004, UNICODE-STEGO-005
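As a rough illustration of what a UNICODE-STEGO check might look for, the Python sketch below flags invisible format characters and words that mix scripts. It is not the logic of the UNICODE-STEGO-00x rules, which this page does not describe. The function names find_hidden_chars and has_mixed_scripts are hypothetical, and the mixed-script test is a crude heuristic that will miss many homoglyph cases.

```python
import unicodedata
from typing import List, Tuple


def find_hidden_chars(text: str) -> List[Tuple[int, str]]:
    """Return (index, name) for every format-category (Cf) character.

    Cf covers zero-width spaces and joiners, bidi overrides, and tag characters.
    """
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            hits.append((i, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


def has_mixed_scripts(word: str) -> bool:
    """Heuristic: True if the letters in one word come from more than one script."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # For most letters the first word of the Unicode name is the script,
            # e.g. LATIN, CYRILLIC, GREEK.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return len(scripts) > 1
```

For example, find_hidden_chars would report the zero-width space in "ig\u200bnore", and has_mixed_scripts would flag "p\u0430ypal", where the second letter is Cyrillic.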

Infrastructure (10 classes)

A2A-EXPOSE

2 techniques

Agent-to-Agent protocol endpoints publicly discoverable without access control

T-1006 T-5002

Detection: A2A-001, A2A-002

AITOOL-EXPOSE

3 techniques

AI development tools (Jupyter, MLflow, Gradio, Streamlit) exposed without authentication

T-1001 T-1004 T-1005

Detection: AITOOL-001, AITOOL-002, AITOOL-003, AITOOL-004

CODE-INJECTION

2 techniques

Injecting and executing arbitrary code through SQL injection, command injection, or code generation

T-7002 T-9001

Detection: DOCKERINJ-001

GATEWAY-EXPLOIT

1 technique

Modifying gateway or proxy configurations to intercept, redirect, or manipulate agent traffic

T-6003

Detection: GATEWAY-001, GATEWAY-002, GATEWAY-003, GATEWAY-004, GATEWAY-005 +3 more

INTEGRITY-BYPASS

1 technique

Digest or hash verification is bypassed when the expected value is empty or missing, so tampered artifacts pass silently

T-9001

Detection: INTEGRITY-001
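The INTEGRITY-BYPASS pattern is easiest to see side by side. The Python sketch below contrasts a verifier that treats a missing digest as "nothing to check" with one that fails closed. Both functions are hypothetical illustrations, SHA-256 is an assumption, and neither is taken from the INTEGRITY-001 rule.

```python
import hashlib
import hmac
from typing import Optional


def verify_digest_unsafe(data: bytes, expected_hex: Optional[str]) -> bool:
    # Vulnerable pattern: an empty or missing digest skips the check entirely.
    if not expected_hex:
        return True
    return hashlib.sha256(data).hexdigest() == expected_hex


def verify_digest(data: bytes, expected_hex: Optional[str]) -> bool:
    # Fail closed: without an expected digest the artifact is rejected.
    if not expected_hex:
        return False
    actual = hashlib.sha256(data).hexdigest()
    expected = expected_hex.strip().lower()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual.encode("utf-8"), expected.encode("utf-8"))
```

With the unsafe version, an attacker who can blank out the digest field in a manifest gets any payload accepted. The fail-closed version rejects the same input.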

LLM-EXPOSE

2 techniques

LLM inference endpoints exposed without authentication — allows arbitrary prompt execution

T-1001 T-1004

Detection: LLM-001, LLM-002, LLM-003, LLM-004

MCP-EXPLOIT

6 techniques

Attacking Model Context Protocol server configurations, tool registrations, and inter-server trust

T-1002 T-1005 T-4003 T-5003 T-5005 T-5006

Detection: MCP-001, MCP-002, MCP-003, MCP-004, MCP-005 +6 more

PARSER-DIFFERENTIAL

0 techniques

Exploits differences between parser implementations to bypass security controls
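A common instance of PARSER-DIFFERENTIAL is duplicate keys in JSON: Python's json module keeps the last value, while another parser in the pipeline may keep the first, so a security check and the component that acts on the data can see different values. The sketch below is one possible mitigation and is not taken from this matrix. The names reject_duplicate_keys and strict_loads are hypothetical.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that refuses objects with repeated keys."""
    keys = [k for k, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError(f"duplicate keys in JSON object: {keys}")
    return dict(pairs)


def strict_loads(text: str):
    """Parse JSON, raising ValueError on any object with duplicate keys."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)
```

Rejecting ambiguous input outright means every parser downstream sees a document with a single interpretation.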

RETROACTIVE-PRIV

9 techniques

Exploiting previously granted access or cached credentials to gain unauthorized capabilities

T-1001 T-1004 T-1007 T-3001 T-3003 T-3005 T-3006 T-5004 T-8005

Detection: CRED-001, CRED-002, CRED-003, CRED-004, WEBEXPOSE-001 +4 more

TOCTOU-RACE

1 technique

Time-of-check to time-of-use (TOCTOU) race between verification and execution, leaving a window in which an attacker can swap the artifact

T-9001

Detection: TOCTOU-001
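The TOCTOU-RACE window exists when the verifier and the executor each read the artifact from disk separately. A minimal Python sketch of one way to close it follows, assuming a SHA-256 digest is available. The function name load_verified is hypothetical. It reads the file once and returns the same bytes it verified.

```python
import hashlib
from pathlib import Path

# Racy pattern (for contrast, not runnable as-is):
#   check(path)    # reads the file and verifies it
#   execute(path)  # reads the file again; it may have been swapped in between


def load_verified(path: Path, expected_sha256: str) -> bytes:
    """Read once, verify the in-memory bytes, and return those same bytes."""
    data = path.read_bytes()  # single read: check and use see identical content
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError(f"digest mismatch for {path}")
    return data
```

The caller should then execute or parse the returned bytes and should not reopen the path, since reopening would reintroduce the window.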

NemoClaw-Specific (5 classes)

NEMO-CRED-LEAK

2 techniques

Unintended exposure of credentials through environment variables, logs, or error messages

T-3002 T-7005

Detection: HMA-NMC-001, HMA-NMC-002, HMA-NMC-003, NEMO-004, NEMO-007

NEMO-NETWORK-EXPOSE

3 techniques

NemoClaw network services bound to public interfaces — gateway, k3s API, mDNS beacons

T-1001 T-1004 T-5001

Detection: HMA-NMC-010, HMA-NMC-011, HMA-NMC-012, HMA-NMC-013, HMA-NMC-014 +1 more

NEMO-OPENCLAW-INHERIT

1 technique

Inherited OpenClaw flaws that survive NemoClaw sandboxing — heartbeat persistence, pre-allowed APIs

T-1001

Detection: HMA-NMC-040, HMA-NMC-041, HMA-NMC-042, NEMO-010

NEMO-SANDBOX-ESCAPE

1 technique

Breaking out of agent sandbox restrictions to access the underlying file system or OS

T-7001

Detection: HMA-NMC-030, HMA-NMC-031, HMA-NMC-032, NEMO-003, NEMO-005 +2 more

NEMO-SUPPLY-CHAIN

1 technique

Compromising upstream dependencies or infrastructure to affect downstream agent deployments

T-9006

Detection: HMA-NMC-020, HMA-NMC-021, HMA-NMC-022, NEMO-001, NEMO-002 +1 more

Identity (2 classes)

AGENT-IMPERSONATE

4 techniques

Impersonating trusted agents or administrative roles to gain unauthorized access

T-1006 T-4002 T-4004 T-5002

Detection: AIM-001, AIM-002, AIM-003

BEHAVIORAL-IMPERSONATE

1 technique

Use of stolen credentials to pose as an agent (agent DNA forgery), detected as a mismatch against the agent's behavioral baseline

T-4002

Detection: DNA-001, DNA-002, DNA-003

Sandbox (1 class)

SANDBOX-ESCAPE

1 technique

General sandbox escape via privileged containers, LSM degradation, or process environment leakage

T-7001

Detection: SANDBOX-005