Threat Modeling Agentic Systems

Why this matters

Agentic systems are not just chat interfaces with better UX. They combine four things that rarely sit together in ordinary software: model reasoning, external content, tool execution, and often access to internal context. That changes the risk model.

A normal assistant can say something wrong. An agentic system can say something wrong and then act on it. The useful unit to threat-model is therefore not the model alone, but the entire chain:

input source
prompt and context assembly
model reasoning
tool selection
execution
output or side effect

If you only evaluate model quality, you will miss the highest-impact failures.

Where teams usually get this wrong

threat-modeling the model provider but not the workflow around it
mixing trusted instructions and untrusted content into one context blob
exposing write-capable tools too early because they are convenient for demos
assuming logs exist when they only capture partial activity
treating rollback as optional because the workflow "usually just assists"

Minimal technical model

A basic agentic system usually looks like this:

1. user request arrives

2. system prompt and policy are assembled

3. extra context is retrieved from internal or external sources

4. model decides whether to call a tool

5. tool returns output or performs an action

6. result is shown to the user or committed elsewhere

The trust boundaries sit between these steps.

The most important one is usually this:

> untrusted content must not gain the same authority as system instructions or approval logic.

Bad vs better design

Bad

A browser page, retrieved document, and internal policy are all concatenated into one prompt. The model is then free to choose tools based on that combined context.

Better

System policy, user intent, and untrusted content are handled as separate logical inputs. Tool-calling logic is constrained, and high-impact actions require confirmation outside the same untrusted context path.

What to model first

Start with the smallest useful question:

What can this system read, what can it do, and what can influence its decisions?

That immediately gives you the first threat-model surface.

1. Inputs

List every input class:

direct user prompts
uploaded files
retrieved internal documents
web pages
email, chat, tickets, CRM notes
tool outputs from previous steps

Then separate them into:

trusted instructions
semi-trusted internal context
untrusted external content

The biggest failure pattern is instruction/data confusion. A page, document, or email should not quietly become a new source of authority.

2. Actions

Write down the real action surface, not the marketing description.

Examples:

search the web
send messages
modify tickets
create documents
execute commands
open browser sessions
trigger API calls
approve or reject flows

For each action, ask:

does it have side effects?
is it reversible?
does it touch sensitive systems?
does it require confirmation?

3. Sensitive context

Identify what sensitive context can be exposed to the agent:

secrets and tokens
customer data
internal docs
auth sessions
privileged search results
production environment details

This matters because many failures in AI systems are not dramatic model exploits. They are quiet context leaks.

The core control areas

Instruction and data separation

The system must distinguish between:

the user asking for a task
internal policy or system rules
external content that may contain malicious instructions

If this line is fuzzy, prompt injection becomes a design flaw, not an edge case.

Permissions and least privilege

Do not give the agent broad access just because it is convenient during development.

A good rule:

> if the agent does not need a capability every day, it probably should not have it by default.

Approval gates

High-impact actions should not happen silently.

Examples of actions that often deserve approval:

sending external communications
editing production data
accessing privileged environments
writing into systems of record
actions with financial or compliance implications

Domain and action boundaries

Browser and tool-using agents should have clear limits.

Examples:

which domains can be visited
which APIs can be called
which actions are read-only vs write
which sessions or credentials may be used

Auditability

You need enough logs to answer:

what did the system see?
what did it decide?
what did it do?
what changed?

Not full chain-of-thought fantasy logs—just enough operational evidence to investigate misuse, drift, or failure.

Mini-scenarios

Scenario 1: Internal support assistant with ticket update access

The assistant can read tickets, summarize attached logs, and update ticket state. On paper this looks harmless. In practice it means external customer content, retrieved internal notes, and write access now sit in the same workflow.

What to review:

whether customer text can reshape the assistant's task
whether write actions require confirmation
whether internal notes can leak into customer-facing output

Scenario 2: Browser agent for internal ops tasks

A browser agent can navigate an authenticated admin workflow faster than a human. It can also click the wrong thing faster than a human.

What to review:

session scope
allowed domains and actions
approval points before high-impact changes
logging strong enough to reconstruct what happened

A practical review flow

For a first-pass review, do this in order:

1. list the inputs

2. list the actions

3. mark which inputs are untrusted

4. mark which actions have irreversible or sensitive side effects

5. verify approval gates

6. verify logging and rollback

7. test one malicious or ambiguous input path

That is enough to find most embarrassing failures early.

Minimum viable standard

Before an agentic workflow should be trusted in a meaningful environment, a team should be able to answer:

what inputs are untrusted
what actions have side effects
what permissions the system holds
what approval gates exist
who owns the workflow
how the workflow is disabled or rolled back

If a team can describe the model but cannot describe the action boundaries, the system is not ready.

Related Daily Briefs

Daily brief

AI Security Signal Brief — 2026-03-14

The practical decision is no longer whether AI belongs in security workflows at all. It is where it creates enough leverage to justify real controls, review, and ownership.

Read the full brief

Daily brief

AI Security Signal Brief — 2026-03-15

AI risk is increasingly a system-design problem, not just a model-safety problem. If an agent can read untrusted content and take action, it needs explicit boundaries.

Read the full brief

Daily brief

AI Security Signal Brief — 2026-03-16

The useful signal today is concrete: AI risk becomes easier to manage when teams review workflows through permissions, approvals, and data boundaries instead of treating governance as a policy-only exercise.

Read the full brief