Threat Modeling Agentic Systems

Why this matters

Agentic systems are not just chat interfaces with better UX. They combine four things that rarely sit together in ordinary software: model reasoning, external content, tool execution, and often access to internal context. That changes the risk model.

A normal assistant can say something wrong. An agentic system can say something wrong and then act on it. The useful unit to threat-model is therefore not the model alone, but the entire chain:

If you only evaluate model quality, you will miss the highest-impact failures.

Where teams usually get this wrong

Minimal technical model

A basic agentic system usually looks like this:

1. user request arrives

2. system prompt and policy are assembled

3. extra context is retrieved from internal or external sources

4. model decides whether to call a tool

5. tool returns output or performs an action

6. result is shown to the user or committed elsewhere

The trust boundaries sit between these steps.

The most important one is usually this:

> untrusted content must not gain the same authority as system instructions or approval logic.

Bad vs better design

Bad

A browser page, retrieved document, and internal policy are all concatenated into one prompt. The model is then free to choose tools based on that combined context.

Better

System policy, user intent, and untrusted content are handled as separate logical inputs. Tool-calling logic is constrained, and high-impact actions require confirmation outside the same untrusted context path.

What to model first

Start with the smallest useful question:

What can this system read, what can it do, and what can influence its decisions?

That immediately gives you the first threat-model surface.

1. Inputs

List every input class:

Then separate them into:

The biggest failure pattern is instruction/data confusion. A page, document, or email should not quietly become a new source of authority.

2. Actions

Write down the real action surface, not the marketing description.

Examples:

For each action, ask:

3. Sensitive context

Identify what sensitive context can be exposed to the agent:

This matters because many failures in AI systems are not dramatic model exploits. They are quiet context leaks.

The core control areas

Instruction and data separation

The system must distinguish between:

If this line is fuzzy, prompt injection becomes a design flaw, not an edge case.

Permissions and least privilege

Do not give the agent broad access just because it is convenient during development.

A good rule:

> if the agent does not need a capability every day, it probably should not have it by default.

Approval gates

High-impact actions should not happen silently.

Examples of actions that often deserve approval:

Domain and action boundaries

Browser and tool-using agents should have clear limits.

Examples:

Auditability

You need enough logs to answer:

Not full chain-of-thought fantasy logs—just enough operational evidence to investigate misuse, drift, or failure.

Mini-scenarios

Scenario 1: Internal support assistant with ticket update access

The assistant can read tickets, summarize attached logs, and update ticket state. On paper this looks harmless. In practice it means external customer content, retrieved internal notes, and write access now sit in the same workflow.

What to review:

Scenario 2: Browser agent for internal ops tasks

A browser agent can navigate an authenticated admin workflow faster than a human. It can also click the wrong thing faster than a human.

What to review:

A practical review flow

For a first-pass review, do this in order:

1. list the inputs

2. list the actions

3. mark which inputs are untrusted

4. mark which actions have irreversible or sensitive side effects

5. verify approval gates

6. verify logging and rollback

7. test one malicious or ambiguous input path

That is enough to find most embarrassing failures early.

Minimum viable standard

Before an agentic workflow should be trusted in a meaningful environment, a team should be able to answer:

If a team can describe the model but cannot describe the action boundaries, the system is not ready.

Related Daily Briefs

Daily brief

AI Security Signal Brief — 2026-03-14

The practical decision is no longer whether AI belongs in security workflows at all. It is where it creates enough leverage to justify real controls, review, and ownership.

Read the full brief

Daily brief

AI Security Signal Brief — 2026-03-15

AI risk is increasingly a system-design problem, not just a model-safety problem. If an agent can read untrusted content and take action, it needs explicit boundaries.

Read the full brief

Daily brief

AI Security Signal Brief — 2026-03-16

The useful signal today is concrete: AI risk becomes easier to manage when teams review workflows through permissions, approvals, and data boundaries instead of treating governance as a policy-only exercise.

Read the full brief