Prompt Injection & Tool Abuse Checklist

Why this checklist exists

Prompt injection is often described like a strange model trick. In practice, it is usually much simpler: the system gives external content too much influence over what the agent believes, prioritizes, or does.

Tool abuse is the next step. Once the model can choose tools, a bad decision is no longer just a bad sentence. It can become a bad action.

This checklist is meant for real systems, not abstract demos.

The principle

Do not let untrusted content sit on the same authority path as tool selection or high-impact actions.

If an agent can read untrusted input, pull in sensitive context, and use tools in the same loop, you already have a practical abuse surface.

Where teams usually get this wrong

assuming prompt injection is only a chat UI problem
letting the same workflow browse and act with no approval break
exposing too many tools “just for flexibility”
leaving internal retrieval unbounded because it helps answer quality
treating logs as an afterthought

Bad vs better design

Bad

The application concatenates user request, system text, retrieved internal data, and page content into one prompt. The model can then both interpret and act from the same mixed context.

Better

The system keeps separate authority zones for:

system policy
user intent
internal context
untrusted external content

Tool permissions and approvals are enforced outside the same untrusted-content path.

Review checklist

1. External content is treated as untrusted data

Check whether the system clearly separates:

user intent
system instructions
retrieved internal context
external content such as pages, files, emails, or messages

If these are merged into one blob, you are creating a prompt-injection surface by design.

2. The agent cannot silently execute sensitive actions

If the system can browse, retrieve, and act in one chain, the highest-risk actions should be gated.

At minimum, review:

sending messages
writing to systems of record
changing account state
running commands
using privileged APIs

3. Tool permissions are scoped to the minimum needed

Avoid broad tool exposure.

A useful habit is to classify tools into:

read-only
low-risk write
high-risk write
external side effects

Then remove anything the workflow does not actually need.

4. Retrieved internal data is bounded

Ask:

what internal data stores can enter context?
what filters exist?
can the system over-retrieve?
can it leak sensitive snippets into output?

Prompt injection gets worse when sensitive context is freely retrievable.

5. Browser or web-agent actions are domain-scoped where possible

If a browser agent can navigate anywhere, click, authenticate, and submit, the attack surface grows quickly.

Prefer explicit constraints on:

allowed domains
allowed action types
session scope
stored credentials

6. Logs are useful without leaking secrets

You need evidence about tool selection and actions, but not raw secrets in logs.

A good log helps answer:

what source influenced the agent?
which tool was called?
what action was attempted?
did an approval gate fire?

Mini-scenarios

Scenario 1: Summarize-then-send workflow

A team builds an assistant that reads inbound emails, drafts summaries, and can optionally send replies.

What to test:

whether hostile email text can steer the draft or trigger unwanted follow-up actions
whether send capability is approval-gated
whether internal retrieved context can leak into outbound text

Scenario 2: Browser assistant with internal document access

The browser agent reads external pages while also having access to internal docs or authenticated sessions.

What to test:

whether page content can change the action plan
whether internal context is retrievable too broadly
whether browser actions are constrained by domain and approval

Minimum viable standard

If a workflow reads untrusted content and can call tools, you need:

instruction/data separation
scoped permissions
approval for high-impact actions
logging
an owner for the workflow

Bottom line

Anything less is not “an early version.” It is optimistic automation with unclear boundaries.

Related Daily Briefs

Daily brief

AI Security Signal Brief — 2026-03-14

The practical decision is no longer whether AI belongs in security workflows at all. It is where it creates enough leverage to justify real controls, review, and ownership.

Read the full brief

Daily brief

AI Security Signal Brief — 2026-03-15

AI risk is increasingly a system-design problem, not just a model-safety problem. If an agent can read untrusted content and take action, it needs explicit boundaries.

Read the full brief

Daily brief

AI Security Signal Brief — 2026-03-16

The useful signal today is concrete: AI risk becomes easier to manage when teams review workflows through permissions, approvals, and data boundaries instead of treating governance as a policy-only exercise.

Read the full brief