12. Safety Boundaries and the Permission Model

A coding agent can read and write files, run commands, access the network, and call models. When it runs on your machine, it can do anything its process permissions allow. The first step of safety design is honesty: don't treat the prompt as a permission system, and don't treat "the model probably wouldn't do that" as an isolation boundary.

Threat model

Consider at least four categories of risk:

User error: asking the agent to delete files, overwrite changes, or run expensive commands.
Model misjudgment: the model writes a destructive command where a test command was intended.
Prompt injection: text in the repository tricks the model into leaking environment variables or bypassing rules.
Tool vulnerabilities: path escapes, command injection, parallel-write clobbering, secrets leaking into logs.

Different risks call for different boundaries. A prompt can lower the probability of misjudgment, but it cannot stop a malicious tool call. The real boundary sits in front of tool execution.

The permission gate

Every tool call can pass through a permission gate before it runs:

type PermissionDecision =
  | { type: "allow" }
  | { type: "deny"; reason: string }
  | { type: "confirm"; prompt: string };

The read tool is usually allowed; edit may be allowed depending on the file's state; bash is confirmed or denied depending on how the command is classified. The permission gate must run before a tool call reaches the executor. A denied call should come back to the model as an error tool result, so the model knows why it was refused.
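A minimal sketch of such a gate follows. It assumes a hypothetical ToolCall shape with name and args, takes "file state" to mean whether the file changed since the agent last read it (reported by a hypothetical isStale callback), and uses a crude regex to classify bash commands; a real classifier would need to parse the command. PermissionDecision is repeated so the block compiles on its own.

```typescript
// Sketch only: ToolCall, isStale, and the regex classifier are hypothetical.
type PermissionDecision =
  | { type: "allow" }
  | { type: "deny"; reason: string }
  | { type: "confirm"; prompt: string };

type ToolCall = { name: string; args: Record<string, unknown> };

// Crude pattern for commands that destroy data or rewrite shared history.
// Regexes are easy to evade; treat this as a first filter, not a guarantee.
const DESTRUCTIVE = /\b(rm|mkfs|dd|shred)\b|\bgit\s+push\s+(-f|--force)\b/;

function decideBash(command: string): PermissionDecision {
  if (DESTRUCTIVE.test(command)) {
    return { type: "deny", reason: `destructive command blocked: ${command}` };
  }
  // Every other command needs the user's explicit approval.
  return { type: "confirm", prompt: `Run this command?\n${command}` };
}

function decide(call: ToolCall, isStale: (path: string) => boolean): PermissionDecision {
  if (call.name === "read") {
    return { type: "allow" };
  }
  if (call.name === "edit") {
    const path = String(call.args["path"] ?? "");
    return isStale(path)
      ? { type: "deny", reason: `${path} changed since it was last read; read it again first` }
      : { type: "allow" };
  }
  if (call.name === "bash") {
    return decideBash(String(call.args["command"] ?? ""));
  }
  // Unknown tools are denied rather than silently allowed.
  return { type: "deny", reason: `unknown tool: ${call.name}` };
}
```

Denying unknown tools by default means a newly registered tool cannot slip past the gate just because nobody wrote a rule for it yet.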

Don't intercept only in the UI. The SDK and JSON mode must go through the same permission gate; otherwise a user can bypass your safety policy simply by switching to a different front end.
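One way to enforce this is a single function that the UI, JSON mode, and SDK all call, with each front end supplying only its own way of asking for confirmation. In this sketch the ToolResult shape with an isError flag and the gate, executor, and confirm callback types are hypothetical; PermissionDecision is repeated so the block compiles alone. A denial, or a declined confirmation, comes back as an error tool result rather than an exception.

```typescript
// Sketch only: ToolCall, ToolResult, and the callback types are hypothetical.
type PermissionDecision =
  | { type: "allow" }
  | { type: "deny"; reason: string }
  | { type: "confirm"; prompt: string };

type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type ToolResult = { toolCallId: string; content: string; isError: boolean };

type PermissionGate = (call: ToolCall) => PermissionDecision;
type Executor = (call: ToolCall) => Promise<string>;
// Each front end asks differently: a dialog in the UI, a structured
// request in JSON mode, a callback supplied by the SDK caller.
type Confirm = (prompt: string) => Promise<boolean>;

// The one path from "the model requested a tool" to "the tool ran".
async function runToolCall(
  call: ToolCall,
  gate: PermissionGate,
  execute: Executor,
  confirm: Confirm,
): Promise<ToolResult> {
  const decision = gate(call);
  if (decision.type === "deny") {
    // Denials go back to the model as error results, not exceptions.
    return { toolCallId: call.id, content: `Permission denied: ${decision.reason}`, isError: true };
  }
  if (decision.type === "confirm" && !(await confirm(decision.prompt))) {
    return { toolCallId: call.id, content: "Permission denied: the user declined this call", isError: true };
  }
  try {
    return { toolCallId: call.id, content: await execute(call), isError: false };
  } catch (err) {
    // Tool failures are also reported to the model instead of crashing the loop.
    return { toolCallId: call.id, content: `Tool failed: ${String(err)}`, isError: true };
  }
}
```

Because the three front ends differ only in the confirm callback they pass, a policy change made in the gate applies to all of them at once.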

Project trust

When the agent enters an unfamiliar repository for the first time, it can ask the user whether to trust the project. In the untrusted state, allow read-only tools but restrict bash, file writes, and reads of sensitive paths. Trust is not permanent; users should be able to revoke it.
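A minimal sketch of revocable trust, assuming trust is recorded per workspace root in memory (persisting it to disk is omitted) and that "read", "grep", and "ls" are the read-only tools. TrustStore and gateByTrust are hypothetical names. gateByTrust is only a first layer: "pass" means the call continues to the regular permission gate, not that it is approved. Sensitive-path reads need their own check, which this sketch leaves out.

```typescript
// Sketch only: TrustStore, gateByTrust, and the tool names are hypothetical.
type TrustState = "trusted" | "untrusted";

// Strip trailing slashes so "/repo/" and "/repo" name the same workspace.
// A real implementation should also resolve symlinks (realpath).
function normalizeRoot(root: string): string {
  return root.replace(/\/+$/, "") || "/";
}

class TrustStore {
  private readonly trusted = new Set<string>();

  // Every workspace starts untrusted until the user explicitly trusts it.
  // Callers should look this up on every tool call, so that a revocation
  // takes effect immediately instead of at the next restart.
  stateOf(root: string): TrustState {
    return this.trusted.has(normalizeRoot(root)) ? "trusted" : "untrusted";
  }

  grant(root: string): void {
    this.trusted.add(normalizeRoot(root));
  }

  revoke(root: string): void {
    this.trusted.delete(normalizeRoot(root));
  }
}

const READ_ONLY_TOOLS = new Set(["read", "grep", "ls"]);

// First layer only: "pass" hands the call on to the regular permission gate.
function gateByTrust(state: TrustState, toolName: string): "pass" | "confirm" {
  if (state === "trusted" || READ_ONLY_TOOLS.has(toolName)) return "pass";
  return "confirm";
}
```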

Trust policy should be concrete:

Whether the current workspace is trusted.
Whether project scripts may be executed.
Whether network access is allowed.
Whether environment variables may be read.
Whether paths outside the workspace may be written.

Collapsing all of these into a single allow/deny toggle leaves users unable to make fine-grained judgments.
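The list above maps naturally onto a policy object with one setting per question. In this sketch the field names, the three-valued Permission type, and the Action union are assumptions, and the defaults for an untrusted workspace are one conservative choice among several.

```typescript
// Sketch only: field names and the Action union are hypothetical.
type Permission = "allow" | "confirm" | "deny";

type TrustPolicy = {
  workspaceTrusted: boolean;
  runProjectScripts: Permission;
  network: Permission;
  readEnv: Permission;
  writeOutsideWorkspace: Permission;
};

// Conservative defaults for a repository the user has not reviewed yet.
const UNTRUSTED_DEFAULT: TrustPolicy = {
  workspaceTrusted: false,
  runProjectScripts: "confirm",
  network: "confirm",
  readEnv: "deny",
  writeOutsideWorkspace: "deny",
};

type Action =
  | { kind: "runScript"; name: string }
  | { kind: "network"; host: string }
  | { kind: "readEnv"; variable: string }
  | { kind: "write"; path: string; insideWorkspace: boolean };

function check(policy: TrustPolicy, action: Action): Permission {
  switch (action.kind) {
    case "runScript":
      return policy.runProjectScripts;
    case "network":
      return policy.network;
    case "readEnv":
      return policy.readEnv;
    case "write":
      if (!action.insideWorkspace) return policy.writeOutsideWorkspace;
      // Writes inside the workspace depend on whether it is trusted.
      return policy.workspaceTrusted ? "allow" : "confirm";
    default: {
      // Compile-time guard: adding an Action kind without a case fails here.
      const unreachable: never = action;
      throw new Error(`unhandled action: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Keeping each question as its own field lets a user, for example, trust a workspace for edits while still refusing network access.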

Sandboxes and external boundaries

If you need stronger isolation, run tools inside a container, a virtual machine, or a remote sandbox. Then even if the model requests a dangerous command, the damage stays inside the sandbox. The first version built in this book doesn't need a sandbox, but its design should leave room for one: tool execution should not be hard-wired to the local filesystem and shell.

You can abstract file access behind an operations interface:

type FileOperations = {
  readText(path: string): Promise<string>;
  writeText(path: string, content: string): Promise<void>;
  realpath(path: string): Promise<string>;
};

Local, SSH, container, and remote runtimes can all implement the same interface. The agent kernel doesn't need to know where tools execute.
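To illustrate the swap, here is a sketch of an in-memory implementation (handy in tests), together with a path-escape check written only against the interface, so it behaves the same for any runtime. It assumes absolute POSIX paths; InMemoryFileOps, resolvePosix, and assertInsideWorkspace are hypothetical names. Memory has no symlinks, so realpath here only collapses "." and ".." segments. A local implementation would call Node's fs.promises.realpath, which also follows symlinks but rejects paths that do not exist yet, so writes of new files typically resolve the parent directory instead.

```typescript
// Sketch only: InMemoryFileOps, resolvePosix, and assertInsideWorkspace are hypothetical.
type FileOperations = {
  readText(path: string): Promise<string>;
  writeText(path: string, content: string): Promise<void>;
  realpath(path: string): Promise<string>;
};

// Collapses "." and ".." segments in an absolute POSIX path.
function resolvePosix(path: string): string {
  const out: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

class InMemoryFileOps implements FileOperations {
  private readonly files = new Map<string, string>();

  async readText(path: string): Promise<string> {
    const content = this.files.get(resolvePosix(path));
    if (content === undefined) throw new Error(`ENOENT: ${path}`);
    return content;
  }

  async writeText(path: string, content: string): Promise<void> {
    this.files.set(resolvePosix(path), content);
  }

  // No symlinks in memory, so the real path is just the normalized path.
  async realpath(path: string): Promise<string> {
    return resolvePosix(path);
  }
}

// Resolve first, then compare against the resolved workspace root.
async function assertInsideWorkspace(ops: FileOperations, root: string, path: string): Promise<string> {
  const [realRoot, realPath] = await Promise.all([ops.realpath(root), ops.realpath(path)]);
  const prefix = realRoot.endsWith("/") ? realRoot : realRoot + "/";
  if (realPath !== realRoot && !realPath.startsWith(prefix)) {
    throw new Error(`path escapes workspace: ${path}`);
  }
  return realPath;
}
```

Comparing against the root plus a trailing slash matters: a plain prefix check would accept /ws-other/x as being inside /ws.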

Logs have safety boundaries too

The session log stores user input, tool results, file fragments, and command output. It may contain secrets, private code, and error stack traces. Consider at least:

Where logs are stored.
Whether they are encrypted.
Whether they are redacted before export.
Whether the user is asked before uploading them to a remote service.
Whether the UI collapses sensitive output by default.

Safety is not just about blocking commands. An agent can run no dangerous command at all and still copy the contents of .env into the log or the model context; that is still a leak.
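A minimal sketch of redaction applied to each entry before it is written to the log (and, if you choose, before it enters the model context). The patterns are illustrative assumptions covering .env-style assignments whose uppercase names contain SECRET, TOKEN, PASSWORD, or KEY, bearer tokens, and PEM private keys. This is not a complete secret scanner: it will miss secrets that don't look like these, and it occasionally redacts harmless values.

```typescript
// Sketch only: the patterns below are illustrative, not exhaustive.
// PEM private key blocks, including the header and footer lines.
const PEM_BLOCK = /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g;

// .env-style lines such as "API_KEY=..." or "export DB_PASSWORD = ...".
// Group 1 keeps the name and "=", so the log still shows which variable was set.
const ENV_ASSIGNMENT = /^([ \t]*(?:export[ \t]+)?\w*(?:SECRET|TOKEN|PASSWORD|KEY)\w*[ \t]*=[ \t]*)(.+)$/gm;

// "Bearer <token>" as found in HTTP headers and curl output.
const BEARER = /\b(Bearer\s+)[A-Za-z0-9._~+\/=-]+/g;

function redact(text: string): string {
  return text
    .replace(PEM_BLOCK, "[REDACTED PRIVATE KEY]")
    .replace(ENV_ASSIGNMENT, (_match: string, prefix: string) => `${prefix}[REDACTED]`)
    .replace(BEARER, (_match: string, prefix: string) => `${prefix}[REDACTED]`);
}
```

Keeping the variable name while dropping its value preserves enough context to debug a failing command without exposing the secret itself.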

Exercises

Implement the permission gate and project trust state.

Acceptance criteria:

In an untrusted project, write tools and bash require confirmation or are denied by default.
A permission denial becomes an isError: true tool result.
The UI, CLI JSON mode, and SDK all invoke the same permission gate.
Reads of likely-sensitive files have a policy: deny, confirm, or redact.
Tool operations go through an interface abstraction that can later be swapped for container or remote execution.