09. Context Engineering and Compaction

No matter how large the context window is, an agent will fill it. A coding agent reads files, runs commands, emits diffs, hits errors, and receives user steering. The real question is not "how do we cram in more tokens" but "how do we preserve the state necessary to keep working." Compaction means projecting old context into a summary the agent can continue executing from — not merely making the chat transcript shorter.

When to compact

The compaction trigger should be based on the model window and a reserved budget. If the model window is 200k tokens, you cannot wait until 199k to compact, because the next request still needs room for output, tool parameters, and a safety margin. A simple policy is:

if used_tokens > context_window - reserved_output - safety_margin:
  compact()

Be conservative with the reserved budget. A coding agent's next response may contain long tool call parameters or explanations. Compacting too late means the provider rejects the request outright, and the agent never gets the chance to tidy up its own state.

Where to cut

Do not cut in the middle of an arbitrary message. A safe cut point is usually after a complete turn — that is, once the assistant message and all the tool results it requested have been written to the log. Cutting through half a tool batch makes the model see "I called a tool, but the result vanished," and after recovery it will easily re-execute or misjudge.

A compaction entry should record:

The summary text.
The token count before compaction.
The id of the first entry that is still kept in full.
The key files that were read.
The files that were modified.
Unfinished tasks and user constraints.

Not all of these details need to reach the model, but they should go into the log, where the UI and extensions can use them.

The shape of a good summary

An agent-facing summary is not meeting minutes. It should help the model keep working. A recommended structure:

Goal:
- User wants ...

Current status:
- Done ...
- Still failing ...

Important constraints:
- Do not edit tests.
- Keep public API unchanged.

Files observed:
- src/parser.ts: contains ...

Files modified:
- src/parser.ts: changed ...

Open tool results:
- Last test run failed with ...

Next step:
- Inspect ...

The most important parts of the summary are the constraints, the file facts, and the next step. Do not force the model to rediscover everything after compaction.

The post-compaction amnesia test

Every compaction implementation should run an amnesia test: build a long session in which the agent reads a file, discovers a constraint, and modifies another file, then trigger compaction. After compaction, ask the model "what is the next step." If the model has forgotten the user's constraints or the file it just modified, the summary is unusable.

This test does not need a real model. You can have the faux provider check whether the post-compaction context contains these keywords: the goal, the constraints, the modified files, the most recent failure, and the next step. A smoke test with a real model is only used to check that the summary reads naturally.

More context is not better

Many beginners lean toward keeping as many old messages as possible. It feels safe, but in practice it dilutes the model's attention with historical noise. Tool output especially: one failing test run can produce thousands of lines of logs, and what is actually useful is the failure name, the error line, the tail of the stack trace, and the command's exit code.

The goal of context engineering is a high signal-to-noise ratio. For an agent, good compaction is not "lossless"; it is "preserve the state needed to finish the task, and be explicit about what information was discarded." When the summary cannot cover some old fact, it should tell the model to re-read the file, rather than pretend to remember.

Production trade-offs

Compaction itself calls the model, so it can fail, cost money, and get rate-limited. The runtime has to decide what to do when compaction fails. Common strategies:

If there is still room, continue for one more turn and compact again later.
If you are already near the limit, pause the task and ask the user to confirm.
If the compaction model fails, fall back to a shorter local summary template, but flag the quality as lower.

The compaction summary should be written to the log. Do not keep it only in memory. When a session is recovered, the context builder must be able to see the compaction entry and skip earlier messages based on it.

Exercises

Add a compaction entry and a context builder to the session log.

Acceptance criteria:

Compaction only happens at complete turn boundaries.
The compaction entry contains a summary and a firstKeptEntryId.
When building context, old messages are replaced by the summary, not deleted from the log.
After compaction, the model can still see the goal, the constraints, the files read, the files modified, and the next step.
When the provider reports that the context is too long, the agent can trigger the compaction flow instead of simply exiting.