Build a Coding Agent from Scratch

00. Introduction: An Agent Is Not a Single API Call

If you already know how to make a single large-model API call, what you have is "a function that answers questions." A coding agent solves a different class of problem: the user states a goal, and the system must repeatedly observe the environment, call tools, interpret results, revise its plan, and save progress. It must also keep working after the user steers it mid-run, stops it, or resumes it. It is not a longer prompt; it is a runtime built around the model.

This book walks you through building a teaching project called tiny-agent. It does not replicate any existing product, but it absorbs the key design decisions of mature coding agents: the protocol layer is isolated from provider APIs; tool execution is driven by stop reasons; sessions treat an append-only log as the source of truth; long conversations advance through compacted context projections; file reads, writes, and command execution all pass through a security boundary; and the CLI, TUI, JSON mode, and SDK all consume the same event stream.

What you will end up building

The final project needs these capabilities:

  • Accept a user task and call the model.
  • Expose tools such as read, grep, edit, write, and bash.
  • Execute tools in response to the model's tool use requests and feed the results back to the model.
  • Emit an event stream of model text, tool calls, tool progress, tool results, and end-of-turn.
  • Write all messages and non-message events to a JSONL session log.
  • Restore context from any session, and compact when the context approaches the window limit.
  • Support user steering or follow-ups while the model is running.
  • Provide at least two shells: an interactive CLI and a machine-readable JSON mode.
  • Test at zero cost with a faux provider and session replay.
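
To make the event stream and the JSONL session log in the list above concrete, here is a minimal sketch. The event names, fields, and helper functions (AgentEvent, toJsonl, fromJsonl, renderForCli) are assumptions made for illustration, not tiny-agent's final API. The point is only that every shell consumes the same typed events, and that the log is one JSON object per line.

```typescript
// Hypothetical event shapes; names and fields are illustrative assumptions.
type AgentEvent =
  | { type: "model_text"; text: string }
  | { type: "tool_call"; id: string; name: string; args: unknown }
  | { type: "tool_progress"; id: string; note: string }
  | { type: "tool_result"; id: string; ok: boolean; output: string }
  | { type: "turn_end"; stopReason: string };

// JSONL: one JSON object per line, newline-terminated.
function toJsonl(events: AgentEvent[]): string {
  return events.map((e) => JSON.stringify(e)).join("\n") + "\n";
}

// Parse a log back into events, skipping blank lines.
function fromJsonl(text: string): AgentEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as AgentEvent);
}

// A shell is just one consumer of the shared event stream.
function renderForCli(e: AgentEvent): string {
  switch (e.type) {
    case "model_text":
      return e.text;
    case "tool_call":
      return `[tool] ${e.name}`;
    case "tool_progress":
      return `[tool] ... ${e.note}`;
    case "tool_result":
      return e.ok ? "[tool] ok" : "[tool] failed";
    case "turn_end":
      return `[done] ${e.stopReason}`;
  }
}
```

A JSON-mode shell in this sketch would simply print each event with JSON.stringify instead of calling renderForCli, which is why the two shells can share everything upstream of rendering.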

This is not a toy chatbot. Each capability maps to a failure you will hit in a real product: the model generates invalid tool arguments; two parallel tools modify the same file at once; streaming output gets cut off; the context forgets a file it just read after compaction; the user changes the goal mid-run; the tool schema has already changed by the time a session is restored. This book puts these failure modes front and center instead of handing you "best practices."

The reader contract

This book assumes you are comfortable with TypeScript, Node.js, Promises, async iterators, the command line, and basic file system APIs. Agent concepts are introduced from first principles: tool calling, loops, events, sessions, and tool boundaries. Example code stays at teaching scale: enough to express the boundaries, without cramming every edge case into an unreadable blob of code.

Every chapter has three layers:

  1. Concept: why this layer is needed.
  2. Structure: what the contract is between this layer and the ones above and below it.
  3. Checkpoint: what behavior you should observe once you finish.

If all you want is prompt tricks, this book is not for you. If you want to know how a system that can actually change code on your behalf is decomposed into protocol, loop, tools, state, permissions, interface, and extensions, that is exactly what this book was written for.

Cost and testing strategy

Agent development cannot turn every test into a real model call. There are three reasons: the cost is uncontrollable, the output is not reproducible, and when something fails it is hard to tell whether the problem is the model or the runtime. So this book introduces a faux provider from the very start. It is not a mock function that returns a string; it is scripted to return complete assistant messages, tool use, usage, and stop reasons. That lets you test the agent loop, tool error feedback, compaction, restoration, and UI events against recorded responses.
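
As a rough illustration of what "scripted" means here, the sketch below replays a fixed list of complete assistant messages and drives a minimal loop off the stop reason. Every name in it (FauxProvider, runTurn, the message and usage fields, the stop reason strings) is a hypothetical teaching shape rather than any real provider SDK, and the provider is synchronous only to keep the example short; a real adapter would be asynchronous and streaming.

```typescript
// All shapes below are hypothetical; they are not taken from a real provider API.
type StopReason = "end_turn" | "tool_use";
interface ToolUse { id: string; name: string; args: Record<string, unknown> }
interface Usage { inputTokens: number; outputTokens: number }
interface AssistantMessage {
  role: "assistant";
  text: string;
  toolUses: ToolUse[];
  usage: Usage;
  stopReason: StopReason;
}
interface ToolResult { role: "tool"; toolUseId: string; output: string; isError: boolean }
type Message = { role: "user"; text: string } | AssistantMessage | ToolResult;

// Synchronous for brevity (assumption); real adapters are async.
interface Provider { complete(context: Message[]): AssistantMessage }

// Faux provider: returns one scripted assistant message per call.
class FauxProvider implements Provider {
  private cursor = 0;
  private readonly script: AssistantMessage[];
  constructor(script: AssistantMessage[]) {
    this.script = script;
  }
  complete(_context: Message[]): AssistantMessage {
    const next = this.script[this.cursor];
    if (next === undefined) throw new Error("faux provider script exhausted");
    this.cursor += 1;
    return next;
  }
}

type Tool = (args: Record<string, unknown>) => string;

// Minimal loop: call the model until it stops asking for tools.
function runTurn(provider: Provider, tools: Record<string, Tool>, goal: string): Message[] {
  const log: Message[] = [{ role: "user", text: goal }];
  for (let step = 0; step < 10; step++) { // hard cap so a bad script cannot loop forever
    const reply = provider.complete(log);
    log.push(reply);
    if (reply.stopReason !== "tool_use") break;
    for (const use of reply.toolUses) {
      const tool = tools[use.name];
      try {
        if (!tool) throw new Error(`unknown tool: ${use.name}`);
        log.push({ role: "tool", toolUseId: use.id, output: tool(use.args), isError: false });
      } catch (err) {
        // Tool failures go back to the model as results instead of aborting the turn.
        log.push({ role: "tool", toolUseId: use.id, output: String(err), isError: true });
      }
    }
  }
  return log;
}
```

Because the script is fixed, a test can assert on the exact sequence of log entries, including the error path where the scripted model asks for a tool that does not exist.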

Real models are reserved for a small number of end-to-end checks. The default development workflow should be:

  • Unit tests use the faux provider.
  • Integration tests use session replay.
  • A handful of smoke tests call the real model.
  • Cost accounting is recorded in the usage field of every assistant message.
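
One way to read the last bullet: if every assistant message carries its own usage, the cost of a session can be computed from the log alone. The sketch below assumes hypothetical field names (inputTokens, outputTokens) and takes prices as caller-supplied parameters, since real prices and usage fields vary by provider.

```typescript
// Hypothetical shapes; field names are assumptions, not a provider's schema.
interface TokenUsage { inputTokens: number; outputTokens: number }
interface LoggedMessage { role: "user" | "assistant" | "tool"; usage?: TokenUsage }

// Sum usage over assistant messages only; other entries carry no usage.
function totalUsage(log: LoggedMessage[]): TokenUsage {
  const total: TokenUsage = { inputTokens: 0, outputTokens: 0 };
  for (const m of log) {
    if (m.role !== "assistant" || !m.usage) continue;
    total.inputTokens += m.usage.inputTokens;
    total.outputTokens += m.usage.outputTokens;
  }
  return total;
}

// Prices are per million tokens and come from the caller (illustrative only).
function estimateCost(u: TokenUsage, inputPerMTok: number, outputPerMTok: number): number {
  return (u.inputTokens * inputPerMTok + u.outputTokens * outputPerMTok) / 1e6;
}
```

With a faux provider the usage numbers are scripted as well, so the accounting code can be tested without spending anything.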

This discipline runs through the entire book. An agent without reproducible tests quickly turns into a black box tuned by feel.

The core mental model

You can think of a coding agent as the following data flow:

user goal
  -> context builder
  -> provider adapter
  -> model response
  -> stop reason
  -> tool executor
  -> tool result
  -> session log
  -> next context projection

The most important separation here is between the "log" and the "context." The log is the source of truth: it records what happened. The context is a projection: it is only the slice currently being prepared for the model. Compaction, branching, restoration, UI rendering, and extension records should all be built on the log. Do not invert this and treat the current prompt as the system state.
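
The log and context separation can be sketched in a few lines. The entry kinds and function names below (LogEntry, append, projectContext) are hypothetical, and the compaction rule shown, "start from the latest summary and replay what follows," is only one simple way to derive a context from a log.

```typescript
// Hypothetical entry shapes for illustration only.
type LogEntry =
  | { kind: "message"; role: "user" | "assistant" | "tool"; text: string }
  | { kind: "compaction"; summary: string } // non-message event
  | { kind: "ui_note"; text: string };      // non-message event, never sent to the model

interface ContextMessage { role: "user" | "assistant" | "tool"; text: string }

// Append-only: returns a longer log and never edits earlier entries.
function append(log: readonly LogEntry[], entry: LogEntry): LogEntry[] {
  return [...log, entry];
}

// The context is derived from the log: find the latest compaction (if any),
// lead with its summary, then replay the messages recorded after it.
function projectContext(log: readonly LogEntry[]): ContextMessage[] {
  let start = 0;
  const context: ContextMessage[] = [];
  for (let i = log.length - 1; i >= 0; i--) {
    const e = log[i];
    if (e && e.kind === "compaction") {
      context.push({ role: "user", text: `Summary of earlier work: ${e.summary}` });
      start = i + 1;
      break;
    }
  }
  for (const e of log.slice(start)) {
    if (e.kind === "message") context.push({ role: e.role, text: e.text });
  }
  return context;
}
```

In this sketch, compacting adds an entry instead of deleting old ones, so the full history stays available for restoration, branching, or UI rendering even though the model only sees the shorter projection.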

Chapter checkpoint

After reading this chapter, you should be able to state the difference between an agent and a single LLM API call in one sentence: an agent is a recoverable runtime built around model responses. The hard part is not "getting the model to say something," but "when the model wants to act, how the system reliably executes, records, feeds back, and continues."