04. Provider Abstraction and a Unified Message Protocol

When an agent talks to only one model, it is tempting to pass the vendor SDK's types through the entire system. That gets you started quickly, but it soon locks the runtime into one provider's semantics. Providers differ in how they express tool calls, streaming deltas, errors, usage, stop reasons, system prompts, and image input. The agent kernel should not have to understand these differences.

The mature approach is to establish an internal message protocol and convert in both directions at the provider boundary. Model vendors are plugins; the agent's source of truth is your own types.

Why an internal protocol is necessary

Without an internal protocol, the problems spread step by step:

The UI needs to know whether the assistant stopped because of tool use, so it depends on vendor fields.
The session log stores raw vendor responses, and can no longer be restored after switching models.
The tool result format is coupled to one API, and another provider requires adaptation everywhere.
Tests must mock the vendor SDK instead of mocking the agent's real boundary.

An internal protocol confines these problems to the provider adapter. The agent loop only knows about Message, ToolDefinition, AssistantMessage, and stopReason. Provider differences exist in exactly two places: conversion before sending the request, and conversion after receiving the response.
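
To make the two conversion points concrete, here is a minimal sketch. The "Acme" wire format is invented for illustration and is not a real vendor API; the internal Message shape is also simplified to plain text. What matters is that vendor field names appear only inside these two functions.

```typescript
// Internal protocol (simplified): the only shapes the agent loop sees.
type Message =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string }
  | { role: "tool"; toolCallId: string; text: string };

type StopReason = "end" | "tool_use" | "max_tokens" | "error";

// Hypothetical vendor wire format ("Acme"); an assumption, not a real API.
type AcmeTurn = { speaker: "human" | "bot" | "function"; body: string; call_id?: string };
type AcmeRequest = { system: string; turns: AcmeTurn[] };
type AcmeResponse = { output: string; finish: "stop" | "function_call" | "length" };

// Conversion 1: internal messages -> vendor request, just before sending.
function toAcmeRequest(systemPrompt: string, messages: Message[]): AcmeRequest {
  const turns = messages.map((m): AcmeTurn => {
    switch (m.role) {
      case "user":
        return { speaker: "human", body: m.text };
      case "assistant":
        return { speaker: "bot", body: m.text };
      case "tool":
        return { speaker: "function", body: m.text, call_id: m.toolCallId };
    }
  });
  return { system: systemPrompt, turns };
}

// Conversion 2: vendor response -> internal fields, right after receiving.
function fromAcmeResponse(res: AcmeResponse): { text: string; stopReason: StopReason } {
  const stopReasons: Record<AcmeResponse["finish"], StopReason> = {
    stop: "end",
    function_call: "tool_use",
    length: "max_tokens",
  };
  return { text: res.output, stopReason: stopReasons[res.finish] };
}
```

Because the stop-reason table is typed as a Record over every vendor value, adding a new vendor finish value becomes a compile error in the adapter instead of a silent surprise in the loop.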

The minimum the protocol layer must store

Assistant messages should not store text alone. At a minimum they should store:

model: the id of the model that actually responded.
provider: an optional provider id, useful for auditing and restoration.
usage: cost fields such as input, output, cache, and reasoning tokens.
stopReason: the normalized stop reason the loop's control flow needs.
content: text, tool calls, and possibly images or other blocks.

The reasons for storing these fields are practical. A user may switch models mid-session; you still need to know where each response came from. Billing needs usage. Resumption needs the stop reason. Tool calling needs structured content blocks.
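
One possible shape for these fields is sketched below. The field names inside Usage and ContentBlock, and the helper functions, are assumptions made for illustration; only model, provider, usage, stopReason, and content come from the list above.

```typescript
// Assumed block shapes; a real protocol may carry more variants.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_call"; id: string; name: string; arguments: unknown }
  | { type: "image"; mimeType: string; data: string };

// Assumed usage fields; cache and reasoning counts are optional because
// not every provider reports them.
type Usage = {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
  reasoningTokens?: number;
};

type StopReason = "end" | "tool_use" | "max_tokens" | "error";

type AssistantMessage = {
  role: "assistant";
  model: string;
  provider?: string;
  usage: Usage;
  stopReason: StopReason;
  content: ContentBlock[];
};

// Billing: sum input and output tokens per model, which still works
// when the user switched models mid-session.
function tokensByModel(messages: AssistantMessage[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const m of messages) {
    const spent = m.usage.inputTokens + m.usage.outputTokens;
    totals.set(m.model, (totals.get(m.model) ?? 0) + spent);
  }
  return totals;
}

// Resumption: the tool calls the loop must execute before asking the model again.
function pendingToolCalls(message: AssistantMessage): ContentBlock[] {
  if (message.stopReason !== "tool_use") return [];
  return message.content.filter((block) => block.type === "tool_call");
}
```

Neither helper could be written against a message that stores text alone, which is the practical argument for keeping these fields.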

The shape of a provider adapter

You can write the provider boundary as two directions:

type ProviderRequest = {
  messages: Message[];
  tools: ToolDefinition[];
  systemPrompt: string;
  model: string;
};

type ProviderClient = {
  id: string;
  complete(request: ProviderRequest, signal: AbortSignal): Promise<AssistantMessage>;
  stream(request: ProviderRequest, signal: AbortSignal): AsyncIterable<ProviderEvent>;
};

complete serves non-streaming use and tests; stream serves the product experience. For the same request, the assistant message assembled from the stream must be equivalent to the one complete returns. Otherwise you will run into the familiar failure where the non-streaming tests pass but the streaming UI behaves differently.
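
One way to keep the two paths from drifting apart is to fold stream events into a message with a single function and compare the result against complete in a test. The sketch below assumes a ProviderEvent shape with start, text_delta, and done events, and a simplified text-only AssistantMessage; both are illustrations, not definitions from this chapter.

```typescript
type StopReason = "end" | "tool_use" | "max_tokens" | "error";
type AssistantMessage = { role: "assistant"; model: string; text: string; stopReason: StopReason };

// Assumed event shape for illustration.
type ProviderEvent =
  | { type: "start"; model: string }
  | { type: "text_delta"; text: string }
  | { type: "done"; stopReason: StopReason };

// Fold a stream of events into the final assistant message.
async function collect(events: AsyncIterable<ProviderEvent>): Promise<AssistantMessage> {
  const message: AssistantMessage = { role: "assistant", model: "", text: "", stopReason: "error" };
  for await (const event of events) {
    if (event.type === "start") message.model = event.model;
    else if (event.type === "text_delta") message.text += event.text;
    else message.stopReason = event.stopReason;
  }
  return message;
}

// Toy provider: complete is defined in terms of stream, so the two
// paths produce the same message by construction.
async function* toyStream(): AsyncGenerator<ProviderEvent> {
  yield { type: "start", model: "toy-1" };
  yield { type: "text_delta", text: "Hel" };
  yield { type: "text_delta", text: "lo" };
  yield { type: "done", stopReason: "end" };
}

function toyComplete(): Promise<AssistantMessage> {
  return collect(toyStream());
}
```

A real adapter that calls separate streaming and non-streaming endpoints cannot get equivalence by construction, so there the comparison belongs in a test that runs both paths against the same request.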

The faux provider is a first-class citizen

Do not treat the faux provider as a throwaway mock. It should implement the same interface as a real provider and be able to return complete assistant messages, tool calls, usage, and stop reasons. A scripted provider can work like this:

type ScriptedStep = {
  expectLastRole?: Message["role"];
  response: AssistantMessage;
};

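// Sketch: covers id and complete only; a full faux provider also implements stream().
class ScriptedProvider implements Pick<ProviderClient, "id" | "complete"> {
  readonly id = "scripted";
  private readonly steps: ScriptedStep[];
  private index = 0;

  constructor(steps: ScriptedStep[]) {
    this.steps = steps;
  }

  // The AbortSignal parameter is omitted: scripted responses resolve immediately.
  async complete(request: ProviderRequest): Promise<AssistantMessage> {
    const step = this.steps[this.index];
    if (!step) {
      throw new Error("No scripted response left");
    }
    // Check the expectation before consuming the step, so a failed check
    // does not advance the script.
    const last = request.messages.at(-1);
    if (step.expectLastRole && last?.role !== step.expectLastRole) {
      throw new Error(`Expected last role ${step.expectLastRole}, got ${last?.role ?? "none"}`);
    }
    this.index += 1;
    return step.response;
  }
}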

This provider can test whether the loop requests the model again after a tool result, and it can also test unknown tools, compaction, steering, and resumption. It comes much closer to real agent behavior than mocking fetch.
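
As an illustration of the first claim, the sketch below wires a scripted model into a deliberately tiny loop and checks that the model is asked again after the tool result. The loop, the simplified message shapes, and the names runLoop, scripted, and demo are all hypothetical stand-ins, not the runtime described in this chapter.

```typescript
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type Message =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string; toolCalls: ToolCall[]; stopReason: "end" | "tool_use" }
  | { role: "tool"; toolCallId: string; text: string };
type AssistantMessage = Extract<Message, { role: "assistant" }>;

type Model = { complete(input: { messages: Message[] }): Promise<AssistantMessage> };
type Tools = Record<string, (args: Record<string, unknown>) => Promise<string>>;

// Hypothetical minimal loop: call the model, run requested tools,
// repeat until the model stops asking for tools.
async function runLoop(model: Model, tools: Tools, initial: Message[]): Promise<Message[]> {
  let transcript = [...initial];
  for (let turn = 0; turn < 10; turn += 1) {
    const reply = await model.complete({ messages: transcript });
    transcript = [...transcript, reply];
    if (reply.stopReason !== "tool_use") return transcript;
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      // Unknown tools become a tool result the model can read, not a crash.
      const text = tool ? await tool(call.args) : `Unknown tool: ${call.name}`;
      transcript = [...transcript, { role: "tool", toolCallId: call.id, text }];
    }
  }
  throw new Error("Turn limit reached");
}

// Scripted model: replays fixed responses and records the role of the
// last message it saw on each call.
function scripted(responses: AssistantMessage[]): Model & { seenLastRoles: string[] } {
  const seenLastRoles: string[] = [];
  let index = 0;
  return {
    seenLastRoles,
    async complete({ messages }) {
      seenLastRoles.push(messages[messages.length - 1]?.role ?? "none");
      const response = responses[index];
      index += 1;
      if (!response) throw new Error("No scripted response left");
      return response;
    },
  };
}

async function demo(): Promise<{ roles: string[]; calls: string[] }> {
  const model = scripted([
    {
      role: "assistant",
      text: "",
      toolCalls: [{ id: "c1", name: "read_file", args: { path: "a.txt" } }],
      stopReason: "tool_use",
    },
    { role: "assistant", text: "Done.", toolCalls: [], stopReason: "end" },
  ]);
  const tools: Tools = { read_file: async () => "file contents" };
  const transcript = await runLoop(model, tools, [{ role: "user", text: "Read a.txt" }]);
  return { roles: transcript.map((m) => m.role), calls: model.seenLastRoles };
}
```

The second scripted call sees a tool message last, which is exactly the "request the model again after a tool result" behavior. No HTTP layer is involved, so a failure points at the loop rather than at a fetch mock.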

The model catalog and capabilities

The agent also needs a model catalog. The catalog is not just data for a dropdown; it is the basis for runtime decisions. For each model, record at least:

Context window size.
Whether it supports tool calling.
Whether it supports streaming tool arguments.
Whether it supports images, reasoning budgets, and caching.
The default provider and authentication method.

Compaction thresholds, tool exposure, UI hints, and error messages all depend on these capabilities. Do not scatter "if the model name contains some string" checks throughout the code. Model capabilities should come from configuration and the catalog.
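
A catalog can be as simple as a typed lookup table that the runtime consults instead of inspecting model names. The entries below are placeholders with made-up ids and numbers, and the 0.75 compaction fraction is an arbitrary example value.

```typescript
type ModelCapabilities = {
  contextWindow: number;
  toolCalling: boolean;
  streamingToolArguments: boolean;
  images: boolean;
  reasoningBudget: boolean;
  caching: boolean;
  defaultProvider: string;
  auth: "api_key" | "oauth";
};

// Hypothetical entries; ids and numbers are placeholders, not real models.
const catalog: Record<string, ModelCapabilities> = {
  "example-large": {
    contextWindow: 200000,
    toolCalling: true,
    streamingToolArguments: true,
    images: true,
    reasoningBudget: true,
    caching: true,
    defaultProvider: "example-cloud",
    auth: "api_key",
  },
  "example-small": {
    contextWindow: 32000,
    toolCalling: false,
    streamingToolArguments: false,
    images: false,
    reasoningBudget: false,
    caching: false,
    defaultProvider: "example-local",
    auth: "oauth",
  },
};

// Fail loudly on unknown ids instead of guessing capabilities.
function capabilitiesOf(modelId: string): ModelCapabilities {
  const entry = catalog[modelId];
  if (!entry) throw new Error(`Unknown model: ${modelId}`);
  return entry;
}

// Compact once the conversation passes a fraction of the context window.
function compactionThreshold(modelId: string, fraction = 0.75): number {
  return Math.floor(capabilitiesOf(modelId).contextWindow * fraction);
}

// Expose tools only to models that can call them.
function toolsFor<T>(modelId: string, tools: T[]): T[] {
  return capabilitiesOf(modelId).toolCalling ? tools : [];
}
```

Every decision here reads a capability flag, so adding a model means adding a catalog entry rather than hunting for string checks.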

Production trade-offs

The provider adapter is the boundary where error handling is densest. You need to normalize vendor errors into categories the runtime can understand: authentication failure, rate limiting, retryable server errors, context overflow, content-safety refusal, and network interruption. Retryable errors go into a backoff policy; non-retryable errors go into the session log and are surfaced to the user.
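
The normalization step can be sketched as a classifier followed by a policy decision. Everything vendor-specific here is an assumption: the RawFailure shape, the error codes "ctx_too_long" and "content_blocked", the extra "unknown" category, and the backoff constants are all invented for illustration. Context overflow is routed to compaction rather than surfaced, since the runtime can act on it itself.

```typescript
type ErrorKind =
  | "auth"
  | "rate_limit"
  | "server"
  | "context_overflow"
  | "refusal"
  | "network"
  | "unknown"; // added as a catch-all for anything the adapter cannot place

// Hypothetical raw failure as an adapter might capture it. `code` stands in
// for a vendor-specific error code; a missing status means no response arrived.
type RawFailure = { status?: number; code?: string; message: string };

function classify(raw: RawFailure): ErrorKind {
  if (raw.status === undefined) return "network";
  if (raw.code === "ctx_too_long") return "context_overflow";
  if (raw.code === "content_blocked") return "refusal";
  if (raw.status === 401 || raw.status === 403) return "auth";
  if (raw.status === 429) return "rate_limit";
  if (raw.status >= 500) return "server";
  return "unknown";
}

const RETRYABLE: ReadonlySet<ErrorKind> = new Set<ErrorKind>(["rate_limit", "server", "network"]);

// Exponential backoff with a cap; attempt is zero-based.
function backoffMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

type Action =
  | { type: "retry"; delayMs: number }
  | { type: "compact" }
  | { type: "surface"; kind: ErrorKind };

// Decide what the runtime should do with a failure on a given attempt.
function decide(raw: RawFailure, attempt: number, maxAttempts = 4): Action {
  const kind = classify(raw);
  if (kind === "context_overflow") return { type: "compact" };
  if (RETRYABLE.has(kind) && attempt < maxAttempts) {
    return { type: "retry", delayMs: backoffMs(attempt) };
  }
  return { type: "surface", kind };
}
```

Production backoff usually adds jitter and honors any retry-after hint the vendor sends; both are left out to keep the sketch short.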

In addition, the streaming adapter must produce a clear state when the stream breaks. Do not let the UI hang on "the model is responding." The assistant message after a stream interruption can be marked stopReason: "error" and keep the text fragments already received, making it easy for the user to decide whether to retry or continue.
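
A sketch of that behavior follows, assuming a simplified ProviderEvent with text_delta and done events and an errorMessage field on the assistant message; neither is defined in this chapter. A stream that throws, or that simply ends without a done event, both yield a finished message the UI can render.

```typescript
type StopReason = "end" | "tool_use" | "max_tokens" | "error";
type AssistantMessage = {
  role: "assistant";
  text: string;
  stopReason: StopReason;
  errorMessage?: string;
};

// Assumed event shape for illustration.
type ProviderEvent =
  | { type: "text_delta"; text: string }
  | { type: "done"; stopReason: StopReason };

// Always resolves with a message, even when the stream breaks midway.
async function collectSafely(events: AsyncIterable<ProviderEvent>): Promise<AssistantMessage> {
  let text = "";
  let stopReason: StopReason | undefined;
  let errorMessage: string | undefined;
  try {
    for await (const event of events) {
      if (event.type === "text_delta") text += event.text;
      else stopReason = event.stopReason;
    }
  } catch (err) {
    errorMessage = err instanceof Error ? err.message : String(err);
  }
  if (stopReason === undefined) {
    // No "done" event arrived: keep the fragments and mark the message as failed.
    return {
      role: "assistant",
      text,
      stopReason: "error",
      errorMessage: errorMessage ?? "Stream ended without a stop reason",
    };
  }
  return { role: "assistant", text, stopReason };
}

// Example stream that breaks after two fragments.
async function* brokenStream(): AsyncGenerator<ProviderEvent> {
  yield { type: "text_delta", text: "The answer " };
  yield { type: "text_delta", text: "is" };
  throw new Error("connection reset");
}
```

Collecting brokenStream() resolves to a message with the text "The answer is", stopReason "error", and the original error message, so the UI leaves its "responding" state and the user can choose to retry or continue.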

Exercises

Implement two providers:

ScriptedProvider: returns fixed assistant messages from an array.
HttpProvider: only needs to support non-streaming calls to one real model.

Acceptance criteria:

The agent loop can switch providers without changing a single line of core code.
Both providers return the internal AssistantMessage.
Usage and stop reason are not lost.
When the real provider returns a context-overflow error, the error is recognized as requiring compaction rather than treated as an ordinary exception.