
TypeScript AI Agent Testing: Vitest, Mock LLMs, and Snapshot Testing

Learn how to test AI agent applications in TypeScript. Covers Vitest setup, strategies for mocking LLM responses, snapshot testing for agent outputs, deterministic tool testing, and CI integration for reliable agent test suites.

The Testing Challenge with AI Agents

AI agents are inherently non-deterministic. The same prompt can produce different responses across runs, making traditional assertion-based testing unreliable. A robust agent testing strategy separates what you can test deterministically — tool execution, input validation, state management, routing logic — from what requires fuzzy evaluation — the quality and correctness of LLM-generated text.

This guide walks through practical patterns for testing TypeScript AI agents using Vitest.

Setting Up Vitest

Install Vitest and configure it for a TypeScript project:

npm install -D vitest @vitest/coverage-v8

// vitest.config.ts
import { defineConfig } from "vitest/config";
import path from "path";

export default defineConfig({
  test: {
    globals: true,
    environment: "node",
    coverage: {
      provider: "v8",
      include: ["src/**/*.ts"],
      exclude: ["src/**/*.test.ts"],
    },
    testTimeout: 30_000, // Agent tests may be slow
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "src"),
    },
  },
});

Mocking LLM Responses

The most important testing pattern is replacing the LLM client with a mock that returns predetermined responses:

// src/lib/__mocks__/openai-client.ts
import { vi } from "vitest";

export function createMockOpenAI() {
  return {
    chat: {
      completions: {
        create: vi.fn(),
      },
    },
  };
}

export function mockChatResponse(content: string | null, toolCalls?: any[]) {
  return {
    choices: [
      {
        message: {
          role: "assistant",
          content,
          tool_calls: toolCalls ?? null,
        },
        finish_reason: toolCalls ? "tool_calls" : "stop",
      },
    ],
    usage: { prompt_tokens: 100, completion_tokens: 50, total_tokens: 150 },
  };
}

export function mockToolCallResponse(name: string, args: object) {
  // Tool-call responses have no text content, hence content is null
  return mockChatResponse(null, [
    {
      id: "call_mock_123",
      type: "function",
      function: {
        name,
        arguments: JSON.stringify(args),
      },
    },
  ]);
}

Testing Tool Execution Deterministically

Tools have well-defined inputs and outputs — with their external dependencies mocked, you can test them directly and deterministically:

// src/tools/weather.test.ts
import { describe, it, expect, vi } from "vitest";
import { weatherTool } from "./weather";

// Mock the external API
vi.mock("./weather-api", () => ({
  fetchWeather: vi.fn().mockResolvedValue({
    temperature: 22,
    condition: "sunny",
    humidity: 45,
  }),
}));

describe("weatherTool", () => {
  it("returns formatted weather data for valid city", async () => {
    const result = await weatherTool.execute({
      city: "San Francisco",
      units: "celsius",
    });

    expect(result).toEqual({
      temperature: 22,
      condition: "sunny",
      humidity: 45,
    });
  });

  it("validates input schema rejects empty city", () => {
    const parsed = weatherTool.inputSchema.safeParse({ city: "" });
    expect(parsed.success).toBe(false);
  });

  it("applies default units when not specified", () => {
    const parsed = weatherTool.inputSchema.safeParse({ city: "Tokyo" });
    expect(parsed.success).toBe(true);
    if (parsed.success) {
      expect(parsed.data.units).toBe("celsius");
    }
  });
});

Testing the Agent Loop

Test that the agent correctly orchestrates tool calls and handles multi-step conversations:


// src/agent/support-agent.test.ts
import { describe, it, expect, vi, beforeEach } from "vitest";
import { runAgent } from "./support-agent";
import { createMockOpenAI, mockChatResponse, mockToolCallResponse } from "../lib/__mocks__/openai-client";

describe("Support Agent", () => {
  let mockClient: ReturnType<typeof createMockOpenAI>;

  beforeEach(() => {
    mockClient = createMockOpenAI();
  });

  it("calls search tool when user asks a question", async () => {
    // First call: model decides to search
    mockClient.chat.completions.create
      .mockResolvedValueOnce(
        mockToolCallResponse("search_docs", { query: "reset password" })
      )
      // Second call: model responds with answer
      .mockResolvedValueOnce(
        mockChatResponse("To reset your password, go to Settings > Security.")
      );

    const result = await runAgent(mockClient as any, "How do I reset my password?");

    expect(result.text).toContain("reset your password");
    expect(mockClient.chat.completions.create).toHaveBeenCalledTimes(2);
  });

  it("respects maximum iteration limit", async () => {
    // Model keeps calling tools indefinitely
    mockClient.chat.completions.create.mockResolvedValue(
      mockToolCallResponse("search_docs", { query: "something" })
    );

    const result = await runAgent(mockClient as any, "loop forever", { maxIterations: 3 });

    expect(result.text).toContain("maximum iterations");
    expect(mockClient.chat.completions.create).toHaveBeenCalledTimes(3);
  });
});
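The tests above assume a runAgent that loops until the model stops requesting tools or an iteration cap is reached. Here is a minimal sketch of such a loop — the tool registry, message shapes, and fallback text are illustrative assumptions chosen to satisfy the assertions above, not the article's actual implementation:

```typescript
// Minimal agent-loop sketch (assumed shapes; simplified from the real OpenAI SDK types)
type ToolCall = { id: string; function: { name: string; arguments: string } };

type ChatResponse = {
  choices: Array<{
    message: { role: string; content: string | null; tool_calls: ToolCall[] | null };
  }>;
};

type ChatClient = {
  chat: { completions: { create: (req: unknown) => Promise<ChatResponse> } };
};

// Hypothetical tool registry; a real agent would dispatch to actual tools
const tools: Record<string, (args: any) => Promise<string>> = {
  search_docs: async ({ query }) => `Results for: ${query}`,
};

export async function runAgent(
  client: ChatClient,
  userInput: string,
  opts: { maxIterations?: number } = {}
): Promise<{ text: string }> {
  const maxIterations = opts.maxIterations ?? 10;
  const messages: unknown[] = [{ role: "user", content: userInput }];

  for (let i = 0; i < maxIterations; i++) {
    const response = await client.chat.completions.create({ messages });
    const message = response.choices[0].message;
    messages.push(message);

    // No tool calls means the model produced its final answer
    if (!message.tool_calls?.length) {
      return { text: message.content ?? "" };
    }

    // Execute each requested tool and feed the result back to the model
    for (const call of message.tool_calls) {
      const result = await tools[call.function.name](JSON.parse(call.function.arguments));
      messages.push({ role: "tool", tool_call_id: call.id, content: result });
    }
  }
  return { text: `Stopped after reaching maximum iterations (${maxIterations}).` };
}
```

Because the loop takes the client as a parameter, the mock from earlier slots in with no patching.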

Snapshot Testing for Agent Outputs

When you want to catch unexpected changes in agent behavior without brittle exact-match assertions, use snapshots on structured outputs:

it("produces expected structured analysis", async () => {
  mockClient.chat.completions.create.mockResolvedValueOnce(
    mockChatResponse(JSON.stringify({
      sentiment: "positive",
      confidence: 0.92,
      topics: ["product", "pricing"],
    }))
  );

  const result = await analyzeText(mockClient as any, "Great product, fair price!");

  expect(result).toMatchSnapshot();
});

Run vitest --update to update snapshots when behavior intentionally changes. Review snapshot diffs in pull requests to catch unintended regressions.
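Snapshots become brittle when agent results carry volatile fields such as request IDs, timestamps, or latency numbers. A small normalizer keeps diffs meaningful — this is a hypothetical helper, and the default key names are assumptions:

```typescript
// Hypothetical helper: redact volatile fields before snapshotting so
// snapshot diffs only reflect meaningful behavior changes.
export function normalizeForSnapshot<T extends Record<string, unknown>>(
  result: T,
  volatileKeys: string[] = ["timestamp", "requestId", "latencyMs"]
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(result)) {
    out[key] = volatileKeys.includes(key) ? "<redacted>" : value;
  }
  return out;
}
```

Then snapshot the normalized object: expect(normalizeForSnapshot(result)).toMatchSnapshot().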

CI Integration

Add agent tests to your CI pipeline:

# .github/workflows/test.yml
name: Agent Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx vitest run --coverage
      - uses: actions/upload-artifact@v4
        with:
          name: coverage
          path: coverage/

Because all LLM calls are mocked, these tests are fast, deterministic, and free — no API keys needed in CI.

FAQ

Should I ever test with real LLM API calls?

Yes, but separately from your main test suite. Run a small set of "smoke tests" or "evaluation tests" against the real API on a schedule (daily or pre-release). These tests use fuzzy assertions — checking that responses contain expected keywords or pass a rubric — rather than exact matches. Keep them in a separate test file with a longer timeout.
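One way to implement the fuzzy keyword check described above is a small matcher that passes when the response mentions enough of the expected terms — the threshold and function name here are illustrative assumptions:

```typescript
// Fuzzy assertion sketch for scheduled smoke tests against the real API:
// pass if the response contains at least minMatches of the expected keywords.
export function matchesRubric(
  response: string,
  keywords: string[],
  minMatches = Math.ceil(keywords.length / 2) // assumed default threshold
): boolean {
  const lower = response.toLowerCase();
  const hits = keywords.filter((k) => lower.includes(k.toLowerCase())).length;
  return hits >= minMatches;
}
```

In a smoke test you would assert expect(matchesRubric(liveResponse, ["settings", "security", "password"])).toBe(true) rather than matching exact text.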

How do I test streaming responses?

Mock the streaming response as an async iterable. Create a helper that yields chunks with simulated delays. Test that your stream processing code correctly accumulates deltas, handles tool call fragments, and emits the final assembled message.
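A sketch of that pattern, assuming a simplified version of the SDK's chunk shape (the real OpenAI stream events carry more fields):

```typescript
// Simplified assumption of an OpenAI-style streaming chunk
type StreamChunk = { choices: Array<{ delta: { content?: string } }> };

// Mock stream: an async generator yielding content deltas with a tiny delay
export async function* mockStream(chunks: string[]): AsyncGenerator<StreamChunk> {
  for (const content of chunks) {
    await new Promise((resolve) => setTimeout(resolve, 1)); // simulated network delay
    yield { choices: [{ delta: { content } }] };
  }
}

// The code under test: accumulate deltas into the final message text
export async function accumulateStream(
  stream: AsyncIterable<StreamChunk>
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0].delta.content ?? "";
  }
  return text;
}
```

A test can then assert that accumulateStream(mockStream(["Hel", "lo"])) resolves to the fully assembled string.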

What code coverage target should I aim for?

Focus on 90%+ coverage for tool implementations, input validation, and routing logic. The agent loop orchestration should be covered by integration tests with mocked LLM responses. Do not chase coverage on thin wrapper code that just forwards calls to the LLM SDK.
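You can enforce these differentiated targets in CI with Vitest's coverage.thresholds option, which accepts glob-pattern keys. The paths and percentages below are illustrative:

```typescript
// vitest.config.ts (fragment) — per-glob coverage thresholds.
// Glob keys and numbers here are illustrative, not prescriptive.
coverage: {
  provider: "v8",
  thresholds: {
    "src/tools/**": { lines: 90, functions: 90 },
    "src/agent/**": { lines: 75 },
  },
},
```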


#Testing #Vitest #TypeScript #AIAgents #Mocking #CICD #AgenticAI #LearnAI #AIEngineering


CallSphere Team

Expert insights on AI voice agents and customer communication automation.
