
SDK Testing: Unit Tests, Integration Tests, and Recorded HTTP Fixtures

Learn testing strategies for AI agent SDKs including unit tests for parsers and models, integration tests against live APIs, VCR-style recorded HTTP fixtures, and CI/CD pipeline configuration.

The Testing Pyramid for SDKs

SDK testing follows a specific pyramid. At the base, unit tests verify models, parsers, and utility functions with zero network calls. In the middle, recorded HTTP fixture tests replay captured API responses to validate the full request/response cycle without hitting live servers. At the top, integration tests run against the real API to catch compatibility issues.

Most SDK bugs live in the serialization, deserialization, and error handling layers — exactly where unit tests and fixture tests shine. Integration tests catch API contract changes but are slow and require credentials, so they run less frequently.

Unit Testing Models and Parsers

Start with the code that has no dependencies. Pydantic models, error classification, retry delay calculation, and SSE parsing are pure logic with no network dependencies, which makes them ideal candidates for thorough unit tests:

# tests/test_models.py
import pytest
from myagent.types.agents import Agent, AgentCreateParams


def test_agent_deserialization():
    raw = {
        "id": "agent_abc123",
        "name": "Test Bot",
        "model": "gpt-4o",
        "instructions": "Be helpful.",
        "createdAt": "2026-03-17T00:00:00Z",
        "tools": [{"id": "t1", "name": "search", "type": "function"}],
    }
    agent = Agent.model_validate(raw)
    assert agent.id == "agent_abc123"
    assert agent.name == "Test Bot"
    assert len(agent.tools) == 1
    assert agent.tools[0].name == "search"


def test_agent_deserialization_ignores_unknown_fields():
    raw = {
        "id": "agent_abc123",
        "name": "Test",
        "model": "gpt-4o",
        "instructions": "",
        "createdAt": "2026-03-17T00:00:00Z",
        "tools": [],
        "futureField": "should not break",
    }
    agent = Agent.model_validate(raw)
    assert agent.id == "agent_abc123"


def test_create_params_validation():
    params = AgentCreateParams(name="Bot", model="gpt-4o")
    assert params.name == "Bot"
    assert params.model == "gpt-4o"


def test_create_params_rejects_invalid():
    from pydantic import ValidationError

    with pytest.raises(ValidationError):
        AgentCreateParams(name=123)  # name must be str

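The unknown-field tolerance exercised above has to come from somewhere, typically the model config. As a rough sketch (field names and the createdAt alias are inferred from the test payloads here, not taken from a real SDK), the Agent model might look like:

```python
from datetime import datetime

from pydantic import BaseModel, ConfigDict, Field


class Tool(BaseModel):
    id: str
    name: str
    type: str


class Agent(BaseModel):
    # Ignore unknown keys so new server-side fields never break old SDK versions.
    model_config = ConfigDict(extra="ignore", populate_by_name=True)

    id: str
    name: str
    model: str
    instructions: str
    created_at: datetime = Field(alias="createdAt")
    tools: list[Tool]
```

Pydantic v2 ignores extra fields by default, but spelling it out in model_config documents the forward-compatibility contract the second test relies on.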
Test the retry delay calculator independently:

# tests/test_retry.py
from myagent._retry import RetryPolicy


def test_exponential_backoff():
    policy = RetryPolicy(initial_delay=1.0, backoff_factor=2.0)
    assert policy.calculate_delay(0) == 1.0
    assert policy.calculate_delay(1) == 2.0
    assert policy.calculate_delay(2) == 4.0


def test_max_delay_cap():
    policy = RetryPolicy(initial_delay=1.0, backoff_factor=2.0, max_delay=5.0)
    assert policy.calculate_delay(10) == 5.0  # Capped at max


def test_retry_after_honored():
    policy = RetryPolicy()
    assert policy.calculate_delay(0, retry_after=10.0) == 10.0


def test_retry_after_capped():
    policy = RetryPolicy(max_delay=5.0)
    assert policy.calculate_delay(0, retry_after=60.0) == 5.0

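For reference, a policy that satisfies all four tests can be very small. This is a sketch consistent with the assertions above, not the internals of the hypothetical myagent._retry module:

```python
class RetryPolicy:
    """Exponential backoff with a cap, honoring server-supplied Retry-After."""

    def __init__(self, initial_delay=1.0, backoff_factor=2.0, max_delay=30.0):
        self.initial_delay = initial_delay
        self.backoff_factor = backoff_factor
        self.max_delay = max_delay

    def calculate_delay(self, attempt, retry_after=None):
        # A Retry-After header wins over computed backoff, but is still capped.
        if retry_after is not None:
            return min(retry_after, self.max_delay)
        delay = self.initial_delay * self.backoff_factor ** attempt
        return min(delay, self.max_delay)
```

In production you would usually add jitter; these tests pin the deterministic core, and a jittered variant would assert on ranges rather than exact values.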
Recorded HTTP Fixtures with pytest-recording

Recorded fixtures (also called VCR cassettes) capture real HTTP interactions and replay them in tests. This gives you the confidence of integration tests with the speed and determinism of unit tests:

# tests/test_agents_resource.py
import pytest
from myagent import AgentClient


@pytest.fixture
def client():
    return AgentClient(api_key="test-key-for-recording")


@pytest.mark.vcr()
def test_create_agent(client):
    agent = client.agents.create(
        name="Test Bot",
        model="gpt-4o",
        instructions="Be helpful.",
    )
    assert agent.id is not None
    assert agent.name == "Test Bot"


@pytest.mark.vcr()
def test_list_agents(client):
    agents = client.agents.list(limit=5)
    assert isinstance(agents, list)
    assert len(agents) <= 5

The first time you run these tests with --record-mode=new_episodes, they hit the real API and record the responses to YAML cassette files. Subsequent runs replay the cassettes without network access.


Configure VCR to scrub sensitive data:

# conftest.py
import pytest


@pytest.fixture(scope="module")
def vcr_config():
    return {
        "filter_headers": ["authorization", "cookie"],
        "filter_query_parameters": ["api_key"],
        "before_record_response": scrub_response,
    }


def scrub_response(response):
    """Remove sensitive data from recorded responses before they are written."""
    body = response["body"]["string"]
    # Replace real IDs or PII if needed, e.g.:
    # body = re.sub(rb"agent_[a-z0-9]+", b"agent_REDACTED", body)
    response["body"]["string"] = body
    return response

TypeScript Testing with Nock

In TypeScript, nock intercepts HTTP requests at the Node.js level and returns mock responses:

// tests/agents.test.ts
import { describe, it, expect, afterEach } from 'vitest';
import nock from 'nock';
import { AgentClient } from '../src/client';

const BASE_URL = 'https://api.myagent.ai/v1';

describe('AgentsResource', () => {
  afterEach(() => nock.cleanAll());

  it('creates an agent', async () => {
    const mockAgent = {
      id: 'agent_abc123',
      name: 'Test Bot',
      model: 'gpt-4o',
      instructions: 'Be helpful.',
      tools: [],
      createdAt: '2026-03-17T00:00:00Z',
    };

    nock(BASE_URL)
      .post('/agents', { name: 'Test Bot', model: 'gpt-4o' })
      .reply(201, mockAgent);

    const client = new AgentClient({ apiKey: 'test-key' });
    const agent = await client.agents.create({
      name: 'Test Bot',
      model: 'gpt-4o',
    });

    expect(agent.id).toBe('agent_abc123');
    expect(agent.name).toBe('Test Bot');
  });

  it('handles 401 errors', async () => {
    nock(BASE_URL)
      .get('/agents/invalid')
      .reply(401, { error: 'Invalid API key' });

    const client = new AgentClient({ apiKey: 'bad-key' });

    await expect(client.agents.get('invalid')).rejects.toThrow(
      'Invalid API key'
    );
  });
});

Integration Tests with Live API

Integration tests run against the real API. Gate them behind an environment variable so they only run when credentials are available:

# tests/integration/test_live_api.py
import os
import pytest

pytestmark = [
    pytest.mark.integration,  # lets CI deselect these with -m "not integration"
    pytest.mark.skipif(
        os.environ.get("MYAGENT_LIVE_TESTS") != "1",
        reason="Live API tests disabled. Set MYAGENT_LIVE_TESTS=1 to run.",
    ),
]


@pytest.fixture
def live_client():
    from myagent import AgentClient
    return AgentClient()  # Uses MYAGENT_API_KEY env var


def test_full_agent_lifecycle(live_client):
    # Create
    agent = live_client.agents.create(
        name="Integration Test Bot",
        model="gpt-4o",
        instructions="Say hello.",
    )
    assert agent.id is not None

    # Read
    fetched = live_client.agents.get(agent.id)
    assert fetched.name == "Integration Test Bot"

    # Delete
    live_client.agents.delete(agent.id)

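One caveat: if an assertion fails mid-lifecycle, the created agent leaks. Wrapping creation and deletion in a context manager guarantees cleanup even on failure. The sketch below exercises the pattern against a tiny in-memory stand-in, since the real AgentClient needs live credentials:

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_agent(client, **params):
    """Create an agent, yield it to the test, and always delete it afterwards."""
    agent = client.agents.create(**params)
    try:
        yield agent
    finally:
        # Runs even if the test body raised, so no orphaned agents pile up.
        client.agents.delete(agent.id)


# Tiny in-memory stand-in so the pattern can be exercised offline.
class _FakeAgents:
    def __init__(self):
        self.store = {}

    def create(self, **params):
        agent = SimpleNamespace(id=f"agent_{len(self.store)}", **params)
        self.store[agent.id] = agent
        return agent

    def delete(self, agent_id):
        del self.store[agent_id]


class FakeClient:
    def __init__(self):
        self.agents = _FakeAgents()


client = FakeClient()
try:
    with temporary_agent(client, name="Integration Test Bot", model="gpt-4o") as agent:
        assert agent.id in client.agents.store
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass

assert client.agents.store == {}  # cleanup ran despite the failure
```

The same shape works as a pytest yield fixture if you prefer injecting the temporary agent into several tests.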
CI/CD Pipeline

Run unit tests and fixture tests on every push. Run integration tests on a schedule or before releases:

# .github/workflows/sdk-tests.yml
name: SDK Tests
on: [push, pull_request]
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev]"
      - run: pytest tests/ -m "not integration" --record-mode=none
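
Integration tests can then run on a schedule in a separate workflow. A sketch, assuming the integration marker is registered and the API key is stored as a repository secret named MYAGENT_API_KEY:

```yaml
# .github/workflows/sdk-integration.yml
name: SDK Integration Tests
on:
  schedule:
    - cron: "0 6 * * 1"  # weekly, Monday 06:00 UTC
  workflow_dispatch: {}   # allow manual runs before releases
jobs:
  integration-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev]"
      - run: pytest tests/integration/
        env:
          MYAGENT_LIVE_TESTS: "1"
          MYAGENT_API_KEY: ${{ secrets.MYAGENT_API_KEY }}
```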

FAQ

When should I re-record VCR cassettes?

Re-record when the API changes (new fields, changed response structure) or when you add test cases that cover previously untested endpoints. To keep cassettes from drifting, schedule a periodic CI job that reruns the recorded suite with --record-mode=all against the live API and commits the updated cassettes.

How do I test streaming responses without a live server?

Create mock async generators that yield pre-built SSE event objects. In Python, write an async def mock_stream() that yields SSEEvent instances with controlled data and timing. This lets you test your SSE parser, event callback handler, and stream collector independently.
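
In sketch form, assuming a minimal SSEEvent shape (the real SDK's event types are not shown in this article):

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SSEEvent:
    event: str
    data: str


async def mock_stream():
    """Yield pre-built SSE events, simulating a server-sent event stream."""
    for event in [
        SSEEvent(event="message.delta", data="Hel"),
        SSEEvent(event="message.delta", data="lo"),
        SSEEvent(event="message.done", data=""),
    ]:
        await asyncio.sleep(0)  # yield control, mimicking network pacing
        yield event


async def collect_text(stream):
    """A stream collector: concatenate delta payloads until the done event."""
    chunks = []
    async for event in stream:
        if event.event == "message.delta":
            chunks.append(event.data)
        elif event.event == "message.done":
            break
    return "".join(chunks)


text = asyncio.run(collect_text(mock_stream()))
assert text == "Hello"
```

Because the generator controls event order and content, you can also inject malformed or out-of-order events to test error paths.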

Should I mock the HTTP client or use a recording approach?

Use recordings for most tests — they validate the full serialization and deserialization stack, catching bugs that mocks miss. Use mocks only for testing specific error conditions (network timeouts, malformed responses) that are difficult to capture in recordings.
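
For example, a transport timeout is easy to fake with unittest.mock; the helper and exception names here are illustrative, not a real SDK surface:

```python
from unittest import mock


class TransportTimeout(Exception):
    """Stand-in for whatever timeout error the HTTP layer raises."""


def fetch_agent(http_get, agent_id):
    # Illustrative SDK-internal helper: wraps transport errors in a clean message.
    try:
        return http_get(f"/agents/{agent_id}")
    except TransportTimeout:
        raise RuntimeError(f"Request for {agent_id} timed out") from None


# The mock raises on call, simulating a timeout no cassette can easily capture.
timed_out = mock.Mock(side_effect=TransportTimeout())

raised = False
try:
    fetch_agent(timed_out, "agent_abc123")
except RuntimeError as exc:
    raised = "timed out" in str(exc)
assert raised
```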


#Testing #SDKTesting #VCR #CICD #AgenticAI #Python #TypeScript #LearnAI #AIEngineering

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
