SDK Testing: Unit Tests, Integration Tests, and Recorded HTTP Fixtures
Learn testing strategies for AI agent SDKs including unit tests for parsers and models, integration tests against live APIs, VCR-style recorded HTTP fixtures, and CI/CD pipeline configuration.
The Testing Pyramid for SDKs
SDK testing follows a specific pyramid. At the base, unit tests verify models, parsers, and utility functions with zero network calls. In the middle, recorded HTTP fixture tests replay captured API responses to validate the full request/response cycle without hitting live servers. At the top, integration tests run against the real API to catch compatibility issues.
Most SDK bugs live in the serialization, deserialization, and error handling layers — exactly where unit tests and fixture tests shine. Integration tests catch API contract changes but are slow and require credentials, so they run less frequently.
Unit Testing Models and Parsers
Start with the code that has no dependencies. Pydantic models, error classification, retry delay calculation, and SSE parsing are pure, dependency-free logic that deserves thorough unit tests:
# tests/test_models.py
import pytest
from pydantic import ValidationError

from myagent.types.agents import Agent, AgentCreateParams


def test_agent_deserialization():
    raw = {
        "id": "agent_abc123",
        "name": "Test Bot",
        "model": "gpt-4o",
        "instructions": "Be helpful.",
        "createdAt": "2026-03-17T00:00:00Z",
        "tools": [{"id": "t1", "name": "search", "type": "function"}],
    }
    agent = Agent.model_validate(raw)
    assert agent.id == "agent_abc123"
    assert agent.name == "Test Bot"
    assert len(agent.tools) == 1
    assert agent.tools[0].name == "search"


def test_agent_deserialization_ignores_unknown_fields():
    raw = {
        "id": "agent_abc123",
        "name": "Test",
        "model": "gpt-4o",
        "instructions": "",
        "createdAt": "2026-03-17T00:00:00Z",
        "tools": [],
        "futureField": "should not break",
    }
    agent = Agent.model_validate(raw)
    assert agent.id == "agent_abc123"


def test_create_params_validation():
    params = AgentCreateParams(name="Bot", model="gpt-4o")
    assert params.name == "Bot"
    assert params.model == "gpt-4o"


def test_create_params_rejects_invalid():
    with pytest.raises(ValidationError):
        AgentCreateParams(name=123)  # name must be str
Test the retry delay calculator independently:
# tests/test_retry.py
from myagent._retry import RetryPolicy


def test_exponential_backoff():
    policy = RetryPolicy(initial_delay=1.0, backoff_factor=2.0)
    assert policy.calculate_delay(0) == 1.0
    assert policy.calculate_delay(1) == 2.0
    assert policy.calculate_delay(2) == 4.0


def test_max_delay_cap():
    policy = RetryPolicy(initial_delay=1.0, backoff_factor=2.0, max_delay=5.0)
    assert policy.calculate_delay(10) == 5.0  # Capped at max


def test_retry_after_honored():
    policy = RetryPolicy()
    assert policy.calculate_delay(0, retry_after=10.0) == 10.0


def test_retry_after_capped():
    policy = RetryPolicy(max_delay=5.0)
    assert policy.calculate_delay(0, retry_after=60.0) == 5.0
Recorded HTTP Fixtures with pytest-recording
Recorded fixtures (also called VCR cassettes) capture real HTTP interactions and replay them in tests. This gives you the confidence of integration tests with the speed and determinism of unit tests:
# tests/test_agents_resource.py
import pytest

from myagent import AgentClient


@pytest.fixture
def client():
    return AgentClient(api_key="test-key-for-recording")


@pytest.mark.vcr()
def test_create_agent(client):
    agent = client.agents.create(
        name="Test Bot",
        model="gpt-4o",
        instructions="Be helpful.",
    )
    assert agent.id is not None
    assert agent.name == "Test Bot"


@pytest.mark.vcr()
def test_list_agents(client):
    agents = client.agents.list(limit=5)
    assert isinstance(agents, list)
    assert len(agents) <= 5
The first time you run these tests with --record-mode=new_episodes (pytest-recording's flag), they hit the real API and record the responses to YAML cassette files. Subsequent runs replay the cassettes without network access.
Configure VCR to scrub sensitive data:
# conftest.py
import pytest


@pytest.fixture(scope="module")
def vcr_config():
    return {
        "filter_headers": ["authorization", "cookie"],
        "filter_query_parameters": ["api_key"],
        "before_record_response": scrub_response,
    }


def scrub_response(response):
    """Remove sensitive data from recorded responses before the cassette is written."""
    body = response["body"]["string"]
    # Replace real IDs or PII if needed, e.g.:
    # response["body"]["string"] = body.replace(b"agent_abc123", b"agent_REDACTED")
    return response
TypeScript Testing with Nock
In TypeScript, nock intercepts HTTP requests at the Node.js level and returns mock responses:
// tests/agents.test.ts
import { describe, it, expect, afterEach } from 'vitest';
import nock from 'nock';

import { AgentClient } from '../src/client';

const BASE_URL = 'https://api.myagent.ai/v1';

describe('AgentsResource', () => {
  afterEach(() => nock.cleanAll());

  it('creates an agent', async () => {
    const mockAgent = {
      id: 'agent_abc123',
      name: 'Test Bot',
      model: 'gpt-4o',
      instructions: 'Be helpful.',
      tools: [],
      createdAt: '2026-03-17T00:00:00Z',
    };

    nock(BASE_URL)
      .post('/agents', { name: 'Test Bot', model: 'gpt-4o' })
      .reply(201, mockAgent);

    const client = new AgentClient({ apiKey: 'test-key' });
    const agent = await client.agents.create({
      name: 'Test Bot',
      model: 'gpt-4o',
    });

    expect(agent.id).toBe('agent_abc123');
    expect(agent.name).toBe('Test Bot');
  });

  it('handles 401 errors', async () => {
    nock(BASE_URL)
      .get('/agents/invalid')
      .reply(401, { error: 'Invalid API key' });

    const client = new AgentClient({ apiKey: 'bad-key' });

    await expect(client.agents.get('invalid')).rejects.toThrow(
      'Invalid API key'
    );
  });
});
Integration Tests with Live API
Integration tests run against the real API. Gate them behind an environment variable so they only run when credentials are available:
# tests/integration/test_live_api.py
import os

import pytest

# The "integration" marker lets CI deselect these with -m "not integration";
# register it in pytest config to avoid unknown-marker warnings.
pytestmark = [
    pytest.mark.integration,
    pytest.mark.skipif(
        os.environ.get("MYAGENT_LIVE_TESTS") != "1",
        reason="Live API tests disabled. Set MYAGENT_LIVE_TESTS=1 to run.",
    ),
]


@pytest.fixture
def live_client():
    from myagent import AgentClient

    return AgentClient()  # Uses MYAGENT_API_KEY env var


def test_full_agent_lifecycle(live_client):
    # Create
    agent = live_client.agents.create(
        name="Integration Test Bot",
        model="gpt-4o",
        instructions="Say hello.",
    )
    assert agent.id is not None
    try:
        # Read
        fetched = live_client.agents.get(agent.id)
        assert fetched.name == "Integration Test Bot"
    finally:
        # Delete even when an assertion fails, so test agents don't pile up
        live_client.agents.delete(agent.id)
CI/CD Pipeline
Run unit tests and fixture tests on every push. Run integration tests on a schedule or before releases:
# .github/workflows/sdk-tests.yml
name: SDK Tests

on: [push, pull_request]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev]"
      - run: pytest tests/ -m "not integration" --record-mode=none
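Integration tests can run from a second, scheduled workflow. A sketch, assuming a MYAGENT_API_KEY secret is configured in the repository (the cron cadence and file name are illustrative):

```yaml
# .github/workflows/sdk-integration.yml
name: SDK Integration Tests

on:
  schedule:
    - cron: "0 6 * * 1"  # Mondays at 06:00 UTC
  workflow_dispatch: {}  # allow manual runs before releases

jobs:
  integration-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e ".[dev]"
      - run: pytest tests/integration/
        env:
          MYAGENT_LIVE_TESTS: "1"
          MYAGENT_API_KEY: ${{ secrets.MYAGENT_API_KEY }}
```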
FAQ
When should I re-record VCR cassettes?
Re-record when the API changes (new fields, changed response structure) or when you add new test cases that cover previously untested endpoints. Automate periodic re-recording in CI by running the recorded tests monthly with --record-mode=all and committing the updated cassettes.
How do I test streaming responses without a live server?
Create mock async generators that yield pre-built SSE event objects. In Python, write an async def mock_stream() that yields SSEEvent instances with controlled data and timing. This lets you test your SSE parser, event callback handler, and stream collector independently.
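A minimal sketch of that pattern, using only the standard library. The `SSEEvent` shape, event names, and `collect_text` helper here are illustrative stand-ins for the SDK's actual types:

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class SSEEvent:
    # Illustrative event shape; a real SDK's SSE type will differ.
    event: str
    data: str


async def mock_stream(events: List[SSEEvent], delay: float = 0.0) -> AsyncIterator[SSEEvent]:
    """Yield pre-built SSE events with controllable timing."""
    for event in events:
        if delay:
            await asyncio.sleep(delay)
        yield event


async def collect_text(stream: AsyncIterator[SSEEvent]) -> str:
    """Stand-in for a stream collector: concatenate text deltas."""
    chunks = []
    async for event in stream:
        if event.event == "message.delta":
            chunks.append(event.data)
    return "".join(chunks)


events = [
    SSEEvent("message.delta", "Hel"),
    SSEEvent("message.delta", "lo"),
    SSEEvent("message.done", ""),
]
print(asyncio.run(collect_text(mock_stream(events))))  # Hello
```

Because the generator controls timing, the same harness can also exercise slow-stream and mid-stream-error paths by raising inside `mock_stream`.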
Should I mock the HTTP client or use a recording approach?
Use recordings for most tests — they validate the full serialization and deserialization stack, catching bugs that mocks miss. Use mocks only for testing specific error conditions (network timeouts, malformed responses) that are difficult to capture in recordings.
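A sketch of the mock approach for one such condition: a transport that times out twice before succeeding, which is awkward to capture in a cassette but trivial with a mock. `retry_request` and the `send` callable are hypothetical stand-ins for an SDK's internals:

```python
from unittest.mock import Mock


def retry_request(send, max_attempts: int = 3):
    """Hypothetical stand-in for an SDK's retry loop around a transport call."""
    last_exc = None
    for _ in range(max_attempts):
        try:
            return send()
        except TimeoutError as exc:
            last_exc = exc
    raise last_exc


# side_effect raises each exception in sequence, then returns the final value.
send = Mock(side_effect=[
    TimeoutError("timed out"),
    TimeoutError("timed out"),
    {"id": "agent_abc123"},
])
result = retry_request(send)
assert result == {"id": "agent_abc123"}
assert send.call_count == 3
```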
#Testing #SDKTesting #VCR #CICD #AgenticAI #Python #TypeScript #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.