FastAPI Testing for AI Agent APIs: pytest, httpx, and Mock Strategies
Write comprehensive tests for AI agent APIs using pytest and httpx. Covers TestClient usage, async test patterns, fixture design for database and LLM mocking, and strategies for testing streaming endpoints.
The Testing Challenge for AI Agent APIs
Testing AI agent APIs is harder than testing typical CRUD endpoints because of external dependencies. Your endpoints call LLM APIs that are non-deterministic, expensive, and rate-limited. They read from vector databases, write to conversation stores, and may trigger background processing. A good test strategy mocks the expensive external calls while keeping everything else as real as possible.
The goal is a test suite that runs in seconds, costs nothing in API fees, and catches real bugs in your request handling, validation, error handling, and business logic.
Setting Up pytest for FastAPI
Install the testing dependencies:
pip install pytest pytest-asyncio httpx aiosqlite
Configure pytest in your pyproject.toml:
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
Create your test fixtures in tests/conftest.py:
import pytest
from httpx import AsyncClient, ASGITransport
from sqlalchemy.ext.asyncio import (
    create_async_engine,
    async_sessionmaker,
)

from app.main import app
from app.models import Base  # adjust to wherever your declarative Base lives
from app.dependencies import get_db, get_llm_service
from tests.mocks import MockLLMService  # the mock defined in the next section

# Test database
TEST_DB_URL = "sqlite+aiosqlite:///./test.db"
test_engine = create_async_engine(TEST_DB_URL)
test_session_factory = async_sessionmaker(
    test_engine, expire_on_commit=False
)
async def get_test_db():
    async with test_session_factory() as session:
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise
@pytest.fixture(autouse=True)
async def setup_database():
    async with test_engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)
    yield
    async with test_engine.begin() as conn:
        await conn.run_sync(Base.metadata.drop_all)
@pytest.fixture
async def client():
    app.dependency_overrides[get_db] = get_test_db
    app.dependency_overrides[get_llm_service] = (
        lambda: MockLLMService()
    )
    transport = ASGITransport(app=app)
    async with AsyncClient(
        transport=transport,
        base_url="http://test",
    ) as ac:
        yield ac
    app.dependency_overrides.clear()
Mock LLM Service
Create a deterministic mock that replaces real LLM calls:
class MockLLMService:
    def __init__(self):
        self.calls = []
        self.response_text = "This is a mock agent response."
        self._error = None

    async def generate(self, messages: list[dict]) -> str:
        self.calls.append(messages)
        if self._error is not None:
            raise self._error
        return self.response_text

    async def stream_generate(self, messages: list[dict]):
        self.calls.append(messages)
        if self._error is not None:
            raise self._error
        for word in self.response_text.split():
            yield word + " "

    def set_response(self, text: str):
        self.response_text = text

    def set_error(self, error: Exception):
        # The next generate/stream_generate call raises this error
        self._error = error
This mock records every call for assertion and lets tests configure specific responses or errors.
Testing Basic Endpoints
Write tests for your agent chat endpoint:
async def test_chat_returns_response(client):
    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
            "session_id": "test-123",
        },
    )
    assert response.status_code == 200
    data = response.json()
    assert "response" in data
    assert len(data["response"]) > 0

async def test_chat_validates_empty_messages(client):
    response = await client.post(
        "/agents/chat",
        json={"messages": [], "session_id": "test-123"},
    )
    assert response.status_code == 422

async def test_chat_validates_message_format(client):
    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "invalid_role", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 422

async def test_chat_rejects_missing_auth(client):
    # Remove default auth header if set
    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
        headers={"Authorization": ""},
    )
    assert response.status_code == 401
Testing Streaming Endpoints
Streaming endpoints deliver their body incrementally, but a plain httpx request buffers the full streamed body before returning, so an SSE test can parse the complete event stream from response.text:
async def test_stream_chat_returns_tokens(client):
    response = await client.post(
        "/agents/chat/stream",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 200
    # For SSE, parse the event stream
    body = response.text
    assert "data:" in body
    # Extract all data lines
    data_lines = [
        line.split("data: ", 1)[1]
        for line in body.split("\n")
        if line.startswith("data: ")
    ]
    assert len(data_lines) > 0
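The inline parsing above can be pulled into a small helper (a convenience for your test suite, not part of the app) so every streaming test asserts on tokens the same way:

```python
def parse_sse_data(body: str) -> list[str]:
    """Extract the payload of every `data:` line from a raw SSE body."""
    return [
        line.split("data: ", 1)[1]
        for line in body.splitlines()
        if line.startswith("data: ")
    ]


# Example: in a test, replace the inline list comprehension with
#   tokens = parse_sse_data(response.text)
#   assert len(tokens) > 0
```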
Testing with Database State
Tests that depend on existing data should set up state through fixtures or helper functions:
async def test_get_conversation_history(client):
    # Create a conversation first
    create_response = await client.post(
        "/conversations",
        json={"agent_type": "assistant"},
    )
    conversation_id = create_response.json()["id"]
    # Send some messages
    await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "First message"}
            ],
            "session_id": conversation_id,
        },
    )
    # Fetch history
    history_response = await client.get(
        f"/conversations/{conversation_id}/history"
    )
    assert history_response.status_code == 200
    messages = history_response.json()["messages"]
    assert len(messages) >= 2  # user + assistant

async def test_conversation_not_found(client):
    response = await client.get(
        "/conversations/nonexistent-id/history"
    )
    assert response.status_code == 404
Testing Error Scenarios
Deliberately trigger error conditions to verify your error handling:
async def test_llm_timeout_returns_503(client):
    import asyncio

    class TimeoutLLMService:
        async def generate(self, messages):
            raise asyncio.TimeoutError("LLM request timed out")

    app.dependency_overrides[get_llm_service] = (
        lambda: TimeoutLLMService()
    )
    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 503
    assert "timeout" in response.json()["error"].lower()
async def test_rate_limit_returns_429(client):
    import httpx
    from openai import RateLimitError

    class RateLimitedLLMService:
        async def generate(self, messages):
            # RateLimitError requires a real httpx.Response
            mock_response = httpx.Response(
                429,
                request=httpx.Request("POST", "http://test"),
            )
            raise RateLimitError(
                "Rate limit exceeded",
                response=mock_response,
                body=None,
            )

    app.dependency_overrides[get_llm_service] = (
        lambda: RateLimitedLLMService()
    )
    response = await client.post(
        "/agents/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 429
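Both error tests install an override directly and rely on the client fixture's teardown to clean it up. A small context manager (a convenience sketch, not a FastAPI API) makes the swap explicit and restores any prior override even if an assertion fails mid-test:

```python
from contextlib import contextmanager


@contextmanager
def override_dependency(app, dependency, replacement):
    """Temporarily install a dependency override, then restore prior state."""
    previous = app.dependency_overrides.get(dependency)
    app.dependency_overrides[dependency] = replacement
    try:
        yield
    finally:
        if previous is None:
            # No earlier override existed: remove ours entirely
            app.dependency_overrides.pop(dependency, None)
        else:
            # Put the earlier override back
            app.dependency_overrides[dependency] = previous
```

Usage in the timeout test would look like `with override_dependency(app, get_llm_service, lambda: TimeoutLLMService()): ...` wrapped around the request and assertions.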
Parameterized Tests for Agent Types
Use pytest parametrize to test multiple agent configurations with the same test logic:
@pytest.mark.parametrize("agent_type", [
    "assistant", "researcher", "coder",
])
async def test_all_agent_types_respond(client, agent_type):
    response = await client.post(
        f"/agents/{agent_type}/chat",
        json={
            "messages": [
                {"role": "user", "content": "Hello"}
            ],
        },
    )
    assert response.status_code == 200
    assert "response" in response.json()
FAQ
Should I test with a real database or mock it?
Use a real test database, not a mock. Mocking the database hides SQL errors, missing columns, constraint violations, and query logic bugs. Use an in-memory SQLite database for fast tests or a dedicated PostgreSQL test database for integration tests. Create and drop all tables per test using the setup_database fixture to ensure test isolation. The test database approach catches real bugs that mocks would miss.
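For the in-memory option, note that each new SQLite connection normally gets its own empty database. A configuration sketch, assuming SQLAlchemy 2.x with aiosqlite: pinning a single connection with StaticPool makes the tables created by the setup fixture visible to every test session.

```python
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.pool import StaticPool

# In-memory SQLite: StaticPool reuses one connection, so the schema
# created by the setup_database fixture is shared by all sessions.
test_engine = create_async_engine(
    "sqlite+aiosqlite:///:memory:",
    poolclass=StaticPool,
    connect_args={"check_same_thread": False},
)
```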
How do I test that my mock LLM service was called with the correct prompt?
Record calls in your mock service and assert against them. The MockLLMService shown above stores every call in a self.calls list. After your test makes a request, access the mock from the dependency override and check mock_llm.calls[-1] to verify the messages passed to the LLM. This lets you verify that your endpoint correctly constructs the prompt with conversation history, system prompts, and context.
How do I run only async tests with pytest?
With pytest-asyncio and asyncio_mode = "auto" in your config, any async def test_* function is automatically treated as an async test. You do not need the @pytest.mark.asyncio decorator when using auto mode. Run all tests with pytest tests/ and they will execute correctly whether sync or async.
#FastAPI #Testing #Pytest #AIAgents #Mock #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.