Testing MCP Servers: Unit Tests, Integration Tests, and Compliance Validation
Build a comprehensive test suite for MCP servers covering unit tests for tool logic, integration tests for protocol compliance, mock clients for end-to-end validation, and edge case coverage.
Why MCP Servers Need Rigorous Testing
MCP servers are the hands and eyes of AI agents. When a tool returns incorrect data, the agent makes decisions based on wrong information. When a tool fails silently, the agent cannot recover. When the server violates the MCP protocol, the agent runtime crashes or enters an undefined state.
Testing MCP servers requires three layers: unit tests for the business logic inside each tool, integration tests that verify the full JSON-RPC protocol flow, and compliance tests that ensure the server behaves correctly according to the MCP specification.
Unit Testing Tool Functions
Start by testing the pure business logic of each tool function in isolation, without the MCP protocol layer:
# test_tools.py
import json

import aiosqlite
import pytest
import pytest_asyncio

# Import the tool functions directly, bypassing the MCP protocol layer
from db_server import list_tables, query_db, insert_record


@pytest_asyncio.fixture
async def setup_db(tmp_path):
    """Create a test database with sample data."""
    db_path = str(tmp_path / "test.db")
    async with aiosqlite.connect(db_path) as db:
        await db.execute("""
            CREATE TABLE users (
                id INTEGER PRIMARY KEY,
                name TEXT NOT NULL,
                email TEXT UNIQUE NOT NULL
            )
        """)
        await db.execute(
            "INSERT INTO users (name, email) VALUES (?, ?)",
            ("Alice", "alice@example.com"),
        )
        await db.commit()
    return db_path


@pytest.mark.asyncio
async def test_list_tables(setup_db, monkeypatch):
    """Verify list_tables returns correct schema information."""
    # Point the server module at the temporary test database
    monkeypatch.setattr("db_server.DATABASE_PATH", setup_db)
    result = json.loads(await list_tables())
    assert "users" in result
    column_names = [col["name"] for col in result["users"]]
    assert "id" in column_names
    assert "name" in column_names
    assert "email" in column_names


@pytest.mark.asyncio
async def test_query_db_select(setup_db, monkeypatch):
    """Verify query_db returns correct results for SELECT queries."""
    monkeypatch.setattr("db_server.DATABASE_PATH", setup_db)
    result = json.loads(await query_db("SELECT * FROM users"))
    assert result["row_count"] == 1
    assert result["rows"][0]["name"] == "Alice"
    assert result["columns"] == ["id", "name", "email"]


@pytest.mark.asyncio
async def test_query_db_rejects_non_select(setup_db, monkeypatch):
    """Verify query_db rejects non-SELECT statements."""
    monkeypatch.setattr("db_server.DATABASE_PATH", setup_db)
    result = json.loads(await query_db("DROP TABLE users"))
    assert "error" in result
    assert "SELECT" in result["error"]


@pytest.mark.asyncio
async def test_insert_record_validates_table_name(setup_db, monkeypatch):
    """Verify insert_record rejects SQL injection in table names."""
    monkeypatch.setattr("db_server.DATABASE_PATH", setup_db)
    result = json.loads(
        await insert_record("'; DROP TABLE users; --", {"name": "Bob"})
    )
    assert "error" in result
    assert "Invalid table name" in result["error"]
Unit tests are fast, isolated, and catch logic bugs early. Run them on every commit.
Integration Testing with MCP Client
Integration tests verify the complete protocol flow — initialization, tool discovery, tool execution, and error handling. The MCP SDK provides a client you can use to test against a running server:
# test_integration.py
import json

import pytest
import pytest_asyncio
from mcp import ClientSession
from mcp.client.stdio import stdio_client, StdioServerParameters

SERVER_COMMAND = "python"
SERVER_ARGS = ["db_server.py"]


@pytest_asyncio.fixture
async def mcp_client():
    """Spawn the server over stdio and yield an initialized client session."""
    server_params = StdioServerParameters(
        command=SERVER_COMMAND,
        args=SERVER_ARGS,
        env={"DATABASE_PATH": "test.db"},
    )
    async with stdio_client(server_params) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            yield session


@pytest.mark.asyncio
async def test_initialization(mcp_client):
    """Verify the server completes the MCP initialization handshake."""
    # Reaching this point means the handshake in the fixture succeeded
    assert mcp_client is not None


@pytest.mark.asyncio
async def test_tool_discovery(mcp_client):
    """Verify all expected tools are listed."""
    tools = await mcp_client.list_tools()
    tool_names = [tool.name for tool in tools.tools]
    assert "list_tables" in tool_names
    assert "query_db" in tool_names
    assert "insert_record" in tool_names


@pytest.mark.asyncio
async def test_tool_schemas_valid(mcp_client):
    """Verify tool schemas contain required fields."""
    tools = await mcp_client.list_tools()
    for tool in tools.tools:
        assert tool.name, "Tool must have a name"
        assert tool.description, f"Tool {tool.name} must have a description"
        assert tool.inputSchema, f"Tool {tool.name} must have an input schema"
        assert tool.inputSchema.get("type") == "object", (
            f"Tool {tool.name} schema must be an object type"
        )


@pytest.mark.asyncio
async def test_tool_execution(mcp_client):
    """Verify a tool call returns valid content."""
    result = await mcp_client.call_tool("list_tables", {})
    assert result.content, "Tool must return content"
    assert result.content[0].type == "text"
    # Verify the response is valid JSON
    data = json.loads(result.content[0].text)
    assert isinstance(data, dict)
Protocol Compliance Testing
Compliance tests verify edge cases defined by the MCP specification — what happens when the client sends invalid parameters, calls a nonexistent tool, or sends malformed JSON:
# test_compliance.py
import json

import pytest


@pytest.mark.asyncio
async def test_unknown_tool_returns_error(mcp_client):
    """Calling a nonexistent tool must return an error."""
    try:
        result = await mcp_client.call_tool("nonexistent_tool", {})
        # Some implementations return error content instead of raising
        assert result.isError, "Expected error for unknown tool"
    except Exception as e:
        # A JSON-RPC "method not found" error is also acceptable
        assert "not found" in str(e).lower() or "-32601" in str(e)


@pytest.mark.asyncio
async def test_missing_required_params(mcp_client):
    """Calling a tool without required params must return an error."""
    try:
        result = await mcp_client.call_tool("query_db", {})
        if getattr(result, "isError", False):
            return  # protocol-level error: acceptable
        # Otherwise the tool must have reported the missing params itself
        data = json.loads(result.content[0].text)
        assert "error" in data
    except Exception:
        # A JSON-RPC validation error is also acceptable
        pass


@pytest.mark.asyncio
async def test_resource_list(mcp_client):
    """Verify resources/list returns a valid response."""
    resources = await mcp_client.list_resources()
    assert isinstance(resources.resources, list)
    for resource in resources.resources:
        assert resource.uri, "Resource must have a URI"
        assert resource.name, "Resource must have a name"


@pytest.mark.asyncio
async def test_prompt_list(mcp_client):
    """Verify prompts/list returns a valid response."""
    prompts = await mcp_client.list_prompts()
    assert isinstance(prompts.prompts, list)
    for prompt in prompts.prompts:
        assert prompt.name, "Prompt must have a name"
Testing Error Boundaries
Test what happens when the tool's underlying service fails — database down, network timeout, or unexpected data:
@pytest.mark.asyncio
async def test_query_db_handles_syntax_error(setup_db, monkeypatch):
    """Verify query_db returns a useful error for SQL syntax errors."""
    monkeypatch.setattr("db_server.DATABASE_PATH", setup_db)
    result = json.loads(await query_db("SELECT * FORM users"))
    assert "error" in result
    # Error message should be informative but not expose internals
    assert len(result["error"]) > 0


@pytest.mark.asyncio
async def test_query_db_handles_missing_database(monkeypatch):
    """Verify query_db handles a missing database file gracefully."""
    monkeypatch.setattr("db_server.DATABASE_PATH", "/nonexistent/path.db")
    result = json.loads(await query_db("SELECT 1"))
    assert "error" in result
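A related boundary worth testing is a hung dependency. This standalone sketch, with hypothetical slow_backend and fetch_tool names, shows the pattern the tests above expect the server to implement: bound the upstream call with a timeout and convert the failure into a structured error payload rather than letting the tool hang the agent:

```python
import asyncio
import json

# Hypothetical upstream call that hangs (e.g. a stalled network service)
async def slow_backend() -> str:
    await asyncio.sleep(10)
    return "data"

# Timeout boundary: bound the upstream call so the tool fails fast with a
# structured error instead of blocking indefinitely
async def fetch_tool(timeout: float = 0.1) -> str:
    try:
        data = await asyncio.wait_for(slow_backend(), timeout=timeout)
        return json.dumps({"data": data})
    except asyncio.TimeoutError:
        return json.dumps({"error": "Upstream service timed out"})

result = json.loads(asyncio.run(fetch_tool()))
print(result)  # {'error': 'Upstream service timed out'}
```

A unit test for this only needs to assert that the error payload comes back within the timeout budget.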
Continuous Integration Setup
Run MCP tests in CI with a test matrix covering both transport types:
# conftest.py -- shared fixtures for CI
import os

import pytest


@pytest.fixture(params=["stdio", "http"])
def transport_type(request):
    """Run tests against both stdio and HTTP transports."""
    return request.param


@pytest.fixture
def server_url():
    """Get the test server URL from the environment."""
    return os.environ.get("MCP_TEST_SERVER_URL", "http://localhost:8001/mcp")
A complete CI pipeline runs unit tests first (fast, no external dependencies), then integration tests against a locally spawned server, and finally compliance tests that verify protocol correctness. Gate deployments on all three passing.
FAQ
How do I test MCP servers that depend on external APIs?
Mock the external dependencies at the function level, not at the protocol level. Your MCP tool function calls an external API — mock that API call in your unit tests. For integration tests, use a test double service that mimics the external API's behavior. Never mock the MCP protocol itself in integration tests.
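A minimal sketch of this approach, with a hypothetical WeatherAPI client and get_weather tool standing in for your real dependencies: the external call is replaced with an AsyncMock, while the tool's own logic runs unmodified:

```python
import asyncio
import json
from unittest.mock import AsyncMock

# Hypothetical external client; fetch would hit the network in production
class WeatherAPI:
    async def fetch(self, city: str) -> dict:
        raise RuntimeError("real network call -- must be mocked in unit tests")

api = WeatherAPI()

# MCP tool body: returns JSON text and never raises, so the agent
# always receives structured output
async def get_weather(city: str) -> str:
    try:
        data = await api.fetch(city)
        return json.dumps({"city": city, "temp_c": data["temp_c"]})
    except Exception as exc:
        return json.dumps({"error": str(exc)})

# Unit test: mock only the external call -- the tool logic runs for real
api.fetch = AsyncMock(return_value={"temp_c": 21})
result = json.loads(asyncio.run(get_weather("Oslo")))
print(result)  # {'city': 'Oslo', 'temp_c': 21}
```

The same swap works with pytest's monkeypatch fixture; the point is that only the network boundary is faked, never the tool or the protocol.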
Should I test tool performance and latency?
Yes. Add benchmark tests that measure tool response time under load. An MCP server that responds in 50ms with one client but 5 seconds with ten concurrent clients has a concurrency bug. Use pytest-benchmark or custom timing code to catch performance regressions before they reach production.
How often should compliance tests run?
Run compliance tests on every pull request and after every MCP SDK upgrade. The MCP specification evolves, and new protocol versions may change expected behavior for edge cases. Compliance tests catch these breaking changes early, before your agents start failing in production.
#MCP #Testing #QualityAssurance #AIAgents #Python #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.