Orchestrating Multiple MCP Servers: Building a Tool Ecosystem for Complex Agents
Design and implement multi-server MCP architectures where agents connect to multiple tool providers simultaneously, with namespace management, conflict resolution, and performance optimization.
The Multi-Server Reality
Production AI agents rarely connect to a single MCP server. A customer support agent might need a CRM server for customer data, a knowledge base server for documentation, a ticketing server for issue management, and an analytics server for usage metrics. Each server is maintained by a different team, deployed independently, and versioned on its own schedule.
Orchestrating these servers into a cohesive tool ecosystem is an architectural challenge that touches on naming conflicts, connection management, error isolation, and performance.
Connecting Multiple Servers
The OpenAI Agents SDK supports multiple MCP servers natively. Each server is an independent connection:
from agents import Agent
from agents.mcp import MCPServerStdio, MCPServerStreamableHttp

# Local server for file operations
filesystem_server = MCPServerStdio(
    name="Filesystem",
    params={
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"],
    },
    cache_tools_list=True,
)

# Remote server for database queries
database_server = MCPServerStreamableHttp(
    name="Database",
    params={"url": "http://db-mcp:8001/mcp"},
    cache_tools_list=True,
)

# Remote server for monitoring
monitoring_server = MCPServerStreamableHttp(
    name="Monitoring",
    params={"url": "http://monitoring-mcp:8002/mcp"},
    cache_tools_list=True,
)

agent = Agent(
    name="Operations Assistant",
    instructions="""You help with operational tasks. You have access to:
    - Filesystem tools for reading and writing config files
    - Database tools for querying application data
    - Monitoring tools for checking system health and metrics
    Always check system health before making database changes.""",
    mcp_servers=[filesystem_server, database_server, monitoring_server],
)
When the agent starts, it calls tools/list on each server and presents the combined tool set to the LLM. The LLM sees a flat list of tools from all servers and can call any of them.
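To see why this matters, here is a minimal sketch (plain Python, independent of any SDK) of what that aggregation amounts to: tool names from every server land in one flat namespace, so duplicates silently shadow each other unless you detect them at merge time.

```python
def merge_tool_lists(server_tools: dict[str, list[str]]) -> tuple[dict, list]:
    """Merge per-server tool name lists into one flat namespace.

    Returns the merged {tool_name: server_name} mapping plus a list of
    (tool_name, first_server, second_server) collisions.
    """
    merged: dict[str, str] = {}
    collisions: list[tuple[str, str, str]] = []
    for server_name, tools in server_tools.items():
        for tool in tools:
            if tool in merged:
                # A later server tried to register a name already taken
                collisions.append((tool, merged[tool], server_name))
            else:
                merged[tool] = server_name
    return merged, collisions
```

A non-empty collision list is exactly the failure mode the next section is about.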
Namespace Management
The biggest risk with multiple servers is tool name collisions. If the database server and the monitoring server both expose a tool called query, the agent cannot distinguish between them. Solve this at the server level by using descriptive, namespaced tool names:
# Bad: generic names that will collide
@mcp_server.tool()
async def query(sql: str) -> str: ...

@mcp_server.tool()
async def search(term: str) -> str: ...

# Good: namespaced names that are unambiguous
@mcp_server.tool()
async def db_query(sql: str) -> str:
    """Execute a SQL query against the application database."""
    ...

@mcp_server.tool()
async def monitoring_search_alerts(term: str) -> str:
    """Search monitoring alerts by keyword."""
    ...
A naming convention like {domain}_{action} or {domain}_{action}_{target} prevents collisions and helps the LLM understand which server a tool belongs to.
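A convention only helps if it is enforced. As a sketch, a registration-time check can reject generic names before they ever reach the LLM; the pattern and helper below are illustrative, not part of any SDK.

```python
import re

# Accepts domain_action or domain_action_target, lowercase words only
TOOL_NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z]+){1,2}$")

def is_namespaced(tool_name: str) -> bool:
    """Return True if the name follows the {domain}_{action}[_{target}] convention."""
    return bool(TOOL_NAME_PATTERN.match(tool_name))
```

Running this in CI against each server's tool list catches a bare query or search before it collides in production.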
Connection Lifecycle Management
Each MCP server connection has a lifecycle — initialization, active use, and cleanup. With multiple servers, manage these lifecycles carefully to avoid resource leaks:
from agents import Agent, Runner
from agents.mcp import MCPServerStdio, MCPServerStreamableHttp

async def run_with_multiple_servers(user_message: str):
    """Run an agent with proper multi-server lifecycle management."""
    servers = [
        MCPServerStdio(
            name="Filesystem",
            params={
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"],
            },
            cache_tools_list=True,
        ),
        MCPServerStreamableHttp(
            name="Database",
            params={"url": "http://db-mcp:8001/mcp"},
            cache_tools_list=True,
        ),
    ]

    # Use async context managers to ensure cleanup
    async with servers[0] as fs_server, servers[1] as db_server:
        agent = Agent(
            name="Assistant",
            instructions="Help with file and database operations.",
            mcp_servers=[fs_server, db_server],
        )
        result = await Runner.run(agent, user_message)
        return result.final_output
The async with pattern ensures that every server connection is properly closed, even if an error occurs. For stdio servers, this means the subprocess is terminated. For HTTP servers, this means the session is closed.
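The two-server async with line does not scale to a variable number of servers. A sketch using contextlib.AsyncExitStack gives the same cleanup guarantee for any list of connections; run_agent here is a hypothetical callback, and any server objects that support async with will work.

```python
import asyncio
from contextlib import AsyncExitStack

async def run_with_servers(servers, run_agent):
    """Enter every server's async context on one shared stack.

    All successfully opened connections are closed in reverse order when
    the block exits, even if a later server fails to connect mid-setup.
    """
    async with AsyncExitStack() as stack:
        connected = [await stack.enter_async_context(s) for s in servers]
        return await run_agent(connected)
```

Because the stack unwinds in reverse order, a failure while opening the third server still closes the first two.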
Error Isolation
When one server fails, the agent should continue functioning with the remaining servers. Implement error isolation so that a single server outage does not crash the entire agent:
import asyncio

async def resilient_tool_discovery(servers: list) -> dict:
    """Discover tools from multiple servers with error isolation."""
    all_tools = {}

    async def discover_single(server):
        try:
            tools = await asyncio.wait_for(
                server.list_tools(),
                timeout=5.0,
            )
            return server.name, tools
        except asyncio.TimeoutError:
            print(f"Warning: {server.name} timed out during discovery")
            return server.name, []
        except Exception as e:
            print(f"Warning: {server.name} failed: {e}")
            return server.name, []

    results = await asyncio.gather(
        *[discover_single(s) for s in servers]
    )
    for name, tools in results:
        if tools:
            all_tools[name] = tools
            print(f"Discovered {len(tools)} tools from {name}")
        else:
            print(f"No tools available from {name}")
    return all_tools
Performance Optimization
With multiple servers, tool discovery latency adds up quickly. Apply these optimizations to keep startup fast.
First, enable cache_tools_list=True on every server so discovery happens only once.
Second, initialize servers in parallel rather than sequentially:
import asyncio

async def parallel_server_init(servers: list):
    """Initialize all MCP servers concurrently."""
    # connect() opens each server's session; call cleanup() on the
    # healthy servers when you are done with them.
    init_tasks = [server.connect() for server in servers]
    results = await asyncio.gather(*init_tasks, return_exceptions=True)
    healthy_servers = []
    for server, result in zip(servers, results):
        if isinstance(result, Exception):
            print(f"Failed to initialize {server.name}: {result}")
        else:
            healthy_servers.append(server)
    return healthy_servers
Third, if certain servers are only needed for specific tasks, connect to them lazily — only when the agent first attempts to use a tool from that server.
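One way to sketch that lazy pattern is a thin wrapper that defers connection until the first call. The connect() and call_tool() method names are assumed to match the server interface used above; treat this as an illustration rather than a drop-in component.

```python
import asyncio

class LazyServer:
    """Defer an MCP server's connection until its first tool call (sketch)."""

    def __init__(self, server):
        self._server = server
        self._connected = False
        self._lock = asyncio.Lock()  # guard against concurrent first calls

    async def call_tool(self, tool_name: str, arguments):
        if not self._connected:
            async with self._lock:
                if not self._connected:  # re-check after acquiring the lock
                    await self._server.connect()  # assumed SDK method
                    self._connected = True
        return await self._server.call_tool(tool_name, arguments)
```

The lock matters: without it, two concurrent first calls would both see _connected as False and open two sessions.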
Routing Strategies
For complex agents with many servers, consider implementing a routing layer that directs tool calls to the appropriate server based on the tool name prefix:
class ServerRouter:
    """Route tool calls to the correct MCP server by namespace."""

    def __init__(self):
        self._routes: dict[str, object] = {}

    def register(self, prefix: str, server):
        """Register a server for a tool name prefix."""
        self._routes[prefix] = server

    def resolve(self, tool_name: str):
        """Find the server responsible for a tool."""
        for prefix, server in self._routes.items():
            if tool_name.startswith(prefix):
                return server
        return None

router = ServerRouter()
router.register("db_", database_server)
router.register("fs_", filesystem_server)
router.register("monitor_", monitoring_server)
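With routes registered, a dispatch helper can resolve each call and bound it with a timeout, so one slow server cannot stall the whole agent loop. This sketch assumes each server exposes an async call_tool(name, arguments) method and works with any router object offering resolve().

```python
import asyncio

async def dispatch(router, tool_name: str, arguments, timeout: float = 10.0):
    """Resolve a tool call through the router and bound it with a timeout."""
    server = router.resolve(tool_name)
    if server is None:
        # Unroutable names become errors the LLM can read and react to
        return f"Error: no server registered for tool '{tool_name}'"
    try:
        return await asyncio.wait_for(
            server.call_tool(tool_name, arguments), timeout
        )
    except asyncio.TimeoutError:
        return f"Error: server for '{tool_name}' timed out after {timeout}s"
```

Returning error strings instead of raising keeps a single server's failure visible to the model without aborting the run.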
FAQ
Is there a limit to how many MCP servers an agent can connect to?
There is no protocol-level limit, but practical limits exist. Each server adds tools to the LLM's context window. With 10 servers exposing 15 tools each, you have 150 tools — which consumes significant context space and can degrade the LLM's ability to choose the right tool. Keep the active tool set under 30-40 tools by connecting only the servers relevant to the current task.
How do I handle a server that becomes unresponsive mid-conversation?
Implement timeouts on every tool call. If a call to a specific server times out, return an error message to the agent that identifies which server is unavailable. The LLM can then adjust its plan — either retrying later, using an alternative approach, or informing the user that a specific capability is temporarily unavailable.
Can different agents share the same MCP server connections?
For HTTP transport servers, yes — multiple agents can connect to the same server concurrently since each request is independent. For stdio servers, each agent typically needs its own subprocess because stdio is a single-client transport. If you need multiple agents to share a local tool, wrap it in an HTTP server instead.
#MCP #MultiServer #Architecture #AIAgents #Orchestration #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.