
Custom Model Providers with OpenAI Agents SDK: Using Any LLM as Your Agent Brain

Learn how to implement the Model protocol in OpenAI Agents SDK to connect any LLM — Anthropic Claude, local Ollama models, or custom endpoints — as your agent's reasoning engine with full tool-calling support.

Why Custom Model Providers Matter

The OpenAI Agents SDK ships with built-in support for OpenAI models, but production teams rarely use a single LLM vendor. You might need Claude for nuanced reasoning, a local Llama model for cost-sensitive tasks, or a fine-tuned endpoint for domain-specific work. The SDK's Model protocol lets you swap in any LLM without changing your agent logic.

This decoupling is the key architectural insight: your agent's behavior (instructions, tools, handoffs) stays the same regardless of which model powers the reasoning.

Understanding the Model Protocol

The SDK defines a Model protocol that any custom provider must implement. At its core, you need to provide a single method — get_response — that accepts the agent's conversation history and returns a structured response.

from __future__ import annotations

from typing import Any

import anthropic

from agents import Agent, Runner, Model, ModelProvider
from agents.models import ModelUsage
from agents.items import TResponseInputItem, ModelResponse


class AnthropicModel(Model):
    """Custom model that routes agent calls to Anthropic Claude."""

    def __init__(self, model_name: str = "claude-sonnet-4-20250514"):
        self.model_name = model_name
        self.client = anthropic.AsyncAnthropic()

    async def get_response(
        self,
        system_instructions: str | None,
        input: list[TResponseInputItem],
        model_settings: Any,
        tools: list,
        output_schema: Any | None,
        handoffs: list,
        tracing: Any,
    ) -> ModelResponse:
        # Convert SDK messages to Anthropic format
        messages = self._convert_messages(input)

        response = await self.client.messages.create(
            model=self.model_name,
            max_tokens=model_settings.max_tokens or 4096,
            system=system_instructions or "",
            messages=messages,
            temperature=model_settings.temperature or 0.7,
        )

        return self._convert_response(response)

    def _convert_messages(self, input_items):
        """Transform SDK input items to Anthropic message format."""
        messages = []
        for item in input_items:
            if hasattr(item, "role") and hasattr(item, "content"):
                messages.append({
                    "role": item.role if item.role != "system" else "user",
                    "content": item.content,
                })
        return messages if messages else [{"role": "user", "content": "Hello"}]

    def _convert_response(self, response):
        """Transform Anthropic response back to SDK format."""
        # Collect the text from the response content blocks
        output_text = ""
        for block in response.content:
            if block.type == "text":
                output_text += block.text

        return ModelResponse(
            # Simplified: a full implementation wraps output_text (and any
            # tool_use blocks) in the SDK's output item types
            output=[],
            usage=ModelUsage(
                input_tokens=response.usage.input_tokens,
                output_tokens=response.usage.output_tokens,
                requests=1,
            ),
            response_id=response.id,
        )

Building a Custom Model Provider

A ModelProvider maps model name strings to Model instances. This lets you register multiple backends under a single provider.

class MultiModelProvider(ModelProvider):
    """Routes model names to different LLM backends."""

    def __init__(self):
        self._models: dict[str, Model] = {}

    def register(self, name: str, model: Model):
        self._models[name] = model

    def get_model(self, model_name: str | None) -> Model:
        if model_name and model_name in self._models:
            return self._models[model_name]
        raise ValueError(f"Unknown model: {model_name}")


# Register providers
provider = MultiModelProvider()
provider.register("claude-sonnet", AnthropicModel("claude-sonnet-4-20250514"))
provider.register("claude-haiku", AnthropicModel("claude-haiku-4-20250514"))

Connecting a Local Ollama Model

For local inference, you can implement a provider that calls Ollama's HTTP API.

import httpx

class OllamaModel(Model):
    def __init__(self, model_name: str = "llama3", base_url: str = "http://localhost:11434"):
        self.model_name = model_name
        self.base_url = base_url
        self.client = httpx.AsyncClient(timeout=120.0)

    async def get_response(self, system_instructions, input, model_settings, tools, output_schema, handoffs, tracing):
        messages = []
        if system_instructions:
            messages.append({"role": "system", "content": system_instructions})
        for item in input:
            if hasattr(item, "role"):
                messages.append({"role": item.role, "content": item.content})

        resp = await self.client.post(
            f"{self.base_url}/api/chat",
            json={"model": self.model_name, "messages": messages, "stream": False},
        )
        resp.raise_for_status()
        data = resp.json()
        return self._build_response(data)

    def _build_response(self, data):
        """Map Ollama's /api/chat JSON into the SDK's ModelResponse."""
        # Ollama reports token counts as prompt_eval_count / eval_count
        return ModelResponse(
            # Simplified: wrap data["message"]["content"] in SDK output items
            output=[],
            usage=ModelUsage(
                input_tokens=data.get("prompt_eval_count", 0),
                output_tokens=data.get("eval_count", 0),
                requests=1,
            ),
            response_id=None,
        )

Wiring It Into Your Agent

Once your provider is ready, pass it when creating an agent.

import asyncio

from agents import RunConfig

agent = Agent(
    name="research_assistant",
    instructions="You are a helpful research assistant.",
    model="claude-sonnet",  # This name is resolved by the provider
)

async def main():
    result = await Runner.run(
        agent,
        input="Summarize the latest advances in quantum computing.",
        run_config=RunConfig(model_provider=provider),
    )
    print(result.final_output)

asyncio.run(main())

The agent code has zero awareness of which vendor is running under the hood. Switching from Claude to a local Llama model is a one-line configuration change.

When to Use Custom Providers

Custom model providers solve real production problems: cost optimization by routing simple tasks to cheaper models, compliance by keeping sensitive data on local models, redundancy by failing over between vendors, and specialization by directing domain tasks to fine-tuned endpoints.
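One of those patterns, vendor failover, can be sketched without any SDK specifics: a wrapper that tries a primary backend and retries against a secondary on error. The class below is illustrative and not part of the Agents SDK; it only assumes both backends expose an async `get_response(*args, **kwargs)` like the Model protocol shown earlier.

```python
class FallbackModel:
    """Sketch: route calls to a primary model, fall back on failure.

    `primary` and `secondary` are assumed to expose an async
    `get_response(*args, **kwargs)` like the Model protocol above.
    """

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    async def get_response(self, *args, **kwargs):
        try:
            return await self.primary.get_response(*args, **kwargs)
        except Exception:
            # In production, log the failure and narrow the exception types
            return await self.secondary.get_response(*args, **kwargs)
```

You would register a `FallbackModel` with the provider under its own name, exactly like any other backend.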

FAQ

Can I use tool calling with custom model providers?

Yes, but your custom Model implementation must convert the SDK's tool definitions into whatever format your target LLM expects. For Anthropic, this means transforming the JSON schema into Claude's tool format. For local models without native tool calling, you can inject tool descriptions into the system prompt and parse the output yourself.
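As a sketch of the Anthropic case: the SDK describes a function tool by a name, a description, and a JSON schema for its parameters, while Claude's Messages API expects entries with `name`, `description`, and `input_schema`. A minimal converter (the three input fields mirror what the SDK's tool definitions carry, but treat the exact attribute names as assumptions) could look like:

```python
def to_anthropic_tool(name: str, description: str, params_json_schema: dict) -> dict:
    """Map a function-tool definition to Anthropic's tool format.

    The output dict matches the `tools` entries Claude's Messages API expects.
    """
    return {
        "name": name,
        "description": description,
        "input_schema": params_json_schema,
    }


tool = to_anthropic_tool(
    "get_weather",
    "Look up current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)
```

Your `get_response` would apply this to every entry in the `tools` argument before calling `client.messages.create(..., tools=[...])`.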

Does streaming work with custom providers?

The SDK supports a get_stream_response method alongside get_response. Implement this method to return an async iterator of chunks. If you skip it, the SDK falls back to the non-streaming path, which still works but returns the full response at once.
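For a concrete feel of the chunk handling involved, here is a sketch of the parsing a streaming Ollama backend would do: with `"stream": true`, Ollama's `/api/chat` returns one JSON object per line, each carrying a slice of the assistant message. A real `get_stream_response` must wrap these deltas in the SDK's stream event types; the helper below only extracts the text.

```python
import json


def extract_stream_text(ndjson_lines):
    """Yield the text delta from each line of an Ollama streaming response."""
    for line in ndjson_lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        # Each chunk looks like {"message": {"content": "..."}, "done": false}
        content = chunk.get("message", {}).get("content", "")
        if content:
            yield content
```

Your `get_stream_response` would iterate the HTTP response line by line (e.g. `httpx`'s `aiter_lines()`) and feed each delta into an SDK stream event.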

How do I handle authentication for multiple providers?

Each Model instance manages its own authentication. Store API keys in environment variables and read them in each model's constructor. Avoid passing keys through the agent layer — the model provider encapsulates all vendor-specific details.
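A minimal version of that pattern might look like the following (the environment variable name matches Anthropic's convention, but the class is a sketch, not SDK code):

```python
import os


class ConfiguredAnthropicModel:
    """Sketch: each model reads its own credentials at construction time."""

    def __init__(self, model_name: str, api_key_env: str = "ANTHROPIC_API_KEY"):
        api_key = os.environ.get(api_key_env)
        if not api_key:
            raise RuntimeError(f"Missing credential: set {api_key_env}")
        self.model_name = model_name
        # The key is handed to the vendor client here and never
        # travels through the agent or provider layer.
        self.api_key = api_key
```

Failing fast in the constructor surfaces misconfiguration at startup rather than on the first agent run.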


#OpenAIAgentsSDK #CustomModelProvider #LLMIntegration #Anthropic #Ollama #Python #AgenticAI #LearnAI #AIEngineering


CallSphere Team

Expert insights on AI voice agents and customer communication automation.
