
Claude Agent SDK: Building Production AI Agents — A Developer's Guide

Complete developer guide to Anthropic's Claude Agent SDK — installation, agent creation, tools, multi-agent patterns, error handling, and production deployment.

What Is the Claude Agent SDK?

The Claude Agent SDK is Anthropic's framework for building AI agents powered by Claude models. It provides a structured approach to creating agents that can reason through complex tasks, use tools, maintain conversation state, and collaborate with other agents through handoffs. Unlike generic orchestration frameworks, the Claude Agent SDK is optimized for Claude's unique strengths — extended thinking, long-context reasoning, and reliable tool use.

The SDK is designed for production use from day one, with built-in support for error handling, retry logic, streaming responses, and observability. Whether you are building a customer support agent, a data analysis assistant, or a complex multi-agent workflow, the Claude Agent SDK provides the primitives you need without the overhead you do not.

Installation and Setup

Install the SDK

pip install anthropic

The agent capabilities are built into the core Anthropic Python SDK. No separate package is needed.

Configure Your API Key

Set your Anthropic API key as an environment variable:

export ANTHROPIC_API_KEY="sk-ant-your-key-here"

For production, use a secrets manager (AWS Secrets Manager, HashiCorp Vault, or Kubernetes secrets) rather than environment variables in shell profiles.

Verify the setup:

from anthropic import Anthropic

client = Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=100,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

Creating Your First Agent

An agent in the Claude ecosystem is a Claude model instance configured with specific instructions, tools, and behavioral constraints. Here is a minimal agent:

from anthropic import Anthropic

client = Anthropic()

def run_agent(user_message: str) -> str:
    """Run a simple agent loop with tool use."""
    system_prompt = """You are a helpful research assistant.
    You help users find information and answer questions
    accurately. If you are unsure about something, say so
    rather than guessing."""

    messages = [{"role": "user", "content": user_message}]

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=system_prompt,
        messages=messages,
    )

    return response.content[0].text

This is a single-turn agent. To make it agentic — capable of using tools and reasoning in loops — we need to add tool definitions and an execution loop.

Defining Tools

Tools are functions that the agent can call to interact with external systems. Claude's tool use protocol requires you to define tools with JSON schemas:

tools = [
    {
        "name": "search_database",
        "description": (
            "Search the product database by name or category. "
            "Returns matching products with prices and stock."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query for products",
                },
                "category": {
                    "type": "string",
                    "description": "Optional product category",
                    "enum": [
                        "electronics",
                        "clothing",
                        "home",
                        "sports",
                    ],
                },
                "max_results": {
                    "type": "integer",
                    "description": "Max results to return",
                    "default": 5,
                },
            },
            "required": ["query"],
        },
    },
    {
        "name": "get_order_status",
        "description": (
            "Look up the current status of a customer order "
            "by order ID. Returns status, tracking info, "
            "and estimated delivery date."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The order ID (format: ORD-XXXXX)",
                },
            },
            "required": ["order_id"],
        },
    },
    {
        "name": "create_support_ticket",
        "description": (
            "Create a support ticket for issues that require "
            "human follow-up. Use when you cannot resolve the "
            "issue directly."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "subject": {
                    "type": "string",
                    "description": "Brief subject line",
                },
                "description": {
                    "type": "string",
                    "description": "Detailed issue description",
                },
                "priority": {
                    "type": "string",
                    "enum": ["low", "medium", "high", "urgent"],
                },
            },
            "required": ["subject", "description", "priority"],
        },
    },
]

The quality of your tool descriptions directly impacts the agent's ability to use them correctly. Write descriptions as if you are explaining the tool to a new team member — be specific about what it does, when to use it, and what it returns.
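Claude validates tool inputs against your schema, but a lightweight check on your side catches drift between the schema and your handler code before it reaches production. A minimal sketch — `validate_tool_input` is an illustrative helper, not part of the SDK:

```python
def validate_tool_input(schema: dict, tool_input: dict) -> list[str]:
    """Return a list of validation problems (empty if the input is valid)."""
    problems = []
    props = schema.get("properties", {})
    # Every required field must be present
    for field in schema.get("required", []):
        if field not in tool_input:
            problems.append(f"missing required field: {field}")
    for key, value in tool_input.items():
        spec = props.get(key)
        if spec is None:
            problems.append(f"unexpected field: {key}")
            continue
        # Enum-constrained values must be one of the allowed options
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key} must be one of {spec['enum']}")
    return problems
```

Running this against each `input_schema` in the `tools` list above gives you a fast sanity check in unit tests, independent of any API call.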

Building the Agent Loop

The agent loop is where single-turn chat becomes agentic behavior. The loop sends a message to Claude, checks if the response contains tool calls, executes those tools, feeds the results back, and repeats until Claude produces a final text response:

import json

def execute_tool(tool_name: str, tool_input: dict) -> str:
    """Execute a tool and return the result as a string."""
    if tool_name == "search_database":
        # In production, this queries your actual database
        return json.dumps({
            "results": [
                {
                    "name": "Wireless Headphones",
                    "price": 79.99,
                    "in_stock": True,
                },
                {
                    "name": "Bluetooth Speaker",
                    "price": 49.99,
                    "in_stock": True,
                },
            ]
        })
    elif tool_name == "get_order_status":
        return json.dumps({
            "order_id": tool_input["order_id"],
            "status": "shipped",
            "tracking": "1Z999AA10123456784",
            "eta": "2026-03-18",
        })
    elif tool_name == "create_support_ticket":
        return json.dumps({
            "ticket_id": "TKT-8842",
            "status": "created",
        })
    return json.dumps({"error": f"Unknown tool: {tool_name}"})


def run_agent_loop(
    user_message: str,
    system_prompt: str,
    max_iterations: int = 10,
) -> str:
    """Run the full agent loop with tool execution."""
    messages = [{"role": "user", "content": user_message}]

    for iteration in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=system_prompt,
            tools=tools,
            messages=messages,
        )

        # Check if the response contains tool use
        if response.stop_reason == "tool_use":
            # Add assistant's response to conversation
            messages.append({
                "role": "assistant",
                "content": response.content,
            })

            # Execute each tool call
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(
                        block.name, block.input
                    )
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })

            # Add tool results to conversation
            messages.append({
                "role": "user",
                "content": tool_results,
            })
        else:
            # Agent produced a final response
            final_text = ""
            for block in response.content:
                if hasattr(block, "text"):
                    final_text += block.text
            return final_text

    return "Agent reached maximum iterations."

This loop is the heart of every Claude agent. The pattern is always the same: send, check for tool use, execute, feed back, repeat.
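As the tool count grows, the if/elif chain in `execute_tool` becomes awkward to maintain. One common refactor is a dict-based registry that maps tool names to handler functions; the handlers below are illustrative stubs matching the earlier examples:

```python
import json

# Map tool names to handlers; each takes the tool input dict
# and returns a JSON string, matching the tool_result format.
TOOL_HANDLERS = {
    "get_order_status": lambda inp: json.dumps({
        "order_id": inp["order_id"],
        "status": "shipped",
    }),
    "create_support_ticket": lambda inp: json.dumps({
        "ticket_id": "TKT-8842",
        "status": "created",
    }),
}

def execute_tool_registry(tool_name: str, tool_input: dict) -> str:
    """Dispatch a tool call via the registry instead of if/elif chains."""
    handler = TOOL_HANDLERS.get(tool_name)
    if handler is None:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})
    return handler(tool_input)
```

Adding a tool then becomes a one-line registry entry, and the unknown-tool fallback stays in exactly one place.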

Multi-Agent Patterns

Complex systems benefit from multiple specialized agents. Here is a pattern for implementing agent handoffs with Claude:

class AgentSystem:
    """Multi-agent system with handoff support."""

    def __init__(self):
        self.client = Anthropic()
        self.agents = {}
        self.active_agent = None
        self.messages = []

    def register_agent(
        self,
        name: str,
        system_prompt: str,
        agent_tools: list,
        handoff_targets: list[str],
    ):
        """Register an agent with the system."""
        # Add handoff tool dynamically
        if handoff_targets:
            handoff_tool = {
                "name": "handoff",
                "description": (
                    "Hand off the conversation to another agent. "
                    f"Available targets: {', '.join(handoff_targets)}"
                ),
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "target_agent": {
                            "type": "string",
                            "enum": handoff_targets,
                        },
                        "reason": {
                            "type": "string",
                            "description": "Why the handoff is needed",
                        },
                    },
                    "required": ["target_agent", "reason"],
                },
            }
            agent_tools = agent_tools + [handoff_tool]

        self.agents[name] = {
            "system_prompt": system_prompt,
            "tools": agent_tools,
        }
        # Default to the first registered agent so process()
        # has an active agent before any explicit handoff
        if self.active_agent is None:
            self.active_agent = name
        self.messages.append({
            "role": "user",
            "content": user_input,
        })

        for _ in range(15):  # max iterations
            agent = self.agents[self.active_agent]
            response = self.client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=4096,
                system=agent["system_prompt"],
                tools=agent["tools"],
                messages=self.messages,
            )

            if response.stop_reason == "tool_use":
                self.messages.append({
                    "role": "assistant",
                    "content": response.content,
                })

                tool_results = []
                for block in response.content:
                    if block.type != "tool_use":
                        continue
                    if block.name == "handoff":
                        target = block.input["target_agent"]
                        self.active_agent = target
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": f"Handed off to {target}",
                        })
                    else:
                        result = execute_tool(
                            block.name, block.input
                        )
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": result,
                        })

                self.messages.append({
                    "role": "user",
                    "content": tool_results,
                })
            else:
                text = ""
                for block in response.content:
                    if hasattr(block, "text"):
                        text += block.text
                self.messages.append({
                    "role": "assistant",
                    "content": text,
                })
                return text

        return "Maximum iterations reached."
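The handoff tool that `register_agent` builds inline can be factored into a standalone helper, which also makes the schema easy to unit-test without instantiating the client. A sketch — `make_handoff_tool` is an illustrative name, not an SDK function:

```python
def make_handoff_tool(handoff_targets: list[str]) -> dict:
    """Build the handoff tool definition for a set of target agents."""
    return {
        "name": "handoff",
        "description": (
            "Hand off the conversation to another agent. "
            f"Available targets: {', '.join(handoff_targets)}"
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                # Constraining targets via enum prevents handoffs
                # to agents that do not exist
                "target_agent": {
                    "type": "string",
                    "enum": handoff_targets,
                },
                "reason": {
                    "type": "string",
                    "description": "Why the handoff is needed",
                },
            },
            "required": ["target_agent", "reason"],
        },
    }
```

With this helper, `register_agent` reduces to appending `make_handoff_tool(handoff_targets)` when targets are present.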

Error Handling in Production

Production agents must handle failures gracefully. Key error categories and their handling strategies:

API Errors

from anthropic import (
    APIError,
    RateLimitError,
    APIConnectionError,
)
import time

def call_claude_with_retry(
    messages, system, agent_tools, max_retries=3
):
    """Call Claude with exponential backoff retry."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=4096,
                system=system,
                tools=agent_tools,
                messages=messages,
            )
        except RateLimitError:
            wait = 2 ** attempt
            time.sleep(wait)
        except APIConnectionError:
            if attempt == max_retries - 1:
                raise
            time.sleep(1)
        except APIError:
            # Non-retryable API error: log and re-raise
            raise
    raise RuntimeError("Max retries exceeded")
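The same backoff logic can be packaged as a decorator so every API-calling helper shares it. A generic sketch — the exception types are parameterized, and the delays include jitter to avoid synchronized retry storms; in production you would pass `retryable=(RateLimitError, APIConnectionError)` from the `anthropic` package:

```python
import random
import time
from functools import wraps

def with_retry(max_retries: int = 3, base_delay: float = 1.0,
               retryable: tuple = (ConnectionError,)):
    """Retry a function with exponential backoff plus jitter."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return fn(*args, **kwargs)
                except retryable:
                    if attempt == max_retries - 1:
                        raise  # Out of retries: surface the error
                    # Exponential backoff with +/-50% jitter
                    delay = base_delay * (2 ** attempt)
                    time.sleep(delay * random.uniform(0.5, 1.5))
        return wrapper
    return decorator
```

Decorating a thin wrapper around `client.messages.create` keeps retry policy out of the agent loop itself.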

Tool Execution Errors

Always return errors as structured data rather than raising exceptions. The agent needs error information to decide its next action — retry, try a different approach, or inform the user:

def safe_execute_tool(
    tool_name: str, tool_input: dict
) -> str:
    """Execute a tool with error handling."""
    try:
        result = execute_tool(tool_name, tool_input)
        return result
    except TimeoutError:
        return json.dumps({
            "error": "Tool timed out. Try again.",
            "tool": tool_name,
        })
    except Exception as e:
        return json.dumps({
            "error": f"Tool failed: {str(e)}",
            "tool": tool_name,
        })
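The Messages API also lets you flag a tool result as an error explicitly by setting `is_error: true` on the `tool_result` block, which gives Claude a stronger signal than error text alone. A small sketch of building such a block (`error_tool_result` is an illustrative helper):

```python
import json

def error_tool_result(tool_use_id: str, message: str) -> dict:
    """Build a tool_result block flagged as an error."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps({"error": message}),
        "is_error": True,  # Signals failure to the model explicitly
    }
```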

Deployment Considerations

Streaming for Low Latency

In production, stream agent responses to reduce perceived latency. Claude's streaming API sends tokens as they are generated:

def stream_agent_response(system_prompt: str, messages: list):
    """Yield response text chunks as Claude generates them."""
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=system_prompt,
        messages=messages,
    ) as stream:
        for text in stream.text_stream:
            yield text  # Send to client via SSE or WebSocket

Cost Management

Monitor and control costs with these strategies:

  • Set max_tokens appropriately for each use case (do not default to 4096 for simple responses)
  • Cache frequently used tool results in Redis
  • Use Claude 3.5 Haiku for simple routing and classification tasks
  • Implement conversation length limits to prevent runaway sessions
  • Track per-conversation costs and alert on anomalies
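A back-of-the-envelope estimator helps when setting those cost alerts. The per-million-token rates below are illustrative placeholders, not quoted pricing — substitute your current rates:

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_rate: float = 3.0,
                      output_rate: float = 15.0) -> float:
    """Estimate request cost from per-million-token rates (USD)."""
    return (input_tokens * input_rate
            + output_tokens * output_rate) / 1_000_000
```

Token counts are available on every response via `response.usage`, so per-conversation totals are a running sum of these estimates.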

Conversation State Persistence

For production deployments, persist conversation state to a database. At CallSphere, we use PostgreSQL with a JSONB column for the messages array, which allows both efficient storage and flexible querying:

async def save_conversation(
    conversation_id: str,
    messages: list,
    active_agent: str,
):
    await db.execute(
        """INSERT INTO conversations
           (id, messages, active_agent, updated_at)
           VALUES ($1, $2, $3, NOW())
           ON CONFLICT (id) DO UPDATE SET
             messages = $2,
             active_agent = $3,
             updated_at = NOW()""",
        conversation_id,
        json.dumps(messages),
        active_agent,
    )

Frequently Asked Questions

How does the Claude Agent SDK differ from the OpenAI Agents SDK?

The Claude Agent SDK is tightly integrated with Claude's unique features — extended thinking for complex reasoning, 200K context windows for long documents, and Anthropic's safety-first design philosophy. The OpenAI Agents SDK provides a more opinionated framework with built-in abstractions for the agent loop, handoffs, and guardrails. Choose based on your primary model provider. If you use Claude as your main model, the Anthropic SDK gives you the most direct access to Claude's capabilities with the least abstraction overhead.

Can I use Claude with other agent frameworks like LangGraph?

Yes. Claude works with any framework that supports the Anthropic API or the OpenAI-compatible API format. LangGraph, CrewAI, and AutoGen all have Anthropic integrations. However, some Claude-specific features (like extended thinking) may not be fully exposed through third-party frameworks. If you need those features, use the Anthropic SDK directly for the agent loop and use the framework only for orchestration.

What model should I use for different agent tasks?

Use Claude Sonnet 4 (the model used throughout this guide) for complex reasoning, multi-step tool use, and tasks that require high accuracy. Use Claude 3.5 Haiku for high-volume, latency-sensitive tasks like intent classification, simple Q&A, and routing. For tasks that benefit from deep analysis, enable extended thinking — it adds latency but significantly improves accuracy on complex problems.

How do I handle long conversations that exceed the context window?

Implement a sliding window strategy. Keep the system prompt, the first few messages (for context), and the most recent N messages. For the messages in between, generate a summary and include it as a system message. Alternatively, use Claude's 200K context window — most conversations fit comfortably. For truly long-running sessions (multi-day workflows), persist state in a database and load only the relevant context for each interaction.
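The sliding-window strategy above can be sketched as a pure function; the summary placeholder below stands in for a real summarization call to Claude:

```python
def trim_history(messages: list, keep_first: int = 2,
                 keep_last: int = 10) -> list:
    """Keep the head and tail of a conversation, summarizing the middle."""
    if len(messages) <= keep_first + keep_last:
        return messages  # Short enough: no trimming needed
    middle = messages[keep_first:-keep_last]
    # In production, replace this placeholder with an actual
    # summary generated by a cheap model call
    summary = {
        "role": "user",
        "content": f"[Summary of {len(middle)} earlier messages omitted]",
    }
    return messages[:keep_first] + [summary] + messages[-keep_last:]
```

Run this before each `messages.create` call once the history passes a length threshold, and persist the summary alongside the trimmed history.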

What are the best practices for writing Claude agent system prompts?

Keep prompts structured and specific. Start with the agent's role and responsibilities, then list behavioral rules, then describe the tools and when to use each one. Use numbered lists for multi-step procedures. Include examples of good responses for ambiguous scenarios. Avoid vague instructions like "be helpful" — instead specify exactly what helpful means in your context. Test prompts against edge cases and adversarial inputs before deploying.

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
