Building Multi-Step Reasoning Agents with Claude Extended Thinking

What is Extended Thinking

Claude's extended thinking feature gives the model a dedicated space to reason through problems before producing a response. When enabled, Claude generates "thinking" content that is returned to the developer in its own content blocks, clearly separated from the final output. This is not prompt engineering; it is a model-level feature that allocates compute specifically to reasoning.

Extended thinking can substantially improve performance on tasks requiring multi-step logic: mathematical proofs, complex code analysis, strategic planning, and any scenario where the first intuition might be wrong.

Enabling Extended Thinking

Enable extended thinking by adding a thinking parameter to your API call:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # Max tokens for thinking
    },
    messages=[{
        "role": "user",
        "content": "Solve this step by step: If a train leaves Station A at 60 mph and another leaves Station B (300 miles away) at 40 mph heading toward each other, when and where do they meet?"
    }]
)

# Response contains both thinking and text blocks
for block in response.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)

The budget_tokens parameter sets the maximum number of tokens Claude can spend on thinking. Set it higher for harder problems. Claude will not always use the full budget — it stops thinking when it has enough clarity to answer.
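The budget also interacts with max_tokens. As a minimal sketch, the helper below checks two constraints that Anthropic's documentation describes at the time of writing: a minimum budget of 1,024 tokens, and a budget smaller than max_tokens. The function name build_thinking_config and the constant are illustrative and not part of the SDK; confirm both limits against the current documentation for your model.

```python
# Assumption: documented minimum thinking budget; confirm for your model.
MIN_THINKING_BUDGET = 1024


def build_thinking_config(max_tokens: int, budget_tokens: int) -> dict:
    """Return a value for the `thinking` parameter after validating the budget."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(
            f"budget_tokens must be at least {MIN_THINKING_BUDGET}, got {budget_tokens}"
        )
    if budget_tokens >= max_tokens:
        # The thinking budget is spent out of max_tokens, so it has to be
        # smaller to leave room for the final answer.
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than max_tokens ({max_tokens})"
        )
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(build_thinking_config(max_tokens=16000, budget_tokens=10000))
```

The returned dictionary can be passed directly as thinking=... in client.messages.create, so a misconfigured budget fails locally before any request is sent.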

Building a Reasoning Agent with Tools

Extended thinking combines naturally with tool use. Claude thinks through the problem, decides which tools to call, and then reasons about the results:

tools = [
    {
        "name": "execute_python",
        "description": "Execute Python code and return the output. Use for calculations, data processing, or verification.",
        "input_schema": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python code to execute"}
            },
            "required": ["code"]
        }
    },
    {
        "name": "query_knowledge_base",
        "description": "Search an internal knowledge base for facts and reference data.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    }
]

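def run_reasoning_agent(question: str, max_turns: int = 10) -> dict:
    """Run a tool-use loop with extended thinking; return the answer and thinking trace."""
    messages = [{"role": "user", "content": question}]
    thinking_log = []

    # Cap the number of turns so a misbehaving loop cannot run forever
    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=16000,
            thinking={
                "type": "enabled",
                "budget_tokens": 8000,
            },
            tools=tools,
            messages=messages,
        )

        # Capture thinking blocks
        for block in response.content:
            if block.type == "thinking":
                thinking_log.append(block.thinking)

        # Any stop reason other than "tool_use" (for example "end_turn" or
        # "max_tokens") means there are no tool calls left to process
        if response.stop_reason != "tool_use":
            final_text = [b.text for b in response.content if b.type == "text"]
            return {
                "answer": "\n".join(final_text),
                "thinking_steps": thinking_log,
                "stop_reason": response.stop_reason,
            }

        # Pass the assistant turn back unmodified, thinking blocks included
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                # execute_tool is an application-defined dispatcher for the tools above
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })
        messages.append({"role": "user", "content": tool_results})

    raise RuntimeError(f"No final answer after {max_turns} turns")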
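The agent loop calls execute_tool, which the article does not define. One possible shape for it is sketched below, under stated assumptions: the knowledge base is a hypothetical in-memory dictionary with placeholder content, and Python code runs in a separate interpreter process with a timeout. A subprocess is not a security sandbox, so model-written code needs stronger isolation in production.

```python
import subprocess
import sys

# Hypothetical stand-in for a real knowledge base; the entry is placeholder data.
_KNOWLEDGE_BASE = {
    "refund window": "Refunds are accepted within 30 days of purchase.",
}


def _run_python(code: str) -> str:
    """Run code in a separate interpreter process and return its output.

    Assumption: a 10-second timeout is acceptable. This is NOT a sandbox.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except subprocess.TimeoutExpired:
        return "Error: execution timed out"
    if proc.returncode != 0:
        return f"Error: {proc.stderr.strip()}"
    return proc.stdout.strip()


def _query_knowledge_base(query: str) -> str:
    """Return entries whose key appears in the query (naive substring match)."""
    lowered = query.lower()
    hits = [text for key, text in _KNOWLEDGE_BASE.items() if key in lowered]
    return "\n".join(hits) if hits else "No results found."


def execute_tool(name: str, tool_input: dict) -> str:
    """Dispatch a tool call by name; failures come back as text the model can read."""
    if name == "execute_python":
        return _run_python(tool_input["code"])
    if name == "query_knowledge_base":
        return _query_knowledge_base(tool_input["query"])
    return f"Error: unknown tool '{name}'"


print(execute_tool("execute_python", {"code": "print(2 + 2)"}))
```

Returning errors as strings, rather than raising, lets Claude see what went wrong and decide whether to retry with different input.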

When Extended Thinking Makes a Difference

Extended thinking is not always necessary. It adds latency and token cost. Use it selectively for tasks where reasoning quality matters more than speed.

High-value use cases:

# Complex code analysis
result = run_reasoning_agent(
    "Review this function for concurrency bugs, edge cases, and "
    "performance issues. The function handles concurrent database "
    "writes with optimistic locking:\n\n" + code_snippet
)

# Multi-step math and logic
result = run_reasoning_agent(
    "A company's revenue follows R(t) = 100e^(0.05t) - 20t^2 + 500t. "
    "Find when revenue is maximized and the maximum value."
)

# Strategic decision making
result = run_reasoning_agent(
    "Given these three architecture options for our payment system, "
    "analyze tradeoffs for latency, consistency, cost, and operational "
    "complexity:\n\n" + options_description
)

Skip extended thinking for simple lookups, straightforward text generation, translation, and other tasks where Claude already performs well without extra reasoning time.

Controlling Thinking Budget

The budget_tokens parameter gives you fine-grained control over reasoning depth:

# Quick analysis: 2K thinking tokens
quick_response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4000,
    thinking={"type": "enabled", "budget_tokens": 2000},
    messages=[{"role": "user", "content": "What are the main pros and cons of microservices?"}]
)

# Deep analysis: 16K thinking tokens
# max_tokens must be larger than budget_tokens to leave room for the answer
deep_response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=24000,
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{"role": "user", "content": complex_code_review_prompt}]
)

Start with a modest budget (4,000-8,000 tokens) and increase it if you notice Claude's thinking being cut short on difficult problems. You can inspect the thinking output to calibrate.
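To make that calibration concrete, here is a small sketch that summarizes how much thinking a response contains. It assumes the response object has the same shape used elsewhere in this article (content blocks with type and thinking attributes, plus usage.output_tokens and stop_reason). The helper name summarize_thinking is hypothetical, and the demo uses a stand-in object instead of a real API response.

```python
from types import SimpleNamespace


def summarize_thinking(response) -> dict:
    """Report the size of the thinking output and whether the response was cut off."""
    thinking_blocks = [b for b in response.content if b.type == "thinking"]
    return {
        "thinking_blocks": len(thinking_blocks),
        "thinking_chars": sum(len(b.thinking) for b in thinking_blocks),
        # output_tokens covers thinking and the final answer together
        "output_tokens": response.usage.output_tokens,
        # A "max_tokens" stop suggests the limits were too tight for the task
        "hit_max_tokens": response.stop_reason == "max_tokens",
    }


# Stand-in object for demonstration only; not a real API response.
fake_response = SimpleNamespace(
    content=[
        SimpleNamespace(type="thinking", thinking="Combined speed is 100 mph."),
        SimpleNamespace(type="text", text="They meet after 3 hours."),
    ],
    usage=SimpleNamespace(output_tokens=120),
    stop_reason="end_turn",
)
print(summarize_thinking(fake_response))
```

Logging this summary across a sample of real requests gives a rough picture of whether a budget is consistently too small or far larger than the task needs.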

Streaming Thinking Tokens

For long-running reasoning tasks, stream the response so you can display thinking in real time:

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": hard_problem}]
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            if event.content_block.type == "thinking":
                print("[Thinking...]", end="", flush=True)
            elif event.content_block.type == "text":
                print("\n[Answer] ", end="", flush=True)
        elif event.type == "content_block_delta":
            if hasattr(event.delta, "thinking"):
                print(event.delta.thinking, end="", flush=True)
            elif hasattr(event.delta, "text"):
                print(event.delta.text, end="", flush=True)
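If you want to store the streamed output as well as display it, the deltas can be accumulated into two strings. The sketch below assumes events shaped like the ones in the loop above, with delta types named thinking_delta and text_delta. The helper name collect_stream is hypothetical, and the demo feeds it stand-in events so it runs without an API call.

```python
from types import SimpleNamespace


def collect_stream(events) -> dict:
    """Accumulate thinking and answer text from an iterable of stream events."""
    thinking_parts = []
    text_parts = []
    for event in events:
        if event.type != "content_block_delta":
            continue
        if event.delta.type == "thinking_delta":
            thinking_parts.append(event.delta.thinking)
        elif event.delta.type == "text_delta":
            text_parts.append(event.delta.text)
        # Other delta types (for example signature deltas) are ignored here
    return {"thinking": "".join(thinking_parts), "text": "".join(text_parts)}


# Stand-in events for demonstration only; a real stream yields SDK event objects.
fake_events = [
    SimpleNamespace(type="content_block_start"),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="thinking_delta", thinking="60 + 40 = 100 mph. "),
    ),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="thinking_delta", thinking="300 / 100 = 3 hours."),
    ),
    SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type="text_delta", text="They meet after 3 hours."),
    ),
]
print(collect_stream(fake_events))
```

With a real request, the same function could be called as collect_stream(stream) inside the with block.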

FAQ

Does extended thinking work with all Claude models?

Extended thinking is available on Claude Sonnet and Claude Opus. The thinking budget limits and capabilities may vary between models. Check the Anthropic documentation for the latest model support details.

Can I use extended thinking with tool use simultaneously?

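Yes. When both are enabled, Claude thinks before deciding whether to call tools. Pass each assistant response back unmodified, including its thinking blocks, as the agent loop above does, so the reasoning carries across tool calls. Whether Claude thinks again after each tool result depends on the model and API options (Anthropic's documentation covers this under interleaved thinking), so inspect the returned blocks rather than assuming every turn contains a thinking block. Collecting the thinking blocks from each response gives you a reasoning trace across the entire agent loop.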

How much do thinking tokens cost?

Thinking tokens are billed at the same rate as output tokens for the model you are using. A budget_tokens of 10,000 means up to 10,000 additional output tokens charged at the model's per-token output rate. Monitor your thinking token usage to balance reasoning quality against cost.
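The worst-case cost of a budget is simple arithmetic. The sketch below takes the output price as a parameter because prices differ by model and change over time; the 15.0 USD per million tokens in the demo is a hypothetical figure, not a quoted rate.

```python
def max_thinking_cost(budget_tokens: int, output_price_per_million: float) -> float:
    """Upper bound on thinking cost for one request, in the price's currency."""
    return budget_tokens / 1_000_000 * output_price_per_million


# Hypothetical price for illustration; look up the real rate for your model.
cost = max_thinking_cost(budget_tokens=10_000, output_price_per_million=15.0)
print(f"Worst-case thinking cost per request: ${cost:.2f}")
```

At that assumed price, a 10,000-token budget adds at most about $0.15 per request. This is an upper bound, since Claude does not always use the full budget.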
