Claude Extended Thinking: Leveraging Chain-of-Thought for Complex Reasoning
Learn how to use Claude's extended thinking feature to unlock deeper reasoning for complex agent tasks. Understand thinking blocks, budget tokens, and when extended thinking outperforms standard responses.
What Is Extended Thinking
Extended thinking is a Claude feature that lets the model "think out loud" before producing its final answer. When enabled, Claude generates an internal chain-of-thought reasoning trace — a thinking block — that works through the problem step by step before committing to a response.
This is not the same as asking Claude to "think step by step" in a prompt. Extended thinking is a model-level feature where Claude allocates dedicated compute to reasoning. The thinking happens in a structured thinking content block that is returned alongside the final text block, giving you visibility into the model's reasoning process.
Enabling Extended Thinking
Extended thinking requires a thinking configuration with a budget_tokens parameter that controls how many tokens Claude can spend on reasoning:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[
        {"role": "user", "content": "Analyze the trade-offs between microservices and monolithic architecture for a startup with 5 engineers building a fintech product."}
    ]
)

# The thinking block contains the reasoning trace
for block in response.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)
The budget_tokens value sets the maximum number of tokens Claude may spend thinking; the model can stop early if it reaches a conclusion. Two constraints apply: budget_tokens must be at least 1024, and max_tokens must be strictly larger than budget_tokens so there is room left for the actual response.
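As a sanity check, you can validate the pairing before sending a request. The helper below is an illustrative sketch, not part of the SDK; the 1024-token floor matches the documented minimum for budget_tokens at the time of writing:

```python
MIN_THINKING_BUDGET = 1024  # documented minimum for budget_tokens

def thinking_params(budget_tokens: int, max_tokens: int) -> dict:
    """Build request parameters, enforcing the budget/max_tokens relationship."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be >= {MIN_THINKING_BUDGET}")
    if max_tokens <= budget_tokens:
        raise ValueError("max_tokens must exceed budget_tokens to leave room for the response")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }

params = thinking_params(budget_tokens=10000, max_tokens=16000)
# params can then be splatted into client.messages.create(..., **params)
```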
Understanding the Response Structure
With extended thinking enabled, the response contains multiple content blocks:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=12000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[
        {"role": "user", "content": "Write a Python function that finds the longest palindromic substring in O(n) time using Manacher's algorithm."}
    ]
)

for block in response.content:
    if block.type == "thinking":
        print(f"Thinking used approximately {len(block.thinking.split())} words")
    elif block.type == "text":
        print(block.text)

# Thinking tokens are counted as output tokens in usage
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens (including thinking): {response.usage.output_tokens}")
The thinking block is visible to you as the developer, but thinking blocks from previous turns are stripped from conversation history on subsequent requests. This means thinking does not accumulate context window usage across multi-turn conversations.
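If you maintain conversation history client-side, you can mirror that behavior by appending only the text blocks when recording an assistant turn. The helper below is a sketch that operates on plain dicts shaped like the API's content blocks:

```python
def to_history_content(content_blocks: list) -> list:
    """Keep only text blocks when appending an assistant turn to history."""
    return [b for b in content_blocks if b.get("type") == "text"]

# Example assistant turn: one thinking block plus one text block
turn = [
    {"type": "thinking", "thinking": "Let me compare both architectures..."},
    {"type": "text", "text": "For a 5-person team, start with a monolith."},
]

history_entry = {"role": "assistant", "content": to_history_content(turn)}
```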
When to Use Extended Thinking
Extended thinking is most valuable for tasks that require multi-step reasoning:
import anthropic

client = anthropic.Anthropic()

# Complex analysis task - good candidate for extended thinking
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    system="You are a code review agent. Analyze code for bugs, security issues, and performance problems.",
    messages=[
        {"role": "user", "content": """Review this authentication function:

def authenticate(username, password):
    query = f"SELECT * FROM users WHERE username='{username}' AND password='{password}'"
    result = db.execute(query)
    if result:
        token = base64.b64encode(f"{username}:{time.time()}".encode()).decode()
        session['token'] = token
        return {"status": "ok", "token": token}
    return {"status": "fail"}
"""}
    ]
)

for block in response.content:
    if block.type == "text":
        print(block.text)
This is ideal for extended thinking because the model needs to evaluate SQL injection risks, password storage issues, token generation weaknesses, and session management problems — multiple distinct analyses that benefit from structured reasoning.
Budget Token Strategies
The budget allocation depends on task complexity:
import anthropic

client = anthropic.Anthropic()

def smart_query(prompt: str, complexity: str = "medium") -> str:
    budgets = {
        "low": 2000,     # Simple factual questions
        "medium": 6000,  # Analysis and comparison tasks
        "high": 12000,   # Complex reasoning, code generation, math
    }
    budget = budgets.get(complexity, 6000)
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=budget + 4000,
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=[{"role": "user", "content": prompt}]
    )
    return "".join(
        block.text for block in response.content if block.type == "text"
    )

# Low complexity - fast, cheap
answer = smart_query("What is the capital of France?", "low")

# High complexity - deep reasoning
answer = smart_query(
    "Design a rate limiting system that handles 100K requests/second with geographic distribution",
    "high"
)
Start with lower budgets and increase only when you observe the model cutting its reasoning short. The budget is a ceiling rather than a target, but oversized budgets tend to invite longer reasoning than simple tasks need, wasting output tokens (and money) without improving quality.
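One way to follow that advice is to retry with a bigger budget only when a cheap attempt is cut off. The sketch below escalates along a budget ladder when stop_reason comes back as "max_tokens"; the model name and ladder values are illustrative, and the escalation policy is an assumption, not an SDK feature:

```python
from typing import Optional

BUDGET_LADDER = [2000, 6000, 12000]

def next_budget(current: int, ladder: list = BUDGET_LADDER) -> Optional[int]:
    """Return the first rung above `current`, or None if already at the top."""
    for rung in ladder:
        if rung > current:
            return rung
    return None

def query_with_escalation(prompt: str) -> str:
    """Start cheap; re-run with a larger budget if the response was cut off."""
    import anthropic
    client = anthropic.Anthropic()
    budget = BUDGET_LADDER[0]
    while budget is not None:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=budget + 4000,
            thinking={"type": "enabled", "budget_tokens": budget},
            messages=[{"role": "user", "content": prompt}],
        )
        if response.stop_reason != "max_tokens":
            return "".join(b.text for b in response.content if b.type == "text")
        budget = next_budget(budget)
    raise RuntimeError("Ran out of budget rungs without a complete response")
```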
Extended Thinking in Agent Loops
When combining extended thinking with tool use, thinking happens before each tool call decision:
import anthropic

client = anthropic.Anthropic()

# Extended thinking works alongside tools
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    tools=[{
        "name": "run_sql",
        "description": "Execute a SQL query and return results.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
        }
    }],
    messages=[
        {"role": "user", "content": "Find the top 5 customers by lifetime revenue, excluding test accounts."}
    ]
)

# Response content may arrive as: thinking -> text -> tool_use
for block in response.content:
    print(f"Block type: {block.type}")
The thinking block reveals how Claude reasons about which tool to call and what arguments to provide, which is invaluable for debugging agent behavior.
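One nuance worth knowing: when you send a tool result back, Anthropic's documentation requires echoing the assistant's content unmodified, thinking blocks included, so the model can resume its reasoning. The continuation step below is a sketch; `run_sql` (the tool handler) and `TOOLS` (the tool definitions from the example above) are assumed to exist in your code:

```python
def tool_result_message(tool_use_id: str, result: str) -> dict:
    """Build the user turn that carries a tool result back to the model."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": result,
        }],
    }

def continue_with_tool_result(client, messages: list, response):
    """Echo the assistant turn verbatim (thinking blocks included), then answer the tool call."""
    tool_use = next(b for b in response.content if b.type == "tool_use")
    result = run_sql(tool_use.input["query"])  # your tool implementation
    messages.append({"role": "assistant", "content": response.content})
    messages.append(tool_result_message(tool_use.id, result))
    return client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 8000},
        tools=TOOLS,  # same tool definitions as the first request
        messages=messages,
    )
```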
FAQ
Does extended thinking increase costs?
Yes. Thinking tokens are billed as output tokens, which are more expensive than input tokens. A 10,000 token thinking budget could add significant cost per request. Use extended thinking selectively for tasks where the quality improvement justifies the cost, not for every API call.
Can I use extended thinking with streaming?
Yes. When streaming with extended thinking, thinking content arrives as content_block_delta events carrying thinking_delta payloads, followed by content_block_delta events carrying text_delta payloads for the final response. This lets you show a "reasoning" indicator to users while Claude thinks, then stream the final answer in real time.
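Both kinds of delta share the content_block_delta event type; the delta's own type field tells them apart. A small router over dict-shaped events, sketched to mirror the documented event shapes:

```python
def route_delta(event: dict):
    """Classify a streaming event as ("thinking", text), ("text", text), or None."""
    if event.get("type") != "content_block_delta":
        return None
    delta = event["delta"]
    if delta["type"] == "thinking_delta":
        return ("thinking", delta["thinking"])
    if delta["type"] == "text_delta":
        return ("text", delta["text"])
    return None  # e.g. signature deltas or other event kinds

# Typical use: show a "reasoning..." spinner for thinking deltas,
# and stream text deltas straight to the user.
```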
Should I include the thinking block in conversation history?
Not in the general case: the API strips thinking blocks from prior turns, so resending them in ordinary conversation has no effect. The exception is tool use, where the last assistant message must be passed back with its thinking blocks intact when you return a tool result. If you need to reference Claude's reasoning in later turns, extract the relevant parts from the thinking block and include them as regular text content in your messages.
#Anthropic #Claude #ExtendedThinking #ChainOfThought #Reasoning #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.