Agent Reasoning and Planning: Chain-of-Thought, ReAct, and Tree-of-Thought Patterns
Deep technical exploration of reasoning patterns for AI agents: Chain-of-Thought prompting, ReAct loops combining reasoning with action, and Tree-of-Thought branching search strategies.
Why Reasoning Patterns Matter for Agents
A language model without a reasoning strategy is like a developer without a debugger — it can produce output, but it cannot systematically work through complex problems. When you ask an LLM to "find the cheapest flight from NYC to Tokyo with a layover in a city with good food," the model needs to decompose this into sub-problems, reason through constraints, take actions (search flights, evaluate cities), and synthesize results. Without an explicit reasoning pattern, the model will often hallucinate an answer or give a superficial one.
Three reasoning patterns have emerged as the foundational approaches for building agents that can plan and execute multi-step tasks: Chain-of-Thought (CoT), ReAct (Reason + Act), and Tree-of-Thought (ToT). Each pattern has distinct strengths, computational costs, and ideal use cases.
Chain-of-Thought Prompting
Chain-of-Thought prompting forces the model to externalize its reasoning process step by step before arriving at an answer. Instead of jumping directly from question to answer, the model produces intermediate reasoning steps that we can inspect, debug, and build upon.
The Core Mechanism
The insight behind CoT is simple: when humans solve complex problems, they think through intermediate steps. Forcing an LLM to do the same can improve accuracy on reasoning-heavy tasks by 20-60%, depending on task complexity and model size.
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoTStep:
    step_number: int
    thought: str
    conclusion: Optional[str] = None


@dataclass
class CoTResult:
    steps: list[CoTStep]
    final_answer: str
    confidence: float


class ChainOfThoughtAgent:
    """Agent that uses explicit Chain-of-Thought reasoning."""

    COT_SYSTEM_PROMPT = """You are a reasoning agent. For every question:
1. Break the problem into logical steps
2. Think through each step explicitly
3. Show your reasoning before concluding
4. If you're uncertain, say so and explain why

Format your response as:
STEP 1: [thought]
STEP 2: [thought]
...
CONCLUSION: [final answer]
CONFIDENCE: [0.0-1.0]"""

    def __init__(self, llm_client):
        self.llm = llm_client

    async def reason(self, question: str) -> CoTResult:
        response = await self.llm.chat(messages=[
            {"role": "system", "content": self.COT_SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ])
        return self._parse_cot_response(response.content)

    async def reason_with_verification(self, question: str) -> CoTResult:
        """Two-pass CoT: reason, then verify the reasoning."""
        # Pass 1: Initial reasoning
        initial = await self.reason(question)

        # Pass 2: Verify each step
        verification_prompt = (
            f"Verify this reasoning step by step. "
            f"For each step, confirm it is logically valid or "
            f"identify the error.\n\n"
            f"Question: {question}\n\n"
            f"Reasoning:\n"
        )
        for step in initial.steps:
            verification_prompt += f"Step {step.step_number}: {step.thought}\n"
        verification_prompt += f"\nConclusion: {initial.final_answer}"

        verification = await self.llm.chat(messages=[
            {"role": "system", "content": self.COT_SYSTEM_PROMPT},
            {"role": "user", "content": verification_prompt},
        ])

        # If verification found errors, re-reason with corrections
        if "error" in verification.content.lower():
            corrected = await self.llm.chat(messages=[
                {"role": "system", "content": self.COT_SYSTEM_PROMPT},
                {
                    "role": "user",
                    "content": (
                        f"Original question: {question}\n\n"
                        f"Previous attempt had errors:\n"
                        f"{verification.content}\n\n"
                        f"Please re-reason from scratch, "
                        f"avoiding the identified errors."
                    ),
                },
            ])
            return self._parse_cot_response(corrected.content)
        return initial

    def _parse_cot_response(self, text: str) -> CoTResult:
        steps = []
        final_answer = ""
        confidence = 0.5
        for line in text.strip().split("\n"):
            line = line.strip()
            if line.startswith("STEP"):
                parts = line.split(":", 1)
                if len(parts) == 2:
                    steps.append(CoTStep(
                        step_number=len(steps) + 1,
                        thought=parts[1].strip(),
                    ))
            elif line.startswith("CONCLUSION:"):
                final_answer = line.split(":", 1)[1].strip()
            elif line.startswith("CONFIDENCE:"):
                try:
                    confidence = float(line.split(":", 1)[1].strip())
                except ValueError:
                    confidence = 0.5
        return CoTResult(
            steps=steps,
            final_answer=final_answer,
            confidence=confidence,
        )
```
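The parsing logic above can be exercised without an LLM client. A minimal standalone sketch, with an invented sample response in the expected STEP/CONCLUSION/CONFIDENCE format:

```python
# Standalone sketch of CoT response parsing; the sample response
# text is invented for illustration.
def parse_cot(text: str) -> tuple[list[str], str, float]:
    steps, answer, confidence = [], "", 0.5
    for line in text.strip().split("\n"):
        line = line.strip()
        if line.startswith("STEP"):
            # Everything after the first colon is the thought
            _, _, thought = line.partition(":")
            steps.append(thought.strip())
        elif line.startswith("CONCLUSION:"):
            answer = line.split(":", 1)[1].strip()
        elif line.startswith("CONFIDENCE:"):
            try:
                confidence = float(line.split(":", 1)[1])
            except ValueError:
                confidence = 0.5  # fall back to neutral confidence
    return steps, answer, confidence


sample = """STEP 1: A train covers 120 km in 2 hours, so its speed is 60 km/h.
STEP 2: At 60 km/h, 180 km takes 180 / 60 = 3 hours.
CONCLUSION: 3 hours
CONFIDENCE: 0.9"""

steps, answer, confidence = parse_cot(sample)
```

Keeping the parser separate from the agent like this makes the output format easy to regression-test when the prompt changes.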
When to Use Chain-of-Thought
CoT works best for:
- Mathematical reasoning and word problems
- Multi-step logical deductions
- Tasks where showing work is as important as the answer (auditing, compliance)
- Situations where you need to understand why the agent reached a particular conclusion
CoT is less effective for tasks requiring real-time interaction with external systems, because it reasons in one shot without the ability to gather new information mid-reasoning.
ReAct: Reason + Act
ReAct addresses CoT's biggest limitation: in the real world, reasoning alone is insufficient — agents need to take actions (search databases, call APIs, read files) and use the results to inform their next reasoning step. ReAct interleaves thinking with acting in a loop: Thought -> Action -> Observation -> Thought -> Action -> Observation -> ... -> Answer.
```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ReActStep:
    thought: str
    action: str | None = None
    action_input: dict | None = None
    observation: str | None = None


@dataclass
class ReActTrace:
    question: str
    steps: list[ReActStep] = field(default_factory=list)
    final_answer: str = ""
    total_tokens: int = 0


class ReActAgent:
    """Implements the ReAct (Reason + Act) pattern."""

    REACT_PROMPT = """You are a reasoning agent with access to tools.
For each step, you MUST follow this exact format:

Thought: [your reasoning about what to do next]
Action: [tool_name]
Action Input: [JSON arguments for the tool]

After receiving an observation, continue with another Thought.
When you have enough information to answer, use:

Thought: I now have enough information to answer.
Final Answer: [your answer]

AVAILABLE TOOLS:
{tool_descriptions}

IMPORTANT:
- Always think before acting
- Use tools to gather facts — never guess or assume
- If a tool returns an error, reason about alternatives
- Maximum {max_steps} steps before you must provide a Final Answer"""

    def __init__(
        self,
        llm_client,
        tools: dict[str, dict],
        max_steps: int = 10,
    ):
        self.llm = llm_client
        self.tools = tools
        self.max_steps = max_steps

    async def run(self, question: str) -> ReActTrace:
        trace = ReActTrace(question=question)
        messages = [
            {
                "role": "system",
                "content": self.REACT_PROMPT.format(
                    tool_descriptions=self._format_tool_descriptions(),
                    max_steps=self.max_steps,
                ),
            },
            {"role": "user", "content": question},
        ]
        for _ in range(self.max_steps):
            response = await self.llm.chat(
                messages=messages, stop=["Observation:"]
            )
            text = response.content.strip()
            step = self._parse_step(text)
            trace.steps.append(step)

            # Check if we have a final answer
            if "Final Answer:" in text:
                trace.final_answer = text.split("Final Answer:")[1].strip()
                break

            # Execute the action if one was specified
            if step.action:
                if step.action in self.tools:
                    observation = await self._execute_tool(
                        step.action, step.action_input or {}
                    )
                    step.observation = str(observation)
                else:
                    step.observation = (
                        f"Error: Tool '{step.action}' not found. "
                        f"Available tools: {', '.join(self.tools.keys())}"
                    )
                # Feed the step and its observation back into the conversation
                messages.append({"role": "assistant", "content": text})
                messages.append({
                    "role": "user",
                    "content": f"Observation: {step.observation}",
                })

        if not trace.final_answer:
            trace.final_answer = (
                "I was unable to reach a conclusion within "
                f"the maximum {self.max_steps} steps."
            )
        return trace

    async def _execute_tool(self, tool_name: str, args: dict) -> Any:
        fn = self.tools[tool_name]["function"]
        try:
            if asyncio.iscoroutinefunction(fn):
                return await fn(**args)
            return fn(**args)
        except Exception as e:
            return f"Tool error: {type(e).__name__}: {e}"

    def _parse_step(self, text: str) -> ReActStep:
        thought = ""
        action = None
        action_input = None
        for line in text.split("\n"):
            line = line.strip()
            if line.startswith("Thought:"):
                thought = line.split("Thought:", 1)[1].strip()
            elif line.startswith("Action:"):
                action = line.split("Action:", 1)[1].strip()
            elif line.startswith("Action Input:"):
                raw = line.split("Action Input:", 1)[1].strip()
                try:
                    action_input = json.loads(raw)
                except json.JSONDecodeError:
                    # Fall back to passing the raw string through
                    action_input = {"input": raw}
        return ReActStep(
            thought=thought,
            action=action,
            action_input=action_input,
        )

    def _format_tool_descriptions(self) -> str:
        lines = []
        for name, tool in self.tools.items():
            lines.append(f"- {name}: {tool.get('description', 'No description')}")
            params = tool.get("parameters", {})
            if params:
                lines.append(f"  Parameters: {params}")
        return "\n".join(lines)
```
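Parsing is the most fragile part of a ReAct loop, so it pays to test it in isolation. A standalone sketch of the Thought/Action/Action Input parsing, driven by an invented model output:

```python
import json

# Standalone sketch of ReAct step parsing; the sample model output
# below is invented for illustration.
def parse_react_step(text: str) -> dict:
    step = {"thought": "", "action": None, "action_input": None}
    for line in text.split("\n"):
        line = line.strip()
        if line.startswith("Thought:"):
            step["thought"] = line.split("Thought:", 1)[1].strip()
        elif line.startswith("Action:"):
            step["action"] = line.split("Action:", 1)[1].strip()
        elif line.startswith("Action Input:"):
            raw = line.split("Action Input:", 1)[1].strip()
            try:
                step["action_input"] = json.loads(raw)
            except json.JSONDecodeError:
                # Malformed JSON degrades to a plain-string argument
                step["action_input"] = {"input": raw}
    return step


sample = (
    "Thought: I need the current market cap first.\n"
    "Action: web_search\n"
    'Action Input: {"query": "NVIDIA market cap"}'
)
step = parse_react_step(sample)
```

The JSON fallback matters in practice: models occasionally emit unquoted or truncated arguments, and degrading to a raw string keeps the loop alive instead of crashing it.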
ReAct in Practice: A Research Agent
Here is a concrete example of ReAct in action: a research agent that answers questions by searching the web and reading documents.
```python
import asyncio

# Define tools. web_search_fn and read_url_fn are assumed to be
# defined elsewhere and return text summaries of their results.
research_tools = {
    "web_search": {
        "description": "Search the web for information",
        "parameters": {"query": "string"},
        "function": web_search_fn,
    },
    "read_url": {
        "description": "Read the content of a specific URL",
        "parameters": {"url": "string"},
        "function": read_url_fn,
    },
    "calculate": {
        "description": "Evaluate a mathematical expression",
        "parameters": {"expression": "string"},
        # eval is fine for a demo; use a safe expression
        # parser in production
        "function": lambda expression: eval(expression),
    },
}

agent = ReActAgent(
    llm_client=llm,
    tools=research_tools,
    max_steps=8,
)

# Example trace for: "What is the market cap of NVIDIA
# divided by the number of employees?"
#
# Thought: I need two pieces of data: NVIDIA market cap
# and employee count. Let me search for both.
# Action: web_search
# Action Input: {"query": "NVIDIA market cap 2026"}
# Observation: NVIDIA market cap is approximately $3.2 trillion...
#
# Thought: Got the market cap. Now I need employee count.
# Action: web_search
# Action Input: {"query": "NVIDIA number of employees 2026"}
# Observation: NVIDIA has approximately 32,000 employees...
#
# Thought: Now I can calculate: $3.2T / 32,000 employees
# Action: calculate
# Action Input: {"expression": "3200000000000 / 32000"}
# Observation: 100000000.0
#
# Thought: I now have enough information to answer.
# Final Answer: NVIDIA's market cap per employee is
# approximately $100 million.
```
The trace above illustrates the power of ReAct: each step combines reasoning (understanding what data is needed) with action (fetching that data), and observations inform subsequent reasoning.
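The loop mechanics can also be seen without any LLM by scripting the model's replies. A compressed standalone sketch, where the scripted replies and the `lookup` tool are invented stand-ins for a real model and tool backend:

```python
import json

# Scripted model replies stand in for an LLM; each entry is what the
# model would emit at one turn of the Thought/Action/Observation loop.
scripted_replies = [
    "Thought: I need the employee count.\n"
    "Action: lookup\n"
    'Action Input: {"key": "employees"}',
    "Thought: I now have enough information to answer.\n"
    "Final Answer: 32000",
]

facts = {"employees": "32000"}  # invented stand-in for a tool backend


def lookup(key: str) -> str:
    return facts.get(key, "not found")


messages = [{"role": "user", "content": "How many employees?"}]
final_answer = ""
for reply in scripted_replies:
    messages.append({"role": "assistant", "content": reply})
    if "Final Answer:" in reply:
        final_answer = reply.split("Final Answer:")[1].strip()
        break
    # Extract the action arguments, run the tool, and feed the
    # observation back into the conversation as a user message
    for line in reply.split("\n"):
        if line.startswith("Action Input:"):
            args = json.loads(line.split("Action Input:", 1)[1])
            observation = lookup(**args)
            messages.append(
                {"role": "user", "content": f"Observation: {observation}"}
            )
```

The key structural point is visible even in this toy version: observations enter the transcript as user messages, so the next model turn reasons over everything gathered so far.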
Tree-of-Thought: Branching Search
Tree-of-Thought (ToT) extends Chain-of-Thought from a single reasoning chain into a tree of possible reasoning paths. At each step, the model generates multiple candidate thoughts, evaluates which paths are most promising, and explores the best branches — potentially backtracking when a path leads to a dead end.
This is analogous to how a chess engine evaluates positions: instead of committing to one move sequence, it explores multiple lines and selects the most promising one.
```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThoughtNode:
    id: str
    depth: int
    thought: str
    evaluation_score: float = 0.0
    children: list["ThoughtNode"] = field(default_factory=list)
    parent_id: Optional[str] = None
    is_solution: bool = False


class TreeOfThoughtAgent:
    """Implements Tree-of-Thought reasoning with breadth-first
    or best-first search."""

    def __init__(
        self,
        llm_client,
        branching_factor: int = 3,
        max_depth: int = 4,
        search_strategy: str = "best_first",
    ):
        self.llm = llm_client
        self.branching_factor = branching_factor
        self.max_depth = max_depth
        self.search_strategy = search_strategy
        self._node_counter = 0

    async def solve(self, problem: str) -> dict:
        root = ThoughtNode(
            id=self._next_id(),
            depth=0,
            thought=f"Problem: {problem}",
        )
        if self.search_strategy == "best_first":
            solution = await self._best_first_search(root, problem)
        else:
            solution = await self._breadth_first_search(root, problem)
        return {
            "solution": solution.thought if solution else "No solution found",
            "path": self._trace_path(solution) if solution else [],
            "nodes_explored": self._node_counter,
        }

    async def _best_first_search(
        self, root: ThoughtNode, problem: str
    ) -> Optional[ThoughtNode]:
        frontier = [root]
        while frontier:
            # Sort by evaluation score (highest first)
            frontier.sort(key=lambda n: n.evaluation_score, reverse=True)
            current = frontier.pop(0)
            if current.depth >= self.max_depth:
                continue

            # Generate and evaluate candidate next thoughts
            candidates = await self._generate_thoughts(problem, current)
            evaluated = await self._evaluate_thoughts(problem, candidates)

            for node in evaluated:
                current.children.append(node)
                # Check if this is a solution
                if await self._is_solution(problem, node):
                    node.is_solution = True
                    return node
                frontier.append(node)
        return None

    async def _breadth_first_search(
        self, root: ThoughtNode, problem: str
    ) -> Optional[ThoughtNode]:
        queue = [root]
        while queue:
            current_level = queue[:]
            queue.clear()
            for node in current_level:
                if node.depth >= self.max_depth:
                    continue
                candidates = await self._generate_thoughts(problem, node)
                evaluated = await self._evaluate_thoughts(problem, candidates)
                # Only keep the top-k candidates at each level
                top_k = sorted(
                    evaluated,
                    key=lambda n: n.evaluation_score,
                    reverse=True,
                )[: self.branching_factor]
                for child in top_k:
                    node.children.append(child)
                    if await self._is_solution(problem, child):
                        child.is_solution = True
                        return child
                    queue.append(child)
        return None

    async def _generate_thoughts(
        self, problem: str, parent: ThoughtNode
    ) -> list[ThoughtNode]:
        path = self._trace_path(parent)
        path_text = "\n".join(
            f"Step {i + 1}: {p.thought}" for i, p in enumerate(path)
        )
        response = await self.llm.chat(messages=[{
            "role": "user",
            "content": (
                f"Problem: {problem}\n\n"
                f"Reasoning so far:\n{path_text}\n\n"
                f"Generate {self.branching_factor} distinct possible "
                f"next reasoning steps. Each should be a different "
                f"approach or angle.\n"
                f"Format: one step per line, prefixed with "
                f"THOUGHT 1:, THOUGHT 2:, etc."
            ),
        }])
        thoughts = []
        for line in response.content.strip().split("\n"):
            line = line.strip()
            if line.startswith("THOUGHT"):
                content = line.split(":", 1)[1].strip()
                thoughts.append(ThoughtNode(
                    id=self._next_id(),
                    depth=parent.depth + 1,
                    thought=content,
                    parent_id=parent.id,
                ))
        return thoughts[: self.branching_factor]

    async def _evaluate_thoughts(
        self, problem: str, nodes: list[ThoughtNode]
    ) -> list[ThoughtNode]:
        if not nodes:
            return []
        thoughts_text = "\n".join(
            f"[{i}] {n.thought}" for i, n in enumerate(nodes)
        )
        response = await self.llm.chat(messages=[{
            "role": "user",
            "content": (
                f"Problem: {problem}\n\n"
                f"Rate each reasoning step on how promising it is "
                f"for solving the problem (0.0 to 1.0).\n\n"
                f"{thoughts_text}\n\n"
                f'Return JSON: [{{"index": 0, "score": 0.8}}, ...]'
            ),
        }])
        try:
            scores = json.loads(response.content)
            for entry in scores:
                idx = entry["index"]
                if idx < len(nodes):
                    nodes[idx].evaluation_score = entry["score"]
        except (json.JSONDecodeError, KeyError):
            # Fall back to neutral scores if the model returns bad JSON
            for node in nodes:
                node.evaluation_score = 0.5
        return nodes

    async def _is_solution(
        self, problem: str, node: ThoughtNode
    ) -> bool:
        path = self._trace_path(node)
        path_text = "\n".join(p.thought for p in path)
        response = await self.llm.chat(messages=[{
            "role": "user",
            "content": (
                f"Problem: {problem}\n\n"
                f"Reasoning path:\n{path_text}\n\n"
                f"Does this reasoning path provide a complete, "
                f"correct solution? Answer YES or NO."
            ),
        }])
        return "YES" in response.content.upper()

    def _trace_path(
        self, node: Optional[ThoughtNode]
    ) -> list[ThoughtNode]:
        if node is None:
            return []
        # Simplified: returns only the node itself. A production
        # implementation keeps an id -> node index and walks
        # parent_id links back to the root.
        return [node]

    def _next_id(self) -> str:
        self._node_counter += 1
        return f"node_{self._node_counter}"
```
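The search bookkeeping itself is independent of the LLM calls. A toy standalone sketch of best-first expansion over pre-scored thoughts (the scores and the goal test are invented; in the agent above they come from `_evaluate_thoughts` and `_is_solution`):

```python
import heapq

# Toy best-first search over pre-scored "thoughts". Scores are
# invented; in the agent above an LLM evaluation pass assigns them.
children = {
    "root": [("plan A", 0.9), ("plan B", 0.4), ("plan C", 0.6)],
    "plan A": [("refine A1", 0.3), ("refine A2", 0.95)],
    "plan C": [("refine C1", 0.5)],
}
solutions = {"refine A2"}  # invented goal test

# heapq is a min-heap, so negate scores to pop the best node first
frontier = [(-1.0, "root")]
explored = []
solution = None
while frontier and solution is None:
    _, thought = heapq.heappop(frontier)
    explored.append(thought)
    for child, score in children.get(thought, []):
        if child in solutions:
            solution = child
            break
        heapq.heappush(frontier, (-score, child))
```

Note how "plan B" and "plan C" are generated but never expanded: the high-scoring branch reaches a solution first, which is exactly the pruning that makes best-first cheaper than exhaustive BFS.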
Choosing the Right Pattern
| Pattern | Latency | Cost | Best For |
|---|---|---|---|
| CoT | Low (1 LLM call) | Low | Math, logic, explainable reasoning |
| ReAct | Medium (3-10 calls) | Medium | Tasks requiring external data, multi-step workflows |
| ToT | High (10-50+ calls) | High | Creative problem-solving, planning, constraint satisfaction |
Use CoT when you need a single-pass reasoned answer and the model has sufficient knowledge to answer without external lookups.
Use ReAct when the agent needs to interact with tools, databases, or APIs to gather information before it can reason to an answer. This is the most common pattern for production agents.
Use ToT when the problem has multiple valid approaches and you want to explore several before committing. Creative tasks (writing, design), planning tasks (itinerary, project plan), and constraint satisfaction problems (scheduling, resource allocation) benefit most from ToT.
Combining Patterns
In practice, production agents often combine these patterns. A common architecture uses ReAct as the outer loop (gathering data through tools) with CoT as the inner reasoning mechanism (analyzing gathered data), and ToT for specific sub-problems that benefit from exploration.
```python
class HybridReasoningAgent:
    """Combines ReAct (outer loop) with CoT/ToT (inner reasoning)."""

    def __init__(self, react_agent, cot_agent, tot_agent):
        self.react = react_agent
        self.cot = cot_agent
        self.tot = tot_agent

    async def solve(self, problem: str) -> dict:
        # Use ReAct to gather information
        research_trace = await self.react.run(
            f"Gather all relevant information for: {problem}"
        )
        gathered_info = "\n".join(
            step.observation or ""
            for step in research_trace.steps
            if step.observation
        )

        # Classify problem complexity, then route to the
        # appropriate reasoning strategy
        complexity = await self._assess_complexity(problem, gathered_info)
        if complexity == "simple":
            result = await self.cot.reason(
                f"{problem}\n\nContext: {gathered_info}"
            )
            return {"answer": result.final_answer, "method": "cot"}
        result = await self.tot.solve(
            f"{problem}\n\nContext: {gathered_info}"
        )
        return {"answer": result["solution"], "method": "tot"}

    async def _assess_complexity(
        self, problem: str, context: str
    ) -> str:
        response = await self.react.llm.chat(messages=[{
            "role": "user",
            "content": (
                f"Is this problem simple (single clear answer) "
                f"or complex (multiple approaches, trade-offs)?\n"
                f"Problem: {problem}\n"
                f"Answer: simple or complex"
            ),
        }])
        # Normalize so trailing punctuation or extra words from the
        # model don't break the simple/complex routing
        return "simple" if "simple" in response.content.lower() else "complex"
```
FAQ
How does Chain-of-Thought differ from just asking the model to explain its reasoning?
CoT is a structured prompting technique, not just asking for an explanation. The key difference is that CoT forces the model to reason step-by-step before producing the answer, which changes the answer itself. Post-hoc explanations (reasoning after the answer) can be rationalizations rather than genuine reasoning traces. With CoT, the intermediate steps causally influence the final output because the model generates them as part of the same forward pass.
Is ReAct just function calling with extra steps?
ReAct includes function calling but adds an explicit reasoning layer. Standard function calling lets the model decide which tool to call, but the reasoning is implicit (hidden in the model's weights). ReAct makes the reasoning explicit through the Thought step, which creates an auditable trace of why the agent chose each action. This is critical for debugging, compliance, and building trust in the agent's decisions.
How many tokens does Tree-of-Thought cost compared to standard prompting?
ToT typically uses 10-50x more tokens than a single prompt, because it generates multiple candidate thoughts at each depth level and evaluates each one. With a branching factor of 3 and max depth of 4, you might generate and evaluate 3 + 9 + 27 + 81 = 120 candidate thoughts. At 200 tokens per thought plus 100 tokens per evaluation, that is roughly 36,000 tokens — compared to perhaps 500 tokens for a single CoT chain. The cost is justified only when the problem genuinely benefits from exploration, such as planning or creative tasks.
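That estimate can be reproduced directly; the per-thought and per-evaluation token counts are the rough assumptions stated above:

```python
# Rough ToT token estimate with branching factor 3 and max depth 4,
# using the assumed 200 tokens per thought and 100 per evaluation.
branching, depth = 3, 4
thoughts = sum(branching ** d for d in range(1, depth + 1))  # 3+9+27+81
total_tokens = thoughts * (200 + 100)
```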
Can you use these patterns with open-source models or do they require GPT-4 class models?
All three patterns work with smaller models, but effectiveness scales with model capability. CoT shows significant improvements starting from models with 7B+ parameters. ReAct requires reliable instruction-following and tool-use capability, which is available in models like Llama 3 70B and Mixtral 8x22B. ToT requires strong evaluation capability (the model must accurately judge which reasoning paths are promising), which currently works best with frontier models. For production deployments, consider using a smaller model for action execution and a larger model for evaluation and planning.
#AgentReasoning #ChainOfThought #ReAct #TreeOfThought #Planning #AIAgents #LLM
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.