Agent Reasoning and Planning: Chain-of-Thought, ReAct, and Tree-of-Thought Patterns
Deep technical exploration of reasoning patterns for AI agents: Chain-of-Thought prompting, ReAct loops combining reasoning with action, and Tree-of-Thought branching search strategies.
Why Reasoning Patterns Matter for Agents
A language model without a reasoning strategy is like a developer without a debugger — it can produce output, but it cannot systematically work through complex problems. When you ask an LLM to "find the cheapest flight from NYC to Tokyo with a layover in a city with good food," the model needs to decompose this into sub-problems, reason through constraints, take actions (search flights, evaluate cities), and synthesize results. Without an explicit reasoning pattern, the model will often hallucinate an answer or give a superficial one.
Three reasoning patterns have emerged as the foundational approaches for building agents that can plan and execute multi-step tasks: Chain-of-Thought (CoT), ReAct (Reason + Act), and Tree-of-Thought (ToT). Each pattern has distinct strengths, computational costs, and ideal use cases.
Chain-of-Thought Prompting
Chain-of-Thought prompting forces the model to externalize its reasoning process step by step before arriving at an answer. Instead of jumping directly from question to answer, the model produces intermediate reasoning steps that we can inspect, debug, and build upon.
The Core Mechanism
The insight behind CoT is simple: when humans solve complex problems, they think through intermediate steps. Forcing an LLM to do the same can improve accuracy on reasoning-heavy tasks by 20-60%, depending on task complexity and model size.
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoTStep:
    step_number: int
    thought: str
    conclusion: Optional[str] = None


@dataclass
class CoTResult:
    steps: list[CoTStep]
    final_answer: str
    confidence: float


class ChainOfThoughtAgent:
    """Agent that uses explicit Chain-of-Thought reasoning."""

    COT_SYSTEM_PROMPT = """You are a reasoning agent. For every question:
1. Break the problem into logical steps
2. Think through each step explicitly
3. Show your reasoning before concluding
4. If you're uncertain, say so and explain why

Format your response as:
STEP 1: [thought]
STEP 2: [thought]
...
CONCLUSION: [final answer]
CONFIDENCE: [0.0-1.0]"""

    def __init__(self, llm_client):
        self.llm = llm_client

    async def reason(self, question: str) -> CoTResult:
        response = await self.llm.chat(messages=[
            {"role": "system", "content": self.COT_SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ])
        return self._parse_cot_response(response.content)

    async def reason_with_verification(self, question: str) -> CoTResult:
        """Two-pass CoT: reason, then verify the reasoning."""
        # Pass 1: Initial reasoning
        initial = await self.reason(question)

        # Pass 2: Verify each step
        verification_prompt = (
            f"Verify this reasoning step by step. "
            f"For each step, confirm it is logically valid or "
            f"identify the error.\n\n"
            f"Question: {question}\n\n"
            f"Reasoning:\n"
        )
        for step in initial.steps:
            verification_prompt += f"Step {step.step_number}: {step.thought}\n"
        verification_prompt += f"\nConclusion: {initial.final_answer}"

        verification = await self.llm.chat(messages=[
            {"role": "system", "content": self.COT_SYSTEM_PROMPT},
            {"role": "user", "content": verification_prompt},
        ])

        # If verification found errors, re-reason with corrections
        if "error" in verification.content.lower():
            corrected = await self.llm.chat(messages=[
                {"role": "system", "content": self.COT_SYSTEM_PROMPT},
                {
                    "role": "user",
                    "content": (
                        f"Original question: {question}\n\n"
                        f"Previous attempt had errors:\n"
                        f"{verification.content}\n\n"
                        f"Please re-reason from scratch, "
                        f"avoiding the identified errors."
                    ),
                },
            ])
            return self._parse_cot_response(corrected.content)
        return initial

    def _parse_cot_response(self, text: str) -> CoTResult:
        steps = []
        final_answer = ""
        confidence = 0.5
        for line in text.strip().split("\n"):
            line = line.strip()
            if line.startswith("STEP"):
                parts = line.split(":", 1)
                if len(parts) == 2:
                    steps.append(CoTStep(
                        step_number=len(steps) + 1,
                        thought=parts[1].strip(),
                    ))
            elif line.startswith("CONCLUSION:"):
                final_answer = line.split(":", 1)[1].strip()
            elif line.startswith("CONFIDENCE:"):
                try:
                    confidence = float(line.split(":", 1)[1].strip())
                except ValueError:
                    confidence = 0.5
        return CoTResult(
            steps=steps,
            final_answer=final_answer,
            confidence=confidence,
        )
```
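The parsing logic above can be exercised without an LLM client. A minimal standalone sketch, with an invented sample response in the expected STEP/CONCLUSION/CONFIDENCE format:

```python
# Standalone sketch of CoT response parsing; the sample response
# text is invented for illustration.
def parse_cot(text: str) -> tuple[list[str], str, float]:
    steps, answer, confidence = [], "", 0.5
    for line in text.strip().split("\n"):
        line = line.strip()
        if line.startswith("STEP"):
            # Everything after the first colon is the thought
            _, _, thought = line.partition(":")
            steps.append(thought.strip())
        elif line.startswith("CONCLUSION:"):
            answer = line.split(":", 1)[1].strip()
        elif line.startswith("CONFIDENCE:"):
            try:
                confidence = float(line.split(":", 1)[1])
            except ValueError:
                confidence = 0.5  # fall back to neutral confidence
    return steps, answer, confidence


sample = """STEP 1: A train covers 120 km in 2 hours, so its speed is 60 km/h.
STEP 2: At 60 km/h, 180 km takes 180 / 60 = 3 hours.
CONCLUSION: 3 hours
CONFIDENCE: 0.9"""

steps, answer, confidence = parse_cot(sample)
```

Keeping the parser separate from the agent like this makes the output format easy to regression-test when the prompt changes.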
When to Use Chain-of-Thought
CoT works best for:
- Mathematical reasoning and word problems
- Multi-step logical deductions
- Tasks where showing work is as important as the answer (auditing, compliance)
- Situations where you need to understand why the agent reached a particular conclusion
CoT is less effective for tasks requiring real-time interaction with external systems, because it reasons in one shot without the ability to gather new information mid-reasoning.
ReAct: Reason + Act
ReAct addresses CoT's biggest limitation: in the real world, reasoning alone is insufficient — agents need to take actions (search databases, call APIs, read files) and use the results to inform their next reasoning step. ReAct interleaves thinking with acting in a loop: Thought -> Action -> Observation -> Thought -> Action -> Observation -> ... -> Answer.
```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ReActStep:
    thought: str
    action: str | None = None
    action_input: dict | None = None
    observation: str | None = None


@dataclass
class ReActTrace:
    question: str
    steps: list[ReActStep] = field(default_factory=list)
    final_answer: str = ""
    total_tokens: int = 0


class ReActAgent:
    """Implements the ReAct (Reason + Act) pattern."""

    REACT_PROMPT = """You are a reasoning agent with access to tools.
For each step, you MUST follow this exact format:

Thought: [your reasoning about what to do next]
Action: [tool_name]
Action Input: [JSON arguments for the tool]

After receiving an observation, continue with another Thought.
When you have enough information to answer, use:

Thought: I now have enough information to answer.
Final Answer: [your answer]

AVAILABLE TOOLS:
{tool_descriptions}

IMPORTANT:
- Always think before acting
- Use tools to gather facts — never guess or assume
- If a tool returns an error, reason about alternatives
- Maximum {max_steps} steps before you must provide a Final Answer"""

    def __init__(
        self,
        llm_client,
        tools: dict[str, dict],
        max_steps: int = 10,
    ):
        self.llm = llm_client
        self.tools = tools
        self.max_steps = max_steps

    async def run(self, question: str) -> ReActTrace:
        trace = ReActTrace(question=question)
        messages = [
            {
                "role": "system",
                "content": self.REACT_PROMPT.format(
                    tool_descriptions=self._format_tool_descriptions(),
                    max_steps=self.max_steps,
                ),
            },
            {"role": "user", "content": question},
        ]
        for _ in range(self.max_steps):
            response = await self.llm.chat(
                messages=messages, stop=["Observation:"]
            )
            text = response.content.strip()
            step = self._parse_step(text)
            trace.steps.append(step)

            # Check if we have a final answer
            if "Final Answer:" in text:
                trace.final_answer = text.split("Final Answer:")[1].strip()
                break

            # Execute the action if one was specified
            if step.action:
                if step.action in self.tools:
                    observation = await self._execute_tool(
                        step.action, step.action_input or {}
                    )
                    step.observation = str(observation)
                else:
                    step.observation = (
                        f"Error: Tool '{step.action}' not found. "
                        f"Available tools: {', '.join(self.tools.keys())}"
                    )
                # Feed the step and its observation back into the conversation
                messages.append({"role": "assistant", "content": text})
                messages.append({
                    "role": "user",
                    "content": f"Observation: {step.observation}",
                })

        if not trace.final_answer:
            trace.final_answer = (
                "I was unable to reach a conclusion within "
                f"the maximum {self.max_steps} steps."
            )
        return trace

    async def _execute_tool(self, tool_name: str, args: dict) -> Any:
        fn = self.tools[tool_name]["function"]
        try:
            if asyncio.iscoroutinefunction(fn):
                return await fn(**args)
            return fn(**args)
        except Exception as e:
            return f"Tool error: {type(e).__name__}: {e}"

    def _parse_step(self, text: str) -> ReActStep:
        thought = ""
        action = None
        action_input = None
        for line in text.split("\n"):
            line = line.strip()
            if line.startswith("Thought:"):
                thought = line.split("Thought:", 1)[1].strip()
            elif line.startswith("Action:"):
                action = line.split("Action:", 1)[1].strip()
            elif line.startswith("Action Input:"):
                raw = line.split("Action Input:", 1)[1].strip()
                try:
                    action_input = json.loads(raw)
                except json.JSONDecodeError:
                    # Fall back to passing the raw string through
                    action_input = {"input": raw}
        return ReActStep(
            thought=thought,
            action=action,
            action_input=action_input,
        )

    def _format_tool_descriptions(self) -> str:
        lines = []
        for name, tool in self.tools.items():
            lines.append(f"- {name}: {tool.get('description', 'No description')}")
            params = tool.get("parameters", {})
            if params:
                lines.append(f"  Parameters: {params}")
        return "\n".join(lines)
```
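Parsing is the most fragile part of a ReAct loop, so it pays to test it in isolation. A standalone sketch of the Thought/Action/Action Input parsing, driven by an invented model output:

```python
import json

# Standalone sketch of ReAct step parsing; the sample model output
# below is invented for illustration.
def parse_react_step(text: str) -> dict:
    step = {"thought": "", "action": None, "action_input": None}
    for line in text.split("\n"):
        line = line.strip()
        if line.startswith("Thought:"):
            step["thought"] = line.split("Thought:", 1)[1].strip()
        elif line.startswith("Action:"):
            step["action"] = line.split("Action:", 1)[1].strip()
        elif line.startswith("Action Input:"):
            raw = line.split("Action Input:", 1)[1].strip()
            try:
                step["action_input"] = json.loads(raw)
            except json.JSONDecodeError:
                # Malformed JSON degrades to a plain-string argument
                step["action_input"] = {"input": raw}
    return step


sample = (
    "Thought: I need the current market cap first.\n"
    "Action: web_search\n"
    'Action Input: {"query": "NVIDIA market cap"}'
)
step = parse_react_step(sample)
```

The JSON fallback matters in practice: models occasionally emit unquoted or truncated arguments, and degrading to a raw string keeps the loop alive instead of crashing it.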
ReAct in Practice: A Research Agent
Here is a concrete example of ReAct in action: a research agent that answers questions by searching the web and reading documents.
```python
import asyncio

# Define tools. web_search_fn and read_url_fn are assumed to be
# defined elsewhere and return text summaries of their results.
research_tools = {
    "web_search": {
        "description": "Search the web for information",
        "parameters": {"query": "string"},
        "function": web_search_fn,
    },
    "read_url": {
        "description": "Read the content of a specific URL",
        "parameters": {"url": "string"},
        "function": read_url_fn,
    },
    "calculate": {
        "description": "Evaluate a mathematical expression",
        "parameters": {"expression": "string"},
        # eval is fine for a demo; use a safe expression
        # parser in production
        "function": lambda expression: eval(expression),
    },
}

agent = ReActAgent(
    llm_client=llm,
    tools=research_tools,
    max_steps=8,
)

# Example trace for: "What is the market cap of NVIDIA
# divided by the number of employees?"
#
# Thought: I need two pieces of data: NVIDIA market cap
# and employee count. Let me search for both.
# Action: web_search
# Action Input: {"query": "NVIDIA market cap 2026"}
# Observation: NVIDIA market cap is approximately $3.2 trillion...
#
# Thought: Got the market cap. Now I need employee count.
# Action: web_search
# Action Input: {"query": "NVIDIA number of employees 2026"}
# Observation: NVIDIA has approximately 32,000 employees...
#
# Thought: Now I can calculate: $3.2T / 32,000 employees
# Action: calculate
# Action Input: {"expression": "3200000000000 / 32000"}
# Observation: 100000000.0
#
# Thought: I now have enough information to answer.
# Final Answer: NVIDIA's market cap per employee is
# approximately $100 million.
```
The trace above illustrates the power of ReAct: each step combines reasoning (understanding what data is needed) with action (fetching that data), and observations inform subsequent reasoning.
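The loop mechanics can also be seen without any LLM by scripting the model's replies. A compressed standalone sketch, where the scripted replies and the `lookup` tool are invented stand-ins for a real model and tool backend:

```python
import json

# Scripted model replies stand in for an LLM; each entry is what the
# model would emit at one turn of the Thought/Action/Observation loop.
scripted_replies = [
    "Thought: I need the employee count.\n"
    "Action: lookup\n"
    'Action Input: {"key": "employees"}',
    "Thought: I now have enough information to answer.\n"
    "Final Answer: 32000",
]

facts = {"employees": "32000"}  # invented stand-in for a tool backend


def lookup(key: str) -> str:
    return facts.get(key, "not found")


messages = [{"role": "user", "content": "How many employees?"}]
final_answer = ""
for reply in scripted_replies:
    messages.append({"role": "assistant", "content": reply})
    if "Final Answer:" in reply:
        final_answer = reply.split("Final Answer:")[1].strip()
        break
    # Extract the action arguments, run the tool, and feed the
    # observation back into the conversation as a user message
    for line in reply.split("\n"):
        if line.startswith("Action Input:"):
            args = json.loads(line.split("Action Input:", 1)[1])
            observation = lookup(**args)
            messages.append(
                {"role": "user", "content": f"Observation: {observation}"}
            )
```

The key structural point is visible even in this toy version: observations enter the transcript as user messages, so the next model turn reasons over everything gathered so far.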
Tree-of-Thought: Branching Search
Tree-of-Thought (ToT) extends Chain-of-Thought from a single reasoning chain into a tree of possible reasoning paths. At each step, the model generates multiple candidate thoughts, evaluates which paths are most promising, and explores the best branches — potentially backtracking when a path leads to a dead end.
This is analogous to how a chess engine evaluates positions: instead of committing to one move sequence, it explores multiple lines and selects the most promising one.
```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThoughtNode:
    id: str
    depth: int
    thought: str
    evaluation_score: float = 0.0
    children: list["ThoughtNode"] = field(default_factory=list)
    parent_id: Optional[str] = None
    is_solution: bool = False


class TreeOfThoughtAgent:
    """Implements Tree-of-Thought reasoning with breadth-first
    or best-first search."""

    def __init__(
        self,
        llm_client,
        branching_factor: int = 3,
        max_depth: int = 4,
        search_strategy: str = "best_first",
    ):
        self.llm = llm_client
        self.branching_factor = branching_factor
        self.max_depth = max_depth
        self.search_strategy = search_strategy
        self._node_counter = 0

    async def solve(self, problem: str) -> dict:
        root = ThoughtNode(
            id=self._next_id(),
            depth=0,
            thought=f"Problem: {problem}",
        )
        if self.search_strategy == "best_first":
            solution = await self._best_first_search(root, problem)
        else:
            solution = await self._breadth_first_search(root, problem)
        return {
            "solution": solution.thought if solution else "No solution found",
            "path": self._trace_path(solution) if solution else [],
            "nodes_explored": self._node_counter,
        }

    async def _best_first_search(
        self, root: ThoughtNode, problem: str
    ) -> Optional[ThoughtNode]:
        frontier = [root]
        while frontier:
            # Sort by evaluation score (highest first)
            frontier.sort(key=lambda n: n.evaluation_score, reverse=True)
            current = frontier.pop(0)
            if current.depth >= self.max_depth:
                continue

            # Generate and evaluate candidate next thoughts
            candidates = await self._generate_thoughts(problem, current)
            evaluated = await self._evaluate_thoughts(problem, candidates)

            for node in evaluated:
                current.children.append(node)
                # Check if this is a solution
                if await self._is_solution(problem, node):
                    node.is_solution = True
                    return node
                frontier.append(node)
        return None

    async def _breadth_first_search(
        self, root: ThoughtNode, problem: str
    ) -> Optional[ThoughtNode]:
        queue = [root]
        while queue:
            current_level = queue[:]
            queue.clear()
            for node in current_level:
                if node.depth >= self.max_depth:
                    continue
                candidates = await self._generate_thoughts(problem, node)
                evaluated = await self._evaluate_thoughts(problem, candidates)
                # Only keep the top-k candidates at each level
                top_k = sorted(
                    evaluated,
                    key=lambda n: n.evaluation_score,
                    reverse=True,
                )[: self.branching_factor]
                for child in top_k:
                    node.children.append(child)
                    if await self._is_solution(problem, child):
                        child.is_solution = True
                        return child
                    queue.append(child)
        return None

    async def _generate_thoughts(
        self, problem: str, parent: ThoughtNode
    ) -> list[ThoughtNode]:
        path = self._trace_path(parent)
        path_text = "\n".join(
            f"Step {i + 1}: {p.thought}" for i, p in enumerate(path)
        )
        response = await self.llm.chat(messages=[{
            "role": "user",
            "content": (
                f"Problem: {problem}\n\n"
                f"Reasoning so far:\n{path_text}\n\n"
                f"Generate {self.branching_factor} distinct possible "
                f"next reasoning steps. Each should be a different "
                f"approach or angle.\n"
                f"Format: one step per line, prefixed with "
                f"THOUGHT 1:, THOUGHT 2:, etc."
            ),
        }])
        thoughts = []
        for line in response.content.strip().split("\n"):
            line = line.strip()
            if line.startswith("THOUGHT"):
                content = line.split(":", 1)[1].strip()
                thoughts.append(ThoughtNode(
                    id=self._next_id(),
                    depth=parent.depth + 1,
                    thought=content,
                    parent_id=parent.id,
                ))
        return thoughts[: self.branching_factor]

    async def _evaluate_thoughts(
        self, problem: str, nodes: list[ThoughtNode]
    ) -> list[ThoughtNode]:
        if not nodes:
            return []
        thoughts_text = "\n".join(
            f"[{i}] {n.thought}" for i, n in enumerate(nodes)
        )
        response = await self.llm.chat(messages=[{
            "role": "user",
            "content": (
                f"Problem: {problem}\n\n"
                f"Rate each reasoning step on how promising it is "
                f"for solving the problem (0.0 to 1.0).\n\n"
                f"{thoughts_text}\n\n"
                f'Return JSON: [{{"index": 0, "score": 0.8}}, ...]'
            ),
        }])
        try:
            scores = json.loads(response.content)
            for entry in scores:
                idx = entry["index"]
                if idx < len(nodes):
                    nodes[idx].evaluation_score = entry["score"]
        except (json.JSONDecodeError, KeyError):
            # Fall back to neutral scores if the model returns bad JSON
            for node in nodes:
                node.evaluation_score = 0.5
        return nodes

    async def _is_solution(
        self, problem: str, node: ThoughtNode
    ) -> bool:
        path = self._trace_path(node)
        path_text = "\n".join(p.thought for p in path)
        response = await self.llm.chat(messages=[{
            "role": "user",
            "content": (
                f"Problem: {problem}\n\n"
                f"Reasoning path:\n{path_text}\n\n"
                f"Does this reasoning path provide a complete, "
                f"correct solution? Answer YES or NO."
            ),
        }])
        return "YES" in response.content.upper()

    def _trace_path(
        self, node: Optional[ThoughtNode]
    ) -> list[ThoughtNode]:
        if node is None:
            return []
        # Simplified: returns only the node itself. A production
        # implementation keeps an id -> node index and walks
        # parent_id links back to the root.
        return [node]

    def _next_id(self) -> str:
        self._node_counter += 1
        return f"node_{self._node_counter}"
```
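The search bookkeeping itself is independent of the LLM calls. A toy standalone sketch of best-first expansion over pre-scored thoughts (the scores and the goal test are invented; in the agent above they come from `_evaluate_thoughts` and `_is_solution`):

```python
import heapq

# Toy best-first search over pre-scored "thoughts". Scores are
# invented; in the agent above an LLM evaluation pass assigns them.
children = {
    "root": [("plan A", 0.9), ("plan B", 0.4), ("plan C", 0.6)],
    "plan A": [("refine A1", 0.3), ("refine A2", 0.95)],
    "plan C": [("refine C1", 0.5)],
}
solutions = {"refine A2"}  # invented goal test

# heapq is a min-heap, so negate scores to pop the best node first
frontier = [(-1.0, "root")]
explored = []
solution = None
while frontier and solution is None:
    _, thought = heapq.heappop(frontier)
    explored.append(thought)
    for child, score in children.get(thought, []):
        if child in solutions:
            solution = child
            break
        heapq.heappush(frontier, (-score, child))
```

Note how "plan B" and "plan C" are generated but never expanded: the high-scoring branch reaches a solution first, which is exactly the pruning that makes best-first cheaper than exhaustive BFS.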
Choosing the Right Pattern
| Pattern | Latency | Cost | Best For |
|---|---|---|---|
| CoT | Low (1 LLM call) | Low | Math, logic, explainable reasoning |
| ReAct | Medium (3-10 calls) | Medium | Tasks requiring external data, multi-step workflows |
| ToT | High (10-50+ calls) | High | Creative problem-solving, planning, constraint satisfaction |
Use CoT when you need a single-pass reasoned answer and the model has sufficient knowledge to answer without external lookups.
Use ReAct when the agent needs to interact with tools, databases, or APIs to gather information before it can reason to an answer. This is the most common pattern for production agents.
Use ToT when the problem has multiple valid approaches and you want to explore several before committing. Creative tasks (writing, design), planning tasks (itinerary, project plan), and constraint satisfaction problems (scheduling, resource allocation) benefit most from ToT.
Combining Patterns
In practice, production agents often combine these patterns. A common architecture uses ReAct as the outer loop (gathering data through tools) with CoT as the inner reasoning mechanism (analyzing gathered data), and ToT for specific sub-problems that benefit from exploration.
```python
class HybridReasoningAgent:
    """Combines ReAct (outer loop) with CoT/ToT (inner reasoning)."""

    def __init__(self, react_agent, cot_agent, tot_agent):
        self.react = react_agent
        self.cot = cot_agent
        self.tot = tot_agent

    async def solve(self, problem: str) -> dict:
        # Use ReAct to gather information
        research_trace = await self.react.run(
            f"Gather all relevant information for: {problem}"
        )
        gathered_info = "\n".join(
            step.observation or ""
            for step in research_trace.steps
            if step.observation
        )

        # Classify problem complexity, then route to the
        # appropriate reasoning strategy
        complexity = await self._assess_complexity(problem, gathered_info)
        if complexity == "simple":
            result = await self.cot.reason(
                f"{problem}\n\nContext: {gathered_info}"
            )
            return {"answer": result.final_answer, "method": "cot"}
        result = await self.tot.solve(
            f"{problem}\n\nContext: {gathered_info}"
        )
        return {"answer": result["solution"], "method": "tot"}

    async def _assess_complexity(
        self, problem: str, context: str
    ) -> str:
        response = await self.react.llm.chat(messages=[{
            "role": "user",
            "content": (
                f"Is this problem simple (single clear answer) "
                f"or complex (multiple approaches, trade-offs)?\n"
                f"Problem: {problem}\n"
                f"Answer: simple or complex"
            ),
        }])
        # Normalize so trailing punctuation or extra words from the
        # model don't break the simple/complex routing
        return "simple" if "simple" in response.content.lower() else "complex"
```
FAQ
How does Chain-of-Thought differ from just asking the model to explain its reasoning?
CoT is a structured prompting technique, not just asking for an explanation. The key difference is that CoT forces the model to reason step-by-step before producing the answer, which changes the answer itself. Post-hoc explanations (reasoning after the answer) can be rationalizations rather than genuine reasoning traces. With CoT, the intermediate steps causally influence the final output because the model generates them as part of the same forward pass.
Is ReAct just function calling with extra steps?
ReAct includes function calling but adds an explicit reasoning layer. Standard function calling lets the model decide which tool to call, but the reasoning is implicit (hidden in the model's weights). ReAct makes the reasoning explicit through the Thought step, which creates an auditable trace of why the agent chose each action. This is critical for debugging, compliance, and building trust in the agent's decisions.
How many tokens does Tree-of-Thought cost compared to standard prompting?
ToT typically uses 10-50x more tokens than a single prompt, because it generates multiple candidate thoughts at each depth level and evaluates each one. With a branching factor of 3 and max depth of 4, you might generate and evaluate 3 + 9 + 27 + 81 = 120 candidate thoughts. At 200 tokens per thought plus 100 tokens per evaluation, that is roughly 36,000 tokens — compared to perhaps 500 tokens for a single CoT chain. The cost is justified only when the problem genuinely benefits from exploration, such as planning or creative tasks.
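That estimate can be reproduced directly; the per-thought and per-evaluation token counts are the rough assumptions stated above:

```python
# Rough ToT token estimate with branching factor 3 and max depth 4,
# using the assumed 200 tokens per thought and 100 per evaluation.
branching, depth = 3, 4
thoughts = sum(branching ** d for d in range(1, depth + 1))  # 3+9+27+81
total_tokens = thoughts * (200 + 100)
```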
Can you use these patterns with open-source models or do they require GPT-4 class models?
All three patterns work with smaller models, but effectiveness scales with model capability. CoT shows significant improvements starting from models with 7B+ parameters. ReAct requires reliable instruction-following and tool-use capability, which is available in models like Llama 3 70B and Mixtral 8x22B. ToT requires strong evaluation capability (the model must accurately judge which reasoning paths are promising), which currently works best with frontier models. For production deployments, consider using a smaller model for action execution and a larger model for evaluation and planning.
#AgentReasoning #ChainOfThought #ReAct #TreeOfThought #Planning #AIAgents #LLM
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.