Multi-Turn Reasoning: Building Agents That Think Across Multiple LLM Calls
Learn how to architect agents that maintain reasoning chains across multiple LLM invocations, accumulate state progressively, and refine their analysis through iterative thinking.
Why Single-Call Reasoning Falls Short
A single LLM call operates within a fixed context window and produces output in a single forward pass. For simple tasks this is fine, but complex problems — analyzing a 50-page contract, debugging a multi-file codebase, or planning a multi-step research process — exceed what any model can reliably handle in one shot.
Multi-turn reasoning breaks complex problems into a sequence of focused LLM calls where each call builds on the accumulated understanding from previous calls. This mirrors how human experts work: they read, reflect, revise, and refine iteratively rather than attempting to produce a perfect answer on the first try.
The Core Pattern: Reason-Accumulate-Refine
The fundamental architecture for multi-turn reasoning involves three components: a reasoning step that analyzes a specific aspect of the problem, a state accumulator that captures key findings, and a refinement step that integrates new information with prior conclusions.
```python
import json
from dataclasses import dataclass, field

from openai import OpenAI


@dataclass
class ReasoningState:
    """Accumulated state across reasoning turns."""
    findings: list[str] = field(default_factory=list)
    uncertainties: list[str] = field(default_factory=list)
    conclusions: list[str] = field(default_factory=list)
    turn_count: int = 0

    def summary(self) -> str:
        parts = []
        if self.findings:
            parts.append("Findings:\n" + "\n".join(f"- {f}" for f in self.findings))
        if self.uncertainties:
            parts.append("Open questions:\n" + "\n".join(f"- {u}" for u in self.uncertainties))
        if self.conclusions:
            parts.append("Conclusions so far:\n" + "\n".join(f"- {c}" for c in self.conclusions))
        return "\n\n".join(parts)


def split_into_sections(document: str) -> list[str]:
    """Naive splitter: break on blank lines. Replace with chunking
    appropriate for your documents (headings, pages, etc.)."""
    return [s.strip() for s in document.split("\n\n") if s.strip()]


def multi_turn_analyze(document: str, client: OpenAI, max_turns: int = 5) -> ReasoningState:
    """Analyze a document through multiple reasoning turns."""
    state = ReasoningState()
    chunks = split_into_sections(document)

    for chunk in chunks[:max_turns]:
        state.turn_count += 1
        prompt = f"""You are analyzing a document section by section.

Previous analysis:
{state.summary() or "No prior analysis yet."}

Current section:
{chunk}

Provide: (1) new findings, (2) any uncertainties, (3) updated conclusions.
Return as JSON with keys: findings, uncertainties, conclusions (each a list of strings)."""

        # JSON mode requires a model that supports response_format
        # (e.g. gpt-4o); the base gpt-4 model does not accept it.
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
        )
        result = json.loads(response.choices[0].message.content)
        state.findings.extend(result.get("findings", []))
        state.uncertainties.extend(result.get("uncertainties", []))
        # Conclusions are replaced, not appended: each turn's
        # conclusions supersede the previous turn's.
        state.conclusions = result.get("conclusions", state.conclusions)

    return state
```
Progressive Refinement: The Self-Critique Loop
One of the most effective multi-turn patterns is self-critique, where the agent reviews its own output and iteratively improves it. Each turn receives both the original task and the previous attempt, allowing the model to identify gaps, correct errors, and add nuance:
```python
def refine_with_critique(
    task: str, client: OpenAI, max_refinements: int = 3
) -> str:
    """Generate an answer and refine it through self-critique."""
    # Initial generation
    messages = [{"role": "user", "content": task}]
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    current_answer = response.choices[0].message.content

    for _ in range(max_refinements):
        critique_prompt = f"""Review this answer for accuracy, completeness, and clarity.

Original task: {task}

Current answer:
{current_answer}

List specific issues, then provide an improved version after a line reading exactly "IMPROVED:".
If the answer is already excellent, respond with exactly: SATISFACTORY"""

        critique_response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": critique_prompt}],
        )
        critique = critique_response.choices[0].message.content

        if "SATISFACTORY" in critique:
            break
        if "IMPROVED:" in critique:
            # Keep only the rewritten answer, not the list of issues
            current_answer = critique.split("IMPROVED:", 1)[1].strip()
        else:
            current_answer = critique  # fallback: model ignored the marker

    return current_answer
```
State Accumulation Strategies
How you accumulate state across turns significantly affects reasoning quality. Three common strategies:
Full history passes all previous LLM outputs into each subsequent call. This preserves maximum context but consumes tokens rapidly and may hit context limits.
Summary compression periodically summarizes accumulated findings into a compact representation. This scales to many turns but risks losing nuanced details during summarization.
Structured extraction parses each LLM response into structured data (facts, entities, relationships) and reconstructs the context from this structured state. This is the most token-efficient and supports the most reasoning turns.
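The structured-extraction strategy keeps the prompt bounded by rendering only what the next turn needs. A minimal sketch — the `build_context` helper and its truncation policy are our own illustration, not part of any library:

```python
def build_context(findings: list[str], max_items: int = 10) -> str:
    """Rebuild a compact prompt context from structured state.

    Only the most recent findings appear verbatim; older ones are
    collapsed into a one-line count, so the prompt stays the same
    size no matter how many turns have run.
    """
    recent = findings[-max_items:]
    dropped = len(findings) - len(recent)
    lines = []
    if dropped:
        lines.append(f"({dropped} earlier findings omitted)")
    lines.extend(f"- {f}" for f in recent)
    return "\n".join(lines)
```

The same idea extends to entities and relationships: store them as structured data, and serialize only the slice relevant to the current turn.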
Knowing When to Stop
Multi-turn reasoning needs termination conditions. Without them, agents waste tokens refining already-good answers or loop indefinitely. Effective stopping criteria include convergence detection (consecutive turns produce no new findings), confidence thresholds (the model reports high confidence), and budget limits (maximum turns or token spend).
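These criteria can be combined in a single guard checked after each turn. A sketch, assuming findings are plain strings and the caller tracks token usage (`should_stop` is a hypothetical helper, not from any framework):

```python
def should_stop(
    prior_findings: list[str],
    new_findings: list[str],
    turn: int,
    max_turns: int,
    tokens_used: int,
    token_budget: int,
) -> bool:
    """Stop on convergence, or when either budget is exhausted."""
    seen = set(prior_findings)
    converged = all(f in seen for f in new_findings)  # nothing new this turn
    return converged or turn >= max_turns or tokens_used >= token_budget
```

A confidence-threshold criterion would slot in the same way, as one more boolean clause fed by the model's self-reported confidence.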
FAQ
How many reasoning turns should an agent use?
It depends on task complexity. Simple classification tasks rarely benefit from more than 2-3 turns. Complex analysis tasks like contract review or code audit may need 5-10 turns. Use convergence detection rather than a fixed turn count — stop when turns stop producing new insights.
Does multi-turn reasoning increase costs significantly?
Yes, each turn is a separate API call. However, the cost is often justified: a 3-turn refinement that produces a correct answer is cheaper than a single-turn answer that requires human correction. Use summary compression to keep per-turn token counts manageable.
How do I prevent the agent from contradicting its earlier reasoning?
Include a structured summary of prior conclusions in each turn's prompt and explicitly instruct the model to either build on or explicitly revise (with justification) its previous conclusions. The structured state approach makes contradictions easier to detect programmatically.