Multi-Turn Reasoning: Building Agents That Think Across Multiple LLM Calls
Learn how to architect agents that maintain reasoning chains across multiple LLM invocations, accumulate state progressively, and refine their analysis through iterative thinking.
Why Single-Call Reasoning Falls Short
A single LLM call operates within a fixed context window and produces output in a single forward pass. For simple tasks this is fine, but complex problems — analyzing a 50-page contract, debugging a multi-file codebase, or planning a multi-step research process — exceed what any model can reliably handle in one shot.
Multi-turn reasoning breaks complex problems into a sequence of focused LLM calls where each call builds on the accumulated understanding from previous calls. This mirrors how human experts work: they read, reflect, revise, and refine iteratively rather than attempting to produce a perfect answer on the first try.
The Core Pattern: Reason-Accumulate-Refine
The fundamental architecture for multi-turn reasoning involves three components: a reasoning step that analyzes a specific aspect of the problem, a state accumulator that captures key findings, and a refinement step that integrates new information with prior conclusions.
```python
import json
from dataclasses import dataclass, field

from openai import OpenAI


@dataclass
class ReasoningState:
    """Accumulated state across reasoning turns."""
    findings: list[str] = field(default_factory=list)
    uncertainties: list[str] = field(default_factory=list)
    conclusions: list[str] = field(default_factory=list)
    turn_count: int = 0

    def summary(self) -> str:
        parts = []
        if self.findings:
            parts.append("Findings:\n" + "\n".join(f"- {f}" for f in self.findings))
        if self.uncertainties:
            parts.append("Open questions:\n" + "\n".join(f"- {u}" for u in self.uncertainties))
        if self.conclusions:
            parts.append("Conclusions so far:\n" + "\n".join(f"- {c}" for c in self.conclusions))
        return "\n\n".join(parts)


def split_into_sections(document: str) -> list[str]:
    """Naive splitter: break on blank lines. Replace with chunking
    appropriate for your documents (headings, pages, etc.)."""
    return [s.strip() for s in document.split("\n\n") if s.strip()]


def multi_turn_analyze(document: str, client: OpenAI, max_turns: int = 5) -> ReasoningState:
    """Analyze a document through multiple reasoning turns."""
    state = ReasoningState()
    chunks = split_into_sections(document)

    for chunk in chunks[:max_turns]:
        state.turn_count += 1
        prompt = f"""You are analyzing a document section by section.

Previous analysis:
{state.summary() or "No prior analysis yet."}

Current section:
{chunk}

Provide: (1) new findings, (2) any uncertainties, (3) updated conclusions.
Return as JSON with keys: findings, uncertainties, conclusions (each a list of strings)."""

        # JSON mode requires a model that supports response_format
        # (e.g. gpt-4o); the base gpt-4 model does not accept it.
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
        )
        result = json.loads(response.choices[0].message.content)
        state.findings.extend(result.get("findings", []))
        state.uncertainties.extend(result.get("uncertainties", []))
        # Conclusions are replaced, not appended: each turn's
        # conclusions supersede the previous turn's.
        state.conclusions = result.get("conclusions", state.conclusions)

    return state
```
Progressive Refinement: The Self-Critique Loop
One of the most effective multi-turn patterns is self-critique, where the agent reviews its own output and iteratively improves it. Each turn receives both the original task and the previous attempt, allowing the model to identify gaps, correct errors, and add nuance:
```python
def refine_with_critique(
    task: str, client: OpenAI, max_refinements: int = 3
) -> str:
    """Generate an answer and refine it through self-critique."""
    # Initial generation
    messages = [{"role": "user", "content": task}]
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    current_answer = response.choices[0].message.content

    for _ in range(max_refinements):
        critique_prompt = f"""Review this answer for accuracy, completeness, and clarity.

Original task: {task}

Current answer:
{current_answer}

List specific issues, then provide an improved version after a line reading exactly "IMPROVED:".
If the answer is already excellent, respond with exactly: SATISFACTORY"""

        critique_response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": critique_prompt}],
        )
        critique = critique_response.choices[0].message.content

        if "SATISFACTORY" in critique:
            break
        if "IMPROVED:" in critique:
            # Keep only the rewritten answer, not the list of issues
            current_answer = critique.split("IMPROVED:", 1)[1].strip()
        else:
            current_answer = critique  # fallback: model ignored the marker

    return current_answer
```
State Accumulation Strategies
How you accumulate state across turns significantly affects reasoning quality. Three common strategies:
Full history passes all previous LLM outputs into each subsequent call. This preserves maximum context but consumes tokens rapidly and may hit context limits.
Summary compression periodically summarizes accumulated findings into a compact representation. This scales to many turns but risks losing nuanced details during summarization.
Structured extraction parses each LLM response into structured data (facts, entities, relationships) and reconstructs the context from this structured state. This is the most token-efficient and supports the most reasoning turns.
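The structured-extraction strategy keeps the prompt bounded by rendering only what the next turn needs. A minimal sketch — the `build_context` helper and its truncation policy are our own illustration, not part of any library:

```python
def build_context(findings: list[str], max_items: int = 10) -> str:
    """Rebuild a compact prompt context from structured state.

    Only the most recent findings appear verbatim; older ones are
    collapsed into a one-line count, so the prompt stays the same
    size no matter how many turns have run.
    """
    recent = findings[-max_items:]
    dropped = len(findings) - len(recent)
    lines = []
    if dropped:
        lines.append(f"({dropped} earlier findings omitted)")
    lines.extend(f"- {f}" for f in recent)
    return "\n".join(lines)
```

The same idea extends to entities and relationships: store them as structured data, and serialize only the slice relevant to the current turn.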
Knowing When to Stop
Multi-turn reasoning needs termination conditions. Without them, agents waste tokens refining already-good answers or loop indefinitely. Effective stopping criteria include convergence detection (consecutive turns produce no new findings), confidence thresholds (the model reports high confidence), and budget limits (maximum turns or token spend).
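These criteria can be combined in a single guard checked after each turn. A sketch, assuming findings are plain strings and the caller tracks token usage (`should_stop` is a hypothetical helper, not from any framework):

```python
def should_stop(
    prior_findings: list[str],
    new_findings: list[str],
    turn: int,
    max_turns: int,
    tokens_used: int,
    token_budget: int,
) -> bool:
    """Stop on convergence, or when either budget is exhausted."""
    seen = set(prior_findings)
    converged = all(f in seen for f in new_findings)  # nothing new this turn
    return converged or turn >= max_turns or tokens_used >= token_budget
```

A confidence-threshold criterion would slot in the same way, as one more boolean clause fed by the model's self-reported confidence.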
FAQ
How many reasoning turns should an agent use?
It depends on task complexity. Simple classification tasks rarely benefit from more than 2-3 turns. Complex analysis tasks like contract review or code audit may need 5-10 turns. Use convergence detection rather than a fixed turn count — stop when turns stop producing new insights.
Does multi-turn reasoning increase costs significantly?
Yes, each turn is a separate API call. However, the cost is often justified: a 3-turn refinement that produces a correct answer is cheaper than a single-turn answer that requires human correction. Use summary compression to keep per-turn token counts manageable.
How do I prevent the agent from contradicting its earlier reasoning?
Include a structured summary of prior conclusions in each turn's prompt and explicitly instruct the model to either build on or explicitly revise (with justification) its previous conclusions. The structured state approach makes contradictions easier to detect programmatically.