Building a Legal Reasoning Agent: Multi-Step Argument Construction with Evidence

Why Legal Reasoning Is Hard for AI

Legal reasoning is fundamentally different from factual Q&A. A lawyer does not just retrieve facts; they construct arguments. Each argument has a claim, supporting evidence, and a legal basis (statute or precedent), and it must withstand counter-arguments. This multi-step, adversarial structure makes legal reasoning a demanding test case for advanced agent architectures.

This tutorial builds a legal reasoning agent that analyzes a legal question, searches for relevant precedents, constructs structured arguments, and generates counter-arguments, keeping an explicit evidence chain for each argument.

The Argument Data Model

Legal arguments have a recursive structure: each claim is supported by evidence and a reasoning chain, and each argument can carry counter-arguments that are themselves full arguments with their own evidence and rebuttals.

from pydantic import BaseModel
from enum import Enum

class EvidenceType(str, Enum):
    STATUTE = "statute"
    CASE_LAW = "case_law"
    REGULATION = "regulation"
    EXPERT_OPINION = "expert_opinion"
    FACTUAL = "factual"

class Evidence(BaseModel):
    source: str
    content: str
    evidence_type: EvidenceType
    relevance_score: float  # 0.0 to 1.0
    citation: str

class LegalArgument(BaseModel):
    claim: str
    supporting_evidence: list[Evidence]
    reasoning_chain: list[str]  # step-by-step logic
    strength: float  # 0.0 to 1.0
    counter_arguments: list["LegalArgument"] = []

class LegalAnalysis(BaseModel):
    question: str
    arguments_for: list[LegalArgument]
    arguments_against: list[LegalArgument]
    conclusion: str
    confidence: float
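
To make the recursion concrete, here is a minimal sketch that walks an argument tree depth-first. It uses a stripped-down dataclass stand-in (`Arg`, a hypothetical name) rather than the pydantic models above so that it runs with the standard library alone. The `flatten` helper and the sample claims are illustrative assumptions, not part of the agent.

```python
from dataclasses import dataclass, field


@dataclass
class Arg:
    """Simplified stand-in for LegalArgument: claim, strength, rebuttals only."""
    claim: str
    strength: float
    counter_arguments: list["Arg"] = field(default_factory=list)


def flatten(arg: Arg, depth: int = 0) -> list[tuple[int, str, float]]:
    """Return (depth, claim, strength) for the argument and every nested rebuttal."""
    rows = [(depth, arg.claim, arg.strength)]
    for counter in arg.counter_arguments:
        rows.extend(flatten(counter, depth + 1))
    return rows


# Illustrative three-level tree: argument, rebuttal, rebuttal of the rebuttal.
root = Arg(
    claim="The non-compete clause is unenforceable",
    strength=0.8,
    counter_arguments=[
        Arg(
            claim="The clause is narrowly tailored",
            strength=0.5,
            counter_arguments=[
                Arg(claim="The geographic scope is nationwide", strength=0.6)
            ],
        )
    ],
)

for depth, claim, strength in flatten(root):
    print(f"{'  ' * depth}- {claim} (strength {strength:.1f})")
```

The same traversal applies to `LegalArgument`, since its `counter_arguments` field has the same shape.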

Precedent Search

The agent needs a way to find relevant legal precedents. In production this would hit a legal database API (Westlaw, LexisNexis). Here we simulate it by asking an LLM to propose sources in a structured format, which means every citation it returns is unverified and may be fabricated:

from openai import OpenAI
import json

client = OpenAI()

def search_precedents(legal_issue: str, jurisdiction: str = "US Federal") -> list[Evidence]:
    """Search for relevant legal precedents."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
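            {"role": "system", "content": (
                "You are a legal research assistant. Given a legal issue, "
                "identify the most relevant cases, statutes, and regulations. "
                'Return a JSON object with a single key "evidence" whose value '
                "is an array of objects with the fields: source, content "
                "(the key holding), evidence_type (statute, case_law, "
                "regulation, expert_opinion, or factual), relevance_score "
                "(0.0 to 1.0), and citation."
            )},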
            {"role": "user", "content": (
                f"Legal issue: {legal_issue}\n"
                f"Jurisdiction: {jurisdiction}\n"
                "Find 3-5 most relevant precedents."
            )},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    return [Evidence(**e) for e in data.get("evidence", [])]
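
Model output does not always match the requested schema, and a single malformed item would make `Evidence(**e)` raise. As a hedged sketch, the hypothetical helper below filters a raw JSON string down to items that have every field the `Evidence` model needs. It works on plain dicts so it can run without pydantic or an API key. Clamping the score instead of rejecting the item is a design assumption.

```python
import json

REQUIRED_FIELDS = {"source", "content", "evidence_type", "relevance_score", "citation"}
ALLOWED_TYPES = {"statute", "case_law", "regulation", "expert_opinion", "factual"}


def parse_evidence_payload(raw: str) -> list[dict]:
    """Keep only well-formed evidence items from a raw JSON string."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    items = data.get("evidence", [])
    if not isinstance(items, list):
        return []

    clean = []
    for item in items:
        if not isinstance(item, dict) or not REQUIRED_FIELDS.issubset(item):
            continue  # missing a field the Evidence model requires
        kind = item["evidence_type"]
        if not isinstance(kind, str) or kind not in ALLOWED_TYPES:
            continue  # not a value of EvidenceType
        score = item["relevance_score"]
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            continue  # score must be numeric
        # Clamp into the documented 0.0-1.0 range rather than rejecting.
        clean.append({**item, "relevance_score": min(1.0, max(0.0, float(score)))})
    return clean
```

Each surviving dict could then be passed to `Evidence(**item)` as in `search_precedents`.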

Multi-Step Argument Construction

The argument builder works in three phases: (1) identify possible claims, (2) gather evidence for each, (3) construct the reasoning chain connecting evidence to claim.

def construct_argument(
    claim: str,
    evidence: list[Evidence],
    legal_question: str,
) -> LegalArgument:
    """Build a structured legal argument from claim and evidence."""
    evidence_summary = "\n".join(
        f"[{e.evidence_type.value}] {e.citation}: {e.content}"
        for e in evidence
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a legal reasoning agent.
Construct a rigorous legal argument by:
1. Stating the claim clearly
2. Building a step-by-step reasoning chain from evidence to claim
3. Each step must cite specific evidence
4. Assess the overall strength of the argument (0.0-1.0)
5. Identify the weakest link in the reasoning chain

Return a JSON object with: reasoning_chain (list of strings, one per step), strength (float, 0.0-1.0), weakest_link (string)."""},
            {"role": "user", "content": (
                f"Legal question: {legal_question}\n"
                f"Claim to support: {claim}\n"
                f"Available evidence:\n{evidence_summary}"
            )},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    return LegalArgument(
        claim=claim,
        supporting_evidence=evidence,
        reasoning_chain=data["reasoning_chain"],
        strength=data["strength"],
    )
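
The system prompt asks that each step cite specific evidence, but nothing in `construct_argument` checks that the model complied. A minimal sketch of such a check follows. It assumes a step "cites" a source when the citation string appears in the step text, which is a crude substring heuristic. The helper name and the placeholder citations are hypothetical.

```python
def uncited_steps(reasoning_chain: list[str], citations: list[str]) -> list[int]:
    """Return indices of reasoning steps that mention none of the known citations."""
    needles = [c.lower() for c in citations if c.strip()]
    missing = []
    for index, step in enumerate(reasoning_chain):
        text = step.lower()
        if not any(needle in text for needle in needles):
            missing.append(index)
    return missing


# Placeholder citations, not real authorities.
chain = [
    "Under Statute A s. 12, restraints on trade must be reasonable.",
    "Case B v. C applied that test to employment contracts.",
    "Therefore the clause is likely unenforceable.",
]
print(uncited_steps(chain, ["Statute A s. 12", "Case B v. C"]))  # [2]
```

A concluding step without a citation may be acceptable; flagged indices are prompts for review, not automatic failures.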

Counter-Argument Generation

A good legal analysis must address opposing views. The counter-argument generator takes an existing argument and attacks it:

def generate_counter_arguments(
    argument: LegalArgument,
    legal_question: str,
) -> list[LegalArgument]:
    """Generate counter-arguments that challenge the given argument."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are an opposing counsel.
Your job is to find flaws in the given argument and construct counter-arguments.
Attack strategies:
- Distinguish cited cases on facts
- Challenge the reasoning chain logic
- Cite conflicting precedent
- Argue policy implications
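Return a JSON object with a "counter_arguments" array of 2-3 items, each with: claim (string), reasoning_chain (list of strings), strength (float, 0.0-1.0)."""},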
            {"role": "user", "content": (
                f"Question: {legal_question}\n"
                f"Argument to counter:\n"
                f"Claim: {argument.claim}\n"
                f"Reasoning: {argument.reasoning_chain}"
            )},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    counters = []
    for c in data.get("counter_arguments", []):
        counters.append(LegalArgument(
            claim=c["claim"],
            supporting_evidence=[],
            reasoning_chain=c["reasoning_chain"],
            strength=c["strength"],
        ))
    return counters

The Full Analysis Pipeline

def analyze_legal_question(question: str) -> LegalAnalysis:
    # 1. Search for relevant precedents
    evidence = search_precedents(question)

    # 2. Identify claims for and against
    #    (identify_claims must return {"for": [...], "against": [...]})
    claims = identify_claims(question, evidence)

    # 3. Construct arguments for each side
    args_for = [construct_argument(c, evidence, question) for c in claims["for"]]
    args_against = [construct_argument(c, evidence, question) for c in claims["against"]]

    # 4. Generate counter-arguments
    for arg in args_for:
        arg.counter_arguments = generate_counter_arguments(arg, question)

    # 5. Synthesize conclusion
    conclusion = synthesize_conclusion(question, args_for, args_against)

    return LegalAnalysis(
        question=question,
        arguments_for=args_for,
        arguments_against=args_against,
        conclusion=conclusion,
        confidence=0.7,  # fixed placeholder, not derived from the arguments
    )
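
The pipeline calls `identify_claims` and `synthesize_conclusion` without defining them, and it hard-codes the confidence. The sketch below shows one way these pieces might look. It is an assumption, not the tutorial's implementation: the functions take plain strings and an injected `complete` callable (hypothetical) instead of the module-level `client`, so the signatures differ from the calls above and the example can run offline with a canned reply. `estimate_confidence` is likewise a made-up heuristic.

```python
import json
from statistics import mean
from typing import Callable

# Assumption: `complete` takes (system_prompt, user_prompt) and returns the
# model's reply as a JSON string, for example a thin wrapper around the
# chat-completions call used elsewhere in this tutorial.
Complete = Callable[[str, str], str]


def identify_claims(
    question: str, evidence_lines: list[str], complete: Complete
) -> dict[str, list[str]]:
    """Ask the model for claims on each side; both keys are always present."""
    raw = complete(
        'Return a JSON object with keys "for" and "against", each a list of '
        "one-sentence claims a party could argue.",
        f"Legal question: {question}\nEvidence:\n" + "\n".join(evidence_lines),
    )
    data = json.loads(raw)
    return {side: [str(c) for c in data.get(side, [])] for side in ("for", "against")}


def synthesize_conclusion(
    question: str, claims_for: list[str], claims_against: list[str], complete: Complete
) -> str:
    """Ask the model to weigh both sides and return a short conclusion."""
    raw = complete(
        'Weigh both sides and return a JSON object with key "conclusion".',
        f"Legal question: {question}\nFor: {claims_for}\nAgainst: {claims_against}",
    )
    return str(json.loads(raw).get("conclusion", ""))


def estimate_confidence(strengths_for: list[float], strengths_against: list[float]) -> float:
    """Hypothetical heuristic: gap between the two sides' average strengths."""
    if not strengths_for and not strengths_against:
        return 0.0
    avg_for = mean(strengths_for) if strengths_for else 0.0
    avg_against = mean(strengths_against) if strengths_against else 0.0
    return round(abs(avg_for - avg_against), 2)


def fake_complete(system_prompt: str, user_prompt: str) -> str:
    """Canned replies so the sketch runs offline; swap in a real model call."""
    if '"for"' in system_prompt:
        return json.dumps({"for": ["Claim A"], "against": ["Claim B"]})
    return json.dumps({"conclusion": "Claim A is more persuasive."})


claims = identify_claims("Is the clause enforceable?", ["[case_law] placeholder"], fake_complete)
print(claims)
print(synthesize_conclusion("Is the clause enforceable?", claims["for"], claims["against"], fake_complete))
print(estimate_confidence([0.8], [0.5]))
```

A closely matched pair of sides yields a low value under this heuristic, which is arguably more informative than a constant, but any such formula should be validated against real analyses before it is trusted.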

Important Disclaimers

This agent is a reasoning tool, not a replacement for licensed attorneys. It cannot guarantee legal accuracy, may miss jurisdiction-specific nuances, and should never be the sole basis for legal decisions.

FAQ

How do you ensure the agent cites real cases?

In production, connect the precedent search to a real legal database API. When using LLM-generated citations, always flag them as "AI-generated — verify before citing" and implement a validation step against a case law database.
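
As a rough illustration of that validation step, the sketch below checks each citation against a set of known citations and labels the rest as unverified. The in-memory `known` set is a hypothetical stand-in for a real case law database lookup, and the citations shown are fictional placeholders.

```python
from dataclasses import dataclass

UNVERIFIED_LABEL = "AI-generated, verify before citing"


@dataclass
class CheckedCitation:
    citation: str
    verified: bool
    label: str


def normalize(citation: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences match."""
    return " ".join(citation.lower().split())


def check_citations(citations: list[str], known: set[str]) -> list[CheckedCitation]:
    """Mark each citation as verified only if it appears in the known set."""
    known_normalized = {normalize(k) for k in known}
    results = []
    for citation in citations:
        ok = normalize(citation) in known_normalized
        results.append(
            CheckedCitation(citation, ok, "verified" if ok else UNVERIFIED_LABEL)
        )
    return results


# Fictional placeholder citations, not real cases.
known = {"Example v. Sample, 1 Ex. 1 (2000)"}
for checked in check_citations(
    ["example  v. sample, 1 Ex. 1 (2000)", "Made Up v. Nobody"], known
):
    print(checked.label, "|", checked.citation)
```

Exact matching after normalization is deliberately strict: a near-miss citation stays flagged until a human or a real database confirms it.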

Can this handle multiple jurisdictions?

Yes, by parameterizing the precedent search with jurisdiction and instructing the reasoning agent to consider jurisdictional differences. Multi-jurisdiction analysis requires separate evidence gathering for each jurisdiction and explicit conflict-of-law analysis.
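
A minimal sketch of the per-jurisdiction gathering step, under stated assumptions: `search` is any callable with the same `(legal_issue, jurisdiction)` argument order as `search_precedents`, and results are treated as plain strings so the example runs without an API call. Both helper names are hypothetical, and the conflict-of-law analysis itself is left to the reasoning prompt.

```python
from typing import Callable


def gather_by_jurisdiction(
    issue: str,
    jurisdictions: list[str],
    search: Callable[[str, str], list[str]],
) -> dict[str, list[str]]:
    """Run one search per jurisdiction and keep the results separate."""
    return {jurisdiction: search(issue, jurisdiction) for jurisdiction in jurisdictions}


def format_for_prompt(grouped: dict[str, list[str]]) -> str:
    """Render grouped evidence so the model can compare jurisdictions explicitly."""
    sections = []
    for jurisdiction, items in grouped.items():
        body = "\n".join(f"- {item}" for item in items) or "- (no sources found)"
        sections.append(f"Jurisdiction: {jurisdiction}\n{body}")
    return "\n\n".join(sections)


def fake_search(issue: str, jurisdiction: str) -> list[str]:
    """Offline stand-in for a real precedent search."""
    return [f"{jurisdiction}: placeholder source on {issue}"]


grouped = gather_by_jurisdiction(
    "non-compete enforceability", ["California", "Texas"], fake_search
)
print(format_for_prompt(grouped))
```

Keeping the evidence keyed by jurisdiction avoids silently mixing authorities that bind in one forum but not another.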

How do you evaluate argument quality?

Use a separate evaluator agent that scores arguments on: logical validity (does the conclusion follow from the premises?), evidence quality (are sources authoritative and relevant?), and completeness (are there obvious gaps in the reasoning chain?).
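
Once the evaluator has produced a score for each of those three criteria, the scores still need to be combined. The sketch below shows one possible aggregation. The weights are arbitrary assumptions chosen for illustration, and `RubricScores` and `overall_score` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RubricScores:
    logical_validity: float   # does the conclusion follow from the premises?
    evidence_quality: float   # are sources authoritative and relevant?
    completeness: float       # are there obvious gaps in the reasoning chain?


# Assumed weights; they sum to 1.0 and should be tuned for the use case.
WEIGHTS = {"logical_validity": 0.4, "evidence_quality": 0.4, "completeness": 0.2}


def overall_score(scores: RubricScores) -> float:
    """Weighted average of the rubric scores, each expected in 0.0-1.0."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = getattr(scores, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        total += weight * value
    return round(total, 3)


print(overall_score(RubricScores(logical_validity=1.0, evidence_quality=0.5, completeness=0.0)))
```

Reporting the three component scores alongside the aggregate keeps the evaluation explainable, since a single number hides which criterion failed.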
