Building a Contract Review Agent: Clause Extraction, Risk Analysis, and Summary
Learn how to build an AI agent that parses legal contracts, extracts key clauses, scores risk levels, and generates executive summaries using Python and LLMs.
Why Contract Review Needs AI Agents
Legal teams spend thousands of hours each year reviewing contracts manually. A single commercial agreement can run 40 to 80 pages, and spotting a problematic indemnification clause buried on page 57 requires both concentration and domain expertise. AI agents can automate the repetitive extraction work, flag high-risk language, and produce structured summaries — letting attorneys focus on judgment calls rather than reading marathons.
In this tutorial, you will build a contract review agent that parses documents, identifies key clauses, assigns risk scores, and generates an executive summary.
Architecture Overview
The agent pipeline has four stages:
- Document Parsing — extract text from PDF or DOCX files
- Clause Extraction — identify and classify contract sections
- Risk Scoring — evaluate each clause against a risk rubric
- Summary Generation — produce a structured executive summary
Step 1: Document Parsing
Before the LLM can analyze anything, you need clean text. We use pdfplumber for PDFs and python-docx for Word documents.
```python
import pdfplumber
from docx import Document
from pathlib import Path

def extract_text(file_path: str) -> str:
    """Extract text from PDF or DOCX files."""
    path = Path(file_path)
    if path.suffix.lower() == ".pdf":
        with pdfplumber.open(path) as pdf:
            # extract_text() can return None for image-only pages
            pages = [page.extract_text() or "" for page in pdf.pages]
            return "\n\n".join(pages)
    elif path.suffix.lower() == ".docx":
        doc = Document(path)
        return "\n\n".join(p.text for p in doc.paragraphs if p.text.strip())
    else:
        raise ValueError(f"Unsupported file type: {path.suffix}")
```
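Extracted text usually carries artifacts — words hyphenated across line breaks, runs of blank lines — that waste tokens and can confuse clause boundaries. A light normalization pass helps before prompting; the `normalize_text` helper below is an illustrative sketch, not part of the extraction code above:

```python
import re

def normalize_text(raw: str) -> str:
    """Clean common extraction artifacts before sending text to the LLM."""
    # Join words hyphenated across line breaks: "indemni-\nfication" -> "indemnification"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of 3+ newlines into a single paragraph break
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Strip trailing whitespace on each line
    return "\n".join(line.rstrip() for line in text.splitlines()).strip()
```

You would call it on the output of `extract_text` before any LLM step.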
Step 2: Clause Extraction with an LLM
Once you have plain text, the agent identifies standard contract clauses. We define a structured output schema using Pydantic so the LLM returns machine-readable results.
```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class ContractClause(BaseModel):
    clause_type: str  # e.g., "Indemnification", "Termination"
    text: str
    section_number: str

class ClauseExtractionResult(BaseModel):
    clauses: list[ContractClause]
    parties: list[str]
    effective_date: str
    governing_law: str

def extract_clauses(contract_text: str) -> ClauseExtractionResult:
    """Use an LLM to identify and classify contract clauses."""
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a contract analysis expert. Extract all key "
                    "clauses from the contract, identify the parties, "
                    "effective date, and governing law."
                ),
            },
            {"role": "user", "content": contract_text},
        ],
        response_format=ClauseExtractionResult,
    )
    return response.choices[0].message.parsed
```
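Structured-output calls can fail transiently (rate limits, network errors), so in practice you would wrap each LLM call in a retry with backoff. A minimal generic sketch — the helper name and defaults are illustrative, not part of any SDK:

```python
import time

def with_retries(fn, max_attempts: int = 3, base_delay: float = 1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))
```

Usage would look like `clauses = with_retries(lambda: extract_clauses(text))`. In production you would narrow the `except` to the SDK's retryable error types rather than catching everything.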
Step 3: Risk Scoring
Each extracted clause gets evaluated against a risk rubric. The agent checks for common red flags like unlimited liability, unilateral termination rights, or broad IP assignment.
```python
class ClauseRisk(BaseModel):
    clause_type: str
    risk_level: str  # "low", "medium", "high", "critical"
    risk_score: int  # 1-10
    concerns: list[str]
    recommendation: str

class RiskReport(BaseModel):
    clause_risks: list[ClauseRisk]
    overall_risk_score: float
    top_concerns: list[str]

RISK_RUBRIC = """
Score each clause from 1 (minimal risk) to 10 (severe risk):
- Indemnification: flag unlimited/uncapped liability
- Termination: flag unilateral termination without cure period
- IP Assignment: flag broad or perpetual IP transfers
- Limitation of Liability: flag exclusion of consequential damages
- Confidentiality: flag indefinite obligations or overly broad scope
- Non-compete: flag excessive duration or geographic scope
"""

def score_risks(clauses: ClauseExtractionResult) -> RiskReport:
    """Score risk for each extracted clause."""
    clauses_text = "\n\n".join(
        f"[{c.clause_type}] {c.text}" for c in clauses.clauses
    )
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"You are a legal risk analyst.\n\n{RISK_RUBRIC}",
            },
            {"role": "user", "content": clauses_text},
        ],
        response_format=RiskReport,
    )
    return response.choices[0].message.parsed
```
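Note that `overall_risk_score` is itself generated, so it can drift from the per-clause scores. One option is to compute it deterministically from the `risk_score` values instead. A sketch — the 60/40 max/mean blend is an arbitrary illustrative choice, not part of the pipeline above:

```python
def aggregate_risk(clause_scores: list[int]) -> float:
    """Overall score from per-clause scores, pulled toward the worst clause.

    A single critical clause should dominate the rating, so we blend the
    maximum (weight 0.6) with the mean (weight 0.4)."""
    if not clause_scores:
        return 0.0
    avg = sum(clause_scores) / len(clause_scores)
    worst = max(clause_scores)
    return round(0.6 * worst + 0.4 * avg, 1)
```

You could then overwrite the LLM's `overall_risk_score` with `aggregate_risk([cr.risk_score for cr in report.clause_risks])`.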
Step 4: Executive Summary Generation
The final stage produces a human-readable summary combining extraction results and risk findings.
```python
def generate_summary(
    clauses: ClauseExtractionResult, risks: RiskReport
) -> str:
    """Generate an executive summary of the contract review."""
    context = (
        f"Parties: {', '.join(clauses.parties)}\n"
        f"Effective Date: {clauses.effective_date}\n"
        f"Governing Law: {clauses.governing_law}\n"
        f"Overall Risk Score: {risks.overall_risk_score}/10\n"
        f"Top Concerns: {', '.join(risks.top_concerns)}\n\n"
        "Clause Details:\n"
    )
    for cr in risks.clause_risks:
        context += (
            f"- {cr.clause_type}: Risk {cr.risk_score}/10 "
            f"({cr.risk_level})\n"
        )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Write a concise executive summary of this contract "
                    "review for a general counsel. Highlight the top risks "
                    "and recommended actions."
                ),
            },
            {"role": "user", "content": context},
        ],
    )
    return response.choices[0].message.content
```
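Alongside free-form prose, reviewers often want a fixed-layout risk table. A small sketch that renders per-clause results as Markdown, sorted worst-first — it takes plain tuples so it stands alone, with fields mirroring `ClauseRisk` above:

```python
def render_risk_table(clause_risks: list[tuple[str, int, str]]) -> str:
    """Render (clause_type, risk_score, risk_level) rows as a Markdown
    table, highest risk first."""
    rows = sorted(clause_risks, key=lambda r: r[1], reverse=True)
    lines = ["| Clause | Score | Level |", "|---|---|---|"]
    for clause_type, score, level in rows:
        lines.append(f"| {clause_type} | {score}/10 | {level} |")
    return "\n".join(lines)
```

You could append this table to the LLM-generated summary so the narrative and the numbers travel together.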
Putting It All Together
```python
def review_contract(file_path: str) -> dict:
    """Full contract review pipeline."""
    text = extract_text(file_path)
    clauses = extract_clauses(text)
    risks = score_risks(clauses)
    summary = generate_summary(clauses, risks)
    return {
        "parties": clauses.parties,
        "effective_date": clauses.effective_date,
        "governing_law": clauses.governing_law,
        "clauses_found": len(clauses.clauses),
        "overall_risk": risks.overall_risk_score,
        "top_concerns": risks.top_concerns,
        "summary": summary,
    }

result = review_contract("vendor_agreement.pdf")
print(f"Risk Score: {result['overall_risk']}/10")
print(f"Summary:\n{result['summary']}")
```
Production Considerations
When deploying a contract review agent, keep these points in mind:
- Chunk long documents — contracts exceeding the context window should be split by section headers, processed individually, then merged.
- Add confidence scores — have the LLM output a confidence field for each extraction so reviewers know where to double-check.
- Maintain an audit trail — log every LLM call with the input hash, model version, and output for regulatory compliance.
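The chunking step can be sketched with a header-based splitter. The regex and the `max_chars` threshold below are illustrative assumptions — real contracts vary widely in numbering style, so you would tune the pattern to your corpus:

```python
import re

# Split points before common section headers, e.g. "12. Termination",
# "ARTICLE IV", or "Section 5" (zero-width lookahead keeps the header text).
SECTION_RE = re.compile(
    r"^(?=\d+\.\s+\S|ARTICLE\s+[IVXLC]+|Section\s+\d+)", re.MULTILINE
)

def chunk_by_sections(text: str, max_chars: int = 12_000) -> list[str]:
    """Split on section headers, then greedily pack sections into chunks
    under max_chars so each fits comfortably in the context window."""
    sections = [s for s in SECTION_RE.split(text) if s.strip()]
    chunks, current = [], ""
    for section in sections:
        if current and len(current) + len(section) > max_chars:
            chunks.append(current)
            current = ""
        current += section
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would go through `extract_clauses` separately, with the per-chunk results merged before risk scoring. Sections longer than `max_chars` on their own would still need a secondary split (e.g., by paragraph).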
FAQ
How accurate is LLM-based clause extraction compared to manual review?
Modern LLMs achieve 85-95% accuracy on standard clause identification in well-formatted contracts. However, accuracy drops on scanned documents with OCR artifacts or contracts with unusual formatting. Always pair AI extraction with human review for high-stakes agreements.
Can this agent handle contracts in languages other than English?
Yes. Models like GPT-4o support multilingual analysis. You would adjust the system prompt to specify the target language and update the risk rubric to reflect jurisdiction-specific legal standards.
How do you handle confidentiality when sending contracts to an LLM API?
Use API providers that offer data processing agreements and do not train on your data. For maximum security, deploy a self-hosted model like Llama or Mistral behind your firewall. Always redact personally identifiable information before sending documents to external APIs.
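A redaction pass can be sketched with pattern substitution. The regexes below are illustrative only — they catch obvious formats (emails, US SSNs, US phone numbers) but miss names, addresses, and context-dependent identifiers, so production redaction should use an NER model on top:

```python
import re

# Illustrative patterns only; not a complete PII taxonomy.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"), "[PHONE]"),
]

def redact(text: str) -> str:
    """Replace obvious PII patterns before the text leaves your network."""
    for pattern, token in REDACTIONS:
        text = pattern.sub(token, text)
    return text
```

You would run `redact` (plus any NER-based pass) on the extracted text before `extract_clauses`, and keep an unredacted copy only inside your own perimeter.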