
AI Agent for Document Generation: Contracts, Proposals, and Reports on Demand

Build an AI agent that generates professional documents like contracts, proposals, and reports by combining template engines, dynamic data injection, and PDF rendering with version tracking.

From Manual Documents to Automated Generation

Every business produces documents: contracts for new clients, proposals for deals, weekly reports for stakeholders, and invoices for accounting. These documents follow consistent templates but require unique data for each instance. A document generation agent combines template engines for structure, LLM reasoning for dynamic content, and PDF rendering for professional output.

This guide walks through building a complete document generation agent that accepts structured data, fills templates, generates custom sections with AI, renders PDFs, and tracks versions.

Defining Document Templates

We use Jinja2 as the template engine. Each template is an HTML file with placeholders for dynamic data. Because the templates are HTML, CSS controls layout and styling, and a later step converts the rendered page to PDF:

from jinja2 import Environment, FileSystemLoader
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

@dataclass
class DocumentTemplate:
    name: str
    template_file: str
    required_fields: list[str]
    ai_sections: list[str] = field(default_factory=list)

TEMPLATES = {
    "contract": DocumentTemplate(
        name="Service Agreement",
        template_file="contract.html",
        required_fields=["client_name", "client_address", "service_description",
                         "start_date", "end_date", "total_amount"],
        ai_sections=["scope_of_work", "termination_clause"],
    ),
    "proposal": DocumentTemplate(
        name="Business Proposal",
        template_file="proposal.html",
        required_fields=["prospect_name", "company", "problem_statement",
                         "budget_range"],
        ai_sections=["executive_summary", "proposed_solution", "timeline"],
    ),
    "report": DocumentTemplate(
        name="Weekly Report",
        template_file="report.html",
        required_fields=["team_name", "week_start", "metrics", "highlights"],
        ai_sections=["analysis", "recommendations"],
    ),
}

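# autoescape=True escapes HTML special characters in user-supplied and
# LLM-generated values, so they cannot inject markup into the document
env = Environment(loader=FileSystemLoader("templates"), autoescape=True)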

Each template declares which fields are required from the user and which sections should be generated by the AI. This separation keeps humans in control of factual data while delegating narrative writing to the LLM.
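
To make that split concrete, here is a minimal sketch of a planning step. Given a template definition and the caller's data, it reports which human-supplied fields are missing or blank, and which AI sections still need to be generated. `plan_document` and `DocumentPlan` are hypothetical helpers, not part of Jinja2. The dataclass mirrors `DocumentTemplate` above so the snippet runs on its own.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DocumentTemplate:
    # Mirrors the DocumentTemplate dataclass defined above
    name: str
    template_file: str
    required_fields: list[str]
    ai_sections: list[str] = field(default_factory=list)


@dataclass
class DocumentPlan:
    missing_fields: list[str]        # required human inputs that are absent or blank
    sections_to_generate: list[str]  # AI sections the caller did not supply


def _is_blank(value: Any) -> bool:
    # None or a whitespace-only string counts as "not provided"
    return value is None or (isinstance(value, str) and not value.strip())


def plan_document(template: DocumentTemplate, data: dict[str, Any]) -> DocumentPlan:
    """Hypothetical helper: decide what the human must supply and what the LLM writes."""
    missing = [f for f in template.required_fields if _is_blank(data.get(f))]
    # A section already present in `data` (e.g. written by a person) is not regenerated
    to_generate = [s for s in template.ai_sections if s not in data]
    return DocumentPlan(missing_fields=missing, sections_to_generate=to_generate)


report = DocumentTemplate(
    name="Weekly Report",
    template_file="report.html",
    required_fields=["team_name", "week_start", "metrics", "highlights"],
    ai_sections=["analysis", "recommendations"],
)
plan = plan_document(report, {
    "team_name": "Platform",
    "week_start": "2024-06-03",
    "metrics": {"uptime": 99.9},
    "highlights": "  ",
    "analysis": "Written by the team lead.",
})
print(plan.missing_fields)        # ['highlights']
print(plan.sections_to_generate)  # ['recommendations']
```

Treating a pre-filled section as authoritative lets a person override any AI section simply by supplying it, which matches how `assemble_document` below skips sections already present in the data.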

Generating AI-Powered Sections

The agent generates document sections based on the structured data and the document type. Each section gets a targeted prompt:

from openai import OpenAI

client = OpenAI()

SECTION_PROMPTS = {
    "executive_summary": (
        "Write a concise executive summary for a business proposal. "
        "Focus on the client's problem and why our solution is the best fit. "
        "Keep it under 150 words. Use a professional but approachable tone."
    ),
    "proposed_solution": (
        "Describe the proposed solution in detail. Include methodology, "
        "deliverables, and key differentiators. Use bullet points for clarity."
    ),
    "scope_of_work": (
        "Write a clear scope of work clause for a service agreement. "
        "Be specific about what is included and what is excluded."
    ),
    "termination_clause": (
        "Write a standard termination clause. Include notice period, "
        "grounds for termination, and obligations upon termination."
    ),
    "analysis": (
        "Analyze the metrics and highlights provided. Identify trends, "
        "areas of concern, and positive developments."
    ),
    "recommendations": (
        "Based on the analysis, provide 3-5 actionable recommendations "
        "for the next week. Be specific and prioritized."
    ),
    "timeline": (
        "Create a realistic project timeline with milestones. "
        "Include discovery, implementation, testing, and launch phases."
    ),
}

def generate_section(section_name: str, context: dict[str, Any]) -> str:
    """Generate a document section using an LLM."""
    prompt = SECTION_PROMPTS.get(section_name, f"Write the {section_name} section.")
    context_str = "\n".join(f"{k}: {v}" for k, v in context.items())

    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.4,
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": f"Document context:\n{context_str}"},
        ],
    )
    # The SDK types message.content as Optional[str]; fall back to an empty string
    return response.choices[0].message.content or ""
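
The `f"{k}: {v}"` formatting above prints nested values such as a `metrics` dict using Python's `repr`, and the line order follows dict insertion order. One option, sketched below under the assumption that context values are JSON-serializable or convertible with `str`, is to render nested values as key-sorted JSON and sort the top-level keys, so identical inputs always yield identical prompts. `format_context` and `build_messages` are hypothetical helpers. `build_messages` returns the same `messages` shape passed to `client.chat.completions.create` above, so prompt construction can be unit-tested without calling the API.

```python
import json
from typing import Any


def format_context(context: dict[str, Any]) -> str:
    """Render context as 'key: value' lines with a stable key order."""
    lines = []
    for key in sorted(context):
        value = context[key]
        if isinstance(value, (dict, list)):
            # Nested structures become key-sorted JSON instead of repr()
            value = json.dumps(value, sort_keys=True, default=str)
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


def build_messages(system_prompt: str, context: dict[str, Any]) -> list[dict[str, str]]:
    """Build the chat messages for one section without making an API call."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Document context:\n{format_context(context)}"},
    ]


msgs = build_messages(
    "Analyze the metrics and highlights provided.",
    {"team_name": "Platform", "metrics": {"uptime": 99.9, "incidents": 2}},
)
print(msgs[1]["content"])
# Document context:
# metrics: {"incidents": 2, "uptime": 99.9}
# team_name: Platform
```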

Building the Document Assembly Pipeline

The assembly pipeline validates the input data, generates any missing AI sections, renders the HTML template, and computes a version hash for tracking. PDF rendering is a separate step:

import hashlib
import json

@dataclass
class GeneratedDocument:
    template_name: str
    html_content: str
    data: dict[str, Any]
    version_hash: str
    created_at: str

def assemble_document(template_key: str, data: dict[str, Any]) -> GeneratedDocument:
    """Assemble a complete document from template and data."""
    template_def = TEMPLATES[template_key]

    # Validate required fields
    missing = [f for f in template_def.required_fields if f not in data]
    if missing:
        raise ValueError(f"Missing required fields: {missing}")

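    # Work on a copy so the caller's dict is not mutated. Sections are generated
    # in order, so later ones (e.g. recommendations) see earlier ones (e.g. analysis).
    data = dict(data)
    for section in template_def.ai_sections:
        if section not in data:
            data[section] = generate_section(section, data)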

    # Render HTML template
    template = env.get_template(template_def.template_file)
    html = template.render(**data, generated_date=datetime.now().strftime("%B %d, %Y"))

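    # Compute a short version hash over all fields, including AI sections;
    # default=str serializes values such as dates that json cannot encode natively
    content_hash = hashlib.sha256(
        json.dumps(data, sort_keys=True, default=str).encode()
    ).hexdigest()[:12]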

    return GeneratedDocument(
        template_name=template_def.name,
        html_content=html,
        data=data,
        version_hash=content_hash,
        created_at=datetime.now().isoformat(),
    )
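
The hash should depend only on the data, not on the order in which keys were inserted. The sketch below isolates the hashing step as a hypothetical `content_hash` function so that property can be checked directly. `sort_keys=True` canonicalizes key order, and `default=str` converts values such as `datetime.date`, which `json` cannot serialize natively.

```python
import hashlib
import json
from datetime import date
from typing import Any


def content_hash(data: dict[str, Any], length: int = 12) -> str:
    """Short SHA-256 digest of a canonical (key-sorted) JSON encoding of `data`."""
    canonical = json.dumps(data, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


a = {"client_name": "Acme", "start_date": date(2024, 7, 1), "total_amount": 5000}
b = {"total_amount": 5000, "client_name": "Acme", "start_date": date(2024, 7, 1)}
c = {**a, "total_amount": 5500}

print(content_hash(a) == content_hash(b))  # True: key order does not matter
print(content_hash(a) == content_hash(c))  # False: the data differs
```

Twelve hex characters is 48 bits, which is plenty for telling versions of one document apart. Keep the full digest if you plan to use the hash as a globally unique key.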

Rendering PDFs with WeasyPrint

WeasyPrint converts HTML and CSS directly to PDF. It implements CSS Paged Media, so page size, margins, page breaks, and running headers and footers can all be controlled from the stylesheet:

from weasyprint import HTML
from pathlib import Path

def render_pdf(document: GeneratedDocument, output_dir: str = "output") -> str:
    """Render an assembled document to PDF."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    filename = (
        f"{document.template_name.replace(' ', '_').lower()}"
        f"_{document.version_hash}.pdf"
    )
    filepath = Path(output_dir) / filename

    # base_url lets relative references in the template (stylesheets, images)
    # resolve against the templates directory
    HTML(string=document.html_content, base_url="templates/").write_pdf(str(filepath))
    return str(filepath)
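
The filename logic above only replaces spaces, so a template name containing characters such as `/` or `:` would produce an unintended path. A stricter alternative, sketched here as hypothetical `slugify` and `pdf_filename` helpers using only the standard library, keeps lowercase ASCII letters, digits, and underscores:

```python
import re


def slugify(name: str) -> str:
    """Lowercase `name` and collapse each run of non-alphanumeric characters to '_'."""
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower())
    # Trim leading/trailing underscores; fall back if nothing usable remains
    return slug.strip("_") or "document"


def pdf_filename(template_name: str, version_hash: str) -> str:
    return f"{slugify(template_name)}_{version_hash}.pdf"


print(pdf_filename("Service Agreement", "3f9a1c2b7d4e"))    # service_agreement_3f9a1c2b7d4e.pdf
print(pdf_filename("Q3 Report: Sales/Ops", "3f9a1c2b7d4e"))  # q3_report_sales_ops_3f9a1c2b7d4e.pdf
```

This also replaces accented letters. That trade-off is deliberate: the result is always a safe, portable filename.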

Version Tracking and Storage

Every generated document is tracked with its input data, version hash, and metadata. This enables auditing and regeneration:

import sqlite3

def init_db(db_path: str = "documents.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            template_name TEXT NOT NULL,
            version_hash TEXT NOT NULL,
            input_data TEXT NOT NULL,
            pdf_path TEXT,
            created_at TEXT NOT NULL
        )
    """)
    conn.commit()
    return conn

def save_document_record(conn: sqlite3.Connection, doc: GeneratedDocument, pdf_path: str):
    conn.execute(
        "INSERT INTO documents (template_name, version_hash, input_data, pdf_path, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            doc.template_name,
            doc.version_hash,
            json.dumps(doc.data, default=str),  # default=str handles dates and similar values
            pdf_path,
            doc.created_at,
        ),
    )
    conn.commit()
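
Auditing and regeneration both require reading records back. Below is a minimal sketch of two hypothetical query helpers, `list_versions` and `load_input_data`, run against an in-memory database with the same schema as `init_db`. Because the stored data already contains the AI-generated sections, and `assemble_document` skips sections that are present, passing loaded data back into it should re-render the same section text without new LLM calls.

```python
import json
import sqlite3
from typing import Any, Optional

SCHEMA = """
    CREATE TABLE IF NOT EXISTS documents (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        template_name TEXT NOT NULL,
        version_hash TEXT NOT NULL,
        input_data TEXT NOT NULL,
        pdf_path TEXT,
        created_at TEXT NOT NULL
    )
"""


def list_versions(conn: sqlite3.Connection, template_name: str) -> list[tuple[str, str]]:
    """Return (version_hash, created_at) pairs for a template, oldest first."""
    rows = conn.execute(
        "SELECT version_hash, created_at FROM documents "
        "WHERE template_name = ? ORDER BY id",
        (template_name,),
    )
    return rows.fetchall()


def load_input_data(conn: sqlite3.Connection, version_hash: str) -> Optional[dict[str, Any]]:
    """Return the stored data for the most recent record with this hash, if any."""
    row = conn.execute(
        "SELECT input_data FROM documents WHERE version_hash = ? "
        "ORDER BY id DESC LIMIT 1",
        (version_hash,),
    ).fetchone()
    return json.loads(row[0]) if row else None


# Demo with made-up records
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
for h, amount, ts in [("aaa111", 5000, "2024-07-01T09:00:00"),
                      ("bbb222", 5500, "2024-07-02T09:00:00")]:
    conn.execute(
        "INSERT INTO documents (template_name, version_hash, input_data, pdf_path, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        ("Service Agreement", h, json.dumps({"total_amount": amount}), None, ts),
    )
conn.commit()

print(list_versions(conn, "Service Agreement"))
# [('aaa111', '2024-07-01T09:00:00'), ('bbb222', '2024-07-02T09:00:00')]
print(load_input_data(conn, "bbb222"))  # {'total_amount': 5500}
```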

FAQ

How do I make sure AI-generated contract clauses are legally sound?

Never deploy AI-generated legal text without lawyer review. Use the AI to generate first drafts based on your approved clause library, then flag all AI-generated sections for human review. Store approved clause variants as few-shot examples in your prompts to improve consistency.
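
One way to apply the few-shot suggestion is to turn each approved clause into a user/assistant message pair placed before the real request. The sketch below assumes a hypothetical in-memory clause library (a list of dicts with `context` and `clause` keys) and produces a `messages` list in the same chat format that `generate_section` uses. `build_few_shot_messages` is an illustrative helper, and the clause text in the demo is placeholder data, not approved legal language.

```python
from typing import Any


def build_few_shot_messages(
    instruction: str,
    approved_examples: list[dict[str, str]],
    context: dict[str, Any],
) -> list[dict[str, str]]:
    """Prepend approved clauses as example exchanges before the real request.

    Each example is assumed to look like {"context": ..., "clause": ...},
    where "clause" is lawyer-approved text for that context.
    """
    messages = [{"role": "system", "content": instruction}]
    for example in approved_examples:
        messages.append({"role": "user", "content": f"Document context:\n{example['context']}"})
        messages.append({"role": "assistant", "content": example["clause"]})
    context_str = "\n".join(f"{k}: {v}" for k, v in context.items())
    messages.append({"role": "user", "content": f"Document context:\n{context_str}"})
    return messages


# Placeholder clause for illustration only
library = [
    {"context": "notice_days: 30",
     "clause": "Either party may terminate on 30 days' written notice."},
]
msgs = build_few_shot_messages("Write a standard termination clause.", library, {"notice_days": 60})
print([m["role"] for m in msgs])  # ['system', 'user', 'assistant', 'user']
```

The model still drafts new text for the new context, so the output remains a draft that must go through the same lawyer review.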

Can I add custom branding like logos and company colors?

Yes. The HTML templates support full CSS, including custom fonts, colors, and images. Either embed images as base64 data URIs in the template, or reference files in the templates directory and pass `base_url` to WeasyPrint's `HTML` so relative paths resolve. WeasyPrint applies print-media CSS, including `@page` rules, for page-specific styling.
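
For the base64 route, a small helper can turn an image file into a `data:` URI that the template places in an `<img src>` attribute, so the PDF does not depend on resolving file paths at render time. `logo_data_uri` is a hypothetical helper built on the standard `base64` and `mimetypes` modules; the `logo_uri` template variable is likewise an assumed name.

```python
import base64
import mimetypes
from pathlib import Path


def logo_data_uri(path: str) -> str:
    """Read an image file and return it as a base64-encoded data: URI."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# In the template: <img src="{{ logo_uri }}" alt="Company logo">
# html = template.render(**data, logo_uri=logo_data_uri("templates/logo.png"))
```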

How do I handle document revisions and track changes?

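Store each version with its input data and version hash. To show what changed between versions, diff the input data dictionaries field by field, or diff the rendered HTML or individual AI sections line by line. The version hash covers every field, including the AI-generated sections, so it changes whenever any field changes. Regenerating the AI sections from identical inputs will usually produce different text, and therefore a new hash.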
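
A minimal sketch of both kinds of diff using only the standard library follows. `diff_fields` compares two versions' data dictionaries key by key. `section_diff` produces a unified line diff for long text fields such as AI-generated sections. Both function names are hypothetical.

```python
import difflib
from typing import Any


def diff_fields(old: dict[str, Any], new: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Compare two versions' data dicts field by field."""
    return {
        "added": {k: new[k] for k in new.keys() - old.keys()},
        "removed": {k: old[k] for k in old.keys() - new.keys()},
        "changed": {
            k: (old[k], new[k])
            for k in old.keys() & new.keys()
            if old[k] != new[k]
        },
    }


def section_diff(old_text: str, new_text: str, name: str = "section") -> str:
    """Unified diff of two versions of a text section."""
    lines = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=f"{name} (old)", tofile=f"{name} (new)", lineterm="",
    )
    return "\n".join(lines)


v1 = {"client_name": "Acme", "total_amount": 5000, "end_date": "2024-12-31"}
v2 = {"client_name": "Acme", "total_amount": 5500, "discount": "5%"}
print(diff_fields(v1, v2))
# {'added': {'discount': '5%'}, 'removed': {'end_date': '2024-12-31'},
#  'changed': {'total_amount': (5000, 5500)}}
```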




CallSphere Team

Expert insights on AI voice agents and customer communication automation.
