Receipt and Invoice Processing with Vision AI: End-to-End Expense Automation

Why Receipt Processing Is Harder Than It Looks

Receipts and invoices come in hundreds of formats. A grocery store receipt is a narrow thermal printout. A SaaS invoice is a polished PDF. A contractor invoice might be a handwritten note on letterhead. Despite this variety, your accounting system needs the same structured data from all of them: vendor, date, line items, tax, total, and payment method.

Vision AI agents solve this by combining OCR with LLM-powered understanding. The OCR reads the text; the LLM understands the semantic meaning of each field regardless of layout or format.

Defining the Data Model

Start with a clear schema for what you want to extract:

from pydantic import BaseModel, Field
from datetime import date
from enum import Enum


class ExpenseCategory(str, Enum):
    MEALS = "meals"
    TRAVEL = "travel"
    OFFICE = "office_supplies"
    SOFTWARE = "software"
    UTILITIES = "utilities"
    EQUIPMENT = "equipment"
    OTHER = "other"


class LineItem(BaseModel):
    description: str
    quantity: float = 1.0
    unit_price: float
    total: float


class ReceiptData(BaseModel):
    vendor_name: str
    vendor_address: str | None = None
    receipt_date: date | None = None
    currency: str = "USD"
    line_items: list[LineItem] = []
    subtotal: float | None = None
    tax_amount: float | None = None
    tip_amount: float | None = None
    total: float
    payment_method: str | None = None
    category: ExpenseCategory = ExpenseCategory.OTHER
    confidence: float = Field(ge=0.0, le=1.0)

The Receipt Scanning Pipeline

The pipeline reads an image, runs OCR, sends the text to an LLM for field extraction, and validates the results:

import pytesseract
from PIL import Image
from openai import OpenAI


def scan_receipt(image_path: str) -> str:
    """Extract raw text from a receipt image."""
    img = Image.open(image_path)

    # --psm 4 treats the page as a single column of variable-size text,
    # which suits narrow receipts; --oem 3 selects the default OCR engine
    custom_config = r"--oem 3 --psm 4"
    text = pytesseract.image_to_string(img, config=custom_config)

    return text


def extract_receipt_fields(raw_text: str) -> ReceiptData:
    """Use an LLM to extract structured fields from receipt text."""
    client = OpenAI()

    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You are a receipt processing expert. Extract structured "
                "data from the following receipt text. Identify all line "
                "items, totals, tax, vendor info, and payment method. "
                "Assign a confidence score from 0 to 1 based on how "
                "clearly the information could be read."
            )},
            {"role": "user", "content": raw_text},
        ],
        response_format=ReceiptData,
    )

    return response.choices[0].message.parsed

Expense Categorization

Use keyword matching as a fast first pass. Anything the keywords cannot place falls through to OTHER, which marks the receipt as a candidate for LLM classification:

CATEGORY_KEYWORDS = {
    ExpenseCategory.MEALS: [
        "restaurant", "cafe", "coffee", "pizza", "burger",
        "grubhub", "doordash", "uber eats"
    ],
    ExpenseCategory.TRAVEL: [
        "airline", "hotel", "uber", "lyft", "parking",
        "gas station", "fuel"
    ],
    ExpenseCategory.SOFTWARE: [
        "github", "aws", "google cloud", "azure",
        "subscription", "saas"
    ],
    ExpenseCategory.OFFICE: [
        "staples", "office depot", "paper", "ink",
        "printer", "stationery"
    ],
}


def categorize_expense(receipt: ReceiptData) -> ExpenseCategory:
    """Categorize expense based on vendor name and line items."""
    text = (receipt.vendor_name + " " + " ".join(
        item.description for item in receipt.line_items
    )).lower()

    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return category

    return ExpenseCategory.OTHER
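
The function above relies on plain substring matching, so a keyword such as "ink" also matches "drink", and it stops at OTHER rather than calling an LLM. The following is a minimal sketch of one way to tighten this, assuming word-boundary regexes and a caller-supplied fallback. The names `match_category`, `categorize_with_fallback`, and `llm_classify` are hypothetical, and categories are plain strings here so the example runs on its own.

```python
import re
from typing import Callable, Optional

# Trimmed copy of the keyword table, with string keys for a standalone example.
KEYWORDS = {
    "meals": ["restaurant", "cafe", "coffee", "uber eats"],
    "travel": ["airline", "hotel", "uber", "lyft", "parking"],
    "office_supplies": ["staples", "paper", "ink", "printer"],
}

# One compiled pattern per category; \b stops "ink" from matching "drink".
PATTERNS = {
    category: re.compile(
        r"\b(?:" + "|".join(re.escape(kw) for kw in keywords) + r")\b"
    )
    for category, keywords in KEYWORDS.items()
}


def match_category(text: str) -> Optional[str]:
    """Return the one matching category, or None if zero or several match."""
    text = text.lower()
    hits = [c for c, pattern in PATTERNS.items() if pattern.search(text)]
    return hits[0] if len(hits) == 1 else None


def categorize_with_fallback(
    text: str, llm_classify: Callable[[str], str]
) -> str:
    """Try keywords first; hand ambiguous or unmatched text to the fallback."""
    category = match_category(text)
    if category is not None:
        return category
    # llm_classify is an assumed hook, e.g. a wrapper around an LLM call.
    return llm_classify(text)
```

In this sketch "Uber Eats" hits both the meals and travel keywords, so it is treated as ambiguous and goes to the fallback instead of depending on dictionary order.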

Validation and Cross-Checking

Always validate that the extracted numbers add up. When they do not reconcile, the likely cause is an OCR or extraction error rather than a mistake on the receipt itself:

def validate_receipt(receipt: ReceiptData) -> list[str]:
    """Validate extracted receipt data for consistency."""
    warnings = []

    # Check line item totals (skipped when no line items or no subtotal were extracted)
    computed_subtotal = sum(item.total for item in receipt.line_items)
    if (
        receipt.line_items
        and receipt.subtotal is not None
        and abs(computed_subtotal - receipt.subtotal) > 0.02
    ):
        warnings.append(
            f"Line items sum to {computed_subtotal:.2f} but "
            f"subtotal says {receipt.subtotal:.2f}"
        )

    # Check overall total
    expected_total = (receipt.subtotal or computed_subtotal)
    if receipt.tax_amount:
        expected_total += receipt.tax_amount
    if receipt.tip_amount:
        expected_total += receipt.tip_amount

    if abs(expected_total - receipt.total) > 0.05:
        warnings.append(
            f"Computed total {expected_total:.2f} does not match "
            f"stated total {receipt.total:.2f}"
        )

    # Flag low confidence (the score is assigned by the LLM during extraction)
    if receipt.confidence < 0.7:
        warnings.append("Low extraction confidence; manual review recommended")

    return warnings
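
The tolerances in the checks above partly absorb binary floating-point rounding. As a sketch of an alternative, the same total check can be done with `decimal.Decimal` so that cents add up exactly. The helper names `to_money` and `check_total` are hypothetical, and rounding to two decimal places is an assumption that does not hold for every currency.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")  # assumes a currency with two decimal places


def to_money(value) -> Decimal:
    """Convert a float, int, or string amount to a Decimal rounded to cents."""
    return Decimal(str(value)).quantize(CENT, rounding=ROUND_HALF_UP)


def check_total(line_totals, tax, tip, stated_total, tolerance="0.05"):
    """Return (expected_total, matches) for a receipt's stated total."""
    expected = sum((to_money(x) for x in line_totals), Decimal("0"))
    expected += to_money(tax) + to_money(tip)
    matches = abs(expected - to_money(stated_total)) <= Decimal(tolerance)
    return expected, matches
```

With line totals of 4.50, 12.25, and 3.10 plus 1.59 tax, the expected total is exactly 21.44, so a stated total of 21.44 passes while a misread digit that shifts the total by dollars is flagged.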

Accounting System Integration

Once validated, push the data to your accounting system. Here is an example for a generic API:

import httpx
from datetime import datetime, timezone


async def push_to_accounting(
    receipt: ReceiptData,
    api_url: str,
    api_key: str
) -> dict:
    """Send processed receipt to accounting system."""
    payload = {
        "vendor": receipt.vendor_name,
        "date": receipt.receipt_date.isoformat() if receipt.receipt_date else None,
        "total": receipt.total,
        "tax": receipt.tax_amount or 0,
        "currency": receipt.currency,
        "category": receipt.category.value,
        "line_items": [
            {
                "description": item.description,
                "amount": item.total,
                "quantity": item.quantity,
            }
            for item in receipt.line_items
        ],
        # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }

    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{api_url}/expenses",
            json=payload,
            headers={"Authorization": f"Bearer {api_key}"},
        )
        response.raise_for_status()
        return response.json()

Batch Processing Multiple Receipts

To process an expense report with many receipts at once, loop over a directory of images and sort each result into processed, flagged, or errored:

import asyncio
from pathlib import Path


async def process_expense_report(
    image_dir: str,
) -> dict:
    """Process all receipt images in a directory."""
    results = {"processed": [], "flagged": [], "errors": []}

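    # pathlib's glob does not expand {jpg,png,jpeg}, so filter by suffix instead
    image_suffixes = {".jpg", ".jpeg", ".png"}
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in image_suffixes:
            continue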
        try:
            raw_text = scan_receipt(str(path))
            receipt = extract_receipt_fields(raw_text)
            receipt.category = categorize_expense(receipt)
            warnings = validate_receipt(receipt)

            if warnings:
                results["flagged"].append({
                    "file": path.name,
                    "receipt": receipt,
                    "warnings": warnings,
                })
            else:
                results["processed"].append({
                    "file": path.name,
                    "receipt": receipt,
                })
        except Exception as e:
            results["errors"].append({
                "file": path.name,
                "error": str(e),
            })

    return results
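
The function above is declared `async` but runs each blocking OCR and LLM call one after another. The sketch below shows one way to overlap them, assuming Python 3.9+ for `asyncio.to_thread`. Here `process_one` is a hypothetical stand-in for the scan, extract, categorize, and validate sequence, and `max_concurrent` is an assumed limit you would tune to your API rate limits.

```python
import asyncio
from pathlib import Path
from typing import Callable, Iterable


async def process_concurrently(
    paths: Iterable[Path],
    process_one: Callable[[Path], object],
    max_concurrent: int = 4,
) -> dict:
    """Run a blocking per-receipt function in worker threads, a few at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)
    results = {"processed": [], "errors": []}

    async def worker(path: Path) -> None:
        async with semaphore:  # cap the number of receipts in flight
            try:
                # to_thread keeps the event loop free while OCR/LLM calls block
                outcome = await asyncio.to_thread(process_one, path)
                results["processed"].append(
                    {"file": path.name, "result": outcome}
                )
            except Exception as exc:
                results["errors"].append(
                    {"file": path.name, "error": str(exc)}
                )

    await asyncio.gather(*(worker(p) for p in paths))
    return results
```

Results arrive in completion order rather than input order, so sort by file name afterwards if the report needs a stable order.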

FAQ

How do I handle receipts in different languages and currencies?

Use Tesseract language packs for OCR (e.g., -l fra for French on the command line, or lang="fra" in pytesseract) and instruct the LLM to detect and extract the currency symbol. Most LLMs handle multi-language receipts well in the extraction stage. For currency conversion, use a reliable exchange rate API and store both the original and converted amounts.
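
As a sketch of storing both amounts, the record below keeps the original value, the rate used, and the converted value together. `ConvertedAmount` and `convert` are hypothetical names, the rate is passed in by the caller rather than fetched from any particular API, and rounding to two decimals is an assumption that does not fit zero-decimal currencies.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class ConvertedAmount:
    original_amount: Decimal
    original_currency: str
    rate: Decimal  # units of target currency per one unit of the original
    converted_amount: Decimal
    target_currency: str


def convert(
    amount: str, currency: str, rate: str, target: str = "USD"
) -> ConvertedAmount:
    """Convert an amount at a caller-supplied rate, keeping the original."""
    original = Decimal(amount)
    fx = Decimal(rate)
    # Assumes a two-decimal target currency.
    converted = (original * fx).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return ConvertedAmount(
        original, currency.upper(), fx, converted, target.upper()
    )
```

Keeping the rate on the record means the converted figure can be audited or recomputed later without another lookup.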

What about privacy when processing receipts through cloud APIs?

Receipts contain sensitive financial data. For compliance-critical environments, run OCR locally with Tesseract and use a self-hosted LLM for extraction. If using cloud APIs, ensure your provider agreement covers data processing requirements, and never store raw receipt images longer than necessary. Redact personal identifiers before logging.
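
A minimal sketch of redacting identifiers before OCR text reaches a log, assuming that card numbers and email addresses are the identifiers of concern. The patterns and the `redact` helper are illustrative assumptions, not a complete PII filter.

```python
import re

# Assumed patterns; tune them to the identifiers that appear on your receipts.
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact(text: str) -> str:
    """Mask card-like digit runs and email addresses in OCR text."""
    text = CARD_PATTERN.sub("[CARD]", text)
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    return text
```

Call `redact(raw_text)` on anything written to logs, and keep the unredacted text only where the extraction step needs it.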

How accurate is automated receipt processing compared to manual entry?

Well-tuned pipelines achieve 90-95% field-level accuracy on standard printed receipts. The biggest error sources are faded thermal paper, crumpled receipts, and handwritten additions. Building in validation checks (like verifying totals add up) catches most extraction errors automatically, bringing effective accuracy above 98% for validated entries.

