Receipt and Invoice Processing with Vision AI: End-to-End Expense Automation
Build a vision AI pipeline that scans receipts and invoices, extracts vendor names, dates, line items, and totals, categorizes expenses, and integrates with accounting systems for fully automated expense processing.
Why Receipt Processing Is Harder Than It Looks
Receipts and invoices come in hundreds of formats. A grocery store receipt is a narrow thermal printout. A SaaS invoice is a polished PDF. A contractor invoice might be a handwritten note on letterhead. Despite this variety, your accounting system needs the same structured data from all of them: vendor, date, line items, tax, total, and payment method.
Vision AI agents solve this by combining OCR with LLM-powered understanding. The OCR reads the text; the LLM understands the semantic meaning of each field regardless of layout or format.
Defining the Data Model
Start with a clear schema for what you want to extract:
from pydantic import BaseModel, Field
from datetime import date
from enum import Enum


class ExpenseCategory(str, Enum):
    MEALS = "meals"
    TRAVEL = "travel"
    OFFICE = "office_supplies"
    SOFTWARE = "software"
    UTILITIES = "utilities"
    EQUIPMENT = "equipment"
    OTHER = "other"


class LineItem(BaseModel):
    description: str
    quantity: float = 1.0
    unit_price: float
    total: float


class ReceiptData(BaseModel):
    vendor_name: str
    vendor_address: str | None = None
    receipt_date: date | None = None
    currency: str = "USD"
    line_items: list[LineItem] = Field(default_factory=list)
    subtotal: float | None = None
    tax_amount: float | None = None
    tip_amount: float | None = None
    total: float
    payment_method: str | None = None
    category: ExpenseCategory = ExpenseCategory.OTHER
    confidence: float = Field(ge=0.0, le=1.0)
The Receipt Scanning Pipeline
The pipeline reads an image, runs OCR, sends the text to an LLM for field extraction, and validates the results:
import pytesseract
from PIL import Image
from openai import OpenAI


def scan_receipt(image_path: str) -> str:
    """Extract raw text from a receipt image."""
    img = Image.open(image_path)
    # --psm 4 treats the page as a single column of variable-sized text,
    # which suits narrow receipts; --oem 3 selects the default OCR engine
    custom_config = r"--oem 3 --psm 4"
    text = pytesseract.image_to_string(img, config=custom_config)
    return text


def extract_receipt_fields(raw_text: str) -> ReceiptData:
    """Use an LLM to extract structured fields from receipt text."""
    client = OpenAI()
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You are a receipt processing expert. Extract structured "
                "data from the following receipt text. Identify all line "
                "items, totals, tax, vendor info, and payment method. "
                "Assign a confidence score from 0 to 1 based on how "
                "clearly the information could be read."
            )},
            {"role": "user", "content": raw_text},
        ],
        response_format=ReceiptData,
    )
    parsed = response.choices[0].message.parsed
    # parsed is None when the model refuses or returns nothing parsable
    if parsed is None:
        raise ValueError("Model returned no parsable receipt data")
    return parsed
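Because the LLM sees only the OCR text, one inexpensive guard is to confirm that the total it returns literally appears in that text. The following is a minimal sketch of that idea using only the standard library; the helper names `amounts_in_text` and `total_is_grounded` and the amount pattern are illustrative assumptions, not part of the pipeline above.

```python
import re

# Assumed amount formats: 12.50, 1,234.56, or 7,99 (comma as decimal separator)
AMOUNT_RE = re.compile(r"\d+(?:,\d{3})*\.\d{2}(?!\d)|\d+,\d{2}(?!\d)")


def amounts_in_text(raw_text: str) -> set[float]:
    """Collect every money-like number that appears in the OCR text."""
    found = set()
    for match in AMOUNT_RE.findall(raw_text):
        if "." in match:
            normalized = match.replace(",", "")   # 1,234.56 -> 1234.56
        else:
            normalized = match.replace(",", ".")  # 7,99 -> 7.99
        found.add(float(normalized))
    return found


def total_is_grounded(raw_text: str, total: float) -> bool:
    """True when the extracted total matches some amount printed on the receipt."""
    return any(abs(total - amount) < 0.005 for amount in amounts_in_text(raw_text))
```

A total that fails this check is not necessarily wrong (the OCR may have garbled the digits), but it is a reasonable signal to route the receipt to manual review.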
Expense Categorization
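Use keyword matching as a fast first pass, then fall back to LLM classification for ambiguous cases. The function below covers the keyword pass and returns ExpenseCategory.OTHER when nothing matches, which is the point where an LLM classification call would take over: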
# Order matters: categories are checked top to bottom, so MEALS comes
# before TRAVEL to let "uber eats" win over "uber"
CATEGORY_KEYWORDS = {
    ExpenseCategory.MEALS: [
        "restaurant", "cafe", "coffee", "pizza", "burger",
        "grubhub", "doordash", "uber eats"
    ],
    ExpenseCategory.TRAVEL: [
        "airline", "hotel", "uber", "lyft", "parking",
        "gas station", "fuel"
    ],
    ExpenseCategory.SOFTWARE: [
        "github", "aws", "google cloud", "azure",
        "subscription", "saas"
    ],
    ExpenseCategory.OFFICE: [
        "staples", "office depot", "paper", "ink",
        "printer", "stationery"
    ],
}


def categorize_expense(receipt: ReceiptData) -> ExpenseCategory:
    """Categorize expense based on vendor name and line items."""
    text = (receipt.vendor_name + " " + " ".join(
        item.description for item in receipt.line_items
    )).lower()

    # Return the first category with any keyword found in the text
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return category

    return ExpenseCategory.OTHER
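One caveat with the substring check above: a short keyword such as "ink" also matches inside "drinks" or "pink". A possible refinement, sketched here under the assumption that whole-word matching is acceptable for your vendors, compiles each keyword list into a word-boundary regex and returns None when zero or several categories match, so the caller knows to ask the LLM. The `KEYWORDS` table and function names are hypothetical, and plain strings stand in for the ExpenseCategory enum so the sketch runs on its own.

```python
import re
from typing import Optional

# Hypothetical, shortened keyword table for illustration
KEYWORDS = {
    "meals": ["restaurant", "coffee", "uber eats"],
    "travel": ["uber", "hotel", "parking"],
    "office_supplies": ["ink", "paper", "printer"],
}


def compile_patterns(keywords: dict) -> dict:
    """Build one case-insensitive whole-word regex per category."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(word) for word in words) + r")\b",
            re.IGNORECASE,
        )
        for category, words in keywords.items()
    }


PATTERNS = compile_patterns(KEYWORDS)


def match_categories(text: str) -> list[str]:
    """Return every category whose keywords appear as whole words."""
    return [cat for cat, pattern in PATTERNS.items() if pattern.search(text)]


def keyword_category(text: str) -> Optional[str]:
    """Return a category only on a single clear match; None means ambiguous."""
    matches = match_categories(text)
    return matches[0] if len(matches) == 1 else None
```

With this table, "HP printer ink cartridge" resolves to office_supplies, "Soft drinks and pink lemonade" matches nothing, and "Uber Eats order" matches both meals and travel, so the last two return None and would be handed to the LLM fallback.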
Validation and Cross-Checking
Always validate that the extracted numbers add up. When they do not, the cause is far more often an OCR or extraction error than a mistake by the vendor:
def validate_receipt(receipt: ReceiptData) -> list[str]:
    """Validate extracted receipt data for consistency."""
    warnings = []

    # Check line item totals (only when both sides were extracted)
    computed_subtotal = sum(item.total for item in receipt.line_items)
    if (
        receipt.line_items
        and receipt.subtotal is not None
        and abs(computed_subtotal - receipt.subtotal) > 0.02
    ):
        warnings.append(
            f"Line items sum to {computed_subtotal:.2f} but "
            f"subtotal says {receipt.subtotal:.2f}"
        )

    # Check overall total against subtotal (or line items) plus tax and tip
    if receipt.subtotal is not None or receipt.line_items:
        expected_total = (
            receipt.subtotal if receipt.subtotal is not None else computed_subtotal
        )
        expected_total += receipt.tax_amount or 0.0
        expected_total += receipt.tip_amount or 0.0
        if abs(expected_total - receipt.total) > 0.05:
            warnings.append(
                f"Computed total {expected_total:.2f} does not match "
                f"stated total {receipt.total:.2f}"
            )
    else:
        warnings.append("No subtotal or line items extracted; total not cross-checked")

    # Flag low confidence (the score is the LLM's own estimate)
    if receipt.confidence < 0.7:
        warnings.append("Low extraction confidence: manual review recommended")

    return warnings
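The checks above compare floats with a small tolerance, which works but leaves binary rounding in the sums. If exact cent arithmetic matters, one option is to do the comparison in `decimal.Decimal`. This is a sketch of that alternative, not the article's method; `to_money` and `totals_match` are hypothetical helper names and the 0.05 tolerance simply mirrors the value used above.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_money(value) -> Decimal:
    """Convert a float or string to a Decimal rounded to whole cents."""
    return Decimal(str(value)).quantize(CENT, rounding=ROUND_HALF_UP)


def totals_match(line_totals, tax, tip, stated_total, tolerance="0.05") -> bool:
    """Check that line items + tax + tip equal the stated total within tolerance."""
    computed = sum((to_money(v) for v in line_totals), Decimal("0"))
    computed += to_money(tax) + to_money(tip)
    return abs(computed - to_money(stated_total)) <= Decimal(tolerance)
```

For a receipt with line totals 4.50, 3.25, and 10.65 plus 1.52 tax, the computed total is exactly 19.92, so `totals_match([4.50, 3.25, 10.65], 1.52, 0, 19.92)` passes with no rounding noise.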
Accounting System Integration
Once validated, push the data to your accounting system. Here is an example for a generic API:
import httpx
from datetime import datetime, timezone


async def push_to_accounting(
    receipt: ReceiptData,
    api_url: str,
    api_key: str
) -> dict:
    """Send processed receipt to accounting system."""
    payload = {
        "vendor": receipt.vendor_name,
        "date": receipt.receipt_date.isoformat() if receipt.receipt_date else None,
        "total": receipt.total,
        "tax": receipt.tax_amount or 0,
        "currency": receipt.currency,
        "category": receipt.category.value,
        "line_items": [
            {
                "description": item.description,
                "amount": item.total,
                "quantity": item.quantity,
            }
            for item in receipt.line_items
        ],
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }

    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{api_url}/expenses",
            json=payload,
            headers={"Authorization": f"Bearer {api_key}"},
        )
        # Raise on 4xx/5xx so failures are not silently recorded as success
        response.raise_for_status()
        return response.json()
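Retries and re-uploaded photos can push the same receipt twice. One way to guard against that, sketched below, is to derive a stable fingerprint from the fields that identify a receipt and skip any fingerprint already seen. The function name `receipt_fingerprint` and the choice of fields are assumptions for illustration; whether your accounting API accepts such a value as an idempotency key depends on the API.

```python
import hashlib
from typing import Optional


def receipt_fingerprint(
    vendor: str,
    receipt_date: Optional[str],
    total: float,
    currency: str = "USD",
) -> str:
    """Build a stable SHA-256 fingerprint from a receipt's identifying fields."""
    parts = [
        " ".join(vendor.lower().split()),  # normalize case and whitespace
        receipt_date or "",                # ISO date string, empty if unknown
        f"{total:.2f}",                    # fixed two-decimal formatting
        currency.upper(),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

Keeping the fingerprints of pushed receipts in a set or database column lets the pipeline detect a duplicate before calling the API. Two genuinely different purchases with the same vendor, date, and total would collide, so treat a match as a prompt for review and not as proof of duplication.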
Batch Processing Multiple Receipts
To process an expense report with many receipts at once, loop over a directory of images and sort each result into processed, flagged, or failed:
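import asyncio
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


async def process_expense_report(
    image_dir: str,
) -> dict:
    """Process all receipt images in a directory."""
    results = {"processed": [], "flagged": [], "errors": []}

    # pathlib's glob() does not expand braces such as "*.{jpg,png,jpeg}",
    # so filter on the file suffix instead
    paths = sorted(
        p for p in Path(image_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )

    for path in paths:
        try:
            # OCR and the LLM call are blocking, so run them in worker threads
            raw_text = await asyncio.to_thread(scan_receipt, str(path))
            receipt = await asyncio.to_thread(extract_receipt_fields, raw_text)
            receipt.category = categorize_expense(receipt)
            warnings = validate_receipt(receipt)

            if warnings:
                results["flagged"].append({
                    "file": path.name,
                    "receipt": receipt,
                    "warnings": warnings,
                })
            else:
                results["processed"].append({
                    "file": path.name,
                    "receipt": receipt,
                })
        except Exception as e:
            # Record the failure and continue with the remaining receipts
            results["errors"].append({
                "file": path.name,
                "error": str(e),
            })

    return results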
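The loop above handles one receipt at a time. For larger batches, a common pattern is to run several receipts concurrently while capping how many are in flight, so the OCR workers and the LLM API are not flooded. The sketch below shows that pattern with `asyncio.Semaphore`; `run_bounded` and `fake_worker` are hypothetical names, and the worker is a stand-in for the real OCR and extraction step.

```python
import asyncio


async def run_bounded(items, worker, limit: int = 4):
    """Run worker(item) for every item with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # return_exceptions=True keeps one bad receipt from cancelling the batch;
    # failures come back as exception objects in the result list
    return await asyncio.gather(
        *(guarded(item) for item in items),
        return_exceptions=True,
    )


async def fake_worker(name: str) -> str:
    """Stand-in for the OCR + extraction step on a single file."""
    await asyncio.sleep(0.01)
    if name.endswith(".txt"):
        raise ValueError(f"not an image: {name}")
    return name.upper()
```

Calling `asyncio.run(run_bounded(["a.jpg", "b.png", "notes.txt"], fake_worker, limit=2))` returns the results in input order, with a `ValueError` instance in the slot of the file that failed. A sensible limit depends on your API rate limits and CPU count.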
FAQ
How do I handle receipts in different languages and currencies?
Use Tesseract language packs for OCR (e.g., -l fra on the command line, or lang="fra" in pytesseract.image_to_string, for French) and instruct the LLM to detect and extract the currency. Most LLMs handle multi-language receipts well in the extraction stage. For currency conversion, use a reliable exchange rate API and store both the original and converted amounts.
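Storing both amounts can be as simple as one record that keeps the original value, the converted value, and the rate that linked them. The following is a minimal sketch; the `ConvertedAmount` class and `convert` function are hypothetical, and the rate is passed in as a plain string where a real pipeline would fetch it from an exchange rate API.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class ConvertedAmount:
    original: Decimal
    original_currency: str
    converted: Decimal
    target_currency: str
    rate: Decimal  # units of target currency per unit of original currency


def convert(amount: str, currency: str, rate: str, target: str = "USD") -> ConvertedAmount:
    """Convert an amount and keep the original and the rate alongside the result."""
    original = Decimal(amount)
    fx = Decimal(rate)
    converted = (original * fx).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return ConvertedAmount(original, currency, converted, target, fx)
```

Keeping the rate on the record means the converted figure can be audited or recomputed later without another API lookup.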
What about privacy when processing receipts through cloud APIs?
Receipts contain sensitive financial data. For compliance-critical environments, run OCR locally with Tesseract and use a self-hosted LLM for extraction. If using cloud APIs, ensure your provider agreement covers data processing requirements, and never store raw receipt images longer than necessary. Redact personal identifiers before logging.
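As an illustration of redacting before logging, the sketch below masks card-like digit runs and email addresses in OCR text using only the standard library. The patterns are deliberately simple assumptions and will not catch every identifier (names and addresses, for instance, need other techniques); `redact` is a hypothetical helper name.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact(text: str) -> str:
    """Mask card-like numbers (keeping the last four digits) and emails."""
    def mask_card(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return "[CARD ****" + digits[-4:] + "]"

    text = CARD_RE.sub(mask_card, text)
    return EMAIL_RE.sub("[EMAIL]", text)
```

For example, `redact("VISA 4111 1111 1111 1111 jane.doe@example.com")` yields `VISA [CARD ****1111] [EMAIL]`, while ordinary amounts such as `TOTAL 19.92` pass through unchanged. Long transaction or reference numbers may also be masked, which is usually an acceptable trade-off for log output.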
How accurate is automated receipt processing compared to manual entry?
Well-tuned pipelines can reach roughly 90-95% field-level accuracy on standard printed receipts, though results depend heavily on image quality. The biggest error sources are faded thermal paper, crumpled receipts, and handwritten additions. Validation checks, such as verifying that totals add up, catch most extraction errors automatically and can bring effective accuracy above 98% for entries that pass validation.
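Figures like these are best confirmed on your own receipts. A simple way to do that, sketched here, is to hand-label a sample and compute per-field accuracy against the pipeline's output. The `field_accuracy` helper and the dictionary layout are assumptions for illustration; the sample data in the usage note is made up.

```python
def field_accuracy(
    predictions: list[dict],
    labels: list[dict],
    fields: list[str],
) -> dict[str, float]:
    """Share of receipts on which each field exactly matches the hand label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one labeled receipt")

    scores = {}
    for field in fields:
        correct = sum(
            1 for pred, truth in zip(predictions, labels)
            if pred.get(field) == truth.get(field)
        )
        scores[field] = correct / len(labels)
    return scores
```

With two labeled receipts where the vendor is wrong on one and the total is right on both, the function returns `{"vendor": 0.5, "total": 1.0}`. Exact matching is strict for free-text fields such as vendor names, so you may want to normalize case and whitespace before comparing.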
CallSphere Team