
Validating LLM Outputs: Custom Validators, Business Rules, and Data Quality Checks

Build comprehensive validation layers for LLM outputs using Pydantic validators, cross-field validation, domain-specific constraints, and data quality scoring. Catch hallucinations before they reach your database.

The Validation Gap

Structured outputs guarantee valid JSON that conforms to a schema. But schema conformance is the lowest bar. A JSON object where every field has the right type can still contain:

  • A person's name in the email field
  • A date in the future for a historical event
  • A price that violates your business pricing rules
  • An address that is syntactically valid but does not exist
  • A summary that contradicts the source document

Validation is where you bridge the gap between "structurally correct" and "actually correct." Pydantic gives you the tools to build validation layers that catch these issues before bad data reaches your database or your users.

Field-Level Validators

Start with individual field constraints. Pydantic offers two approaches: Field constraints for simple rules and field_validator for complex logic:

from pydantic import BaseModel, Field, field_validator
from typing import List, Optional
import re

class ExtractedCompany(BaseModel):
    name: str = Field(min_length=2, max_length=200)
    ticker: Optional[str] = Field(default=None, pattern=r"^[A-Z]{1,5}$")
    employee_count: Optional[int] = Field(default=None, ge=1, le=10_000_000)
    founded_year: Optional[int] = Field(default=None, ge=1600, le=2026)
    website: Optional[str] = None
    revenue_usd: Optional[float] = Field(default=None, ge=0)

    @field_validator("name")
    @classmethod
    def clean_company_name(cls, v: str) -> str:
        # Remove common LLM artifacts
        v = v.strip().strip('"').strip("'")
        # Reject obviously wrong names
        if v.lower() in {"n/a", "unknown", "none", "null", "company"}:
            raise ValueError(f"'{v}' is not a valid company name")
        return v

    @field_validator("website")
    @classmethod
    def validate_url(cls, v: Optional[str]) -> Optional[str]:
        if v is None:
            return v
        url_pattern = re.compile(
            r"^https?://[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?"
            r"(\.[a-zA-Z]{2,})+(/.*)?$"
        )
        if not url_pattern.match(v):
            raise ValueError(f"Invalid URL format: '{v}'")
        return v

The founded_year field rejects years before 1600 and after 2026. This catches the common hallucination where the model invents a founding year that is clearly wrong.
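To see these constraints fire, here is a minimal runnable sketch. CompanySketch is a trimmed-down stand-in for ExtractedCompany, not part of the pipeline above:

```python
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

class CompanySketch(BaseModel):  # trimmed stand-in for ExtractedCompany
    name: str = Field(min_length=2, max_length=200)
    founded_year: Optional[int] = Field(default=None, ge=1600, le=2026)

# A hallucinated founding year is rejected at parse time
try:
    CompanySketch(name="Acme Corp", founded_year=215)
except ValidationError as e:
    print(f"rejected: {e.error_count()} validation error(s)")

# Valid data parses normally
print(CompanySketch(name="Acme Corp", founded_year=2015).founded_year)  # 2015
```

The key point: the bad value never becomes a Python object. Validation failures surface as a `ValidationError` you can catch, log, or feed back to the model for a retry.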

Cross-Field Validation

Many business rules involve relationships between fields. Use model_validator to enforce them:

from pydantic import model_validator

class JobPosting(BaseModel):
    title: str
    company: str
    salary_min: Optional[float] = None
    salary_max: Optional[float] = None
    salary_currency: str = "USD"
    experience_min_years: Optional[int] = Field(default=None, ge=0)
    experience_max_years: Optional[int] = Field(default=None, ge=0)
    remote: bool = False
    location: Optional[str] = None

    @model_validator(mode="after")
    def validate_salary_range(self) -> "JobPosting":
        if self.salary_min is not None and self.salary_max is not None:
            if self.salary_min > self.salary_max:
                raise ValueError(
                    f"salary_min ({self.salary_min}) cannot exceed "
                    f"salary_max ({self.salary_max})"
                )
            if self.salary_max > 10 * self.salary_min:
                raise ValueError(
                    f"Salary range too wide: {self.salary_min}-{self.salary_max}. "
                    "This likely indicates an extraction error."
                )
        return self

    @model_validator(mode="after")
    def validate_experience_range(self) -> "JobPosting":
        if self.experience_min_years is not None and self.experience_max_years is not None:
            if self.experience_min_years > self.experience_max_years:
                raise ValueError(
                    f"experience_min ({self.experience_min_years}) exceeds "
                    f"experience_max ({self.experience_max_years})"
                )
        return self

    @model_validator(mode="after")
    def validate_location_for_non_remote(self) -> "JobPosting":
        if not self.remote and not self.location:
            raise ValueError("Non-remote jobs must have a location specified")
        return self

The salary range validator catches a subtle issue: if the model extracts a min of 50000 and a max of 600000, the max is more than ten times the min, so the ratio check fires — a strong signal the model misread one of the figures, for example by dropping or adding a zero.
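Here is that rule in isolation. SalarySketch is an illustrative cut-down version of JobPosting, re-declared so the snippet runs standalone:

```python
from typing import Optional
from pydantic import BaseModel, ValidationError, model_validator

class SalarySketch(BaseModel):  # trimmed stand-in for JobPosting
    salary_min: Optional[float] = None
    salary_max: Optional[float] = None

    @model_validator(mode="after")
    def validate_salary_range(self) -> "SalarySketch":
        if self.salary_min is not None and self.salary_max is not None:
            if self.salary_min > self.salary_max:
                raise ValueError("salary_min cannot exceed salary_max")
            if self.salary_max > 10 * self.salary_min:
                raise ValueError("salary range too wide; likely extraction error")
        return self

# A 12x spread trips the sanity check even though both values are "valid" floats
try:
    SalarySketch(salary_min=50_000, salary_max=600_000)
except ValidationError as e:
    print(e.errors()[0]["msg"])
```

Note that a `ValueError` raised inside a `model_validator` is surfaced to callers as a Pydantic `ValidationError`, so one `except` clause handles both field-level and cross-field failures.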


Domain-Specific Constraint Libraries

For complex domains, build a reusable validation library:

class MedicalValidators:
    """Validation functions for medical data extraction."""

    VALID_BLOOD_TYPES = {"A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"}

    @staticmethod
    def validate_icd10_code(code: str) -> str:
        """Validate ICD-10 diagnosis code format."""
        pattern = re.compile(r"^[A-Z]\d{2}(\.\d{1,4})?$")
        if not pattern.match(code):
            raise ValueError(f"Invalid ICD-10 code format: '{code}'")
        return code

    @staticmethod
    def validate_npi(npi: str) -> str:
        """Validate National Provider Identifier (10-digit)."""
        if not re.match(r"^\d{10}$", npi):
            raise ValueError(f"NPI must be exactly 10 digits, got: '{npi}'")
        return npi

class PatientRecord(BaseModel):
    name: str
    date_of_birth: str
    blood_type: Optional[str] = None
    diagnoses: List[str] = Field(default_factory=list)
    provider_npi: Optional[str] = None

    @field_validator("blood_type")
    @classmethod
    def check_blood_type(cls, v: Optional[str]) -> Optional[str]:
        if v and v not in MedicalValidators.VALID_BLOOD_TYPES:
            raise ValueError(f"Invalid blood type: '{v}'")
        return v

    @field_validator("diagnoses")
    @classmethod
    def check_icd_codes(cls, v: List[str]) -> List[str]:
        return [MedicalValidators.validate_icd10_code(code) for code in v]
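A quick sanity check of the ICD-10 format rule. The validator is re-declared here so the snippet runs standalone; note it checks format only, not whether the code exists in the official ICD-10 index:

```python
import re

def validate_icd10_code(code: str) -> str:
    """Validate ICD-10 diagnosis code format (same rule as MedicalValidators)."""
    if not re.match(r"^[A-Z]\d{2}(\.\d{1,4})?$", code):
        raise ValueError(f"Invalid ICD-10 code format: '{code}'")
    return code

print(validate_icd10_code("E11.9"))  # well-formed: letter, two digits, optional decimals
for bad in ["e11.9", "11E.9", "E11.99999"]:
    try:
        validate_icd10_code(bad)
    except ValueError as e:
        print(e)
```

For a production medical pipeline you would pair this format check with a lookup against a current ICD-10 code list, since a string can be well-formed yet still not correspond to any real diagnosis.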

Data Quality Scoring

Instead of binary pass/fail, assign a quality score to each extraction:

from dataclasses import dataclass

@dataclass
class QualityReport:
    score: float  # 0.0 to 1.0
    issues: List[str]
    field_scores: dict[str, float]

def assess_extraction_quality(extracted: BaseModel, source_text: str) -> QualityReport:
    """Score the quality of an extraction result."""
    issues = []
    field_scores = {}
    total_fields = 0
    filled_fields = 0

    for field_name in type(extracted).model_fields:
        total_fields += 1
        value = getattr(extracted, field_name)

        if value is None or value == [] or value == "":
            field_scores[field_name] = 0.0
        else:
            filled_fields += 1
            # Check if extracted value appears in source text
            str_value = str(value).lower()
            if len(str_value) > 3 and str_value not in source_text.lower():
                issues.append(f"'{field_name}' value '{value}' not found in source text")
                field_scores[field_name] = 0.5  # Suspicious but not necessarily wrong
            else:
                field_scores[field_name] = 1.0

    completeness = filled_fields / total_fields if total_fields > 0 else 0
    accuracy = sum(field_scores.values()) / total_fields if total_fields > 0 else 0
    overall = (completeness * 0.4) + (accuracy * 0.6)

    if completeness < 0.5:
        issues.append(f"Low completeness: only {filled_fields}/{total_fields} fields filled")

    return QualityReport(score=overall, issues=issues, field_scores=field_scores)

The source text check is powerful: if the extracted value does not appear anywhere in the input document, it is likely hallucinated. This catches the most dangerous failure mode of LLM extraction.
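That check can be pulled out into a tiny helper — `appears_in_source` is a hypothetical name mirroring the logic inside `assess_extraction_quality`:

```python
def appears_in_source(value: object, source_text: str, min_len: int = 4) -> bool:
    """Naive grounding check: does the stringified value occur verbatim
    in the source? Very short values are skipped to avoid false positives."""
    s = str(value).lower()
    return len(s) < min_len or s in source_text.lower()

source = "Acme Corp (ACME) was founded in 2015."
print(appears_in_source("Acme Corp", source))  # True: grounded in the text
print(appears_in_source("Globex", source))     # False: likely hallucinated
print(appears_in_source(2015, source))         # True: "2015" occurs verbatim
```

The verbatim-substring approach is deliberately naive: a value the model reformatted ("1,000" vs "1000") or paraphrased will be flagged even when it is correct. That is why the scoring function treats a miss as suspicious (0.5) rather than rejecting outright.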

Putting It All Together: Validated Extraction Pipeline

import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

def validated_extract(text: str) -> tuple[ExtractedCompany | None, QualityReport]:
    """Extract, validate, and score a company extraction."""
    try:
        company = client.chat.completions.create(
            model="gpt-4o",
            response_model=ExtractedCompany,
            max_retries=3,
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Extract company information. Only include data "
                        "explicitly stated in the text. Use null for missing fields."
                    )
                },
                {"role": "user", "content": text}
            ],
        )

        quality = assess_extraction_quality(company, text)

        if quality.score < 0.3:
            return None, quality  # Reject low-quality extractions

        return company, quality

    except Exception as e:
        return None, QualityReport(
            score=0.0,
            issues=[f"Extraction failed: {str(e)}"],
            field_scores={},
        )

# Usage
text = "Acme Corp (ACME) was founded in 2015. They have about 500 employees."
company, quality = validated_extract(text)
if company:
    print(f"Extracted: {company.name} (quality: {quality.score:.2f})")
    for issue in quality.issues:
        print(f"  Warning: {issue}")
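The `max_retries=3` argument works because instructor feeds the Pydantic validation error back to the model and re-asks. If you are not using instructor, you can hand-roll that loop; this sketch uses an illustrative `CompanySketch` model and a `call_llm` stand-in for your actual client:

```python
from typing import Callable, Optional
from pydantic import BaseModel, Field, ValidationError

class CompanySketch(BaseModel):  # trimmed stand-in for ExtractedCompany
    name: str = Field(min_length=2)
    founded_year: Optional[int] = Field(default=None, ge=1600, le=2026)

def extract_with_retries(
    call_llm: Callable[[str], str],
    model_cls: type[BaseModel],
    text: str,
    max_retries: int = 3,
):
    """On validation failure, append the error to the prompt and re-ask."""
    prompt = f"Extract company info as JSON from:\n{text}"
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            return model_cls.model_validate_json(raw)
        except ValidationError as e:
            prompt += f"\n\nYour previous answer failed validation:\n{e}\nReturn corrected JSON."
    return None  # give up after max_retries attempts

# Simulated model: the first answer has an impossible year, the retry is corrected
answers = iter([
    '{"name": "Acme Corp", "founded_year": 215}',
    '{"name": "Acme Corp", "founded_year": 2015}',
])
result = extract_with_retries(lambda _: next(answers), CompanySketch, "Acme Corp, founded 2015")
print(result)
```

The crucial design choice is that the validation error message goes back into the prompt: the model sees exactly which constraint it violated, which makes the retry far more likely to succeed than simply re-asking the original question.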

FAQ

How strict should my validators be?

Start permissive and tighten based on data. Track which validators trigger most often and examine the rejected data manually. If a validator rejects more than 20% of extractions, it is probably too strict — either loosen it or improve your extraction prompt. Production systems typically stabilize at 3-5% rejection rate.

Should I validate LLM outputs differently than user inputs?

Yes. User input validation focuses on security (SQL injection, XSS). LLM output validation focuses on correctness (hallucination detection, domain constraint enforcement). You still need basic security checks on LLM output if it is ever rendered as HTML or used in database queries, but the primary concern is data accuracy.

How do I handle validation failures in a user-facing application?

Never show raw validation errors to end users. Map internal validation failures to user-friendly messages: "We could not extract all the details from this document. Please review the highlighted fields." Log the full validation context for debugging, and provide a manual override for users to correct extracted values.


#Validation #DataQuality #Pydantic #BusinessRules #Python #AgenticAI #LearnAI #AIEngineering


CallSphere Team

Expert insights on AI voice agents and customer communication automation.
