
Handling Structured Output Failures: Retries, Fallbacks, and Partial Parsing

Build resilient structured output systems that handle LLM failures gracefully. Learn retry strategies with exponential backoff, fallback schemas, partial result recovery, and graceful degradation patterns.

Why Structured Outputs Fail

Even with OpenAI's constrained decoding and Pydantic validation, structured output extraction fails in production. Common failure modes include:

  • API errors: Rate limits (429), server errors (500), timeouts
  • Validation errors: The model returns valid JSON that fails your business logic validators
  • Content refusals: The model refuses to process the input due to safety filters
  • Malformed output: Rare with strict mode, but possible with JSON mode or local models
  • Hallucination: The JSON is valid and schema-conforming, but the extracted values are wrong

A production system must handle every one of these gracefully. Crashing on the first validation error is not acceptable.

Retry with Exponential Backoff

The simplest resilience pattern is to retry transient API failures with increasing delays:

import functools
import random
import time
from typing import Callable, TypeVar

from openai import APIConnectionError, APITimeoutError, RateLimitError

T = TypeVar("T")

def retry_with_backoff(
    func: Callable[..., T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retryable_errors: tuple = (RateLimitError, APITimeoutError, APIConnectionError),
) -> Callable[..., T]:
    """Wrap a function so transient failures are retried with exponential backoff."""

    @functools.wraps(func)  # Preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs) -> T:
        last_error = None
        for attempt in range(max_retries):
            try:
                return func(*args, **kwargs)
            except retryable_errors as e:
                last_error = e
                if attempt == max_retries - 1:
                    break  # Out of attempts; re-raise below without sleeping
                delay = min(base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay)
                print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay:.1f}s")
                time.sleep(delay)
            except Exception:
                raise  # Non-retryable errors propagate immediately
        raise last_error

    return wrapper

The jitter (random.uniform(0, 1)) prevents thundering herd problems when multiple processes hit rate limits simultaneously.
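A self-contained usage sketch, with a simulated transient error standing in for the OpenAI exceptions and delays shrunk so the demo runs instantly:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class TransientError(Exception):
    """Stands in for RateLimitError / APITimeoutError in this demo."""

def retry_with_backoff(func: Callable[..., T], max_retries: int = 5,
                       base_delay: float = 0.01, max_delay: float = 0.05) -> Callable[..., T]:
    def wrapper(*args, **kwargs) -> T:
        last_error = None
        for attempt in range(max_retries):
            try:
                return func(*args, **kwargs)
            except TransientError as e:
                last_error = e
                time.sleep(min(base_delay * 2 ** attempt + random.uniform(0, base_delay), max_delay))
        raise last_error
    return wrapper

calls = {"count": 0}

def flaky_extract() -> str:
    calls["count"] += 1
    if calls["count"] < 3:  # Fail twice, then succeed
        raise TransientError("simulated 429")
    return "ok"

result = retry_with_backoff(flaky_extract)()
```

The wrapped call absorbs the first two simulated rate-limit errors and returns normally on the third attempt.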

Validation-Aware Retries with Instructor

Instructor's built-in retry mechanism feeds validation errors back to the model. Custom Pydantic validators let you define what counts as a failure:

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field, field_validator
from typing import List

client = instructor.from_openai(OpenAI())

class StrictProduct(BaseModel):
    name: str
    price: float = Field(gt=0)
    currency: str = Field(pattern=r"^[A-Z]{3}$")
    categories: List[str] = Field(min_length=1, max_length=5)

    @field_validator("name")
    @classmethod
    def name_not_generic(cls, v: str) -> str:
        generic_names = {"product", "item", "thing", "unknown"}
        if v.lower().strip() in generic_names:
            raise ValueError(f"Name '{v}' is too generic. Extract the actual product name.")
        return v

# Instructor automatically retries with validation errors in the prompt
product = client.chat.completions.create(
    model="gpt-4o",
    response_model=StrictProduct,
    max_retries=3,  # Will retry up to 3 times on validation failure
    messages=[
        {"role": "user", "content": "The new widget costs fifteen dollars."}
    ],
)

On each retry, the model sees its previous output and the exact validation error, allowing it to self-correct.


Fallback Schemas

When a detailed extraction fails repeatedly, fall back to a simpler schema that captures partial data:

class DetailedExtraction(BaseModel):
    company_name: str
    founding_year: int
    revenue: float
    employee_count: int
    headquarters_city: str
    headquarters_country: str
    industry: str
    ceo_name: str

class FallbackExtraction(BaseModel):
    company_name: str
    raw_details: str = Field(description="Any other details as free text")
    extraction_complete: bool = False

def extract_with_fallback(text: str) -> DetailedExtraction | FallbackExtraction:
    """Try detailed extraction first, fall back to simple on failure."""
    try:
        return client.chat.completions.create(
            model="gpt-4o",
            response_model=DetailedExtraction,
            max_retries=2,
            messages=[
                {"role": "system", "content": "Extract company details precisely."},
                {"role": "user", "content": text}
            ],
        )
    except Exception as e:
        print(f"Detailed extraction failed: {e}. Trying fallback.")
        return client.chat.completions.create(
            model="gpt-4o",
            response_model=FallbackExtraction,
            max_retries=1,
            messages=[
                {"role": "system", "content": "Extract whatever company information you can."},
                {"role": "user", "content": text}
            ],
        )

The fallback captures the company name (almost always extractable) and dumps everything else into free text. This is better than returning nothing — downstream systems can still use the partial data.

Partial Result Recovery

When extracting a list of items, some may validate while others fail. Recover the valid ones:

from pydantic import ValidationError

class Transaction(BaseModel):
    date: str = Field(pattern=r"^\d{4}-\d{2}-\d{2}$")
    amount: float = Field(gt=0)
    merchant: str
    category: str

def extract_transactions_with_recovery(raw_items: list[dict]) -> tuple[list[Transaction], list[dict]]:
    """Parse a list of raw dicts, separating valid from invalid."""
    valid = []
    invalid = []

    for item in raw_items:
        try:
            valid.append(Transaction.model_validate(item))
        except ValidationError as e:
            invalid.append({"data": item, "errors": e.errors()})

    return valid, invalid

# Example usage after getting raw JSON from LLM
import json

raw_response = '''[
    {"date": "2025-03-15", "amount": 42.50, "merchant": "Coffee Shop", "category": "food"},
    {"date": "March 15", "amount": -10, "merchant": "", "category": "other"},
    {"date": "2025-03-16", "amount": 120.00, "merchant": "Gas Station", "category": "transport"}
]'''

raw_items = json.loads(raw_response)
valid, invalid = extract_transactions_with_recovery(raw_items)
print(f"Recovered {len(valid)} of {len(raw_items)} transactions")
print(f"Failed items: {len(invalid)}")
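Invalid items need not be discarded outright. Before a second LLM pass, cheap deterministic repairs are worth trying; a hedged sketch (the `normalize_item` helper is illustrative, and it assumes the statement year is known so partial dates can be completed):

```python
import re
from datetime import datetime

def normalize_item(item: dict, year: int = 2025) -> dict:
    """Attempt deterministic repairs before re-validating: coerce
    'March 15'-style dates to ISO format using a known statement year.
    Anything not repairable here is left for an LLM retry instead."""
    fixed = dict(item)
    raw_date = str(fixed.get("date", ""))
    if not re.match(r"^\d{4}-\d{2}-\d{2}$", raw_date):
        try:
            parsed = datetime.strptime(f"{raw_date} {year}", "%B %d %Y")
            fixed["date"] = parsed.strftime("%Y-%m-%d")
        except ValueError:
            pass  # Not a 'Month day' string; queue for an LLM retry
    return fixed
```

Running the invalid items through `normalize_item` and re-calling `Transaction.model_validate` often recovers entries that failed only on formatting, saving a round trip to the model.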

Graceful Degradation Pipeline

Combine all patterns into a complete resilience pipeline:

from dataclasses import dataclass
from typing import Any

@dataclass
class ExtractionResult:
    data: Any
    quality: str  # "full", "partial", "fallback", "failed"
    errors: list[str]
    attempts: int

def resilient_extract(text: str) -> ExtractionResult:
    errors = []

    # Attempt 1: Full extraction with strict model
    try:
        result = client.chat.completions.create(
            model="gpt-4o",
            response_model=DetailedExtraction,
            max_retries=2,
            messages=[
                {"role": "system", "content": "Extract all company details."},
                {"role": "user", "content": text}
            ],
        )
        return ExtractionResult(data=result, quality="full", errors=[], attempts=1)
    except Exception as e:
        errors.append(f"Full extraction failed: {e}")

    # Attempt 2: Fallback schema with cheaper model
    try:
        result = client.chat.completions.create(
            model="gpt-4o-mini",
            response_model=FallbackExtraction,
            max_retries=1,
            messages=[
                {"role": "system", "content": "Extract basic company info."},
                {"role": "user", "content": text}
            ],
        )
        return ExtractionResult(data=result, quality="fallback", errors=errors, attempts=2)
    except Exception as e:
        errors.append(f"Fallback extraction failed: {e}")

    return ExtractionResult(data=None, quality="failed", errors=errors, attempts=2)

FAQ

How many retries should I configure for production systems?

For API errors (rate limits, timeouts), use 3-5 retries with exponential backoff. For validation errors via Instructor, use 2-3 retries — if the model cannot produce valid output in 3 attempts, more retries rarely help and you should fall back to a simpler schema. Total retry budget should stay under 30 seconds for user-facing applications.
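To check a configuration against that 30-second budget, sum the worst-case delays produced by the backoff formula used earlier, counting jitter at its 1-second maximum (a small sketch; the helper name is illustrative):

```python
def worst_case_backoff_total(max_retries: int, base_delay: float,
                             max_delay: float, jitter: float = 1.0) -> float:
    """Upper bound on total sleep time: sum of
    min(base_delay * 2**attempt + jitter, max_delay) over all attempts."""
    return sum(min(base_delay * 2 ** attempt + jitter, max_delay)
               for attempt in range(max_retries))

print(worst_case_backoff_total(5, 1.0, 60.0))  # 2 + 3 + 5 + 9 + 17 = 36.0
```

Note that five retries at a 1-second base can already exceed the 30-second budget, so user-facing paths should cap `max_delay` aggressively or stick to 3-4 retries.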

How do I log structured output failures for debugging?

Log the full context: input text, raw LLM response, validation errors, retry count, and which fallback stage succeeded. Use structured logging (JSON format) so you can query failures by error type, schema, and model. This data is invaluable for identifying which validators are too strict and which input patterns cause consistent failures.
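A minimal sketch of such a log record (the field names are illustrative, not a standard):

```python
import json
import logging

logger = logging.getLogger("extraction")

def log_extraction_failure(input_text: str, raw_response: str,
                           errors: list[str], retry_count: int,
                           fallback_stage: str) -> dict:
    """Emit one JSON log line with the full failure context, so failures
    can later be queried by error type, schema, and fallback stage."""
    record = {
        "event": "extraction_failure",
        "input_preview": input_text[:200],  # Truncate to keep log size bounded
        "raw_response": raw_response,
        "errors": errors,
        "retry_count": retry_count,
        "fallback_stage": fallback_stage,
    }
    logger.warning(json.dumps(record))
    return record
```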

Should I use circuit breakers for LLM extraction?

Yes, especially in high-throughput systems. If the LLM API returns errors on 50%+ of recent requests, stop sending new requests for a cooldown period (30-60 seconds). This prevents cascading failures and wasted API spend. Libraries like tenacity and pybreaker make this easy to implement.
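If you would rather not pull in a library, the core of a circuit breaker is small. A minimal sketch (half-open behavior simplified to "allow one trial call after the cooldown"):

```python
import time

class CircuitBreaker:
    """Open after `failure_threshold` consecutive failures; reject calls
    during the cooldown; then allow a trial call (half-open)."""

    def __init__(self, failure_threshold: int = 5, cooldown: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # Circuit closed; proceed normally
        return time.monotonic() - self.opened_at >= self.cooldown  # Half-open trial

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # Close the circuit again

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # Trip the breaker
```

Call `allow()` before each extraction request, and `record_success()` / `record_failure()` afterward; a real implementation would also distinguish half-open trial calls and track a failure rate rather than a consecutive count.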


#ErrorHandling #Resilience #StructuredOutputs #Production #Python #AgenticAI #LearnAI #AIEngineering


CallSphere Team

Expert insights on AI voice agents and customer communication automation.
