Health Monitoring for AI Agent Dependencies: Checking LLM, Database, and Tool Availability

Build comprehensive health monitoring for AI agent systems that checks LLM providers, databases, and tool integrations. Learn health check patterns, dependency graphs, degraded state detection, and alerting.

Your Agent Is Only as Healthy as Its Weakest Dependency

An AI agent typically depends on at least three external services: an LLM provider for reasoning, a database for state, and one or more tool APIs for actions. If any of these goes down and the agent does not know, it will make broken promises — attempting tool calls that fail, returning stale data, or hanging on unresponsive APIs.

Health monitoring gives the agent system awareness of its own dependency status, enabling proactive degradation and fast alerting before users are affected.

Defining Health Check Contracts

Every dependency gets a health check that returns a structured result.

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
import time

class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"

def _utc_now() -> datetime:
    # datetime.utcnow() is deprecated since Python 3.12 and returns a naive
    # datetime; use an explicit, timezone-aware UTC timestamp instead.
    return datetime.now(timezone.utc)

@dataclass
class HealthCheckResult:
    name: str
    status: HealthStatus
    latency_ms: float
    message: str = ""
    checked_at: datetime = field(default_factory=_utc_now)
    metadata: dict = field(default_factory=dict)

class HealthCheck:
    def __init__(self, name: str, timeout_seconds: float = 5.0,
                 degraded_threshold_ms: float = 2000.0):
        self.name = name
        self.timeout = timeout_seconds
        self.degraded_threshold_ms = degraded_threshold_ms

    async def check(self) -> HealthCheckResult:
        raise NotImplementedError
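
The contract leaves timeout enforcement to each subclass. As a minimal sketch of how a tool dependency could satisfy it, the block below wraps an arbitrary async probe and enforces the timeout with `asyncio.wait_for`. `CallableHealthCheck` and the three probe functions are hypothetical names used only for illustration, and the types are repeated in trimmed form so the example runs on its own.

```python
import asyncio
import time
from dataclasses import dataclass
from enum import Enum
from typing import Awaitable, Callable


class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"


@dataclass
class HealthCheckResult:
    name: str
    status: HealthStatus
    latency_ms: float
    message: str = ""


class CallableHealthCheck:
    """Hypothetical helper: adapts any async probe to the check() contract."""

    def __init__(self, name: str, probe: Callable[[], Awaitable[None]],
                 timeout_seconds: float = 5.0,
                 degraded_threshold_ms: float = 2000.0):
        self.name = name
        self.probe = probe
        self.timeout = timeout_seconds
        self.degraded_threshold_ms = degraded_threshold_ms

    async def check(self) -> HealthCheckResult:
        start = time.monotonic()
        try:
            # Enforce the timeout here so a hanging probe cannot stall the caller.
            await asyncio.wait_for(self.probe(), timeout=self.timeout)
        except asyncio.TimeoutError:
            latency = (time.monotonic() - start) * 1000
            return HealthCheckResult(self.name, HealthStatus.UNHEALTHY, latency,
                                     f"Timed out after {self.timeout}s")
        except Exception as exc:
            latency = (time.monotonic() - start) * 1000
            return HealthCheckResult(self.name, HealthStatus.UNHEALTHY, latency,
                                     f"{type(exc).__name__}: {exc}")
        latency = (time.monotonic() - start) * 1000
        status = (HealthStatus.DEGRADED if latency > self.degraded_threshold_ms
                  else HealthStatus.HEALTHY)
        return HealthCheckResult(self.name, status, latency, "OK")


# Illustrative probes standing in for real tool API calls.
async def fast_probe() -> None:
    await asyncio.sleep(0)


async def hanging_probe() -> None:
    await asyncio.sleep(10)


async def failing_probe() -> None:
    raise ConnectionError("refused")
```

A probe signals success by returning and failure by raising, so any existing client call can usually be wrapped without changes.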

LLM Provider Health Check

Check the LLM by sending a minimal prompt and verifying the response. This catches rate limits, authentication issues, and model availability.

import httpx
import asyncio

class LLMHealthCheck(HealthCheck):
    def __init__(self, provider_name: str, api_url: str, api_key: str,
                 model: str, **kwargs):
        super().__init__(name=f"llm_{provider_name}", **kwargs)
        self.api_url = api_url
        self.api_key = api_key
        self.model = model

    async def check(self) -> HealthCheckResult:
        start = time.monotonic()
        try:
            async with httpx.AsyncClient() as client:
                resp = await client.post(
                    self.api_url,
                    json={
                        "model": self.model,
                        "messages": [{"role": "user", "content": "ping"}],
                        "max_tokens": 5,
                    },
                    headers={"Authorization": f"Bearer {self.api_key}"},
                    timeout=self.timeout,
                )
            latency = (time.monotonic() - start) * 1000

            if resp.status_code == 200:
                status = (HealthStatus.DEGRADED if latency > self.degraded_threshold_ms
                          else HealthStatus.HEALTHY)
                return HealthCheckResult(
                    name=self.name, status=status, latency_ms=latency,
                    message=f"Model {self.model} responding",
                )
            elif resp.status_code == 429:
                return HealthCheckResult(
                    name=self.name, status=HealthStatus.DEGRADED,
                    latency_ms=latency, message="Rate limited",
                )
            else:
                return HealthCheckResult(
                    name=self.name, status=HealthStatus.UNHEALTHY,
                    latency_ms=latency, message=f"HTTP {resp.status_code}",
                )
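        except Exception as exc:
            # Network errors and timeouts land here. Some exceptions have an
            # empty str(), so include the exception type in the message.
            latency = (time.monotonic() - start) * 1000
            return HealthCheckResult(
                name=self.name, status=HealthStatus.UNHEALTHY,
                latency_ms=latency, message=f"{type(exc).__name__}: {exc}",
            )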
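
The status mapping inside `check()` is easier to unit test when it is pulled out into a pure function that needs no network access. The sketch below shows one way to do that. `classify_llm_response` is a hypothetical name, and the separate message for 401 and 403 is an added assumption; the check above reports those responses as a generic `HTTP <code>`.

```python
from enum import Enum


class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"


def classify_llm_response(status_code: int, latency_ms: float,
                          degraded_threshold_ms: float = 2000.0
                          ) -> tuple[HealthStatus, str]:
    """Map an HTTP status and latency to a health status and message."""
    if status_code == 200:
        if latency_ms > degraded_threshold_ms:
            return HealthStatus.DEGRADED, "Slow response"
        return HealthStatus.HEALTHY, "Responding"
    if status_code == 429:
        # Rate limiting means the provider is reachable but capacity is reduced.
        return HealthStatus.DEGRADED, "Rate limited"
    if status_code in (401, 403):
        # Assumption: surface credential problems with a dedicated message.
        return HealthStatus.UNHEALTHY, "Authentication or permission failure"
    return HealthStatus.UNHEALTHY, f"HTTP {status_code}"
```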

Database Health Check

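Test that a connection can be acquired from the pool and that a trivial query executes. This check does not exercise writes, so a database that has become read-only (for example, a replica) would still pass.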

class PostgresHealthCheck(HealthCheck):
    def __init__(self, pool, **kwargs):
        super().__init__(name="postgres", **kwargs)
        self.pool = pool

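    async def check(self) -> HealthCheckResult:
        start = time.monotonic()
        try:
            # Bound the whole probe so an exhausted pool cannot hang the check.
            row = await asyncio.wait_for(self._probe(), timeout=self.timeout)
            if row != 1:
                # Explicit raise instead of assert, which is stripped under python -O.
                raise RuntimeError(f"Unexpected SELECT 1 result: {row!r}")
            latency = (time.monotonic() - start) * 1000
            status = (HealthStatus.DEGRADED if latency > self.degraded_threshold_ms
                      else HealthStatus.HEALTHY)
            return HealthCheckResult(
                name=self.name, status=status, latency_ms=latency,
                metadata={"pool_size": self.pool.get_size()},
            )
        except Exception as exc:
            latency = (time.monotonic() - start) * 1000
            return HealthCheckResult(
                name=self.name, status=HealthStatus.UNHEALTHY,
                latency_ms=latency, message=f"{type(exc).__name__}: {exc}",
            )

    async def _probe(self):
        # Acquire a pooled connection and run the cheapest possible query.
        async with self.pool.acquire() as conn:
            return await conn.fetchval("SELECT 1")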

Aggregate Health Monitor

The monitor runs all checks on a schedule and exposes the aggregate status.

class AgentHealthMonitor:
    def __init__(self):
        self.checks: list[HealthCheck] = []
        self.latest_results: dict[str, HealthCheckResult] = {}
        self._running = False

    def register(self, check: HealthCheck):
        self.checks.append(check)

    async def run_all(self) -> dict[str, HealthCheckResult]:
        tasks = [check.check() for check in self.checks]
        results = await asyncio.gather(*tasks, return_exceptions=True)

        for i, result in enumerate(results):
            if isinstance(result, Exception):
                result = HealthCheckResult(
                    name=self.checks[i].name,
                    status=HealthStatus.UNHEALTHY,
                    latency_ms=0,
                    message=f"Check itself failed: {result}",
                )
            self.latest_results[result.name] = result

        return self.latest_results

    def overall_status(self) -> HealthStatus:
        if not self.latest_results:
            return HealthStatus.UNHEALTHY
        statuses = [r.status for r in self.latest_results.values()]
        if any(s == HealthStatus.UNHEALTHY for s in statuses):
            return HealthStatus.UNHEALTHY
        if any(s == HealthStatus.DEGRADED for s in statuses):
            return HealthStatus.DEGRADED
        return HealthStatus.HEALTHY

    async def start_periodic(self, interval_seconds: float = 30.0):
        self._running = True
        while self._running:
            await self.run_all()
            await asyncio.sleep(interval_seconds)

    def stop(self):
        self._running = False
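
Because `start_periodic` loops until stopped, it usually needs to run as a background task next to the agent's own work. The sketch below shows one way to start and shut down such a loop. `MinimalMonitor` is a hypothetical stand-in with the same `start_periodic`/`stop` shape as `AgentHealthMonitor`, and `run_with_background_monitor` is an illustrative helper rather than part of the monitor.

```python
import asyncio


class MinimalMonitor:
    """Stand-in with the same start_periodic/stop shape as AgentHealthMonitor."""

    def __init__(self):
        self.runs = 0
        self._running = False

    async def run_all(self):
        # A real monitor would execute its registered checks here.
        self.runs += 1

    async def start_periodic(self, interval_seconds: float = 30.0):
        self._running = True
        while self._running:
            await self.run_all()
            await asyncio.sleep(interval_seconds)

    def stop(self):
        self._running = False


async def run_with_background_monitor(monitor, interval_seconds: float,
                                      work_seconds: float) -> int:
    # Schedule the loop as a task so it runs alongside the agent's own work.
    task = asyncio.create_task(monitor.start_periodic(interval_seconds))
    try:
        await asyncio.sleep(work_seconds)  # placeholder for serving requests
    finally:
        monitor.stop()
        # stop() alone only takes effect after the current sleep finishes,
        # so cancel the task to shut down promptly.
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass
    return monitor.runs
```

Cancelling the task in addition to calling `stop()` avoids waiting up to a full interval for the loop to notice the flag.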

Exposing Health via API

Expose the health status through an HTTP endpoint for load balancers and monitoring tools.

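from datetime import datetime, timezone

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()
monitor = AgentHealthMonitor()

@app.get("/health")
async def health_endpoint():
    results = await monitor.run_all()
    overall = monitor.overall_status()
    # Anything other than HEALTHY returns 503 so load balancers stop routing here.
    status_code = 200 if overall == HealthStatus.HEALTHY else 503
    # Return an explicit response so the computed status code is actually sent.
    return JSONResponse(
        status_code=status_code,
        content={
            "status": overall.value,
            "checks": {name: r.status.value for name, r in results.items()},
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    )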

@app.get("/health/detailed")
async def detailed_health():
    results = await monitor.run_all()
    return {
        name: {
            "status": r.status.value,
            "latency_ms": r.latency_ms,
            "message": r.message,
            "metadata": r.metadata,
        }
        for name, r in results.items()
    }
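
Both endpoints call `run_all()` on every request, so each probe triggers a real LLM call. One alternative is to let the periodic loop do the checking and have the endpoint serve the cached results, treating anything too old as a failure. The sketch below keeps that logic in a framework-free function. `build_health_response` and `max_age_seconds` are hypothetical names, the `"stale"` label is an added assumption, and the result type is repeated in trimmed form so the example runs on its own.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"


@dataclass
class HealthCheckResult:
    name: str
    status: HealthStatus
    latency_ms: float
    checked_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


def build_health_response(results: dict[str, HealthCheckResult],
                          max_age_seconds: float = 90.0,
                          now: Optional[datetime] = None) -> tuple[int, dict]:
    """Build (status_code, body) from cached results without re-running checks."""
    now = now or datetime.now(timezone.utc)
    checks = {}
    for name, result in results.items():
        age = (now - result.checked_at).total_seconds()
        # A result older than max_age_seconds means the monitor loop stalled.
        checks[name] = "stale" if age > max_age_seconds else result.status.value

    values = set(checks.values())
    if not checks or "stale" in values or "unhealthy" in values:
        overall = HealthStatus.UNHEALTHY
    elif "degraded" in values:
        overall = HealthStatus.DEGRADED
    else:
        overall = HealthStatus.HEALTHY

    status_code = 200 if overall is HealthStatus.HEALTHY else 503
    return status_code, {"status": overall.value, "checks": checks}
```

An endpoint could then pass `monitor.latest_results` to this function and return the resulting status code and body, which keeps probe traffic from multiplying LLM calls.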

FAQ

How often should health checks run?

For internal monitoring, every 15 to 30 seconds is typical. For load balancer health checks, the load balancer's own probe interval sets the rate (often somewhere between 5 and 30 seconds, depending on the product and its configuration). Avoid checking too frequently: a health check that sends a real LLM prompt every 5 seconds will burn tokens and count against your rate limit. Use the lightest possible check that still validates real connectivity.

Should health checks use the same credentials as production traffic?

Yes, always. A health check that uses separate credentials might pass while production traffic fails due to key expiration or permission changes. Use the same API keys, connection pools, and network paths. The health check should exercise the exact same code path as a real request, just with minimal payload.

How do I handle health check flapping?

Require multiple consecutive failures before marking a dependency as unhealthy (a "failure threshold" of 3 is common). Similarly, require multiple consecutive successes before promoting from unhealthy back to healthy. This prevents transient network blips from triggering degradation mode and then immediately recovering, which confuses users and generates alert noise.
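
The thresholds described above can be sketched as a small state machine that sits between the raw check result and the status the rest of the system sees. `FlapDamper` and its parameter names are hypothetical. As a simplification, this version treats only `UNHEALTHY` as a failure and lets `DEGRADED` pass straight through unless the dependency is currently reported as unhealthy.

```python
from enum import Enum


class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"


class FlapDamper:
    """Hypothetical helper: changes the reported status only after N in a row."""

    def __init__(self, failure_threshold: int = 3, success_threshold: int = 2):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.reported = HealthStatus.HEALTHY
        self._failures = 0
        self._successes = 0

    def observe(self, raw: HealthStatus) -> HealthStatus:
        """Feed one raw check result and return the damped status."""
        if raw is HealthStatus.UNHEALTHY:
            self._failures += 1
            self._successes = 0
            # Only report unhealthy after enough consecutive failures.
            if self._failures >= self.failure_threshold:
                self.reported = HealthStatus.UNHEALTHY
        else:
            self._successes += 1
            self._failures = 0
            if self.reported is HealthStatus.UNHEALTHY:
                # Recover only after enough consecutive successes.
                if self._successes >= self.success_threshold:
                    self.reported = raw
            else:
                self.reported = raw
        return self.reported
```

One damper instance per dependency keeps the counters independent, so a flapping tool API does not affect how the database status is reported.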



CallSphere Team
