Health Monitoring for AI Agent Dependencies: Checking LLM, Database, and Tool Availability
Build comprehensive health monitoring for AI agent systems that checks LLM providers, databases, and tool integrations. Learn health check patterns, dependency graphs, degraded state detection, and alerting.
Your Agent Is Only as Healthy as Its Weakest Dependency
An AI agent typically depends on at least three external services: an LLM provider for reasoning, a database for state, and one or more tool APIs for actions. If any of these goes down and the agent does not know, it will make broken promises — attempting tool calls that fail, returning stale data, or hanging on unresponsive APIs.
Health monitoring gives the agent system awareness of its own dependency status, enabling proactive degradation and fast alerting before users are affected.
Defining Health Check Contracts
Every dependency gets a health check that returns a structured result.
```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
import time


class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"


@dataclass
class HealthCheckResult:
    name: str
    status: HealthStatus
    latency_ms: float
    message: str = ""
    checked_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    metadata: dict = field(default_factory=dict)


class HealthCheck:
    def __init__(self, name: str, timeout_seconds: float = 5.0,
                 degraded_threshold_ms: float = 2000.0):
        self.name = name
        self.timeout = timeout_seconds
        self.degraded_threshold_ms = degraded_threshold_ms

    async def check(self) -> HealthCheckResult:
        raise NotImplementedError
```
LLM Provider Health Check
Check the LLM by sending a minimal prompt and verifying the response. This catches rate limits, authentication issues, and model availability.
```python
import asyncio

import httpx


class LLMHealthCheck(HealthCheck):
    def __init__(self, provider_name: str, api_url: str, api_key: str,
                 model: str, **kwargs):
        super().__init__(name=f"llm_{provider_name}", **kwargs)
        self.api_url = api_url
        self.api_key = api_key
        self.model = model

    async def check(self) -> HealthCheckResult:
        start = time.monotonic()
        try:
            async with httpx.AsyncClient() as client:
                resp = await client.post(
                    self.api_url,
                    json={
                        "model": self.model,
                        "messages": [{"role": "user", "content": "ping"}],
                        "max_tokens": 5,
                    },
                    headers={"Authorization": f"Bearer {self.api_key}"},
                    timeout=self.timeout,
                )
            latency = (time.monotonic() - start) * 1000
            if resp.status_code == 200:
                status = (HealthStatus.DEGRADED if latency > self.degraded_threshold_ms
                          else HealthStatus.HEALTHY)
                return HealthCheckResult(
                    name=self.name, status=status, latency_ms=latency,
                    message=f"Model {self.model} responding",
                )
            elif resp.status_code == 429:
                return HealthCheckResult(
                    name=self.name, status=HealthStatus.DEGRADED,
                    latency_ms=latency, message="Rate limited",
                )
            else:
                return HealthCheckResult(
                    name=self.name, status=HealthStatus.UNHEALTHY,
                    latency_ms=latency, message=f"HTTP {resp.status_code}",
                )
        except Exception as exc:
            latency = (time.monotonic() - start) * 1000
            return HealthCheckResult(
                name=self.name, status=HealthStatus.UNHEALTHY,
                latency_ms=latency, message=str(exc),
            )
```
Database Health Check
Test the database with a lightweight round-trip query. The check below verifies connection acquisition and query execution; add a separate write probe if your agent depends on write availability.
```python
class PostgresHealthCheck(HealthCheck):
    def __init__(self, pool, **kwargs):
        super().__init__(name="postgres", **kwargs)
        self.pool = pool  # asyncpg connection pool

    async def check(self) -> HealthCheckResult:
        start = time.monotonic()
        try:
            async with self.pool.acquire() as conn:
                row = await conn.fetchval("SELECT 1")
            if row != 1:
                raise RuntimeError("Unexpected result for SELECT 1")
            latency = (time.monotonic() - start) * 1000
            status = (HealthStatus.DEGRADED if latency > self.degraded_threshold_ms
                      else HealthStatus.HEALTHY)
            return HealthCheckResult(
                name=self.name, status=status, latency_ms=latency,
                metadata={"pool_size": self.pool.get_size()},
            )
        except Exception as exc:
            latency = (time.monotonic() - start) * 1000
            return HealthCheckResult(
                name=self.name, status=HealthStatus.UNHEALTHY,
                latency_ms=latency, message=str(exc),
            )
```
Aggregate Health Monitor
The monitor runs all checks on a schedule and exposes the aggregate status.
```python
class AgentHealthMonitor:
    def __init__(self):
        self.checks: list[HealthCheck] = []
        self.latest_results: dict[str, HealthCheckResult] = {}
        self._running = False

    def register(self, check: HealthCheck):
        self.checks.append(check)

    async def run_all(self) -> dict[str, HealthCheckResult]:
        tasks = [check.check() for check in self.checks]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        for i, result in enumerate(results):
            if isinstance(result, Exception):
                result = HealthCheckResult(
                    name=self.checks[i].name,
                    status=HealthStatus.UNHEALTHY,
                    latency_ms=0,
                    message=f"Check itself failed: {result}",
                )
            self.latest_results[result.name] = result
        return self.latest_results

    def overall_status(self) -> HealthStatus:
        if not self.latest_results:
            return HealthStatus.UNHEALTHY
        statuses = [r.status for r in self.latest_results.values()]
        if any(s == HealthStatus.UNHEALTHY for s in statuses):
            return HealthStatus.UNHEALTHY
        if any(s == HealthStatus.DEGRADED for s in statuses):
            return HealthStatus.DEGRADED
        return HealthStatus.HEALTHY

    async def start_periodic(self, interval_seconds: float = 30.0):
        self._running = True
        while self._running:
            await self.run_all()
            await asyncio.sleep(interval_seconds)

    def stop(self):
        self._running = False
```
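As a usage sketch, here is the aggregation pattern end to end. To keep the example standalone it re-declares trimmed stand-ins for the classes above and uses a stub check (the `StubCheck` name and fixed statuses are ours, purely illustrative); in a real system you would register `LLMHealthCheck` and `PostgresHealthCheck` instances instead.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum


class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"


@dataclass
class HealthCheckResult:
    name: str
    status: HealthStatus
    latency_ms: float


class StubCheck:
    """Stand-in for a real HealthCheck subclass: always returns a fixed status."""
    def __init__(self, name: str, status: HealthStatus):
        self.name = name
        self._status = status

    async def check(self) -> HealthCheckResult:
        return HealthCheckResult(name=self.name, status=self._status, latency_ms=1.0)


class AgentHealthMonitor:
    # Trimmed copy of the monitor above: run every check, keep the latest results.
    def __init__(self):
        self.checks = []
        self.latest_results = {}

    def register(self, check):
        self.checks.append(check)

    async def run_all(self):
        results = await asyncio.gather(*(c.check() for c in self.checks))
        for r in results:
            self.latest_results[r.name] = r
        return self.latest_results

    def overall_status(self) -> HealthStatus:
        statuses = [r.status for r in self.latest_results.values()]
        if not statuses or HealthStatus.UNHEALTHY in statuses:
            return HealthStatus.UNHEALTHY
        if HealthStatus.DEGRADED in statuses:
            return HealthStatus.DEGRADED
        return HealthStatus.HEALTHY


monitor = AgentHealthMonitor()
monitor.register(StubCheck("llm_openai", HealthStatus.HEALTHY))
monitor.register(StubCheck("postgres", HealthStatus.DEGRADED))
asyncio.run(monitor.run_all())
print(monitor.overall_status().value)  # "degraded": the worst status wins
```

The aggregate status is simply the worst individual status, which is what a load balancer or pager cares about.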
Exposing Health via API
Expose the health status through an HTTP endpoint for load balancers and monitoring tools.
```python
from datetime import datetime, timezone

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()
monitor = AgentHealthMonitor()


@app.get("/health")
async def health_endpoint():
    results = await monitor.run_all()
    overall = monitor.overall_status()
    # Return the status in the HTTP code too, so load balancers can
    # take the instance out of rotation without parsing the body.
    status_code = 200 if overall == HealthStatus.HEALTHY else 503
    return JSONResponse(
        status_code=status_code,
        content={
            "status": overall.value,
            "checks": {name: r.status.value for name, r in results.items()},
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    )


@app.get("/health/detailed")
async def detailed_health():
    results = await monitor.run_all()
    return {
        name: {
            "status": r.status.value,
            "latency_ms": r.latency_ms,
            "message": r.message,
            "metadata": r.metadata,
        }
        for name, r in results.items()
    }
```
FAQ
How often should health checks run?
For internal monitoring, every 15 to 30 seconds is typical. For load balancer health checks, match the load balancer's probe interval (usually 10 seconds). Avoid checking too frequently — a health check that sends a real LLM prompt every 5 seconds will burn tokens and count against your rate limit. Use the lightest possible check that still validates real connectivity.
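To make the cost of over-checking concrete, a quick back-of-envelope (our own illustrative numbers):

```python
# How many health-check requests per day does a given probe interval produce?
SECONDS_PER_DAY = 24 * 60 * 60


def pings_per_day(interval_s: float) -> int:
    return int(SECONDS_PER_DAY / interval_s)


print(pings_per_day(5))   # 17280 requests/day counted against your rate limit
print(pings_per_day(30))  # 2880 -- six times fewer for the same signal
```

Every one of those requests also consumes prompt and completion tokens if the check sends a real LLM prompt, so the interval is a direct cost and rate-limit knob.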
Should health checks use the same credentials as production traffic?
Yes, always. A health check that uses separate credentials might pass while production traffic fails due to key expiration or permission changes. Use the same API keys, connection pools, and network paths. The health check should exercise the exact same code path as a real request, just with minimal payload.
How do I handle health check flapping?
Require multiple consecutive failures before marking a dependency as unhealthy (a "failure threshold" of 3 is common). Similarly, require multiple consecutive successes before promoting from unhealthy back to healthy. This prevents transient network blips from triggering degradation mode and then immediately recovering, which confuses users and generates alert noise.
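The consecutive-failure/consecutive-success rule can be sketched as a small state holder (the `StatusDebouncer` name and thresholds are ours, not from a specific library):

```python
class StatusDebouncer:
    """Flip to unhealthy only after N consecutive failures, and back to
    healthy only after M consecutive successes, to suppress flapping."""

    def __init__(self, failure_threshold: int = 3, success_threshold: int = 2):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self._failures = 0
        self._successes = 0
        self.healthy = True

    def record(self, check_passed: bool) -> bool:
        if check_passed:
            self._successes += 1
            self._failures = 0
            if not self.healthy and self._successes >= self.success_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self.healthy and self._failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


d = StatusDebouncer()
d.record(False)  # a single blip: still considered healthy
d.record(True)
print(d.healthy)  # True
```

Wrap each check's result in a debouncer like this before feeding it into the aggregate status, so one timeout does not page anyone.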
CallSphere Team
Expert insights on AI voice agents and customer communication automation.