Agent Reputation Systems: Tracking Reliability and Quality Across Multi-Agent Workflows
Build a reputation system that tracks agent reliability and output quality over time. Learn scoring mechanisms, trust propagation, penalty systems, and how to rehabilitate underperforming agents.
Why Track Agent Reputation?
In a multi-agent system with dozens of agents handling thousands of requests, you need to know which agents are reliable and which are degrading. Without reputation tracking, a malfunctioning agent can silently corrupt outputs for hours before anyone notices.
A reputation system continuously scores each agent based on its outputs, enabling automated decisions: route important tasks to high-reputation agents, quarantine agents whose scores drop below a threshold, adjust consensus weights based on track records, and trigger alerts when an agent's reputation trends downward.
The Reputation Score Model
```python
from dataclasses import dataclass
from datetime import datetime
from collections import deque
import statistics


@dataclass
class InteractionRecord:
    timestamp: str
    task_type: str
    success: bool
    quality_score: float  # 0.0 to 1.0
    latency_ms: float
    feedback_source: str  # "automated", "human", "peer_agent"


class AgentReputation:
    def __init__(
        self,
        agent_id: str,
        initial_score: float = 0.7,
        window_size: int = 100,
        decay_factor: float = 0.95,
    ):
        self.agent_id = agent_id
        self.score = initial_score
        self.window_size = window_size
        self.decay_factor = decay_factor
        self.history: deque[InteractionRecord] = deque(maxlen=window_size)
        self.total_interactions = 0
        self.penalties: list[dict] = []

    def record_interaction(self, record: InteractionRecord):
        self.history.append(record)
        self.total_interactions += 1
        self._recalculate_score()

    def _recalculate_score(self):
        if not self.history:
            return
        weights = []
        scores = []
        for i, record in enumerate(self.history):
            recency_weight = self.decay_factor ** (len(self.history) - 1 - i)
            source_weight = {
                "human": 1.5, "automated": 1.0, "peer_agent": 0.8
            }.get(record.feedback_source, 1.0)
            weight = recency_weight * source_weight
            score = record.quality_score if record.success else 0.0
            weights.append(weight)
            scores.append(score)
        total_weight = sum(weights)
        self.score = sum(s * w for s, w in zip(scores, weights)) / total_weight
        # Apply active penalties
        active_penalties = [
            p for p in self.penalties if not p.get("expired", False)
        ]
        for penalty in active_penalties:
            self.score *= (1 - penalty["severity"])

    def get_reliability_rate(self) -> float:
        if not self.history:
            return 0.0
        successes = sum(1 for r in self.history if r.success)
        return successes / len(self.history)

    def get_trend(self, recent_n: int = 20) -> str:
        if len(self.history) < recent_n * 2:
            return "insufficient_data"
        recent = list(self.history)[-recent_n:]
        older = list(self.history)[-recent_n * 2:-recent_n]
        recent_avg = statistics.mean(r.quality_score for r in recent)
        older_avg = statistics.mean(r.quality_score for r in older)
        diff = recent_avg - older_avg
        if diff > 0.05:
            return "improving"
        elif diff < -0.05:
            return "degrading"
        return "stable"
```
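To make the scoring math concrete, here is the same decay-weighted mean computed standalone on three hypothetical interactions (the quality scores and feedback sources are invented for illustration):

```python
decay_factor = 0.95
source_weights = {"human": 1.5, "automated": 1.0, "peer_agent": 0.8}

# (quality_score, feedback_source), oldest first
records = [(0.9, "automated"), (0.4, "automated"), (0.8, "human")]

# Older records are discounted exponentially; human feedback is boosted
weights = [
    decay_factor ** (len(records) - 1 - i) * source_weights[src]
    for i, (_, src) in enumerate(records)
]
score = sum(q * w for (q, _), w in zip(records, weights)) / sum(weights)
```

The recent human-rated 0.8 carries the most weight, so the score lands near 0.71 rather than at the unweighted mean of 0.70: recency and source trust shift the result even on tiny histories.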
Penalty and Rehabilitation System
When an agent's reputation drops below a threshold, automatic penalties kick in. But penalties should not be permanent — agents should have a path back to full trust.
```python
from typing import Any


class ReputationManager:
    def __init__(
        self,
        warning_threshold: float = 0.5,
        quarantine_threshold: float = 0.3,
        rehabilitation_period: int = 50,
    ):
        self.agents: dict[str, AgentReputation] = {}
        self.warning_threshold = warning_threshold
        self.quarantine_threshold = quarantine_threshold
        self.rehabilitation_period = rehabilitation_period
        self.quarantined: set[str] = set()

    def register_agent(self, agent_id: str, initial_score: float = 0.7):
        self.agents[agent_id] = AgentReputation(
            agent_id=agent_id, initial_score=initial_score
        )

    def report_outcome(
        self, agent_id: str, record: InteractionRecord
    ) -> dict[str, Any]:
        rep = self.agents.get(agent_id)
        if not rep:
            raise KeyError(f"Unknown agent: {agent_id}")
        rep.record_interaction(record)
        actions = []
        if rep.score < self.quarantine_threshold:
            if agent_id not in self.quarantined:
                self.quarantined.add(agent_id)
                rep.penalties.append({
                    "type": "quarantine",
                    "severity": 0.5,
                    "reason": f"Score dropped to {rep.score:.2f}",
                    "expired": False,
                })
                actions.append("quarantined")
        elif rep.score < self.warning_threshold:
            actions.append("warning_issued")
        # Rehabilitation check
        if agent_id in self.quarantined:
            recent = list(rep.history)[-self.rehabilitation_period:]
            if len(recent) >= self.rehabilitation_period:
                recent_avg = statistics.mean(
                    r.quality_score for r in recent
                )
                if recent_avg > self.warning_threshold:
                    self.quarantined.discard(agent_id)
                    for p in rep.penalties:
                        if p["type"] == "quarantine":
                            p["expired"] = True
                    rep._recalculate_score()
                    actions.append("rehabilitated")
        return {
            "agent_id": agent_id,
            "current_score": round(rep.score, 3),
            "trend": rep.get_trend(),
            "actions": actions,
        }

    def get_top_agents(
        self, task_type: str | None = None, n: int = 5
    ) -> list[dict]:
        # task_type is accepted for future per-task-type filtering
        available = [
            (aid, rep) for aid, rep in self.agents.items()
            if aid not in self.quarantined
        ]
        available.sort(key=lambda x: x[1].score, reverse=True)
        return [
            {
                "agent_id": aid,
                "score": round(rep.score, 3),
                "reliability": round(rep.get_reliability_rate(), 3),
                "total_interactions": rep.total_interactions,
            }
            for aid, rep in available[:n]
        ]
```
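The threshold logic in `report_outcome` reduces to a small state machine, which is worth isolating for unit tests. This is a sketch of that reduction, not part of the class above; the default thresholds mirror the manager's:

```python
from statistics import mean


def next_status(status: str, score: float, recent_quality: list[float],
                warning: float = 0.5, quarantine: float = 0.3,
                rehab_n: int = 50) -> str:
    # Quarantined agents can only exit via rehabilitation: enough recent
    # records whose mean quality clears the warning threshold.
    if status == "quarantined":
        if len(recent_quality) >= rehab_n and mean(recent_quality) > warning:
            return "active"
        return "quarantined"
    # Active agents degrade based on current score alone
    if score < quarantine:
        return "quarantined"
    if score < warning:
        return "warned"
    return "active"
```

Keeping the transitions pure like this lets you test quarantine and rehabilitation edge cases without constructing a manager and a hundred interaction records.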
Trust Propagation Across Agent Chains
When agents work in chains (Agent A's output feeds into Agent B), reputation should propagate. If Agent B produces a bad result, but the root cause was Agent A providing garbage input, Agent A's reputation should take the hit, not Agent B's.
```python
class TrustPropagator:
    def __init__(self, manager: ReputationManager):
        self.manager = manager

    def propagate_blame(
        self,
        chain: list[str],
        final_quality: float,
        individual_scores: dict[str, float],
    ):
        """Distribute reputation impact across a chain of agents."""
        if final_quality >= 0.7:
            # Good outcome: credit everyone proportionally
            for agent_id in chain:
                self.manager.report_outcome(
                    agent_id,
                    InteractionRecord(
                        timestamp=datetime.now().isoformat(),
                        task_type="chain_task",
                        success=True,
                        quality_score=individual_scores.get(
                            agent_id, final_quality
                        ),
                        latency_ms=0,
                        feedback_source="automated",
                    ),
                )
            return
        # Bad outcome: find the weakest link
        min_score_agent = min(
            individual_scores, key=individual_scores.get
        )
        for agent_id in chain:
            is_blame = agent_id == min_score_agent
            quality = 0.2 if is_blame else max(
                0.5, individual_scores.get(agent_id, 0.5)
            )
            self.manager.report_outcome(
                agent_id,
                InteractionRecord(
                    timestamp=datetime.now().isoformat(),
                    task_type="chain_task",
                    success=not is_blame,
                    quality_score=quality,
                    latency_ms=0,
                    feedback_source="automated",
                ),
            )
```
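The blame rule itself can be factored out as a pure function, which makes the policy trivial to test in isolation. This sketch applies the same constants as `propagate_blame` (0.7 good-outcome threshold, 0.2 for the weakest link, a 0.5 floor for everyone else); the agent names in the usage are hypothetical:

```python
def assign_blame(individual_scores: dict[str, float],
                 final_quality: float,
                 good_threshold: float = 0.7) -> dict[str, tuple[bool, float]]:
    """Return {agent_id: (success, quality_score to record)}."""
    if final_quality >= good_threshold:
        # Good outcome: everyone succeeds at their individual score
        return {a: (True, q) for a, q in individual_scores.items()}
    # Bad outcome: the weakest link fails hard, others are cushioned
    worst = min(individual_scores, key=individual_scores.get)
    return {
        a: (False, 0.2) if a == worst else (True, max(0.5, q))
        for a, q in individual_scores.items()
    }
```

With the policy separated, `propagate_blame` becomes a thin loop that turns each `(success, quality)` pair into an `InteractionRecord`.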
Integrating Reputation with Routing
The simplest integration is to use reputation scores as sampling weights when routing tasks: high-reputation agents receive proportionally more traffic, while lower-scoring agents still see enough requests to keep their track records current.
```python
import random


def reputation_weighted_routing(
    manager: ReputationManager,
    task_type: str,
    candidate_agents: list[str],
) -> str:
    scores = {}
    for agent_id in candidate_agents:
        rep = manager.agents.get(agent_id)
        if rep and agent_id not in manager.quarantined:
            scores[agent_id] = rep.score
    if not scores:
        raise RuntimeError("No available agents for routing")
    agents = list(scores.keys())
    weights = [scores[a] ** 2 for a in agents]  # square to amplify gaps
    return random.choices(agents, weights=weights, k=1)[0]
```
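To see what squaring the scores does to the traffic split, here is a quick standalone simulation with three hypothetical agents; the scores 0.9 / 0.6 / 0.3 are made up for the demo:

```python
import random
from collections import Counter

random.seed(0)  # deterministic for the demo

scores = {"agent_a": 0.9, "agent_b": 0.6, "agent_c": 0.3}
agents = list(scores)
weights = [scores[a] ** 2 for a in agents]  # 0.81, 0.36, 0.09

picks = Counter(
    random.choices(agents, weights=weights, k=1)[0] for _ in range(10_000)
)
# Squared weights give roughly a 64% / 29% / 7% split, versus
# 50% / 33% / 17% under linear weighting
```

Squaring is a cheap knob: exponent 1 is proportional routing, higher exponents approach winner-take-all, and anything in between trades exploitation against keeping weaker agents' histories fresh.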
FAQ
How do I bootstrap reputation for new agents with no history?
Start new agents at a neutral score (0.7) and route them a small percentage of traffic alongside proven agents. Compare their outputs against the established agents' outputs for the same queries. This "shadow mode" builds a reputation track record without risking production quality. Promote the agent to full traffic once it has 50+ interactions with a score above your warning threshold.
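The promotion gate described above fits in a few lines. `ready_for_promotion` is a hypothetical helper, and the 50-interaction minimum and warning-threshold cutoff are the values suggested in the answer:

```python
from statistics import mean


def ready_for_promotion(quality_scores: list[float],
                        min_interactions: int = 50,
                        warning_threshold: float = 0.5) -> bool:
    # Promote a shadow-mode agent only once it has enough history
    # and its mean quality clears the warning threshold
    return (
        len(quality_scores) >= min_interactions
        and mean(quality_scores) > warning_threshold
    )
```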
Should I use human feedback or automated evaluation for reputation scoring?
Both, with different weights. Human feedback is more reliable but expensive and slow. Automated evaluation (LLM-as-judge, test case pass rates, format validation) is fast and cheap but can miss nuance. Weight human feedback at 1.5x and automated at 1.0x. Use automated evaluation for volume and human evaluation for calibration.
How do I prevent reputation gaming where agents optimize for the metric rather than actual quality?
Use diverse evaluation criteria that are hard to game simultaneously — factual accuracy, completeness, formatting, latency, and user satisfaction. Rotate evaluation prompts periodically. Most importantly, include real user outcomes (did the user follow up with a complaint? did they complete their task?) as the highest-weighted signal, since that is the metric that actually matters.
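One way to implement this is to fold several criteria into the single `quality_score` an interaction records, with real user outcomes weighted highest. A sketch, where the criteria names and weights are illustrative rather than prescriptive:

```python
def composite_quality(criteria: dict[str, float]) -> float:
    """criteria maps metric name -> score in [0, 1]; missing metrics are skipped."""
    # user_outcome carries the most weight since it is hardest to game
    weights = {
        "accuracy": 1.0,
        "completeness": 1.0,
        "formatting": 0.5,
        "latency": 0.5,
        "user_outcome": 2.0,
    }
    total = sum(weights[k] for k in criteria)
    return sum(criteria[k] * weights[k] for k in criteria) / total
```

Under this weighting, an output that aces every automated check but fails the user still scores poorly, which is exactly the property that blunts metric gaming.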
#AgentReputation #TrustSystems #QualityTracking #MultiAgentSystems #Python #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.