Agent Reputation Systems: Tracking Reliability and Quality Across Multi-Agent Workflows
Build a reputation system that tracks agent reliability and output quality over time. Learn scoring mechanisms, trust propagation, penalty systems, and how to rehabilitate underperforming agents.
Why Track Agent Reputation?
In a multi-agent system with dozens of agents handling thousands of requests, you need to know which agents are reliable and which are degrading. Without reputation tracking, a malfunctioning agent can silently corrupt outputs for hours before anyone notices.
A reputation system continuously scores each agent based on its outputs, enabling automated decisions: route important tasks to high-reputation agents, quarantine agents whose scores drop below a threshold, adjust consensus weights based on track records, and trigger alerts when an agent's reputation trends downward.
The Reputation Score Model
```python
from dataclasses import dataclass
from datetime import datetime
from collections import deque
import statistics


@dataclass
class InteractionRecord:
    timestamp: str
    task_type: str
    success: bool
    quality_score: float  # 0.0 to 1.0
    latency_ms: float
    feedback_source: str  # "automated", "human", "peer_agent"


class AgentReputation:
    def __init__(
        self,
        agent_id: str,
        initial_score: float = 0.7,
        window_size: int = 100,
        decay_factor: float = 0.95,
    ):
        self.agent_id = agent_id
        self.score = initial_score
        self.window_size = window_size
        self.decay_factor = decay_factor
        self.history: deque[InteractionRecord] = deque(maxlen=window_size)
        self.total_interactions = 0
        self.penalties: list[dict] = []

    def record_interaction(self, record: InteractionRecord):
        self.history.append(record)
        self.total_interactions += 1
        self._recalculate_score()

    def _recalculate_score(self):
        if not self.history:
            return
        weights = []
        scores = []
        for i, record in enumerate(self.history):
            recency_weight = self.decay_factor ** (len(self.history) - 1 - i)
            source_weight = {
                "human": 1.5, "automated": 1.0, "peer_agent": 0.8
            }.get(record.feedback_source, 1.0)
            weight = recency_weight * source_weight
            score = record.quality_score if record.success else 0.0
            weights.append(weight)
            scores.append(score)
        total_weight = sum(weights)
        self.score = sum(s * w for s, w in zip(scores, weights)) / total_weight
        # Apply active penalties
        active_penalties = [
            p for p in self.penalties if not p.get("expired", False)
        ]
        for penalty in active_penalties:
            self.score *= (1 - penalty["severity"])

    def get_reliability_rate(self) -> float:
        if not self.history:
            return 0.0
        successes = sum(1 for r in self.history if r.success)
        return successes / len(self.history)

    def get_trend(self, recent_n: int = 20) -> str:
        if len(self.history) < recent_n * 2:
            return "insufficient_data"
        recent = list(self.history)[-recent_n:]
        older = list(self.history)[-recent_n * 2:-recent_n]
        recent_avg = statistics.mean(r.quality_score for r in recent)
        older_avg = statistics.mean(r.quality_score for r in older)
        diff = recent_avg - older_avg
        if diff > 0.05:
            return "improving"
        elif diff < -0.05:
            return "degrading"
        return "stable"
```
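To make the scoring math concrete, here is the same decay-weighted mean computed standalone on three hypothetical interactions (the quality scores and feedback sources are invented for illustration):

```python
decay_factor = 0.95
source_weights = {"human": 1.5, "automated": 1.0, "peer_agent": 0.8}

# (quality_score, feedback_source), oldest first
records = [(0.9, "automated"), (0.4, "automated"), (0.8, "human")]

# Older records are discounted exponentially; human feedback is boosted
weights = [
    decay_factor ** (len(records) - 1 - i) * source_weights[src]
    for i, (_, src) in enumerate(records)
]
score = sum(q * w for (q, _), w in zip(records, weights)) / sum(weights)
```

The recent human-rated 0.8 carries the most weight, so the score lands near 0.71 rather than at the unweighted mean of 0.70: recency and source trust shift the result even on tiny histories.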
Penalty and Rehabilitation System
When an agent's reputation drops below a threshold, automatic penalties kick in. But penalties should not be permanent — agents should have a path back to full trust.
```python
from typing import Any


class ReputationManager:
    def __init__(
        self,
        warning_threshold: float = 0.5,
        quarantine_threshold: float = 0.3,
        rehabilitation_period: int = 50,
    ):
        self.agents: dict[str, AgentReputation] = {}
        self.warning_threshold = warning_threshold
        self.quarantine_threshold = quarantine_threshold
        self.rehabilitation_period = rehabilitation_period
        self.quarantined: set[str] = set()

    def register_agent(self, agent_id: str, initial_score: float = 0.7):
        self.agents[agent_id] = AgentReputation(
            agent_id=agent_id, initial_score=initial_score
        )

    def report_outcome(
        self, agent_id: str, record: InteractionRecord
    ) -> dict[str, Any]:
        rep = self.agents.get(agent_id)
        if not rep:
            raise KeyError(f"Unknown agent: {agent_id}")
        rep.record_interaction(record)
        actions = []
        if rep.score < self.quarantine_threshold:
            if agent_id not in self.quarantined:
                self.quarantined.add(agent_id)
                rep.penalties.append({
                    "type": "quarantine",
                    "severity": 0.5,
                    "reason": f"Score dropped to {rep.score:.2f}",
                    "expired": False,
                })
                actions.append("quarantined")
        elif rep.score < self.warning_threshold:
            actions.append("warning_issued")
        # Rehabilitation check
        if agent_id in self.quarantined:
            recent = list(rep.history)[-self.rehabilitation_period:]
            if len(recent) >= self.rehabilitation_period:
                recent_avg = statistics.mean(
                    r.quality_score for r in recent
                )
                if recent_avg > self.warning_threshold:
                    self.quarantined.discard(agent_id)
                    for p in rep.penalties:
                        if p["type"] == "quarantine":
                            p["expired"] = True
                    rep._recalculate_score()
                    actions.append("rehabilitated")
        return {
            "agent_id": agent_id,
            "current_score": round(rep.score, 3),
            "trend": rep.get_trend(),
            "actions": actions,
        }

    def get_top_agents(
        self, task_type: str | None = None, n: int = 5
    ) -> list[dict]:
        # task_type is accepted for future per-task-type filtering
        available = [
            (aid, rep) for aid, rep in self.agents.items()
            if aid not in self.quarantined
        ]
        available.sort(key=lambda x: x[1].score, reverse=True)
        return [
            {
                "agent_id": aid,
                "score": round(rep.score, 3),
                "reliability": round(rep.get_reliability_rate(), 3),
                "total_interactions": rep.total_interactions,
            }
            for aid, rep in available[:n]
        ]
```
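The threshold logic in `report_outcome` reduces to a small state machine, which is worth isolating for unit tests. This is a sketch of that reduction, not part of the class above; the default thresholds mirror the manager's:

```python
from statistics import mean


def next_status(status: str, score: float, recent_quality: list[float],
                warning: float = 0.5, quarantine: float = 0.3,
                rehab_n: int = 50) -> str:
    # Quarantined agents can only exit via rehabilitation: enough recent
    # records whose mean quality clears the warning threshold.
    if status == "quarantined":
        if len(recent_quality) >= rehab_n and mean(recent_quality) > warning:
            return "active"
        return "quarantined"
    # Active agents degrade based on current score alone
    if score < quarantine:
        return "quarantined"
    if score < warning:
        return "warned"
    return "active"
```

Keeping the transitions pure like this lets you test quarantine and rehabilitation edge cases without constructing a manager and a hundred interaction records.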
Trust Propagation Across Agent Chains
When agents work in chains (Agent A's output feeds into Agent B), reputation should propagate. If Agent B produces a bad result, but the root cause was Agent A providing garbage input, Agent A's reputation should take the hit, not Agent B's.
```python
class TrustPropagator:
    def __init__(self, manager: ReputationManager):
        self.manager = manager

    def propagate_blame(
        self,
        chain: list[str],
        final_quality: float,
        individual_scores: dict[str, float],
    ):
        """Distribute reputation impact across a chain of agents."""
        if final_quality >= 0.7:
            # Good outcome: credit everyone proportionally
            for agent_id in chain:
                self.manager.report_outcome(
                    agent_id,
                    InteractionRecord(
                        timestamp=datetime.now().isoformat(),
                        task_type="chain_task",
                        success=True,
                        quality_score=individual_scores.get(
                            agent_id, final_quality
                        ),
                        latency_ms=0,
                        feedback_source="automated",
                    ),
                )
            return
        # Bad outcome: find the weakest link
        min_score_agent = min(
            individual_scores, key=individual_scores.get
        )
        for agent_id in chain:
            is_blame = agent_id == min_score_agent
            quality = 0.2 if is_blame else max(
                0.5, individual_scores.get(agent_id, 0.5)
            )
            self.manager.report_outcome(
                agent_id,
                InteractionRecord(
                    timestamp=datetime.now().isoformat(),
                    task_type="chain_task",
                    success=not is_blame,
                    quality_score=quality,
                    latency_ms=0,
                    feedback_source="automated",
                ),
            )
```
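The blame rule itself can be factored out as a pure function, which makes the policy trivial to test in isolation. This sketch applies the same constants as `propagate_blame` (0.7 good-outcome threshold, 0.2 for the weakest link, a 0.5 floor for everyone else); the agent names in the usage are hypothetical:

```python
def assign_blame(individual_scores: dict[str, float],
                 final_quality: float,
                 good_threshold: float = 0.7) -> dict[str, tuple[bool, float]]:
    """Return {agent_id: (success, quality_score to record)}."""
    if final_quality >= good_threshold:
        # Good outcome: everyone succeeds at their individual score
        return {a: (True, q) for a, q in individual_scores.items()}
    # Bad outcome: the weakest link fails hard, others are cushioned
    worst = min(individual_scores, key=individual_scores.get)
    return {
        a: (False, 0.2) if a == worst else (True, max(0.5, q))
        for a, q in individual_scores.items()
    }
```

With the policy separated, `propagate_blame` becomes a thin loop that turns each `(success, quality)` pair into an `InteractionRecord`.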
Integrating Reputation with Routing
The simplest integration is to use reputation scores as sampling weights when routing tasks: high-reputation agents receive proportionally more traffic, while lower-scoring agents still see enough requests to keep their track records current.
```python
import random


def reputation_weighted_routing(
    manager: ReputationManager,
    task_type: str,
    candidate_agents: list[str],
) -> str:
    scores = {}
    for agent_id in candidate_agents:
        rep = manager.agents.get(agent_id)
        if rep and agent_id not in manager.quarantined:
            scores[agent_id] = rep.score
    if not scores:
        raise RuntimeError("No available agents for routing")
    agents = list(scores.keys())
    weights = [scores[a] ** 2 for a in agents]  # square to amplify gaps
    return random.choices(agents, weights=weights, k=1)[0]
```
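To see what squaring the scores does to the traffic split, here is a quick standalone simulation with three hypothetical agents; the scores 0.9 / 0.6 / 0.3 are made up for the demo:

```python
import random
from collections import Counter

random.seed(0)  # deterministic for the demo

scores = {"agent_a": 0.9, "agent_b": 0.6, "agent_c": 0.3}
agents = list(scores)
weights = [scores[a] ** 2 for a in agents]  # 0.81, 0.36, 0.09

picks = Counter(
    random.choices(agents, weights=weights, k=1)[0] for _ in range(10_000)
)
# Squared weights give roughly a 64% / 29% / 7% split, versus
# 50% / 33% / 17% under linear weighting
```

Squaring is a cheap knob: exponent 1 is proportional routing, higher exponents approach winner-take-all, and anything in between trades exploitation against keeping weaker agents' histories fresh.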
FAQ
How do I bootstrap reputation for new agents with no history?
Start new agents at a neutral score (0.7) and route them a small percentage of traffic alongside proven agents. Compare their outputs against the established agents' outputs for the same queries. This "shadow mode" builds a reputation track record without risking production quality. Promote the agent to full traffic once it has 50+ interactions with a score above your warning threshold.
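The promotion gate described above fits in a few lines. `ready_for_promotion` is a hypothetical helper, and the 50-interaction minimum and warning-threshold cutoff are the values suggested in the answer:

```python
from statistics import mean


def ready_for_promotion(quality_scores: list[float],
                        min_interactions: int = 50,
                        warning_threshold: float = 0.5) -> bool:
    # Promote a shadow-mode agent only once it has enough history
    # and its mean quality clears the warning threshold
    return (
        len(quality_scores) >= min_interactions
        and mean(quality_scores) > warning_threshold
    )
```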
Should I use human feedback or automated evaluation for reputation scoring?
Both, with different weights. Human feedback is more reliable but expensive and slow. Automated evaluation (LLM-as-judge, test case pass rates, format validation) is fast and cheap but can miss nuance. Weight human feedback at 1.5x and automated at 1.0x. Use automated evaluation for volume and human evaluation for calibration.
How do I prevent reputation gaming where agents optimize for the metric rather than actual quality?
Use diverse evaluation criteria that are hard to game simultaneously — factual accuracy, completeness, formatting, latency, and user satisfaction. Rotate evaluation prompts periodically. Most importantly, include real user outcomes (did the user follow up with a complaint? did they complete their task?) as the highest-weighted signal, since that is the metric that actually matters.
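One way to implement this is to fold several criteria into the single `quality_score` an interaction records, with real user outcomes weighted highest. A sketch, where the criteria names and weights are illustrative rather than prescriptive:

```python
def composite_quality(criteria: dict[str, float]) -> float:
    """criteria maps metric name -> score in [0, 1]; missing metrics are skipped."""
    # user_outcome carries the most weight since it is hardest to game
    weights = {
        "accuracy": 1.0,
        "completeness": 1.0,
        "formatting": 0.5,
        "latency": 0.5,
        "user_outcome": 2.0,
    }
    total = sum(weights[k] for k in criteria)
    return sum(criteria[k] * weights[k] for k in criteria) / total
```

Under this weighting, an output that aces every automated check but fails the user still scores poorly, which is exactly the property that blunts metric gaming.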
#AgentReputation #TrustSystems #QualityTracking #MultiAgentSystems #Python #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.