Text Classification for Agent Routing: Intent Detection and Topic Categorization

Why Classification Is the Agent's First Decision

Every message that reaches a multi-agent system must be routed to the right handler. A billing question should go to the billing agent. A technical issue should go to the support agent. A sales inquiry should go to the sales agent. Text classification is the mechanism that makes this routing fast and accurate.

Intent detection is a specialized form of text classification where the classes represent user goals: "check_balance," "reset_password," "schedule_appointment." Topic categorization groups messages by subject matter: "billing," "technical," "general." Both are essential for agents that handle diverse user requests.

Building an Intent Classifier with scikit-learn

For agents with a known set of intents and training data, a traditional ML classifier is fast and reliable.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
import joblib

training_data = [
    ("What is my account balance?", "check_balance"),
    ("How much do I owe?", "check_balance"),
    ("I forgot my password", "reset_password"),
    ("Can't log in to my account", "reset_password"),
    ("I want to cancel my subscription", "cancel_subscription"),
    ("How do I unsubscribe?", "cancel_subscription"),
    ("Can I speak to a manager?", "escalate"),
    ("I need to talk to a real person", "escalate"),
]

texts, labels = zip(*training_data)

classifier = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=5000)),
    ("clf", LogisticRegression(max_iter=1000)),
])

classifier.fit(texts, labels)

def classify_intent(text: str) -> dict:
    # Score the text once, then pick the highest-probability class.
    probabilities = classifier.predict_proba([text])[0]
    best = probabilities.argmax()
    return {
        "intent": str(classifier.classes_[best]),
        "confidence": round(float(probabilities[best]), 3),
    }

print(classify_intent("I need to check how much money is in my account"))
# Prints a dict of the form {'intent': <label>, 'confidence': <float>}.
# With only eight training examples, expect low confidence values.
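
The `joblib` import above is not used by the snippet itself. One plausible use, sketched here as an assumption rather than the article's stated approach, is persisting the fitted pipeline so an agent process can load it at startup instead of retraining. The file name `intent_classifier.joblib` and the four-example dataset are placeholders.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny placeholder dataset; a real deployment would use far more examples.
texts = [
    "What is my account balance?",
    "How much do I owe?",
    "I forgot my password",
    "Can't log in to my account",
]
labels = ["check_balance", "check_balance", "reset_password", "reset_password"]

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(texts, labels)

# Persist the whole pipeline (vectorizer + classifier) as a single artifact.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "intent_classifier.joblib"  # hypothetical file name
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The restored pipeline should behave exactly like the original.
sample = ["I forgot my password again"]
same_prediction = restored.predict(sample)[0] == model.predict(sample)[0]
print(same_prediction)
```

Because joblib files are pickle-based, load them only from sources you trust.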

Zero-Shot Classification: No Training Data Required

When you cannot collect labeled training data, or when new intents appear frequently, zero-shot classification lets you define categories at inference time.

from transformers import pipeline

zero_shot = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
)

def classify_zero_shot(
    text: str,
    candidate_labels: list[str],
    multi_label: bool = False,
) -> dict:
    result = zero_shot(
        text,
        candidate_labels=candidate_labels,
        multi_label=multi_label,
    )
    return {
        label: round(score, 3)
        for label, score in zip(result["labels"], result["scores"])
    }

labels = ["billing", "technical_support", "sales", "general_inquiry"]
message = "My invoice shows a charge I didn't authorize"

print(classify_zero_shot(message, labels))
# {'billing': 0.891, 'general_inquiry': 0.054,
#  'technical_support': 0.033, 'sales': 0.022}

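Zero-shot models use natural language inference (NLI) under the hood. Each candidate label is inserted into a hypothesis sentence (the Hugging Face pipeline defaults to "This example is {}."), and the model scores whether the text entails that hypothesis. This means your label names matter: "billing_dispute" will perform differently than "billing" for the same input.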
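
To make the entailment framing concrete, the sketch below builds the premise/hypothesis pairs that an NLI-based zero-shot classifier would score. The template string mirrors the default `hypothesis_template` of the Hugging Face zero-shot pipeline; the helper name `build_nli_pairs` is hypothetical, and no model is invoked.

```python
def build_nli_pairs(
    text: str,
    candidate_labels: list[str],
    hypothesis_template: str = "This example is {}.",
) -> list[tuple[str, str]]:
    """Return (premise, hypothesis) pairs, one per candidate label."""
    return [
        (text, hypothesis_template.format(label))
        for label in candidate_labels
    ]

pairs = build_nli_pairs(
    "My invoice shows a charge I didn't authorize",
    ["billing", "billing_dispute"],
)
for premise, hypothesis in pairs:
    print(f"{premise!r} -> {hypothesis!r}")
```

Because the label is dropped verbatim into the hypothesis, a more descriptive label, or a domain-specific template passed through the pipeline's `hypothesis_template` argument, changes what the model is asked to judge.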

Multi-Label Classification

Users often express multiple intents in a single message: "I need to update my address and also check when my next payment is due." Multi-label classification assigns multiple categories simultaneously.

def classify_multi_intent(text: str, labels: list[str]) -> list[dict]:
    result = zero_shot(
        text,
        candidate_labels=labels,
        multi_label=True,
    )
    return [
        {"label": label, "score": round(score, 3)}
        for label, score in zip(result["labels"], result["scores"])
        if score > 0.5
    ]

labels = ["update_info", "check_payment", "billing", "technical"]
message = "Please change my address and tell me when my next bill is due"

print(classify_multi_intent(message, labels))
# [{'label': 'update_info', 'score': 0.92},
#  {'label': 'check_payment', 'score': 0.87},
#  {'label': 'billing', 'score': 0.61}]

Confidence-Based Routing

Raw classification output needs a decision layer. Here is a routing pattern that handles low-confidence predictions gracefully.

from dataclasses import dataclass

@dataclass
class RouteDecision:
    target_agent: str
    confidence: float
    fallback_used: bool

class IntentRouter:
    def __init__(self, classifier, threshold: float = 0.7):
        self.classifier = classifier
        self.threshold = threshold
        self.agent_map = {
            "check_balance": "billing_agent",
            "reset_password": "auth_agent",
            "cancel_subscription": "retention_agent",
            "escalate": "human_agent",
        }

    def route(self, message: str) -> RouteDecision:
        result = self.classifier(message)
        intent = result["intent"]
        confidence = result["confidence"]

        if confidence >= self.threshold and intent in self.agent_map:
            return RouteDecision(
                target_agent=self.agent_map[intent],
                confidence=confidence,
                fallback_used=False,
            )

        return RouteDecision(
            target_agent="general_agent",
            confidence=confidence,
            fallback_used=True,
        )

The threshold is critical. Set it too low and misrouted messages frustrate users. Set it too high and too many messages fall through to the general agent. Start at 0.7 and adjust based on production metrics.
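
One way to pick the threshold, sketched here under the assumption that you have a small labeled validation set of classifier outputs, is to sweep candidate values and compare coverage (the share of messages routed to a specialist agent) against accuracy on those routed messages. The records and the `sweep_thresholds` helper are hypothetical.

```python
# Each record: (predicted_intent, confidence, true_intent). Hypothetical data.
validation = [
    ("check_balance", 0.95, "check_balance"),
    ("reset_password", 0.85, "reset_password"),
    ("check_balance", 0.75, "cancel_subscription"),
    ("escalate", 0.65, "escalate"),
    ("reset_password", 0.55, "check_balance"),
]

def sweep_thresholds(records, thresholds):
    """Return coverage and routed accuracy for each candidate threshold."""
    rows = []
    for threshold in thresholds:
        # Messages at or above the threshold go to a specialist agent.
        routed = [r for r in records if r[1] >= threshold]
        correct = sum(1 for pred, _, truth in routed if pred == truth)
        rows.append({
            "threshold": threshold,
            "coverage": len(routed) / len(records),
            # Accuracy is undefined when nothing is routed.
            "accuracy": correct / len(routed) if routed else None,
        })
    return rows

results = sweep_thresholds(validation, [0.5, 0.7, 0.9])
for row in results:
    print(row)
```

On this toy data, raising the threshold trades coverage for accuracy; the same table built from real traffic shows where that trade-off becomes acceptable.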

Combining Classification Approaches

Production agents often layer multiple classifiers. A fast keyword-based filter catches obvious cases, a traditional ML model handles most traffic, and a zero-shot model handles the long tail.

class HybridClassifier:
    def __init__(self, keyword_rules, ml_model, zero_shot_model):
        self.keyword_rules = keyword_rules
        self.ml_model = ml_model
        self.zero_shot_model = zero_shot_model

    def classify(self, text: str, labels: list[str]) -> dict:
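        # Tier 1: keyword match (microseconds)
        lowered = text.lower()
        for pattern, label in self.keyword_rules.items():
            if pattern.lower() in lowered:
                return {"label": label, "confidence": 1.0, "tier": "keyword"}

        # Tier 2: ML model (milliseconds); expects {"intent", "confidence"}
        ml_result = self.ml_model(text)
        if ml_result["confidence"] > 0.8:
            # Normalize to the same "label" key the other tiers return.
            return {
                "label": ml_result["intent"],
                "confidence": ml_result["confidence"],
                "tier": "ml",
            }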

        # Tier 3: zero-shot (slower but flexible)
        zs_result = self.zero_shot_model(text, labels)
        top_label = max(zs_result, key=zs_result.get)
        return {
            "label": top_label,
            "confidence": zs_result[top_label],
            "tier": "zero_shot",
        }
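
Plain substring matching in Tier 1 can fire on unintended text; for example, a rule for "bill" would also match "billboard". A possible refinement, shown as a sketch with hypothetical rules, compiles each keyword into a word-boundary regular expression.

```python
import re

# Hypothetical keyword rules: phrase -> label.
keyword_rules = {
    "bill": "billing",
    "reset my password": "reset_password",
}

# Precompile once; \b anchors each phrase to word boundaries.
compiled_rules = [
    (re.compile(rf"\b{re.escape(phrase)}\b", re.IGNORECASE), label)
    for phrase, label in keyword_rules.items()
]

def keyword_match(text: str):
    """Return the label of the first rule that matches, or None."""
    for pattern, label in compiled_rules:
        if pattern.search(text):
            return label
    return None

print(keyword_match("Why is my bill so high?"))
print(keyword_match("I saw your billboard downtown"))
```

The first call matches the "bill" rule; the second returns `None`, so the message would continue to the ML tier.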

FAQ

How many training examples do I need per intent for a reliable classifier?

For traditional ML classifiers like logistic regression with TF-IDF, aim for at least 50 to 100 examples per intent. For fine-tuned transformer models, 20 to 50 examples per class can be sufficient. If you have fewer than 20 examples, use zero-shot classification instead and gradually collect labeled data from production traffic to build a training set.

How do I handle intents that overlap semantically?

Overlapping intents are a design problem, not a model problem. If "check_balance" and "billing_inquiry" are hard to distinguish, consider merging them into a single intent and using a second-stage classifier or rule-based logic to differentiate subtypes. Alternatively, use multi-label classification and let the downstream agent handle both intents.
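
The merge-then-refine idea can be sketched as follows: a first stage returns a coarse intent, and simple rules pick a subtype. The stage-one function here is a stub standing in for a trained classifier, and the intent and subtype names are hypothetical.

```python
def coarse_classify(text: str) -> str:
    """Stub for a first-stage classifier; always returns the merged intent."""
    return "billing"

# Second stage: ordered rules that refine the merged "billing" intent.
BILLING_SUBTYPE_RULES = [
    (("balance", "owe"), "check_balance"),
    (("refund", "charge"), "billing_dispute"),
]

def refine_billing(text: str) -> str:
    lowered = text.lower()
    for keywords, subtype in BILLING_SUBTYPE_RULES:
        if any(keyword in lowered for keyword in keywords):
            return subtype
    return "billing_inquiry"  # default subtype when no rule matches

def classify_two_stage(text: str) -> dict:
    intent = coarse_classify(text)
    # Only the merged intent gets a second-stage refinement.
    subtype = refine_billing(text) if intent == "billing" else None
    return {"intent": intent, "subtype": subtype}

print(classify_two_stage("How much do I owe this month?"))
```

Keeping the first stage coarse means the model only has to separate clearly distinct classes, while the fine distinctions live in rules that are easy to inspect and change.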

What is the best way to add new intents without retraining from scratch?

Use a zero-shot model as your primary classifier and maintain a mapping from zero-shot labels to agent routes. Adding a new intent is then as simple as adding a new label string and its route. Once you collect enough production examples for the new intent, retrain (or fine-tune) your supervised model and redeploy. This gives you immediate coverage with zero-shot and improved accuracy over time with supervised learning.
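
A minimal sketch of that label-to-route mapping, assuming a scoring function that returns a dict of label to score (the same shape as `classify_zero_shot` earlier in this article). The stub scorer, the agent names, the 0.5 threshold, and the `register_intent` helper are all hypothetical.

```python
from typing import Callable

# Maps zero-shot label strings to agent routes (names are placeholders).
label_routes: dict[str, str] = {
    "billing": "billing_agent",
    "technical_support": "support_agent",
}

def register_intent(label: str, agent: str) -> None:
    """Adding an intent is just adding a label string and its route."""
    label_routes[label] = agent

def route_message(
    text: str,
    scorer: Callable[[str, list[str]], dict[str, float]],
    fallback: str = "general_agent",
    threshold: float = 0.5,  # assumed cutoff; tune on real traffic
) -> str:
    scores = scorer(text, list(label_routes))
    top_label = max(scores, key=scores.get)
    if scores[top_label] < threshold:
        return fallback
    return label_routes[top_label]

def stub_scorer(text: str, labels: list[str]) -> dict[str, float]:
    """Stand-in for a zero-shot model: 0.9 if the label's first word appears."""
    lowered = text.lower()
    return {
        label: 0.9 if label.split("_")[0] in lowered else 0.1
        for label in labels
    }

register_intent("shipping", "logistics_agent")
print(route_message("Where is my shipping confirmation?", stub_scorer))
```

In production the stub would be replaced by the real zero-shot call; the routing table and registration step stay the same.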
