
Building an FAQ Agent: Automatic Question Answering from Knowledge Bases

Build a production FAQ agent that retrieves answers from knowledge bases using semantic search, applies confidence thresholds to avoid hallucination, and tracks unanswered questions to improve coverage over time.

The Problem with Static FAQ Pages

Traditional FAQ pages fail customers in two ways. First, customers must guess the exact wording the company used to describe their problem. Second, the list grows unwieldy over time — a 200-item FAQ page helps no one. An FAQ agent solves both problems by understanding the customer's question semantically and retrieving the most relevant answer regardless of how it was phrased.

Architecture Overview

An FAQ agent has three core components: a knowledge base with embeddings, a retrieval layer that finds relevant answers, and a generation layer that synthesizes a natural response with confidence scoring.

from dataclasses import dataclass
from openai import AsyncOpenAI
import numpy as np

@dataclass
class FAQEntry:
    id: str
    question: str
    answer: str
    embedding: list[float]
    category: str
    last_updated: str

@dataclass
class RetrievalResult:
    entry: FAQEntry
    similarity: float

class FAQKnowledgeBase:
    def __init__(self, client: AsyncOpenAI):
        self.client = client
        self.entries: list[FAQEntry] = []

    async def embed_text(self, text: str) -> list[float]:
        response = await self.client.embeddings.create(
            model="text-embedding-3-small",
            input=text,
        )
        return response.data[0].embedding

    async def add_entry(
        self, id: str, question: str, answer: str, category: str
    ):
        embedding = await self.embed_text(question)
        self.entries.append(
            FAQEntry(
                id=id,
                question=question,
                answer=answer,
                embedding=embedding,
                category=category,
                last_updated="2026-03-17",  # placeholder; stamp with the current date in production
            )
        )
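Because add_entry makes one API call per question, populating or testing the knowledge base against the live embeddings endpoint is slow and costly. A deterministic stub (a hypothetical helper, not part of the OpenAI SDK) lets you exercise the retrieval logic offline; it hashes text into a fixed-length unit vector so cosine similarity behaves like real embeddings:

```python
import hashlib
import math

def fake_embedding(text: str, dim: int = 8) -> list[float]:
    """Deterministic stand-in for an embedding API call, for offline tests.

    Hashes the text into `dim` pseudo-random components, then normalizes
    to unit length so cosine similarity behaves as it would with real
    embeddings. Identical text always yields an identical vector.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    raw = [digest[i % len(digest)] - 127.5 for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]
```

In tests you can monkeypatch `FAQKnowledgeBase.embed_text` to call this instead of the API.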

Semantic Retrieval with Confidence Scoring

The retrieval layer computes cosine similarity between the user question and every FAQ entry. This is where confidence thresholds become critical — returning a wrong answer is far worse than admitting the agent does not know.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a_arr, b_arr = np.array(a), np.array(b)
    return float(
        np.dot(a_arr, b_arr)
        / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr))
    )

class FAQRetriever:
    def __init__(self, kb: FAQKnowledgeBase):
        self.kb = kb
        self.high_confidence = 0.85
        self.low_confidence = 0.65

    async def retrieve(
        self, query: str, top_k: int = 3
    ) -> list[RetrievalResult]:
        query_embedding = await self.kb.embed_text(query)
        results = []
        for entry in self.kb.entries:
            sim = cosine_similarity(query_embedding, entry.embedding)
            results.append(RetrievalResult(entry=entry, similarity=sim))
        results.sort(key=lambda r: r.similarity, reverse=True)
        return results[:top_k]

    async def answer(self, query: str) -> dict:
        results = await self.retrieve(query)
        if not results:
            return {
                "answer": None,
                "confidence": "none",
                "should_track": True,
            }
        top = results[0]
        if top.similarity >= self.high_confidence:
            return {
                "answer": top.entry.answer,
                "confidence": "high",
                "source_id": top.entry.id,
                "similarity": top.similarity,
                "should_track": False,
            }
        elif top.similarity >= self.low_confidence:
            return {
                "answer": top.entry.answer,
                "confidence": "medium",
                "source_id": top.entry.id,
                "similarity": top.similarity,
                "should_track": True,
            }
        else:
            return {
                "answer": None,
                "confidence": "low",
                "should_track": True,
            }
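The banding logic inside answer() is easy to factor out into a pure function that can be unit-tested without any API access. A minimal sketch mirroring the thresholds above:

```python
def confidence_band(
    similarity: float,
    high: float = 0.85,
    low: float = 0.65,
) -> str:
    """Map a cosine similarity score to the agent's confidence band."""
    if similarity >= high:
        return "high"
    if similarity >= low:
        return "medium"
    return "low"
```

Keeping the thresholds as defaulted parameters also makes it trivial to sweep them during evaluation.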

Tracking Unanswered Questions

Every question the agent cannot confidently answer is an opportunity to improve the knowledge base. An unanswered question tracker clusters similar failures and surfaces the most impactful gaps.


from datetime import datetime, timezone
from collections import defaultdict

class UnansweredTracker:
    def __init__(self):
        self.questions: list[dict] = []

    def track(self, query: str, confidence: str, top_similarity: float):
        self.questions.append({
            "query": query,
            "confidence": confidence,
            "top_similarity": top_similarity,
            # datetime.utcnow() is deprecated since Python 3.12;
            # use a timezone-aware UTC timestamp instead
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def get_gap_report(self, min_occurrences: int = 3) -> list[dict]:
        """Group similar unanswered questions and rank by frequency."""
        clusters = defaultdict(list)
        for q in self.questions:
            # Simple grouping by first 5 words
            key = " ".join(q["query"].lower().split()[:5])
            clusters[key].append(q)

        gaps = []
        for key, items in clusters.items():
            if len(items) >= min_occurrences:
                gaps.append({
                    "cluster_key": key,
                    "count": len(items),
                    "sample_queries": [i["query"] for i in items[:3]],
                    "avg_similarity": sum(
                        i["top_similarity"] for i in items
                    ) / len(items),
                })
        gaps.sort(key=lambda g: g["count"], reverse=True)
        return gaps
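The first-5-words key is deliberately crude; paraphrases with different openings land in separate clusters, and embedding-based clustering would group them better. Still, it is worth seeing how the heuristic behaves on sample queries before relying on the gap report:

```python
from collections import defaultdict

queries = [
    "How do I cancel my subscription today",
    "How do I cancel my subscription immediately",
    "Where is my invoice",
]

# Same first-5-words grouping heuristic used by get_gap_report
clusters: dict[str, list[str]] = defaultdict(list)
for q in queries:
    key = " ".join(q.lower().split()[:5])
    clusters[key].append(q)
```

The two cancellation phrasings share the key "how do i cancel my" and fall into one cluster, while the invoice question stands alone.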

Generating Natural Responses

Rather than returning raw FAQ text, the agent uses an LLM to synthesize a conversational answer grounded in the retrieved content. Constraining the model to the provided sources reduces the risk of hallucination, though it does not eliminate it.

async def generate_faq_response(
    client: AsyncOpenAI, query: str, faq_answer: str
) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a customer support assistant. Answer the "
                    "customer question using ONLY the provided FAQ "
                    "content. Do not add information not present in "
                    "the source. Be concise and helpful."
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Customer question: {query}\n\n"
                    f"FAQ source: {faq_answer}"
                ),
            },
        ],
        max_tokens=300,
    )
    return response.choices[0].message.content
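Even with a constrained prompt, it is worth spot-checking that generated answers stay grounded in the source. A lightweight lexical-overlap heuristic (illustrative only, not part of the pipeline above, and no substitute for human review) flags responses whose vocabulary drifts away from the FAQ text:

```python
import re

def grounded_fraction(answer: str, source: str) -> float:
    """Fraction of the answer's words that also appear in the source.

    A crude groundedness heuristic: low scores flag responses that may
    have drifted beyond the provided FAQ content. Not a guarantee.
    """
    def content_words(s: str) -> set[str]:
        return set(re.findall(r"[a-z']+", s.lower()))

    answer_words = content_words(answer)
    if not answer_words:
        return 1.0
    return len(answer_words & content_words(source)) / len(answer_words)
```

Responses scoring below a chosen cutoff (say 0.5) can be logged for manual review alongside the unanswered-question tracker.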

FAQ

What embedding model should I use for FAQ retrieval?

OpenAI's text-embedding-3-small offers an excellent balance of quality and cost for FAQ workloads. It handles paraphrases well and runs at a fraction of the cost of larger models. For multilingual FAQs, text-embedding-3-large performs better across languages.

How do I set the right confidence threshold?

Start with a high threshold (0.85) and measure your false positive rate — cases where the agent returns a wrong answer confidently. Then lower the threshold gradually while monitoring accuracy. Most teams settle between 0.75 and 0.85 depending on their tolerance for incorrect responses versus unanswered questions.
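The tuning loop described above can be made concrete with a small evaluation helper. Given a labeled set of (top similarity, was the returned answer correct) pairs from manual review, it reports the false positive rate at a candidate threshold (the sample data below is hypothetical):

```python
def false_positive_rate(
    samples: list[tuple[float, bool]], threshold: float
) -> float:
    """Among answers the agent would return at this threshold,
    the fraction judged wrong by a human reviewer.

    samples: (top_similarity, was_correct) pairs from a labeled eval set.
    """
    returned = [correct for sim, correct in samples if sim >= threshold]
    if not returned:
        return 0.0
    return sum(1 for c in returned if not c) / len(returned)
```

Sweeping the threshold over such a set shows exactly where accuracy starts to degrade as you trade unanswered questions for coverage.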

How often should I update the knowledge base?

Review your unanswered question tracker weekly. Any cluster with more than five occurrences represents a meaningful gap. Also re-embed entries whenever the underlying answer content changes, since stale embeddings paired with updated text create inconsistencies.


#FAQAgent #KnowledgeBase #SemanticSearch #RAG #CustomerSupport #AgenticAI #LearnAI #AIEngineering

CallSphere Team