Building a Semantic FAQ System: Finding Answers Using Vector Similarity
Build an intelligent FAQ system that understands user questions by meaning rather than keywords, using vector similarity to match queries to answers with confidence thresholds and graceful fallback behavior.
The Problem with Keyword FAQ Search
Traditional FAQ systems match user questions to answers using keyword overlap or simple string matching. A customer asking "Can I get my money back?" will not match an FAQ titled "Refund Policy" because they share no common words. Semantic FAQ systems solve this by embedding both the question and the FAQ entries into a shared vector space, where meaning determines relevance.
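To see the failure concretely, the token overlap between that query and the FAQ title is exactly zero. A minimal sketch (the `keyword_overlap` helper is illustrative, not part of the system):

```python
def keyword_overlap(a: str, b: str) -> float:
    """Jaccard overlap between two lowercased token sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# The query and the FAQ title share no tokens at all:
print(keyword_overlap("Can I get my money back?", "What is your refund policy?"))  # -> 0.0
```

Any keyword-based ranker built on this kind of overlap scores the pair at zero, even though the two sentences mean nearly the same thing.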
Designing the FAQ Data Model
A semantic FAQ system stores each FAQ entry with multiple question variations. Different users phrase the same question differently, and pre-computing embeddings for several phrasings dramatically improves match quality.
```python
from dataclasses import dataclass, field
from typing import List, Optional

import numpy as np


@dataclass
class FAQEntry:
    id: str
    canonical_question: str
    answer: str
    question_variations: List[str] = field(default_factory=list)
    category: str = "general"
    metadata: dict = field(default_factory=dict)

    @property
    def all_questions(self) -> List[str]:
        return [self.canonical_question] + self.question_variations


# Example FAQ data
faqs = [
    FAQEntry(
        id="refund-001",
        canonical_question="What is your refund policy?",
        answer="We offer a full refund within 30 days of purchase...",
        question_variations=[
            "Can I get my money back?",
            "How do I request a refund?",
            "What if I'm not satisfied with my purchase?",
            "Is there a money-back guarantee?",
        ],
        category="billing",
    ),
    FAQEntry(
        id="shipping-001",
        canonical_question="How long does shipping take?",
        answer="Standard shipping takes 5-7 business days...",
        question_variations=[
            "When will my order arrive?",
            "What are the delivery times?",
            "How fast do you ship?",
        ],
        category="shipping",
    ),
]
```
Building the Semantic FAQ Engine
The engine embeds all question variations and maps them back to their parent FAQ entries. When a user asks a question, we find the closest variation and return the corresponding answer.
```python
from typing import List, Optional

import numpy as np
from sentence_transformers import SentenceTransformer


class SemanticFAQEngine:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.faqs: List[FAQEntry] = []
        self.embeddings: Optional[np.ndarray] = None
        self.variation_to_faq: List[int] = []  # maps variation index -> FAQ index

    def load_faqs(self, faqs: List[FAQEntry]):
        """Embed all question variations and build the index."""
        self.faqs = faqs
        all_questions = []
        self.variation_to_faq = []
        for faq_idx, faq in enumerate(faqs):
            for question in faq.all_questions:
                all_questions.append(question)
                self.variation_to_faq.append(faq_idx)
        self.embeddings = self.model.encode(
            all_questions, normalize_embeddings=True
        )
        print(
            f"Indexed {len(faqs)} FAQs with "
            f"{len(all_questions)} total variations"
        )

    def find_answer(
        self,
        user_question: str,
        top_k: int = 3,
        threshold: float = 0.55,
    ) -> List[dict]:
        """Find the most relevant FAQ answers for a user question."""
        query_emb = self.model.encode(
            [user_question], normalize_embeddings=True
        )
        # Embeddings are normalized, so the dot product is cosine similarity.
        similarities = np.dot(self.embeddings, query_emb.T).flatten()
        # Oversample (top_k * 3) because several variations may map to one FAQ.
        top_indices = np.argsort(similarities)[::-1][:top_k * 3]
        seen_faq_ids = set()
        results = []
        for idx in top_indices:
            score = float(similarities[idx])
            if score < threshold:
                break  # scores are sorted descending; nothing further passes
            faq_idx = self.variation_to_faq[idx]
            faq = self.faqs[faq_idx]
            if faq.id in seen_faq_ids:
                continue  # another variation of this FAQ already matched
            seen_faq_ids.add(faq.id)
            results.append({
                "faq_id": faq.id,
                "question": faq.canonical_question,
                "answer": faq.answer,
                "confidence": score,
                "category": faq.category,
            })
            if len(results) >= top_k:
                break
        return results
```
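The core retrieval logic (cosine similarity over normalized vectors, plus deduplication through the variation-to-FAQ mapping) can be exercised without loading a model by substituting toy 2-D embeddings. This is a sketch of the mechanics, not the real engine:

```python
import numpy as np

# Toy normalized embeddings for four question variations (2-D for clarity).
embeddings = np.array([
    [1.0, 0.0],   # refund: canonical question
    [0.9, 0.1],   # refund: variation
    [0.0, 1.0],   # shipping: canonical question
    [0.1, 0.9],   # shipping: variation
])
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
variation_to_faq = [0, 0, 1, 1]  # variation index -> FAQ index
faq_ids = ["refund-001", "shipping-001"]

query = np.array([1.0, 0.05])  # a refund-like query vector
query /= np.linalg.norm(query)

sims = embeddings @ query               # cosine similarity per variation
order = np.argsort(sims)[::-1]          # best match first

seen, results = set(), []
for idx in order:
    if sims[idx] < 0.55:
        break  # threshold cutoff
    fid = faq_ids[variation_to_faq[idx]]
    if fid in seen:
        continue  # both refund variations map to the same FAQ; keep the best
    seen.add(fid)
    results.append((fid, float(sims[idx])))

print(results)  # only refund-001 survives dedup and the threshold
```

Both refund variations score highly, but deduplication collapses them into a single result, which is exactly why `find_answer` oversamples before filtering.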
Threshold Tuning
The similarity threshold is critical. Too high and you miss valid matches; too low and you return irrelevant answers. Here is a systematic approach to finding the right threshold.
```python
def tune_threshold(
    engine: SemanticFAQEngine,
    test_queries: List[dict],  # each: {"query": str, "expected_faq_id": str}
) -> float:
    """Find the threshold that maximizes F1 score on a labeled test set."""
    thresholds = np.arange(0.30, 0.80, 0.05)
    best_f1 = 0.0
    best_threshold = 0.5
    for threshold in thresholds:
        tp, fp, fn = 0, 0, 0
        for test in test_queries:
            results = engine.find_answer(
                test["query"], top_k=1, threshold=threshold
            )
            if results:
                if results[0]["faq_id"] == test["expected_faq_id"]:
                    tp += 1
                else:
                    fp += 1  # returned an answer, but the wrong one
            else:
                fn += 1  # a valid match existed but fell below the threshold
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) > 0 else 0)
        print(f"Threshold={threshold:.2f}: P={precision:.2f} "
              f"R={recall:.2f} F1={f1:.2f}")
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    print(f"\nBest threshold: {best_threshold:.2f} (F1={best_f1:.2f})")
    return best_threshold
```
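As a quick sanity check on the metric arithmetic above, here is the same computation on hypothetical counts (8 correct top-1 matches, 1 wrong answer returned, 2 queries with no answer):

```python
tp, fp, fn = 8, 1, 2  # hypothetical counts from a labeled test set

precision = tp / (tp + fp)   # 8/9: how often a returned answer was right
recall = tp / (tp + fn)      # 8/10: how often a valid match was found
f1 = 2 * precision * recall / (precision + recall)

print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Raising the threshold moves errors from the `fp` bucket into the `fn` bucket, which is why sweeping thresholds and comparing F1 is the right way to balance the two.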
Graceful Fallback
When no FAQ matches above the threshold, the system should offer a helpful fallback rather than showing nothing.
```python
def answer_with_fallback(
    engine: SemanticFAQEngine,
    user_question: str,
    threshold: float = 0.55,
) -> dict:
    """Return the best FAQ answer or a structured fallback response."""
    results = engine.find_answer(user_question, top_k=3, threshold=threshold)
    if results and results[0]["confidence"] > 0.75:
        return {
            "type": "confident_match",
            "answer": results[0]["answer"],
            "confidence": results[0]["confidence"],
        }
    elif results:
        return {
            "type": "suggestions",
            "message": "I found some related questions:",
            "suggestions": [
                {"question": r["question"], "confidence": r["confidence"]}
                for r in results
            ],
        }
    else:
        return {
            "type": "fallback",
            "message": "I could not find a matching answer. "
                       "Would you like to contact support?",
            "query_logged": True,
        }
```
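The three-tier routing can be verified in isolation with a compressed mirror of the same logic. The `route` helper below is hypothetical, written only to exercise the decision boundaries on plain result dicts:

```python
def route(results: list, confident: float = 0.75) -> str:
    """Mirror of the three-tier fallback logic above, on plain result dicts."""
    if results and results[0]["confidence"] > confident:
        return "confident_match"
    if results:
        return "suggestions"
    return "fallback"

print(route([{"confidence": 0.82}]))  # confident_match: answer directly
print(route([{"confidence": 0.60}]))  # suggestions: above threshold, below 0.75
print(route([]))                      # fallback: nothing passed the threshold
```

The gap between the retrieval threshold (0.55) and the confidence cutoff (0.75) is deliberate: matches in that band are plausible but not certain, so the user picks rather than the system guessing.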
FAQ
How many question variations should each FAQ entry have?
Aim for 3-5 variations per FAQ entry. Each variation should represent a genuinely different phrasing, not just minor word swaps. Collect real user questions from support logs or chat transcripts to create authentic variations. More variations improve recall but also increase index size.
Should I embed the answer text as well?
Generally no. Embedding the question is more effective because users typically phrase their input as a question, and the FAQ answer text often contains detailed explanations that dilute the semantic signal. If you find that some answers contain key phrases users search for, consider adding those phrases as additional question variations instead.
How do I handle FAQ entries that are very similar to each other?
If two FAQ entries have similarity above 0.85, consider merging them or adding a disambiguation step. In the search results, you can group highly similar FAQs and present them as related topics, letting the user choose the most relevant one.
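One way to surface such near-duplicates is a pairwise similarity pass over the canonical-question embeddings before indexing. The sketch below uses toy 2-D vectors in place of real embeddings; in practice you would embed each canonical question with the same model as the engine:

```python
import numpy as np

faq_ids = ["refund-001", "refund-002", "shipping-001"]
# Toy normalized embeddings; refund-002 is nearly parallel to refund-001.
emb = np.array([
    [1.0, 0.0],
    [0.995, 0.0998],
    [0.0, 1.0],
])
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

sims = emb @ emb.T  # pairwise cosine similarity matrix
dupes = [
    (faq_ids[i], faq_ids[j], float(sims[i, j]))
    for i in range(len(faq_ids))
    for j in range(i + 1, len(faq_ids))
    if sims[i, j] > 0.85
]
print(dupes)  # candidate pairs to merge or disambiguate
```

Any pair flagged here is a candidate for merging, or for grouping as related topics in the search results.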
#FAQSystem #VectorSimilarity #SemanticSearch #CustomerSupport #NLP #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.