Building a Semantic FAQ System: Finding Answers Using Vector Similarity
Build an intelligent FAQ system that understands user questions by meaning rather than keywords, using vector similarity to match queries to answers with confidence thresholds and graceful fallback behavior.
The Problem with Keyword FAQ Search
Traditional FAQ systems match user questions to answers using keyword overlap or simple string matching. A customer asking "Can I get my money back?" will not match an FAQ titled "Refund Policy" because they share no common words. Semantic FAQ systems solve this by embedding both the question and the FAQ entries into a shared vector space, where meaning determines relevance.
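To see the failure concretely, the token overlap between that query and the FAQ title is exactly zero. A minimal sketch (the `keyword_overlap` helper is illustrative, not part of the system):

```python
def keyword_overlap(a: str, b: str) -> float:
    """Jaccard overlap between two lowercased token sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# The query and the FAQ title share no tokens at all:
print(keyword_overlap("Can I get my money back?", "What is your refund policy?"))  # -> 0.0
```

Any keyword-based ranker built on this kind of overlap scores the pair at zero, even though the two sentences mean nearly the same thing.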
Designing the FAQ Data Model
A semantic FAQ system stores each FAQ entry with multiple question variations. Different users phrase the same question differently, and pre-computing embeddings for several phrasings dramatically improves match quality.
```python
from dataclasses import dataclass, field
from typing import List, Optional

import numpy as np


@dataclass
class FAQEntry:
    id: str
    canonical_question: str
    answer: str
    question_variations: List[str] = field(default_factory=list)
    category: str = "general"
    metadata: dict = field(default_factory=dict)

    @property
    def all_questions(self) -> List[str]:
        return [self.canonical_question] + self.question_variations


# Example FAQ data
faqs = [
    FAQEntry(
        id="refund-001",
        canonical_question="What is your refund policy?",
        answer="We offer a full refund within 30 days of purchase...",
        question_variations=[
            "Can I get my money back?",
            "How do I request a refund?",
            "What if I'm not satisfied with my purchase?",
            "Is there a money-back guarantee?",
        ],
        category="billing",
    ),
    FAQEntry(
        id="shipping-001",
        canonical_question="How long does shipping take?",
        answer="Standard shipping takes 5-7 business days...",
        question_variations=[
            "When will my order arrive?",
            "What are the delivery times?",
            "How fast do you ship?",
        ],
        category="shipping",
    ),
]
```
Building the Semantic FAQ Engine
The engine embeds all question variations and maps them back to their parent FAQ entries. When a user asks a question, we find the closest variation and return the corresponding answer.
```python
from typing import List, Optional

import numpy as np
from sentence_transformers import SentenceTransformer


class SemanticFAQEngine:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.faqs: List[FAQEntry] = []
        self.embeddings: Optional[np.ndarray] = None
        self.variation_to_faq: List[int] = []  # maps variation index -> FAQ index

    def load_faqs(self, faqs: List[FAQEntry]):
        """Embed all question variations and build the index."""
        self.faqs = faqs
        all_questions = []
        self.variation_to_faq = []
        for faq_idx, faq in enumerate(faqs):
            for question in faq.all_questions:
                all_questions.append(question)
                self.variation_to_faq.append(faq_idx)
        self.embeddings = self.model.encode(
            all_questions, normalize_embeddings=True
        )
        print(
            f"Indexed {len(faqs)} FAQs with "
            f"{len(all_questions)} total variations"
        )

    def find_answer(
        self,
        user_question: str,
        top_k: int = 3,
        threshold: float = 0.55,
    ) -> List[dict]:
        """Find the most relevant FAQ answers for a user question."""
        query_emb = self.model.encode(
            [user_question], normalize_embeddings=True
        )
        # Embeddings are normalized, so the dot product is cosine similarity.
        similarities = np.dot(self.embeddings, query_emb.T).flatten()
        # Oversample (top_k * 3) because several variations may map to one FAQ.
        top_indices = np.argsort(similarities)[::-1][:top_k * 3]
        seen_faq_ids = set()
        results = []
        for idx in top_indices:
            score = float(similarities[idx])
            if score < threshold:
                break  # scores are sorted descending; nothing further passes
            faq_idx = self.variation_to_faq[idx]
            faq = self.faqs[faq_idx]
            if faq.id in seen_faq_ids:
                continue  # another variation of this FAQ already matched
            seen_faq_ids.add(faq.id)
            results.append({
                "faq_id": faq.id,
                "question": faq.canonical_question,
                "answer": faq.answer,
                "confidence": score,
                "category": faq.category,
            })
            if len(results) >= top_k:
                break
        return results
```
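The core retrieval logic (cosine similarity over normalized vectors, plus deduplication through the variation-to-FAQ mapping) can be exercised without loading a model by substituting toy 2-D embeddings. This is a sketch of the mechanics, not the real engine:

```python
import numpy as np

# Toy normalized embeddings for four question variations (2-D for clarity).
embeddings = np.array([
    [1.0, 0.0],   # refund: canonical question
    [0.9, 0.1],   # refund: variation
    [0.0, 1.0],   # shipping: canonical question
    [0.1, 0.9],   # shipping: variation
])
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
variation_to_faq = [0, 0, 1, 1]  # variation index -> FAQ index
faq_ids = ["refund-001", "shipping-001"]

query = np.array([1.0, 0.05])  # a refund-like query vector
query /= np.linalg.norm(query)

sims = embeddings @ query               # cosine similarity per variation
order = np.argsort(sims)[::-1]          # best match first

seen, results = set(), []
for idx in order:
    if sims[idx] < 0.55:
        break  # threshold cutoff
    fid = faq_ids[variation_to_faq[idx]]
    if fid in seen:
        continue  # both refund variations map to the same FAQ; keep the best
    seen.add(fid)
    results.append((fid, float(sims[idx])))

print(results)  # only refund-001 survives dedup and the threshold
```

Both refund variations score highly, but deduplication collapses them into a single result, which is exactly why `find_answer` oversamples before filtering.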
Threshold Tuning
The similarity threshold is critical. Too high and you miss valid matches; too low and you return irrelevant answers. Here is a systematic approach to finding the right threshold.
```python
def tune_threshold(
    engine: SemanticFAQEngine,
    test_queries: List[dict],  # each: {"query": str, "expected_faq_id": str}
) -> float:
    """Find the threshold that maximizes F1 score on a labeled test set."""
    thresholds = np.arange(0.30, 0.80, 0.05)
    best_f1 = 0.0
    best_threshold = 0.5
    for threshold in thresholds:
        tp, fp, fn = 0, 0, 0
        for test in test_queries:
            results = engine.find_answer(
                test["query"], top_k=1, threshold=threshold
            )
            if results:
                if results[0]["faq_id"] == test["expected_faq_id"]:
                    tp += 1
                else:
                    fp += 1  # returned an answer, but the wrong one
            else:
                fn += 1  # a valid match existed but fell below the threshold
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) > 0 else 0)
        print(f"Threshold={threshold:.2f}: P={precision:.2f} "
              f"R={recall:.2f} F1={f1:.2f}")
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    print(f"\nBest threshold: {best_threshold:.2f} (F1={best_f1:.2f})")
    return best_threshold
```
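As a quick sanity check on the metric arithmetic above, here is the same computation on hypothetical counts (8 correct top-1 matches, 1 wrong answer returned, 2 queries with no answer):

```python
tp, fp, fn = 8, 1, 2  # hypothetical counts from a labeled test set

precision = tp / (tp + fp)   # 8/9: how often a returned answer was right
recall = tp / (tp + fn)      # 8/10: how often a valid match was found
f1 = 2 * precision * recall / (precision + recall)

print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Raising the threshold moves errors from the `fp` bucket into the `fn` bucket, which is why sweeping thresholds and comparing F1 is the right way to balance the two.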
Graceful Fallback
When no FAQ matches above the threshold, the system should offer a helpful fallback rather than showing nothing.
```python
def answer_with_fallback(
    engine: SemanticFAQEngine,
    user_question: str,
    threshold: float = 0.55,
) -> dict:
    """Return the best FAQ answer or a structured fallback response."""
    results = engine.find_answer(user_question, top_k=3, threshold=threshold)
    if results and results[0]["confidence"] > 0.75:
        return {
            "type": "confident_match",
            "answer": results[0]["answer"],
            "confidence": results[0]["confidence"],
        }
    elif results:
        return {
            "type": "suggestions",
            "message": "I found some related questions:",
            "suggestions": [
                {"question": r["question"], "confidence": r["confidence"]}
                for r in results
            ],
        }
    else:
        return {
            "type": "fallback",
            "message": "I could not find a matching answer. "
                       "Would you like to contact support?",
            "query_logged": True,
        }
```
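The three-tier routing can be verified in isolation with a compressed mirror of the same logic. The `route` helper below is hypothetical, written only to exercise the decision boundaries on plain result dicts:

```python
def route(results: list, confident: float = 0.75) -> str:
    """Mirror of the three-tier fallback logic above, on plain result dicts."""
    if results and results[0]["confidence"] > confident:
        return "confident_match"
    if results:
        return "suggestions"
    return "fallback"

print(route([{"confidence": 0.82}]))  # confident_match: answer directly
print(route([{"confidence": 0.60}]))  # suggestions: above threshold, below 0.75
print(route([]))                      # fallback: nothing passed the threshold
```

The gap between the retrieval threshold (0.55) and the confidence cutoff (0.75) is deliberate: matches in that band are plausible but not certain, so the user picks rather than the system guessing.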
FAQ
How many question variations should each FAQ entry have?
Aim for 3-5 variations per FAQ entry. Each variation should represent a genuinely different phrasing, not just minor word swaps. Collect real user questions from support logs or chat transcripts to create authentic variations. More variations improve recall but also increase index size.
Should I embed the answer text as well?
Generally no. Embedding the question is more effective because users typically phrase their input as a question, and the FAQ answer text often contains detailed explanations that dilute the semantic signal. If you find that some answers contain key phrases users search for, consider adding those phrases as additional question variations instead.
How do I handle FAQ entries that are very similar to each other?
If two FAQ entries have similarity above 0.85, consider merging them or adding a disambiguation step. In the search results, you can group highly similar FAQs and present them as related topics, letting the user choose the most relevant one.
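One way to surface such near-duplicates is a pairwise similarity pass over the canonical-question embeddings before indexing. The sketch below uses toy 2-D vectors in place of real embeddings; in practice you would embed each canonical question with the same model as the engine:

```python
import numpy as np

faq_ids = ["refund-001", "refund-002", "shipping-001"]
# Toy normalized embeddings; refund-002 is nearly parallel to refund-001.
emb = np.array([
    [1.0, 0.0],
    [0.995, 0.0998],
    [0.0, 1.0],
])
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

sims = emb @ emb.T  # pairwise cosine similarity matrix
dupes = [
    (faq_ids[i], faq_ids[j], float(sims[i, j]))
    for i in range(len(faq_ids))
    for j in range(i + 1, len(faq_ids))
    if sims[i, j] > 0.85
]
print(dupes)  # candidate pairs to merge or disambiguate
```

Any pair flagged here is a candidate for merging, or for grouping as related topics in the search results.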
#FAQSystem #VectorSimilarity #SemanticSearch #CustomerSupport #NLP #AgenticAI #LearnAI #AIEngineering
CallSphere Team
Expert insights on AI voice agents and customer communication automation.