AI-Powered Search for SaaS Applications: Semantic Search Over Product Data
Build semantic search for your SaaS product using vector embeddings, enabling users to find records by meaning rather than exact keyword matches.
Why Keyword Search Falls Short
Traditional keyword search works by matching exact tokens. When a user in your CRM searches for "companies that are struggling financially," keyword search returns nothing — because no record contains those exact words. Semantic search uses vector embeddings to match by meaning, so that query finds records tagged "at risk," "payment overdue," or "churn likelihood: high."
For SaaS products with rich, structured data, semantic search transforms how users discover and interact with their information.
Architecture: Indexing Pipeline
The indexing pipeline converts your product data into searchable vector embeddings. It runs on data changes (inserts, updates, deletes) and keeps the vector index in sync with your primary database.
# Embedding indexer that processes data changes
from openai import OpenAI
from dataclasses import dataclass

client = OpenAI()

@dataclass
class SearchDocument:
    entity_type: str
    entity_id: str
    tenant_id: str
    text: str
    metadata: dict

def create_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

def build_search_text(entity_type: str, record: dict) -> str:
    """Convert a database record into searchable text."""
    builders = {
        "contact": lambda r: (
            f"Contact: {r['name']}. Company: {r.get('company', 'N/A')}. "
            f"Title: {r.get('title', 'N/A')}. Notes: {r.get('notes', '')}. "
            f"Tags: {', '.join(r.get('tags', []))}."
        ),
        "deal": lambda r: (
            f"Deal: {r['name']}. Value: ${r.get('value', 0):,.2f}. "
            f"Stage: {r.get('stage', 'unknown')}. "
            f"Description: {r.get('description', '')}."
        ),
        "ticket": lambda r: (
            f"Support ticket: {r['subject']}. Status: {r.get('status', 'open')}. "
            f"Priority: {r.get('priority', 'normal')}. Body: {r.get('body', '')}."
        ),
    }
    builder = builders.get(entity_type)
    if not builder:
        raise ValueError(f"Unknown entity type: {entity_type}")
    return builder(record)
Storing Embeddings with pgvector
Use PostgreSQL with pgvector to keep embeddings alongside your existing data, avoiding the operational overhead of a separate vector database.
# pgvector storage and retrieval
import json
import asyncpg

EMBED_DIM = 1536  # text-embedding-3-small dimension

async def setup_vector_table(pool: asyncpg.Pool):
    async with pool.acquire() as conn:
        await conn.execute("CREATE EXTENSION IF NOT EXISTS vector;")
        await conn.execute(f"""
            CREATE TABLE IF NOT EXISTS search_embeddings (
                id SERIAL PRIMARY KEY,
                tenant_id UUID NOT NULL,
                entity_type VARCHAR(50) NOT NULL,
                entity_id UUID NOT NULL,
                content TEXT NOT NULL,
                embedding vector({EMBED_DIM}) NOT NULL,
                metadata JSONB DEFAULT '{{}}',
                updated_at TIMESTAMPTZ DEFAULT NOW(),
                UNIQUE(entity_type, entity_id)
            );
        """)
        await conn.execute("""
            CREATE INDEX IF NOT EXISTS idx_search_embed_tenant
                ON search_embeddings (tenant_id);
        """)
        # Approximate nearest-neighbor index for the cosine operator (<=>);
        # without it every similarity query is a full sequential scan.
        await conn.execute("""
            CREATE INDEX IF NOT EXISTS idx_search_embed_hnsw
                ON search_embeddings USING hnsw (embedding vector_cosine_ops);
        """)

async def upsert_embedding(pool: asyncpg.Pool, doc: SearchDocument):
    embedding = create_embedding(doc.text)
    embedding_str = "[" + ",".join(str(x) for x in embedding) + "]"
    async with pool.acquire() as conn:
        # asyncpg expects JSONB parameters as JSON-encoded strings unless a
        # custom codec is registered, so serialize the metadata dict here.
        await conn.execute("""
            INSERT INTO search_embeddings
                (tenant_id, entity_type, entity_id, content, embedding, metadata)
            VALUES ($1, $2, $3, $4, $5::vector, $6)
            ON CONFLICT (entity_type, entity_id)
            DO UPDATE SET content = $4, embedding = $5::vector,
                          metadata = $6, updated_at = NOW();
        """, doc.tenant_id, doc.entity_type, doc.entity_id,
             doc.text, embedding_str, json.dumps(doc.metadata))
Search API
The search endpoint accepts a natural language query, embeds it, and performs a cosine similarity search scoped to the user's tenant.
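pgvector's `<=>` operator returns cosine *distance*, so `1 - distance` is the similarity score the endpoint returns. The underlying math, illustrated without the database:

```python
# Cosine similarity between two vectors: dot(a, b) / (|a| * |b|).
# pgvector's <=> operator returns 1 minus this value (cosine distance).
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0 — same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 — orthogonal
```

Because similarity depends only on direction, a short record and a long record about the same topic score close together.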
import json

from fastapi import FastAPI, Depends, Query
from pydantic import BaseModel

app = FastAPI()

class SearchResult(BaseModel):
    entity_type: str
    entity_id: str
    content: str
    score: float
    metadata: dict

@app.get("/api/search", response_model=list[SearchResult])
async def semantic_search(
    q: str = Query(..., min_length=2, max_length=500),
    entity_type: str | None = Query(None),
    limit: int = Query(10, ge=1, le=50),
    tenant_id: str = Depends(get_current_tenant),  # your app's auth dependency
    pool: asyncpg.Pool = Depends(get_db_pool),
):
    query_embedding = create_embedding(q)
    embedding_str = "[" + ",".join(str(x) for x in query_embedding) + "]"

    # Build the parameter list so every value, including LIMIT, is bound
    # as a query parameter rather than interpolated into the SQL string.
    params: list = [tenant_id, embedding_str]
    type_filter = ""
    if entity_type:
        params.append(entity_type)
        type_filter = f"AND entity_type = ${len(params)}"
    params.append(limit)
    limit_placeholder = f"${len(params)}"

    async with pool.acquire() as conn:
        rows = await conn.fetch(f"""
            SELECT entity_type, entity_id, content, metadata,
                   1 - (embedding <=> $2::vector) AS score
            FROM search_embeddings
            WHERE tenant_id = $1 {type_filter}
            ORDER BY embedding <=> $2::vector
            LIMIT {limit_placeholder};
        """, *params)

    return [
        SearchResult(
            entity_type=r["entity_type"],
            entity_id=str(r["entity_id"]),
            content=r["content"],
            score=round(float(r["score"]), 4),
            # asyncpg returns JSONB columns as strings unless a codec is set
            metadata=json.loads(r["metadata"]) if r["metadata"] else {},
        )
        for r in rows
    ]
Relevance Tuning
Combine vector similarity with keyword matching and recency boosting for better results.
# Hybrid scoring: vector similarity + Postgres full-text rank + recency
# (ts_rank is Postgres's built-in relevance function, not true BM25)
async def hybrid_search(pool: asyncpg.Pool, query: str,
                        tenant_id: str, limit: int = 10):
    query_embedding = create_embedding(query)
    embedding_str = "[" + ",".join(str(x) for x in query_embedding) + "]"
    async with pool.acquire() as conn:
        rows = await conn.fetch("""
            SELECT entity_type, entity_id, content, metadata,
                   1 - (embedding <=> $2::vector) AS vector_score,
                   ts_rank(to_tsvector('english', content),
                           plainto_tsquery('english', $3)) AS keyword_score,
                   EXTRACT(EPOCH FROM (NOW() - updated_at)) AS age_seconds
            FROM search_embeddings
            WHERE tenant_id = $1
            ORDER BY (
                0.7 * (1 - (embedding <=> $2::vector)) +
                0.2 * ts_rank(to_tsvector('english', content),
                              plainto_tsquery('english', $3)) +
                0.1 * (1.0 / (1.0 + EXTRACT(EPOCH FROM (NOW() - updated_at)) / 86400))
            ) DESC
            LIMIT $4;
        """, tenant_id, embedding_str, query, limit)
    return rows
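To see how the 0.7 / 0.2 / 0.1 weights interact, here is the same blend computed in plain Python for two hypothetical rows (the input values are illustrative):

```python
# Recompute the hybrid score from the ORDER BY clause above.
def hybrid_score(vector_score: float, keyword_score: float,
                 age_seconds: float) -> float:
    # Recency term decays from 1.0 toward 0 as the record ages, in days.
    recency = 1.0 / (1.0 + age_seconds / 86400)
    return 0.7 * vector_score + 0.2 * keyword_score + 0.1 * recency

# A fresh, semantically close record outscores a stale keyword match.
fresh = hybrid_score(vector_score=0.82, keyword_score=0.05, age_seconds=3600)
stale = hybrid_score(vector_score=0.55, keyword_score=0.30, age_seconds=30 * 86400)
print(fresh > stale)  # True
```

The vector term dominates by design; the keyword and recency terms mostly break ties between semantically similar candidates. Tune the weights against real user queries rather than guessing.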
FAQ
How do I keep the vector index in sync with my primary data?
Use database triggers or change data capture (CDC) to detect inserts, updates, and deletes. Queue these changes to a background worker that recomputes embeddings and upserts them. For deletes, remove the corresponding row from the search_embeddings table. A 30-second indexing delay is acceptable for most SaaS applications.
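One lightweight way to do this without external CDC tooling is a Postgres trigger that publishes change events over LISTEN/NOTIFY. A sketch, assuming a `contacts` source table — the trigger, function, and channel names are all illustrative:

```python
# Trigger publishes a JSON change event; a worker LISTENs and re-indexes.
import json

TRIGGER_SQL = """
CREATE OR REPLACE FUNCTION notify_search_change() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('search_changes', json_build_object(
        'action', TG_OP,
        'entity_type', TG_TABLE_NAME,
        'entity_id', CASE WHEN TG_OP = 'DELETE' THEN OLD.id ELSE NEW.id END
    )::text);
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER contacts_search_change
    AFTER INSERT OR UPDATE OR DELETE ON contacts
    FOR EACH ROW EXECUTE FUNCTION notify_search_change();
"""

def parse_change_event(payload: str) -> dict:
    """Decode a NOTIFY payload into the event shape the indexing worker consumes."""
    event = json.loads(payload)
    return {
        "action": event["action"].lower(),  # 'insert' / 'update' / 'delete'
        "entity_type": event["entity_type"],
        "entity_id": event["entity_id"],
    }
```

On the worker side, asyncpg's `connection.add_listener("search_changes", callback)` delivers each payload to your callback, which can queue `parse_change_event(payload)` for re-embedding. Note that NOTIFY events are lost if no listener is connected, so pair this with a periodic full reconciliation pass.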
Should I use pgvector or a dedicated vector database?
pgvector is the right choice for most SaaS products under 10 million records. It keeps your stack simple — one database, one backup strategy, one connection pool. Switch to a dedicated vector database like Pinecone or Weaviate only if you need sub-10ms latency at scale or advanced filtering that pgvector does not support.
How do I handle multi-language search?
Use a multilingual embedding model like text-embedding-3-small (which supports 100+ languages natively). Index all content as-is without translation. The embedding model maps semantically similar content to nearby vectors regardless of language, so a query in Spanish will find relevant records written in English.
CallSphere Team
Expert insights on AI voice agents and customer communication automation.