
Adding AI Chat to Your SaaS Product: Architecture and Implementation Guide

Learn how to embed an AI chat widget into your SaaS application with proper backend integration, context injection, permission scoping, and conversation management.

Why AI Chat Belongs Inside Your Product

Adding AI chat to a SaaS product is not the same as dropping a third-party chatbot on your marketing site. Product-embedded AI chat needs access to the user's data, must respect their permissions, and should understand the current application context. A customer viewing an invoice should be able to ask "Why is this total different from last month?" and get a real, data-backed answer — not a generic FAQ response.

This guide covers the architecture for building an AI chat system that lives inside your SaaS application as a first-class feature.

Architecture Overview

The system has four layers: the frontend widget, a WebSocket gateway, an AI orchestration service, and your existing product APIs.

# Backend: FastAPI WebSocket endpoint for AI chat
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from typing import Optional

app = FastAPI()

class ChatContext:
    """Captures the user's current product context."""
    def __init__(self, user_id: str, tenant_id: str, current_page: str,
                 entity_type: Optional[str] = None,
                 entity_id: Optional[str] = None):
        self.user_id = user_id
        self.tenant_id = tenant_id
        self.current_page = current_page
        self.entity_type = entity_type
        self.entity_id = entity_id

    def to_system_prompt(self) -> str:
        context = f"User is on page: {self.current_page}."
        if self.entity_type and self.entity_id:
            context += f" They are viewing {self.entity_type} with ID {self.entity_id}."
        return context


@app.websocket("/ws/chat")
async def chat_endpoint(websocket: WebSocket):
    await websocket.accept()
    # Authenticate from the token sent in the first message; close with a
    # policy-violation code (4001) if it fails.
    auth_msg = await websocket.receive_json()
    user = await authenticate_ws_token(auth_msg["token"])
    if not user:
        await websocket.close(code=4001)
        return

    try:
        while True:
            data = await websocket.receive_json()
            # Rebuild context on every message so it tracks route changes.
            context = ChatContext(
                user_id=user.id,
                tenant_id=user.tenant_id,
                current_page=data.get("page", "/"),
                entity_type=data.get("entity_type"),
                entity_id=data.get("entity_id"),
            )
            response = await generate_ai_response(
                message=data["message"],
                context=context,
                permissions=user.permissions,
            )
            await websocket.send_json({"reply": response})
    except WebSocketDisconnect:
        pass  # client closed the connection; nothing else to clean up
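The endpoint above assumes an authenticate_ws_token helper. Here is a minimal synchronous sketch of the idea using an HMAC-signed token; the SECRET value, the AuthedUser shape, and the payload fields are illustrative, and a production version would typically be async and validate a JWT or session token against your auth system instead.

```python
import base64
import hashlib
import hmac
import json
from dataclasses import dataclass, field

SECRET = b"replace-with-your-signing-key"  # assumption: a server-side signing secret

@dataclass
class AuthedUser:
    id: str
    tenant_id: str
    permissions: list = field(default_factory=list)

def sign_ws_token(payload: dict) -> str:
    """Issue a token: base64-encoded payload plus an HMAC signature."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"

def verify_ws_token(token: str):
    """Return an AuthedUser if the signature checks out, else None."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    data = json.loads(base64.urlsafe_b64decode(body))
    return AuthedUser(id=data["user_id"], tenant_id=data["tenant_id"],
                      permissions=data.get("permissions", []))
```

The key property is that the token is verified server-side before any chat message is processed; the client never gets to assert its own tenant or permissions.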

Frontend Widget Design

The chat widget mounts as a floating component that tracks the user's current route and sends page context with every message.

// React chat widget that sends page context with every message
import { useEffect, useRef, useState } from "react";
import { usePathname } from "next/navigation";

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Illustrative route parsing -- adapt the pattern to your URL scheme,
// e.g. /invoices/inv_123 -> entity type "invoice", entity id "inv_123".
function extractEntityType(pathname: string): string | null {
  const match = pathname.match(/^\/(\w+)\/([\w-]+)/);
  return match ? match[1].replace(/s$/, "") : null;
}

function extractEntityId(pathname: string): string | null {
  const match = pathname.match(/^\/(\w+)\/([\w-]+)/);
  return match ? match[2] : null;
}

export function AIChatWidget({ authToken }: { authToken: string }) {
  const [messages, setMessages] = useState<ChatMessage[]>([]);
  const [input, setInput] = useState("");
  const wsRef = useRef<WebSocket | null>(null);
  const pathname = usePathname();

  useEffect(() => {
    const ws = new WebSocket(`wss://api.example.com/ws/chat`);
    ws.onopen = () => ws.send(JSON.stringify({ token: authToken }));
    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      setMessages((prev) => [...prev, { role: "assistant", content: data.reply }]);
    };
    wsRef.current = ws;
    return () => ws.close();
  }, [authToken]);

  const sendMessage = () => {
    if (!input.trim() || !wsRef.current) return;
    const payload = {
      message: input,
      page: pathname,
      entity_type: extractEntityType(pathname),
      entity_id: extractEntityId(pathname),
    };
    wsRef.current.send(JSON.stringify(payload));
    setMessages((prev) => [...prev, { role: "user", content: input }]);
    setInput("");
  };

  return (
    <div className="fixed bottom-4 right-4 w-96 bg-white shadow-xl rounded-lg">
      <div className="h-80 overflow-y-auto p-4">
        {messages.map((msg, i) => (
          <div key={i} className={msg.role === "user" ? "text-right" : "text-left"}>
            <p className="inline-block p-2 rounded-lg bg-gray-100">{msg.content}</p>
          </div>
        ))}
      </div>
      <div className="flex p-2 border-t">
        <input value={input} onChange={(e) => setInput(e.target.value)}
          onKeyDown={(e) => e.key === "Enter" && sendMessage()}
          className="flex-1 border rounded-l px-3" placeholder="Ask anything..." />
        <button onClick={sendMessage} className="bg-blue-600 text-white px-4 rounded-r">
          Send
        </button>
      </div>
    </div>
  );
}

Permission-Scoped Data Access

The AI must never return data the user is not authorized to see. Inject the user's permission set into the tool layer so every data fetch is scoped.


async def generate_ai_response(message: str, context: ChatContext,
                               permissions: list[str]) -> str:
    # Build only the tools this user is allowed to call.
    tools = build_scoped_tools(context.tenant_id, context.user_id, permissions)

    system_prompt = f"""You are a helpful assistant inside our SaaS product.
{context.to_system_prompt()}
Only use the provided tools to fetch data. Never fabricate data.
The user has these permissions: {', '.join(permissions)}.
Do not attempt to access data outside their permission scope."""

    # call_llm is a thin wrapper around your LLM provider's chat API.
    response = await call_llm(
        system=system_prompt,
        messages=[{"role": "user", "content": message}],
        tools=tools,
    )
    return response


def build_scoped_tools(tenant_id: str, user_id: str,
                       permissions: list[str]) -> list:
    tools = []
    if "invoices:read" in permissions:
        tools.append(InvoiceLookupTool(tenant_id=tenant_id))
    if "analytics:read" in permissions:
        tools.append(AnalyticsQueryTool(tenant_id=tenant_id))
    if "users:read" in permissions:
        tools.append(UserDirectoryTool(tenant_id=tenant_id))
    return tools
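Each tool class receives the tenant_id at construction time, so the model can never widen the scope, no matter what the prompt says. A sketch of what InvoiceLookupTool might look like, with an in-memory list standing in for the invoices table (the data and the run signature are illustrative):

```python
# In-memory stand-in for the invoices table (illustrative data).
INVOICES = [
    {"id": "inv-1", "tenant_id": "t1", "total": 120.0},
    {"id": "inv-2", "tenant_id": "t2", "total": 75.0},
]

class InvoiceLookupTool:
    name = "invoice_lookup"

    def __init__(self, tenant_id: str):
        # Fixed when the tool is built from the authenticated session;
        # the LLM never supplies or overrides it.
        self.tenant_id = tenant_id

    def run(self, invoice_id: str):
        # The tenant filter is applied on every lookup, so asking for
        # another tenant's invoice ID simply returns nothing.
        for inv in INVOICES:
            if inv["id"] == invoice_id and inv["tenant_id"] == self.tenant_id:
                return inv
        return None
```

Because the filter lives in the tool, a prompt-injected request for another tenant's invoice fails at the data layer rather than relying on the model to refuse.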

Conversation Management

Store conversations so users can return to previous threads. Use a simple schema with tenant isolation built in.

# SQLAlchemy models for chat history
from sqlalchemy import Column, String, Text, DateTime, ForeignKey
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base
import uuid
from datetime import datetime

Base = declarative_base()

class ChatConversation(Base):
    __tablename__ = "chat_conversations"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), nullable=False, index=True)
    user_id = Column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=False)
    title = Column(String(255))
    created_at = Column(DateTime, default=datetime.utcnow)

class ChatMessage(Base):
    __tablename__ = "chat_messages"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    conversation_id = Column(UUID(as_uuid=True),
                             ForeignKey("chat_conversations.id"), nullable=False, index=True)
    role = Column(String(20), nullable=False)
    content = Column(Text, nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)
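Writes need the same tenant isolation as reads: a message should only be appended to a conversation the caller's tenant owns. A plain-Python sketch of that check, with dicts standing in for the tables above (the helper name and store are illustrative; the real version runs the same condition as a WHERE clause):

```python
# In-memory stand-ins for the two tables above (illustrative).
CONVERSATIONS = {"c1": {"tenant_id": "t1", "title": "Billing question"}}
MESSAGES: list[dict] = []

def append_message(tenant_id: str, conversation_id: str,
                   role: str, content: str) -> bool:
    """Append a message only if the conversation belongs to this tenant."""
    convo = CONVERSATIONS.get(conversation_id)
    if convo is None or convo["tenant_id"] != tenant_id:
        return False  # cross-tenant write refused
    MESSAGES.append({
        "conversation_id": conversation_id,
        "role": role,
        "content": content,
    })
    return True
```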

FAQ

How do I prevent the AI from leaking data between tenants?

Every database query and tool invocation must be scoped by tenant_id. Pass the tenant ID from the authenticated session into every tool constructor, and add it as a mandatory WHERE clause. Never rely on the LLM to filter data — enforce it at the data access layer.
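One way to make the WHERE clause truly mandatory is to route every read through a single query helper that takes tenant_id as a required argument. A sketch with SQLite and the conversation table from above (table and column names are illustrative):

```python
import sqlite3

# Illustrative in-memory database mirroring chat_conversations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chat_conversations (id TEXT, tenant_id TEXT, title TEXT)")
conn.executemany("INSERT INTO chat_conversations VALUES (?, ?, ?)", [
    ("c1", "t1", "Billing question"),
    ("c2", "t2", "Usage report"),
])

def list_conversations(tenant_id: str):
    # tenant_id is always part of the WHERE clause, never optional.
    return conn.execute(
        "SELECT id, title FROM chat_conversations WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchall()
```

There is no code path that queries without a tenant, so a bug or prompt injection higher up cannot produce a cross-tenant result.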

Should I use WebSockets or HTTP streaming for chat?

WebSockets are better for bidirectional, long-lived conversations where the server might push updates (typing indicators, tool progress). HTTP streaming with Server-Sent Events works well if your infrastructure does not support WebSocket scaling. For most SaaS products, WebSockets provide the best user experience.
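If you go the SSE route, each chunk of the model's reply is framed as a "data:" event and yielded from a streaming HTTP response. A minimal sketch of the framing (the event-formatting helper is illustrative; in FastAPI you would yield these strings from a response with media type text/event-stream):

```python
def sse_event(data: str, event: str = "") -> str:
    """Frame a payload as a Server-Sent Events message.

    Multi-line payloads become multiple data: lines, per the SSE format;
    a blank line terminates the event.
    """
    lines = []
    if event:
        lines.append(f"event: {event}")
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"
```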

How do I handle rate limiting for the AI chat?

Implement rate limiting at two levels: per-user message rate (e.g., 20 messages per minute) and per-tenant token budget (e.g., 100,000 tokens per day). Track usage in Redis with sliding window counters and return clear error messages when limits are hit.
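The per-user message limit can be sketched as an in-memory sliding window; in production the per-user timestamp queue would live in Redis (for example, a sorted set that is pruned by score on each check) so the limit holds across server instances. The class below is illustrative:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` hits per `window_seconds` per user."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # user_id -> recent hit timestamps

    def allow(self, user_id: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        window = self.hits[user_id]
        # Drop timestamps that have aged out of the window.
        while window and window[0] <= now - self.window:
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True
```

The per-tenant token budget works the same way, except the counter tracks tokens consumed rather than message counts and resets daily.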


#AIChat #SaaS #WidgetArchitecture #ContextInjection #Python #TypeScript #AgenticAI #LearnAI #AIEngineering


CallSphere Team

Expert insights on AI voice agents and customer communication automation.
