How CallSphere Works

Technical reference for the CallSphere AI voice and chat agent platform. It covers the architecture, the voice pipeline, the action execution model, and the safety controls.

System Definition

What It Is

An agentic AI platform that conducts voice and chat conversations with customers. It understands intent, executes actions via tools, and responds in natural language.

What It Replaces

IVR phone trees, rule-based chatbots, and after-hours voicemail. Handles tasks that previously required a human agent for each interaction.

What It Does Not Replace

Human agents for complex escalations, licensed professionals (legal, medical), and empathy-primary interactions. CallSphere augments your team; it does not replace it.

Mechanistic Workflow

Every voice interaction follows these 9 steps from inbound signal to response delivery.

Inbound Signal

Call arrives via SIP trunk, WebRTC, or WebSocket. The transport layer establishes a bidirectional audio stream.

ASR Transcription

Automatic speech recognition converts audio to text in real time. Supports 57 languages with speaker diarization.

Turn Detection

Voice activity detection (VAD) and endpointing determine when the caller has finished speaking. Silence threshold: 600 ms (configurable).
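
A minimal sketch of silence-based endpointing, assuming the VAD emits one speech/non-speech decision per fixed-size audio frame. The `Endpointer` class, its `push` method, and the 20 ms frame size are hypothetical and are not part of the CallSphere API; only the 600 ms threshold comes from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Endpointer:
    """Signals end of turn once enough trailing silence follows speech."""

    silence_threshold_ms: int = 600  # value quoted in the text; configurable
    frame_ms: int = 20               # assumed VAD frame size (hypothetical)
    _silence_ms: int = field(default=0, init=False)
    _heard_speech: bool = field(default=False, init=False)

    def push(self, is_speech: bool) -> bool:
        """Feed one VAD decision; return True when the caller's turn has ended."""
        if is_speech:
            self._heard_speech = True
            self._silence_ms = 0
            return False
        if not self._heard_speech:
            return False  # leading silence: the caller has not started yet
        self._silence_ms += self.frame_ms
        if self._silence_ms >= self.silence_threshold_ms:
            # Reset so the same instance can track the next turn.
            self._heard_speech = False
            self._silence_ms = 0
            return True
        return False


# Ten speech frames followed by 30 silent frames (30 x 20 ms = 600 ms).
endpointer = Endpointer()
decisions = [endpointer.push(True) for _ in range(10)]
decisions += [endpointer.push(False) for _ in range(30)]
```

In this sketch the turn ends on the last silent frame, when accumulated silence first reaches the threshold. A lower threshold makes the agent respond sooner but risks cutting off callers who pause mid-sentence.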

Intent Recognition

The LLM analyzes the transcript against the system prompt and conversation history to identify caller intent.

Tool Selection

Based on intent, the LLM selects zero or more tools from the agent's allowlist. Tool definitions include name, description, and parameter schema.

Tool Execution

Selected tools execute against external APIs (CRM, calendar, payment processor). Results return as structured JSON.
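
The tool selection and execution steps can be sketched as follows. This is an illustration under stated assumptions, not the platform's actual interface: the `Tool` dataclass, `REGISTRY`, `execute_tool`, and the `check_availability` handler are hypothetical names, and the calendar lookup is a stub standing in for a real external API.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    parameters: dict[str, Any]  # JSON-Schema-style parameter schema
    handler: Callable[..., dict[str, Any]]


def check_availability(date: str) -> dict[str, Any]:
    # Stub: a real handler would call a calendar API here.
    return {"date": date, "slots": ["09:00", "14:30"]}


# Hypothetical registry of all tools known to the platform.
REGISTRY = {
    "check_availability": Tool(
        name="check_availability",
        description="List open appointment slots for a date.",
        parameters={
            "type": "object",
            "properties": {"date": {"type": "string"}},
            "required": ["date"],
        },
        handler=check_availability,
    ),
}


def execute_tool(name: str, arguments: dict[str, Any], allowlist: set[str]) -> str:
    """Run one LLM-selected tool call and return the result as a JSON string."""
    if name not in allowlist or name not in REGISTRY:
        return json.dumps({"ok": False, "error": f"tool '{name}' is not allowed"})
    tool = REGISTRY[name]
    missing = [p for p in tool.parameters.get("required", []) if p not in arguments]
    if missing:
        return json.dumps({"ok": False, "error": f"missing parameters: {missing}"})
    return json.dumps({"ok": True, "result": tool.handler(**arguments)})


result = execute_tool(
    "check_availability", {"date": "2025-01-15"}, allowlist={"check_availability"}
)
```

Checking the allowlist at execution time, rather than relying only on which tool definitions the LLM was shown, means a tool outside the agent's allowlist is rejected even if the model names it.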

Response Generation

The LLM composes a natural-language response incorporating tool results, conversation context, and guardrail constraints.

TTS Synthesis

Text-to-speech converts the response to audio. Voice, speed, and tone are configurable per agent.

Delivery

Audio streams back to the caller. Barge-in detection allows the caller to interrupt at any point, which restarts the pipeline from Turn Detection (step 3).

Agent Architecture

The platform is organized into 6 layers. Each layer is independently replaceable.

Transport

WebRTC, SIP, WebSocket, PSTN. Manages bidirectional audio/text streams and session lifecycle.

Speech

ASR (speech-to-text), TTS (text-to-speech), VAD (voice activity detection), endpointing, barge-in handling.

Reasoning

LLM with system prompt, conversation history, and structured output. Supports GPT-4o, Claude, and Gemini.

Actions

Tool calling engine. Executes API calls, database queries, and workflow triggers based on LLM decisions.

Safety

Guardrails, PII redaction, topic deny-lists, confidence thresholds, and escalation triggers.

Integrations

CRM, calendar, payments, ticketing, knowledge base, and custom webhook connectors.
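
One way to picture "independently replaceable" layers is as narrow interfaces that the agent depends on instead of concrete providers. The sketch below covers only the Speech and Reasoning layers and uses Python's `typing.Protocol`; every class and method name in it is hypothetical, and the implementations are stubs, not real provider clients.

```python
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Reasoner(Protocol):
    def reply(self, transcript: str, history: list[str]) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class EchoASR:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # stub: treats the audio bytes as text


class CannedReasoner:
    def reply(self, transcript: str, history: list[str]) -> str:
        return f"You said: {transcript}"  # stub: no LLM call


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")  # stub: returns text bytes, not audio


class Agent:
    """Wires the layers together; any layer can be swapped without touching the others."""

    def __init__(self, asr: SpeechToText, llm: Reasoner, tts: TextToSpeech) -> None:
        self.asr, self.llm, self.tts = asr, llm, tts
        self.history: list[str] = []

    def handle_turn(self, audio: bytes) -> bytes:
        transcript = self.asr.transcribe(audio)
        response = self.llm.reply(transcript, self.history)
        self.history += [transcript, response]
        return self.tts.synthesize(response)


agent = Agent(EchoASR(), CannedReasoner(), BytesTTS())
audio_out = agent.handle_turn(b"hello")
```

Replacing a provider in this sketch means passing a different object that satisfies the same protocol; `Agent` itself does not change.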

Voice Pipeline

ASR Providers	Deepgram, Google Speech, Whisper
ASR Latency	~300ms per utterance
Turn Detection	VAD + endpointing; 600 ms silence threshold (configurable)
Total Latency Budget	<1.5 seconds end-to-end
TTS Providers	ElevenLabs, Google TTS, Azure Neural TTS
Interruption Handling	Barge-in restarts pipeline from turn detection
Languages	57 languages with accent-aware models
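
The latency figures above can be checked with simple arithmetic. In the sketch below only the 1.5 second budget and the ~300 ms ASR figure come from the table; the LLM, tool, and TTS timings are made-up placeholders, and the text does not say whether the endpointing silence threshold counts toward the budget, so it is left out here.

```python
BUDGET_MS = 1500  # "<1.5 seconds end-to-end" from the table


def remaining_budget_ms(stage_ms: dict[str, float], budget_ms: float = BUDGET_MS) -> float:
    """Return how much of the latency budget is left after all stages run."""
    return budget_ms - sum(stage_ms.values())


# Only "asr" reflects a documented figure; the rest are hypothetical measurements.
measured = {"asr": 300, "llm": 500, "tools": 200, "tts": 250}

headroom = remaining_budget_ms(measured)  # 1500 - 1250 = 250
within_budget = headroom > 0
```

A slow tool call is the most likely stage to exhaust the headroom, since it depends on an external API outside the platform's control.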

Action Execution Model

Agent actions fall into 4 modes depending on risk and reversibility.

Deterministic

Fixed-logic actions like looking up business hours or reading a menu. No LLM reasoning required.

API Call

Agent invokes an external API (e.g., book appointment, check inventory). Parameters are extracted from conversation context.

Approval-Required

Agent proposes an action and waits for caller confirmation before executing. Used for payments and irreversible operations.

Human Handoff

Agent transfers the call to a human operator with full conversation context. Triggered by policy rules or caller request.
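
The four modes can be sketched as a dispatch step that runs before any action executes. The `Mode` enum, the `ACTION_MODES` mapping, the action names, and the `dispatch` function are all hypothetical, and routing unknown actions to a human is an assumption of this sketch, not documented platform behavior.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    DETERMINISTIC = "deterministic"
    API_CALL = "api_call"
    APPROVAL_REQUIRED = "approval_required"
    HUMAN_HANDOFF = "human_handoff"


# Hypothetical per-agent configuration mapping each action to its mode.
ACTION_MODES = {
    "get_business_hours": Mode.DETERMINISTIC,
    "book_appointment": Mode.API_CALL,
    "charge_card": Mode.APPROVAL_REQUIRED,
    "transfer_to_agent": Mode.HUMAN_HANDOFF,
}


def dispatch(action: str, run: Callable[[], str], caller_confirmed: bool = False) -> str:
    """Decide whether to run an action now, wait for confirmation, or hand off."""
    # Assumption: actions with no configured mode go to a human.
    mode = ACTION_MODES.get(action, Mode.HUMAN_HANDOFF)
    if mode is Mode.HUMAN_HANDOFF:
        return "handoff"
    if mode is Mode.APPROVAL_REQUIRED and not caller_confirmed:
        return "awaiting_confirmation"
    return run()


first_attempt = dispatch("charge_card", lambda: "charged")
after_confirmation = dispatch("charge_card", lambda: "charged", caller_confirmed=True)
```

In this sketch the payment callback is never invoked until the caller has confirmed, which matches the approval-required description above.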

Enterprise Safety & Control

Tool allowlists per agent prevent unauthorized actions
Topic deny-lists block discussion of excluded subjects
PII redaction masks sensitive data before storage
Confidence thresholds trigger escalation when the agent is uncertain
Turn limits prevent infinite conversation loops
Rate limiting protects against abuse
Immutable audit logs record every action and tool invocation
HIPAA, PCI-DSS, and GDPR compliance controls available
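
Three of the controls above (PII redaction, confidence thresholds, and turn limits) can be sketched in a few lines. The regular expressions are deliberately simple illustrations and would miss many real-world formats; the function names and the default values of 0.6 and 20 turns are hypothetical, not CallSphere settings.

```python
import re

# Illustrative patterns only; production redaction needs far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators


def redact_pii(text: str) -> str:
    """Mask email addresses and card-like digit runs before the text is stored."""
    text = EMAIL.sub("[EMAIL]", text)
    return CARD.sub("[CARD]", text)


def should_escalate(
    confidence: float,
    turn_count: int,
    min_confidence: float = 0.6,  # hypothetical default
    max_turns: int = 20,          # hypothetical default
) -> bool:
    """Escalate when the agent is uncertain or the conversation runs too long."""
    return confidence < min_confidence or turn_count >= max_turns


stored = redact_pii("Card 4111 1111 1111 1111, email jane@example.com")
```

Here `stored` holds the masked transcript line, so the raw card number and email address never reach storage.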

When Not to Use CallSphere

CallSphere is not suitable for every use case. Do not use it for:

Legal disputes requiring licensed legal counsel
Situations where callers expect a named individual
Clinical decisions requiring licensed medical sign-off
Empathy-primary interactions (grief counseling, crisis lines)
Environments without internet connectivity