How CallSphere Works
Technical reference for the CallSphere AI voice and chat agent platform. Covers architecture, pipeline, execution model, and safety controls.
System Definition
What It Is
An agentic AI platform that conducts voice and chat conversations with customers. It understands intent, executes actions via tools, and responds in natural language.
What It Replaces
IVR phone trees, rule-based chatbots, and after-hours voicemail. Handles tasks that previously required a human agent for each interaction.
What It Does Not Replace
Human agents for complex escalations, licensed professionals (legal, medical), and empathy-primary interactions. CallSphere augments your team; it does not replace it.
Mechanistic Workflow
Every voice interaction follows these 9 steps from inbound signal to response delivery.
Inbound Signal
Call arrives via SIP trunk, WebRTC, or WebSocket. The transport layer establishes a bidirectional audio stream.
ASR Transcription
Automatic speech recognition converts audio to text in real time. Supports 57 languages with speaker diarization.
Turn Detection
Voice activity detection (VAD) and endpointing determine when the caller has finished speaking. Silence threshold: 600 ms (configurable).
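As an illustration of silence-based endpointing, here is a minimal sketch. It assumes the VAD emits one speech/non-speech decision per 20 ms frame; the frame size and the `Endpointer` class are hypothetical and not part of CallSphere's API. Only the 600 ms threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Endpointer:
    """Declares end of turn after enough consecutive silence (illustrative only)."""
    silence_threshold_ms: int = 600  # value from the text; configurable
    frame_ms: int = 20               # ASSUMPTION: one VAD decision per 20 ms frame
    _silent_ms: int = 0
    _heard_speech: bool = False

    def push(self, is_speech: bool) -> bool:
        """Feed one VAD decision; return True once the caller's turn has ended."""
        if is_speech:
            self._heard_speech = True
            self._silent_ms = 0  # any speech resets the silence counter
            return False
        if not self._heard_speech:
            return False  # ignore leading silence before the caller speaks
        self._silent_ms += self.frame_ms
        return self._silent_ms >= self.silence_threshold_ms


def frame_closing_turn(vad_flags, endpointer):
    """Return the index of the frame that closes the turn, or None if it never closes."""
    for index, is_speech in enumerate(vad_flags):
        if endpointer.push(is_speech):
            return index
    return None
```

A pause shorter than the threshold does not end the turn, because the next speech frame resets the counter. Lowering `silence_threshold_ms` makes the agent respond sooner at the cost of cutting off slow speakers.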
Intent Recognition
The LLM analyzes the transcript against the system prompt and conversation history to identify caller intent.
Tool Selection
Based on intent, the LLM selects zero or more tools from the agent's allowlist. Tool definitions include name, description, and parameter schema.
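A tool definition with the three fields named above, plus an allowlist filter, might look like the following sketch. The tool names, the JSON-Schema-style parameter layout, and the helper functions are assumptions made for illustration; CallSphere's actual definition format is not shown in this reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDefinition:
    name: str
    description: str
    parameters: dict  # ASSUMPTION: JSON-Schema-style parameter description


# Hypothetical registry of every tool known to the platform.
REGISTRY = {
    "book_appointment": ToolDefinition(
        name="book_appointment",
        description="Book an appointment slot for the caller.",
        parameters={
            "type": "object",
            "properties": {"date": {"type": "string"}, "time": {"type": "string"}},
            "required": ["date", "time"],
        },
    ),
    "issue_refund": ToolDefinition(
        name="issue_refund",
        description="Refund a previous payment.",
        parameters={"type": "object", "properties": {"order_id": {"type": "string"}}},
    ),
}


def tools_for_agent(registry, allowlist):
    """Return only the definitions this agent may see, in allowlist order."""
    return [registry[name] for name in allowlist if name in registry]


def filter_selection(selected_names, allowlist):
    """Drop any tool the model picked that is not on the agent's allowlist."""
    allowed = set(allowlist)
    return [name for name in selected_names if name in allowed]
```

Filtering twice, once before the model sees the tools and once after it selects them, is a defensive choice in this sketch: the second check still holds if the model names a tool it was never offered.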
Tool Execution
Selected tools execute against external APIs (CRM, calendar, payment processor). Results return as structured JSON.
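The sketch below shows one way a tool result could be wrapped as structured JSON so the LLM receives a uniform shape on success and on failure. The envelope fields (`tool`, `ok`, `result`, `error`) and the `check_availability` stub are hypothetical; they stand in for a real calendar or CRM call.

```python
import json


def execute_tool(name, arguments, handlers):
    """Run one tool and return its outcome as a JSON string (illustrative envelope)."""
    handler = handlers.get(name)
    if handler is None:
        return json.dumps({"tool": name, "ok": False, "error": "unknown tool"})
    try:
        result = handler(**arguments)
    except Exception as exc:
        # Report the failure back as data instead of crashing the call.
        return json.dumps({"tool": name, "ok": False, "error": str(exc)})
    return json.dumps({"tool": name, "ok": True, "result": result})


def check_availability(date: str) -> dict:
    """Stand-in for a real calendar API request; returns fixed sample data."""
    return {"date": date, "slots": ["09:00", "14:30"]}


HANDLERS = {"check_availability": check_availability}
```

Returning errors inside the same envelope lets the response-generation step explain the failure to the caller rather than going silent.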
Response Generation
The LLM composes a natural-language response incorporating tool results, conversation context, and guardrail constraints.
TTS Synthesis
Text-to-speech converts the response to audio. Voice, speed, and tone are configurable per agent.
Delivery
Audio streams back to the caller. Barge-in detection lets the caller interrupt at any point, which restarts the pipeline from step 3 (Turn Detection).
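The nine steps and the barge-in rule can be modeled as a small state machine. This is a sketch of the control flow only; the `Step` enum and `next_step` function are illustrative names, and returning `None` after Delivery simply marks the turn as complete rather than describing what the platform does next.

```python
from enum import IntEnum
from typing import Optional


class Step(IntEnum):
    INBOUND_SIGNAL = 1
    ASR_TRANSCRIPTION = 2
    TURN_DETECTION = 3
    INTENT_RECOGNITION = 4
    TOOL_SELECTION = 5
    TOOL_EXECUTION = 6
    RESPONSE_GENERATION = 7
    TTS_SYNTHESIS = 8
    DELIVERY = 9


def next_step(current: Step, barge_in: bool = False) -> Optional[Step]:
    """Advance one step; a barge-in sends the pipeline back to turn detection."""
    if barge_in:
        return Step.TURN_DETECTION
    if current is Step.DELIVERY:
        return None  # turn complete; what follows is outside this sketch
    return Step(current + 1)
```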
Agent Architecture
The platform is organized into 6 layers. Each layer is independently replaceable.
Transport
WebRTC, SIP, WebSocket, PSTN. Manages bidirectional audio/text streams and session lifecycle.
Speech
ASR (speech-to-text), TTS (text-to-speech), VAD (voice activity detection), endpointing, barge-in handling.
Reasoning
LLM with system prompt, conversation history, and structured output. Supports GPT-4o, Claude, and Gemini.
Actions
Tool calling engine. Executes API calls, database queries, and workflow triggers based on LLM decisions.
Safety
Guardrails, PII redaction, topic deny-lists, confidence thresholds, and escalation triggers.
Integrations
CRM, calendar, payments, ticketing, knowledge base, and custom webhook connectors.
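To illustrate what "independently replaceable" can mean in practice, the sketch below defines narrow interfaces for three of the layers and swaps one implementation without touching the others. All class and method names here are hypothetical; they are not CallSphere interfaces.

```python
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Reasoner(Protocol):
    def reply(self, transcript: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class Agent:
    """Wires the layers together; each one can be swapped independently."""

    def __init__(self, asr: SpeechToText, llm: Reasoner, tts: TextToSpeech):
        self.asr, self.llm, self.tts = asr, llm, tts

    def handle(self, audio: bytes) -> bytes:
        return self.tts.synthesize(self.llm.reply(self.asr.transcribe(audio)))


# Stub implementations standing in for real providers.
class FakeASR:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoLLM:
    def reply(self, transcript: str) -> str:
        return f"You said: {transcript}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class ShoutingTTS:
    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")
```

Replacing `FakeTTS` with `ShoutingTTS` changes only the constructor argument; `Agent.handle` and the other layers stay as they are.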
Voice Pipeline
| Parameter | Value |
| --- | --- |
| ASR Providers | Deepgram, Google Speech, Whisper |
| ASR Latency | ~300 ms per utterance |
| Turn Detection | VAD + endpointing, configurable silence threshold (600 ms) |
| Total Latency Budget | <1.5 seconds end-to-end |
| TTS Providers | ElevenLabs, Google TTS, Azure Neural TTS |
| Interruption Handling | Barge-in restarts pipeline from turn detection |
| Languages | 57 languages with accent-aware models |
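The latency budget can be checked with simple arithmetic. In the sketch below, only the 1.5 second total and the ~300 ms ASR figure come from the table; the LLM, tool, and TTS numbers are hypothetical placeholders, not CallSphere measurements. The table does not say whether the silence threshold counts against the budget, so it is left out here.

```python
TOTAL_BUDGET_MS = 1500  # from the table: <1.5 seconds end-to-end
ASR_MS = 300            # from the table: ~300 ms per utterance


def remaining_budget_ms(stage_ms, total_ms=TOTAL_BUDGET_MS):
    """Budget left after summing every stage's latency (negative means over budget)."""
    return total_ms - sum(stage_ms.values())


def within_budget(stage_ms, total_ms=TOTAL_BUDGET_MS):
    """True when the summed stage latencies stay strictly under the total."""
    return sum(stage_ms.values()) < total_ms


# HYPOTHETICAL per-stage figures, for illustration only.
example_stages = {"asr": ASR_MS, "llm": 500, "tools": 200, "tts": 250}
```

With these placeholder figures the stages sum to 1250 ms, leaving 250 ms of headroom for network transit and jitter.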
Action Execution Model
Agent actions fall into 4 modes depending on risk and reversibility.
Deterministic
Fixed-logic actions like looking up business hours or reading a menu. No LLM reasoning required.
API Call
Agent invokes an external API (e.g., book appointment, check inventory). Parameters are extracted from conversation context.
Approval-Required
Agent proposes an action and waits for caller confirmation before executing. Used for payments and irreversible operations.
Human Handoff
Agent transfers the call to a human operator with full conversation context. Triggered by policy rules or caller request.
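The four modes can be pictured as a dispatch table keyed by action. In this sketch the action names, the `decide` function, and the choice to hand off unknown actions are all assumptions made for illustration.

```python
from enum import Enum


class Mode(Enum):
    DETERMINISTIC = "deterministic"
    API_CALL = "api_call"
    APPROVAL_REQUIRED = "approval_required"
    HUMAN_HANDOFF = "human_handoff"


# Hypothetical mapping from action name to execution mode.
ACTION_MODES = {
    "get_business_hours": Mode.DETERMINISTIC,
    "book_appointment": Mode.API_CALL,
    "charge_card": Mode.APPROVAL_REQUIRED,
    "transfer_to_operator": Mode.HUMAN_HANDOFF,
}


def decide(action: str, caller_confirmed: bool = False) -> str:
    """Return what the agent should do next for the requested action."""
    # ASSUMPTION: actions missing from the table fall back to a human handoff.
    mode = ACTION_MODES.get(action, Mode.HUMAN_HANDOFF)
    if mode is Mode.HUMAN_HANDOFF:
        return "handoff"
    if mode is Mode.APPROVAL_REQUIRED and not caller_confirmed:
        return "ask_confirmation"
    return "execute"
```

For example, `decide("charge_card")` returns `"ask_confirmation"` until the caller has confirmed, after which the same call with `caller_confirmed=True` returns `"execute"`.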
Enterprise Safety & Control
- Tool allowlists per agent prevent unauthorized actions
- Topic deny-lists block discussion of excluded subjects
- PII redaction masks sensitive data before storage
- Confidence thresholds trigger escalation when the agent is uncertain
- Turn limits prevent infinite conversation loops
- Rate limiting protects against abuse
- Immutable audit logs record every action and tool invocation
- HIPAA, PCI-DSS, and GDPR compliance controls available
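Three of the controls above (PII redaction, confidence thresholds, and turn limits) are sketched below. The regular expressions are deliberately simple and would miss many real-world formats, and the threshold values are hypothetical defaults rather than CallSphere settings.

```python
import re

# Simplified patterns for illustration; production redaction needs broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators


def redact_pii(text: str) -> str:
    """Mask email addresses and card-like digit runs before storage."""
    text = EMAIL.sub("[EMAIL]", text)
    return CARD.sub("[CARD]", text)


def should_escalate(confidence: float, turn_count: int,
                    min_confidence: float = 0.6, max_turns: int = 20) -> bool:
    """Escalate when the agent is uncertain or the conversation runs too long.

    ASSUMPTION: 0.6 and 20 are placeholder defaults chosen for this sketch.
    """
    return confidence < min_confidence or turn_count >= max_turns
```

Redacting before the transcript is written means the stored record never contains the raw values, which is simpler to reason about than masking at read time.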
When Not to Use CallSphere
CallSphere is not suitable for every use case. Do not use it for:
- Legal disputes requiring licensed legal counsel
- Situations where callers expect a named individual
- Clinical decisions requiring licensed medical sign-off
- Empathy-primary interactions (grief counseling, crisis lines)
- Environments without internet connectivity