SIP Trunking for AI Voice Agents: Carrier Selection and Architecture

A technical guide to SIP trunking for AI voice agents — carrier comparison, codec selection, and high-availability patterns.

Why SIP trunking still matters

Most teams starting with AI voice agents buy a Twilio number and stop thinking about telephony. That works until you need to port 300 existing DIDs, attach an AI agent to an on-prem PBX, or dial into a country where your preferred CPaaS has terrible termination rates. At that point you are in SIP trunking territory, and the decisions you make about carriers, codecs, and failover will dictate your voice quality for years.

This is a technical guide to wiring SIP trunks into an AI voice agent stack. It covers the carrier comparison I wish I had when I started, the codec tradeoffs that matter, and the high-availability patterns that keep calls flowing when one carrier goes dark.

on-prem PBX / softswitch
   │ SIP INVITE
   ▼
Primary SIP trunk (carrier A)
   │
   ▼
SBC (session border controller)
   │ PCM16
   ▼
AI voice agent edge

Architecture overview

┌──────────┐      ┌──────────┐      ┌────────────┐
│ Carrier A│──┐   │ Carrier B│──┐   │ Carrier C  │
└──────────┘  │   └──────────┘  │   └────────────┘
              ▼                 ▼           │
        ┌────────────────────────────┐      │
        │        Dual SBCs           │◄─────┘
        │ (active/active failover)   │
        └────────────┬───────────────┘
                     │ RTP / PCM16
                     ▼
        ┌────────────────────────────┐
        │ AI voice agent edge        │
        │ (FastAPI + Realtime API)   │
        └────────────────────────────┘

Prerequisites

  • Accounts with at least two SIP carriers (Twilio Elastic SIP Trunking, Bandwidth, Telnyx, or similar).
  • An SBC — cloud (Twilio, Telnyx) or self-hosted (Kamailio, OpenSIPS, FreeSWITCH).
  • A public IP or SRV record that the carriers can reach.
  • Familiarity with SIP methods (INVITE, ACK, BYE) and SDP.

Step-by-step walkthrough

1. Choose your codec strategy

For AI voice agents, stick with G.711 μ-law (8 kHz) or Opus (16-48 kHz). Avoid G.729 unless you are forced into it: its compression artifacts degrade speech recognition.

Codec    Bandwidth    Quality for STT    Notes
G.711    64 kbps      Good               Universal, carrier default
Opus     6-64 kbps    Excellent          Not all carriers support it end-to-end
G.729    8 kbps       Poor               Avoid for AI agents
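
The bitrates in the table are payload-only. On the wire, every RTP packet also carries IP, UDP, and RTP headers, so the per-call figure you size links with is higher. A rough sketch of the arithmetic, assuming 20 ms packetization, IPv4, and no SRTP or Ethernet overhead (these are my assumptions, not figures from any carrier):

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no header extensions


def rtp_bandwidth_kbps(codec_kbps: float, ptime_ms: int = 20) -> float:
    """Approximate one-way IP bandwidth for a single RTP stream."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_second
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    return packet_bytes * 8 * packets_per_second / 1000


for name, kbps in [("G.711", 64), ("G.729", 8)]:
    print(f"{name}: {rtp_bandwidth_kbps(kbps):.1f} kbps per direction")
```

Under these assumptions G.711 costs about 80 kbps per direction and G.729 about 24 kbps, so header overhead narrows G.729's advantage considerably.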

2. Configure carrier authentication

Most carriers support IP-based auth or SIP digest. IP-based is simpler but requires a static egress IP.

# Kamailio example (inside request_route): accept INVITEs from carrier A's IP range
# is_method() comes from the textops module
if (is_method("INVITE") && src_ip == 198.51.100.0/24) {
    xlog("L_INFO", "Call from carrier A\n");
    route(FORWARD_TO_EDGE);
}
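
If you go with SIP digest instead, the carrier challenges your INVITE or REGISTER with a nonce and you answer with a hash. A minimal sketch of the classic MD5 computation (RFC 2617, which SIP inherits), assuming the carrier uses the MD5 algorithm. The function name and parameters are mine, and your SIP stack will normally do this for you:

```python
import hashlib


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None) -> str:
    """Compute the `response` value for a Digest Authorization header."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # credentials hash
    ha2 = _md5_hex(f"{method}:{uri}")                 # request hash
    if qop:
        # With qop=auth the nonce count and client nonce are mixed in.
        return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")


# Hypothetical credentials and challenge values, for illustration only.
print(digest_response("trunk-user", "sip.carrier.example", "s3cret",
                      "INVITE", "sip:+15551230000@sip.carrier.example",
                      nonce="abc123"))
```

The password never crosses the wire, but MD5 digest is weak by modern standards, so pair it with TLS where the carrier supports it.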

3. Bridge SIP to your edge with a media gateway

Use FreeSWITCH or a cloud SBC to terminate SIP and emit PCM16 frames over a WebSocket or RTP stream your edge can consume.

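<!-- FreeSWITCH dialplan; assumes the third-party mod_audio_fork module (drachtio-freeswitch-modules) is loaded -->
<extension name="ai_agent_bridge">
  <condition field="destination_number" expression="^\+1([0-9]{10})$">
    <!-- mod_audio_fork exposes the uuid_audio_fork API command rather than a dialplan application, so start the fork on answer -->
    <action application="set" data="api_on_answer=uuid_audio_fork ${uuid} start wss://edge.yourapp.com/sip mono 16k"/>
    <action application="answer"/>
    <!-- keep the channel up while audio streams to the edge -->
    <action application="park"/>
  </condition>
</extension>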
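
If your gateway passes G.711 through instead of transcoding, the edge has to expand μ-law bytes to PCM16 itself. A minimal pure-Python sketch of the standard μ-law expansion. The function names are mine, and in production you would likely use a native library for speed:

```python
import struct


def _ulaw_byte_to_pcm16(byte: int) -> int:
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if u & 0x80 else magnitude


# Precompute all 256 values once; decoding is then a table lookup.
_ULAW_TABLE = [_ulaw_byte_to_pcm16(b) for b in range(256)]


def ulaw_to_pcm16(payload: bytes) -> bytes:
    """Decode G.711 mu-law bytes into little-endian signed 16-bit PCM."""
    return struct.pack(f"<{len(payload)}h", *(_ULAW_TABLE[b] for b in payload))
```

A 20 ms G.711 frame is 160 bytes, so each call to ulaw_to_pcm16 on one frame yields 320 bytes of 8 kHz PCM16.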

4. Consume audio on the edge

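import WebSocket, { WebSocketServer } from "ws";

const server = new WebSocketServer({ port: 8080, path: "/sip" });

server.on("connection", (sock) => {
  const oai = new WebSocket(
    "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03",
    { headers: { Authorization: "Bearer " + process.env.OPENAI_API_KEY, "OpenAI-Beta": "realtime=v1" } },
  );

  // Queue frames that arrive before the upstream socket opens; send() would throw otherwise.
  const pending = [];
  const forward = (frame) =>
    oai.send(JSON.stringify({ type: "input_audio_buffer.append", audio: frame.toString("base64") }));
  oai.on("open", () => pending.splice(0).forEach(forward));

  // Frames must already be 24 kHz PCM16; resample first if the gateway emits another rate.
  sock.on("message", (frame, isBinary) => {
    if (!isBinary) return; // skip text/metadata frames from the media gateway
    if (oai.readyState === WebSocket.OPEN) forward(frame);
    else pending.push(frame);
  });

  oai.on("message", (raw) => {
    const evt = JSON.parse(raw.toString());
    if (evt.type === "response.audio.delta") {
      sock.send(Buffer.from(evt.delta, "base64"));
    }
  });

  // Tear down both legs together so neither socket leaks.
  sock.on("close", () => oai.close());
  oai.on("close", () => sock.close());
  oai.on("error", () => sock.close());
});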
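
The handler forwards frames unchanged, which only works if they are already at the rate the model expects. If your trunk delivers 8 kHz audio, something has to resample it to 24 kHz first. A deliberately crude sketch using linear interpolation, assuming little-endian PCM16 input. The helper names are mine, and a real deployment would use a proper filtered resampler:

```python
import base64
import json
import struct


def upsample_8k_to_24k(pcm8k: bytes) -> bytes:
    """Triple the sample rate of little-endian PCM16 by linear interpolation."""
    n = len(pcm8k) // 2
    samples = struct.unpack(f"<{n}h", pcm8k[: n * 2])
    out = []
    for i, cur in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < n else cur  # hold the last sample
        step = nxt - cur
        out.extend((cur, cur + step // 3, cur + (2 * step) // 3))
    return struct.pack(f"<{len(out)}h", *out)


def append_event(pcm24k: bytes) -> str:
    """Wrap a PCM16 frame in the same event shape the Node handler sends."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm24k).decode("ascii"),
    })
```

Linear interpolation adds no latency beyond one sample, which is why it is tempting on a voice path, but it leaves imaging artifacts that a filtered resampler would remove.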

5. Add a second carrier for failover

Configure your SBC to route primary traffic through carrier A and automatically fall back to carrier B on SIP 5xx responses or RTP timeouts.
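
The routing rule itself is small. A sketch of the decision logic, assuming a hypothetical send_invite callable that returns the final SIP status code or raises TimeoutError when the carrier never answers. In practice this lives in your SBC's routing config, not in application code:

```python
from typing import Callable, Optional, Sequence, Tuple


def place_call_with_failover(
    carriers: Sequence[str],
    send_invite: Callable[[str], int],
) -> Optional[Tuple[str, int]]:
    """Try carriers in order; move on only for 5xx responses or timeouts."""
    for carrier in carriers:
        try:
            status = send_invite(carrier)
        except TimeoutError:
            continue  # no final response: treat it like an outage
        if 500 <= status < 600:
            continue  # carrier-side failure: try the next trunk
        # 2xx succeeded; 4xx/6xx (for example 486 Busy Here) is a final
        # answer about the callee, so retrying elsewhere would not help.
        return carrier, status
    return None  # every carrier failed
```

Not failing over on 4xx and 6xx is my assumption here. Some teams do retry specific codes, but retrying a busy signal on a second carrier just rings the same busy phone.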

6. Monitor with Homer or sngrep

SIP debugging is a full-time job without a packet capture tool. Homer captures every SIP message and lets you reconstruct a call flow after the fact.
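
Reconstructing a call flow comes down to grouping captured messages by their Call-ID header. A toy sketch of that idea, assuming you already have raw SIP messages as strings. This is not Homer's API, and it ignores folded headers:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def call_id_of(message: str) -> Optional[str]:
    """Return the Call-ID header value (long or compact form), if present."""
    for line in message.split("\r\n")[1:]:
        if line == "":
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() in ("call-id", "i"):
            return value.strip()
    return None


def group_by_call(messages: Iterable[str]) -> Dict[str, List[str]]:
    """Map each Call-ID to the ordered start lines of its messages."""
    flows = defaultdict(list)
    for msg in messages:
        cid = call_id_of(msg)
        if cid is not None:
            flows[cid].append(msg.split("\r\n", 1)[0])
    return dict(flows)
```

Each value then reads like a ladder diagram in text form: the INVITE, the provisional responses, the 200 OK, and eventually the BYE.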

Production considerations

  • Latency: SIP adds 20-100ms versus a direct CPaaS WebSocket. Budget for it.
  • NAT traversal: give the SBC a public IP; do not put it behind 1:1 NAT without testing.
  • DTMF: prefer RFC 2833 telephone events (since superseded by RFC 4733) over inband. Inband DTMF corrupts AI transcription.
  • RTP inactivity timeout: set to 30-60s to detect silent failures.
  • Billing reconciliation: carrier CDRs will sometimes disagree with yours. Keep your own call log authoritative.
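
On the DTMF point: with RFC 2833/4733, digits arrive as small RTP payloads instead of audio tones, so your edge can read them without touching the speech path. A minimal sketch of decoding the 4-byte telephone-event payload, assuming you have already stripped the RTP header. The type and function names are mine:

```python
import struct
from typing import NamedTuple

EVENT_DIGITS = "0123456789*#ABCD"  # DTMF event codes 0-15


class DtmfEvent(NamedTuple):
    digit: str
    end: bool       # E bit: set on the final packets of a key press
    volume: int     # 6-bit power level
    duration: int   # in RTP timestamp units


def parse_telephone_event(payload: bytes) -> DtmfEvent:
    """Decode the first 4 bytes of a telephone-event RTP payload."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(EVENT_DIGITS):
        raise ValueError(f"not a DTMF event code: {event}")
    return DtmfEvent(EVENT_DIGITS[event], bool(flags & 0x80), flags & 0x3F, duration)
```

Senders repeat the end packet several times for reliability, so deduplicate on the RTP timestamp before handing a digit to your agent.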

CallSphere's real implementation

CallSphere primarily uses Twilio for telephony with WebRTC for in-browser testing, and for enterprise customers with existing telecom infrastructure we bridge SIP trunks to the same edge service that handles native Twilio Media Streams. The edge runs Python FastAPI and forwards PCM16 at 24kHz to the OpenAI Realtime API with gpt-4o-realtime-preview-2025-06-03 and server VAD.

The multi-agent topologies vary by vertical — 14 tools for healthcare, 10 for real estate, 4 for salon, 7 for after-hours escalation, 10 plus RAG for IT helpdesk, and an ElevenLabs + 5 GPT-4 specialist pod for sales — but they all share the same carrier-agnostic audio plane, which means a new SIP carrier is a config change, not a rewrite. CallSphere supports 57+ languages with under one second of end-to-end response time on live traffic.

Common pitfalls

  • Mixing G.729 with STT: recognition accuracy drops 10-20 points.
  • Inband DTMF: tones leak into the audio and confuse the LLM.
  • Single carrier: when they have an outage, you have an outage.
  • Skipping the SBC: you need it for topology hiding and codec negotiation.
  • Forgetting about emergency calls: if you handle 911, you need a separate E911 provider.

FAQ

Is Twilio Elastic SIP Trunking enough for production?

Yes, for most teams. It handles failover, has good global coverage, and integrates cleanly with Twilio's programmable voice.

Can I use Asterisk instead of FreeSWITCH?

Yes, but FreeSWITCH has better WebSocket support, including the third-party mod_audio_fork module for streaming call audio.

Do I need STIR/SHAKEN?

In the US and Canada, yes, for outbound calling to avoid spam labeling.

What sample rate should the SBC deliver?

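Whatever the model expects. For the Realtime API, that is 24kHz PCM16 by default; it also accepts G.711 μ-law and A-law input, which avoids resampling on a G.711 trunk.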

How do I debug a one-way audio issue?

Capture SIP and RTP with sngrep or Wireshark and verify the SDP offered by each side. One-way audio is almost always a NAT or firewall problem: the SDP advertises an address or port that the other side cannot reach.
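
A quick way to check the SDP is to pull out where each side says it wants media sent and flag private addresses. A minimal sketch, assuming you have copied the SDP body out of the capture as text. The function names are mine, and hostnames in the c= line are skipped, not resolved:

```python
import ipaddress
from typing import List, Optional, Tuple

Target = Tuple[str, int, Optional[str]]  # (media type, port, address)


def media_targets(sdp: str) -> List[Target]:
    """List where each m= section asks for media to be sent."""
    session_addr = None
    sections = []  # each entry: [media, port, address]
    for line in sdp.splitlines():
        if line.startswith("c="):
            addr = line.split()[-1]
            if sections:
                sections[-1][2] = addr  # media-level c= overrides session-level
            else:
                session_addr = addr
        elif line.startswith("m="):
            media, port = line[2:].split()[:2]
            sections.append([media, int(port.split("/")[0]), session_addr])
    return [tuple(s) for s in sections]


def private_media_targets(sdp: str) -> List[Target]:
    """Return targets whose address is private and so unreachable across NAT."""
    flagged = []
    for media, port, addr in media_targets(sdp):
        try:
            if addr and ipaddress.ip_address(addr).is_private:
                flagged.append((media, port, addr))
        except ValueError:
            pass  # hostname or multicast/TTL form: not checked here
    return flagged
```

If private_media_targets returns anything for an SDP that crossed the public internet, the far end is sending RTP to an address it cannot route to, and that is your missing audio direction.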

Next steps

Planning a telephony migration or an enterprise SIP integration? Book a demo, read the technology overview, or check the platform page.


Written by

CallSphere Team
