WebRTC Browser Calling for Enterprise: Complete Guide

What Is WebRTC and Why Does It Matter for Enterprise Calling

WebRTC (Web Real-Time Communication) is an open-source framework built into every major browser that enables peer-to-peer audio, video, and data communication without plugins or native app installations. For enterprise calling, this means agents can make and receive phone calls directly from a browser tab — no softphone downloads, no desktop clients, no IT provisioning headaches.

The technology has matured significantly since its introduction. As of 2026, WebRTC handles over 3 billion minutes of voice and video communication per week across all platforms, and 94% of global browser traffic supports it natively.

WebRTC Architecture for Enterprise Voice

Understanding the architecture is critical for making informed deployment decisions. A production WebRTC calling system consists of several layers:

Signaling Layer

WebRTC does not define a signaling protocol. It specifies how peers describe a session (SDP offers and answers) and handles media and data transport, but your application must carry those descriptions between endpoints and coordinate call setup, teardown, and metadata exchange. Common approaches include:

WebSocket-based signaling: The most common approach, using persistent WebSocket connections between the browser and a signaling server
SIP over WebSocket (RFC 7118): Maps traditional SIP telephony signaling onto WebSocket transport, enabling interoperability with existing PBX systems; libraries such as SIP.js and JsSIP implement it in the browser
Custom REST + WebSocket hybrid: REST APIs for call initiation with WebSocket for real-time events
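
For the WebSocket-based approach, the message format is entirely yours to define. A minimal sketch of one possible JSON protocol; the message types (`offer`, `answer`, `candidate`, `hangup`), the `callId` field, and `parseSignal` are hypothetical names, not part of any standard:

```typescript
// Hypothetical message shapes: WebRTC mandates no signaling format, so these are illustrative.
type SignalMessage =
  | { type: "offer"; callId: string; sdp: string }
  | { type: "answer"; callId: string; sdp: string }
  | { type: "candidate"; callId: string; candidate: string; sdpMid: string | null }
  | { type: "hangup"; callId: string; reason?: string };

// Validate an incoming WebSocket text frame; returns null for anything malformed.
function parseSignal(raw: string): SignalMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const { type, callId, sdp, candidate, sdpMid, reason } = data as Record<string, unknown>;
  if (typeof callId !== "string") return null;

  if (type === "offer" && typeof sdp === "string") return { type: "offer", callId, sdp };
  if (type === "answer" && typeof sdp === "string") return { type: "answer", callId, sdp };
  if (type === "candidate" && typeof candidate === "string") {
    // sdpMid identifies the m= section the candidate belongs to; it may be absent.
    return { type: "candidate", callId, candidate, sdpMid: typeof sdpMid === "string" ? sdpMid : null };
  }
  if (type === "hangup") {
    return { type: "hangup", callId, reason: typeof reason === "string" ? reason : undefined };
  }
  return null; // unknown type or missing required fields
}
```

Validating every frame at the edge keeps malformed or unexpected messages from ever reaching `setRemoteDescription()` or `addIceCandidate()`.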

Media Layer

The media layer handles the actual voice data:

Codec negotiation: WebRTC supports Opus (preferred for voice, 6-510 kbps) and G.711 (legacy compatibility, 64 kbps). Opus provides significantly better quality at lower bandwidth
SRTP encryption: All WebRTC media is encrypted by default using SRTP with DTLS key exchange. There is no option to disable encryption — a significant security advantage
Adaptive bitrate: WebRTC automatically adjusts audio quality based on network conditions using congestion control algorithms (GCC — Google Congestion Control)
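
Browsers normally list Opus first in their offers, but if a gateway needs a particular order you can either call `RTCRtpTransceiver.setCodecPreferences()` where the browser supports it, or reorder the payload types on the SDP `m=audio` line. A minimal sketch of the second approach (SDP munging), assuming a CRLF-delimited SDP with a single audio section; `preferAudioCodec` is a hypothetical helper, and munging is fragile enough that the API route is preferable when available:

```typescript
// Move every payload type whose a=rtpmap codec name matches `codec` to the front of m=audio.
function preferAudioCodec(sdp: string, codec: string): string {
  const lines = sdp.split("\r\n");
  const mIndex = lines.findIndex((l) => l.startsWith("m=audio "));
  if (mIndex === -1) return sdp; // no audio section

  // The audio section ends at the next m= line, or at the end of the SDP.
  let end = lines.findIndex((l, i) => i > mIndex && l.startsWith("m="));
  if (end === -1) end = lines.length;

  // Collect payload types mapped to the wanted codec, e.g. "a=rtpmap:111 opus/48000/2".
  const wanted = new Set<string>();
  for (const line of lines.slice(mIndex + 1, end)) {
    const match = /^a=rtpmap:(\d+) ([^/]+)\//.exec(line);
    if (match && match[2].toLowerCase() === codec.toLowerCase()) wanted.add(match[1]);
  }
  if (wanted.size === 0) return sdp; // codec not offered; leave the SDP untouched

  // m=audio <port> <proto> <pt1> <pt2> ...
  const parts = lines[mIndex].split(" ");
  const payloadTypes = parts.slice(3);
  const reordered = [
    ...payloadTypes.filter((pt) => wanted.has(pt)),
    ...payloadTypes.filter((pt) => !wanted.has(pt)),
  ];
  lines[mIndex] = [...parts.slice(0, 3), ...reordered].join(" ");
  return lines.join("\r\n");
}
```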

NAT Traversal Layer

Enterprise networks present the biggest deployment challenge for WebRTC: NAT traversal. Many corporate networks use symmetric NATs and restrictive firewalls that block direct peer-to-peer connections.

The ICE (Interactive Connectivity Establishment) framework handles this:

STUN servers: Help clients discover their public IP address and port mapping (server-reflexive candidates). Commonly cited estimates put the share of connections that succeed this way, without a relay, at about 85%
TURN servers: Relay media through a server when direct connectivity fails. Required for roughly 15% of enterprise connections, but the share can reach 30-40% on restrictive corporate networks
ICE candidates: The browser gathers multiple connection candidates (host, server-reflexive, relay), pairs them with the remote side's candidates, and runs connectivity checks in priority order
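
Logging the candidates a client gathers is a quick way to see which of these paths are open to it; for example, a client that yields no `srflx` candidates never received a STUN response. A sketch that parses the standard candidate-attribute fields (foundation, component, transport, priority, address, port, `typ`) from the strings delivered in `onicecandidate` events; `parseCandidate` and `countByType` are hypothetical names:

```typescript
const CANDIDATE_TYPES = ["host", "srflx", "prflx", "relay"] as const;
type CandidateType = (typeof CANDIDATE_TYPES)[number];

interface ParsedCandidate {
  foundation: string;
  component: number;
  protocol: string; // transport: "udp" or "tcp"
  priority: number;
  address: string;  // may be an mDNS ".local" name for masked host candidates
  port: number;
  type: CandidateType;
}

function isCandidateType(s: string): s is CandidateType {
  return (CANDIDATE_TYPES as readonly string[]).includes(s);
}

// Accepts "a=candidate:..." (SDP form) or "candidate:..." (RTCIceCandidate.candidate).
function parseCandidate(line: string): ParsedCandidate | null {
  const f = line.trim().replace(/^a=/, "").replace(/^candidate:/, "").split(/\s+/);
  // Field order: foundation component transport priority address port "typ" type [extensions...]
  if (f.length < 8 || f[6] !== "typ") return null;
  const type = f[7];
  if (!isCandidateType(type)) return null;
  return {
    foundation: f[0],
    component: Number(f[1]),
    protocol: f[2].toLowerCase(),
    priority: Number(f[3]),
    address: f[4],
    port: Number(f[5]),
    type,
  };
}

// Tally candidates by type, e.g. from the strings seen in pc.onicecandidate.
function countByType(lines: string[]): Record<CandidateType, number> {
  const counts: Record<CandidateType, number> = { host: 0, srflx: 0, prflx: 0, relay: 0 };
  for (const line of lines) {
    const c = parseCandidate(line);
    if (c !== null) counts[c.type] += 1;
  }
  return counts;
}
```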

TURN Server Sizing

TURN servers are the most resource-intensive component. Each relayed call consumes:

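Bandwidth: 80-100 kbps per direction for Opus voice, including packet overhead. The server receives each stream and then re-sends it, so a relayed call puts twice the per-direction rate on both the inbound and outbound side of the TURN server's NIC
Ports: One relayed UDP port per allocation; clients reach the server itself on a shared listening port (typically 3478)
Memory: Approximately 2-5 KB per active allocation

For an enterprise with 200 concurrent calls where 30% require TURN relay:

60 relayed calls x 100 kbps = 6 Mbps of media per direction, or about 12 Mbps inbound and 12 Mbps outbound at the TURN servers
60 relayed calls x 1 allocation = 60 relay ports (more if browsers allocate on several TURN URLs)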
Recommended: 2 TURN servers (active-active) with 100 Mbps NICs and 4 GB RAM
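
The sizing arithmetic can be wrapped in a small estimator for other headcounts. A sketch, assuming one relayed leg per call (the browser side; the gateway has a public address) and a flat per-direction bitrate; `estimateTurnLoad` is a hypothetical helper. It also makes one point explicit: a relay receives each stream and then re-sends it, so the server's ingress and egress are each twice the per-direction media figure.

```typescript
interface TurnLoad {
  relayedCalls: number;
  mediaMbpsPerDirection: number; // one direction of every relayed call, summed
  serverIngressMbps: number;     // what the TURN NIC receives
  serverEgressMbps: number;      // what the TURN NIC sends
}

// relayPercent is an integer percentage (30 means 30%) to keep the arithmetic exact.
function estimateTurnLoad(concurrentCalls: number, relayPercent: number, kbpsPerDirection: number): TurnLoad {
  const relayedCalls = Math.ceil((concurrentCalls * relayPercent) / 100);
  const mediaMbpsPerDirection = (relayedCalls * kbpsPerDirection) / 1000;
  // Client upstream and peer upstream both arrive at the server and both leave it again.
  const serverMbps = 2 * mediaMbpsPerDirection;
  return { relayedCalls, mediaMbpsPerDirection, serverIngressMbps: serverMbps, serverEgressMbps: serverMbps };
}
```

With the example's inputs, `estimateTurnLoad(200, 30, 100)` gives 60 relayed calls, 6 Mbps of media per direction, and about 12 Mbps each of ingress and egress on the TURN servers, well within 100 Mbps NICs.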

Browser Compatibility and Codec Support

Browser	WebRTC Support	Opus	G.711	Insertable Streams
Chrome 90+	Full	Yes	Yes	Yes
Firefox 85+	Full	Yes	Yes	Yes (117+)
Safari 15+	Full	Yes	Yes	Partial
Edge 90+	Full (Chromium)	Yes	Yes	Yes
Mobile Chrome	Full	Yes	Yes	Yes
Mobile Safari	Full (iOS 15+)	Yes	Yes	Partial

Safari has historically been the most problematic browser for WebRTC. While support has improved substantially, organizations should test Safari-specific edge cases including:

Audio session interruptions on iOS (incoming calls, notifications)
Microphone permission handling differences
H.264 codec preference conflicts in video+voice scenarios
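
Because support for the encoded-media APIs behind the Insertable Streams column varies by browser and version, feature detection is more reliable than user-agent sniffing. A minimal sketch that checks only two API names: the standardized `RTCRtpScriptTransform` and Chromium's earlier `RTCRtpSender.prototype.createEncodedStreams()`. The function name is hypothetical, and the global object is a parameter only so the check can run outside a browser:

```typescript
// Which encoded-media API, if any, this browser exposes.
type EncodedTransformSupport = "script-transform" | "encoded-streams" | "none";

// `g` defaults to the real global object.
function detectEncodedTransform(
  g: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): EncodedTransformSupport {
  // Standardized API: a transform object assigned to sender/receiver.transform.
  if (typeof g.RTCRtpScriptTransform === "function") return "script-transform";
  // Chromium's earlier API: createEncodedStreams() on RTCRtpSender/RTCRtpReceiver.
  const sender = g.RTCRtpSender as { prototype?: Record<string, unknown> } | undefined;
  if (typeof sender?.prototype?.createEncodedStreams === "function") return "encoded-streams";
  return "none";
}
```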

Implementing Enterprise-Grade WebRTC Calling

Step 1: Choose Your Signaling Architecture

For enterprise calling, SIP over WebSocket is the most practical choice because it enables direct interoperability with existing telephony infrastructure. JavaScript libraries such as SIP.js and JsSIP provide mature SIP stacks that run in the browser.

A typical signaling flow for an outbound call:

Browser sends a SIP INVITE, carrying its SDP offer, over a secure WebSocket (WSS) to your SIP proxy
SIP proxy routes the call to a PSTN gateway (or SIP trunk)
Gateway connects to the carrier network
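Media flows directly between the browser and the gateway (or via TURN if needed); the gateway must support WebRTC's ICE and DTLS-SRTP, which usually means a WebRTC-capable SBC or media gateway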
Call metadata (duration, recording status) is tracked by the signaling server
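
The browser's side of that flow can be modeled as a small state machine driven by SIP responses. A sketch under simplifying assumptions: SIP stacks such as SIP.js and JsSIP track this internally (including sending the ACK for a 200 OK), this version does not follow 3xx redirects, and the `CallState`, `CallEvent`, and `nextState` names are hypothetical:

```typescript
// Hypothetical client-side states for the outbound flow above.
type CallState = "idle" | "inviting" | "ringing" | "active" | "ended";

// Local actions plus SIP messages exchanged with the proxy.
type CallEvent =
  | { kind: "send-invite" }
  | { kind: "response"; status: number } // provisional or final response to our INVITE
  | { kind: "bye" }                      // an established call was hung up
  | { kind: "cancel" };                  // caller abandoned before answer

function nextState(state: CallState, event: CallEvent): CallState {
  switch (event.kind) {
    case "send-invite":
      return state === "idle" ? "inviting" : state;
    case "response":
      if (state !== "inviting" && state !== "ringing") return state;
      if (event.status === 180 || event.status === 183) return "ringing"; // Ringing / Session Progress
      if (event.status >= 200 && event.status < 300) return "active";    // answered; media can flow
      if (event.status >= 300) return "ended";                          // redirect or failure, e.g. 486 Busy Here
      return state;                                                      // 100 Trying changes nothing
    case "bye":
      return state === "active" ? "ended" : state;
    case "cancel":
      return state === "inviting" || state === "ringing" ? "ended" : state;
  }
}
```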

Step 2: Deploy TURN Infrastructure

For enterprise deployments, self-hosted TURN servers are strongly recommended over third-party services. Coturn is the most widely used open-source TURN server:

Recommended deployment pattern:

Minimum 2 TURN servers in each geographic region where you have agents
Use TCP 443 as a fallback transport (outbound 443 is open on most firewalls)
Enable TURN over TLS (turns:) for networks that block UDP or only allow TLS traffic on port 443
Implement short-lived credentials (HMAC-based) rather than static passwords
Monitor allocation counts and bandwidth utilization
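
Short-lived HMAC credentials are commonly implemented with the scheme from the "TURN REST API" Internet-Draft, which coturn supports through its `use-auth-secret` and `static-auth-secret` options: the username embeds an expiry timestamp, and the password is a Base64 HMAC-SHA1 of that username under a secret shared with the TURN server. A server-side sketch for Node.js; `turnCredentials`, `iceServersFor`, the hostnames, and the one-hour default TTL are illustrative assumptions:

```typescript
import { createHmac } from "node:crypto";

interface IceServer {
  urls: string[];
  username: string;
  credential: string;
}

// Username "<expiry unix time>:<user>", password = Base64(HMAC-SHA1(secret, username)).
// The TURN server recomputes the HMAC with the same shared secret and rejects expired usernames.
function turnCredentials(
  sharedSecret: string,
  userId: string,
  ttlSeconds: number,
  nowMs: number = Date.now(),
): { username: string; credential: string } {
  const expiry = Math.floor(nowMs / 1000) + ttlSeconds;
  const username = `${expiry}:${userId}`;
  const credential = createHmac("sha1", sharedSecret).update(username).digest("base64");
  return { username, credential };
}

// Placeholder hostnames: UDP 3478 first, then plain TCP and TLS on 443 as firewall fallbacks.
// The TLS listener is assumed to sit on a separate address from the plain-TCP one.
function iceServersFor(userId: string, sharedSecret: string, ttlSeconds = 3600): IceServer[] {
  const { username, credential } = turnCredentials(sharedSecret, userId, ttlSeconds);
  return [
    {
      urls: [
        "turn:turn.example.com:3478?transport=udp",
        "turn:turn.example.com:443?transport=tcp",
        "turns:turn-tls.example.com:443?transport=tcp",
      ],
      username,
      credential,
    },
  ];
}
```

The signaling server hands the resulting array to the browser, which passes it to `new RTCPeerConnection({ iceServers })`; a leaked credential stops working once its embedded expiry passes.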

Step 3: Handle Enterprise Network Challenges

Corporate networks introduce challenges that do not exist in consumer deployments:

Proxy servers: HTTP proxies can intercept WebSocket connections. Use WSS (WebSocket Secure) on port 443 to maximize compatibility
VPN split tunneling: When agents use VPNs, media may route through the VPN tunnel, adding latency. Configure split tunneling to exclude media traffic
QoS policies: Enterprise routers may not prioritize WebRTC traffic by default. Work with network teams to apply DSCP markings (EF — Expedited Forwarding) to WebRTC media
Firewall rules: At minimum, allow outbound UDP 3478 (STUN/TURN), UDP 49152-65535 (media), and TCP 443 (WSS signaling and TURN fallback)
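
To see which of these rules a particular agent's network actually honors, you can inspect the local ICE candidates the browser gathered. A sketch, assuming the `candidateType` and `relayProtocol` fields of `local-candidate` entries in `getStats()` (`relayProtocol` is reported only for relay candidates, and only by browsers that implement it); `diagnoseNetwork` is a hypothetical name:

```typescript
// The fields of getStats() "local-candidate" entries used here.
interface CandidateStat {
  type: string;
  candidateType?: string; // "host" | "srflx" | "prflx" | "relay"
  relayProtocol?: string; // relay candidates only: "udp" | "tcp" | "tls" to the TURN server
}

interface NetworkDiagnosis {
  stunReachable: boolean; // a server-reflexive candidate means a STUN response got through
  turnUdp: boolean;
  turnTcp: boolean;
  turnTls: boolean;
}

function diagnoseNetwork(stats: Iterable<CandidateStat>): NetworkDiagnosis {
  const d: NetworkDiagnosis = { stunReachable: false, turnUdp: false, turnTcp: false, turnTls: false };
  for (const s of stats) {
    if (s.type !== "local-candidate") continue;
    if (s.candidateType === "srflx") d.stunReachable = true;
    if (s.candidateType === "relay") {
      if (s.relayProtocol === "udp") d.turnUdp = true;
      if (s.relayProtocol === "tcp") d.turnTcp = true;
      if (s.relayProtocol === "tls") d.turnTls = true;
    }
  }
  return d;
}
```

In the browser, pass `(await pc.getStats()).values()` once gathering completes. A result with only `turnTls` set, for example, suggests that UDP and plain TCP to the TURN server are both blocked.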

Step 4: Implement Call Quality Monitoring

WebRTC exposes real-time statistics through the getStats() API. Key metrics to monitor:

Round-trip time (RTT): Target under 150ms for acceptable voice quality
Packet loss: Above 1% causes noticeable degradation; above 5% makes calls unusable
Jitter: Target under 30ms; WebRTC's jitter buffer compensates for up to 200ms
MOS (Mean Opinion Score): Calculate estimated MOS from RTT, jitter, and packet loss. Target 3.5+ for business calls
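
A sketch that pulls these metrics out of a `getStats()` report and turns them into an estimated MOS. It reads `inbound-rtp` (`jitter`, `packetsLost`, `packetsReceived`) and `remote-inbound-rtp` (`roundTripTime`) entries, whose time values the W3C statistics spec defines in seconds. The MOS formula is a simplified E-model approximation that circulates widely in VoIP monitoring tools, not the full ITU-T G.107 computation, and treating one-way delay as RTT / 2 is an assumption; `summarize` and `estimateMos` are hypothetical names:

```typescript
// The subset of getStats() fields used here; time values are in seconds.
interface RtpStat {
  type: string;
  kind?: string;
  jitter?: number;          // inbound-rtp
  packetsLost?: number;     // inbound-rtp, cumulative
  packetsReceived?: number; // inbound-rtp, cumulative
  roundTripTime?: number;   // remote-inbound-rtp
}

interface CallQuality {
  rttMs: number;
  jitterMs: number;
  lossPercent: number;
}

function summarize(stats: Iterable<RtpStat>): CallQuality {
  let rttMs = 0;
  let jitterMs = 0;
  let lost = 0;
  let received = 0;
  for (const s of stats) {
    if (s.kind !== "audio") continue; // ignore video streams
    if (s.type === "inbound-rtp") {
      jitterMs = (s.jitter ?? 0) * 1000;
      lost += s.packetsLost ?? 0;
      received += s.packetsReceived ?? 0;
    } else if (s.type === "remote-inbound-rtp" && s.roundTripTime !== undefined) {
      rttMs = s.roundTripTime * 1000;
    }
  }
  const total = lost + received;
  return { rttMs, jitterMs, lossPercent: total > 0 ? (100 * lost) / total : 0 };
}

// Simplified E-model approximation; one-way delay assumed to be RTT / 2.
function estimateMos({ rttMs, jitterMs, lossPercent }: CallQuality): number {
  const effectiveLatency = rttMs / 2 + 2 * jitterMs + 10;
  let r = effectiveLatency < 160
    ? 93.2 - effectiveLatency / 40
    : 93.2 - (effectiveLatency - 120) / 10;
  r -= 2.5 * lossPercent;
  r = Math.min(100, Math.max(0, r));
  // R-factor to MOS conversion curve from G.107 (gives 1.0 at R = 0 and 4.5 at R = 100).
  return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r);
}
```

For an RTT of 100 ms, 10 ms of jitter, and no loss this yields a MOS of about 4.37. The loss counters are cumulative since the call started, so for live alerting compute deltas between two snapshots taken a few seconds apart.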

Platforms like CallSphere provide built-in WebRTC quality monitoring dashboards that aggregate these metrics across all active calls, alerting on degradation before agents or customers notice problems.

Scaling WebRTC to Thousands of Concurrent Calls

At scale, the architecture shifts from simple peer-to-gateway connections to a media server topology:

Selective Forwarding Unit (SFU) Architecture

For scenarios involving call recording, real-time transcription, or AI processing, route media through an SFU:

The SFU receives media from the browser and forwards it to recording/transcription services
No media mixing or transcoding — just forwarding, keeping CPU usage low
A single SFU server can handle 1,000-2,000 concurrent voice streams
Use Kubernetes or auto-scaling groups to add SFU capacity dynamically
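
The two decisions behind dynamic SFU scaling, how many nodes to run and where to place a new call, fit in a few lines. A sketch assuming each node reports its active stream count and a capacity established by your own load testing (the 1,000-2,000 range above is a starting point, not a constant); `SfuNode`, `sfuNodesNeeded`, and `pickSfu` are hypothetical:

```typescript
// State each SFU node reports (hypothetical shape).
interface SfuNode {
  id: string;
  activeStreams: number;
  capacity: number; // streams per node, from load tests
}

// Nodes required to carry `streams` while keeping each node at or below `targetUtilization`.
function sfuNodesNeeded(streams: number, capacityPerNode: number, targetUtilization: number): number {
  return Math.ceil(streams / (capacityPerNode * targetUtilization));
}

// Least-loaded placement: the node with the lowest utilization that still has room.
function pickSfu(nodes: SfuNode[]): SfuNode | undefined {
  let best: SfuNode | undefined;
  for (const node of nodes) {
    if (node.activeStreams >= node.capacity) continue; // full
    const load = node.activeStreams / node.capacity;
    if (best === undefined || load < best.activeStreams / best.capacity) best = node;
  }
  return best;
}
```

For example, 5,000 streams on nodes rated at 1,000 with a 75% utilization target gives `sfuNodesNeeded(5000, 1000, 0.75)` = 7 nodes, enough that losing one still leaves 6,000 streams of capacity.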

Geographic Distribution

For global enterprises, deploy infrastructure in multiple regions:

TURN servers in each region (latency-sensitive)
SFU servers in each region (bandwidth-sensitive)
Signaling servers can be centralized with global load balancing
Use GeoDNS or anycast to route clients to the nearest infrastructure
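
GeoDNS and anycast route by network location; some deployments also let the client measure latency directly and choose the fastest region. A sketch of that alternative technique, assuming the client has already timed a few small requests to each region's endpoint; the region names and `nearestRegion` are hypothetical:

```typescript
// Median of a non-empty sample list; less sensitive to one slow probe than the mean.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// probes: region name -> RTT samples in ms measured by the client.
// Returns the region with the lowest median RTT, or undefined if none answered.
function nearestRegion(probes: Record<string, number[]>): string | undefined {
  let best: string | undefined;
  let bestRtt = Infinity;
  for (const [region, samples] of Object.entries(probes)) {
    if (samples.length === 0) continue; // every probe to this region failed
    const rtt = median(samples);
    if (rtt < bestRtt) {
      bestRtt = rtt;
      best = region;
    }
  }
  return best;
}
```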

Security Considerations for Enterprise WebRTC

WebRTC has strong security defaults, but enterprise deployments require additional measures:

Mandatory encryption: All WebRTC media uses SRTP encryption. Unlike traditional VoIP (where SRTP is optional), WebRTC cannot send unencrypted media
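Certificate verification: Browsers check the peer's DTLS certificate against the fingerprint (a=fingerprint) exchanged in the SDP, so secure and authenticate the signaling channel to stop a man-in-the-middle from substituting its own fingerprint
Short-lived TURN credentials: Use HMAC-signed credentials that expire after each session
Content Security Policy: Use the connect-src directive to restrict which signaling (WSS) endpoints your pages can reach; most browsers do not apply CSP to the peer connection's media paths
Audit logging: Log all call signaling events (INVITE, BYE, CANCEL) for compliance and forensics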
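
For the audit log, a signaling server that relays SIP over WebSocket can turn each request into a structured record. A sketch that reads only the request line and the `Call-ID`, `From`, `To`, and `CSeq` headers (including the compact forms `i`, `f`, and `t` defined in RFC 3261) and never reads the SDP body; `auditRecord` and the record shape are hypothetical:

```typescript
// Structured audit record for one SIP request (field names are illustrative).
interface SipAuditRecord {
  timestamp: string;
  method: string; // INVITE, BYE, CANCEL, ...
  requestUri: string;
  callId?: string;
  from?: string;
  to?: string;
  cseq?: string;
}

// RFC 3261 compact header names for the fields logged here.
const COMPACT_HEADERS: Record<string, string> = { i: "call-id", f: "from", t: "to" };

function auditRecord(raw: string, now: Date = new Date()): SipAuditRecord | null {
  const lines = raw.split(/\r?\n/);
  // Request line: METHOD SP Request-URI SP SIP/2.0 (responses start with "SIP/2.0" and are skipped).
  const [method, requestUri, version] = lines[0].split(" ");
  if (version !== "SIP/2.0" || !method || !requestUri) return null;

  const headers: Record<string, string> = {};
  for (const line of lines.slice(1)) {
    if (line === "") break;        // blank line ends the headers; the body is never read
    if (/^\s/.test(line)) continue; // folded continuation lines are ignored in this sketch
    const colon = line.indexOf(":");
    if (colon <= 0) continue;
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[COMPACT_HEADERS[name] ?? name] = line.slice(colon + 1).trim();
  }
  return {
    timestamp: now.toISOString(),
    method,
    requestUri,
    callId: headers["call-id"],
    from: headers["from"],
    to: headers["to"],
    cseq: headers["cseq"],
  };
}
```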

Frequently Asked Questions

How does WebRTC call quality compare to traditional desk phones?

With proper infrastructure (low-latency TURN servers, QoS-enabled networks, Opus codec), WebRTC call quality matches or exceeds traditional desk phones. The Opus codec at 24 kbps delivers better perceived quality than G.711 at 64 kbps due to its wideband frequency range (50 Hz to 20 kHz versus 300 Hz to 3.4 kHz for G.711). The primary quality variable is the network — corporate Wi-Fi with proper QoS delivers excellent results, while congested networks without traffic prioritization can cause degradation.

What bandwidth does each WebRTC voice call require?

A single WebRTC voice call using the Opus codec requires 30-80 kbps in each direction, depending on the configured bitrate and network conditions. With overhead (SRTP, UDP, IP headers), plan for approximately 100 kbps per direction per call. For 100 concurrent calls, that comes to about 20 Mbps of dedicated bandwidth in total (10 Mbps in each direction). This is significantly less than video calls, which require 1.5-4 Mbps per participant.
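
The overhead figure can be worked out per packet. A sketch assuming IPv4, 20 ms packetization (50 packets per second), a 12-byte RTP header with no header extensions, and the 10-byte authentication tag of the `AES_CM_128_HMAC_SHA1_80` SRTP profile (AEAD-GCM profiles use 16 bytes); the function names are hypothetical:

```typescript
// Per-direction IP bandwidth of one Opus stream, in kbps.
// Per-packet overhead: IPv4 (20 B) + UDP (8 B) + RTP (12 B) + SRTP auth tag.
// RTP header extensions, TURN framing, and link-layer bytes are not counted.
function opusKbpsPerDirection(codecKbps: number, ptimeMs = 20, srtpTagBytes = 10): number {
  const packetsPerSecond = 1000 / ptimeMs;
  const overheadBytes = 20 + 8 + 12 + srtpTagBytes;
  const overheadKbps = (packetsPerSecond * overheadBytes * 8) / 1000;
  return codecKbps + overheadKbps;
}

// Aggregate bandwidth for `calls` calls at a per-direction planning figure, both directions counted.
function totalMbps(calls: number, kbpsPerDirection: number): number {
  return (calls * kbpsPerDirection * 2) / 1000;
}
```

At 32 kbps of Opus payload this gives 52 kbps per direction before header extensions and link-layer framing, which is why a 100 kbps planning figure leaves comfortable headroom; `totalMbps(100, 100)` reproduces the 20 Mbps total.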

Can WebRTC calls connect to regular phone numbers (PSTN)?

Yes. WebRTC calls connect to the PSTN through a SIP-to-PSTN gateway. The browser establishes a WebRTC media session with the gateway, which then bridges to the carrier network using SIP trunking. CallSphere handles this gateway infrastructure transparently — agents make calls from their browser and recipients see a standard phone call from a regular phone number.

How do I handle WebRTC call recording for compliance?

WebRTC call recording is typically implemented server-side by routing media through a recording-capable media server (SFU). The media server forks the audio stream to a recording pipeline while forwarding it to the far end. This approach is more reliable than client-side recording (MediaRecorder API), which can be affected by browser tab switching, device sleep, or network interruptions. Recorded audio should be encrypted at rest and stored in a compliance-approved location with proper retention policies.

What happens to WebRTC calls when the network connection is unstable?

WebRTC has built-in resilience mechanisms: the jitter buffer absorbs short packet delays (up to 200ms), Forward Error Correction (FEC) recovers from moderate packet loss (up to 10-15%), and an ICE restart renegotiates the connection path when the network interface changes (for example, Wi-Fi to cellular). ICE restarts are not fully automatic: the application triggers one (for example with restartIce()) and sends a new offer through signaling. For enterprise deployments, a reconnection handler in your signaling layer that detects ICE failures, attempts a restart, and reinitiates the call if that fails provides the best user experience.
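
A reconnection handler mostly comes down to one decision per ICE state change. A sketch of one possible policy, built on these hypothetical choices: wait out a short grace period on `disconnected` (which often recovers by itself), attempt a bounded number of ICE restarts, then redial through signaling. `decide`, the action names, and the thresholds are illustrative; with SIP signaling, the new offer after `restartIce()` goes out as a re-INVITE:

```typescript
// ICE connection states as exposed by RTCPeerConnection.iceConnectionState.
type IceState = "new" | "checking" | "connected" | "completed" | "disconnected" | "failed" | "closed";

// Hypothetical actions the application can take.
type Action = "none" | "wait" | "ice-restart" | "redial";

interface ReconnectPolicy {
  graceMs: number;     // how long to tolerate "disconnected" before acting
  maxRestarts: number; // ICE restarts to try before redialing
}

function decide(state: IceState, msInState: number, restartsSoFar: number, policy: ReconnectPolicy): Action {
  const restartOrRedial: Action = restartsSoFar < policy.maxRestarts ? "ice-restart" : "redial";
  switch (state) {
    case "disconnected":
      // Transient drops (e.g. Wi-Fi roaming) frequently return to "connected" unaided.
      return msInState < policy.graceMs ? "wait" : restartOrRedial;
    case "failed":
      return restartOrRedial; // no candidate pair works; "ice-restart" means calling pc.restartIce()
    default:
      return "none"; // healthy, still connecting, or closed deliberately
  }
}
```

Re-evaluate on each `iceconnectionstatechange` event and on a timer while disconnected, since no event fires when the grace period expires.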