Building a Custom Calling Platform: Enterprise Guide
Evaluate build vs buy for enterprise calling platforms. Architecture patterns, SIP infrastructure, WebRTC, cost models, and timeline estimates for custom telephony systems.
When Enterprises Consider Building Custom Calling Platforms
The decision to build a custom calling platform rather than purchasing an off-the-shelf solution is one of the most significant technology investments an enterprise can make. It involves deep architectural decisions, substantial engineering investment, and ongoing operational commitment. Yet for certain organizations — those with unique workflow requirements, massive scale, strict compliance needs, or competitive differentiation tied to their communications infrastructure — a custom build can deliver significant long-term value.
This guide provides a comprehensive technical and financial framework for enterprise CTOs and engineering leaders evaluating the build-vs-buy decision, including architecture patterns, technology choices, cost models, and realistic timeline estimates.
The Build vs Buy Decision Framework
Before diving into technical architecture, apply this framework to determine whether building is justified:
Build When:
- Your calling workflows are genuinely unique and cannot be configured within existing platforms (not just "we want it slightly different")
- Call volume exceeds 10 million minutes/month, where per-minute pricing from commercial platforms becomes prohibitively expensive
- You need deep integration with proprietary systems that commercial APIs cannot support
- Regulatory requirements demand complete control over call data routing, storage, and processing
- Telephony is a core competitive differentiator (you are a communications company or communications is central to your product)
Buy When:
- Your requirements can be met by configuring an existing platform (most businesses overestimate their uniqueness)
- You lack a dedicated telecom engineering team (minimum 5-8 engineers for a production telephony platform)
- You need to be operational within 3-6 months (custom builds typically take 12-18 months to production-ready)
- Your call volume is under 1 million minutes/month (commercial platforms are more cost-effective at this scale)
- Telephony is an operational tool, not a product differentiator
Hybrid Approach: CPaaS + Custom Logic
The most common enterprise approach in 2026 is building custom application logic on top of Communications Platform as a Service (CPaaS) infrastructure:
- Use CPaaS providers (Twilio, Vonage, Bandwidth, Telnyx, SignalWire) for PSTN connectivity, number management, and media handling
- Build custom routing logic, IVR flows, analytics, and integrations in your own application layer
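- As a rough rule of thumb, this approach delivers about 80% of the control of a full custom build at about 30% of the cost and timeline. Treat the cost figure as optimistic: in the 5-year cost model later in this guide, the hybrid totals $5.07M against $6.05M for a full custom build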
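The build, buy, and hybrid criteria above can be condensed into a small helper. This is only a sketch: the `Requirements` fields and the `recommend_approach` function are hypothetical names, the thresholds are the ones quoted in this guide, and sending every in-between case to the hybrid option is an assumption rather than a rule from the framework.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    minutes_per_month: int             # expected call volume
    telecom_engineers: int             # dedicated telecom engineers available
    months_to_launch: int              # time available before go-live
    telephony_is_differentiator: bool  # telephony is central to the product
    needs_full_data_control: bool      # e.g. regulatory control over routing and storage


def recommend_approach(req: Requirements) -> str:
    """Return 'build', 'buy', or 'hybrid' using the thresholds from this guide."""
    # Building needs a telecom team (5+ engineers) and a 12+ month runway.
    can_build = req.telecom_engineers >= 5 and req.months_to_launch >= 12
    wants_build = (
        req.minutes_per_month > 10_000_000
        or req.telephony_is_differentiator
        or req.needs_full_data_control
    )
    if wants_build and can_build:
        return "build"
    if req.minutes_per_month < 1_000_000 and not wants_build:
        return "buy"
    # Assumption: everything in between falls back to CPaaS + custom logic.
    return "hybrid"
```

A real evaluation would weigh these factors rather than apply hard cut-offs, but encoding them makes the trade-offs explicit and easy to challenge.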
Core Architecture Components
A production calling platform consists of several interconnected subsystems:
1. PSTN Connectivity Layer
This is the foundation — how your platform connects to the public telephone network.
- SIP Trunking: The standard protocol for connecting VoIP systems to PSTN. Major providers include Bandwidth, Telnyx, Twilio, Vonage, and regional carriers
- Session Border Controllers (SBCs): Security and interoperability devices that sit between your platform and SIP trunk providers. Handle NAT traversal, protocol normalisation, TLS/SRTP encryption, and DoS protection. Leading options: Oracle SBC, Ribbon SBC, Kamailio (open source), OpenSIPS (open source)
- Number Management: Provisioning, porting, and managing local, toll-free, and international phone numbers. This is typically sourced from carrier partners or CPaaS providers
Architecture Decision: Build vs CPaaS for PSTN Connectivity
| Factor | Self-Managed SIP | CPaaS (Twilio/Telnyx) |
|---|---|---|
| Setup time | 3-6 months | Days to weeks |
| Per-minute cost (US local) | $0.003 - $0.008 | $0.008 - $0.015 |
| Number provisioning | Manual carrier relationships | API-driven, instant |
| Geographic coverage | Requires per-country carrier contracts | 100+ countries via API |
| SBC management | Your responsibility | Provider-managed |
| Regulatory compliance | You handle it | Shared responsibility |
| Engineering headcount | 2-3 dedicated engineers | 0 (API integration) |
At 10M+ minutes/month, self-managed SIP trunking saves $50,000-$70,000/month versus CPaaS pricing, justifying the engineering investment. Below that threshold, CPaaS is almost always more cost-effective.
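The savings range follows directly from the per-minute rates in the table. A minimal sketch of the arithmetic, assuming the low ends of the two price ranges are compared with each other and likewise the high ends (the function name is illustrative):

```python
from __future__ import annotations

# Per-minute US local rates from the comparison table (USD, low and high end).
SELF_MANAGED_RATE = (0.003, 0.008)
CPAAS_RATE = (0.008, 0.015)


def monthly_savings(minutes_per_month: int) -> tuple[float, float]:
    """Return the (low, high) monthly saving of self-managed SIP over CPaaS."""
    low = minutes_per_month * (CPAAS_RATE[0] - SELF_MANAGED_RATE[0])
    high = minutes_per_month * (CPAAS_RATE[1] - SELF_MANAGED_RATE[1])
    # Round to cents to hide floating-point noise.
    return round(low, 2), round(high, 2)


print(monthly_savings(10_000_000))  # (50000.0, 70000.0)
print(monthly_savings(1_000_000))   # (5000.0, 7000.0)
```

At 1 million minutes/month the saving of $5,000-$7,000 is far below the cost of the 2-3 dedicated engineers that self-managed SIP requires, which is why the threshold matters.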
2. Media Server Layer
The media server handles real-time audio processing:
- Media mixing: Conference calling, call recording, music on hold
- Codec transcoding: Converting between audio codecs (G.711, G.729, Opus) for interoperability
- DTMF detection: Processing touch-tone inputs for IVR
- Speech processing: Integration with ASR (speech-to-text) and TTS (text-to-speech) engines
Technology Options:
- FreeSWITCH: Open-source telephony platform with comprehensive media handling. Widely used in custom platforms. Handles 500-1,000+ concurrent calls per server
- Asterisk: Open-source PBX with media capabilities. Better suited for smaller deployments (100-300 concurrent calls per server)
- Janus: Lightweight, open-source WebRTC server. Excellent for browser-based calling
- Oortzi/MediaSoup: Modern WebRTC SFU (Selective Forwarding Unit) options for browser-based media
- Commercial: Oortzi, Oortzi, and other vendors offer managed media processing
3. Signalling and Call Control Layer
This layer manages call setup, teardown, routing, and state management:
- SIP Proxy/Registrar: Handles SIP signalling. Kamailio and OpenSIPS are the industry-standard open-source options, capable of handling several thousand call setups per second (CPS) per instance; the capacity benchmarks in this guide assume 5,000-10,000 CPS
- Call Routing Engine: Custom business logic that determines how calls are routed based on rules, time-of-day, agent availability, skills, and AI predictions
- Queue Management: ACD (Automatic Call Distribution) logic for contact center use cases
- WebSocket Signalling: For browser-based (WebRTC) endpoints, SIP over WebSocket (RFC 7118) or custom signalling protocols
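To make the call routing engine concrete, here is a minimal sketch of skills-based routing with a time-of-day rule. The `Agent`, `RoutingRule`, and `route_call` names are hypothetical, and the least-busy selection policy is an assumption; a production engine would also consult queues, priorities, and live agent state.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import time


@dataclass
class Agent:
    name: str
    skills: set[str]
    available: bool = True
    active_calls: int = 0


@dataclass
class RoutingRule:
    required_skill: str
    open_from: time = time(8, 0)    # assumed business hours
    open_until: time = time(18, 0)


def route_call(rule: RoutingRule, agents: list[Agent], now: time) -> Agent | None:
    """Return the agent who should take the call, or None to queue it."""
    if not (rule.open_from <= now < rule.open_until):
        return None  # outside business hours: send to voicemail or a queue
    candidates = [
        a for a in agents if a.available and rule.required_skill in a.skills
    ]
    # Least-busy agent first; ties are broken by name so results are deterministic.
    return min(candidates, key=lambda a: (a.active_calls, a.name), default=None)
```

Returning `None` rather than raising keeps the decision about queueing or overflow in the caller, where the ACD logic lives.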
4. WebRTC Layer
For browser and mobile-based calling without plugins:
- SRTP media: Encrypted real-time audio/video, with keys negotiated over DTLS (DTLS-SRTP)
- STUN/TURN servers: NAT traversal for peer-to-peer and relayed media. coturn is the standard open-source TURN server
- Codec negotiation: Opus codec for high-quality audio at variable bitrates
- Connectivity: ICE (Interactive Connectivity Establishment) framework for reliable connection setup across diverse network conditions
5. Data and Analytics Layer
- CDR (Call Detail Records): Structured records of every call — essential for billing, compliance, and analytics. Store in a time-series or columnar database (TimescaleDB, ClickHouse) for efficient querying
- Real-time analytics: WebSocket-based dashboards showing live call counts, queue depths, agent status. Consider Redis Pub/Sub or Apache Kafka for real-time event streaming
- Recording storage: Call recordings consume significant storage (approximately 0.5 MB/minute for mono G.711, 0.1 MB/minute for compressed Opus). Plan for object storage (S3, GCS, MinIO) with lifecycle policies
- AI/ML pipeline: Integration points for real-time transcription (Whisper, Deepgram, Google STT), sentiment analysis, and conversation intelligence
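The recording storage figures above translate into a simple sizing estimate. The sketch below uses the per-minute sizes quoted in this section and the $23/TB/month object storage price from the capacity benchmarks; the function names, the 12-month retention default, and the decimal TB conversion are assumptions for illustration.

```python
# Approximate recording size per minute, from the figures in this guide (MB).
MB_PER_MINUTE = {"g711_mono": 0.5, "opus": 0.1}


def steady_state_storage_tb(
    minutes_per_month: int,
    codec: str = "g711_mono",
    retention_months: int = 12,
    recorded_fraction: float = 1.0,
) -> float:
    """Storage held once the retention window is full, in decimal TB."""
    mb_per_month = minutes_per_month * recorded_fraction * MB_PER_MINUTE[codec]
    return mb_per_month * retention_months / 1_000_000


def monthly_storage_cost_usd(storage_tb: float, usd_per_tb: float = 23.0) -> float:
    """Monthly object storage bill at a flat per-TB price."""
    return storage_tb * usd_per_tb


tb = steady_state_storage_tb(5_000_000)  # 5M minutes/month, all calls recorded
print(round(tb, 1), round(monthly_storage_cost_usd(tb), 2))
```

With 5 million recorded minutes per month and 12 months of retention, this comes to about 30 TB for G.711 or about 6 TB for Opus, which is why lifecycle policies and compression are worth planning early.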
Infrastructure and Scaling Patterns
Horizontal Scaling Architecture
A production calling platform must handle varying load. The typical architecture uses:
- Stateless SIP proxies: Kamailio/OpenSIPS instances behind a load balancer. Scale horizontally by adding instances
- Stateful media servers: FreeSWITCH instances handle active calls. Each server handles 500-1,000 concurrent calls. Scale by adding servers with intelligent load distribution
- Database layer: PostgreSQL for configuration and CDR storage, Redis for real-time state (active calls, agent status), ClickHouse or TimescaleDB for analytics
- Message queue: NATS or Kafka for event distribution between components
Capacity Planning Benchmarks
| Component | Capacity per Instance | Cost (Cloud VM) |
|---|---|---|
| Kamailio SIP proxy | 5,000-10,000 CPS | $200-$400/month |
| FreeSWITCH media server | 500-1,000 concurrent calls | $400-$800/month |
| TURN server (coturn) | 200-500 relayed sessions | $200-$400/month |
| PostgreSQL (CDR storage) | 50M records/month | $500-$1,000/month |
| Redis (real-time state) | 100K concurrent sessions | $200-$400/month |
| Recording storage (S3) | 1TB = ~33,000 call-hours | $23/TB/month |
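These per-instance figures can be turned into a first-pass instance count. The following is a rough sketch: the 30% headroom and the minimum of two instances for redundancy are assumptions, not benchmarks from this guide, and `instances_needed` is an illustrative name.

```python
import math


def instances_needed(
    peak_load: float,
    capacity_per_instance: float,
    headroom: float = 0.3,   # assumed safety margin kept free on each instance
    min_instances: int = 2,  # assumed floor so one failure does not cause an outage
) -> int:
    """Estimate how many instances are needed to serve peak_load."""
    usable = capacity_per_instance * (1 - headroom)
    return max(min_instances, math.ceil(peak_load / usable))


# 3,000 concurrent calls on FreeSWITCH at the conservative 500 calls/server figure
print(instances_needed(3_000, 500))  # 9
# 200 relayed sessions on coturn at 200 sessions/server: the redundancy floor applies
print(instances_needed(200, 200))    # 2
```

Using the low end of each capacity range keeps the estimate conservative; load testing should replace these defaults with measured numbers.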
High Availability Design
- Deploy across multiple availability zones (minimum 2, preferably 3)
- SIP proxy clustering with shared registration state
- Media server failover with call preservation (or graceful re-establishment)
- Database replication with automatic failover
- Geographic redundancy for disaster recovery (active-active or active-passive across regions)
Cost Model: Build vs Buy vs Hybrid
Scenario: Enterprise with 5 million minutes/month, 500 concurrent agents
| Cost Category | Full Custom Build | CPaaS + Custom Logic | Commercial Platform |
|---|---|---|---|
| Year 1 (Build + Operate) | | | |
| Engineering team (8 FTE) | $1,200,000 | $600,000 (4 FTE) | $0 |
| Infrastructure | $180,000 | $120,000 | $0 (included) |
| PSTN/SIP costs | $180,000 | $450,000 | Included in per-seat |
| Software licenses | $50,000 (open source + tools) | $20,000 | $0 |
| Platform licensing | $0 | $0 | $1,800,000 ($300/seat/month x 500 seats) |
| Year 1 Total | $1,610,000 | $1,190,000 | $1,800,000 |
| Year 2+ (Operate Only) | | | |
| Engineering team (5 FTE) | $750,000 | $400,000 (3 FTE) | $0 |
| Infrastructure | $180,000 | $120,000 | $0 |
| PSTN/SIP costs | $180,000 | $450,000 | Included |
| Platform licensing | $0 | $0 | $1,800,000 |
| Year 2+ Annual | $1,110,000 | $970,000 | $1,800,000 |
Over a 5-year horizon:
- Full custom: $1.61M + (4 x $1.11M) = $6.05M
- CPaaS hybrid: $1.19M + (4 x $0.97M) = $5.07M
- Commercial platform: 5 x $1.80M = $9.00M
The hybrid CPaaS approach often delivers the best total cost of ownership for enterprises in this scale range.
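The totals above can be reproduced from the line items in the cost table, which also makes it easy to rerun the comparison with your own figures. A minimal sketch, with the dictionary keys and function name chosen for illustration:

```python
# Annual line items from the cost table (USD): engineering, infrastructure,
# PSTN/SIP, software licenses, platform licensing (where applicable).
YEAR_1 = {
    "full_custom": [1_200_000, 180_000, 180_000, 50_000],
    "cpaas_hybrid": [600_000, 120_000, 450_000, 20_000],
    "commercial": [1_800_000],
}
YEAR_2_PLUS = {
    "full_custom": [750_000, 180_000, 180_000],
    "cpaas_hybrid": [400_000, 120_000, 450_000],
    "commercial": [1_800_000],
}


def total_cost_of_ownership(option: str, years: int = 5) -> int:
    """Year 1 cost plus (years - 1) years of steady-state operation."""
    return sum(YEAR_1[option]) + (years - 1) * sum(YEAR_2_PLUS[option])


for option in YEAR_1:
    print(option, total_cost_of_ownership(option))
# full_custom 6050000
# cpaas_hybrid 5070000
# commercial 9000000
```

Changing `years` shows how the ranking depends on the planning horizon; the model ignores salary growth, price changes, and discounting.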
Timeline and Team Requirements
Full Custom Build Timeline
| Phase | Duration | Key Deliverables |
|---|---|---|
| Architecture and design | 6-8 weeks | System design, technology selection, infrastructure planning |
| Core telephony (SIP, media) | 12-16 weeks | PSTN connectivity, basic call handling, recording |
| IVR and routing | 8-12 weeks | IVR flows, skills-based routing, queue management |
| Agent interface | 8-12 weeks | Softphone, agent dashboard, supervisor tools |
| Analytics and reporting | 6-8 weeks | CDR processing, dashboards, historical reporting |
| Integration (CRM, WFM) | 8-12 weeks | CRM connectors, WFM integration, API development |
| Testing and hardening | 8-12 weeks | Load testing, security audit, failover testing |
| Total | 14-18 months | Production-ready platform |
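As a sanity check on the total, the phase durations can be summed directly. The sketch below assumes the phases run strictly back to back with no overlap, which is a simplification; the constant and function names are illustrative.

```python
from __future__ import annotations

# (phase, minimum weeks, maximum weeks) from the timeline table
PHASES = [
    ("Architecture and design", 6, 8),
    ("Core telephony (SIP, media)", 12, 16),
    ("IVR and routing", 8, 12),
    ("Agent interface", 8, 12),
    ("Analytics and reporting", 6, 8),
    ("Integration (CRM, WFM)", 8, 12),
    ("Testing and hardening", 8, 12),
]

WEEKS_PER_MONTH = 52 / 12


def total_weeks(phases: list[tuple[str, int, int]]) -> tuple[int, int]:
    """Sum the minimum and maximum durations across all phases."""
    return sum(p[1] for p in phases), sum(p[2] for p in phases)


def weeks_to_months(weeks: float) -> float:
    return round(weeks / WEEKS_PER_MONTH, 1)


low, high = total_weeks(PHASES)
print(low, high)                                     # 56 80
print(weeks_to_months(low), weeks_to_months(high))   # 12.9 18.5
```

The sequential sum of 56-80 weeks, roughly 13-18 months, is close to the 14-18 month total in the table. Running phases in parallel would shorten it, while the contingency recommended for telephony projects would lengthen it.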
Minimum Team Composition
- 1 Telecom/VoIP architect (Kamailio/FreeSWITCH expertise)
- 2 Backend engineers (API, routing logic, integrations)
- 1 Frontend engineer (agent interface, dashboards)
- 1 DevOps/SRE engineer (infrastructure, monitoring, scaling)
- 1 QA engineer (telephony-specific testing)
- 1 Product manager
- 1 Engineering manager (part-time if team is experienced)
Technology Stack Recommendations
For the CPaaS Hybrid Approach (Recommended for Most Enterprises)
- PSTN connectivity: Telnyx or Bandwidth (better pricing than Twilio at enterprise volume)
- WebRTC: Janus or MediaSoup for browser-based calling
- Backend: Go or Rust for the call control layer (performance-critical), Python/Node.js for business logic and APIs
- Real-time state: Redis Cluster
- CDR/Analytics: ClickHouse for analytics, PostgreSQL for application data
- Message bus: NATS for internal eventing
- Recording storage: S3-compatible object storage with lifecycle policies
- Monitoring: Prometheus + Grafana for infrastructure, custom dashboards for call quality (MOS, jitter, packet loss)
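For the call quality dashboards, MOS is commonly estimated from an R-factor using the conversion defined in the ITU-T G.107 E-model. The sketch below implements only that conversion; how the R-factor is derived from jitter, packet loss, and delay depends on your monitoring tooling and is left out. The `needs_alert` helper and its 3.6 threshold are assumptions for illustration.

```python
def r_to_mos(r: float) -> float:
    """Convert an E-model R-factor to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
    # The polynomial dips slightly below 1 for very small R, so clamp it.
    return max(1.0, mos)


def needs_alert(mos: float, threshold: float = 3.6) -> bool:
    """Flag calls whose estimated MOS falls below an assumed alerting threshold."""
    return mos < threshold


print(round(r_to_mos(93.2), 2))  # 4.41
print(needs_alert(r_to_mos(60)))  # True
```

Exporting the per-call MOS as a metric lets the same Prometheus and Grafana stack alert on quality degradation alongside infrastructure health.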
How CallSphere Relates to Custom Builds
CallSphere occupies the middle ground between commercial platforms and full custom builds. For enterprises that need more control and customisation than a standard commercial platform but cannot justify the 14-18 month timeline and 8-person engineering team of a full custom build, CallSphere provides API-first architecture that supports deep custom integrations, white-label options for embedding calling into existing products, and webhook-driven workflows that connect to proprietary business systems.
This approach delivers the customisation enterprises need while eliminating the undifferentiated heavy lifting of PSTN connectivity, media handling, and telephony infrastructure management.
Common Pitfalls to Avoid
- Underestimating telephony complexity: SIP interoperability, codec negotiation, NAT traversal, and carrier-specific quirks consume far more engineering time than expected. Budget 30% more time than initial estimates
- Neglecting monitoring: Telephony problems are invisible without proper monitoring. Implement MOS scoring, jitter tracking, packet loss monitoring, and call quality alerting from day one
- Ignoring carrier redundancy: A single SIP trunk provider creates a single point of failure. Use at least two carriers with automatic failover
- Building too much from scratch: Use proven open-source components (Kamailio, FreeSWITCH, coturn) rather than writing SIP stacks from scratch. The telephony community has solved these problems over decades
- Skipping load testing: Telephony systems fail non-gracefully under load. Test with realistic traffic patterns including ramp-up, sustained peak, and burst scenarios. Tools: SIPp for SIP load generation, Oortzi for WebRTC load testing
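The carrier redundancy point can be illustrated with a small trunk selector. This is a sketch under stated assumptions: the `Trunk` fields, the consecutive-failure health check, and the carrier names are hypothetical, and in practice this logic usually lives in the SIP proxy or SBC rather than in application code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Trunk:
    name: str
    priority: int                   # lower number = preferred carrier
    consecutive_failures: int = 0   # updated by health checks or failed calls


def pick_trunk(trunks: list[Trunk], max_failures: int = 3) -> Trunk:
    """Return the highest-priority trunk that is still considered healthy."""
    healthy = [t for t in trunks if t.consecutive_failures < max_failures]
    if not healthy:
        raise RuntimeError("no healthy SIP trunk available")
    return min(healthy, key=lambda t: t.priority)


trunks = [Trunk("carrier-a", priority=1), Trunk("carrier-b", priority=2)]
print(pick_trunk(trunks).name)        # carrier-a
trunks[0].consecutive_failures = 3    # primary carrier marked unhealthy
print(pick_trunk(trunks).name)        # carrier-b
```

Resetting `consecutive_failures` after a successful probe lets traffic return to the preferred carrier once it recovers.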
FAQ
How many engineers does it take to build and maintain a custom calling platform?
A minimum viable team requires 6-8 engineers for the initial build phase (12-18 months) and 4-5 engineers for ongoing operation and feature development. The critical hire is the telecom/VoIP architect — someone with deep experience in SIP, RTP, Kamailio or FreeSWITCH, and carrier interconnection. This role is specialized and commands $180,000-$250,000 in US markets. Without this expertise, projects frequently fail or produce unreliable systems.
What is the minimum scale that justifies a custom build?
As a rule of thumb, custom builds become economically justified at approximately 5-10 million minutes per month or 500+ concurrent agents. Below this threshold, the engineering cost to build and maintain the platform exceeds the licensing savings compared to commercial platforms. The CPaaS hybrid approach lowers this threshold somewhat because you avoid the most expensive components (PSTN connectivity, media handling) while maintaining custom control over business logic.
How does call quality compare between custom and commercial platforms?
In a well-engineered custom platform, call quality can match or exceed commercial platforms because you have full control over codec selection, media routing, and quality-of-service prioritisation. However, achieving this quality requires dedicated monitoring, regular tuning, and rapid response to quality degradation. Commercial platforms handle this operationally as part of their service. If your team lacks telephony operations experience, commercial platforms will likely deliver better average call quality.
Can I migrate from a custom platform to a commercial one (or vice versa) later?
Yes, but migration is non-trivial. The most portable layer is phone numbers (number porting is well-established). IVR flows, routing logic, and integrations require rebuilding. Agent training on new interfaces takes 1-2 weeks. The most challenging aspect is typically CRM integration — custom integrations built for one platform rarely transfer directly to another. Plan 3-6 months for a full migration.
What open-source telephony projects should I evaluate?
The core open-source telephony stack in 2026 includes: Kamailio (SIP proxy, registration, and routing; the gold standard for high-performance SIP), FreeSWITCH (media server, IVR, and conferencing; the most capable open-source media platform), OpenSIPS (SIP proxy alternative with scripting), Janus (WebRTC gateway; lightweight and well-documented), coturn (TURN/STUN server for NAT traversal), and Homer (SIP capture and monitoring). These projects are production-proven at scale and have active communities and commercial support options.
CallSphere Team
Expert insights on AI voice agents and customer communication automation.