Voice Platform Architecture

This page explains how the EnableX Voice API works at a fundamental level: the webhook-driven control model, call legs, call states, actions, and the relationship between your server and the EnableX platform. Understanding this architecture is essential before building voice applications.

The Webhook Control Model

The Voice API is webhook-driven, not request-response. Here's the fundamental pattern:

Your App Server → POST /voice/v1/call
                  {
                    "to": "555-1234",
                    "action_on_connect": { "play": { "text": "Press 1 for sales" } },
                    "event_url": "https://yourdomain.com/voice/webhook"
                  }
                   ↓
EnableX Voice Platform dials out
                   ↓
Your Server ← POST { event: "connected", voice_id: "abc123", ... }
                   ↓
Your Server → Responds with play action (or omits — action_on_connect already set)
                   ↓
EnableX Voice Platform plays audio prompt to caller
                   ↓
(Caller presses a digit)
Your Server ← POST { event: "digitcollected", digit: "1", voice_id: "abc123", ... }
                   ↓
Your Server → Responds with: { connect: { to: "sip:[email protected]" }, voice_id: "abc123" }
                   ↓
EnableX Voice Platform bridges the call to the agent
                   ↓
(Call continues until hangup or error)

Critical: Your Server Must Respond

For every webhook event your server receives, it MUST respond with a valid JSON action within 5 seconds. If your server fails to respond, times out, or returns an error, the call may terminate. Your webhook handler must be fast and reliable. Use async processing for long-running tasks (logging to a database, calling external APIs) but respond to the webhook immediately with an action.
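
The "respond immediately, process later" pattern described above can be sketched with a background queue. This is a minimal illustration, not an EnableX SDK: the names `handle_webhook`, `background_jobs`, and `processed` are hypothetical, and the returned action only mirrors the `play` shape shown in the flow earlier on this page.

```python
import queue
import threading

# Slow follow-up work (database writes, external API calls) is queued here
# so the webhook handler itself can return right away.
background_jobs: "queue.Queue[dict]" = queue.Queue()
processed: list = []  # stands in for a database table in this sketch


def worker() -> None:
    """Drain the queue in the background, outside the webhook request."""
    while True:
        event = background_jobs.get()
        try:
            processed.append(event)  # replace with real logging or API calls
        finally:
            background_jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def handle_webhook(event: dict) -> dict:
    """Hand off slow work, then return an action without waiting for it."""
    background_jobs.put(event)  # does not block: the queue is unbounded
    # Hypothetical placeholder action; a real handler would branch on the event.
    return {"play": {"text": "Please hold"}, "voice_id": event.get("voice_id")}
```

The handler never waits on the worker, so its response time stays independent of how long the deferred work takes. In a web framework, `handle_webhook` would be called from the route that receives the POST, and its return value serialized as the JSON response.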

Why webhook-driven? Because calls happen in real-time, and you need to make decisions as events occur. Polling or batch processing won't work. By pushing events to your server, the platform keeps your application in control of every moment of the call.
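
Making a decision per event can be sketched as a plain function from an event to an action. The event names (`connected`, `digitcollected`) and the `play` / `connect` shapes are taken from the flow shown earlier on this page; the function name `next_action`, the `SALES_AGENT` address, and the empty-object fallback for unrecognized events are assumptions for illustration only.

```python
# Placeholder SIP address; substitute the agent address for your deployment.
SALES_AGENT = "sip:agent@example.com"


def next_action(event: dict) -> dict:
    """Map one incoming webhook event to the action to send back."""
    voice_id = event.get("voice_id")
    name = event.get("event")

    if name == "connected":
        # The call was answered: play the menu prompt.
        return {"play": {"text": "Press 1 for sales"}, "voice_id": voice_id}

    if name == "digitcollected":
        if event.get("digit") == "1":
            # Bridge the caller to the sales agent.
            return {"connect": {"to": SALES_AGENT}, "voice_id": voice_id}
        # Any other digit: tell the caller the option is invalid.
        return {
            "play": {"text": "Sorry, that is not a valid option"},
            "voice_id": voice_id,
        }

    # Assumption: an empty object is used here for events this sketch does
    # not handle. Check the API reference for what the platform expects.
    return {}
```

Keeping the decision logic in a pure function like this makes it easy to unit test without a live call.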

Call Legs: Leg A and Leg B

Every call has two legs:

  • Leg A: The originating side of the call (the party that places it).
  • Leg B: The destination side of the call (the party that receives it).

For an outbound call:

  • Leg A = Your application (the one dialing out).
  • Leg B = The person you're calling.

For an inbound call:

  • Leg A = The external caller (the person calling in).
  • Leg B = Your application (handling the call).
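
The leg assignments above can be captured in a small lookup. This is only an illustrative sketch; the `Legs` type, the `legs_for` function, and the `"outbound"` / `"inbound"` direction strings are hypothetical names, not part of the Voice API.

```python
from typing import NamedTuple


class Legs(NamedTuple):
    leg_a: str  # originating side
    leg_b: str  # destination side


def legs_for(direction: str) -> Legs:
    """Return who sits on each leg for a given call direction."""
    if direction == "outbound":
        return Legs(leg_a="your application", leg_b="person being called")
    if direction == "inbound":
        return Legs(leg_a="external caller", leg_b="your application")
    raise ValueError(f"unknown call direction: {direction!r}")
```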

Call States and Lifecycle

Every call progresses through a sequence of states. Here's the typical lifecycle:

┌────────────┐
│  initiated │  (Call created, dialing in progress)
└──────┬─────┘
       │
       ├─→ ┌────────────┐
       │   │  ringing   │  (Phone is ringing)
       │   └──────┬─────┘
       │          │
       │          ├─→ ┌─────────────┐
       │          │   │   answered  │  (Call connected, audio established)
       │          │   └──────┬──────┘
       │          │          │
       │          │          ├─→ ┌───────────────┐
       │          │          │   │  in-progress  │  (Call active, actions executing)
       │          │          │   └──────┬────────┘
       │          │          │          │
       │          │          │          ├─→ ┌─────────────┐
       │          │          │          │   │    ended    │  (Call disconnected)
       │          │          │          │   └─────────────┘
       │          │          │          │
       │          │          └──────────┘
       │          │
       │          └─→ ┌───────────────┐
       │              │  no-answer    │  (Phone didn't answer)
       │              └───────────────┘
       │
       ├─→ ┌─────────────┐
       │   │    busy     │  (Line is busy)
       │   └─────────────┘
       │
       └─→ ┌──────────────┐
           │  call-failed │  (Call failed to connect)
           └──────────────┘

| State | Description | Webhook Event | Your Response |
|---|---|---|---|
| initiated | Call created, dialing in progress. | (No webhook) | Returned in POST /voice/v1/call response. |
| ringing | Leg B phone is ringing, waiting for answer. | call-ringing | Wait for call-answered or no-answer event. |
| answered | Call connected, audio stream established. | call-answered | Respond with an action (talk, play, collect, transfer, etc.). |
| in-progress | Call active, your action is executing. | Depends on action (dtmf-collected, speech-recognized, etc.) | Respond with next action or wait. |
| ended | Call disconnected (hangup, timeout, or error). | call-ended | Perform cleanup, logging, billing. No further actions sent. |
| no-answer | Leg B didn't answer within timeout (default 120s). | call-no-answer | Hangup or retry. |
| busy | Leg B rejected the call (busy signal). | call-busy | Hangup or retry. |
| call-failed | Call failed to connect (network error, invalid number, etc.). | call-failed | Log error, notify user, or retry. |
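
If your application tracks call state locally, the lifecycle above can be modeled as a transition table. The sketch below encodes only the arrows drawn in the diagram; the names `TRANSITIONS`, `is_terminal`, and `advance` are hypothetical, and the platform may emit transitions this page does not show.

```python
# Allowed next states, as drawn in the lifecycle diagram on this page.
TRANSITIONS = {
    "initiated": {"ringing", "busy", "call-failed"},
    "ringing": {"answered", "no-answer"},
    "answered": {"in-progress"},
    "in-progress": {"ended"},
    # Terminal states: no outgoing transitions.
    "ended": set(),
    "no-answer": set(),
    "busy": set(),
    "call-failed": set(),
}


def is_terminal(state: str) -> bool:
    """True when no further state changes are expected for the call."""
    return not TRANSITIONS[state]


def advance(current: str, new: str) -> str:
    """Return the new state, or raise if the diagram has no such transition."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"unexpected transition: {current} -> {new}")
    return new
```

A tracker like this is useful mainly for logging and for detecting out-of-order or duplicate webhooks. In production it may be safer to log unexpected transitions than to raise.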

Webhook URL Configuration

Webhook URLs are configured per phone number in the EnableX Portal. When you provision a DID or virtual number, you specify the HTTPS endpoint that will receive all call events for that number.

For outbound calls, you can override the number's pre-configured webhook by passing an event_url directly in the API request payload. If no event_url is provided, the platform falls back to the webhook URL configured against the originating phone number.

Override webhook per call (outbound):

curl -X POST https://api.enablex.io/voice/v1/call \
  -H "Authorization: Basic $(echo -n 'APP_ID:APP_KEY' | base64)" \
  -H "Content-Type: application/json" \
  -d '{
    "to": "555-1234",
    "from": "555-5678",
    "event_url": "https://yourdomain.com/voice/webhook"
  }'

If event_url is omitted, the webhook configured for the from number in the Portal is used automatically.
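
The same request can be prepared from application code. The sketch below builds only the headers and body, mirroring the curl example above: Basic authentication from `APP_ID:APP_KEY`, and `event_url` included only when an override is wanted. The function name `build_call_request` and its parameters are hypothetical, and sending the request is left to whichever HTTP client you use.

```python
import base64
import json
from typing import Dict, Optional, Tuple

API_URL = "https://api.enablex.io/voice/v1/call"  # endpoint from the curl example


def build_call_request(
    app_id: str,
    app_key: str,
    to: str,
    from_: str,
    event_url: Optional[str] = None,
) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for an outbound call request."""
    token = base64.b64encode(f"{app_id}:{app_key}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }
    payload = {"to": to, "from": from_}
    if event_url is not None:
        # Per-call override; when omitted, the webhook configured for the
        # "from" number in the Portal is used, as described above.
        payload["event_url"] = event_url
    return headers, json.dumps(payload).encode()
```

For example, `build_call_request(app_id, app_key, "555-1234", "555-5678")` produces a body without `event_url`, so the Portal-configured webhook applies.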

Webhook Security & Reliability

  • HTTPS Required: All webhook URLs must be HTTPS (TLS 1.2+). HTTP is rejected.
  • Timeout: Your server must respond within 5 seconds or the call may terminate.
  • Publicly Reachable: EnableX servers must be able to reach your endpoint. Localhost or private IPs won't work in production. Use ngrok for local testing.
  • Retry Logic: If your webhook handler returns a non-2xx HTTP status, the platform may retry. Handle retries idempotently (store request IDs to detect duplicates).
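
Idempotent retry handling can be sketched by caching the response for each request ID and replaying it on a duplicate delivery. This page does not name the field that carries the request ID, so `request_id` below is an assumption, as are the `DedupingHandler` class and the in-memory cache (a production system would more likely use a shared store with expiry).

```python
from typing import Callable, Dict


class DedupingHandler:
    """Wrap a webhook handler so retried deliveries get the same response."""

    def __init__(self, handler: Callable[[dict], dict]) -> None:
        self._handler = handler
        self._responses: Dict[str, dict] = {}  # request ID -> cached response

    def __call__(self, event: dict) -> dict:
        key = event.get("request_id")  # ASSUMED field name, not from the API docs
        if key is None:
            # No ID to deduplicate on: process the event normally.
            return self._handler(event)
        if key not in self._responses:
            # First delivery: run the real handler once and remember the result.
            self._responses[key] = self._handler(event)
        # Retries return the cached response without re-running side effects.
        return self._responses[key]
```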

Inbound Call Model

When a caller dials your DID number, the platform sends an incomingcall webhook to your configured endpoint. You receive the caller's number, DID, and other metadata, and you respond with an action.

From that point on, the inbound call follows the same webhook control loop as outbound calls. Your server controls everything: prompt the caller, collect input, route to an agent, record, transfer, etc.

Key differences from outbound:

  • You don't initiate the call; the caller does.
  • You receive the first webhook (incomingcall) instead of starting from initiated state.
  • You must respond to the incomingcall webhook immediately. If you don't, the call may be rejected.
  • All subsequent events and actions work identically to outbound calls.
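
A first response to an inbound call might look like the sketch below, which picks a greeting based on the dialed DID. The `incomingcall` event name comes from this page; the payload field name `to` for the DID, the `GREETINGS_BY_DID` table, and the function name are assumptions, so check the actual payload before relying on them.

```python
# Hypothetical mapping from DID to the greeting played on that line.
GREETINGS_BY_DID = {
    "555-5678": "Thanks for calling support",
}
DEFAULT_GREETING = "Thanks for calling"


def handle_incoming_call(event: dict) -> dict:
    """Answer the first webhook of an inbound call with a play action."""
    if event.get("event") != "incomingcall":
        raise ValueError("not an incomingcall event")
    did = event.get("to")  # ASSUMED field name for the dialed DID
    greeting = GREETINGS_BY_DID.get(did, DEFAULT_GREETING)
    return {"play": {"text": greeting}, "voice_id": event.get("voice_id")}
```

After this first response, later events for the call can be handled by the same logic used for outbound calls.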

Number Types

Different numbers serve different purposes:

| Type | Description | Use Case |
|---|---|---|
| DID (Direct Inward Dial) | A full phone number for incoming calls. Routed to your webhook endpoint. | Inbound call center, customer support line, two-way communication. |
| Virtual Number | A number that masks your outbound caller ID. Your customer sees your virtual number, not your real number. | Outbound campaigns, marketplace calling (buyer calls seller), privacy preservation. |
| SIP Trunk | SIP-based number for advanced use cases (PBX integration, media servers). | Enterprise PBX integration, specialized telephony workflows. |

Security Considerations

When building with the Voice API, follow these security practices:

  • Verify webhook origin: The platform sends a signature header in each webhook. Verify it to ensure webhooks are genuinely from EnableX.
  • Validate inputs: Always validate webhook payloads (voice_id, caller ID, digits) before processing.
  • Rate limiting: Implement rate limiting on your webhook endpoint to prevent abuse.
  • Store APP_KEY securely: Use environment variables or secrets management (never hardcode).
  • Log carefully: Log call events for debugging, but avoid logging sensitive information (full caller numbers, PII).
  • HTTPS only: Ensure your webhook endpoint uses TLS 1.2+.
  • Handle timeouts gracefully: If your server is slow, the call may drop. Optimize your webhook handler.
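
Two of the practices above, verifying the webhook signature and keeping full caller numbers out of logs, can be sketched as follows. This page does not specify the signature header name, the algorithm, or the signing secret, so the HMAC-SHA256 hex digest used here is purely an assumption to show the comparison pattern. Both function names are hypothetical.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, raw_body: bytes, signature_hex: str) -> bool:
    """Check a signature, ASSUMING it is a hex HMAC-SHA256 of the raw body."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def mask_number(number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters' digits with '*'."""
    head, tail = number[:-visible], number[-visible:]
    return "".join("*" if ch.isdigit() else ch for ch in head) + tail
```

Verify against the raw request bytes rather than a re-serialized JSON object, since re-serialization can change whitespace or key order. Numbers no longer than `visible` characters are returned unmasked by this sketch, so short values may need separate handling.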