Technical Specifications
This page documents the exact media parameters, codec support, streaming constraints, recording behaviour, and infrastructure details of EnableX Video. Use it as a reference when designing your integration, sizing bandwidth, configuring quality layers, or troubleshooting media quality.
Video Streaming Quality
EnableX Video supports three quality tiers for camera streams: HD, SD, and LD. Each tier defines a maximum and minimum resolution at a fixed 26 FPS frame rate. The actual quality delivered to each subscriber is selected automatically based on available bandwidth — you specify the quality layers you want to publish; EnableX handles the rest.
| Tier | Label | Maximum Resolution | Minimum Resolution | Frame Rate |
|---|---|---|---|---|
| HD | 720p | 1280 × 720 | 320 × 180 | 26 FPS |
| SD | 480p | 640 × 480 | 320 × 180 | 26 FPS |
| LD | 240p / 180p | 640 × 360 | 80 × 45 | 26 FPS |
Video Layers (Simulcast)
Simulcast lets a publisher send multiple quality versions of their stream simultaneously. Subscribers with high-bandwidth connections receive HD; those on slower connections automatically receive SD or LD — all from a single published stream. You configure the number of layers when initialising the local stream.
- 1 Layer — HD only (720p). Suitable for small rooms or fixed high-bandwidth environments.
- 2 Layers — HD (720p) + SD (480p). Recommended for most group sessions.
- 3 Layers — HD (720p) + SD (480p) + LD (240p/180p). Best for large, bandwidth-heterogeneous audiences.
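As a rough illustration of how layer selection might behave, the sketch below maps each layer count to the tiers listed above and picks the best tier that fits a subscriber's bandwidth. The bandwidth thresholds in `HYPOTHETICAL_MIN_KBPS` are invented for illustration and are not EnableX figures; the real selection happens on the platform side.

```python
from typing import Optional

# Tiers published for each layer count, taken from the list above (best first).
LAYERS_BY_COUNT = {
    1: ["HD"],
    2: ["HD", "SD"],
    3: ["HD", "SD", "LD"],
}

# ASSUMPTION: illustrative minimum bandwidth (kbps) per tier, not EnableX values.
HYPOTHETICAL_MIN_KBPS = {"HD": 1200, "SD": 500, "LD": 150}


def pick_layer(layer_count: int, available_kbps: int) -> Optional[str]:
    """Return the best published tier that fits, or None if no tier fits."""
    for tier in LAYERS_BY_COUNT[layer_count]:
        if available_kbps >= HYPOTHETICAL_MIN_KBPS[tier]:
            return tier
    return None


if __name__ == "__main__":
    for kbps in (2000, 600, 200, 100):
        print(kbps, "kbps ->", pick_layer(3, kbps))
```

With a single HD layer, a subscriber below the HD threshold gets no video tier in this sketch, which is why more layers suit mixed-bandwidth audiences.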
Codec Support
Video Codec
EnableX uses VP8 as the standard video codec across all platforms and SDKs. VP8 is royalty-free, well-supported across all WebRTC-capable browsers and devices, and produces consistent quality results on constrained networks.
Audio Codec
All audio is encoded and decoded using Opus (RFC 6716). Opus operates at adaptive bitrates from 6 kbps to 510 kbps, has built-in Forward Error Correction (FEC), and handles packet loss gracefully, which makes it well suited to real-time voice over varying network conditions.
Active Talker Streams
In a multi-participant session, not all streams are delivered at full quality simultaneously. EnableX processes, records, and transmits a maximum of 9 top active talker streams at any given time. The platform dynamically determines the "top 9" based on audio activity (who is currently speaking or has spoken most recently).
Streams beyond the top-9 threshold are still present in the session but are not actively forwarded to other participants until a speaker falls out of the active set. This model allows sessions with many participants to remain bandwidth-efficient without manual stream management.
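The "top 9" idea can be sketched as a simple ranking by most recent audio activity. This is only an illustration: the `last_spoke_at` mapping and the use of a single timestamp per stream are assumptions, and the platform's actual ranking logic is not documented here.

```python
import heapq
from typing import Dict, List

MAX_ACTIVE_TALKERS = 9  # limit stated in the text above


def top_active_talkers(last_spoke_at: Dict[str, float],
                       limit: int = MAX_ACTIVE_TALKERS) -> List[str]:
    """Return up to `limit` stream IDs, most recent speaker first.

    `last_spoke_at` maps a (hypothetical) stream ID to the time, in seconds,
    at which audio activity was last detected on that stream.
    """
    return heapq.nlargest(limit, last_spoke_at, key=last_spoke_at.__getitem__)


if __name__ == "__main__":
    streams = {f"s{i}": float(i) for i in range(12)}
    print(top_active_talkers(streams))
```

Streams that fall outside the returned list stay in the session and re-enter the list as soon as their timestamp overtakes one of the current nine.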
Screen Share
Screen sharing transmits the presenter's display (or application window) as a dedicated stream. It uses a separate stream slot with its own resolution and bandwidth budget, independent of the camera stream. Enable screen sharing in your room settings with settings.screen_share: true.
| Parameter | Value |
|---|---|
| Default quality | 1080p HD |
| Maximum resolution | 1920 × 1080 @ 6 FPS |
| Transmit bandwidth | 300 KBps |
| Receive bandwidth | 300 KBps |
Canvas Streaming
Canvas Streaming allows you to publish a programmatically rendered HTML5 canvas as a video stream. This is used for whiteboard sessions, overlays, composite video scenes, or any scenario where your application generates the video frame rather than a camera. Enable it with settings.canvas: true in your room configuration.
| Parameter | Value |
|---|---|
| Maximum resolution | 1280 × 720 @ 23 FPS |
| Minimum resolution | 320 × 180 @ 6 FPS |
| Transmit bandwidth | 300 KBps |
| Receive bandwidth | 300 KBps |
Recording
EnableX recording captures each participant stream individually and then composites them into a single playable video file. Understanding the recording pipeline helps you plan for storage, post-processing, and playback integration.
Recording Pipeline
- Source streams: Each individual stream in a session is recorded separately in MKV format. MKV is used for raw capture because it is resilient to connection drops mid-session.
- Post-session processing: After the session ends, EnableX transcodes and composites the individual MKV streams into a single MP4 file. The MP4 is what you download and play back.
- Maximum recording quality: Up to 720p HD (as received on the server).
- Maximum transcoded quality: 480p SD. The playable MP4 output is capped at 480p regardless of the source quality.
File Sizes
| Quality | Approximate File Size |
|---|---|
| 720p HD (source) | ~11 MB per minute |
| 480p SD (transcoded MP4) | ~4 MB per minute |
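For storage planning, the per-minute figures above can be turned into a quick estimate. The sketch assumes that the 720p figure applies to each individually recorded source stream while the composited MP4 is a single file; that reading is an assumption, and the function name is illustrative.

```python
# Approximate sizes from the table above (MB per minute of recording).
SOURCE_MB_PER_MIN = 11  # 720p HD source
MP4_MB_PER_MIN = 4      # 480p SD transcoded MP4


def estimate_storage_mb(minutes: float, source_streams: int = 1) -> dict:
    """Rough storage estimate for one recorded session.

    ASSUMPTION: the source figure is per recorded stream; the MP4 figure
    is for the single composited output file.
    """
    return {
        "source_mb": SOURCE_MB_PER_MIN * minutes * source_streams,
        "mp4_mb": MP4_MB_PER_MIN * minutes,
    }


if __name__ == "__main__":
    print(estimate_storage_mb(60))
```

Under these assumptions, a 60-minute session with one source stream comes to roughly 660 MB of source material and 240 MB of playable MP4.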
Output Format & Storage
- The final playable recording is delivered as MP4 (H.264 + AAC).
- Recordings are stored on EnableX servers and are downloadable via REST API using a signed URL.
- Retention is 90 days by default; download and archive before this window closes.
- A recording.ready webhook is emitted when the file is available for download.
Recording Triggers
- Auto-record: Set auto_record: true when creating the room. Recording starts automatically with the session.
- On-demand: Start and stop recording during an active session via SDK or REST API.
Quality Adaptation
EnableX Video automatically adapts to changing network conditions without any intervention from your application code. The adaptation lifecycle works as follows:
- Auto adaptation: The platform continuously monitors available bandwidth for each participant and adjusts the received video quality (resolution and bitrate) to deliver the best experience possible within the current network constraint.
- Audio-only fallback: When available bandwidth drops below the threshold required for any video quality tier, the platform falls back to audio-only mode automatically. The participant remains in the session without any UI code change required.
- Auto restoration: When network connectivity improves, the platform automatically restores video communication, stepping the quality back up through LD → SD → HD as bandwidth allows.
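The adaptation lifecycle above can be pictured as movement along a four-step ladder. The sketch below is a simplified model, not the platform's algorithm: it assumes quality moves one step at a time in both directions, whereas the text only states the step-up order and the audio-only fallback.

```python
# Quality ladder, lowest to highest, following the order described above.
LADDER = ["audio-only", "LD", "SD", "HD"]


def next_level(current: str, bandwidth_improved: bool) -> str:
    """Move one step up or down the ladder, clamped at both ends.

    ASSUMPTION: one step per call in either direction; the real platform
    may skip levels depending on measured bandwidth.
    """
    i = LADDER.index(current)
    if bandwidth_improved:
        i = min(i + 1, len(LADDER) - 1)
    else:
        i = max(i - 1, 0)
    return LADDER[i]


if __name__ == "__main__":
    level = "audio-only"
    for _ in range(3):
        level = next_level(level, bandwidth_improved=True)
        print(level)
```

Because the platform performs this adaptation itself, application code would normally only observe the resulting level, for example to show a "video paused" indicator during audio-only fallback.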
Media Tunneling
EnableX Video sends real-time media over UDP for the lowest latency. When UDP is blocked (corporate firewalls, restrictive NATs), the platform falls back to a TURN relay automatically.
| Parameter | Value |
|---|---|
| Protocol | UDP (RTP) |
| Port range | 30000 – 35000 |
| Fallback | TURN server relay (configurable) |
| Signalling | WSS (WebSocket Secure) over TCP 443 |
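When preparing a firewall change request, it can help to check an allow-list against the ports in the table above. The sketch below is a minimal checker under stated assumptions: the `(protocol, first_port, last_port)` rule shape is hypothetical, and a required range counts as covered only if a single allowed rule spans it entirely.

```python
from typing import Iterable, List, Tuple

Rule = Tuple[str, int, int]  # (protocol, first_port, last_port), hypothetical shape

# Requirements taken from the table above.
REQUIRED: List[Rule] = [
    ("udp", 30000, 35000),  # RTP media
    ("tcp", 443, 443),      # WSS signalling
]


def missing_rules(allowed: Iterable[Rule]) -> List[Rule]:
    """Return the required rules not fully covered by any one allowed rule."""
    allowed = list(allowed)

    def covered(req: Rule) -> bool:
        proto, lo, hi = req
        return any(p == proto and a <= lo and hi <= b for p, a, b in allowed)

    return [req for req in REQUIRED if not covered(req)]


if __name__ == "__main__":
    print(missing_rules([("tcp", 1, 1024)]))
```

If the UDP range is reported as missing, media would depend on the TURN relay fallback described above.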
PSTN / SIP
EnableX supports both inbound and outbound telephony integration for hybrid sessions that mix WebRTC participants with phone-line participants.
- Dial-In to Session: Participants can join a session from a regular phone by dialing a PSTN number. Audio is bridged into the video session.
- In-Session Dial-Out: The moderator or application can dial out to a phone number from within an active session, bridging the call into the room.
SIP trunking is also supported for enterprise PBX and contact-centre integrations. Contact EnableX sales for PSTN/SIP provisioning.
Platform Capabilities at a Glance
| Capability | Specification |
|---|---|
| Participants per room | Up to 100 (group mode), up to 2000 (lecture mode, view-only), 2 (p2p) |
| Session duration | Configurable per room, up to 24 hours (extendable on request) |
| Recording | Auto-record or on-demand. MP4 and WebM output. Multiple layouts (grid, spotlight, sidebar). |
| Live streaming | RTMP and HLS output to external services (YouTube Live, Twitch, etc.) |
| Screen sharing | Built-in. Presenter shares screen; other participants see screen + presenter video. |
| Chat | In-session text chat. Persistent (available in post-session transcript). Supports rich text, files, emojis. |
| Whiteboard | Collaborative whiteboard for brainstorming and teaching. Recordable. |
| Breakout rooms | Facilitator can divide participants into subgroups. Automated regrouping. |
| Transcription | Live and post-session speech-to-text. Multiple language support. Timestamps and speaker identification. |
| Metadata and CDR | Participant list, duration per participant, bitrate, latency, packet loss, connection changes, platform/browser info. |
| Webhooks | Real-time event notifications: session start/end, participant join/leave, recording ready, transcription ready. |
| Global deployment | Media servers in 50+ countries. Auto-routing by geography. No manual region selection. |
| Security | DTLS-SRTP for media, WSS for signalling, HTTPS for API. TLS 1.2+. Support for end-to-end encryption (E2EE) on request. |
Room Modes: Detailed Comparison
| Aspect | Group (SFU) | Lecture (MCU) | P2P |
|---|---|---|---|
| Max participants | 100 | 2000 (view-only) | 2 |
| Media routing | SFU: each participant publishes one stream, SFU forwards to others | MCU: streams composited into single output | Direct peer-to-peer |
| Publish capability | All participants can publish | Only moderators publish by default; participants with floor access can publish | Both participants can publish |
| Roles | Participant (all equal) | Moderator (publishes, controls) and Participant (receives, requests floor) | Symmetric (both equal) |
| Typical bandwidth per participant | 500-1500 kbps (incoming, varies by participant count) | 500-2000 kbps (incoming, depends on layout) | 300-1000 kbps (direct, low latency) |
| Latency | 100-300 ms (SFU relay) | 200-500 ms (MCU processing) | 20-100 ms (direct peer) |
| Moderator controls | Mute participant audio/video, remove participant, lock room | Mute, remove, grant floor access, control recording, switch layouts | None (symmetric) |
| Use cases | Team meetings, collaborative sessions, group interviews, peer support | Webinars, live streams, large classes, town halls, broadcasts | 1-on-1 support, pair interviews, doctor-patient, sales calls |
| Recording layouts | Grid (all participants), gallery (featured + grid) | Grid, spotlight, sidebar, custom layouts | Side-by-side |
| Screen sharing | Yes (presenter screen + video visible to all) | Yes (screen + layout adjusts) | Yes (both see each other and shared screen) |
| Scaling recommendations | Keep under 50 participants for optimal quality; beyond 50, consider lecture mode | Optimized for 100s to 1000s; no performance degradation with participant count | Always 2; no scaling needed |
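The limits and scaling recommendations in the table can be condensed into a small decision helper. This is a sketch only: the function name and the `everyone_publishes` flag are illustrative, and the returned strings are labels for the three modes rather than confirmed API values.

```python
def suggest_room_mode(participants: int, everyone_publishes: bool = False) -> str:
    """Suggest a room mode from the limits in the comparison table.

    - exactly 2 participants: p2p
    - up to 50 (or up to 100 when everyone must publish): group
    - up to 2000: lecture
    """
    if participants < 2:
        raise ValueError("a session needs at least 2 participants")
    if participants == 2:
        return "p2p"
    if participants <= 50 or (everyone_publishes and participants <= 100):
        return "group"
    if participants <= 2000:
        return "lecture"
    raise ValueError("exceeds the lecture-mode limit of 2000 participants")


if __name__ == "__main__":
    for n in (2, 30, 80, 1500):
        print(n, "->", suggest_room_mode(n))
```

The 50-participant cut-off follows the scaling recommendation row; group mode still accepts up to 100 participants when every participant needs to publish.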
Session Behavior and Lifecycle
Session Start
A session starts when the first participant joins a room with a valid token. At this moment:
- A session instance is created (assigned a session_id).
- A session.started webhook is emitted to your server.
- CDR recording begins (start time, initial participant list).
- If auto-record is enabled, recording begins.
- If live streaming is enabled, RTMP/HLS broadcast begins.
Session Active
While the session is active:
- Participants can publish/subscribe media (mode-dependent).
- Participants can send chat, share screens, collaborate on whiteboard, request floor access.
- Transcription (if enabled) is continuously processed.
- Quality metrics are sampled (bandwidth, latency, packet loss, jitter).
- Webhooks are emitted for participant join/leave events.
- Moderators can mute, remove participants, switch recording layouts.
Session End
A session ends when one of these conditions is met:
- The last participant leaves voluntarily.
- The session duration limit is reached (configured per room, default or custom). All participants are auto-disconnected.
- The room is explicitly deleted via API (all active sessions end).
- A moderator ends the session via SDK or dashboard.
When a session ends:
- Media streams stop.
- A session.ended webhook is emitted with final metadata.
- Recording is finalized (if enabled).
- Live stream is stopped.
- CDR is finalized and marked as complete.
- Post-session data becomes available (recordings, transcripts, final metrics) within ~5 minutes.
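On the receiving side, lifecycle webhooks are typically routed by event name. The sketch below dispatches the three event names mentioned on this page; the payload shape (a `type` key and a `session_id` key) is an assumption made for illustration, so check the actual webhook payload before relying on it.

```python
from typing import Callable, Dict


def on_started(event: dict) -> str:
    return f"session {event['session_id']} started"


def on_ended(event: dict) -> str:
    return f"session {event['session_id']} ended"


def on_recording_ready(event: dict) -> str:
    return f"recording ready for session {event['session_id']}"


# Event names come from this page; the handlers are placeholders.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "session.started": on_started,
    "session.ended": on_ended,
    "recording.ready": on_recording_ready,
}


def dispatch(event: dict) -> str:
    """Route an event to its handler; unknown event types are ignored.

    ASSUMPTION: the event name arrives under a hypothetical "type" key.
    """
    handler = HANDLERS.get(event.get("type"))
    return handler(event) if handler else "ignored"


if __name__ == "__main__":
    print(dispatch({"type": "session.ended", "session_id": "abc123"}))
```

Ignoring unknown event types keeps the receiver tolerant of events added later.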
Post-Session Data Availability
After a session ends, post-session data is available for 90 days by default. This includes:
- CDR (full participant list, durations, quality metrics).
- Recording (MP4/WebM file).
- Chat transcript.
- Audio transcript (if transcription enabled).
- Session metadata (participants, duration, mode, settings).
Retrieve this data via REST API or via webhook event. Plan your storage and archival strategy accordingly.
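To plan archival, the default 90-day window can be turned into a concrete deadline. The sketch below assumes the window is counted from the session end time, as the text above suggests; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION = timedelta(days=90)  # default retention stated above


def archive_deadline(session_ended_at: datetime,
                     retention: timedelta = DEFAULT_RETENTION) -> datetime:
    """Last moment at which post-session data is expected to be available."""
    return session_ended_at + retention


def days_left(session_ended_at: datetime, now: datetime) -> int:
    """Whole days remaining before the deadline, never negative."""
    remaining = archive_deadline(session_ended_at) - now
    return max(remaining.days, 0)


if __name__ == "__main__":
    ended = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(archive_deadline(ended).date())
    print(days_left(ended, datetime(2024, 1, 31, tzinfo=timezone.utc)))
```

A scheduled job could use `days_left` to flag sessions whose data has not yet been copied to long-term storage.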
Encryption and Protocols
Protocols
- WebRTC (RTP over UDP): Bi-directional media. Adaptive bitrate, FEC (forward error correction), jitter buffers.
- WSS (WebSocket Secure): Signalling (SDP, ICE candidates). TLS-encrypted, runs over TCP, traverses most firewalls.
- HTTPS: All Video API calls. TLS 1.2 or higher.
Encryption and Security
- Media encryption: DTLS-SRTP. All RTP media is encrypted using keys negotiated via DTLS, and packet integrity is verified via SRTP authentication tags.
- Signalling encryption: WSS (WebSocket over TLS). All SDP and ICE candidate exchange is encrypted.
- API encryption: HTTPS. All REST API calls use TLS 1.2 or higher.
- Certificate pinning: Optional. If your app requires pinning, contact EnableX for our certificate chain.
- End-to-End Encryption (E2EE): Available upon request for HIPAA/regulated use cases. Ensures media is encrypted before leaving the client; EnableX never has plaintext access.
Media quality adapts automatically to available bandwidth. See the Quality Adaptation section above for details.
SDK and Platform Support
EnableX provides SDKs for Web (JavaScript), iOS, Android, React Native, Flutter, and Cordova. Native SDKs use the platform's WebRTC implementation directly, providing better performance and battery efficiency. See the Video SDK overview and the Browser Compatibility guide for full support matrices.
Scalability and Performance
Horizontal Scaling
EnableX media servers are deployed globally and scale horizontally. You do not provision or manage capacity:
- Automatic scaling: As you add rooms and sessions, our infrastructure scales transparently. No configuration, no quota requests.
- Geographic distribution: Media servers are deployed in 50+ countries. Clients automatically route to the nearest server. Latency is minimized without manual intervention.
- No region selection: You do not specify regions when creating rooms. The system handles routing.
- Multi-region redundancy: If a region fails, users are automatically rerouted to a neighboring region.
Performance Characteristics
| Metric | Typical Value | Note |
|---|---|---|
| Connection setup time | 2-5 seconds | From SDK join() call to media flowing. Varies by network and distance to media server. |
| One-way latency | 50-150 ms | P2P: 20-100 ms. SFU: 100-300 ms. MCU: 200-500 ms. Depends on geography and network path. |
| Session start time | <100 ms | From first participant join to session.started webhook. |
| Participant add/remove time | <200 ms | Media streams added/removed. Renegotiation for other participants. |
| Recording initialization | 1-3 seconds | Auto-record starts when session starts. First frames may be brief. |
| Post-session data availability | ~5 minutes | Recording, transcripts, CDR available via API ~5 min after session ends. |
Capacity Limits
- Rooms per app: Unlimited. You can create as many rooms as needed.
- Sessions per room: Unlimited. A room can host multiple sequential sessions over time.
- Participants per session: 100 (group), 2000 (lecture), 2 (p2p). Hard limits enforced by the platform.
- Concurrent sessions: Depends on plan. Contact EnableX for limits specific to your pricing tier.
- API rate limits: 100 requests per second per App ID (standard tier). Higher limits available on enterprise plans.
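To stay under the standard-tier API limit, a client can throttle its own requests. The sketch below is a generic token bucket sized to 100 requests per second; it is a client-side technique chosen for illustration and says nothing about how EnableX enforces the limit on the server. Time is passed in explicitly so the behaviour is deterministic; in practice you would pass `time.monotonic()`.

```python
class TokenBucket:
    """Client-side throttle: at most `rate` requests per second on average."""

    def __init__(self, rate: float = 100.0, capacity: float = 100.0):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.updated = 0.0        # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Return True and consume a token if a request may be sent at `now`."""
        elapsed = max(now - self.updated, 0.0)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket()
    sent = sum(bucket.allow(0.0) for _ in range(150))
    print("requests allowed in the first instant:", sent)
```

When `allow` returns False, the caller can wait briefly and retry rather than sending a request that would exceed the limit.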
Recording Specifications
Formats and Output
| Format | Container | Video Codec | Audio Codec | Use Case |
|---|---|---|---|---|
| MP4 | ISO Base Media | H.264 | AAC | Web playback, email, mobile sharing. Widely compatible. |
| WebM | Matroska | VP8 or VP9 | Opus | Web-native, lower file size, good for streaming. |
Recording Layouts
- Grid: All active participants in a grid. Scales automatically as participants join/leave.
- Spotlight: One participant full-screen, others in a strip below (lecture mode). Moderator can switch spotlight.
- Sidebar: Main speaker full-screen on left, others in a sidebar on right.
- Custom: Custom layouts available for enterprise customers. Contact EnableX for details.
Recording Resolutions
- 480p: 854x480, 30 fps. Suitable for mobile playback, lower bandwidth, smaller files.
- 720p: 1280x720, 30 fps. Balanced quality and file size. Recommended for most use cases.
- 1080p: 1920x1080, 30 fps. High quality. Larger file size. For screen sharing and detailed content.
You specify resolution per room during creation or update.
Storage and Retrieval
- Hosted storage: Recordings are stored on EnableX servers by default. Retention: 90 days. Downloadable via REST API.
- Download: Recordings are available as direct HTTP downloads (signed URLs, expire in 24 hours). Stream or save to your storage.
- Webhook notification: When recording is ready, a recording.ready webhook is sent with download URL and metadata.
- Custom storage: Enterprise customers can configure S3 bucket sync. Recording files are automatically uploaded to your S3 bucket.
- Lifecycle policies: Set up automatic deletion or archival based on retention policies.
Recording Triggers
- Auto-record: Set auto_record: true during room creation. Recording starts automatically when session starts, ends when session ends.
- On-demand recording: Start/stop recording via REST API or SDK during an active session. Useful for selective recording or recording only certain segments.
Limits and Quotas Summary
| Resource | Limit | How to Request Increase |
|---|---|---|
| Rooms per app | Unlimited | N/A |
| Sessions per room | Unlimited | N/A |
| Participants per group session | 100 | Not extendable (use lecture mode instead) |
| Participants per lecture session | 2000 | Not extendable (architecture limit) |
| Session duration | 24 hours (configurable) | Contact support for longer sessions |
| Concurrent active sessions | Plan-dependent | Upgrade plan or contact sales |
| API requests per second | 100 (standard tier) | Upgrade plan or contact sales |
| Post-session data retention | 90 days | Contact sales for extended retention |
| Recording storage | Plan-dependent | Upgrade plan or configure external S3 |