Guide — Building a Video Application

A complete EnableX video application is made up of two parts: an App Server and a Client Application. Neither works in isolation: the App Server provisions the session, and the Client Application is what users actually interact with. Both must be in place before a user can join a live video session.

This guide walks through how the two parts fit together, how a user gets into a session, and the essential steps your client application performs from device access to disconnect. Code examples use the Web SDK. The same logical steps apply to all other SDK variants — refer to the relevant SDK reference for the exact method names.

Part 1 — The App Server

The App Server is a backend web service you build and host. It holds your EnableX API credentials (App ID and App Key) and is the only component that talks directly to the EnableX Video API. Your client applications never call the Video API directly.

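The App Server is responsible for provisioning sessions through the Video API and for obtaining a token for each user who needs to join.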

Credentials never reach the client

Your App ID and App Key must only ever be used on your App Server — never embedded in a client application, a browser page, or a mobile app. The client receives a token, not credentials.
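As a rough illustration of keeping credentials on the server, the sketch below builds the request an App Server might send to the Video API token route. The helper name `buildTokenRequest`, the placeholder base URL, the `/rooms/{roomId}/tokens` path, the use of HTTP Basic authentication, and the body field names are all assumptions made for illustration and are not taken from this guide. Check the Video API reference for the real route and payload.

```javascript
// Hypothetical helper: runs only on the App Server, never in client code.
// The URL path, the Basic auth scheme, and the body fields are assumptions.
function buildTokenRequest(config, user) {
  // App ID and App Key are combined into an Authorization header server-side.
  const basic = Buffer.from(config.appId + ":" + config.appKey).toString("base64");

  return {
    method: "POST",
    url: config.videoApiBase + "/rooms/" + encodeURIComponent(user.roomId) + "/tokens",
    headers: {
      "Authorization": "Basic " + basic,
      "Content-Type": "application/json"
    },
    // Only the user's name, role, and reference go in the body; no credentials.
    body: JSON.stringify({
      name: user.name,
      role: user.role,
      user_ref: user.userRef
    })
  };
}

// Example values (placeholders, not real credentials or a real host)
const tokenRequest = buildTokenRequest(
  { appId: "my-app-id", appKey: "my-app-key", videoApiBase: "https://video-api.example.com" },
  { roomId: "room-123", name: "Ada", role: "participant", userRef: "user-42" }
);
console.log(tokenRequest.method, tokenRequest.url);
```

The App Server would send this request with its HTTP client of choice and forward only the returned token to the client, so the Authorization header never leaves the backend.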

Part 2 — The Client Application

The client application is what your end users run — a browser tab, an Android app, an iOS app, or a hybrid mobile app. It is built using one of the EnableX Video SDKs and handles everything the user sees and interacts with during a session.

Choosing an SDK

Select the SDK that matches the platform your application targets (browser, Android, iOS, or hybrid mobile).

All SDK variants connect to the same EnableX infrastructure. A user on a browser, an Android app, and an iOS app can all join the same video room and communicate with each other in real time — the SDK handles platform differences transparently.

About the examples in this guide

All code examples below use the Web SDK. The same flow — request a token, connect, publish, subscribe, handle active talkers, disconnect — applies to every SDK. Refer to your platform's SDK reference for exact method signatures and parameters.

The Token Flow

Before a user can join a session, they need a token. The token flow is always the same regardless of which SDK or platform you are using:

  1. User opens your application and requests to join a session (e.g., by clicking a meeting link or entering a room code).
  2. Your client application calls your App Server — an authenticated request to your own backend endpoint (e.g., GET /api/token?roomId=…).
  3. Your App Server calls the EnableX Video API — specifically the Token route — passing the room_id, the user's role, and any user metadata. The Video API returns a signed token.
  4. Your App Server returns the token to the client — in the HTTP response body or via a push mechanism.
  5. The client application uses the token to connect — passed directly to the SDK's join/connect method. EnableX validates the token before granting access to the room.

Tokens are short-lived

Tokens carry an expiry. Request a fresh token shortly before the user connects, and do not cache tokens across sessions. If a token expires before the user joins, the connection is rejected.
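On the client side, the token request can be kept to a small helper that is called immediately before joining. The following is a minimal sketch, not a definitive implementation: `/api/token` is the example endpoint from the token flow, while the helper names, the injectable `fetchImpl` parameter, and the `{ token }` response shape are assumptions about your own App Server.

```javascript
// Hypothetical client-side helpers. "/api/token" is the example endpoint
// from the token flow; the { token } response shape is an assumption.
function tokenRequestUrl(roomId, basePath) {
  return (basePath || "/api/token") + "?roomId=" + encodeURIComponent(roomId);
}

// fetchImpl is injectable so the helper can be exercised without a network.
async function requestFreshToken(roomId, fetchImpl) {
  const doFetch = fetchImpl || fetch;
  // "include" sends the user's session cookie to your own backend.
  const response = await doFetch(tokenRequestUrl(roomId), { credentials: "include" });
  if (!response.ok) {
    throw new Error("Token request failed with status " + response.status);
  }
  const payload = await response.json();
  return payload.token; // pass straight to the SDK; do not store it
}
```

Calling `requestFreshToken()` right before the join call, rather than at page load, keeps the gap between issuing and using the token short.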

Session Steps

Once the client application has a token, the session follows a predictable sequence. Each step below shows the Web SDK call and links to the SDK reference for full detail.

Step 1 — Access Media Devices

Before connecting to the room, your application should confirm that the user's camera and microphone are accessible. The Web SDK provides a method to enumerate available media devices and check permissions. This allows you to surface the right UI — for example, letting the user pick a camera before joining, or gracefully handling the case where permissions are denied.

// Enumerate available media devices
EnxRtc.getDevices(function(devices) {
    // devices: { audioinput: [...], videoinput: [...], audiooutput: [...] }
    // Let the user choose or apply defaults before connecting
});

See Web SDK — Device Access for full enumeration options and permission handling.
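One way to act on the enumeration result is to derive what can be published from the devices that were found. The sketch below assumes the `{ audioinput, videoinput, audiooutput }` shape shown in the comment above. The helper name `publishOptionsFor` and the `canPublish` flag are hypothetical and not part of the SDK.

```javascript
// Hypothetical helper: derive publish options from the enumerated devices.
// Assumes the { audioinput, videoinput, audiooutput } shape shown above.
function publishOptionsFor(devices) {
  const count = function (kind) {
    const list = devices && devices[kind];
    return Array.isArray(list) ? list.length : 0;
  };

  const publishOptions = {
    audio: count("audioinput") > 0, // at least one microphone
    video: count("videoinput") > 0  // at least one camera
  };

  return {
    publishOptions: publishOptions,
    // With neither a microphone nor a camera there is nothing to publish.
    canPublish: publishOptions.audio || publishOptions.video
  };
}

// Example: a machine with a microphone but no camera
const result = publishOptionsFor({ audioinput: [{}], videoinput: [], audiooutput: [{}] });
console.log(result.publishOptions, result.canPublish);
```

When `canPublish` is false, the application could show a permissions or hardware hint instead of attempting to join with media.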

Step 2 — Connect, Publish, and Subscribe

Joining a room, publishing your local media stream, and receiving the list of remote streams all happen through a single method call: EnxRtc.joinRoom(). The SDK connects the signalling socket, negotiates media, and publishes your stream. The method returns your local stream, and its callback receives the room object along with any remote streams already in the session, which you then subscribe to.

// Define what you want to publish (camera + mic)
var publishOptions = {
    audio: true,
    video: true
};

// Holds the connected room object once joinRoom succeeds
var room;

// Join the room — connect, publish, and get remote streams in one call
var localStream = EnxRtc.joinRoom(TOKEN, publishOptions, function(success, error) {
    if (error) {
        // Handle connection failure — check error.code for the reason
        return;
    }

    // 'success.room' is the connected room object — save it for later method calls
    room = success.room;

    // 'success.streams' is the list of remote streams already in the room
    // Subscribe to each one so you can receive their audio and video
    success.streams.forEach(function(stream) {
        room.subscribe(stream);
    });
});

New remote streams published after you connect are delivered via the stream-added event — subscribe to those too:

room.addEventListener("stream-added", function(event) {
    room.subscribe(event.stream);
});

See Web SDK — Connecting to a Session and Web SDK — Stream Management for the full set of publish options, stream events, and subscription handling.
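Because streams reach the application through two paths (the initial list and the stream-added event), it can help to route both through one function that subscribes to each stream only once. This is a sketch under assumptions: `createSubscriber` is a hypothetical helper, and it assumes streams expose a `streamId` property as in this guide's examples. Whether the SDK can ever report the same stream through both paths is not stated here, so treat the guard as defensive.

```javascript
// Hypothetical helper: subscribe to each remote stream at most once.
// Assumes streams expose a streamId property, as in this guide's examples.
function createSubscriber(room) {
  const seen = new Set();

  return function subscribeOnce(stream) {
    if (seen.has(stream.streamId)) {
      return false; // already subscribed, skip
    }
    seen.add(stream.streamId);
    room.subscribe(stream);
    return true;
  };
}

// Intended use inside the joinRoom callback (not executed here):
//   var subscribeOnce = createSubscriber(room);
//   success.streams.forEach(function (stream) { subscribeOnce(stream); });
//   room.addEventListener("stream-added", function (event) {
//       subscribeOnce(event.stream);
//   });
```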

Step 3 — Play Streams

Once subscribed, call stream.play() to render a stream into a DOM element. This works for your local stream as well as any subscribed remote stream, screen share stream, or canvas stream.

// Play your local stream in an element with id "local-player"
localStream.play("local-player");

// Play a remote stream — call this inside stream-subscribed or active-talkers-updated
room.addEventListener("stream-subscribed", function(event) {
    event.stream.play("remote-player-" + event.stream.streamId);
});
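The remote example above presumes that an element with the matching id already exists. A minimal sketch of creating that element on demand follows. `playRemoteStream` is a hypothetical helper, the `document` object and container element are passed in as parameters, and the `streamId` property is assumed from the examples in this guide.

```javascript
// Hypothetical helper: make sure a player element exists, then play into it.
// doc is the page's document; container is the element that holds the tiles.
function playRemoteStream(doc, container, stream) {
  const playerId = "remote-player-" + stream.streamId;

  // Create the player element only the first time this stream is seen.
  let element = doc.getElementById(playerId);
  if (!element) {
    element = doc.createElement("div");
    element.id = playerId;
    container.appendChild(element);
  }

  stream.play(playerId);
  return playerId;
}

// Intended use in the browser (not executed here):
//   room.addEventListener("stream-subscribed", function (event) {
//       playRemoteStream(document, document.getElementById("remote-grid"), event.stream);
//   });
```

The "remote-grid" container id in the usage comment is a placeholder for whatever element holds your video tiles.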

Step 4 — Handle Active Talkers

In a multi-party session, EnableX continuously tracks who is speaking and sends the client an updated Active Talker list — up to 12 of the most actively speaking participants, ordered with the latest talker first. This is the primary signal your application uses to decide which video tiles to show on screen.

Listen to the active-talkers-updated event and re-render your video grid whenever the list changes:

room.addEventListener("active-talkers-updated", function(event) {
    var talkers = event.message.activeList;

    talkers.forEach(function(talker) {
        var stream = room.remoteStreams.get(talker.streamId);
        if (stream) {
            // Render this stream in the grid slot for this talker
            stream.play("grid-slot-" + talker.streamId);
        }
    });
});

The active talker list also includes screen share streams (streamId: 101) and canvas streams (streamId: 102) when those features are active.
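Re-rendering the whole grid on every update works, but an application may prefer to touch only the tiles that changed. The sketch below compares the previous list of stream IDs with the new active talker list. `diffActiveTalkers` is a hypothetical helper, and it assumes each entry carries a `streamId`, as in the handler above.

```javascript
// Hypothetical helper: compare the previous and current active talker lists.
// Assumes each talker entry has a streamId, as in the handler above.
function diffActiveTalkers(previousIds, activeList) {
  const currentIds = activeList.map(function (talker) { return talker.streamId; });
  const previous = new Set(previousIds);
  const current = new Set(currentIds);

  return {
    order: currentIds, // same order as delivered (latest talker first)
    added: currentIds.filter(function (id) { return !previous.has(id); }),
    removed: previousIds.filter(function (id) { return !current.has(id); })
  };
}

// Streams 1, 2, 3 were on screen; the new list contains 3, 4, 1.
const change = diffActiveTalkers([1, 2, 3], [{ streamId: 3 }, { streamId: 4 }, { streamId: 1 }]);
console.log(change.added, change.removed); // added is [4], removed is [2]
```

Tiles for `added` IDs would be created and played, tiles for `removed` IDs torn down, and `order` used to arrange the grid.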

Step 5 — Disconnect

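When the user leaves the session, call room.disconnect(). The SDK releases media tracks and closes the signalling socket. Listen for the room-disconnected event to clean up your UI, and register that listener before you call room.disconnect() so the event is not missed.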

// Register the listener first so the event is not missed
room.addEventListener("room-disconnected", function(event) {
    // Session is over — clean up the UI, navigate away, etc.
});

// Disconnect from the room
room.disconnect();

// Also handle remote users leaving
room.addEventListener("user-disconnected", function(event) {
    // event.message contains the disconnected user's info
    // Remove their video tile from the grid
});

See Web SDK — Session Management for moderator controls, room mode switching, and forced disconnection of participants.
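The leave sequence can be wrapped in one function so that the listener is always attached before the disconnect call and the cleanup runs only once. This is a minimal sketch: `leaveRoom` is a hypothetical helper, and it relies only on the `addEventListener` and `disconnect` calls used in this guide.

```javascript
// Hypothetical helper: attach the listener, then disconnect, and make sure
// the cleanup callback runs at most once.
function leaveRoom(room, onLeft) {
  let finished = false;

  room.addEventListener("room-disconnected", function (event) {
    if (finished) {
      return; // ignore repeated notifications
    }
    finished = true;
    onLeft(event);
  });

  room.disconnect();
}

// Intended use (not executed here):
//   leaveRoom(room, function () {
//       // remove video tiles, stop timers, navigate away
//   });
```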

What Comes Next

The steps above cover the essential session lifecycle, enough to have users joining rooms, seeing each other, and leaving cleanly. Everything beyond this is feature development driven by your use case, and it is covered in the SDK reference pages for your chosen platform.