OpenTok API Architecture

Video conferencing has traditionally followed a single standard design: the application creates a video chat room and embeds one visual representation of that room in a web page.

This one-size-fits-all approach limits both the user experience and the capabilities available to developers, preventing them from delivering the video chat experience that makes the most sense for their sites.

The OpenTok API places control directly in the hands of the developer, allowing developers to make video chat an integral part of their content, in terms of both layout and interaction.

The basic structure and underlying workflows of the OpenTok API are shown below.

Figure: The OpenTok API architecture

OpenTok functionality is exposed to the developer through the OpenTok client-side libraries (for JavaScript, ActionScript, and iOS) and the OpenTok server-side libraries.

For an overview of the OpenTok API developer workflows, see OpenTok API Workflows.

Streams, Connections, and Sessions

The OpenTok API platform is built on three key concepts: streams, connections, and sessions.

  • A stream is a single audio-video signal, such as a user's published webcam and microphone feed.
  • A connection is a logical abstraction of a single browser's interaction with a session. The connection is the mechanism through which a browser publishes streams to a session, and through which a browser subscribes to streams published by others to that session.
  • A session represents an entire video chat environment. It is a collection of connections publishing and subscribing to streams. A session also dispatches events representing changes in the session. The session exists as a logical entity in the OpenTok web service. When you connect to the session from the browser, a simplified version of the session is maintained locally.
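To make the relationships among these three concepts concrete, here is a minimal sketch that models a session as a collection of connections, each of which can publish and subscribe to streams, with the session dispatching an event whenever something changes. It is an illustrative model only: `ChatSession`, `ChatConnection`, `ChatStream`, and the event names are hypothetical and are not the OpenTok client library API.

```typescript
// Hypothetical, simplified model of streams, connections, and sessions.
// None of these names are taken from the OpenTok libraries.

interface ChatStream {
  streamId: string;
  connectionId: string; // the connection that published this stream
}

interface ChatConnection {
  connectionId: string;
  published: Set<string>; // IDs of streams this connection publishes
  subscribed: Set<string>; // IDs of streams this connection subscribes to
}

type ChatSessionEvent =
  | { type: "connectionCreated"; connectionId: string }
  | { type: "streamCreated"; stream: ChatStream };

class ChatSession {
  readonly sessionId: string;
  readonly connections = new Map<string, ChatConnection>();
  readonly streams = new Map<string, ChatStream>();
  private listeners: Array<(e: ChatSessionEvent) => void> = [];

  constructor(sessionId: string) {
    this.sessionId = sessionId;
  }

  // Register a listener for changes in the session.
  on(listener: (e: ChatSessionEvent) => void): void {
    this.listeners.push(listener);
  }

  private dispatch(e: ChatSessionEvent): void {
    this.listeners.forEach((l) => l(e));
  }

  connect(connectionId: string): ChatConnection {
    const c: ChatConnection = {
      connectionId,
      published: new Set<string>(),
      subscribed: new Set<string>(),
    };
    this.connections.set(connectionId, c);
    this.dispatch({ type: "connectionCreated", connectionId });
    return c;
  }

  // A connection publishes a stream into the session.
  publish(connectionId: string, streamId: string): ChatStream {
    const c = this.connections.get(connectionId);
    if (!c) throw new Error(`unknown connection ${connectionId}`);
    const s: ChatStream = { streamId, connectionId };
    this.streams.set(streamId, s);
    c.published.add(streamId);
    this.dispatch({ type: "streamCreated", stream: s });
    return s;
  }

  // A connection subscribes to a stream that someone published in this session.
  subscribe(connectionId: string, streamId: string): void {
    const c = this.connections.get(connectionId);
    if (!c || !this.streams.has(streamId)) throw new Error("unknown connection or stream");
    c.subscribed.add(streamId);
  }
}
```

In this model the session object is the single place where changes happen, which is why it is also the natural place to dispatch events.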

Each user's browser usually opens a single connection to a given session. Through that connection, the browser may subscribe to one, some, or all of the audio-video streams available in the session. Using the same connection, the browser may also publish an audio-video stream to the session.

Note that while this usage scenario is the norm, these are not architectural constraints. A page may have more than one connection open at a time, to the same session or to other sessions. A browser may also publish multiple streams concurrently, to one session or to several.

The session acts as a video conferencing crossbar switch in the cloud. The developer's code on each web page decides which sessions to connect to, which streams within those sessions to display, and whether to publish to each session in return. This gives developers complete control over who sees which streams, on which pages, and even on which web sites.
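The crossbar idea can be sketched as a per-page policy: each page lists the sessions it joins, a filter for which streams to display, and the sessions it publishes into. The `PagePolicy` type and `planPage` function are hypothetical, written only to illustrate the routing decision; they do not correspond to OpenTok calls.

```typescript
// Hypothetical per-page policy; none of these names are OpenTok API calls.
interface StreamInfo {
  sessionId: string;
  streamId: string;
  publisherName: string;
}

interface PagePolicy {
  sessionIds: string[]; // sessions this page connects to
  shouldDisplay: (s: StreamInfo) => boolean; // which streams to subscribe to
  publishTo: string[]; // sessions this page publishes into
}

interface PagePlan {
  connect: string[];
  subscribe: string[];
  publish: string[];
}

function planPage(policy: PagePolicy, available: StreamInfo[]): PagePlan {
  const joined = policy.sessionIds;
  const subscribe = available
    .filter((s) => joined.indexOf(s.sessionId) !== -1) // only streams in joined sessions
    .filter(policy.shouldDisplay)
    .map((s) => s.streamId);
  // A page can only publish into sessions it has connected to.
  const publish = policy.publishTo.filter((id) => joined.indexOf(id) !== -1);
  return { connect: joined.slice(), subscribe, publish };
}
```

With this shape, two pages can see different subsets of the same session: for example, a viewer page that displays only the host's stream and never publishes, next to a host page that connects to two sessions and publishes into one of them.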

Authentication Tokens

When a browser connects to a session, it must authenticate itself by providing a server-generated token. Whereas session IDs are used to identify which conference to connect to, tokens are used for security purposes. Tokens have expiration dates and they can be assigned roles (such as publisher, subscriber, and moderator).

Every time a browser interacts with the OpenTok web service, the service checks that the authentication token is still valid and that the requested action is permitted. If the token has expired (or if it has been revoked), the action is not carried out.
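The per-action check described above might look like the following sketch, assuming a token carries an expiration time and that the service keeps a set of revoked token IDs. The `AuthToken` shape, `TokenChecker`, and `performIfAuthorized` are hypothetical; the real token format and validation live inside the OpenTok web service.

```typescript
// Hypothetical token shape; the real OpenTok token format is different.
interface AuthToken {
  tokenId: string;
  expiresAt: number; // Unix time in seconds
}

class TokenChecker {
  private revoked = new Set<string>();

  revoke(tokenId: string): void {
    this.revoked.add(tokenId);
  }

  // A token is valid only if it has not expired and has not been revoked.
  isValid(token: AuthToken, nowSeconds: number): boolean {
    if (nowSeconds >= token.expiresAt) return false;
    return !this.revoked.has(token.tokenId);
  }
}

// The check runs on every interaction, not just when the browser connects.
function performIfAuthorized(
  checker: TokenChecker,
  token: AuthToken,
  nowSeconds: number,
  action: () => void
): boolean {
  if (!checker.isValid(token, nowSeconds)) return false; // action is not carried out
  action();
  return true;
}
```

Because validity is re-checked per action, a token that expires or is revoked mid-session stops working immediately rather than at the next connection attempt.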

When generating a token, you can associate it with a role (such as publisher or moderator). Each role corresponds to a set of permitted actions, which gives the developer control over what a specific connection is allowed to do within a session.
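A role-to-permission table is one way to picture how a role limits what a connection may do. The role names come from this section; the action names, the exact permission sets, and the `generateToken` and `isPermitted` helpers are assumptions for illustration, not the OpenTok server-side library's API.

```typescript
// Hypothetical role/permission table. Role names follow the text;
// the action names and the exact permission sets are assumptions.
type Role = "subscriber" | "publisher" | "moderator";
type Action = "subscribe" | "publish" | "forceDisconnect";

const PERMISSIONS: Record<Role, Action[]> = {
  subscriber: ["subscribe"],
  publisher: ["subscribe", "publish"],
  moderator: ["subscribe", "publish", "forceDisconnect"],
};

interface RoleToken {
  sessionId: string;
  role: Role;
  expiresAt: number; // Unix time in seconds
}

// Server side: the role is fixed when the token is generated.
function generateToken(sessionId: string, role: Role, ttlSeconds: number, nowSeconds: number): RoleToken {
  return { sessionId, role, expiresAt: nowSeconds + ttlSeconds };
}

// Service side: each requested action is checked against the token's role.
function isPermitted(token: RoleToken, action: Action): boolean {
  return PERMISSIONS[token.role].indexOf(action) !== -1;
}
```

Keeping the permissions in one table means granting a new capability to a role is a data change rather than a code change scattered across checks.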

Publishers and Subscribers

Within a browser's connection to a session, the most important objects are publishers and subscribers, because most of the end user's experience results from creating and destroying them over the life of the session.

Publishers

A publisher publishes an audio-video stream to the session. When you instantiate a publisher, your browser notifies the session that it is now streaming a new audio-video stream. When you destroy the publisher, its stream is terminated and the session is notified.

When a stream is published, the session broadcasts an event announcing it to all connections in the session. In response to this event (if the page is listening for it), the developer can choose to subscribe to the newly published stream using a subscriber (described in the next section).
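The publisher lifecycle and the resulting broadcast can be sketched as follows. This is a hypothetical in-memory model: `BroadcastSession`, `LocalPublisher`, and the `streamCreated`/`streamDestroyed` event names are chosen for illustration and are not the OpenTok client library's classes or events.

```typescript
// Hypothetical names; this models the behavior described, not the real API.
type StreamEventType = "streamCreated" | "streamDestroyed";

interface StreamEvent {
  type: StreamEventType;
  streamId: string;
  fromConnectionId: string; // connection that owns the publisher
}

class BroadcastSession {
  // One event handler per connection in the session.
  private handlers = new Map<string, (e: StreamEvent) => void>();

  join(connectionId: string, handler: (e: StreamEvent) => void): void {
    this.handlers.set(connectionId, handler);
  }

  // Deliver the event to every connection, including the publisher's own.
  broadcast(e: StreamEvent): void {
    this.handlers.forEach((handler) => handler(e));
  }
}

class LocalPublisher {
  private session: BroadcastSession;
  private connectionId: string;
  readonly streamId: string;

  // Instantiating a publisher announces a new stream to the session.
  constructor(session: BroadcastSession, connectionId: string, streamId: string) {
    this.session = session;
    this.connectionId = connectionId;
    this.streamId = streamId;
    session.broadcast({ type: "streamCreated", streamId, fromConnectionId: connectionId });
  }

  // Destroying the publisher terminates its stream and notifies the session.
  destroy(): void {
    this.session.broadcast({ type: "streamDestroyed", streamId: this.streamId, fromConnectionId: this.connectionId });
  }
}

// Usage: Bob's page subscribes to streams that other connections publish.
const session = new BroadcastSession();
const bobSubscriptions = new Set<string>();
session.join("alice", () => {});
session.join("bob", (e) => {
  if (e.fromConnectionId === "bob") return; // ignore our own stream
  if (e.type === "streamCreated") bobSubscriptions.add(e.streamId);
  else bobSubscriptions.delete(e.streamId);
});
const alicePublisher = new LocalPublisher(session, "alice", "alice-cam");
```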

Note that the publisher also has a local visual manifestation (the video signal you are streaming out), which is displayed in the client's local app according to the properties passed to the OpenTok client-side library. An app usually uses a single publisher at a time, because there is generally only one camera available to stream.

Subscribers

A subscriber consumes an audio-video stream in a session, displaying it on the web page according to calls to the OpenTok client-side library.

When a web page is notified that a new stream is being published to the session (assuming it is listening for these events), it must decide whether or not to subscribe to that stream. If the application logic deems that the stream should be subscribed to, it uses the OpenTok client-side library to instantiate a subscriber for that stream.
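The subscribe-or-ignore decision usually lives in the handler for the new-stream notification. In the sketch below, the application logic skips the page's own stream and caps the number of simultaneous subscribers; both rules, and the names `SubscriptionManager`, `StreamMeta`, and `SubscriberHandle`, are hypothetical examples of business logic rather than OpenTok requirements.

```typescript
// Hypothetical types; not the OpenTok client library.
interface StreamMeta {
  streamId: string;
  connectionId: string; // connection that published the stream
}

interface SubscriberHandle {
  streamId: string;
  containerId: string; // hypothetical page element that shows the video
}

class SubscriptionManager {
  private subscribers = new Map<string, SubscriberHandle>();
  private ownConnectionId: string;
  private maxSubscribers: number;

  constructor(ownConnectionId: string, maxSubscribers: number) {
    this.ownConnectionId = ownConnectionId;
    this.maxSubscribers = maxSubscribers;
  }

  // Called when the page is notified that a stream was published.
  onStreamCreated(stream: StreamMeta): SubscriberHandle | null {
    if (stream.connectionId === this.ownConnectionId) return null; // our own stream
    if (this.subscribers.size >= this.maxSubscribers) return null; // app-specific cap
    const handle: SubscriberHandle = {
      streamId: stream.streamId,
      containerId: `video-${stream.streamId}`,
    };
    this.subscribers.set(stream.streamId, handle);
    return handle;
  }

  // Called when a stream ends; frees the slot for another subscriber.
  onStreamDestroyed(streamId: string): void {
    this.subscribers.delete(streamId);
  }

  count(): number {
    return this.subscribers.size;
  }
}
```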

The subscriber has a local audio and visual manifestation (the audio and video signal being streamed). The video is displayed in the local web page according to calls to the OpenTok client-side library.

Note that there is no requirement to subscribe to every published stream; it is up to the developer to decide what business logic makes sense for the application. In a classic multi-party video conference, the developer could subscribe to all published streams; in a speed dating application, the developer might subscribe to only one stream at a time.
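The two scenarios in this paragraph differ only in the subscription policy. The sketch expresses each policy as a function from the current subscriptions and a newly published stream to the next set of subscriptions. The speed dating rule used here (the newest stream replaces the previous one) is an assumed example; a real app might instead pick the partner chosen by its own matching logic.

```typescript
// A policy maps (current subscriptions, newly published stream) to new subscriptions.
type SubscriptionPolicy = (current: string[], newStreamId: string) => string[];

// Classic multi-party conference: subscribe to every published stream.
const subscribeToAll: SubscriptionPolicy = (current, id) => current.concat([id]);

// Speed dating (assumed rule): keep only one stream; the newest replaces the previous one.
const oneAtATime: SubscriptionPolicy = (_current, id) => [id];

// Replay a sequence of newly published streams through a policy.
function applyPolicy(policy: SubscriptionPolicy, arrivals: string[]): string[] {
  let subscribed: string[] = [];
  for (const id of arrivals) {
    subscribed = policy(subscribed, id);
  }
  return subscribed;
}

console.log(applyPolicy(subscribeToAll, ["a", "b", "c"]).join(", ")); // a, b, c
console.log(applyPolicy(oneAtATime, ["a", "b", "c"]).join(", ")); // c
```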

The way in which publishers and subscribers are instantiated has a huge impact on the end user's mental model of the session. In some cases, the session acts like a small meeting room, in which everyone can see and hear everyone else. In other cases, the session acts like a huge ballroom, in which many small conversations can be conducted independently.
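One way to get the ballroom behavior is to give each participant a group label and subscribe only to streams from the same group; when everyone shares one label, the same code yields the small meeting room. The `table` field and the `peersToSubscribe` function are hypothetical, and how a page learns another participant's group is left to the application.

```typescript
// Hypothetical participant record; `table` is an app-defined group label.
interface Participant {
  connectionId: string;
  table: string;
}

// Connections whose streams `me` should subscribe to: everyone else at the same table.
function peersToSubscribe(me: Participant, everyone: Participant[]): string[] {
  return everyone
    .filter((p) => p.connectionId !== me.connectionId && p.table === me.table)
    .map((p) => p.connectionId);
}

// Ballroom: two independent conversations in one session.
const ballroom: Participant[] = [
  { connectionId: "ann", table: "A" },
  { connectionId: "ben", table: "A" },
  { connectionId: "cat", table: "B" },
  { connectionId: "dan", table: "B" },
];
console.log(peersToSubscribe(ballroom[0], ballroom).join(", ")); // ben
```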

Client / Server Separation of Duties

Most functionality is delivered through the OpenTok client-side library, which runs in the client browser (or in an iOS app). However, for security reasons, the developer's web server must be involved whenever new sessions or new tokens are created.

Put simply, if a web page is connecting to a pre-existing session using a pre-existing security token, the server only needs to pass the relevant session ID and token to the served web page. JavaScript on the page can then use that session ID and token to connect to an OpenTok session.

However, if a new token or a new video chat environment (session) is needed, the web server uses the RESTful interface to create it. The browser then connects to the appropriate session using the appropriate token, as it normally would.
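The server's decision can be sketched as a single function that reuses a pre-existing session ID and token when they exist, and otherwise asks the backend to create whatever is missing. `VideoBackend` is a hypothetical stand-in for the OpenTok server-side library or RESTful interface; its method names and signatures are assumptions, not the real API.

```typescript
// Hypothetical stand-in for the OpenTok server-side library / RESTful interface.
interface VideoBackend {
  createSession(): string; // returns a new session ID
  generateToken(sessionId: string, role: string): string; // returns a new token
}

interface PageConfig {
  sessionId: string;
  token: string;
}

interface ExistingCredentials {
  sessionId?: string;
  token?: string;
}

// Runs on the developer's web server; the result is embedded in the served page.
function pageConfigFor(backend: VideoBackend, existing: ExistingCredentials, role: string): PageConfig {
  // Reuse a pre-existing session, or create a new video chat environment.
  const sessionId = existing.sessionId !== undefined ? existing.sessionId : backend.createSession();
  // A pre-existing token is reused only together with its pre-existing session;
  // otherwise a new token is generated for the session chosen above.
  const token =
    existing.sessionId !== undefined && existing.token !== undefined
      ? existing.token
      : backend.generateToken(sessionId, role);
  return { sessionId, token };
}
```

Only the creation paths touch the backend, which matches the split described above: the browser never creates sessions or tokens itself.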

Flash and OpenTok on WebRTC

There are two ways to develop with OpenTok: with the original OpenTok libraries, or with the new OpenTok on WebRTC libraries. The OpenTok on WebRTC libraries do not rely on Flash; they use HTML5 video instead. For more information, see OpenTok on WebRTC.

Behind the scenes, web-based OpenTok applications that do not use WebRTC rely on Flash to access the webcam and to stream and display audio-video signals. (iOS apps built with the OpenTok iOS SDK do not use Flash.)

All Flash SWF objects are created and destroyed behind the scenes, transparently to the developer, who only needs to use JavaScript to access the OpenTok API.

In addition, session data is streamed back and forth to the servers using a third type of Flash SWF object (besides those behind publishers and subscribers), called the controller. One controller is instantiated per session connection. It has no visual representation, so it is hidden from the end user. It carries the data streaming needed for the event mechanism that is exposed to developers through the JavaScript library.

Next, see OpenTok API Developer Workflows.
