Audio Connector lets you send raw audio streams (PCM, 16 kHz, 16-bit) from a live Vonage Video session to external services such as AWS, GCP, and Azure through your own servers for further processing and analysis.
Using Audio Connector, you can send audio streams individually or mixed. To identify the speaker, send each audio stream individually by opening a separate WebSocket connection for each stream.
The further processing of audio streams in real-time and offline enables building capabilities such as captions, transcriptions, translations, search and index, content moderation, media intelligence, Electronic Health Records, sentiment analysis, etc.
Audio Connector is enabled by default for all projects, and it is a usage-based product. Audio Connector usage is charged based on the number of audio streams of participants (or stream IDs) that are sent to the WebSocket server. The Audio Connector feature is only supported in routed sessions (sessions that use the OpenTok Media Router). You can send up to 50 audio streams from a single session at a time.
To start an Audio Connector WebSocket connection, use the OpenTok REST API.
You can also start an Audio Connector WebSocket connection using the OpenTok server SDKs:
OpenTok.connectAudioStream() method
opentok.websocketConnect() method
OpenTok->connectAudio() method
opentok.connect_audio_to_websocket() method
opentok.websocket.connect() method
OpenTok.StartBroadcast() method
Make an HTTPS POST request to the following URL:
https://api.opentok.com/v2/project/:apiKey/connect
Replace apiKey with your OpenTok API key.
Set the Content-Type header to application/json. Set a custom X-OPENTOK-AUTH header to a JSON Web token that is valid for use with the OpenTok REST API calls. See the section on OpenTok REST API call authentication.
Set the body of the request to JSON data of the following format:
{
  "sessionId": "OpenTok session ID",
  "token": "A valid OpenTok token",
  "websocket": {
    "uri": "wss://service.com/ws-endpoint",
    "streams": [
      "streamId-1",
      "streamId-2"
    ],
    "headers": {
      "headerKey": "headerValue"
    },
    "audioRate": 8000
  }
}
The JSON object includes the following properties:
sessionId (required): The OpenTok session ID that includes the OpenTok streams you want to include in the WebSocket stream.
token (required): The OpenTok token to be used for the Audio Connector connection to the OpenTok session. You can add token data to identify that the connection is the Audio Connector endpoint or for other identifying data. (The OpenTok client libraries include properties for inspecting the connection data for a client connected to a session.) See the Token Creation developer guide.
websocket (required): Includes details for the WebSocket:
  uri (required): A publicly reachable WebSocket URI to be used for the destination of the audio stream (such as "wss://service.com/ws-endpoint").
  streams (optional): An array of stream IDs for the OpenTok streams you want to include in the WebSocket stream. If you omit this property, all streams in the session will be included.
  headers (optional): An object of key-value pairs of headers to be sent to your WebSocket server with each message, with a maximum length of 512 bytes.
  audioRate (optional): A number representing the audio sampling rate in Hz. Accepted values are 8000 and 16000 (the default).
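The request described above can be assembled with Python's standard library. The following is a minimal sketch, not an official client: the helper names build_connect_body and build_connect_request are hypothetical, the JSON Web token is assumed to have been created elsewhere, and the request is only constructed, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.opentok.com/v2/project"


def build_connect_body(session_id, token, uri, streams=None, headers=None, audio_rate=None):
    """Build the JSON body for the connect call (hypothetical helper)."""
    if audio_rate is not None and audio_rate not in (8000, 16000):
        raise ValueError("audioRate must be 8000 or 16000")
    websocket = {"uri": uri}
    # Optional properties are left out entirely when not supplied,
    # so the documented defaults apply (all streams, 16000 Hz).
    if streams is not None:
        websocket["streams"] = list(streams)
    if headers is not None:
        websocket["headers"] = dict(headers)
    if audio_rate is not None:
        websocket["audioRate"] = audio_rate
    return {"sessionId": session_id, "token": token, "websocket": websocket}


def build_connect_request(api_key, jwt, body):
    """Return an unsent POST request for the Audio Connector endpoint."""
    return urllib.request.Request(
        url=f"{API_BASE}/{api_key}/connect",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-OPENTOK-AUTH": jwt},
        method="POST",
    )


body = build_connect_body(
    "OpenTok session ID",
    "A valid OpenTok token",
    "wss://service.com/ws-endpoint",
    streams=["streamId-1", "streamId-2"],
    headers={"headerKey": "headerValue"},
    audio_rate=8000,
)
# "API_KEY" and "JWT_PLACEHOLDER" are placeholders, not working credentials.
request = build_connect_request("API_KEY", "JWT_PLACEHOLDER", body)
# Passing the request to urllib.request.urlopen() would send it; omitted here.
```

To identify speakers, one option is to call build_connect_body once per stream ID with streams=[stream_id] and a header that names the stream, which yields one WebSocket connection per participant.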
A successful call results in an HTTP 200 response, with details included in the JSON response data:
{
  "id": "b0a5a8c7-dc38-459f-a48d-a7f2008da853",
  "connectionId": "e9f8c166-6c67-440d-994a-04fb6dfed007"
}
The JSON response data includes the following properties:
id: A unique ID identifying the Audio Connector WebSocket connection.
connectionId: The OpenTok connection ID for the Audio Connector WebSocket connection in the OpenTok session.
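Because the connection ID is needed later to stop the connection, it can help to keep both values together. A small sketch, assuming the response body is available as a JSON string; the AudioConnectorConnection class and parse_connect_response function are hypothetical names, not part of any SDK:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioConnectorConnection:
    id: str             # ID of the Audio Connector WebSocket connection
    connection_id: str  # OpenTok connection ID within the session


def parse_connect_response(raw):
    """Extract the two documented properties from the HTTP 200 response body."""
    data = json.loads(raw)
    return AudioConnectorConnection(id=data["id"], connection_id=data["connectionId"])


example = """
{
  "id": "b0a5a8c7-dc38-459f-a48d-a7f2008da853",
  "connectionId": "e9f8c166-6c67-440d-994a-04fb6dfed007"
}
"""
connection = parse_connect_response(example)
```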
For more details, see the Audio Connector REST API documentation.
The initial message sent on the established WebSocket connection is text-based, containing a JSON payload. The JSON details the audio format in content-type, along with any other metadata that you put in the headers property of the body in the POST request to start the WebSocket connection:
{
  "content-type": "audio/l16;rate=16000",
  "CUSTOM-HEADER-1": "value-1",
  "CUSTOM-HEADER-2": "value-2"
}
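A WebSocket server will typically want to read the sample rate from this first message before handling any audio. The following sketch shows one way to do that with the standard library; parse_initial_message is a hypothetical helper, and it assumes the content-type value always has the "type;rate=N" shape shown in the example.

```python
import json


def parse_initial_message(text):
    """Split the initial text message into media type, sample rate, and custom metadata."""
    payload = json.loads(text)
    content_type = payload.pop("content-type")
    media_type, _, params = content_type.partition(";")
    rate = None
    for param in params.split(";"):
        name, _, value = param.strip().partition("=")
        if name == "rate":
            rate = int(value)
    # Whatever remains in the payload is the metadata from the "headers" property.
    return media_type.strip(), rate, payload


initial = '{"content-type":"audio/l16;rate=16000","CUSTOM-HEADER-1":"value-1","CUSTOM-HEADER-2":"value-2"}'
media_type, rate, metadata = parse_initial_message(initial)
```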
Binary messages carry the audio of the call. The audio codec supported on the WebSocket interface is Linear PCM 16-bit, with a 16 kHz sample rate. Each message includes one 640-byte frame of data (20 ms of audio, that is, 320 samples of 2 bytes each), sent at 50 frames (messages) per second.
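The frame size follows directly from the sample rate, the sample width, and the frame duration. The arithmetic can be sketched as below; the function names are illustrative, and the figures imply a single (mono) channel.

```python
BYTES_PER_SAMPLE = 2  # 16-bit linear PCM
FRAME_MS = 20         # duration of audio carried by one binary message


def frame_size_bytes(sample_rate_hz, frame_ms=FRAME_MS):
    """Bytes in one frame: samples per frame times bytes per sample."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * BYTES_PER_SAMPLE


def frames_per_second(frame_ms=FRAME_MS):
    """Number of frames (messages) sent each second."""
    return 1000 // frame_ms


def audio_seconds(total_bytes, sample_rate_hz):
    """Duration of audio represented by a number of received bytes."""
    return total_bytes / (sample_rate_hz * BYTES_PER_SAMPLE)


size_16k = frame_size_bytes(16000)  # 320 samples * 2 bytes
rate = frames_per_second()
```

By the same arithmetic, a connection started with audioRate set to 8000 would carry 320 bytes per frame, assuming the frame duration stays at 20 ms.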
When audio in the streams included in the WebSocket is muted, a text message is sent with the following JSON payload (with active set to false):
{
  "content-type": "audio/l16;rate=16000",
  "method": "update",
  "event": "websocket:media:update",
  "active": false,
  "CUSTOM-HEADER-1": "value-1",
  "CUSTOM-HEADER-2": "value-2"
}
(The CUSTOM-HEADER properties in this example represent metadata that you include in the headers property of the body in the POST request to start the WebSocket connection.)
Audio may be muted because all clients stop publishing audio or as a result of a force mute moderation event.
When audio of one of the streams resumes, a text message is sent with the following JSON payload (with active set to true):
{
  "content-type": "audio/l16;rate=16000",
  "method": "update",
  "event": "websocket:media:update",
  "active": true,
  "CUSTOM-HEADER-1": "value-1",
  "CUSTOM-HEADER-2": "value-2"
}
When the Audio Connector WebSocket stops because of a call to the force disconnect REST method or because the 6-hour time limit is reached (see Stopping a WebSocket connection), a text message is sent with the following JSON payload:
{
  "content-type": "audio/l16;rate=16000",
  "method": "delete",
  "event": "websocket:disconnected",
  "CUSTOM-HEADER-1": "value-1",
  "CUSTOM-HEADER-2": "value-2"
}
This message marks the termination of the WebSocket connection.
(The CUSTOM-HEADER properties in this example represent metadata that you include in the headers property of the body in the POST request to start the WebSocket connection.)
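The text messages described above can be told apart by their event property. The sketch below shows one possible dispatcher; classify_text_message and its return labels are hypothetical, and treating any event-less payload with a content-type as the initial message is an assumption based on the examples on this page.

```python
import json


def classify_text_message(text):
    """Label a text message as 'initial', 'muted', 'unmuted', 'disconnected', or 'unknown'."""
    payload = json.loads(text)
    event = payload.get("event")
    if event == "websocket:media:update":
        # "active" is false when audio is muted and true when it resumes.
        return "unmuted" if payload.get("active") else "muted"
    if event == "websocket:disconnected":
        return "disconnected"
    if "content-type" in payload:
        return "initial"
    return "unknown"


muted = '{"content-type":"audio/l16;rate=16000","method":"update","event":"websocket:media:update","active":false}'
resumed = '{"content-type":"audio/l16;rate=16000","method":"update","event":"websocket:media:update","active":true}'
closed = '{"content-type":"audio/l16;rate=16000","method":"delete","event":"websocket:disconnected"}'
first = '{"content-type":"audio/l16;rate=16000","CUSTOM-HEADER-1":"value-1"}'

labels = [classify_text_message(m) for m in (first, muted, resumed, closed)]
```

A server loop would call this only for text messages and treat every binary message as an audio frame.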
When your WebSocket server closes the connection, the OpenTok connection for the call also ends. In each client connected to the session, the OpenTok client-side SDK dispatches events indicating the connection ended (just as it would when other clients disconnect from the session).
You can disconnect the Audio Connector WebSocket connection using the force disconnect REST method. Use the connection ID of the Audio Connector WebSocket connection with this method.
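As an illustration, the force disconnect call can be built in the same way as the connect call. This sketch rests on an assumption that this page does not confirm: that the force disconnect REST method is an HTTPS DELETE on the session's connection resource, at the URL path shown in the code. Check the REST API reference before relying on it. The helper name and the placeholder values are hypothetical.

```python
import urllib.request


def build_force_disconnect_request(api_key, jwt, session_id, connection_id):
    """Return an unsent DELETE request (URL path is an assumption, see lead-in)."""
    url = (
        "https://api.opentok.com/v2/project/"
        f"{api_key}/session/{session_id}/connection/{connection_id}"
    )
    return urllib.request.Request(url, headers={"X-OPENTOK-AUTH": jwt}, method="DELETE")


# connection_id is the connectionId value returned when the WebSocket connection was started.
request = build_force_disconnect_request(
    "API_KEY",
    "JWT_PLACEHOLDER",
    "SESSION_ID",
    "e9f8c166-6c67-440d-994a-04fb6dfed007",
)
```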
As a security measure, the WebSocket will be closed automatically after 6 hours.
Audio Connector will make a few attempts to re-establish a WebSocket connection that closes unexpectedly (for example, when the WebSocket closes for a reason other than a call to the force disconnect REST method).
See the demo-video-node-audio_connector project for a sample Node application that uses Audio Connector.
See this blog post.