Use the Live Captions API to transcribe audio streams and generate real-time captions for your application.
The Vonage Video Live Captions API lets you show live captions to end-users in a Vonage Video session, using a transcription service. The transcription provider is AWS Transcribe. Because Live Captions captures audio from the OpenTok Media Router, it can also provide captions for the audio of SIP dial-in participants.
Live Captions is enabled by default for all projects, and it is a usage-based product. Live Captions usage is charged based on the number of audio streams of participants (or stream IDs) that are sent to the transcription service. For more information, see Live Captions API pricing rules.
The Live Captions feature is only supported in routed sessions (sessions that use the OpenTok Media Router). You can send up to 50 audio streams from a single Vonage session at a time to the transcription service for captions.
Use the REST API to enable captioning for a session.
Use the corresponding method in the client SDK to publish audio to the captions service.
In subscribing clients, call the respective client SDK method for a subscriber to subscribe to captions for a stream.
Upon starting live captioning, OpenTok securely streams audio to a third-party audio transcription service such as Amazon Transcribe.
Use the captioning API in the OpenTok client SDKs to enable or disable receiving live captions in your application.
Starting or stopping the receipt of live captions in one client does not affect the captions received by other clients connected to the OpenTok session.
Live Captions supports 11 languages and 3 dialects of English. Pass in the desired language as the languageCode option when enabling live captions with the REST API:
"en-US": English, US
"en-AU": English, Australia
"en-GB": English, UK
"es-US": Spanish, US
"zh-CN": Chinese, Simplified
"fr-FR": French
"fr-CA": French, Canadian
"de-DE": German
"hi-IN": Hindi, Indian
"it-IT": Italian
"ja-JP": Japanese
"ko-KR": Korean
"pt-BR": Portuguese, Brazilian
"th-TH": Thai
Live captions can improve an application's user experience and user engagement. Captioning improves the accessibility score of your application, which often results in participation from individuals with hearing disabilities. Some laws worldwide require applications to provide captioning.
Captioning can result in increased speaker comprehension in uncontrolled surroundings, thereby improving user engagement.
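As a rough illustration of choosing a languageCode value, the sketch below checks user input against the language codes listed in this document before it is passed to the REST API. The helper name normalize_language_code and its tolerance for underscores and mixed case are assumptions made for this example; they are not part of the Vonage API.

```python
# Language codes copied from the list in this document.
SUPPORTED_LANGUAGE_CODES = {
    "en-US": "English, US",
    "en-AU": "English, Australia",
    "en-GB": "English, UK",
    "es-US": "Spanish, US",
    "zh-CN": "Chinese, Simplified",
    "fr-FR": "French",
    "fr-CA": "French, Canadian",
    "de-DE": "German",
    "hi-IN": "Hindi, Indian",
    "it-IT": "Italian",
    "ja-JP": "Japanese",
    "ko-KR": "Korean",
    "pt-BR": "Portuguese, Brazilian",
    "th-TH": "Thai",
}


def normalize_language_code(code: str) -> str:
    """Return the canonical form of a supported code, or raise ValueError.

    Hypothetical helper: accepts input such as "en_us" or " PT-br ".
    """
    parts = code.strip().replace("_", "-").split("-")
    if len(parts) != 2:
        raise ValueError(f"Malformed language code: {code!r}")
    canonical = f"{parts[0].lower()}-{parts[1].upper()}"
    if canonical not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return canonical


print(normalize_language_code("en_us"))  # en-US
```

Rejecting an unsupported code on your own server gives the caller a clear error before any request is sent to start captions.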
Live captions are only available for routed OpenTok sessions (sessions that use the OpenTok Media Router).
Upon enabling the Live Captions feature:
Use the client SDK audio captioning API to start audio captioning for each published stream.
The audio stream is sent to a third-party audio transcription service (AWS Transcribe).
Use the client audio captioning API to subscribe to the live captions for each published stream.
When an individual subscriber chooses not to receive captions, this does not affect the captions received by other subscribers in other clients connected to the session.
When the OpenTok session is over (when all clients have stopped publishing streams to the session), you can explicitly stop captioning using the Stop Captions API. Otherwise, audio captioning automatically stops after the maximum duration (specified when calling the Start Captions API) has expired.
The default maximum allowed captioning duration for each OpenTok session is 4 hours. You can set this to another maximum duration when you call the Start Captions API. Upon expiration, the audio captioning will stop without any effect on the ongoing OpenTok session.
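The following sketch shows one way to assemble a request body for the Start Captions API with a custom maximum duration. It only builds the JSON and sends nothing. The field names sessionId and maxDuration, and the use of seconds as the unit, are assumptions for illustration; languageCode and the 4-hour default come from this document. Check the REST API reference for the exact request format.

```python
import json
from datetime import timedelta

# Default maximum captioning duration stated in this document.
DEFAULT_MAX_DURATION = timedelta(hours=4)


def build_start_captions_body(
    session_id: str,
    language_code: str = "en-US",
    max_duration: timedelta = DEFAULT_MAX_DURATION,
) -> str:
    """Return a JSON string for a Start Captions request (hypothetical helper)."""
    if max_duration <= timedelta(0):
        raise ValueError("max_duration must be positive")
    body = {
        "sessionId": session_id,          # assumed field name
        "languageCode": language_code,    # option named in this document
        # Assumed field name and unit (whole seconds).
        "maxDuration": int(max_duration.total_seconds()),
    }
    return json.dumps(body)


# Example: cap captioning at 30 minutes for a placeholder session ID.
print(build_start_captions_body("<sessionId>", max_duration=timedelta(minutes=30)))
```

Keeping the duration as a timedelta until the last moment avoids unit mistakes when the value is configured in minutes or hours elsewhere in an application.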
Note that in the current phase, this feature is only available as a REST API interface and in the client SDKs as listed above.
You can set up a webhook to receive events when live captions start, stop, and fail for a session.
Go to your Video API account and select the project from the list of projects in the left-hand menu.
Under Project settings, find Live Captions Monitoring and click Configure.
Submit the URL for callbacks to be sent to.
Secure callbacks: Set a Signature Secret to have webhook callback requests signed with that secret. See Secure callbacks.
When the status of live captions changes, an HTTP POST is delivered to the callback URLs. If no callback URL is configured, no status update is delivered. The raw data of the HTTP request is a JSON-encoded message of the following form:
{
  "captionId": "<captionsId>",
  "projectId": "<apiKey>",
  "sessionId": "<sessionId>",
  "status": "stopped",
  "createdAt": 1651253477,
  "updatedAt": 1651253837,
  "duration": 360,
  "languageCode": "en-US",
  "reason": "Maximum duration exceeds.",
  "provider": "aws-transcribe",
  "group": "captions"
}
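A callback body like the one above could be handled along the following lines. This is a minimal sketch, not an official client: the CaptionsEvent class and parse_captions_event function are hypothetical names, the key "captionId" is taken from the sample payload as shown, and the set of statuses is the one documented for this callback.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Status values documented for the live captions callback.
KNOWN_STATUSES = {"started", "transcribing", "stopped", "failed"}


@dataclass
class CaptionsEvent:
    caption_id: str
    session_id: str
    status: str
    created_at: int
    updated_at: int
    reason: Optional[str] = None

    @property
    def elapsed_seconds(self) -> int:
        # Time between the start and the latest status update.
        return self.updated_at - self.created_at


def parse_captions_event(raw: str) -> CaptionsEvent:
    """Parse the raw JSON body of a live captions status callback."""
    data = json.loads(raw)
    if data.get("group") != "captions":
        raise ValueError("Not a captions event")
    if data["status"] not in KNOWN_STATUSES:
        raise ValueError(f"Unknown status: {data['status']!r}")
    return CaptionsEvent(
        caption_id=data["captionId"],  # key as spelled in the sample payload
        session_id=data["sessionId"],
        status=data["status"],
        created_at=data["createdAt"],
        updated_at=data["updatedAt"],
        reason=data.get("reason"),
    )


SAMPLE = """{
  "captionId": "<captionsId>", "projectId": "<apiKey>",
  "sessionId": "<sessionId>", "status": "stopped",
  "createdAt": 1651253477, "updatedAt": 1651253837, "duration": 360,
  "languageCode": "en-US", "reason": "Maximum duration exceeds.",
  "provider": "aws-transcribe", "group": "captions"
}"""

event = parse_captions_event(SAMPLE)
print(event.status, event.elapsed_seconds)  # stopped 360
```

In the sample, the difference between updatedAt and createdAt equals the duration value of 360, which suggests both are expressed in seconds.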
The JSON object includes the following properties:
captionsId: The unique ID for the audio captioning session.
projectId: The API key.
sessionId: The OpenTok session for which audio captioning has started.
status: The current status of the live captions:
  "started": The Vonage Video API platform has successfully allocated necessary resources to send audio streams for captioning.
  "transcribing": The transcription service has started (and captioning is in progress).
  "stopped": Captioning has stopped and all the resources have been deleted.
  "failed": Captioning has failed to allocate the necessary resources or failed to send streams for captioning.
createdAt: The Unix timestamp (epoch) at which the audio captioning started.
updatedAt: The Unix timestamp (epoch) at which the audio captioning was updated. If the status is "stopped", updatedAt indicates the time at which captioning stopped.
languageCode: The BCP-47 language code used.
reason: Additional error information about the status change.
provider: The third-party service provider used for the audio captioning:
  "aws-transcribe": Amazon Transcribe
group: The type of the event, which is always set to "captions" for audio caption API events.
The opentok-web-samples Basic-Captions sample uses live captions in a web app built with the Vonage web client SDK.
See this Vonage API Support article for more technical specifications and FAQs.