Live Captions

Use the Live Captions API to transcribe audio streams and generate real-time captions for your application.

The Vonage Video Live Captions API lets you show live captions to end users in a Vonage Video session, using a transcription service. The current transcription provider is AWS Transcribe. Because Live Captions captures audio from the OpenTok Media Router, it can also provide captions for the audio of SIP dial-in participants.

Live Captions is enabled by default for all projects, and it is a usage-based product. Live Captions usage is charged based on the number of audio streams of participants (or stream IDs) that are sent to the transcription service. For more information, see Live Captions API pricing rules.

The Live Captions feature is only supported in routed sessions (sessions that use the OpenTok Media Router). You can send up to 50 audio streams from a single Vonage session at a time to the transcription service for captions.

Steps to enable Live Captions

  1. Use the REST API to enable captioning for a session.

  2. Use the corresponding client SDK method to publish audio to the captions service.

  3. In subscribing clients, call the respective client SDK method for a subscriber to subscribe to captions for a stream.
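As a rough illustration of step 1, the sketch below builds (but does not send) the REST request that enables captioning for a session. Only `languageCode` and `statusCallbackUrl` are named in this guide; the endpoint path, the `token` field, and the `X-OPENTOK-AUTH` header are assumptions that should be checked against the REST API reference before use.

```python
import json
import urllib.request
from typing import Optional

# Assumed base URL; verify against the REST API reference.
API_BASE = "https://api.opentok.com/v2/project"


def build_start_captions_request(
    api_key: str,
    session_id: str,
    token: str,
    jwt: str,
    language_code: str = "en-US",
    status_callback_url: Optional[str] = None,
) -> urllib.request.Request:
    """Build (without sending) a POST request that starts captions."""
    body = {
        "sessionId": session_id,
        "token": token,  # assumed field: a token for the session
        "languageCode": language_code,
    }
    # Only include the webhook URL when the caller wants status updates.
    if status_callback_url is not None:
        body["statusCallbackUrl"] = status_callback_url

    return urllib.request.Request(
        url=f"{API_BASE}/{api_key}/captions",  # assumed path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-OPENTOK-AUTH": jwt,  # assumed header carrying the project JWT
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen()` would send the request. Building it separately keeps the payload easy to inspect and test.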

Upon starting live captioning, OpenTok securely streams audio to a third-party audio transcription service, such as Amazon Transcribe.

Use the captioning API in the OpenTok client SDKs to enable or disable receiving live captions in your application.

Starting or stopping the receipt of live captions in one client does not affect the captions received by other clients connected to the OpenTok session.

Supported languages

Live Captions supports 11 languages and 3 dialects of English. Pass in the desired language as the languageCode option when enabling live captions with the REST API.

Use cases

Live captions can improve an application's user experience and user engagement. Captioning improves the accessibility of your application, which can increase participation by people with hearing disabilities. Some laws worldwide require applications to provide captioning.

Captioning can also make speakers easier to understand in uncontrolled surroundings, which improves user engagement.

Live captions are only available for routed OpenTok sessions (sessions that use the OpenTok Media Router).

Upon enabling the Live Captions feature:

Notes

The default maximum allowed captioning duration for each OpenTok session is 4 hours. You can set this to another maximum duration when you call the Start Captions API. Upon expiration, the audio captioning will stop without any effect on the ongoing OpenTok session.
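To make the timing concrete, here is a small sketch that computes when captioning would stop for a given start time. It assumes that timestamps are Unix times in seconds and that the maximum duration is also expressed in seconds; the function name and constant are illustrative and not part of any SDK.

```python
# 4 hours, the default maximum captioning duration stated above.
DEFAULT_MAX_DURATION_SECONDS = 4 * 60 * 60


def captions_expiry(
    created_at: int,
    max_duration: int = DEFAULT_MAX_DURATION_SECONDS,
) -> int:
    """Return the Unix timestamp (seconds) at which captioning would stop."""
    if max_duration <= 0:
        raise ValueError("max_duration must be positive")
    return created_at + max_duration


# Captions started at 1651253477 with the default limit.
print(captions_expiry(1651253477))  # prints 1651267877
```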

Note that in the current phase, this feature is only available through the REST API and in the client SDKs as listed above.

Live caption status updates

You can set up a webhook to receive events when live captions start, stop, or fail for a session. To do so, set the statusCallbackUrl option when you call the REST API method that starts live captions.

When the status of a captioning operation changes, an HTTP POST request is sent to the callback URL. If no callback URL is configured, no status update is delivered. The raw data of the HTTP request is a JSON-encoded message of the following form:


{
  "captionId": "<captionsId>",
  "projectId": "<apiKey>",
  "sessionId": "<sessionId>",
  "status": "stopped",
  "createdAt": 1651253477,
  "updatedAt": 1651253837,
  "duration": 360,
  "languageCode": "en-US",
  "reason": "Maximum duration exceeds.",
  "provider": "aws-transcribe",
  "group": "captions"
}
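A webhook handler might decode this payload along the following lines. This is a minimal sketch using only the field names visible in the sample above; the `CaptionStatus` class and `parse_caption_status` function are hypothetical names, and treating `duration`, `languageCode`, and `reason` as optional is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionStatus:
    caption_id: str
    session_id: str
    status: str
    duration: int
    language_code: str
    reason: str


def parse_caption_status(raw: bytes) -> CaptionStatus:
    """Decode the raw body of a status callback into a typed record."""
    data = json.loads(raw)
    return CaptionStatus(
        caption_id=data["captionId"],
        session_id=data["sessionId"],
        status=data["status"],
        # Assumed optional: these may be absent for some statuses.
        duration=data.get("duration", 0),
        language_code=data.get("languageCode", ""),
        reason=data.get("reason", ""),
    )


# The sample payload from above, as it would arrive in the POST body.
sample = b"""{
  "captionId": "<captionsId>",
  "projectId": "<apiKey>",
  "sessionId": "<sessionId>",
  "status": "stopped",
  "createdAt": 1651253477,
  "updatedAt": 1651253837,
  "duration": 360,
  "languageCode": "en-US",
  "reason": "Maximum duration exceeds.",
  "provider": "aws-transcribe",
  "group": "captions"
}"""

event = parse_caption_status(sample)
if event.status == "stopped":
    print(f"Captions stopped after {event.duration}s: {event.reason}")
```

In the sample, `duration` (360) equals `updatedAt` minus `createdAt`, which suggests that all three values are expressed in seconds.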

The JSON object includes the following properties:

Sample

The opentok-web-samples Basic-Captions sample uses live captions in a web app built with the Vonage web client SDK.

Known issues

More information

See this Vonage API Support article for more technical specifications and FAQs.