Post-Call Transcriptions

You can get a text transcription for a Video API archive.

This page includes the following sections: Private beta, Pricing, Feature overview, Enabling transcription when starting an archive, Getting transcription status, Transcription format, Limitations/Known Issues, FAQs, and Updates.

Private beta

Note: The post-call transcriptions API is provided as a private beta feature. Contact us to enable post-call transcriptions for your project.

The private beta is available to selected partners so they can preview and evaluate the feature, provide feedback on the implementation, and influence the direction of development. To incorporate feedback and adapt the product to customer needs, it may be necessary to make breaking changes that affect the APIs and customer code. Please be aware that code written during the private beta phase may need to be modified after the product is made generally available.

The private beta is limited in capacity, so load testing and production traffic are prohibited. You will be asked to provide a Vonage Video test project ID that is not intended for production traffic. Once that test project ID is enabled for the private beta, the feature will be available to applications that use it.

Pricing

For the private beta on the OpenTok environment, the transcription rate of $0.04429 per minute is not charged; only the individual archive charges apply.

Feature overview

Vonage Video API servers generate post-call transcriptions using artificial intelligence and other state-of-the-art technology.

You enable transcriptions when you start an archive using the REST API.

After the archive recording completes, the transcription will be available as a JSON file.

Enabling transcription when starting an archive

When you use the Vonage Video REST API to start an archive, set the hasAudio and hasTranscription properties to true in the JSON data you send to the start archive REST method.

You can also include an optional transcriptionProperties object with a hasSummary property (Boolean) to include an AI-generated summary in the transcription. The default value for hasSummary is false (the transcription summary is not included). For example:

api_key=12345 # replace with your OpenTok project API key
json_web_token="jwt_string" # replace with a JSON web token
data='{
  "sessionId": "1_MX40NzY0MDA1MX5-fn4",
  "hasAudio": true,
  "hasVideo": true,
  "hasTranscription": true,
  "transcriptionProperties": {
    "hasSummary": true
  },
  "name": "archive_test",
  "outputMode": "individual"
}'

curl \
  -i \
  -H "Content-Type:application/json" \
  -X POST \
  -H "X-OPENTOK-AUTH:$json_web_token" \
  -d "$data" \
  https://api.opentok.com/v2/project/$api_key/archive

Set outputMode (in the POST data) to "individual". Transcriptions are available for individual stream archives only.

Set the value for api_key to your OpenTok project API key. Set the value for json_web_token to a JSON web token (see the REST API Authentication documentation).

For other archive options, see the documentation for the start archive REST method.

The response for a call to the start archive REST method will include hasTranscription and transcription properties in addition to the other documented properties of the response:

{
  "createdAt" : 1384221730555,
  "duration" : 0,
  "hasAudio" : true,
  "hasVideo" : true,
  "id" : "b40ef09b-3811-4726-b508-e41a0f96c68f",
  "name" : "The archive name you supplied",
  "outputMode" : "composed",
  "projectId" : 123456,
  "reason" : "",
  "resolution" : "640x480",
  "sessionId" : "flR1ZSBPY3QgMjkgMTI6MTM6MjMgUERUIDIwMTN",
  "size" : 0,
  "status" : "started",
  "streamMode" : "auto",
  "hasTranscription" : true,
  "transcription" : {
    "status": "requested",
    "url": ""
  }
}

See Getting transcription status for information on dynamically getting the transcription details.

In an automatically archived session, the transcription won't be started automatically. You should start a second archive, using the multiArchiveTag option, for the transcription (see Simultaneous archives).
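
For example, here is a minimal sketch of starting that second archive for the transcription in an automatically archived session. It mirrors the earlier example; the sessionId, multiArchiveTag value, and credentials are placeholders:

api_key=12345 # replace with your OpenTok project API key
json_web_token="jwt_string" # replace with a JSON web token
data='{
  "sessionId": "1_MX40NzY0MDA1MX5-fn4",
  "hasAudio": true,
  "hasTranscription": true,
  "outputMode": "individual",
  "multiArchiveTag": "post-call-transcription"
}'

curl \
  -i \
  -H "Content-Type:application/json" \
  -X POST \
  -H "X-OPENTOK-AUTH:$json_web_token" \
  -d "$data" \
  https://api.opentok.com/v2/project/$api_key/archive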

Support for transcriptions is currently available with the Vonage Video REST API. It is not supported in the Vonage Video server SDKs.

Getting transcription status

The response for the REST methods for listing archives and retrieving archive information will include hasTranscription and transcription properties:

{
    "id" : "b40ef09b-3811-4726-b508-e41a0f96c68f",
    "event": "archive",
    "createdAt" : 1723584124,
    "duration" : 328,
    "name" : "the archive name",
    "projectId" : 123456,
    "reason" : "",
    "sessionId" : "2_MX40NzIwMzJ-flR1ZSBPERUIDIwMTN-MC45NDQ2MzE2NH4",
    "size" : 18023312,
    "status" : "uploaded",
    "hasTranscription" : true,
    "transcription": {
      "status": "available",
      "url": "URL for downloading the transcription, if available",
      "reason": "The reason for failure, if status is set to failed" 
    }
}

The hasTranscription property is a Boolean, indicating whether transcription is enabled for the archive.

The transcription property is an object with the following properties:

status: The status of the transcription ("requested", "available", or "failed").
url: The URL for downloading the transcription, if available.
reason: The reason for failure, if the status is "failed".

You can also set an archive status callback for your Video API account. See Archive status changes. The callback data will also include hasTranscription and transcription properties.
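
For example, here is a minimal sketch of checking the transcription status by retrieving the archive details over REST. The credentials are placeholders, and archive_id is the ID returned when you started the archive:

api_key=12345 # replace with your OpenTok project API key
json_web_token="jwt_string" # replace with a JSON web token
archive_id="b40ef09b-3811-4726-b508-e41a0f96c68f" # the ID returned by the start archive method

curl \
  -i \
  -H "X-OPENTOK-AUTH:$json_web_token" \
  https://api.opentok.com/v2/project/$api_key/archive/$archive_id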

Transcription format

The transcription is provided as a compressed ZIP file. The uncompressed file is a text file with JSON data.
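
For example, here is a minimal sketch of downloading and unpacking the transcription; the URL is a placeholder and should be replaced with the url value from the transcription object:

transcription_url="https://example.com/transcription.zip" # placeholder: use the url from the transcription object
curl -o transcription.zip "$transcription_url"
unzip -o transcription.zip # extracts the JSON text file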

The transcription includes individual segments of text. Each segment corresponds to an individual audio channel (from one of the audio streams in the session).

The JSON has the following top-level properties:

The transcription object has the following properties:

Each object in the segments array has the following properties:

Each raw_data object includes the following properties:

Limitations/Known Issues

Regional Media Zone Support

Media Zone      Available during alpha    Available when in GA
USA             Yes                       Yes
EU              Yes                       Yes
Canada          No                        Based on requirement
Germany         No                        Based on requirement
Australia       No                        Based on requirement
Japan           No                        Based on requirement
South Korea     No                        Based on requirement

FAQs

Frequently asked questions:

How many streams can be analyzed from a single session?

Up to 50 streams with a maximum of 120 transcribed minutes.

Does the post-call transcriptions feature work with both routed and relayed sessions?

The post-call transcriptions feature is intended for routed sessions (sessions that use the Vonage Media Router).

If the transcription upload to a customer-configured S3 bucket fails, does the retry or fallback mechanism work similarly to the archive upload?

Yes, the retry mechanism for post-call transcriptions operates exactly the same as for regular archive uploads.

When the transcription status changes, the customer should receive a callback that includes the download URL. If no callback is registered, the download link can only be retrieved through an HTTP GET request.

There are no plans to introduce authentication for the link. The download link has a short expiration window. If not accessed within that timeframe, a new request must be made to obtain a fresh link.

Even though multiple users join the session, the transcription file is a single JSON file. How do we differentiate between the users?

Each transcription entry in the file is associated with a specific channel number, assigned to each stream. The file also includes a channels_metadata property, which provides stream ID information corresponding to each channel ID.
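
As an illustration only, the jq sketch below pulls the channels_metadata and segments properties out of the extracted file. The file name is a placeholder, and no structure beyond those two property names is assumed:

jq '.channels_metadata' transcription.json # channel ID to stream ID mapping
jq '.segments | length' transcription.json # number of transcribed segments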

Updates

May 21, 2025

Post-Call Text Insights: The start archive API call now includes an optional transcriptionProperties object, which includes a hasSummary property for including an AI-generated summary in the transcription file.