Creating a true voice call experience with OpenTok and CallKit

In previous posts, we have looked at some key considerations for building apps for social video calling on mobile devices, and some of the features you can include to make sure your users have a great experience.

Here, we’re going to look in depth at CallKit, a framework for iOS which is an important component for creating frictionless, delightful apps, especially voice and video calling apps.

Introducing CallKit and why it’s important

At WWDC 2016, Apple announced a new VoIP framework: CallKit. This framework allows developers to integrate VoIP calling services with other call-related apps on iOS. Simply put, your app’s calling services can now have the same system priority as native phone calls.

Before CallKit, an in-app VoIP call could easily be dropped by an incoming native phone call without warning. Now users can choose to either continue or end the call through the native calling user interface.

TokBox has been powering real-time audio and video communication over the web for a decade, and we are glad to have this new functionality to improve the calling experience for native iOS apps. In the real-time communication business, quality and continuity are often critical criteria, and even small inconveniences can lead to user frustration. CallKit provides a much-needed improvement to continuity, and will undoubtedly help developers (including those building TokBox apps) create more enjoyable user experiences as the mobile market continues to advance.


The sample app below will show you how to integrate CallKit with the OpenTok iOS SDK. With CallKit, your app will now be able to:

  • Use the native incoming call UI in both the locked and unlocked states.
  • Interact with other calls in the system.

As a developer, you will mostly use two primary classes: CXProvider and CXCallController.
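As a rough sketch of how these two objects might be created, assuming a hypothetical CallManager class and a placeholder app name (neither comes from the sample app):

```swift
import CallKit

// Hypothetical helper that owns the two CallKit objects.
final class CallManager: NSObject, CXProviderDelegate {
    let provider: CXProvider
    let callController = CXCallController()

    override init() {
        // "MyVoIPApp" is a placeholder; it is the name shown in the native call UI.
        // Note: init(localizedName:) is deprecated on newer iOS versions in favor of init().
        let configuration = CXProviderConfiguration(localizedName: "MyVoIPApp")
        configuration.supportsVideo = false            // voice-only in this sketch
        configuration.maximumCallsPerCallGroup = 1
        configuration.supportedHandleTypes = [.phoneNumber]

        provider = CXProvider(configuration: configuration)
        super.init()

        // Passing nil delivers delegate callbacks on the main queue.
        provider.setDelegate(self, queue: nil)
    }

    // The only required CXProviderDelegate method.
    func providerDidReset(_ provider: CXProvider) {
        print("Provider did reset")
    }
}
```

Keeping both objects in one long-lived place makes it easier to report incoming calls and request actions from anywhere in the app.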


A CXProvider object is responsible for reporting out-of-band notifications to the system, such as an incoming call or a change in a call’s state. When such an event occurs, the app describes it with a CXCallUpdate object and hands it to the provider. A CXCallUpdate encapsulates new or updated call-related information, exposing properties such as the caller’s handle and whether the call is audio-only or includes video. The provider in turn communicates with the app through its core protocol, CXProviderDelegate, which defines methods for provider lifecycle events and telephony actions. The templates below are shown in Swift; the type of the action parameter distinguishes the different provider(_:perform:) methods.

// MARK: CXProviderDelegate
func providerDidReset(_ provider: CXProvider) {
    print("Provider did reset")
}

func provider(_ provider: CXProvider, perform action: CXStartCallAction) {
    print("Provider performs the start call action")
    // Configure audio session but do NOT start audio until session activated
    action.fulfill() // tell the system the action succeeded (or call action.fail())
}

func provider(_ provider: CXProvider, perform action: CXAnswerCallAction) {
    print("Provider performs the answer call action")
    // Configure audio session but do NOT start audio until session activated
    action.fulfill() // tell the system the action succeeded (or call action.fail())
}

Note the importance of getting the timing right with the above actions. We can configure the audio session and other information, but we should not start call audio until the audio session has been activated by the system after having its priority elevated – see AVAudioSession below.

func provider(_ provider: CXProvider, perform action: CXEndCallAction) {
    print("Provider performs the end call action")
    // Trigger the call to be ended via the underlying network service.
    action.fulfill()
}

func provider(_ provider: CXProvider, perform action: CXSetHeldCallAction) {
    print("Provider performs the hold call action")
    action.fulfill()
}

func provider(_ provider: CXProvider, perform action: CXSetMutedCallAction) {
    print("Provider performs the mute call action")
    action.fulfill()
}
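As one possible way to fill in the hold and mute templates with OpenTok, the sketch below toggles audio on the publisher and subscriber. The CallMediaHandler class and its property names are hypothetical; publishAudio and subscribeToAudio are the OpenTok properties we assume you would use here.

```swift
import CallKit
import OpenTok

// Hypothetical helper that maps CallKit actions onto OpenTok audio flags.
final class CallMediaHandler {
    var publisher: OTPublisher?
    var subscriber: OTSubscriber?

    // Call this from provider(_:perform:) for CXSetMutedCallAction.
    func handle(_ action: CXSetMutedCallAction) {
        // Muting only stops our own microphone audio from being published.
        publisher?.publishAudio = !action.isMuted
        action.fulfill()
    }

    // Call this from provider(_:perform:) for CXSetHeldCallAction.
    func handle(_ action: CXSetHeldCallAction) {
        // On hold: stop both sending and receiving audio.
        // Caveat: resuming re-enables the microphone even if the user
        // had muted before the hold; track mute state separately if that matters.
        let isActive = !action.isOnHold
        publisher?.publishAudio = isActive
        subscriber?.subscribeToAudio = isActive
        action.fulfill()
    }
}
```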

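The following methods report an action that timed out and tell you when the system has activated or deactivated your app’s audio session, that is, when the call’s audio priority has been boosted or restored to normal: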

func provider(_ provider: CXProvider, timedOutPerforming action: CXAction) {
    print("Timed out \(#function)")
    // React to the action timeout if necessary, such as showing an error UI.
}

func provider(_ provider: CXProvider, didActivate audioSession: AVAudioSession) {
    // Start call audio media, now that the audio session has been activated
    // after having its priority boosted.
}

func provider(_ provider: CXProvider, didDeactivate audioSession: AVAudioSession) {
    // Restart any non-call related audio now that the app's audio session has been
    // de-activated after having its priority restored to normal.
}
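To illustrate the "configure but do not start" step, here is a minimal sketch of an audio session configuration function. The function name is our own, and the category and mode are assumptions that suit a typical voice call; adjust them to your app.

```swift
import AVFoundation

// Hypothetical helper: call it from the start/answer call action handlers
// (or earlier), before the system activates the audio session.
func configureAudioSession() {
    let session = AVAudioSession.sharedInstance()
    do {
        // Two-way audio tuned for voice chat.
        try session.setCategory(.playAndRecord, mode: .voiceChat, options: [])
        // Deliberately no setActive(true) here: CallKit activates the session
        // and then calls provider(_:didActivate:), which is where audio should start.
    } catch {
        print("Failed to configure audio session: \(error)")
    }
}
```

The point of the split is that configuration is safe at any time, while starting audio is only safe once provider(_:didActivate:) has fired.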


Let’s explore how to make a call and answer a call on behalf of a user. To do that, we need a CXCallController object to interact with the system.

The CXCallController object takes a CXTransaction object to request a telephony action (which, if successful, later triggers the delegate methods above). To specify a telephony action in a transaction, create the desired action object and associate it with the transaction. Each telephony action has a corresponding CXAction subclass, such as CXEndCallAction for ending a call or CXSetHeldCallAction for putting a call on hold.

Once everything is ready, invoke request(_:completion:) with the prepared transaction object. Here’s how you start a call:

// Create a CXAction. `handle` is the callee's identifier (a String), such as a phone number.
let startCallAction = CXStartCallAction(call: UUID(),
  handle: CXHandle(type: .phoneNumber, value: handle))

// Create a transaction that contains the action
let transaction = CXTransaction(action: startCallAction)

// Create a label for logging
let action = "startCall"

// Ask the system to perform the transaction
callController.request(transaction) { error in
    if let error = error {
        print("Error requesting transaction: \(error)")
    } else {
        print("Requested transaction \(action) successfully")
    }
}
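The same pattern applies to the other telephony actions. As a sketch, ending or holding a call might look like the following; the function names are ours, and callUUID is assumed to be the UUID used when the call was started or reported.

```swift
import CallKit

// Hypothetical helper: runs a single action in its own transaction.
func request(_ action: CXAction, label: String, using callController: CXCallController) {
    let transaction = CXTransaction(action: action)
    callController.request(transaction) { error in
        if let error = error {
            print("Error requesting transaction \(label): \(error)")
        } else {
            print("Requested transaction \(label) successfully")
        }
    }
}

// End the call identified by callUUID.
func endCall(_ callUUID: UUID, using callController: CXCallController) {
    request(CXEndCallAction(call: callUUID), label: "endCall", using: callController)
}

// Put the call on hold, or resume it.
func setHeld(_ onHold: Bool, call callUUID: UUID, using callController: CXCallController) {
    request(CXSetHeldCallAction(call: callUUID, onHold: onHold),
            label: "setHeld", using: callController)
}
```

A successful request does not end or hold the call by itself; it results in the matching provider(_:perform:) delegate method being called, where the actual work happens.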

Present Native Incoming Call Screen

// Construct a CXCallUpdate describing the incoming
// call, including the caller.
let update = CXCallUpdate()

update.remoteHandle =
  CXHandle(type: .phoneNumber, value: handle)

// Report the incoming call to the system
provider.reportNewIncomingCall(with: uuid, update: update) { error in
    // Only add incoming call to the app's list of calls if
    // call was allowed (i.e. there was no error) since calls
    // may be "denied" for various legitimate reasons.
    // See CXErrorCodeIncomingCallError.
    if let error = error {
        print("Incoming call was not allowed: \(error)")
    }
}

For testing purposes, you can define your own custom behavior to present the native calling screen. In practice, this piece of code usually runs in response to a VoIP remote notification that places a call to a specific device or person, as WhatsApp, WeChat and Messenger do.
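When a VoIP push arrives through PushKit, the delegate receives a payload dictionary (payload.dictionaryPayload) that your server defines. As a sketch, the function below extracts what is needed to report the call. The payload keys "handle", "uuid" and "hasVideo" are assumptions made up for this example, not an OpenTok or Apple format.

```swift
import Foundation

// Details needed to report an incoming call to CallKit.
struct IncomingCallInfo {
    let uuid: UUID
    let handle: String
    let hasVideo: Bool
}

// Parses a push payload with hypothetical keys "handle", "uuid" and "hasVideo".
// Returns nil when the caller handle is missing or empty.
func parseIncomingCall(from payload: [AnyHashable: Any]) -> IncomingCallInfo? {
    guard let handle = payload["handle"] as? String, !handle.isEmpty else {
        return nil
    }
    // Fall back to a fresh UUID when the server did not send a valid one.
    let uuid = (payload["uuid"] as? String).flatMap { UUID(uuidString: $0) } ?? UUID()
    // Default to a voice-only call.
    let hasVideo = payload["hasVideo"] as? Bool ?? false
    return IncomingCallInfo(uuid: uuid, handle: handle, hasVideo: hasVideo)
}
```

In the PushKit delegate, you would pass payload.dictionaryPayload to this function and then call reportNewIncomingCall with the resulting uuid and handle.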

Voice-only configuration with OpenTok

By default, the OpenTok iOS SDK requires the camera and microphone for video and audio communication. For voice-only apps, you might not want to acquire camera access. You can easily achieve this by setting the videoTrack property to false:

if publisher == nil {
    let settings = OTPublisherSettings()
    settings.audioTrack = true
    settings.videoTrack = false   // voice only: no camera access needed
    publisher = OTPublisher(delegate: self, settings: settings)
}
if let publisher = publisher {
    var error: OTError?
    session?.publish(publisher, error: &error)
    if let error = error {
        print("Publishing failed: \(error)")
    }
}

Of course, you don’t have to receive the video stream either:

subscriber = OTSubscriber(stream: stream, delegate: self)
subscriber?.subscribeToVideo = false   // receive audio only
if let subscriber = subscriber {
    var error: OTError?
    session.subscribe(subscriber, error: &error)
    if let error = error {
        print("Subscribing failed: \(error)")
    }
}

Download the Sample App

Based on Speakerbox, the Apple CallKit sample app, we have developed a sample app to demonstrate how to integrate CallKit with OpenTok to create a true video call experience. With the sample app, you can make an outbound call and simulate an incoming call in both the locked and unlocked states with OpenTok.

Unfortunately, a small issue prevents the sample from working fully: the audio session does not function normally if a call is accepted from the locked state. The folks over at Apple suggest a temporary workaround: set up the audio session as early as possible.

Conclusion and Next Steps

Isn’t it great to make your app behave like a native phone call and redirect your app when possible? Now you can do it with CallKit!

While CallKit won’t fundamentally change your iOS app, the improved continuity will undoubtedly help you provide a more seamless experience for your users.

Note: The application above requires values for TokBox API key, session ID, and token. During development and testing, you can get these values from the project page in your TokBox Account. For production deployment, you need to generate the session ID and token values using one of the OpenTok Server SDKs.

If you don’t have a TokBox account, sign up for your free trial here.

For an overview of the benefits of using the OpenTok platform for building richer applications, check out our series on social video apps.

Have fun!