How we test WebRTC live video sessions for massive audiences

Test WebRTC framework for large scale broadcast

Co-authored by Tiffany Walsh, Patrick Quinn-Graham, and Michael Sander.

We recently announced the launch of our large-scale interactive broadcast capabilities, including the option to publish to a wide variety of endpoints via RTMP and HTTP Live Streaming (HLS). You can now broadcast to an audience of up to 3000 real-time viewers – and we know, because we’ve been testing it to get it perfect so that you don’t have to. Want to know how we test WebRTC sessions for huge audiences even though we only have a handful of people on our team? Read on!

WebRTC session types

Testing interactive audio and/or video live sessions with a small to medium number of participants is relatively easy: a publisher (a media source, e.g. a microphone or webcam) or a subscriber (a media sink, e.g. a web browser, a native app with a widget to visualize video, or headphones) takes an acceptable share of a device’s resources (laptop, mobile, tablet…), so we can run several of them at the same time.

Here’s a quick set of examples of some of the types of sessions we can create with OpenTok:

[Diagram: 1-to-1 WebRTC video call]

1-to-1: Two people, each one sending audio and/or video, and receiving the other’s.

[Diagram: many-to-many WebRTC multiparty call]

Many-to-Many: Several people, each one sending audio and/or video, and receiving all audio and video streams of the other participants.

One-way Broadcasting: The best examples of this type are TV and radio broadcasts, for instance live sports or news. There is a very limited number of media sources (e.g. one in the arena, another on the TV set, and a reporter on the street), with a potentially huge audience (in the order of hundreds of millions), as you would get for the Super Bowl. With TokBox, these sessions can be created using HLS or RTMP streams. You can find more information about these types of sessions in our blog posts about Crowdcast and the future of webinars.

[Diagram: interactive WebRTC broadcast]

Interactive broadcast: There are a few publishers and a medium to large real-time audience, up to a total of 3000 streams in a session. At certain points, an audience member can go “on stage” and publish, so all others can see and hear them. A real use case is a talk-show format: a celebrity is interviewed, and people at home can go live for a few seconds to ask a question, as in the MLB Chatting Cage.

How we test WebRTC session types

As we’ve already discussed, creating test sessions of types 1 (1-to-1) and 2 (many-to-many) in a controlled test environment is fairly simple, and it requires only a minimal amount of resources: just a few laptops, mobiles, tablets or other devices.

Moving to type 3, one-way broadcast, doesn’t actually impose further constraints, since we only need a few publishers and, optionally, one or more subscribers. Once the HLS or RTMP stream is created, all we need to do is connect it to a very small number of clients (a tab in a web browser, or any other application). Quality of this stream is handled by the underlying CDN, so there is no difference between creating 1 client or 1,000.

All these tests don’t look very exciting, do they? The actual challenge arises when we need to test WebRTC for medium, large and massive interactive broadcast sessions, where more than 100 (and up to several thousand) subscribers are connected to the same session. Clearly, the approach of having a number of laptops and mobile devices on a table and connecting them manually to the session is not enough. Even automating this in some way falls short, since the huge number of clients requires a large amount of hardware: we cannot fit all the clients on the same machine, nor can we visually check the quality for every publisher and subscriber.

Dropbear to the rescue

Here is where our Dropbear tool comes to the rescue. It is a tool developed internally at TokBox, designed to handle an arbitrary number of sessions, each with an arbitrary number of publishers and subscribers that simulate real clients on mobile or fixed devices. The traffic they generate is realistic in both protocol (signalling, RTP/RTCP) and media aspects (VP8 video and Opus audio), which allows us to draw valid conclusions about platform quality and performance.

The general architecture used in Dropbear can be seen in the following diagram:

[Diagram: Dropbear framework architecture for testing large WebRTC sessions]

There is a master node where the user connects via SSH. There, we configure some important environment variables, such as the OpenTok API URL, the Amazon AMI that will be run in the workers, or the API key and secret used to create sessions. Then we run the tool, supplementing the configuration with the number and types of sessions, the number of publishers per session, subscribers, and so on.
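As a rough illustration of that configuration step, here is a minimal sketch of how the master node might read and validate its environment in Node.js. The variable names (OPENTOK_API_URL, WORKER_AMI, API_KEY, API_SECRET, and the per-run parameters) are assumptions for illustration, not Dropbear's actual names:

```javascript
// Hypothetical sketch of reading test configuration on the master node.
// All variable names here are illustrative, not Dropbear's real schema.
function loadConfig(env) {
  // Fail fast if any of the required settings is missing.
  const required = ['OPENTOK_API_URL', 'WORKER_AMI', 'API_KEY', 'API_SECRET'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return {
    apiUrl: env.OPENTOK_API_URL,
    workerAmi: env.WORKER_AMI,
    apiKey: env.API_KEY,
    apiSecret: env.API_SECRET,
    // Per-run parameters, with defaults suitable for a small smoke test.
    sessions: Number(env.NUM_SESSIONS || 1),
    publishersPerSession: Number(env.PUBLISHERS_PER_SESSION || 1),
    subscribersPerSession: Number(env.SUBSCRIBERS_PER_SESSION || 10),
  };
}
```

Validating everything up front means a misconfigured run fails immediately on the master node, before any EC2 capacity is requested.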

The master node then calculates the number of CPU cores required to run the test, based on the aggregated number of publishers and subscribers across all concurrent sessions. These publishers and subscribers are implemented by workers, one worker per publisher or subscriber. Workers are spread amongst EC2 boxes, with several workers per core. For example, if we can run 2 workers per core on an 8-core machine (16 workers per box), a test with 1 publisher and 2000 subscribers would require:

NUM_BOXES = CEIL((N_PUBLISHERS + N_SUBSCRIBERS) / WORKERS_PER_BOX) = CEIL((1 + 2000) / 16) = CEIL(125.0625) = 126

A total of 126 boxes will be needed to run this specific test.
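The calculation above is simple enough to sketch directly. The default workers-per-core and cores-per-box values below are taken from the worked example; real values would depend on the instance types in the fleet:

```javascript
// Capacity calculation from the worked example above.
// workersPerCore and coresPerBox defaults match the example (2 * 8 = 16 workers/box).
function boxesRequired(publishers, subscribers, workersPerCore = 2, coresPerBox = 8) {
  const workersPerBox = workersPerCore * coresPerBox;
  // One worker per publisher or subscriber; round up to whole boxes.
  return Math.ceil((publishers + subscribers) / workersPerBox);
}

console.log(boxesRequired(1, 2000)); // 126
```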

To implement this stage, we use EC2 instances, spun up and down on demand with Spot Fleet, requesting 1001 cores (made up of whatever instance types are available). Instances are shut down automatically at the end of the tests, when they are no longer required, or after an activity timeout, whichever comes first, to avoid leaving idle instances running.
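To give a feel for what such a request looks like, here is a hedged sketch of building a Spot Fleet request configuration of the shape accepted by the EC2 `RequestSpotFleet` API (e.g. via the AWS SDK for JavaScript). The AMI ID and IAM role ARN are placeholders, and this is not Dropbear's actual request; the per-instance weights express capacity in cores (a c4.2xlarge has 8 vCPUs, a c4.8xlarge has 36):

```javascript
// Illustrative sketch: a Spot Fleet request config shaped for EC2's
// RequestSpotFleet API. AMI ID and IAM role ARN are placeholders.
function buildFleetRequest(targetCores, iamFleetRole, imageId) {
  return {
    SpotFleetRequestConfig: {
      IamFleetRole: iamFleetRole,
      // Capacity is counted in cores by weighting each instance type
      // by its vCPU count, so TargetCapacity is a core count.
      TargetCapacity: targetCores,
      AllocationStrategy: 'lowestPrice', // cheapest per-unit spot price wins
      TerminateInstancesWithExpiration: true,
      LaunchSpecifications: [
        { ImageId: imageId, InstanceType: 'c4.2xlarge', WeightedCapacity: 8 },
        { ImageId: imageId, InstanceType: 'c4.8xlarge', WeightedCapacity: 36 },
      ],
    },
  };
}

// This object would then be submitted with something like:
//   new AWS.EC2({ region }).requestSpotFleet(buildFleetRequest(1001, roleArn, ami)).promise();
```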

Dropbear uses more CPU than memory, so we opt for the “compute optimized” EC2 instance types, from c4.2xlarge up to c4.8xlarge. The latter allows a much higher number of workers per box, but at a higher cost. We use Spot Fleet because it picks the cheapest (per-core) instance type in the availability zone with the lowest spot price.

As for the workers, they are implemented in JavaScript and run inside a Docker container. This gives us a reliable, reproducible environment that is described in code. Each container runs a custom-built CentOS 7 image, which contains specific versions of Node.js and a Node module that wraps our native SDK (the same core that powers our iOS, Android and Windows SDKs).

While the test runs, logs generated by the workers are sent to an Elasticsearch server, where they can be consumed in real time (or after the test) through Kibana.
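As an illustration of that logging path (not Dropbear's actual code, and with field names that are purely assumptions), a worker might shape each log line as a JSON document ready for Elasticsearch's document-index REST API:

```javascript
// Hypothetical sketch: shaping a worker log entry as an Elasticsearch
// document. Field names are illustrative assumptions.
function toLogDocument(workerId, sessionId, level, message) {
  return {
    '@timestamp': new Date().toISOString(), // Kibana's default time field
    worker: workerId,
    session: sessionId,
    level,
    message,
  };
}

// Each document would then be POSTed to the Elasticsearch index endpoint,
// e.g. POST http://<es-host>:9200/dropbear-logs/_doc, and filtered in
// Kibana with a query such as: level:"error" AND session:"<session id>"
```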

When the test finishes, all workers notify the master node, and a report is then generated for each session created. This report provides a lot of useful information, such as average bitrates, packet loss and connection times.
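Aggregating that report is conceptually simple. The sketch below averages per-worker statistics into a per-session summary; the stat field names (`bitrateKbps`, `packetLossPct`, `connectTimeMs`) are illustrative assumptions, not Dropbear's actual schema:

```javascript
// Sketch of per-session report aggregation over worker statistics.
// The stat field names are illustrative assumptions.
function sessionReport(workerStats) {
  const n = workerStats.length;
  const avg = (key) =>
    workerStats.reduce((acc, s) => acc + s[key], 0) / n;
  return {
    workers: n,
    avgBitrateKbps: avg('bitrateKbps'),
    avgPacketLossPct: avg('packetLossPct'),
    avgConnectTimeMs: avg('connectTimeMs'),
  };
}

const report = sessionReport([
  { bitrateKbps: 500, packetLossPct: 0.5, connectTimeMs: 250 },
  { bitrateKbps: 700, packetLossPct: 1.5, connectTimeMs: 350 },
]);
console.log(report);
// { workers: 2, avgBitrateKbps: 600, avgPacketLossPct: 1, avgConnectTimeMs: 300 }
```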

Since we’ve done the testing for you, you can get on with developing your interactive broadcast application and make plans for spreading the word to as wide an audience as possible. To get started with the OpenTok platform, sign up for an account including $10 credit here.

Our Developer Center is full of resources to help you get up and running, and you can get in touch if you have any questions.