WebRTC and Signaling: What Two Years Have Taught Us

There’s a new digital version of an old analog joke that starts something like this: “Two WebRTC engineers walk into a bar to have a beer while they talk about signaling.”  The problem is that it’s the hotel bar at the Hotel California, and the punchline is that the engineers never get to leave.

Hardly a day goes by without another blog post about signaling and WebRTC.

Some people think signaling should be standardized; others think we already have the answer in SIP or REST. Some think that the lack of a signaling specification (beyond the need to support SDP offer/answer) is a huge gap in the WebRTC standard.

We think that leaving signaling out was the smartest thing that the key drivers of the standard could have done, for three reasons:

  1. There is no one right answer.  What might be best for interoperating with the existing telecom infrastructure is not necessarily best for pure OTT solutions.  What might work well for 1-1 calls doesn’t necessarily extend well to group conferencing.
  2. We’d still be arguing, and WebRTC wouldn’t be where it is today.  There are too many entrenched players with too much to lose from a WebRTC signaling standard that doesn’t match their objectives. If you think the codec debate is complicated, consider that the agendas behind it are trivial compared to the heat that a debate over a WebRTC signaling standard would generate.
  3. Getting it wrong would be disastrous. Standardizing on the wrong signaling protocol could easily limit the future potential of WebRTC.

Rather than slowing down WebRTC, the committee managed to skip years of unproductive debate.  They chose the “different strokes for different folks” path, opting to let specific implementations choose signaling that suits their needs.

After having spent a few years in the signaling weeds ourselves, we’ve developed a pretty strong point of view on the topic.  So as we introduce some new capability into the OpenTok platform, we thought this was a good time to share what we’ve learned.

Our Point of View

In our experience, signaling is bigger than offer/answer. It’s where the rubber meets the road at the intersection of use case, audience, scalability and interoperability. Signaling determines the future direction and potential of WebRTC itself.

At TokBox, our view on this is clear:

  • Our sole focus is on helping developers and enterprises add live video to web and mobile applications whatever their use case and business goal.
  • We are building towards a WebRTC-native world, creating an infrastructure platform whose core value lies in connecting WebRTC endpoints to each other in powerful, flexible, scalable and intelligent ways.
  • Enabling the customer’s application use case is our goal. Ease of connecting to the legacy telecom infrastructure is valuable, but secondary (and yes, we know that TokBox is owned by a large telco).

We have serious experience with how broad that range of applications might be. OpenTok predates WebRTC.  We’ve had a head start on solving these problems, and have had time to take a couple of kicks at the can.

We’ve Tried It All

Our initial forays into signaling infrastructure started with the basics. We watched closely as customers used OpenTok’s publish/subscribe model to build a broader range of video topologies than we initially anticipated. It was easy to imagine 1:1, symmetric video conferencing, and simple multi-stream broadcast use cases. But use cases that created hybrids of all of these (for monitoring, previewing, “side chat” enablement, and more) surprised us – and delighted us too!

In OpenTok, the signaling infrastructure is the backbone that establishes and coordinates video topologies. Along with these many topologies came a wide variety of demands on the signaling infrastructure: the ability to scale up, the ability to scale out, and low latency. These requirements were driven by the variety of applications, and by the need for our own infrastructure to communicate with itself to deliver the services our customers needed.

We looked at SIP. Who wouldn’t? It’s what the world of telecom works on today. But SIP was implemented with a more constrained problem space in mind. WebRTC aims to interoperate across endpoints and provide a much more dynamic call environment than was intended in traditional telephony scenarios.  While it was clear that being able to interoperate with one or more SIP endpoints would be important to some of our customers, interoperation alone didn’t justify limiting our WebRTC-native infrastructure with SIP-based signaling.

We looked at XMPP. In fact, we built an entire signaling and messaging infrastructure with XMPP. But after examining performance metrics and scaling characteristics, we concluded that XMPP wasn’t really right for us – or our customers. (It turns out we weren’t alone in this conclusion. Some of the largest messaging providers on the planet have outgrown XMPP at the core because it didn’t keep up. It’s interesting to note that in the CU-RTC-Web proposal, Microsoft also commented on the challenges of scalable signaling.)

At this point we stopped looking at off-the-shelf protocols, stepped back, and thought hard about what would really be needed in state-of-the-art messaging technologies.  Ultimately, we decided that we should build our own signaling infrastructure.

Rumor:  Extreme Signaling Performance

We built Rumor, a cloud-based message fabric that runs globally across the OpenTok infrastructure. It’s our own implementation, based on a state-of-the-art open-source socket library called ZeroMQ. Rumor can run on a single server, or it can scale up and out across instances within and across data centers.

We named our message fabric “Rumor” because nothing travels faster than one.

No surprise, it’s fast. And scalable. Rumor is so fast that we use the same signaling technology between our own internal cloud components that we use to talk to OpenTok customer application endpoints out at the last mile. A single Rumor instance can route, distribute and deliver tens of thousands of messages per second. We can cluster them together to scale.

Rumor is so important to us because it fuels OpenTok’s powerful event model – delivering connection, subscription, quality, and other event-based information that tells your application what’s going on across its OpenTok session. Rumor does it across all the use cases that we see out in the wild: 1-1+ messaging, group conferencing, large-scale broadcast (which can sometimes require that the same message be delivered to thousands of endpoints at the same moment) and more.
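To make the event model a little more concrete, here is a minimal sketch of event fan-out: one published event is handed to every endpoint that subscribed to that event type. This is a toy, in-process illustration and not Rumor itself. The class name EventFabric, its methods, and the event names are all hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A handler receives the event type and its payload.
Handler = Callable[[str, dict], None]


class EventFabric:
    """Toy in-process stand-in for a message fabric (hypothetical, not Rumor)."""

    def __init__(self) -> None:
        # Maps an event type to the handlers subscribed to it.
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a handler for one event type."""
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Deliver one event to every subscriber; return the delivery count."""
        handlers = list(self._subscribers.get(event_type, []))
        for handler in handlers:
            handler(event_type, payload)
        return len(handlers)


# Usage: three viewers subscribe to the same (hypothetical) event type.
fabric = EventFabric()
received = []
for viewer in range(3):
    fabric.subscribe(
        "connection_created",
        lambda etype, payload, v=viewer: received.append((v, etype, payload["id"])),
    )

delivered = fabric.publish("connection_created", {"id": "c1"})
```

A real fabric has to do this across processes, servers and data centers, which is where the hard scaling problems live. The sketch only shows the shape of the contract: publish once, deliver to many.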

Rumor makes this happen over a wide variety of transports. For modern WebRTC-equipped browsers, Rumor runs on WebSockets. For mobile device endpoints, it runs on raw sockets. For legacy reasons, there’s an RTMP implementation. Most recently, as a part of our Cloud Raptor SDK, we have extended Rumor’s reach to our customers’ servers, making it possible for your application’s central business logic to observe and react to the OpenTok event stream as well. The list goes on.
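One way to picture running the same messages over several transports is to keep the message encoding separate from the way each transport frames bytes. The sketch below assumes a JSON envelope and a 4-byte length prefix for stream-oriented sockets. Both are assumptions made for illustration, as are all class and function names; nothing here describes Rumor's actual wire format.

```python
import json
from abc import ABC, abstractmethod
from typing import List


class Transport(ABC):
    """Hypothetical transport interface: anything that can carry one frame."""

    @abstractmethod
    def send_frame(self, frame: bytes) -> None:
        ...


class WebSocketLikeTransport(Transport):
    """Message-oriented: each frame arrives as its own discrete message."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send_frame(self, frame: bytes) -> None:
        self.sent.append(frame)


class RawSocketLikeTransport(Transport):
    """Stream-oriented: frames are length-prefixed so a reader can split them."""

    def __init__(self) -> None:
        self.stream = bytearray()

    def send_frame(self, frame: bytes) -> None:
        self.stream += len(frame).to_bytes(4, "big") + frame


def encode_message(msg_type: str, data: dict) -> bytes:
    """Encode a message once, independent of the transport that carries it."""
    return json.dumps({"type": msg_type, "data": data}, sort_keys=True).encode("utf-8")


def deliver(transports: List[Transport], msg_type: str, data: dict) -> None:
    """Send the same encoded message over every transport."""
    frame = encode_message(msg_type, data)
    for transport in transports:
        transport.send_frame(frame)


def read_frames(stream: bytes) -> List[bytes]:
    """Split a length-prefixed byte stream back into individual frames."""
    frames: List[bytes] = []
    offset = 0
    while offset < len(stream):
        size = int.from_bytes(stream[offset:offset + 4], "big")
        offset += 4
        frames.append(bytes(stream[offset:offset + size]))
        offset += size
    return frames


# Usage: the same message reaches a browser-style and a mobile-style endpoint.
ws = WebSocketLikeTransport()
raw = RawSocketLikeTransport()
deliver([ws, raw], "quality", {"level": "good"})
```

The point of the split is that application code only ever sees message types and data, so adding another transport does not change anything above the Transport interface.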

Now For Your Application-Level Messages Too

But what’s the fun of having a phenomenal signaling layer for WebRTC if we keep all the benefits for internal use only? For more than a year now, Rumor has been providing the control fabric for every OpenTok session around the planet.  More recently, we’ve been using Rumor to control and manage specialized workflows in OpenTok for Customer Service, including customer queuing and call transitions. So we thought it was time to extend the value of Rumor out to our customers.

Every developer knows that as soon as you have connected two or more users of an application with video, you have turned that app into a collaborative application. And that means that you probably need to start sending application-level messages back and forth between the endpoints in order to coordinate application activity. Given that we have a real-time video connection among the endpoints, along with an uber-powerful message fabric, it seems as though letting you send some additional messages around using that fabric shouldn’t be that hard.

So we’re now taking this proven technology out of the barn, and giving customers access to it. Using some very simple primitives in the OpenTok API, customers can now take advantage of the Rumor message fabric to send their own messages between specific endpoints, broadcast their own messages to every endpoint in a session, or anything in between.

Using the signaling API, you can implement text chat.  You can send chess moves.  You can control a robot in real time.  You can do this seamlessly within your OpenTok environment, using all the OpenTok constructs that already exist within your application.  And if you’re using our Cloud Raptor SDK to let your server watch all the OpenTok events moving around in your OpenTok sessions, you will now be able to observe all these application-level messages as well.
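As a rough illustration of the three delivery patterns described above (one specific endpoint, every endpoint in a session, or anything in between), here is a small in-memory sketch. It is not the OpenTok API. ToySession, its connect and signal methods, and the signal type names are hypothetical, and in this sketch a broadcast also reaches the sender.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A handler receives (sender_id, signal_type, data).
SignalHandler = Callable[[str, str, str], None]


class ToySession:
    """Hypothetical in-memory session used only to illustrate routing."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, SignalHandler] = {}

    def connect(self, connection_id: str, handler: SignalHandler) -> None:
        """Register an endpoint and the callback that receives its signals."""
        self._endpoints[connection_id] = handler

    def signal(
        self,
        sender_id: str,
        signal_type: str,
        data: str,
        to: Optional[Iterable[str]] = None,
    ) -> int:
        """Send to the listed endpoints, or to everyone when `to` is None."""
        if to is None:
            targets = list(self._endpoints)
        else:
            # Silently skip recipients that are not connected.
            targets = [cid for cid in to if cid in self._endpoints]
        for cid in targets:
            self._endpoints[cid](sender_id, signal_type, data)
        return len(targets)


# Usage: a chat line goes to everyone, a chess move goes to one opponent.
inboxes: Dict[str, List[Tuple[str, str, str]]] = {
    name: [] for name in ("alice", "bob", "carol")
}
session = ToySession()
for name, inbox in inboxes.items():
    session.connect(
        name,
        lambda sender, kind, data, inbox=inbox: inbox.append((sender, kind, data)),
    )

session.signal("alice", "chat", "hello everyone")          # broadcast
session.signal("alice", "chess-move", "e2e4", to=["bob"])  # directed
```

Tagging each message with a type lets a single channel carry chat, game moves and control commands side by side, with each endpoint reacting only to the types it cares about.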

Keeping an Eye on the Future by Answering the Right Question

SIP? XMPP? JSON? Rumor? The right answer to the signaling question probably depends a lot on your starting point and on what you’re trying to accomplish.

At TokBox, we’re trying to invent the future. We think signaling is a very big deal for WebRTC and that the standard has provided a lot of running room for innovation.

With years of experience deploying live video applications, we’ve built a signaling infrastructure for a native WebRTC ecosystem that enables applications that know no bounds. Now that we’re opening up Rumor to customer use, we’re taking the next step in enabling that ecosystem.

And of course, we’re far from done.  Still in the lab, we have a new plug-compatible implementation (appropriately named Gossip) that is even faster than Rumor. An order of magnitude faster. We’ll let you know when we take that out for a spin.