WebRTC for Enterprise: Challenges and Solutions


WebRTC is changing the way enterprises communicate within their organization and with their customers.

Because WebRTC serves such a large and diverse range of use cases in the Enterprise world, a number of challenges inevitably need to be addressed. We’ve compiled a list of the key challenges, and solutions to consider, when implementing WebRTC for Enterprise: Signaling, Multi-party, Interoperability, Quality and Scalability.


SIP? XMPP? JSON? Rumor? The right answer to the signaling question probably depends a lot on your starting point and on what you’re trying to accomplish.

Many people think signaling should be standardized, while others think we already have the answer in SIP or REST. Some maintain that the lack of a signaling specification (beyond the need to support SDP offer/answer) is a huge gap in the WebRTC standard.

We think that leaving signaling requirements out was the smartest thing that the key drivers of the standard could have done. There is no one right answer.  What might be best for interoperating with the existing telecom infrastructure is not necessarily best for pure OTT solutions.  What might work well for 1-1 calls doesn’t necessarily extend well to group conferencing.

In our experience, signaling is bigger than offer/answer. It’s where the rubber meets the road at the intersection of use case, audience, scalability and interoperability.
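To make "bigger than offer/answer" concrete, here is a minimal sketch of a transport-agnostic JSON message envelope. It is an illustration only: the field names (`type`, `from`, `payload`) and the session-level message types (`join`, `leave`) are our own assumptions, not part of WebRTC or of any particular signaling protocol.

```python
import json

# SDP negotiation messages (offer/answer is the one part WebRTC requires).
NEGOTIATION_TYPES = {"offer", "answer"}
# Hypothetical application-level messages; real services define their own.
SESSION_TYPES = {"join", "leave"}
KNOWN_TYPES = NEGOTIATION_TYPES | SESSION_TYPES


def encode_message(msg_type: str, sender: str, payload: dict) -> str:
    """Serialize one signaling message as JSON text."""
    if msg_type not in KNOWN_TYPES:
        raise ValueError(f"unknown message type: {msg_type!r}")
    return json.dumps({"type": msg_type, "from": sender, "payload": payload})


def decode_message(raw: str) -> dict:
    """Parse JSON text and reject messages with an unknown type."""
    message = json.loads(raw)
    if message.get("type") not in KNOWN_TYPES:
        raise ValueError(f"unknown message type: {message.get('type')!r}")
    return message
```

The same envelope could travel over a WebSocket, XMPP, or a REST call; the choice of transport is exactly the part that depends on your use case.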




Although WebRTC specifies little beyond one-to-one audio and video calls, one of its most popular uses today is multiparty video conferencing. Don’t think just about traditional meeting rooms; there is a wide range of other use cases, including e-learning, customer support, and real-time broadcasting. In each case, the core capability is distributing media streams from multiple sources to multiple destinations. So if you are a service provider, how can you implement a multi-party topology with WebRTC endpoints?

There are several different architectures that may be suitable depending on your requirements. These architectures basically revolve around two axes:

  • Centralized vs Peer-to-Peer (P2P) and
  • Mixing vs Routing.

1. Mesh solution

The Mesh approach is the simplest solution. It has been popular among new WebRTC service providers because it requires no initial infrastructure. The architecture is based on creating multiple one-to-one streams from every sender to every possible destination.

2. Mixer solution

The Mixer approach is the traditional solution for multi-party conferencing, and has been used for years with great success. This success can be credited to the fact that it requires the least amount of intelligence in the endpoints. The architecture is based on having a central point maintain a single one-to-one stream with each participant. The central element receives and mixes the incoming audio and video streams to generate a single stream out to every participant. A common term in the video conference industry for this centralized element is Multipoint Control Unit (MCU). In practice, use of an MCU usually refers to a mixer solution.

[Figure: Mixer solution]

3. Router solution

The Router (or relay) approach was popularized by H.264 SVC infrastructures, and it is the architecture used by most of the new WebRTC platforms that started without any legacy baggage. The architecture is based on having a central point receive a stream from every sender and forward each of those streams to every other participant. This central point only does packet inspection and forwarding; it does not perform expensive encoding and decoding of the actual media.

[Figure: Router solution]
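The difference between the three architectures is easiest to see by counting streams. The sketch below is a simplified model, assuming every participant both sends and receives one media stream per peer and that the mixer produces one outgoing stream per participant; the names `TopologyLoad` and `topology_load` are ours, purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologyLoad:
    uplinks_per_endpoint: int    # streams each participant sends
    downlinks_per_endpoint: int  # streams each participant receives
    server_streams_in: int       # streams arriving at the central point
    server_streams_out: int      # streams leaving the central point
    server_transcodes: bool      # True if the central point decodes/encodes media


def topology_load(topology: str, participants: int) -> TopologyLoad:
    """Count streams for a call in which everyone sends to everyone else."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    n = participants
    if topology == "mesh":
        # No central point: one stream to and from every other participant.
        return TopologyLoad(n - 1, n - 1, 0, 0, False)
    if topology == "mixer":
        # One stream up, one mixed stream down; the server does the mixing.
        return TopologyLoad(1, 1, n, n, True)
    if topology == "router":
        # One stream up; the server forwards it unmodified to the others.
        return TopologyLoad(1, n - 1, n, n * (n - 1), False)
    raise ValueError(f"unknown topology: {topology!r}")


if __name__ == "__main__":
    for name in ("mesh", "mixer", "router"):
        print(name, topology_load(name, 4))
```

In this model, mesh uplink cost grows with every participant added, the mixer keeps endpoints cheap at the price of server-side transcoding, and the router moves the fan-out to the server without touching the media.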

Which architecture should I use?

There is no simple answer.  In fact some commercial solutions include support for all of them, in order to optimize different customers’ use cases.  However, there are some general rules of thumb that you can use.

  • If you are providing an audio only service, or need interoperability with legacy devices, then the Mixer architecture is likely the most appropriate for you.   Also, in some cases where the cost of the infrastructure is not an issue, and the participants have very heterogeneous connectivity, this can be a good solution.

  • If you are building a service for users with really good connections and powerful devices (e.g. an internal corporate service), and the number of participants is limited, then you may get good results with a Mesh architecture.

  • In general, if you are providing a large scale service, preference should be given to the Router approach. At the end of the day, the router solution is closest to the Internet paradigm of putting the intelligence at the edge of the network, to achieve better scalability and flexibility when building the end user applications.
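These rules of thumb can be written down as a small decision function. This is only a sketch of the heuristics above, not a definitive selector: the function name, the parameter names, and especially the `small_call_limit` threshold of 4 are assumptions chosen for illustration.

```python
def suggest_topology(
    participants: int,
    audio_only: bool = False,
    legacy_interop: bool = False,
    heterogeneous_links: bool = False,
    infrastructure_cost_ok: bool = False,
    strong_endpoints: bool = False,
    small_call_limit: int = 4,  # hypothetical cut-off for "limited" participants
) -> str:
    """Return "mixer", "mesh" or "router" following the rules of thumb."""
    # Audio-only services and legacy interoperability favour a mixer.
    if audio_only or legacy_interop:
        return "mixer"
    # So do very mixed connectivity when infrastructure cost is not an issue.
    if heterogeneous_links and infrastructure_cost_ok:
        return "mixer"
    # Good connections, powerful devices and few participants: mesh can work.
    if strong_endpoints and participants <= small_call_limit:
        return "mesh"
    # Otherwise, and for large scale services in general, prefer a router.
    return "router"
```

For example, `suggest_topology(3, strong_endpoints=True)` returns `"mesh"`, while the same call with 8 participants returns `"router"`.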




Many of the most interesting use cases we have seen for WebRTC involve a mobile endpoint running a native application that embeds WebRTC. That being said, WebRTC was designed primarily for browsers and does not specifically address mobile endpoints. That means there are additional requirements for optimizing WebRTC support on mobile.

For example, there is no access to hardware-based encoders and decoders, which means that highly efficient software encoding and decoding techniques need to be used. This in turn has implications for battery usage.

To address interoperability, we recommend using a third-party platform such as OpenTok.


WebRTC is essentially defined as a peer-to-peer protocol for real-time browser-based communication. The problem is that countless real-world applications require multi-party support.

Consider how the standard WebRTC approach would handle differing network conditions in a four-person call:

  • User A publishes independently to Users B, C and D.
  • User A has good upload bandwidth and Users B and C have good download bandwidth.
  • User D has poor download bandwidth and is not able to subscribe to User A’s stream in a bandwidth-efficient manner.

The easiest approach to this problem would be to downgrade the quality of User A’s stream to the lowest common denominator, which in this case is the maximum quality that User D can subscribe to. But this essentially penalizes Users B and C.
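A few lines of arithmetic show the cost of the lowest common denominator. The sketch below uses made-up bitrates (in kbps) and contrasts it with a hypothetical per-subscriber adaptation, where each receiver gets the best quality its own connection allows; both function names are ours.

```python
def lowest_common_denominator(publisher_kbps: int, subscriber_kbps: dict) -> dict:
    """One quality for everyone, capped by the slowest subscriber."""
    rate = min(publisher_kbps, min(subscriber_kbps.values()))
    return {name: rate for name in subscriber_kbps}


def per_subscriber(publisher_kbps: int, subscriber_kbps: dict) -> dict:
    """Each subscriber is limited only by its own download bandwidth."""
    return {
        name: min(publisher_kbps, download)
        for name, download in subscriber_kbps.items()
    }


if __name__ == "__main__":
    # Illustrative numbers only: User A can publish 1200 kbps,
    # Users B and C have good downlinks, User D has a poor one.
    downloads = {"B": 2000, "C": 1500, "D": 300}
    print(lowest_common_denominator(1200, downloads))
    print(per_subscriber(1200, downloads))
```

With these numbers, the lowest common denominator delivers 300 kbps to everyone, while per-subscriber adaptation would leave Users B and C at 1200 kbps and only User D at 300 kbps.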

As you can imagine, this problem is exacerbated in large multi-party calls, or even in smaller calls with mobile participants who may be moving across highly variable networks.

Regulatory compliance in some industries may also require a trade-off between security and the quality and efficiency of real-time communications.

Third-party platforms address this issue by providing technology that adapts to network conditions, delivering the best possible experience for all participants whether they are on a browser, on a mobile device, or behind a corporate firewall. That includes capabilities such as bandwidth optimization, audio fallback, video recovery, and dynamic frame-rate controls.

Learn how:

How to improve the quality of your experience.

How to implement Audio fallback.

How to traverse firewalls.