What the CU-RTC-Web vs. WebRTC debate means for developers

About six months ago, Microsoft released CU-RTC-Web[1], an alternative proposal to the W3C WebRTC 1.0 Working Draft[2]. The WebRTC Working Group draws its members from across the industry, including names like Nokia, Cisco, Google, and Mozilla. The Microsoft proposal raised two important questions: how the Working Group would react to criticism of its draft, and whether Microsoft would accept the Working Group's published APIs even if CU-RTC-Web were not adopted. So what exactly does this mean for the development community?

The Microsoft draft outlines a low-level API that gives developers more direct access to the underlying network and media delivery components. It exposes objects representing network sockets and gives the application explicit control over the media transport[3]. In contrast, the WebRTC API abstracts these details behind a text-based interface that passes encoded strings between the two participants in the call. With the WebRTC draft, developers are responsible for passing the strings between the communicating browsers, but not for explicitly configuring the media transport for a video chat.
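To make that division of labor concrete, here is a minimal sketch of the string-passing flow. It borrows the promise-based method names (createOffer, createAnswer, setLocalDescription, setRemoteDescription) from later versions of the WebRTC specification, so the draft discussed here may differ in detail. The PeerLike interface and the Relay function are assumptions made for illustration: PeerLike stands in for a browser peer connection, and Relay stands in for whatever channel the application uses to move strings between browsers.

```typescript
// Minimal structural types so the sketch runs outside a browser.
// Assumption: a real browser peer connection offers methods of this shape.
interface Description {
  type: "offer" | "answer";
  sdp: string; // the encoded string that the application must deliver
}

interface PeerLike {
  createOffer(): Promise<Description>;
  createAnswer(): Promise<Description>;
  setLocalDescription(desc: Description): Promise<void>;
  setRemoteDescription(desc: Description): Promise<void>;
}

// Hypothetical application-defined transport (WebSocket, polling, ...):
// delivers an offer to the other browser and resolves with its answer.
type Relay = (offer: Description) => Promise<Description>;

// Caller side: create an offer, ship it, then apply the answer that comes back.
async function placeCall(caller: PeerLike, relayToCallee: Relay): Promise<void> {
  const offer = await caller.createOffer();
  await caller.setLocalDescription(offer);
  const answer = await relayToCallee(offer);
  await caller.setRemoteDescription(answer);
}

// Callee side: apply the received offer and produce an answer to send back.
async function acceptCall(callee: PeerLike, offer: Description): Promise<Description> {
  await callee.setRemoteDescription(offer);
  const answer = await callee.createAnswer();
  await callee.setLocalDescription(answer);
  return answer;
}
```

Notice that nothing in the sketch touches sockets or transport settings. The application only moves opaque strings around, which is exactly the part of the design that CU-RTC-Web objects to.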

In terms of functionality, and in principle interoperability, these two approaches are equivalent. The text strings used by the WebRTC API are formatted according to the Session Description Protocol (SDP) [4]. SDP is widely used for setting up SIP and other VoIP phone calls, and carries the configuration two parties need to initiate a call. CU-RTC-Web argues that SDP is not appropriate for use in a Web API, and that the limited set of endpoints that understand SDP makes it too restrictive. As a result, the bulk of the difference between the two drafts is that CU-RTC-Web re-imagines the functionality of SDP in JavaScript.
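For readers who have never looked inside one of these strings: an SDP description is a series of text lines, each of the form type=value with a single-character type. The sketch below splits a description into those lines and pulls codec names out of the rtpmap attributes. The function names and the sample description are illustrative assumptions. The sample is hand-written and heavily trimmed, not captured from a real browser.

```typescript
// One SDP line has the form <type>=<value>, where <type> is a single character.
interface SdpLine {
  type: string;
  value: string;
}

// Split a description into its lines, skipping anything not shaped like x=...
function parseSdp(sdp: string): SdpLine[] {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.length >= 2 && line.charAt(1) === "=")
    .map((line) => ({ type: line.charAt(0), value: line.slice(2) }));
}

// Collect codec names from "a=rtpmap:<payload type> <codec>/<clock rate>" lines.
function codecNames(sdp: string): string[] {
  const names: string[] = [];
  for (const line of parseSdp(sdp)) {
    const match = line.type === "a" ? /^rtpmap:\d+ ([^\/\s]+)\//.exec(line.value) : null;
    if (match) {
      names.push(match[1]);
    }
  }
  return names;
}

// Hand-written sample, trimmed for illustration (assumption, not browser output).
const sampleSdp = [
  "v=0",
  "o=- 0 0 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=video 9 RTP/SAVPF 100",
  "a=rtpmap:100 VP8/90000",
].join("\r\n");

console.log(codecNames(sampleSdp)); // prints [ 'VP8' ]
```

A JavaScript object model along the lines CU-RTC-Web proposes would expose the same information as properties instead of asking applications to parse text like this.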

While it is fairly straightforward to translate between the WebRTC 1.0 use of SDP and the equivalent CU-RTC-Web JavaScript interface definitions, there is an additional hurdle that would prevent interoperability. There is an ongoing discussion about required video codecs, focused on the choice between two competing video codec specifications, H.264 and VP8. H.264 is used by default for YouTube and iTunes video content. Many modern mobile devices also offer native H.264 processing. However, many in the W3C Working Group have raised concerns that the patents on H.264 are too restrictive for it to be used as a core Web technology. As an alternative, Google offers the VP8 codec, which it acquired in 2010 along with On2 Technologies and released royalty-free, with the reference implementation under a BSD-style license and the bitstream specification under Creative Commons Attribution 3.0. Right now, Firefox and Chrome are using VP8 for WebRTC support. Microsoft has not yet committed its browser to a video format. Until browser vendors converge on this decision, the cost of interoperability will remain prohibitive for most web application developers.

Shortly after Microsoft announced CU-RTC-Web, the W3C Working Group voted on whether to incorporate the alternative ideas from the Microsoft proposal. The Working Group decided to continue using SDP to configure media transport[5]. Microsoft is moving forward with CU-RTC-Web regardless, and recently released a demo using the proposed API[6].

As far as web developers are concerned, this is both good news and bad. While it appears that face-to-face communication will be supported in all browsers, those browsers may not work together seamlessly. For OpenTok users, there’s no need to worry. We will continue to offer a consistent API that bridges the technical gaps left behind by the standards development process. OpenTok on WebRTC will work across all browsers and will tackle the interoperability issues of today and tomorrow.