About the Public Beta feature to enable VP9 in routed sessions, with support for Scalable Video Coding (SVC)
The Public Beta program is available for all partners to preview and evaluate the feature and provide feedback on the implementation. Please be aware that it may be necessary to modify code written during the Public Beta phase after the product is made generally available.
VP9 is an open and royalty-free video coding format developed by Google. It serves as the successor to VP8, offering greater compression efficiency. This means VP9 can encode higher-quality video at the same bitrate as VP8, although it may require more processing power to do so.
One of the key advantages of VP9 is its support for Scalable Video Coding (SVC). SVC allows a single video stream to contain multiple spatial and temporal qualities within it. This enables an SFU (Selective Forwarding Unit), such as the Vonage Video Media Router, to forward different resolutions and frame rates to each client subscribed to an SVC-capable publisher. This is more efficient than simulcast, where the publisher sends multiple video streams at different resolutions, as commonly done with VP8.
The scalability mode in SVC defines the number and types of spatial and temporal layers in an SVC stream, as well as the dependencies between them. For further details, refer to the W3C WebRTC SVC specification. The scalability mode is defined for a publisher, and the Vonage Video Media Router will take care of forwarding the appropriate stream among the available ones to the subscriber.
When publishing a camera stream, the Vonage Video API supports the L1T3_KEY, L2T3_KEY, and L3T3_KEY scalability modes (three temporal layers with a variable number of spatial layers) for both web and native clients.
Please note that the number of spatial layers will be adjusted automatically based on the camera stream resolution and the estimated bandwidth between the publisher and the Vonage Video Media Router. With sufficient bandwidth, the spatial layers for FHD are 1920x1080, 960x540, and 480x270 pixels. For HD, they are 1280x720, 640x360, and 320x180. For SD, there are only two spatial layers: 720x480 and 360x240. Similarly, the temporal layers will be 30 FPS for the highest temporal layer, 15 FPS for the middle temporal layer, and 7.5 FPS for the lowest temporal layer.
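For reference, the scalability mode surfaces in the browser through the W3C WebRTC-SVC extension as a property of the sender's encoding parameters. The Vonage Client SDKs select and apply the mode for you when you publish, so the following browser-level sketch only illustrates the underlying mechanism and is not SDK code:
// Browser-level illustration of the W3C WebRTC-SVC API; the Vonage SDKs handle this internally.
async function illustrateSvcPublish() {
  const pc = new RTCPeerConnection();
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();
  // Request three spatial and three temporal layers (VP9 must also be the negotiated codec).
  pc.addTransceiver(track, {
    direction: 'sendonly',
    sendEncodings: [{ scalabilityMode: 'L3T3_KEY' }],
  });
  // The mode actually applied can be read back from the sender parameters.
  const [sender] = pc.getSenders();
  console.log(sender.getParameters().encodings[0].scalabilityMode);
}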
For both web and native clients, when publishing a screen-sharing stream, the scalability mode is determined through an option set during publishing (more details on the Scalable Video page). If the option is disabled, the stream will use L1T1 mode, offering a single spatial and temporal layer (no scalability). However, when it is enabled, the system will dynamically select between L1T1, L2T1, or L3T1 based on the estimated bandwidth between the publisher and the Vonage Video Media Router, as well as the resolution of the screen-sharing stream.
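As a sketch of that publishing option, a scalable screen-sharing publisher created with the JS SDK might look like the following; the option name scalableScreenshare is an assumption here, so check the Scalable Video page for the exact name in your SDK version:
// Sketch: publish a screen-sharing stream with scalable video enabled.
// The scalableScreenshare option name is an assumption; see the Scalable Video page.
// Assumes an existing, connected Session object named session.
const screenPublisher = OT.initPublisher('screen-preview', {
  videoSource: 'screen',
  scalableScreenshare: true, // false (or omitted) keeps the stream in L1T1 mode
}, (error) => {
  if (error) console.error('Screen publisher failed to initialize:', error.message);
});
session.publish(screenPublisher);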
In simulcast (used with VP8), the publisher sends multiple independent video streams, each consisting of different temporal layers with varying resolutions and bitrates, to the media server. These encoded temporal layers within multiple streams allow the server to dynamically select the most appropriate stream and layer for each subscriber based on network conditions. While this method enhances efficiency by adjusting video quality in real-time, it requires the publisher to encode (which increases CPU load) and transmit (which increases network load) several versions of the same stream to accommodate different conditions.
In contrast, SVC (Scalable Video Coding), as supported in VP9, embeds multiple spatial and temporal qualities into a single stream. The media server can then extract and forward the appropriate layer to each subscriber without requiring the publisher to send multiple streams. This makes SVC more bandwidth-efficient and improves the efficiency of the publisher's workload compared to simulcast.
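At the WebRTC level, the difference shows up in the sender's encoding parameters: simulcast is expressed as several independent encodings at different scales, while SVC is a single encoding that carries a scalability mode. The following browser-level sketch (not SDK code) contrasts the two:
// Browser-level sketch contrasting simulcast and SVC sender configuration.
async function contrastSimulcastAndSvc() {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [cameraTrack] = stream.getVideoTracks();

  // Simulcast (typical with VP8): three independent encodings are produced and sent.
  const simulcastPc = new RTCPeerConnection();
  simulcastPc.addTransceiver(cameraTrack, {
    direction: 'sendonly',
    sendEncodings: [
      { rid: 'q', scaleResolutionDownBy: 4 },
      { rid: 'h', scaleResolutionDownBy: 2 },
      { rid: 'f', scaleResolutionDownBy: 1 },
    ],
  });

  // SVC (VP9): a single encoding whose spatial and temporal layers travel in one stream.
  const svcPc = new RTCPeerConnection();
  svcPc.addTransceiver(cameraTrack, {
    direction: 'sendonly',
    sendEncodings: [{ scalabilityMode: 'L3T3_KEY' }],
  });
}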
You can proceed to test the feature by selecting VP9 as the preferred video codec from the Project page of your Video API account.
Our platform offers two archiving modes: Composed Archives and Individual Archives, and both will continue to work when VP9 is selected as the preferred codec. In Composed Archives, recordings are stored as composed MP4 files with H.264 video and AAC audio, delivering a single, finalized output. In contrast, Individual Archives store each participant's media separately as WebM streams with VP9 SVC (Scalable Video Coding) video.
Support for VP9 SVC in both commercial and open-source players may vary, depending on the version and specific implementation. For instance, when playing back Individual Archives, not all players can handle SVC-encoded streams correctly. In the case of FFmpeg, only the libvpx-vp9 decoder is able to properly decode these streams. For correct playback, the following command can be used:
ffplay -vcodec libvpx-vp9 vp9_with_svc.webm
Additionally, if you need to convert an SVC-encoded WebM file to a standard VP8 stream without SVC (for broader compatibility), you can use the following FFmpeg command to transcode the video:
ffmpeg -c:v libvpx-vp9 -i vp9_with_svc.webm -c:v libvpx vp8.webm
This transcoding strips out the SVC layers, ensuring the video can be played by players that don't fully support VP9 or SVC.
Additionally, if you want to strip out the SVC layers while maintaining the VP9 codec, you can use:
ffmpeg -c:v libvpx-vp9 -i vp9_with_svc.webm vp9_without_svc.webm
For the Public Beta phase, we are aware of the following limitations and known issues:
As of 2024, VP9 is fully supported across all major browsers. This includes Google Chrome, Firefox, Microsoft Edge, and Opera, which have offered complete VP9 compatibility since around 2016. Apple's Safari 15+ has support for VP9.
As of 2025, most modern devices fully support VP9 for WebRTC services. This includes desktops, laptops, and mobile devices running recent versions of major browsers such as Google Chrome, Firefox, Microsoft Edge, Opera, and Safari (from version 15 onwards). Some of these browsers can encode and decode VP9 in hardware, but hardware support is not as widespread as software support; for example, even when the hardware is capable of encoding and decoding VP9, our native SDKs will use software video coding for VP9.
SVC support, on the other hand, is less widespread in both hardware and software.
Note: VP9 is supported on Firefox, but SVC is not.
VP9 offers improved video compression over VP8, with the tradeoff being higher CPU load. Modern device models from premier brands (e.g., Apple iPhone, Google Pixel, Samsung Galaxy) can be expected to perform well.
Another feature with similar CPU constraints is the Vonage Media Processor API. Devices that meet those requirements can be expected to handle VP9 well.
Recommended devices by platform:
The information on the codecs page applies. However, full VP9 and SVC support, including improvements developed during the Early Access phase, is available starting with release 2.29. If you are unable to upgrade, basic VP9 and SVC support is available starting with Web and native client SDK version 2.27.
If an endpoint does not support VP9, the behavior depends on its role: a publisher will fall back to VP8, while a subscriber will not be able to subscribe to video and will receive only audio.
If an endpoint supports VP9 but not SVC, VP9 will still be negotiated for both publishers and subscribers, but without scalability support; the endpoint will transparently use VP9 without SVC.
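To anticipate these fallbacks on a given client, you can query which codecs it can encode and decode before publishing or subscribing. The sketch below assumes the OT.getSupportedCodecs() helper described in the JS SDK reference:
// Sketch: check VP9 support on this client.
// OT.getSupportedCodecs() is assumed as described in the JS SDK reference.
OT.getSupportedCodecs().then(({ videoEncoders, videoDecoders }) => {
  if (!videoEncoders.includes('VP9')) {
    console.log('Publishing from this client will fall back to VP8.');
  }
  if (!videoDecoders.includes('VP9')) {
    console.log('This client cannot decode VP9; it would receive audio only for VP9 streams.');
  }
});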
As with any other codec, the Video Inspector tool will display the codec, resolution, and frame rate in the Quality Metrics module. Just mouse over any point on a plotted line to see the codec in use.
The SDKs provide methods to retrieve the RTCStatsReport object for each stream, which includes the audio and video codecs in use. For code samples, have a look at the Subscriber Reference Manual or the Publisher Reference Manual for JS. For Linux, you can find additional information here.
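As a sketch, assuming the getRtcStatsReport() method described in the JS Subscriber Reference Manual, the video codec in use can be read from the inbound-rtp and codec entries of the report:
// Sketch: read the video codec in use from a subscriber's RTCStatsReport.
// Assumes subscriber is an existing Subscriber instance and getRtcStatsReport()
// behaves as described in the JS Subscriber reference.
subscriber.getRtcStatsReport().then((report) => {
  report.forEach((stats) => {
    if (stats.type === 'inbound-rtp' && stats.kind === 'video') {
      const codec = report.get(stats.codecId); // matching 'codec' stats entry
      console.log('Video codec in use:', codec && codec.mimeType); // e.g. 'video/VP9'
    }
  });
});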