Exploring Media over QUIC: A Superior Alternative to WebRTC?

Tencent MPS-Dev Team

Jun 19, 2024

What is QUIC?

QUIC (originally an acronym for Quick UDP Internet Connections) is a transport protocol first developed by Google and built on UDP (User Datagram Protocol). It combines functionality traditionally spread across TCP, TLS, and HTTP/2 to provide reliability while reducing network latency. Because it runs over UDP, which is a much simpler transport protocol, QUIC implements its own acknowledgments, retransmissions, and congestion control instead of inheriting TCP's, and it merges the transport and cryptographic handshakes so that a secure connection can be established in just one round trip time. It also provides stream multiplexing similar to HTTP/2, and HTTP/3 adds header compression on top of it.

UDP by itself is a lighter protocol than TCP: it has no connection setup and no built-in acknowledgments. TCP is reliable, but it incurs additional overhead because both parties must acknowledge data. Additionally, TCP is implemented in the operating system kernel, making it challenging to upgrade the protocol without requiring users to upgrade their systems. In contrast, QUIC runs in user space on top of UDP, so applications can ship and update it themselves; integration only requires that the server supports it as well.

Development of the HTTP protocol

1. HTTP History:

HTTP 0.9 (1991): Only supported the GET method and did not include request headers.
HTTP 1.0 (1996): Basic version with support for request headers, rich text, status codes, caching, but lacked connection reuse.
HTTP 1.1 (1999): Introduced connection reuse, chunked transfer encoding, and support for resumable downloads.
HTTP 2.0 (2015): Introduced binary framing, multiplexing, header compression, server push, and more.
HTTP 3.0 (2018): Google first deployed the QUIC protocol in 2013. In October 2018, the HTTP and QUIC working groups of the IETF jointly decided to name the mapping of HTTP over QUIC "HTTP/3" on its way to becoming a global standard. It was published as RFC 9114 in June 2022.

2. HTTP 1.0 and HTTP 1.1:

Head-of-line blocking: The next request must wait for the previous request to complete, leading to underutilization of bandwidth and blocking subsequent requests (HTTP 1.1 attempted to address this with pipelining, but the inherent FIFO mechanism still caused head-of-line blocking).
High protocol overhead: Large content in headers without compression increases transmission costs.
Unidirectional requests: Only allows one-way requests, where the client requests and the server responds.

Differences between HTTP 1.0 and HTTP 1.1:
- HTTP 1.0: Only supports short-lived TCP connections (no connection reuse), lacks support for resumable downloads, and suffers from head-of-line blocking.
- HTTP 1.1: Supports persistent connections by default (reusable TCP connections), supports resumable downloads (through range request headers), improves cache control, adds pipelining (multiple requests sent at once, but responses still return sequentially, so head-of-line blocking remains), adds new error status codes, and requires the Host header field in request messages.

3. HTTP2:

HTTP2 aimed to address some of the issues in HTTP1 but couldn't solve the head-of-line blocking problem at the underlying TCP protocol level.

Binary framing: Transferring data in binary format for more efficient parsing compared to text-based formats.
Multiplexing: Redefined the underlying HTTP semantics, allowing bidirectional data streams on a single connection. Multiple requests can be sent over the same TCP connection simultaneously, avoiding the latency of establishing new connections and reducing memory consumption. Slow or earlier requests won't block the return of other requests.
Header compression: Reduces redundant data in requests, reducing overhead.
Server push: Allows the server to proactively push necessary resources to the client, reducing latency.
Stream prioritization: Control over data transmission priority, enabling more flexible and powerful page control.
Resettable: Ability to stop data transmission without interrupting the TCP connection.

Drawback: In HTTP2, when multiple requests are in a single TCP pipeline and a packet is lost, the performance is worse than HTTP1.1. TCP has a "packet loss retransmission" mechanism, where lost packets must wait for retransmission confirmation, causing the entire TCP connection to wait. In contrast, HTTP1.1 can open multiple TCP connections, so if this situation occurs, it only affects one connection, and the remaining connections can still transmit data normally.

4. HTTP3 - HTTP Over QUIC:

HTTP is built on top of TCP, and all the bottlenecks and optimization techniques of HTTP are based on the characteristics of TCP. Although HTTP2 implemented multiplexing, it didn't solve the underlying TCP protocol-level issues. HTTP3 with QUIC was developed to address the TCP issues in HTTP2.

Key features of QUIC

Many articles already introduce the principles behind QUIC, so here I will only list some of its most important features. These features are key to the widespread adoption of QUIC, and different businesses can use them to optimize their operations. They also serve as entry points for providing QUIC services.

1. Connection Migration:

- TCP Reconnection Issue: A TCP connection is identified by a 4-tuple (source IP, source port, destination IP, destination port). Connection migration means keeping a connection alive even when one of these elements changes, so that business logic is not interrupted. With TCP this is not possible: a client-side change, such as a new IP address after switching from Wi-Fi to 4G, requires reestablishing the connection.
- UDP-based QUIC Connection Migration: QUIC identifies connections by a randomly generated connection ID (64 bits in Google's original design, variable-length up to 160 bits in the IETF version) rather than the IP and port 4-tuple used by TCP. This means that even if the IP or port changes, as long as the ID remains the same, the connection remains intact, and the upper-layer business logic remains uninterrupted. The probability of ID conflicts is extremely low due to the random generation and length of the ID.
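To make the difference concrete, here is a minimal sketch of the two lookup strategies. The class names, addresses, and the 8-byte ID length are illustrative assumptions, not taken from any QUIC implementation, and a real stack would also validate the new path before trusting it.

```python
import os


class TcpStyleTable:
    """Connections keyed by the 4-tuple, as TCP does."""

    def __init__(self):
        self.conns = {}

    def open(self, src_ip, src_port, dst_ip, dst_port, state):
        self.conns[(src_ip, src_port, dst_ip, dst_port)] = state

    def lookup(self, src_ip, src_port, dst_ip, dst_port):
        return self.conns.get((src_ip, src_port, dst_ip, dst_port))


class QuicStyleTable:
    """Connections keyed by a random connection ID carried in each packet."""

    def __init__(self):
        self.conns = {}

    def open(self, state, id_len=8):
        cid = os.urandom(id_len)  # hypothetical 8-byte (64-bit) ID
        self.conns[cid] = state
        return cid

    def lookup(self, cid, src_ip, src_port):
        state = self.conns.get(cid)
        if state is not None:
            # The peer moved: remember its new address, keep the session.
            state["peer"] = (src_ip, src_port)
        return state


# The client starts on Wi-Fi (10.0.0.5) and then moves to 4G (100.64.0.9).
tcp = TcpStyleTable()
tcp.open("10.0.0.5", 50000, "203.0.113.1", 443, {"session": "alice"})
tcp_after_move = tcp.lookup("100.64.0.9", 41000, "203.0.113.1", 443)

quic = QuicStyleTable()
cid = quic.open({"session": "alice"})
quic_after_move = quic.lookup(cid, "100.64.0.9", 41000)
```

After the address change, the 4-tuple lookup finds nothing, while the connection ID lookup still returns the same session.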

2. Low Connection Latency:

- TLS Connection Latency Issue: Establishing a connection with TCP and TLS involves multiple round trips, resulting in significant latency for short data requests. This latency can greatly impact user experience, especially in poor network conditions.
- 0-RTT QUIC Handshake: Because QUIC merges the transport and cryptographic handshakes, a first connection to a server needs only 1 RTT before data can be sent, and a resumed connection to a previously visited server can send data in 0 RTT under ideal conditions. In contrast, TCP-based HTTPS still requires 1 RTT before data transmission, even in the best case of TLS 1.3 early data. For common TLS 1.2 full handshakes, 3 RTTs are needed. QUIC significantly reduces connection establishment latency, especially in RTT-sensitive scenarios.
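The round-trip counts are easier to compare as actual delays. The sketch below multiplies the number of handshake round trips by the network RTT. The TLS 1.2, TLS 1.3 early data, and QUIC 0-RTT figures are the ones quoted above; the TLS 1.3 full handshake and QUIC first-connection rows are the commonly cited figures, added here for comparison. The 100 ms RTT is an arbitrary example value.

```python
# Round trips that must complete before the client can send application data.
HANDSHAKE_RTTS = {
    "TCP + TLS 1.2 (full handshake)": 3,
    "TCP + TLS 1.3 (full handshake)": 2,
    "TCP + TLS 1.3 (early data)": 1,
    "QUIC (first connection)": 1,
    "QUIC (0-RTT resumption)": 0,
}


def setup_delay_ms(rtt_ms, handshake_rtts):
    """Time spent on connection setup before the first request can leave."""
    return rtt_ms * handshake_rtts


rtt_ms = 100  # example: a mobile network with a 100 ms round trip
delays = {name: setup_delay_ms(rtt_ms, n) for name, n in HANDSHAKE_RTTS.items()}

for name, delay in delays.items():
    print(f"{name}: {delay} ms")
```

On a 100 ms link, the gap between a TLS 1.2 full handshake and a resumed QUIC connection is 300 ms before any request is even sent.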

3. Customizable Congestion Control:

- QUIC uses pluggable congestion control, providing richer congestion control information compared to TCP. Each packet, whether original or retransmitted, carries a new sequence number (seq), allowing QUIC to differentiate between ACKs for original and retransmitted packets, avoiding the ambiguity of TCP retransmissions. QUIC also includes information about the delay between receiving a packet and sending an ACK, enabling more accurate RTT calculations.
- In Google's original design, QUIC's ACK frame supports 256 NACK intervals (the IETF version conveys the same information as ACK ranges), providing more flexibility and richer information compared to TCP's Selective Acknowledgment (SACK). This allows clients and servers to know which packets have been received by the other party.
- QUIC's transport control is implemented at the application layer, rather than relying on the kernel's congestion control algorithm. This means that different congestion control algorithms and parameters can be implemented and configured based on different business scenarios. Google's BBR congestion control algorithm, for example, performs differently from CUBIC and shows better performance in weak network and packet loss scenarios. With QUIC, different congestion control algorithms and parameters can be specified for different connections within the same business.
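As a rough illustration of why fresh packet numbers and the reported ACK delay help, here is a minimal RTT sampler. The class and method names are hypothetical, times are in integer milliseconds, and real stacks apply further safeguards and smoothing that are left out here.

```python
class RttSampler:
    """Tracks send times per packet number and derives RTT samples from ACKs."""

    def __init__(self):
        self.sent_at = {}  # packet number -> send time (ms)
        self.next_pn = 0

    def on_send(self, now_ms):
        # Every transmission, including a retransmission, gets a new number.
        pn = self.next_pn
        self.next_pn += 1
        self.sent_at[pn] = now_ms
        return pn

    def on_ack(self, pn, ack_delay_ms, now_ms):
        """Return an RTT sample, or None if the packet number is unknown."""
        send_time = self.sent_at.pop(pn, None)
        if send_time is None:
            return None
        # Subtract the time the receiver held the packet before acknowledging.
        return (now_ms - send_time) - ack_delay_ms


sampler = RttSampler()
original = sampler.on_send(now_ms=0)          # this packet is lost
retransmit = sampler.on_send(now_ms=200)      # same data, new packet number
sample = sampler.on_ack(retransmit, ack_delay_ms=10, now_ms=260)
```

Because the ACK names the retransmission's own packet number, the sample is measured from 200 ms rather than from 0 ms. With TCP, where the retransmission reuses the original sequence number, the sender could not tell which of the two transmissions was being acknowledged.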


4. No Head-of-Line Blocking:

- TCP Head-of-Line Blocking Issue: Although HTTP/2 introduced multiplexing, TCP's byte-stream-based nature causes all request streams to be blocked in case of packet loss, affecting the entire multiplexed connection.
- QUIC's Solution to Head-of-Line Blocking: QUIC, being based on UDP, addresses head-of-line blocking in its design. Each QUIC stream is delivered independently, so a lost packet only delays the streams whose data it carried. TCP's head-of-line blocking occurs when a packet's timeout or loss blocks the sliding of the current window. QUIC avoids this by allowing out-of-order acknowledgments and supporting reordering of packets. When a packet is lost, the window continues to slide as long as newly received packets are acknowledged. The sender places the data that needs to be retransmitted into a queue, sends it in packets with new numbers (e.g., the data from Packet N is resent as Packet N+M), and the receiver handles it like any other packet. Because retransmissions are handled much like new packets, the current window is not blocked by them.
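The effect on multiplexed requests can be shown with a toy model. This is a simplified sketch rather than a protocol implementation: the function names are made up, each tuple stands for one packet carrying data for one stream, and retransmission is ignored so that only the moment of loss is modeled.

```python
def deliver_single_ordered_stream(packets, lost):
    """TCP-like: one shared sequence space, delivery stops at the first gap."""
    delivered = []
    for seq, (stream_id, data) in enumerate(packets):
        if seq in lost:
            break  # everything after the gap waits for the retransmission
        delivered.append((stream_id, data))
    return delivered


def deliver_independent_streams(packets, lost):
    """QUIC-like: a gap only blocks the stream whose data was lost."""
    blocked = set()
    delivered = []
    for seq, (stream_id, data) in enumerate(packets):
        if seq in lost:
            blocked.add(stream_id)
            continue
        if stream_id not in blocked:
            delivered.append((stream_id, data))
    return delivered


# Two requests, A and B, interleaved on one connection; packet 1 is lost.
packets = [("A", "a1"), ("B", "b1"), ("A", "a2"), ("B", "b2")]
lost = {1}

tcp_like = deliver_single_ordered_stream(packets, lost)
quic_like = deliver_independent_streams(packets, lost)
```

In the TCP-like case only `a1` reaches the application until the retransmission arrives. In the QUIC-like case stream A completes, and only stream B waits.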

These features make QUIC a powerful protocol for improving performance, reducing latency, and enhancing reliability in internet communication.

What is WebRTC?

Google released WebRTC in 2011 to solve a very specific problem: How to build Google Meet?

At that time, the internet was a very different place. HTML5 video was primarily used for pre-recorded content. Flash was the only way to do live media, and it was a mess.

Real-time video transmission on the internet was challenging. You needed tight coupling between video encoding and networking to avoid any form of queuing, which added latency. This effectively ruled out TCP and forced you to use UDP. But now you also needed a video codec that could handle packet loss without spewing artifacts everywhere.

Google rightly recognized that it was impossible to solve these problems piecemeal with new network standards. Instead, the approach was to create libwebrtc, an actual implementation of WebRTC that still ships with all browsers. It can do everything from networking to video encoding/decoding to data transmission, and it does it very well. This was a real feat of software engineering, especially as Google successfully convinced Apple and Mozilla to embed a full media/network stack in their browsers.

What I love most about WebRTC is that it leverages existing standards. WebRTC is not a single protocol; it's a set of protocols: ICE, STUN, TURN, DTLS, RTP/RTCP, SRTP, SCTP, SDP, mDNS, and more.

What are the issues with WebRTC?

If WebRTC were perfect, I wouldn't be writing this blog post. The core issue is that WebRTC is not only a protocol, it's a whole system.

WebRTC can do a lot of things, so let's break it down:

- Media: A complete capture/encode/network/render pipeline.
- Data: Reliable/unreliable messaging.
- P2P: Peer-to-peer connections.
- SFU: Selective forwarding unit for relaying media.

1. Media:
The WebRTC media stack is designed for conferencing and performs exceptionally well. But when you try to use it for other purposes, the problems start.

The biggest issue was the poor user experience. Sometimes we didn't need latency as low as a conferencing product like Google Meet does, but WebRTC still sacrificed quality to achieve it because of its hard-coded settings.

Overall, customizing WebRTC is quite difficult, except for a few configurable modes. It's a black box that either works or doesn't. If it doesn't work, you have to endure the pain of forking libwebrtc... or hope that Google will help you solve the problem.

2. Data:
WebRTC also has a data channel API, which is particularly useful because until recently, it was the only way to send/receive "unreliable" messages from the browser. In fact, many companies use the WebRTC data channel to avoid using the WebRTC media stack (e.g., Zoom).

I also tried this approach, attempting to send each video frame as an unreliable message. However, this method didn't work due to fundamental flaws in SCTP. I eventually hacked "datagram" support into SCTP by breaking frames into unreliable messages smaller than the MTU size.

Finally! UDP can be used in the browser, but at what cost:

1. A complex handshake that requires at least 10 round trips!
2. 2x the number of packets because libsctp immediately acknowledges each "datagram."
3. A custom SCTP implementation, meaning the browser can't send "datagrams."
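The fragmentation step mentioned above, breaking each frame into messages smaller than the MTU, can be sketched as follows. The 8-byte header layout, the 1200-byte limit, and the function names are assumptions for illustration only. They are not the format used by SCTP or by any WebRTC implementation.

```python
import struct

# Hypothetical header: frame id (4 bytes), fragment index (2), fragment count (2).
HEADER = struct.Struct("!IHH")


def fragment_frame(frame_id, payload, max_datagram_size=1200):
    """Split one encoded frame into datagrams no larger than max_datagram_size."""
    chunk_size = max_datagram_size - HEADER.size
    if chunk_size <= 0:
        raise ValueError("max_datagram_size is too small for the header")
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    if len(chunks) > 0xFFFF:
        raise ValueError("frame needs more fragments than the header can count")
    return [HEADER.pack(frame_id, index, len(chunks)) + chunk
            for index, chunk in enumerate(chunks)]


def reassemble_frame(datagrams):
    """Rebuild one frame from its datagrams, or return None if any is missing."""
    parts = {}
    count = None
    for datagram in datagrams:
        _frame_id, index, count = HEADER.unpack_from(datagram)
        parts[index] = datagram[HEADER.size:]
    if count is None or len(parts) != count:
        return None  # a fragment was lost, so the whole frame is unusable
    return b"".join(parts[i] for i in range(count))


frame = bytes(3000)  # stand-in for one encoded video frame
datagrams = fragment_frame(frame_id=7, payload=frame)
rebuilt = reassemble_frame(list(reversed(datagrams)))  # arrival order may vary
incomplete = reassemble_frame(datagrams[:2])           # last fragment lost
```

Note the cost this design implies: losing any single fragment makes the whole frame undecodable, which is one reason unreliable per-frame delivery is harder than it first appears.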

3. P2P:
The best and worst part of WebRTC is its support for peer-to-peer connections.

Even from an application perspective, the ICE handshake is extremely complex. Without going into detail, you have to deal with a lot of permutations based on network topology. Some networks block P2P (e.g., symmetric NAT), while others simply block UDP, forcing you to fall back to TURN servers in a considerable share of cases.

Regardless, most conferencing solutions are client-server, relying on their own dedicated network rather than public transit (a CDN). However, servers are still forced to perform complex ICE handshakes, which has significant architectural implications.

4. SFU:

Last but not least, WebRTC relies on SFUs (Selective Forwarding Units) for scalability.

The issue with SFUs is subtle: they are custom.

A significant amount of business logic is required to determine where to forward packets. Single servers cannot scale, and participants may not be located in the same geographical location. Each SFU needs to somehow understand the network topology and the location of each participant.

Furthermore, a good SFU takes frame dependencies into account when dropping packets; otherwise it wastes bandwidth forwarding packets that can no longer be decoded. Unfortunately, determining this requires parsing each RTP packet in a codec-specific way. The H.264 depacketizer in libwebrtc is one example.
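The dropping rule itself is simple once dependencies are known. The hard part in practice is extracting them from each codec's packet format. The sketch below assumes that information is already available as a hypothetical `(frame_id, depends_on)` pair per frame, listed in decode order.

```python
def frames_to_forward(frames, dropped):
    """Return the frame ids worth forwarding after some frames were dropped.

    frames:  list of (frame_id, depends_on) in decode order, where depends_on
             is the id of the frame it references, or None for a keyframe.
    dropped: set of frame ids that were lost or deliberately dropped.
    """
    undecodable = set(dropped)
    forward = []
    for frame_id, depends_on in frames:
        if frame_id in undecodable:
            continue
        if depends_on is not None and depends_on in undecodable:
            # The receiver could not decode this frame, so do not send it.
            undecodable.add(frame_id)
            continue
        forward.append(frame_id)
    return forward


frames = [("I0", None), ("P1", "I0"), ("P2", "P1"), ("I3", None), ("P4", "I3")]
forwarded = frames_to_forward(frames, dropped={"P1"})
```

Dropping `P1` also rules out `P2`, which depends on it, while the next keyframe `I3` and its dependent `P4` are forwarded normally.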

What is Media over QUIC?

Media over QUIC (MoQ) refers to the transmission of media content, such as audio and video, over the QUIC protocol. As described above, QUIC is a transport protocol that runs on top of UDP and aims to provide low-latency, reliable connections.

Media over QUIC (MoQ) leverages the benefits of QUIC, such as reduced latency, improved congestion control, and faster connection establishment, to enhance the streaming experience. It allows for efficient and secure transmission of media data, enabling real-time communication applications, video streaming platforms, and other media-intensive services to deliver content with lower latency and improved performance.

By utilizing QUIC's features like multiplexing, stream prioritization, and congestion control, media over QUIC can optimize the delivery of media content, ensuring smooth playback, reduced buffering, and a better overall user experience.

To avoid the mistakes of WebRTC, we need to decouple the application from the transport. If a relay (i.e., a CDN) knows anything about media encoding, we have failed.

MoqTransport is the base layer and is a typical pub/sub protocol, although catered toward QUIC. The application splits data into “objects”, annotated with a header providing simple instructions on how the relay needs to deliver it. These are generic signals, including stuff like the priority, reliability, grouping, expiration, etc.
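To give a feel for how a relay can act on such generic signals without understanding the media, here is a minimal sketch of a send queue. The field names, the "lower number is more urgent" convention, and the `RelayQueue` class are all hypothetical. They do not reflect the MoqTransport wire format, only the idea of priority and expiration hints.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class MediaObject:
    priority: int                      # lower value is sent first (assumption)
    sequence: int                      # tie-breaker within the same priority
    group: int = field(compare=False)  # e.g. one group of pictures
    expires_at_ms: int = field(compare=False)
    payload: bytes = field(compare=False)


class RelayQueue:
    """Picks the next object using only header fields, never the payload."""

    def __init__(self):
        self._heap = []

    def publish(self, obj):
        heapq.heappush(self._heap, obj)

    def next_to_send(self, now_ms):
        while self._heap:
            obj = heapq.heappop(self._heap)
            if obj.expires_at_ms > now_ms:
                return obj
            # Expired objects are silently dropped instead of being delivered.
        return None


queue = RelayQueue()
queue.publish(MediaObject(priority=1, sequence=0, group=0,
                          expires_at_ms=100, payload=b"video frame"))
queue.publish(MediaObject(priority=0, sequence=1, group=0,
                          expires_at_ms=500, payload=b"audio frame"))

first = queue.next_to_send(now_ms=50)   # audio wins on priority
second = queue.next_to_send(now_ms=150) # the video object has expired
```

The relay treats `payload` as opaque bytes throughout, which is the decoupling described above: delivery decisions come from the header, not from parsing a codec.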

MoqTransport is designed to be used for arbitrary applications. Some examples include:

- live chat
- end-to-end encryption
- game state
- live playlists
- or even a clock!

This is a huge draw for CDN vendors. Instead of building a custom WebRTC CDN that targets one specific niche, you can cast a much wider net with MoqTransport. Akamai, Google, and Cloudflare have been involved in the standardization process thus far and CDN support is inevitable.

There will be at least one media layer on top of MoqTransport. We’re focused on the transport right now so there’s no official “adopted” draft yet.

Can I use Media over QUIC instead of WebRTC?

First of all, it's crucial to acknowledge that WebRTC is here to stay. It excels in its intended purpose: facilitating conferencing. It will be a long time before any other technology can match the feature set and latency of WebRTC.

Before considering a replacement for WebRTC, the prerequisite is web support. Fortunately, we now have access to WebCodecs and WebTransport.

However, to genuinely supplant WebRTC, a standard is required. While anyone can create their own UDP-based protocol utilizing this new web technology, and indeed many will, what distinguishes Media over QUIC is that it's being developed through the IETF, the same organization that standardized WebRTC and virtually every other internet protocol.

This process will span years. It will necessitate numerous individuals, like myself, who are eager to replace WebRTC. It will require many companies willing to gamble on a new standard.

Moreover, there are significant issues with both WebCodecs and WebTransport that must be resolved before we can achieve parity with WebRTC. To enumerate a few:

1. We require improved congestion control in browsers.
2. We need a feature akin to transport-wide-cc in QUIC.
3. We need echo cancellation in WebAudio, which might be feasible?
4. We might need FEC in QUIC.
5. We might need additional encoding options, such as non-reference frames or SVC.
6. And, of course, comprehensive browser support for both WebCodecs and WebTransport is essential.

Conclusion

At the current stage, media over QUIC cannot completely replace WebRTC. The two technologies serve different purposes, but they can complement each other in certain scenarios.

WebRTC (Web Real-Time Communication) is a comprehensive framework specifically designed for real-time communication applications, such as video conferencing, voice calls, and peer-to-peer data sharing. It provides a set of APIs and protocols for establishing secure, peer-to-peer connections between browsers or other compatible devices.

On the other hand, media over QUIC focuses on optimizing the transport layer for efficient and low-latency transmission of media content. It is primarily used for streaming media, such as video and audio, and can provide benefits like reduced latency and improved performance.

While media over QUIC can be used for streaming media, it does not provide the full suite of features and functionalities offered by WebRTC, such as signaling, peer discovery, and NAT traversal. WebRTC also includes built-in support for audio and video codecs, echo cancellation, and other real-time communication-specific features.

In summary, while media over QUIC can enhance media streaming, it is not a complete replacement for WebRTC, which remains the preferred choice for real-time communication applications.

FAQs

Q: What is WebCodecs?

A: WebCodecs is an emerging web standard that aims to provide a low-level API for encoding and decoding audio and video content in web browsers. It is designed to enable efficient and flexible media processing and manipulation within web applications.

Q: What is WebTransport?

A: WebTransport is an emerging web standard that aims to provide an API for establishing and managing bidirectional communication channels between web browsers and servers over HTTP/3. It supports both reliable streams and unreliable datagrams, and is designed to enable efficient data transfer and real-time communication between web applications and servers.
