Mastering WebRTC: A Complete Guide to Implementation, Reliability, and Security Best Practices

Tencent MPS-Dev Team

Jun 14, 2024

WebRTC (Web Real-Time Communication) is a web communication technology built into modern browsers. It enables users to make real-time audio and video calls and exchange data directly on web pages without any plugins. However, the security issues associated with WebRTC cannot be ignored, especially the risks it poses to user privacy.

In this article, we will delve into the technical architecture of WebRTC and explore its key components and workflows during implementation. From signaling processing to media negotiation, and then to the establishment of data channels, we will examine each step and discuss how to ensure the security and confidentiality of the communication process.

Additionally, we will pay special attention to the issue of potential real IP address leakage caused by WebRTC and provide effective preventive measures. For developers who need to build applications using WebRTC while protecting user privacy, this article will provide some insights.

What is WebRTC?

WebRTC (Web Real-Time Communication) is an open-source project and technology that enables real-time communication, including video, voice, and data sharing between peers directly in web browsers and mobile applications, without the need for plug-ins or external software. The WebRTC initiative aims to make peer-to-peer communication between devices simple and efficient through standard APIs that can be implemented in all modern web browsers.

WebRTC is widely used for applications requiring real-time communication, such as web-based video conferencing tools, voice applications, real-time gaming, and live streaming apps. Its easy-to-use, versatile, and secure nature has made it a cornerstone technology for modern web applications that aim to provide seamless interaction among users.

The implementation process of WebRTC

1. Getting a media stream:

Getting a media stream is the first step in WebRTC. It involves capturing audio and video data from the user's camera and microphone. In WebRTC, the getUserMedia API (navigator.mediaDevices.getUserMedia) is used to obtain the media stream. Here are the detailed steps for obtaining a media stream:

Checking browser support: First, you need to check if the browser supports the getUserMedia API.
Requesting media stream: Once it is confirmed that the browser supports getUserMedia, you can use the API to request a media stream. By calling the getUserMedia method and passing an object containing the desired media types and constraints, you can request access to the camera and microphone.
Handling media stream: The getUserMedia method returns a promise. Once the user grants access to the camera and microphone, the promise resolves with a MediaStream object, which you can use to handle the media stream.

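It is important to note that obtaining a media stream requires user authorization. Unless permission was already granted, the browser displays a permission prompt asking whether the page may access the camera and microphone, and the user can grant or deny it. If access is denied, the promise is rejected. Browsers also expose getUserMedia only in secure contexts, such as pages served over HTTPS.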
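The three steps above can be sketched roughly as follows. This is a minimal sketch rather than a definitive implementation: the helper names (supportsGetUserMedia, buildConstraints, captureLocalMedia), the local NavigatorLike type, and the 1280x720 resolution hint are assumptions made for illustration. The navigator object is passed in as a parameter so the support check can also be exercised outside a browser.

```typescript
// Local, minimal types so the sketch does not depend on DOM typings.
type CaptureConstraints = {
  audio: boolean;
  video: boolean | { width: { ideal: number }; height: { ideal: number } };
};

interface NavigatorLike<S> {
  mediaDevices?: { getUserMedia?: (constraints: CaptureConstraints) => Promise<S> };
}

// Step 1: check browser support.
function supportsGetUserMedia<S>(nav: NavigatorLike<S>): boolean {
  return typeof nav.mediaDevices?.getUserMedia === "function";
}

// Step 2: describe what to capture. The 1280x720 hint is illustrative only.
function buildConstraints(withVideo: boolean): CaptureConstraints {
  return {
    audio: true,
    video: withVideo ? { width: { ideal: 1280 }, height: { ideal: 720 } } : false,
  };
}

// Steps 2 and 3: request the stream and handle a refusal.
async function captureLocalMedia<S>(
  nav: NavigatorLike<S>,
  withVideo = true,
): Promise<S | null> {
  const devices = nav.mediaDevices;
  if (!devices || typeof devices.getUserMedia !== "function") {
    console.warn("getUserMedia is not available in this environment");
    return null;
  }
  try {
    // In a browser, this resolves with a MediaStream once access is granted.
    return await devices.getUserMedia(buildConstraints(withVideo));
  } catch (err) {
    const name = err instanceof Error ? err.name : "";
    if (name === "NotAllowedError") {
      console.warn("The user (or a browser policy) denied access");
    } else if (name === "NotFoundError") {
      console.warn("No matching camera or microphone was found");
    } else {
      console.warn("getUserMedia failed:", err);
    }
    return null;
  }
}
```

In a browser, this would typically be called as captureLocalMedia(navigator). Returning null on failure is a design choice of this sketch; an application may prefer to surface the error to the user instead.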

2. Establishing a signaling channel:

Establishing a signaling channel is a crucial step in enabling communication between peers in WebRTC. The signaling channel is used to exchange session descriptions in Session Description Protocol (SDP) format and ICE candidate addresses, allowing peers to establish a connection. WebRTC itself does not define a signaling protocol, so the application chooses one. Here are the detailed steps for establishing a signaling channel:

Choosing a signaling server: First, you need to select a signaling server to act as an intermediary between peers. The signaling server can be a self-hosted server or a third-party service. Its main role is to relay signaling messages between peers, and it may also hold them briefly until the other peer connects.
Connecting to the signaling server: Peers need to establish a network connection with the signaling server. WebSocket, HTTP, or other protocols can be used to communicate with it. When connecting, peers should provide authentication information so that only authorized users can exchange signaling messages.
Exchanging identity information: Once peers are connected to the signaling server, they need to exchange some identity information to recognize and authenticate each other. This information can be unique identifiers, tokens, or other credentials.
Exchanging SDP: Peers exchange session descriptions in SDP format through the signaling server to describe the media streams. An SDP description contains detailed information about the media streams, such as codecs, transport protocols, and media types. The calling peer generates its description with the createOffer method of the RTCPeerConnection object, the answering peer generates its description with createAnswer, and each sends the result to the other peer through the signaling server.
Exchanging ICE candidates: Peers exchange ICE candidates through the signaling server. An ICE candidate describes a transport address (IP address, port, and protocol) at which a peer may be reachable: a local address, a public address discovered through STUN, or a relay address on a TURN server. Peers listen for new candidates using the icecandidate event (the onicecandidate handler) of the RTCPeerConnection object and send them to the other peer through the signaling server.
Signaling message exchange: Peers communicate signaling messages through the signaling server. These messages can be SDP, ICE candidates, or other custom signaling information. Peers engage in bidirectional communication through the signaling server to ensure that they can receive and send signaling messages to each other.
Handling signaling messages: Upon receiving signaling messages from the other peer, peers need to parse and process them. Depending on the message type, peers call methods of the RTCPeerConnection object: setRemoteDescription to apply a received offer or answer, addIceCandidate to add a received ICE candidate, and setLocalDescription to apply the description they generated themselves.

Through the aforementioned steps, peers can exchange media stream description information and ICE candidates via the signaling channel to establish a connection and engage in real-time communication. The implementation of the signaling channel can be adjusted and expanded based on specific application requirements and technological choices.
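As a rough illustration of the message handling described above, the sketch below parses incoming signaling messages and applies them to a peer connection. Since WebRTC does not prescribe a signaling format, the JSON message shape (SignalMessage), the helper names (parseSignal, handleSignal), and the PeerLike interface are assumptions of this example. PeerLike lists only the RTCPeerConnection methods the handler uses.

```typescript
// Hypothetical wire format for this sketch; WebRTC does not define one.
type SignalMessage =
  | { type: "offer" | "answer"; sdp: string }
  | {
      type: "candidate";
      candidate: string;
      sdpMid: string | null;
      sdpMLineIndex: number | null;
    };

// Validate untrusted text from the signaling server before using it.
function parseSignal(raw: string): SignalMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { type, sdp, candidate, sdpMid, sdpMLineIndex } = data as Record<string, unknown>;
  if ((type === "offer" || type === "answer") && typeof sdp === "string") {
    return { type, sdp };
  }
  if (type === "candidate" && typeof candidate === "string") {
    return {
      type: "candidate",
      candidate,
      sdpMid: typeof sdpMid === "string" ? sdpMid : null,
      sdpMLineIndex: typeof sdpMLineIndex === "number" ? sdpMLineIndex : null,
    };
  }
  return null;
}

// The subset of RTCPeerConnection that the handler relies on.
interface PeerLike {
  setRemoteDescription(desc: { type: "offer" | "answer"; sdp: string }): Promise<void>;
  setLocalDescription(): Promise<void>;
  addIceCandidate(init: {
    candidate: string;
    sdpMid: string | null;
    sdpMLineIndex: number | null;
  }): Promise<void>;
  readonly localDescription: { type: string; sdp: string } | null;
}

async function handleSignal(
  pc: PeerLike,
  msg: SignalMessage,
  send: (reply: SignalMessage) => void,
): Promise<void> {
  switch (msg.type) {
    case "offer":
      await pc.setRemoteDescription({ type: "offer", sdp: msg.sdp });
      // With no argument, the browser creates and applies a matching answer.
      await pc.setLocalDescription();
      if (pc.localDescription) send({ type: "answer", sdp: pc.localDescription.sdp });
      break;
    case "answer":
      await pc.setRemoteDescription({ type: "answer", sdp: msg.sdp });
      break;
    case "candidate":
      await pc.addIceCandidate({
        candidate: msg.candidate,
        sdpMid: msg.sdpMid,
        sdpMLineIndex: msg.sdpMLineIndex,
      });
      break;
  }
}
```

The send callback stands in for whatever transport the application uses, for example a WebSocket connection to the signaling server. Messages that fail validation are dropped (parseSignal returns null) rather than passed to the peer connection.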

3. NAT traversal and establishing direct peer-to-peer connections

When devices are located in different private networks, using NAT traversal techniques to punch holes can allow them to establish direct peer-to-peer connections. Here is a general process of hole punching:

Candidate address collection: Devices collect candidate addresses by using STUN (Session Traversal Utilities for NAT) servers and TURN (Traversal Using Relays around NAT) servers. STUN servers are used to obtain the device's public IP address and port number, while TURN servers act as relays for data transmission when direct communication is not possible.
Candidate address exchange: Devices exchange candidate addresses through a signaling channel to inform each other about their network addresses and port numbers. This can be achieved by including the candidate address information in signaling messages for exchange.
Hole punching attempts: Devices attempt to directly send packets to each other to establish a direct peer-to-peer connection. During this process, devices send special UDP packets to create mapping rules on NAT devices, allowing the packets to traverse the NAT.
- Sending packets: Device A sends a UDP packet to the public address that device B advertised through the signaling channel.
- NAT mapping: NAT device A records the source IP address and port number of the packet sent by device A in its outbound mapping table and maps them to a public IP address and port number.
- Packet transmission: When the packet reaches NAT device B, it is forwarded to device B's private IP address and port number only if NAT device B already holds a mapping that admits it. Device B creates that mapping by sending its own packet toward device A, which is why both devices send packets at roughly the same time.
- Connection establishment: Once both NAT devices hold matching mappings, device A and device B can reach each other's public IP address and port number, allowing them to communicate directly and establish a peer-to-peer connection.
Relay communication: If direct communication fails, devices will attempt to use a TURN server as a relay for communication. The device sends data to the TURN server, which then forwards the data to the target device. This approach bypasses NAT restrictions but introduces additional latency and bandwidth consumption.

Hole punching is a crucial part of NAT traversal. Through candidate address gathering, candidate exchange, and hole punching attempts, devices behind NATs try to establish a direct peer-to-peer connection for real-time communication, and they fall back to a TURN relay only if direct communication fails.
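The hole punching sequence can be illustrated with a small simulation. This is a toy model, not how a real NAT or ICE agent is implemented: the ToyNat class, its filtering rule (an inbound packet is accepted only from a remote endpoint the inside host has already sent to), and all addresses are assumptions chosen for illustration. Real NATs vary in behavior, and some types cannot be traversed this way.

```typescript
type Endpoint = string; // "ip:port"

// Toy NAT: one public port per private endpoint, and inbound packets are
// admitted only from remotes that the inside host has already contacted.
class ToyNat {
  private readonly publicIp: string;
  private nextPort = 40000;
  private readonly toPublic = new Map<Endpoint, Endpoint>();
  private readonly toPrivate = new Map<Endpoint, Endpoint>();
  private readonly permitted = new Set<string>();

  constructor(publicIp: string) {
    this.publicIp = publicIp;
  }

  // Outbound packet: create (or reuse) a mapping and remember the remote.
  outbound(privateSrc: Endpoint, remote: Endpoint): Endpoint {
    let pub = this.toPublic.get(privateSrc);
    if (pub === undefined) {
      pub = `${this.publicIp}:${this.nextPort++}`;
      this.toPublic.set(privateSrc, pub);
      this.toPrivate.set(pub, privateSrc);
    }
    this.permitted.add(`${pub}|${remote}`);
    return pub;
  }

  // Inbound packet: returns the private destination, or null if dropped.
  inbound(remote: Endpoint, publicDst: Endpoint): Endpoint | null {
    if (!this.permitted.has(`${publicDst}|${remote}`)) return null;
    return this.toPrivate.get(publicDst) ?? null;
  }
}

function simulateHolePunch(): (Endpoint | null)[] {
  const natA = new ToyNat("198.51.100.1");
  const natB = new ToyNat("203.0.113.1");
  const a = "10.0.0.2:5000";
  const b = "192.168.1.7:6000";
  const stun = "192.0.2.10:3478";

  // Candidate collection: each device learns its public endpoint via STUN.
  const pubA = natA.outbound(a, stun);
  const pubB = natB.outbound(b, stun);
  // pubA and pubB would now be exchanged over the signaling channel.

  // A -> B: opens a hole in NAT A, but NAT B has no matching entry yet.
  natA.outbound(a, pubB);
  const first = natB.inbound(pubA, pubB);
  // B -> A: opens a hole in NAT B and passes NAT A thanks to A's packet.
  natB.outbound(b, pubA);
  const second = natA.inbound(pubB, pubA);
  // A -> B again: both NATs now forward.
  natA.outbound(a, pubB);
  const third = natB.inbound(pubA, pubB);
  return [first, second, third];
}
```

Calling simulateHolePunch() returns [null, "10.0.0.2:5000", "192.168.1.7:6000"]. The first packet is dropped at NAT B but prepares NAT A. After device B sends its own packet, traffic flows in both directions.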

4. Media stream transmission:

Once the connection is established, peers can begin transmitting media streams. Here are the steps involved in media stream transmission:

Media Encoding and Decoding: The sending device uses encoders to encode the raw audio and video data into appropriate formats for transmission over the network. The receiving device uses decoders to decode the received data into playable audio and video streams.
Media Stream Rendering: The receiving device attaches the received MediaStream to a media element to present it in the user interface. For example, video streams can be displayed by assigning the stream to the srcObject property of a <video> element, while audio-only streams can be played through an <audio> element.
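The rendering step might look roughly like the following in a browser. This is a minimal sketch that assumes the sender associated its tracks with a MediaStream. The function name renderRemoteStream is an assumption of this example, and the code relies on the standard DOM typings for RTCPeerConnection and HTMLVideoElement.

```typescript
// Attach incoming media to a <video> element as remote tracks arrive.
function renderRemoteStream(pc: RTCPeerConnection, video: HTMLVideoElement): void {
  pc.ontrack = (event: RTCTrackEvent) => {
    const [stream] = event.streams;
    // The event fires once per track (for example audio, then video),
    // usually with the same MediaStream, so assign only when it changes.
    if (stream && video.srcObject !== stream) {
      video.srcObject = stream;
    }
  };
}
```

For playback to start, the element typically needs the autoplay attribute or an explicit call to its play() method, and browser autoplay policies may require the video to be muted or the user to have interacted with the page.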

Figure: WebRTC Implementation Process

WebRTC Reliability

Overview

WebRTC technology enables direct peer-to-peer connections between browsers, facilitating efficient real-time communication. A key part of its reliability lies in its redundant packetization and forward error correction (FEC) mechanisms, which help keep communication smooth even when the network drops packets.

Redundant packetization method

1. RedFEC: Provides basic redundancy protection by duplicating packets, but it uses bandwidth inefficiently.

Figure: RedFEC packetization method

2. UlpFEC: Generates redundant packets by XORing media packets, so that a lost media packet can be rebuilt from a redundant packet and the surviving packets in the group it protects. With several redundant packets covering different groups, multiple packet losses can be recovered. This method is more commonly used in WebRTC. For example, in the diagram below, D represents media packets, R represents redundant packets, and the redundancy level shown in the diagram is 2.

Figures: Packetization at the sender; Packet loss in the network; Packet loss recovery

3. FlexFEC: Similar to UlpFEC in implementation. UlpFEC only performs XOR operations on a 1D row array, while FlexFEC is more flexible and introduces interleaving, allowing XOR operations on 1D rows, 1D columns, and 2D arrays. The FlexFEC payload format is specified in RFC 8627, although WebRTC implementations have shipped a version based on an earlier draft (flexfec-03).
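The XOR idea behind UlpFEC and FlexFEC can be shown with a simplified sketch. It assumes that all packets in a group have the same length and it ignores the RTP and FEC headers that the real payload formats also protect. The function names (xorPackets, makeParity, recoverOne) are assumptions of this example, not part of any WebRTC API.

```typescript
// XOR two equal-length packets byte by byte.
function xorPackets(a: Uint8Array, b: Uint8Array): Uint8Array {
  const out = new Uint8Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = a[i] ^ b[i];
  }
  return out;
}

// Sender: build one redundant packet R for a non-empty group of media packets D.
function makeParity(group: Uint8Array[]): Uint8Array {
  return group.reduce((acc, packet) => xorPackets(acc, packet));
}

// Receiver: rebuild the single missing packet (marked null) in a group.
// Returns null when zero or more than one packet is missing, because one
// XOR redundant packet can repair only one loss in the group it protects.
function recoverOne(
  group: (Uint8Array | null)[],
  parity: Uint8Array,
): Uint8Array | null {
  const missing = group.filter((packet) => packet === null).length;
  if (missing !== 1) return null;
  return group.reduce<Uint8Array>(
    (acc, packet) => (packet ? xorPackets(acc, packet) : acc),
    parity,
  );
}
```

For example, with D1 = [1, 2, 3], D2 = [4, 5, 6], and D3 = [7, 8, 9], losing D2 and XORing R with D1 and D3 yields [4, 5, 6] again. The 2D arrangement in FlexFEC applies the same operation to both the rows and the columns of a packet grid, so two losses in the same row can still be repaired through their columns.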

Summary of FEC algorithms

The three main FEC algorithms in the field of wireless transmission are Turbo codes, LDPC codes, and Polar codes.

The following are several FEC algorithms used in the field of audio and video transmission:

The Opus audio codec used in WebRTC employs inband FEC and interleaving coding techniques.
The video component of WebRTC uses UlpFEC, which utilizes XOR operations.
The Reed-Solomon algorithm is relatively complex but theoretically has strong data recovery capabilities.
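For the Opus case listed above, in-band FEC is signaled in the SDP through the useinbandfec parameter on the Opus fmtp line. The helper below is a hypothetical sketch (the name opusInbandFecEnabled is an assumption of this example) that checks whether a given SDP string advertises it. It only inspects the text and does not change any codec settings.

```typescript
// Returns true if the SDP has an Opus payload type whose a=fmtp line
// carries useinbandfec=1.
function opusInbandFecEnabled(sdp: string): boolean {
  const lines = sdp.split(/\r?\n/);

  // Find the payload types mapped to Opus, e.g. "a=rtpmap:111 opus/48000/2".
  const opusPayloadTypes = new Set<string>();
  for (const line of lines) {
    const match = /^a=rtpmap:(\d+) opus\//i.exec(line);
    if (match) opusPayloadTypes.add(match[1]);
  }

  // Look for "useinbandfec=1" in the format parameters of those payload types.
  for (const line of lines) {
    const match = /^a=fmtp:(\d+) (.*)$/.exec(line);
    if (match && opusPayloadTypes.has(match[1])) {
      const params = match[2].split(";").map((param) => param.trim());
      if (params.includes("useinbandfec=1")) return true;
    }
  }
  return false;
}
```

In a browser, the string to inspect could come from the sdp field of a peer connection's local or remote description.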

WebRTC IP Leakage

Many people assume that using a proxy completely hides their real IP address, but that is not always the case. The main privacy risk of WebRTC is that even if you browse the internet through a VPN or proxy, your real IP address may still be exposed.

While this may sound somewhat concerning, we should not lose confidence in proxy technologies. Proxies are still very useful tools that can protect our online privacy and security. However, we need to be aware that proxies are not 100% foolproof, and therefore, we need to take additional measures to safeguard our privacy and security.

Principle of WebRTC IP leakage

WebRTC allows browsers to establish direct peer-to-peer connections for real-time communication, such as video, voice, and data transmission. To establish a connection, each browser gathers ICE candidates, which contain its IP addresses, and sends them to the other side. Because these candidates are exposed to page scripts through the WebRTC API, JavaScript running on a page can read them and obtain the user's IP address, enabling tracking, monitoring, or potential attacks.

Specifically, a script can create an RTCPeerConnection and start candidate gathering without the user noticing. Unlike camera or microphone capture, this step does not show a permission prompt, although browsers may expose additional local addresses once media-device permission has been granted. Attackers can do this with malicious JavaScript, for example code injected into a website through a cross-site scripting (XSS) attack.

In addition, the STUN/TURN servers used in WebRTC can also leak users' IP addresses. STUN/TURN servers are crucial components in WebRTC for NAT traversal and relaying. If these servers have vulnerabilities or are not properly configured, attackers can exploit them to obtain users' real IP addresses, enabling potential attacks.
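To make the exposure concrete, the sketch below parses an ICE candidate line and reports what kind of address it reveals. It is an illustrative helper for auditing, and the names (CandidateInfo, parseCandidate, describeExposure) are assumptions of this example. The mDNS case reflects the behavior of some browsers, which replace local host addresses with ".local" names when no media permission has been granted.

```typescript
type CandidateInfo = {
  protocol: string;
  address: string;
  port: number;
  type: string;
};

// Candidate line layout:
// candidate:<foundation> <component> <protocol> <priority> <address> <port> typ <type> ...
function parseCandidate(line: string): CandidateInfo | null {
  const parts = line.trim().replace(/^a=/, "").split(/\s+/);
  if (parts.length < 8 || !parts[0].startsWith("candidate:") || parts[6] !== "typ") {
    return null;
  }
  const port = Number(parts[5]);
  if (!Number.isInteger(port)) return null;
  return {
    protocol: parts[2].toLowerCase(),
    address: parts[4],
    port,
    type: parts[7],
  };
}

// Describe what a page script could learn from one candidate.
function describeExposure(info: CandidateInfo): string {
  if (info.type === "relay") return "relay address on the TURN server";
  if (info.type === "srflx" || info.type === "prflx") {
    return "public address as seen by the STUN server or the peer";
  }
  if (info.address.endsWith(".local")) return "mDNS name (local address hidden)";
  return "local interface address";
}
```

A server-reflexive (srflx) candidate is the case that matters most for VPN and proxy users, because it carries the public address that the STUN server observed for the connection.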

Preventive measures

Install a browser extension that restricts the addresses WebRTC exposes, such as WebRTC Leak Shield.
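On the application side, developers can also limit what the remote peer learns by forcing all traffic through a TURN relay. The sketch below builds such a configuration using the standard iceTransportPolicy option. The helper names (relayOnlyConfig, isRelayCandidate) and the TurnServer type are assumptions of this example, and the TURN URL and credentials must come from your own deployment.

```typescript
// Hypothetical TURN details; supply the values of your own deployment.
type TurnServer = { urls: string; username: string; credential: string };

// Build a peer connection configuration that allows relay candidates only.
function relayOnlyConfig(turn: TurnServer) {
  return {
    iceServers: [turn],
    // "relay" tells the browser to gather and use only TURN relay
    // candidates, so host and server-reflexive addresses are not offered.
    iceTransportPolicy: "relay" as const,
  };
}

// Optional extra guard before forwarding a candidate over signaling.
function isRelayCandidate(candidateLine: string): boolean {
  return /\styp relay(\s|$)/.test(candidateLine);
}
```

In a browser, the result would be passed to the constructor, as in new RTCPeerConnection(relayOnlyConfig(turn)). The trade-off is the one noted for relay communication earlier: all media passes through the TURN server, which adds latency and bandwidth cost.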

Conclusion

As this article concludes its in-depth exploration of WebRTC technology, we hope readers have gained a comprehensive understanding of its powerful capabilities and potential security risks. WebRTC offers unprecedented real-time communication abilities but also presents challenges for privacy protection. By taking appropriate preventive measures and following best practices, we can ensure the responsible use of this technology while safeguarding the security of user data.

We encourage developers to continue harnessing the potential of WebRTC to create innovative and secure applications. Thank you for reading, and if you have any questions or topics you'd like to discuss in-depth, please feel free to contact us at any time.