In recent years, the internet industry has seen several waves of audio- and video-related trends, such as VR, short videos, and live streaming. While VR adoption is still limited by technological maturity, short videos and live streaming continue to gain popularity and enter new industry applications. However, the limitations of RTMP, the dominant upstream transmission protocol, are becoming increasingly apparent: the FLV container format does not support new codecs or multiple audio tracks, and its timestamp precision is low. Moreover, RTMP runs over TCP, whose fairness- and reliability-oriented design makes it poorly suited to real-time audio and video transmission.
New upstream transmission methods have emerged in the industry, including SRT, RIST, WebRTC-based streaming, and fmp4-based streaming (DASH-IF Live Media Ingest Protocol).
The SRT protocol is derived from the UDT protocol, in both protocol design and codebase. UDT is a UDP-based file transfer protocol originally designed for high-bandwidth, high-latency scenarios, such as long-distance fiber optic transmission, to overcome the limitations of TCP. For details on UDT, refer to "Experiences in Design and Implementation of a High Performance Transport Protocol" (Yunhong Gu, Xinwei Hong, and Robert L. Grossman, 2004). Haivision applied UDT to streaming media transmission and added optimizations for streaming scenarios, such as end-to-end fixed latency, turning it into the SRT protocol. The SRT protocol standard is currently in the draft stage. Haivision open-sourced the libsrt project in 2017 and established the SRT Alliance, which now has over 500 members; Tencent MPS is among them.
The most prominent feature of SRT is its fixed end-to-end latency. In general transmission protocols, the time from sending data (send()) to receiving data (receive()) fluctuates. SRT smooths out network jitter and ensures a relatively constant time from srt_sendmsg() to srt_recvmsg(). The srt_sendmsg() function timestamps the data, and the receiving end checks the timestamp to determine when to deliver the data to the upper-layer application. The core idea is not complicated: buffering against jitter and maintaining fixed latency based on timestamps. The implementation also considers time drift issues.
The end-to-end latency is approximately equal to the configured SRTO_RECVLATENCY + 1/2 RTT0, where RTT0 is the RTT calculated during the handshake phase. The configured latency should compensate for RTT fluctuations and take into account the number of packet retransmissions.
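The TSBPD idea described above can be sketched in a few lines. This is a minimal illustration of the delivery-time rule, not libsrt's internals; the function and variable names are assumptions for the example:

```python
# Sketch of timestamp-based packet delivery (TSBPD).
# delivery_time = peer_start_time + packet_timestamp + configured_latency
# Names and structure are illustrative, not libsrt internals.

def delivery_time(peer_start_time_us, packet_ts_us, latency_us):
    """When the receiver should hand this packet to the application."""
    return peer_start_time_us + packet_ts_us + latency_us

# Example: the handshake fixed the peer start time at t = 0 and
# SRTO_RECVLATENCY is configured as 120 ms.
latency_us = 120_000

# A packet stamped 1.000 s by srt_sendmsg() is delivered at 1.120 s,
# whether it arrived at 1.010 s or 1.095 s -- jitter is absorbed by
# the receive buffer instead of reaching the application.
assert delivery_time(0, 1_000_000, latency_us) == 1_120_000
```

Packets that have not arrived by their delivery time are the ones TLPKTDROP discards, which is what keeps the latency constant.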
The fixed end-to-end latency is controlled by the TSBPD (Timestamp-based packet delivery) switch, and whether packet loss is allowed is controlled by TLPKTDROP (Too late packet drop). The former is influenced by the latter, and only when packet loss is allowed can fixed latency be guaranteed. Otherwise, in poor network conditions, the received packets may exceed the configured latency. When TSBPD and TLPKTDROP are both enabled, timed-out packets are no longer retransmitted because, according to the fixed latency design, they would be discarded by the receiving end even if sent. Fixed latency means that either the data is delivered to the upper-layer application on time or expired packets are passively or actively discarded.
Fixed latency at the transport layer simplifies buffer design for the receiving-end consumer. A player, for example, usually has multiple levels of buffering: buffers inside the I/O module or between I/O and demux, and buffering queues between demux and rendering. The core buffer module is generally placed after demux and before decoding, because there the duration of buffered data can be calculated. Duration can also be calculated after decoding, but decoded data is so large that it is unsuitable for long-term buffering and buffer-level control. With fixed-latency buffering in the I/O module, the player can eliminate the buffering modules between I/O and demux and between demux and decoding, keeping only simple data management. A multi-level buffering design introduces high latency, and since the I/O buffer is only a small part of the total player buffer, its packet loss recovery capability is weak; handing buffer management over to the transport module strengthens that capability.
SRT supports two mechanisms for combating weak networks, ARQ and FEC. Additionally, the broadcast mode in link bonding can also improve data transmission reliability.
2.2.1 ARQ
SRT's ARQ design uses both ACK and NACK mechanisms. The receiver sends ACK in two situations:
- Periodically, every 10 ms: a full ACK carrying the latest received sequence number together with RTT, bandwidth, and receive-rate estimates.
- As a "light ACK" for every 64 packets received, when the packet rate is too high for the 10 ms periodic ACK to keep up.
In addition to the usual ACK and NACK packets, SRT also has ACKACK packets: the sender acknowledges each full ACK, and the receiver uses the ACK/ACKACK round trip to calculate RTT.
NACK is sent by the receiver in two situations:
- Immediately, when a gap in the received sequence numbers reveals newly lost packets.
- Periodically, for packets that remain in the receiver's loss list.
Periodic NACK may result in a packet being retransmitted more than once, but it ensures low latency.
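The interplay of immediate and periodic NACK can be sketched as follows. This is an illustrative model of the receiver-side loss list, with names chosen for the example rather than taken from libsrt:

```python
# Sketch of the receiver-side loss list driving immediate and
# periodic NACKs. Illustrative only; libsrt keeps equivalent state
# per socket.

def on_packet_arrival(loss_list, last_seq, new_seq):
    """On arrival of new_seq, record any gap after last_seq as lost
    and NACK it immediately (event-driven NACK)."""
    lost = list(range(last_seq + 1, new_seq))
    loss_list.extend(lost)
    return lost  # sequence numbers to NACK right away

def periodic_nack(loss_list):
    """On a timer, re-request everything still missing. A packet may
    therefore be retransmitted more than once -- extra bandwidth spent
    to keep latency low."""
    return list(loss_list)

loss_list = []
# Packets 11-13 went missing between seq 10 and seq 14:
assert on_packet_arrival(loss_list, 10, 14) == [11, 12, 13]
# A retransmission of 12 later arrives and leaves the loss list:
loss_list.remove(12)
# The next periodic NACK still re-requests the stragglers:
assert periodic_nack(loss_list) == [11, 13]
```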
2.2.2 Simulation testing
SRT has strong resistance to random packet loss, but in high packet loss scenarios, it consumes a higher bandwidth. Using FFmpeg as the SRT client and server, various packet loss rates (0%, 10%, 20%, 50%, 70%) were simulated using NetEm to create a weak network environment. The video bitrate was approximately 5 Mbps with a frame rate of 25 fps.
Even with a 70% packet loss, the receiver can still maintain a stable frame rate. On the other hand, at a 20% packet loss rate, the bandwidth usage is already twice the video bitrate. These experimental results represent scenarios with sufficient bandwidth and high random packet loss. In production environments, more adverse scenarios with insufficient available bandwidth need to be considered.
In addition to the fixed latency feature, resistance to random packet loss also helps reduce latency. When TCP encounters packet loss, congestion control lowers its sending speed and retransmission is slow, so data accumulates at the sender and latency grows. The following graph compares latency in a high-RTT scenario with 1% random packet loss. Since the test server does not support SRT downstream, the 1% loss was simulated only upstream.
RTT ranges from 250 ms to 400 ms with significant fluctuations.
Ping samples: time=321.291 ms, time=291.814 ms, time=362.499 ms
SRT's file mode has congestion control, while live mode only has pacing without congestion control. However, libsrt provides rich statistics, allowing the application layer to adjust video capture and encoding accordingly to avoid congestion and buffering. From the perspective of real-time performance in a live streaming system, congestion control at the transport layer alone can neither guarantee low latency nor cope with transmission bandwidth that stays persistently below the video bitrate.
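One way the application layer can react to those statistics is a simple back-off/probe loop on the encoder bitrate. The sketch below is a hypothetical policy, not part of libsrt; the inputs stand in for the kind of send-buffer and retransmission figures that libsrt's statistics API exposes:

```python
# Hypothetical application-layer rate adaptation driven by sender-side
# SRT statistics (send-buffer backlog and retransmission ratio here
# are stand-ins for figures obtainable from libsrt's stats API).

def target_bitrate(current_bps, snd_buffer_ms, retrans_ratio,
                   min_bps=500_000, backoff=0.8, probe=1.05):
    """Back off the encoder bitrate when data piles up in the send
    buffer or retransmissions spike; otherwise probe upward slowly."""
    if snd_buffer_ms > 1000 or retrans_ratio > 0.1:
        return max(min_bps, int(current_bps * backoff))  # congestion: back off
    return int(current_bps * probe)                      # healthy: probe up

# 1.5 s of backlog in the send buffer: cut a 5 Mb/s encode to 4 Mb/s.
assert target_bitrate(5_000_000, snd_buffer_ms=1500, retrans_ratio=0.02) == 4_000_000
# Healthy link: probe up by 5%.
assert target_bitrate(5_000_000, snd_buffer_ms=200, retrans_ratio=0.01) == 5_250_000
```

The thresholds and step sizes here are arbitrary; in practice they would be tuned against the configured SRT latency and the encoder's rate-control responsiveness.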
SRT's pacing calculates the packet-sending interval based on the maximum sending bandwidth. The maximum sending bandwidth is determined by three strategies:
These configurations have priorities. For example, if max_bw is configured, input_bw, overhead, and the other settings are ignored. The combinations are shown in the table below:
| Mode | MAX_BW | INPUT_BW | OVERHEAD |
| --- | --- | --- | --- |
| MAXBW_SET | ✓ | - | - |
| INPUTBW_SET | - | ✓ | ✓ |
| INPUTBW_ESTIMATED | - | - | ✓ |
Note: The default max_bw in SRT live mode is 1 Gbps. When data is written to SRT unevenly, such a high max_bw renders pacing effectively inactive and can cause a sudden spike in the packet loss rate. Understanding and properly configuring these three parameters is crucial.
SRT has strong resistance to random packet loss, but it consumes a higher bandwidth in high packet loss rate scenarios. It is suitable for low-latency upstream streaming scenarios with sufficient network bandwidth. When the upstream bandwidth is low, SRT's statistical information can be used to dynamically adjust encoding parameters to avoid congestion and buffering. SRT has room for improvement in the following directions:
SRT is targeted at the broadcasting industry, requiring manual testing of network conditions for parameter tuning. The configuration is complex and lacks adaptive features, making it difficult to apply a single set of configurations to cope with changing network environments. Simplifying configuration and adding adaptive strategies are optimization directions for SRT.
For example, the current configuration of latency is based on a time unit. Configuring latency should consider the network's RTT, but RTT is only known at runtime. If RTT can be used as the unit, configuring latency as latency = N * RTT would greatly reduce the configuration difficulty and adapt to different network conditions.
Currently, SRT lacks congestion control and only has simple methods such as configuring input bitrate and output bandwidth limit. It does not perform well in scenarios where the transmission bandwidth is lower than the video bitrate, the video bitrate changes significantly, or the network is unstable. SRT should have basic congestion control strategies combined with video encoding to cope with congestion.
SRT does not restrict the container format, but the lossy mode (with TLPKTDROP enabled) relies on a container with error recovery and resynchronization mechanisms, leaving TS as the only practical option. However, TS has high encapsulation overhead due to factors such as the fixed 4-byte TS header, padding, and null packets when a fixed bitrate is used.
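The header cost alone puts a simple lower bound on that overhead, before padding, PSI tables, or null packets are counted:

```python
# Lower bound on TS encapsulation overhead: every 188-byte TS packet
# spends 4 bytes on its header, before PES headers, adaptation-field
# padding, PSI tables, or CBR null packets add more.

TS_PACKET_BYTES = 188
TS_HEADER_BYTES = 4

header_overhead = TS_HEADER_BYTES / TS_PACKET_BYTES
assert round(header_overhead * 100, 1) == 2.1   # at least ~2.1% from headers alone
```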
For the lossless mode, there are several container format options, including FLV, MKV, and fMP4.
RTMP over SRT allows traditional RTMP-based upstream SDKs to seamlessly migrate. The advantage is the smooth integration with existing RTMP streaming systems, but there are some additional considerations:
The RIST protocol was proposed in 2017 and has released two profiles to date: the simple profile in 2018 and the main profile in 2020.
The simple profile inherits the RTP protocol and is compatible with it. It adds the following to the RFC 3550 RTP protocol:
- Redundancy mode: It sends duplicate data through multiple network interfaces to increase reliability.
- Round-robin mode: It sends different data through different network interfaces in a round-robin manner to increase bandwidth.
RIST retains the multicast mode of RTP, but currently, it is only suitable for controlled local network environments due to the special nature of multicast.
In addition to the features of the simple profile, the main profile adds the following features:
- Tunneling implemented on top of GRE-over-UDP (RFC 8086).
- RTP and RTCP share the same transport channel and port.
- Multiple streams can share the same transport channel and port.
- Multiple streams plus user-defined data streams can share the same transport channel and port.
- DTLS encryption.
- Pre-Shared Key encryption.
- Scenario: Transmitting a 100 Mb/s bitrate TS stream, the RTP sequence number wraps around every 6.9 seconds. When the ARQ retransmission limit is configured as 7, the maximum supported RTT is 1 second. In other words, in scenarios with high bitrates and high latency, the RTP sequence number wraparound limits the number of retransmissions.
- Solution: Use RTP header extension to extend the RTP sequence number to 32 bits.
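The wraparound arithmetic in the scenario above is easy to reproduce, assuming the common packing of 7 TS packets per RTP packet:

```python
# Reproduce the sequence-number wraparound arithmetic for a 100 Mb/s
# TS-over-RTP stream. Standard RTP sequence numbers are 16-bit, so
# they wrap every 65,536 packets; each RTP packet is assumed to carry
# 7 TS packets (7 * 188 = 1316 bytes of payload).

bitrate_bps = 100_000_000
payload_bytes = 7 * 188                    # 1316 bytes of TS per RTP packet
pkts_per_sec = bitrate_bps / 8 / payload_bytes
wrap_seconds = 2**16 / pkts_per_sec

assert round(wrap_seconds, 1) == 6.9       # wraps every ~6.9 s, as stated
# With a retransmission limit of 7, all retries must fit inside one
# wrap period, capping the usable RTT at about 1 second:
assert wrap_seconds / 7 < 1.0
```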
In the broadcasting industry, RIST and SRT are in a competitive state. In the internet industry, RIST still has the following issues to be resolved:
- RIST follows the RTP approach, using IP and port as the session, which means a server cannot support multiple clients with one or a pair of ports. For each additional client, the server needs to open two new ports. Except for specific restricted networks, large-scale deployment in the public network faces challenges.
- Apart from IP and port as the mapping between clients and servers, the RIST standard does not define other means to uniquely identify clients. Clients cannot pass streamid-like identification information to the server.
- The main profile does not support backward compatibility with the simple profile: When one end is configured as the simple profile and the other end is configured as the main profile, there is no complete set of fallback strategies to automatically downgrade the main profile end to the simple profile.
SRT and RIST are both protocols designed for reliable video transport over the internet. However, they have some differences:
Origin and Support: SRT was developed and open-sourced by Haivision. It has a large community and is supported by the SRT Alliance, which includes over 500 members. RIST, on the other hand, was developed by the Video Services Forum (VSF), a consortium of broadcast industry vendors.
Error Correction: Both protocols use ARQ (Automatic Repeat reQuest). RIST relies on RTCP NACK-driven retransmission from a buffer, while SRT combines ACK/NACK-based ARQ with optional FEC (Forward Error Correction).
Security: SRT includes built-in AES encryption for the stream; in RIST, encryption is optional and available only in the main profile (DTLS or pre-shared key).
Compatibility: RIST is designed to be compatible with existing protocols and standards, including RTP (Real-time Transport Protocol). SRT, however, is a newer protocol and may not be as compatible with existing infrastructure.
Latency: Both protocols handle network congestion and packet loss to minimize latency, but they do it in different ways. SRT uses a technique called "too-late packet drop" while RIST uses a combination of packet retransmission and jitter buffer management.
In summary, both SRT and RIST are designed to deliver high-quality video over unreliable networks, but they differ in their approach to error correction, security, compatibility, and latency management. The choice between the two often depends on the specific requirements of the use case.
Tencent MPS supports SRT protocol for live streaming, and customer feedback indicates that SRT significantly improves streaming buffering issues compared to traditional RTMP.
1. Assemble independently through splicing rules. For detailed operations, please refer to Splicing Live Streaming URLs.
2. Go to Tools > Address Generator in the Cloud Streaming Services console, select URL type Push Address, and choose the domain name as needed. For detailed operations, please refer to Address Generator.
If you have any questions about our services, don't hesitate to Contact Us.