4K/8K Video Revolution: Overcoming Challenges with Innovative Solutions

Tencent MPS-Dev Team

Jun 24, 2024

With the continuous evolution of display resolutions and the constant updates in viewing devices, the industry's demand for superior video clarity is on the rise. 4K and 8K Ultra HD resolutions have become widespread across sectors such as television, film production, photography, and gaming. These resolutions offer an enhanced visual experience, allowing viewers to indulge in more vivid and detailed imagery. However, the advent of these ultra-high resolutions and high bitrates presents a new set of challenges that necessitate innovative solutions. This article will delve into how Tencent Media Processing Service (MPS) can harness its media processing capabilities to expedite digitalization upgrades in the media industry.

What is 4K Resolution?

4K Ultra HD: 4K resolution typically refers to a video resolution of 3840 pixels horizontally by 2160 pixels vertically. It is also known as 2160p or UHD (Ultra High Definition). Compared to traditional Full HD (1080p) resolution, 4K has four times as many pixels, resulting in sharper images and richer details.

What is 8K Ultra HD?

8K Ultra HD: 8K resolution refers to a video resolution of 7680 pixels horizontally by 4320 pixels vertically. It is also known as 4320p or FUHD (Full Ultra High Definition). 8K has four times as many pixels as 4K, providing even more detail and resulting in more realistic and clearer images.
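
The relationship between these formats is easy to check with a little arithmetic. The short Python sketch below uses the pixel dimensions quoted above, plus the standard 1920x1080 dimensions of Full HD, and shows that each step up quadruples the number of pixels per frame.

```python
# Full HD is the standard 1920x1080; the 4K and 8K dimensions are as quoted above.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
}

def pixels_per_frame(name: str) -> int:
    """Return the number of pixels in one frame of the named format."""
    width, height = RESOLUTIONS[name]
    return width * height

for name in RESOLUTIONS:
    ratio = pixels_per_frame(name) / pixels_per_frame("1080p")
    print(f"{name}: {pixels_per_frame(name):,} pixels ({ratio:.0f}x Full HD)")
```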

Challenges of Ultra HD video

1. Urgent Need for Efficient Real-Time Encoding Technologies

As 4K/8K video introduces unprecedented resolution and bitrate, conventional encoding technologies struggle to keep up with the demands of real-time processing. Particularly in live broadcasting and major event scenarios, there is an acute need for technologies that can swiftly handle and compress high-resolution video streams to ensure seamless and real-time content delivery.

2. Innovation in Live Streaming System Architecture

Ultra-high-definition video significantly intensifies the demands placed on transmission and distribution system architectures. It becomes imperative to innovatively design systems capable of supporting the transmission of high-bitrate video seamlessly.

3. Growing Demand for Higher Resolutions

As the demand for high-quality content continues to rise, there is an increasing need for higher resolutions to deliver an immersive viewing experience.

4. Memory Optimization Challenges

The pursuit of higher resolutions poses challenges, particularly in terms of memory usage. How to maintain high image quality while optimizing memory bottlenecks is crucial.

By focusing efforts on addressing these core challenges, the development and application of 4K/8K ultra-high-definition video technology can be propelled more effectively.

Technological innovation and solutions

When faced with the challenges brought by 4K/8K ultra-high-definition videos, we have not only conducted thorough analysis of these technical obstacles but also actively explored and implemented a series of innovative optimization solutions, aiming to achieve technological breakthroughs and enhance the overall video processing capabilities.

1. 4K/8K Encoding Optimization

Tencent MPS has developed its own encoding kernels for various video codecs, including H.264, H.265, AV1, and the latest H.266. The advantage of developing encoders in-house is the ability to design encoding features that are tailored to real-world business scenarios and to optimize them accordingly. For example, during the Beijing Winter Olympics, our live streaming system supported real-time encoding and compression of 4K/8K videos at up to 120 frames per second (FPS). To achieve real-time performance, the internal encoder underwent extensive customization and optimization.

Tencent's self-developed H.265 encoder, V265, offers higher speed and better compression than the open-source x265 encoder. Its fastest preset significantly outperforms that of x265, enabling fast encoding even at high resolutions. Additionally, V265 supports 8K, 10-bit, and HDR encoding, providing enhanced capabilities for advanced video processing.

Optimization of Ultra-High-Definition Encoding

When it comes to fast encoding of ultra-high-definition videos, we focus on several key optimization directions. The first is to improve parallelism. The encoding process offers both frame-level and macroblock-level parallelism. For real-time encoding of high-resolution content, we have optimized the frame structure so that more frames can be encoded in parallel. For macroblock-level parallelism, we support tile-based encoding to optimize row-level encoding parallelism.
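
As a rough illustration of tile-based parallelism, the Python sketch below splits an 8K frame into a grid of tiles and hands each tile to a worker. It is a minimal sketch rather than the V265 implementation: `tile_bounds`, `encode_tile`, and `encode_frame_in_tiles` are hypothetical names, the 4x2 grid is an arbitrary choice, and `encode_tile` only reports the tile's area instead of encoding anything. A real encoder would use native threads, since Python threads only illustrate the structure.

```python
from concurrent.futures import ThreadPoolExecutor

def tile_bounds(width, height, cols, rows):
    """Split a width x height frame into a cols x rows grid of (x0, y0, x1, y1) tiles."""
    xs = [width * c // cols for c in range(cols + 1)]
    ys = [height * r // rows for r in range(rows + 1)]
    return [(xs[c], ys[r], xs[c + 1], ys[r + 1])
            for r in range(rows) for c in range(cols)]

def encode_tile(bounds):
    """Hypothetical stand-in for a tile encoder: it only reports the tile's area."""
    x0, y0, x1, y1 = bounds
    return (x1 - x0) * (y1 - y0)

def encode_frame_in_tiles(width, height, cols, rows, workers=4):
    """Process every tile of one frame on a pool of workers."""
    tiles = tile_bounds(width, height, cols, rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves tile order, so results can be stitched back in raster order.
        return list(pool.map(encode_tile, tiles))

areas = encode_frame_in_tiles(7680, 4320, cols=4, rows=2)
print(len(areas), sum(areas))
```

Because tiles do not depend on one another, the work for a single frame can be spread across cores without waiting on neighboring regions.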

The second direction concerns lookahead analysis. Encoders typically run a lookahead analysis pass over the video before the subsequent encoding operations can take place, and this pass can become a bottleneck in the overall parallel processing pipeline. To address this, we simplify the most computationally expensive algorithms in the lookahead analysis and post-processing stages, which speeds up these stages. Through these optimizations, both the encoding speed and parallelism are significantly improved.

2. Enhancing System Architecture for Improved Encoding Performance

In live streaming scenarios, it is challenging to achieve real-time encoding and compression efficiency for 8K videos solely through upgrades to the encoding kernel. It requires adjustments to the overall system architecture. Tencent MPS optimizes the system architecture to address the specific needs of live streaming scenarios.

Currently, a common solution is to input the 8K AVS3 video source into a hardware encoder, which outputs multiple streams including 8K H.265, 4K H.265, 1080P H.264, and even 720P H.264 for distribution. While this solution can achieve the desired outcome, it also presents several issues.

Firstly, hardware encoders for 8K are generally expensive, especially for 8K/AV1 encoders, which have limited options and higher prices.
Secondly, compared to optimized software encoders, hardware encoders still have lower compression efficiency. This is due to certain characteristics of hardware encoders that prevent the application of parallelizable acceleration algorithms.
Thirdly, hardware encoders often come with fixed architectures and chips, making it difficult to adapt quickly to various business scenarios. The constantly changing and upgrading business requirements pose a significant challenge for hardware encoders. However, if the same encoding results can be achieved through software encoding, it would be possible to balance transcoding efficiency and business flexibility.

To address these issues, we have made several adjustments to the overall architecture of our live streaming system. First, let's briefly explain the architecture of a typical live streaming system. The stream is pushed to an upload gateway, which then processes and transcodes the live stream. The transcoded output is then pushed to a CDN for distribution and viewing.

For 8K video encoding, relying on a single machine or transcoding node in the existing live streaming processing chain makes it difficult to achieve real-time software encoding. Therefore, in this context, we have designed a platform for processing ultra-high-definition live streams.

The ultra-high-definition live transcoding node does not perform actual transcoding but rather handles stream encapsulation. It slices the incoming source stream into TS (Transport Stream) segments and sends these segments as files to a video transcoding processing cluster. The cluster can process multiple TS segments in parallel, enabling distributed encoding across multiple machines. In this way, an encoding task that was originally tied to a single stream on a single node is spread across a cluster of transcoding machines, allowing live streams to be processed in a distributed manner.
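
The slice, distribute, and reassemble flow described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the production system: a thread pool stands in for the transcoding cluster, frames are plain integers, and `slice_into_segments`, `transcode_segment`, and `transcode_stream` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def slice_into_segments(frames, frames_per_segment):
    """Cut the incoming frame sequence into independent, numbered segments."""
    return [(index, frames[start:start + frames_per_segment])
            for index, start in enumerate(range(0, len(frames), frames_per_segment))]

def transcode_segment(segment):
    """Hypothetical worker: stands in for a cluster node encoding one TS segment."""
    index, frames = segment
    return index, [f"enc({frame})" for frame in frames]

def transcode_stream(frames, frames_per_segment, workers):
    """Encode all segments in parallel and return the frames in source order."""
    segments = slice_into_segments(frames, frames_per_segment)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transcode_segment, segments))
    # Reassemble by segment index so the output order matches the input order.
    results.sort(key=lambda item: item[0])
    return [frame for _, encoded in results for frame in encoded]

output = transcode_stream(list(range(10)), frames_per_segment=4, workers=3)
print(output)
```

The key design point is that each segment carries its index, so segments can finish in any order on any machine and still be stitched back into a continuous stream.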

The benefits of this approach are twofold. Firstly, it offers high flexibility through pure software control: whether expanding processing capacity or upgrading the business, the processing workflow is easy to change. Secondly, it helps save costs by allowing offline transcoding clusters and live streaming clusters to coexist, enabling greater resource reuse and improving resource utilization for a wider range of business scenarios. The approach does have a drawback: its latency is higher than that of the standard transcoding process. This is because parallel transcoding requires an initial encapsulation step during stream processing, which introduces some waiting time while independent TS segments are generated. Nevertheless, the latency remains within an acceptable range, especially when HLS (HTTP Live Streaming) is used for downstream delivery, where the latency does not change significantly.
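
To give a feel for the trade-off between cluster size and latency, here is a back-of-the-envelope model in Python. It is a simplification based on assumptions that are not from the original system: a node's encoding speed is expressed relative to real time, network transfer and packaging overhead are ignored, and the 2-second segment and 0.25x speed are purely hypothetical numbers.

```python
import math

def workers_needed(encode_speed):
    """Nodes required to keep up with the live stream.

    encode_speed is the speed of one node relative to real time
    (0.25 means a node needs 4 seconds to encode 1 second of video).
    """
    return math.ceil(1 / encode_speed)

def added_latency_seconds(segment_seconds, encode_speed):
    """Wait for a full segment to arrive, then wait for it to be encoded."""
    return segment_seconds + segment_seconds / encode_speed

# Hypothetical example: 2-second segments, each node running at 0.25x real time.
print(workers_needed(0.25))
print(added_latency_seconds(2, 0.25))
```

Under this model, shorter segments reduce the added latency, while slower per-node encoding increases both the latency and the number of nodes required.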

In our system, we convert real-time live streaming of 4K/8K ultra-high-definition videos into multiple parallel and independent offline transcoding tasks using a parallel encoding approach in an offline processing cluster. The capability of ultra-high-definition encoding can be utilized within the offline transcoding nodes. During the transcoding process within these nodes, we can ensure a bandwidth savings of over 50% for the same subjective quality rating. Compared to hardware encoders, we can achieve a compression efficiency improvement of over 70%. In other words, with the aforementioned system solution, the required encoding bitrate for live streaming 4K/8K ultra-high-definition videos is only around 30% of that of hardware encoders for the same visual quality. Alternatively, at the same bitrate, ultra-high-definition encoding can achieve a subjective score improvement of over 20%.
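
The link between a 70% compression efficiency improvement and a bitrate of around 30% is simple arithmetic, shown below in Python. The 100 Mbps hardware bitrate is a hypothetical figure chosen only to make the numbers easy to read.

```python
def software_bitrate(hardware_bitrate_mbps, savings=0.70):
    """Bitrate needed for the same visual quality after a fractional saving."""
    return hardware_bitrate_mbps * (1 - savings)

# Hypothetical 100 Mbps hardware-encoded stream.
print(round(software_bitrate(100), 1))
```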

Within each independent offline transcoding node, the end-to-end workflow begins by decoding the received video source. We then perform scene classification on the decoded video and apply different encoding strategies based on the result. Next comes detection, including noise detection, artifact detection, and more, which analyzes the presence of noise, artifacts, and edge issues in the video source in preparation for subsequent encoding optimizations.
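
The idea of choosing encoding settings from analysis results can be sketched as follows. Everything in this Python example is illustrative: the `SourceAnalysis` fields, the scene category names, and the strategy keys and values are hypothetical and are not taken from Tencent MPS.

```python
from dataclasses import dataclass

@dataclass
class SourceAnalysis:
    scene: str           # e.g. "sports" or "animation"; category names are hypothetical
    noisy: bool          # result of noise detection
    has_artifacts: bool  # result of artifact detection

def choose_strategy(analysis: SourceAnalysis) -> dict:
    """Map analysis results to encoder settings; all keys and values are illustrative."""
    strategy = {"preset": "balanced", "denoise": False, "deblock": False}
    if analysis.scene == "sports":
        strategy["preset"] = "high_motion"
    elif analysis.scene == "animation":
        strategy["preset"] = "flat_regions"
    # Detection results switch on the matching pre-processing steps.
    strategy["denoise"] = analysis.noisy
    strategy["deblock"] = analysis.has_artifacts
    return strategy

print(choose_strategy(SourceAnalysis(scene="sports", noisy=True, has_artifacts=False)))
```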

3. Solutions for Content Enhancement

Currently, many playback devices support 4K resolution, but not all content is available in 4K resolution. Tencent MPS leverages its technological capabilities to upgrade older content and achieve the clarity of 4K resolution. This allows viewers to enjoy a true 4K playback experience that is visually appealing to the human eye.

Generating our ultra-high-definition 4K videos involves several steps. Firstly, the source is analyzed for factors such as noise and compression artifacts. Based on these results, comprehensive restoration and enhancement is performed, including denoising, texture enhancement, and more. One observation from our practice is worth highlighting: focusing on areas that matter most to human viewers, such as faces and text, can significantly enhance the viewing experience.

After enhancing the details, color correction is applied. With the widespread use of HDR (High Dynamic Range) capabilities in 4K/8K videos, we perform SDR (Standard Dynamic Range) to HDR conversion for content that does not have HDR playback effects. This allows us to achieve a true 4K effect with vibrant colors while maintaining high-definition resolution.
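
As a hedged illustration of the signal-level part of an SDR to HDR conversion, the Python sketch below maps a gamma-encoded SDR value to a PQ (SMPTE ST 2084) signal value. It is not the conversion used by Tencent MPS: it assumes a 2.4 display gamma and places SDR peak white at 203 nits, and it leaves out color gamut conversion and any highlight expansion, which a real conversion would also need.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

SDR_GAMMA = 2.4          # assumed SDR display gamma
REFERENCE_WHITE = 203.0  # assumed luminance in nits for SDR peak white

def pq_encode(nits: float) -> float:
    """Encode absolute luminance (0..10000 nits) as a PQ signal in 0..1."""
    y = max(nits, 0.0) / 10000.0
    y_m1 = y ** M1
    return ((C1 + C2 * y_m1) / (1 + C3 * y_m1)) ** M2

def sdr_to_pq(sdr_value: float) -> float:
    """Map a gamma-encoded SDR value in 0..1 to a PQ signal value."""
    linear = sdr_value ** SDR_GAMMA
    return pq_encode(linear * REFERENCE_WHITE)

print(round(sdr_to_pq(1.0), 2))
```

With these assumptions, SDR white lands a little above the middle of the PQ range, which leaves the upper part of the range free for brighter HDR highlights.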

4. Optimizing Memory Bottlenecks

Memory bandwidth can become a major bottleneck when processing high-resolution content such as 8K, so relieving it is crucial in these scenarios.

For example, consider a CPU with a clock speed of 3.2GHz and four 32GB memory modules, giving a peak memory bandwidth of approximately 102GB/s. (This figure is consistent with four 64-bit memory channels at 3,200 MT/s: 4 × 8 bytes × 3.2 billion transfers per second ≈ 102.4GB/s.) Real-time encoding of 8K content at 50 frames per second with 10-bit color depth requires a sustained data bandwidth of only around 4.7GB/s. In practice, however, memory transfers can occupy only a very small portion of the overall encoding time, because most of the time is spent on encoding computations. The instantaneous bandwidth required can therefore be several times higher than 4.7GB/s, especially when multiple threads copy video frames in parallel, and this limits the speed of 8K encoding.
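
The figures in this example can be approximately reproduced with the Python sketch below. The inputs are assumptions rather than details from the original setup: four 64-bit memory channels at 3,200 MT/s, 4:2:0 chroma subsampling, and 10-bit samples stored in 16-bit words. Under these assumptions one pass over the stream comes to about 4.98GB/s (about 4.63GiB/s), the same order as the figure of around 4.7GB quoted above; the exact value depends on the pixel format. The 10% time budget for memory transfers is likewise hypothetical.

```python
def peak_memory_bandwidth_gb_s(channels, transfers_per_second, bytes_per_transfer=8):
    """Theoretical peak bandwidth of the memory subsystem in GB/s."""
    return channels * transfers_per_second * bytes_per_transfer / 1e9

def raw_video_bandwidth_gb_s(width, height, fps, bytes_per_sample=2, samples_per_pixel=1.5):
    """Bandwidth of one pass over uncompressed video (4:2:0 has 1.5 samples per pixel)."""
    return width * height * samples_per_pixel * bytes_per_sample * fps / 1e9

def burst_bandwidth_gb_s(average_gb_s, time_fraction):
    """Bandwidth needed if transfers must fit into a fraction of the frame time."""
    return average_gb_s / time_fraction

peak = peak_memory_bandwidth_gb_s(channels=4, transfers_per_second=3.2e9)
video = raw_video_bandwidth_gb_s(7680, 4320, fps=50)
burst = burst_bandwidth_gb_s(video, time_fraction=0.10)  # hypothetical 10% budget

print(f"peak memory bandwidth: {peak:.1f} GB/s")
print(f"one pass over the 8K stream: {video:.2f} GB/s")
print(f"burst rate at a 10% time budget: {burst:.1f} GB/s")
```

Every additional copy of a frame adds another pass at this rate, which is why a handful of parallel copies compressed into a small share of the frame time can approach the peak bandwidth.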

The encoding and decoding of 8K videos place significant demands on memory bandwidth. When configuring hardware for such tasks, it is important to choose a memory configuration that populates the CPU's memory channels so as to maximize memory bandwidth. On the software side, Tencent MPS has reworked its memory pool so that decoding, pre-processing operations, and watermarking are performed in place, without requesting new memory. This approach minimizes memory copying and thereby reduces memory bandwidth usage.
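
The principle of reusing pooled buffers and modifying frames in place can be sketched in Python as follows. This is a toy illustration, not the Tencent MPS memory pool: `FramePool`, `brighten_in_place`, and `stamp_watermark_in_place` are hypothetical names, and the 16-byte "frame" is far smaller than a real one.

```python
class FramePool:
    """Hypothetical pool that hands out preallocated frame buffers for reuse."""

    def __init__(self, frame_bytes, count):
        self._free = [bytearray(frame_bytes) for _ in range(count)]

    def acquire(self):
        return self._free.pop()

    def release(self, buffer):
        self._free.append(buffer)

def brighten_in_place(buffer, offset):
    """Stand-in pre-processing step: modifies samples without allocating a new frame."""
    view = memoryview(buffer)
    for i in range(len(view)):
        view[i] = min(view[i] + offset, 255)

def stamp_watermark_in_place(buffer, mark, position):
    """Stand-in watermark: overwrites a small region of the same buffer."""
    buffer[position:position + len(mark)] = mark

pool = FramePool(frame_bytes=16, count=2)
frame = pool.acquire()
brighten_in_place(frame, 10)
stamp_watermark_in_place(frame, b"\xff\xff", position=0)
print(list(frame[:4]))
pool.release(frame)  # the buffer returns to the pool instead of being freed
```

Each stage writes into the buffer it was given, so a frame crosses the memory bus once per stage rather than once per stage plus once per copy.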

Conclusion

These solutions and optimization strategies are designed to improve the encoding efficiency of 4K/8K video and the transmission and playback experience, while reducing costs and enhancing the quality and availability of content. Through these technological innovations, it becomes possible to better adapt to the demands of emerging technology scenarios and drive the development of ultra-high-definition video technology.

If you would like to learn more about Tencent MPS, you are welcome to contact us for more information.