With the widespread adoption of the Internet and 5G infrastructure, video consumption, chiefly live streaming and video on demand (VOD), has become an integral part of daily life. More and more users watch videos for entertainment and learning, and audiovisual quality has a crucial impact on the viewing experience. Viewers want high-quality video in these scenarios, but a good listening experience is just as indispensable, including less noise interference and more stable loudness. Tencent Cloud Media Processing Service (MPS) already covers video, audio, and subtitles, and its audio processing capabilities are constantly evolving to help live streaming, VOD, and other businesses deliver the best possible audio experience.
Currently, Tencent MPS audio processing capabilities fall into four categories: noise cancelation, audio separation, volume equalization, and audio improvement. Each capability can be used on its own to enhance audio for a specific application scenario, or several can be combined to process audio streams with more complex requirements, improving the overall listening experience.
MPS Audio Enhancement Template
Traditional noise cancelation based on signal processing can handle stationary noise to some extent but has little effect on transient noise. To address this, Tencent MPS has developed an AI-based noise cancelation solution. Starting from a large corpus of clean speech and real noise recordings, it simulates noisy speech under different environments and signal-to-noise ratios by randomly mixing the clean speech with the noise. These simulated pairs are then used for supervised training of a speech enhancement and denoising model that generalizes well. Tencent MPS's noise cancelation solution has the following characteristics:
Case Study of Noise Cancelation:
Noisy Environment | Before Processing | After Processing |
Outdoor Noise (Natural Wind Noise + Bird Chirping) | | |
Indoor Noise (Microphone Hissing + Background Voices + Constant Noise) | | |
Case Study of Controllable Noise Reduction:
Before Processing | |
Weak Noise Reduction | |
Relatively Weak Noise Reduction | |
Strong Noise Reduction | |
Intense Noise Reduction | |
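The training-data simulation described for noise cancelation, mixing clean speech with noise at a chosen signal-to-noise ratio, can be sketched as follows. This is only a minimal illustration in plain Python and not Tencent MPS's actual pipeline; the function names `signal_power` and `mix_at_snr` are hypothetical, and samples are assumed to be floats in a list.

```python
import math
import random


def signal_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(clean, noise, snr_db):
    """Add a randomly chosen noise segment to `clean` at the requested SNR.

    Hypothetical helper for illustration only. Returns the noisy mixture
    and the scaled noise segment that was added.
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as clean")

    # Pick a random noise segment so each mixture sees different noise.
    start = random.randint(0, len(noise) - len(clean))
    segment = noise[start:start + len(clean)]

    p_clean = signal_power(clean)
    p_noise = signal_power(segment)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("clean and noise segments must not be silent")

    # SNR_dB = 10 * log10(p_clean / (gain^2 * p_noise)), solved for gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in segment]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled
```

Drawing `snr_db` from a range (for example with `random.uniform`) for each clean utterance would give the variety of signal-to-noise conditions the text mentions; the pairs `(noisy, clean)` then serve as input and target for supervised training.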
The goal of audio denoising is to recover a cleaner speech signal from a mixed audio signal contaminated with noise. Audio separation is similar in that it extracts target components from a mixed audio stream, but the targets are not limited to a single speech signal. Tencent MPS has designed an AI-based audio separation solution. Tencent MPS's audio separation solution has the following characteristics:
Separating Voiceover and Background Music in Film and TV
Before Processing | Vocal Separation | Background Sound Separation |
Separation of Song Accompaniment
Before Processing | Vocal Separation | Background Sound Separation |
In live streaming and VOD scenarios, adaptive volume equalization algorithms are needed to automatically adjust the loudness of audio streams, keeping it within an appropriate range and improving the listening experience. Tencent MPS has developed a volume equalization solution based on automatic gain control algorithms and the EBU R 128 audio loudness standard, which addresses audio that is too loud, too quiet, or fluctuating in volume. The volume equalization solution has the following features:
Case Study of Volume Equalization:
Volume Issue | Before Processing | After Processing |
Volume too high | | |
Volume too low | | |
Volume fluctuates | | |
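The idea of automatic gain control behind volume equalization can be sketched with a block-wise gain that is steered toward a target level. This is a simplified illustration, not the MPS algorithm: it measures plain RMS level in dBFS rather than the K-weighted, gated loudness that EBU R 128 specifies, and the function names and default parameters (`block_size`, `target_dbfs`, `max_gain_db`, `smoothing`) are assumptions chosen for the example. The default of -23 only mirrors the R 128 target of -23 LUFS numerically.

```python
import math


def rms_dbfs(block):
    """RMS level of a block in dB relative to full scale (1.0)."""
    mean_sq = sum(s * s for s in block) / len(block)
    return 10 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")


def equalize_volume(samples, block_size=1024, target_dbfs=-23.0,
                    max_gain_db=20.0, smoothing=0.5):
    """Block-wise automatic gain control (hypothetical sketch).

    Each block's gain moves a fraction `smoothing` of the way toward the
    gain that would bring the block to `target_dbfs`, so loudness changes
    gradually instead of jumping between blocks.
    """
    out = []
    gain_db = 0.0
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        level = rms_dbfs(block)
        if level != float("-inf"):  # hold the gain during digital silence
            desired = target_dbfs - level
            desired = max(-max_gain_db, min(max_gain_db, desired))
            gain_db += smoothing * (desired - gain_db)
        gain = 10 ** (gain_db / 20)
        # Hard-limit to [-1, 1] so boosted peaks cannot overflow.
        out.extend(max(-1.0, min(1.0, s * gain)) for s in block)
    return out
```

A larger `smoothing` value reacts faster to level changes but makes the gain itself more audible; a production system would typically add a proper limiter and a standards-compliant loudness meter.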
In live streaming and VOD scenarios, impulse noise and popping sounds can be caused by microphone malfunctions, packet loss during network transmission, discontinuous audio frame processing, and similar faults, all of which degrade the listening experience. Tencent MPS has therefore developed a noise detection and repair technology that diagnoses audio streams in real time, determines whether noise interference is present, and automatically repairs faults to restore high-definition audio. For speech signals, Tencent MPS has also developed a sibilance suppression solution that softens the hissing sound caused by high-frequency airflow, improving perceived speech quality.
Case Study of Audio Restoration and Improvement:
Type of Audio Improvement | Before Processing | After Processing |
Noise removal | | |
De-essing | | |
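Detecting and repairing impulse noise can be illustrated with a classic median-based click remover. The sketch below is a generic textbook-style approach, not the detection and repair technology used by MPS; the function name `repair_clicks` and the fixed absolute `threshold` are assumptions made for the example, and real systems usually adapt the threshold to the local signal.

```python
import statistics


def repair_clicks(samples, window=5, threshold=0.5):
    """Flag isolated outliers and replace them with a local median.

    Hypothetical sketch: a sample is treated as a click when it differs
    from the median of its neighbours (excluding itself) by more than
    `threshold`. Returns the repaired samples and the flagged indices.
    """
    samples = list(samples)
    half = window // 2
    repaired = list(samples)
    flagged = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        neighbours = samples[lo:i] + samples[i + 1:hi]
        if not neighbours:
            continue
        local_median = statistics.median(neighbours)
        if abs(samples[i] - local_median) > threshold:
            repaired[i] = local_median  # repair by local median substitution
            flagged.append(i)
    return repaired, flagged
```

Because detection always reads from the original samples, one click does not influence how its neighbours are judged; longer dropouts, such as those from packet loss, would need interpolation over a wider span than this single-sample repair.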
Tencent MPS's advantage in audio denoising comes from algorithmic improvements. New solutions based on MPCRN and VSANet refine the denoising algorithm, making noise reduction both more efficient and more effective. These advances preserve the integrity of the original audio and help deliver high-quality audio across scenarios such as live streaming and video on demand.
This article has introduced the audio processing capabilities of Tencent MPS. We have built up deep technical expertise in audio processing and have published numerous academic papers and invention patents in the field. Regarding the future development of MPS, we have the following considerations:
If you're interested in learning more, you are welcome to try the demo and contact us.
MPS Audio Processing Experience Demo