The terminal SDK is a suite of on-device audio and video capabilities from Tencent Cloud. It comprises three SDKs, covering video encoding, audio enhancement, and video enhancement. To meet diverse customer needs, it can be integrated on multiple terminal types, including mobile, web, and PC.
Terminal Video Codec SDK
Tencent's Top Speed Codec (TSC) terminal video encoder is designed for terminal-side scenarios that demand low computational cost, low latency, and high image quality. Compared with hardware encoding, it offers the following advantages:
1. It is stable, reliable, and quick to start.
2. At the same quality level, it saves bitrate, which improves transmission stability, reduces downlink distribution bandwidth, and lowers storage costs.
3. At the same bitrate, it improves image quality and therefore user experience.
4. It offers a rich feature set for diverse business needs, such as region-of-interest (ROI) encoding to improve image quality in face regions, and dynamic adjustment of the encoding configuration to adapt to network fluctuations.
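To make the ROI idea in item 4 concrete, the sketch below builds a per-block map of quantization parameter (QP) offsets that lowers QP, and so raises quality, inside a face rectangle. It is a generic illustration rather than the TSC interface: the function name `build_roi_qp_offsets`, the 16-pixel block size, and the offset of -6 are all assumptions.

```python
def build_roi_qp_offsets(width, height, roi, block=16, roi_offset=-6):
    """Return a 2D list of QP offsets, one entry per block x block pixel block.

    roi is (x, y, w, h) in pixels with w, h > 0. Blocks that overlap the ROI
    get roi_offset (negative = finer quantization = higher quality); all other
    blocks get 0. Every name and default here is hypothetical.
    """
    cols = (width + block - 1) // block    # ceiling division
    rows = (height + block - 1) // block
    x, y, w, h = roi
    first_col, last_col = x // block, (x + w - 1) // block
    first_row, last_row = y // block, (y + h - 1) // block
    offsets = [[0] * cols for _ in range(rows)]
    for r in range(max(first_row, 0), min(last_row, rows - 1) + 1):
        for c in range(max(first_col, 0), min(last_col, cols - 1) + 1):
            offsets[r][c] = roi_offset
    return offsets


# A 640x360 frame with a 128x128 face region starting at (256, 96).
qp_map = build_roi_qp_offsets(640, 360, roi=(256, 96, 128, 128))
```

In a real pipeline the rectangle would come from a face detector and the map would be handed to the encoder for each frame; both steps are outside this sketch.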
Terminal Audio SDK
The terminal audio SDK provides audio encoding and enhancement capabilities. Its features include adaptive noise suppression, acoustic echo cancellation, and automatic gain control, which significantly improve audio quality by removing echo and noise.
Terminal Enhancement SDK
The terminal enhancement SDK builds on efficient image processing algorithms and AI model inference to provide on-device video super-resolution, image quality enhancement, frame interpolation, and other features.
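TSC Terminal Video Codec SDK
Product Overview
Terminal-side encoding faces different constraints from Video on Demand (VOD) and Cloud Streaming Services (CSS) encoding, and therefore calls for a different solution. The three encoding modes compare as follows: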
| Encoding Mode | VOD | CSS | Terminal-side Codec |
| --- | --- | --- | --- |
| Typical Business | WeTV, video account, and other mainstream on-demand services | Video account live streaming, Tencent sports live streaming, and other mainstream live streaming services | VooV Meeting, WeChat video call, and 5G remote control services |
| Latency Requirements | Pursues an extreme compression rate, with no latency requirements. | Pursues a high compression rate, allowing second-level latency. | Pursues a high compression rate while requiring zero latency. |
| Real-Time Requirements | Pursues an extreme compression rate, with no real-time requirements. | Requires real-time speed only on average across multiple frames, with multi-threading allowed. | Requires real-time encoding of every frame on a single thread. |
| Network Condition Constraints | The encoding process is independent of network status and uses a fixed encoding configuration. | The encoding process is independent of network status and uses a fixed encoding configuration. | The encoding process is tightly coupled to network status and requires the encoding configuration to be adjusted dynamically. |
| Scenario Characteristics | 1 -> N, no interaction | 1 -> N, no interaction | N <-> N, strong interaction |
| Solution | Server-side encoding | Server-side encoding | Terminal-side encoding |
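The network-dependent behavior of terminal-side encoding can be sketched as a small control rule that picks the next target bitrate from network feedback. This is a minimal illustration under stated assumptions, not the TSC algorithm: the function name `next_bitrate_kbps`, the loss thresholds, the step sizes, and the 90% bandwidth headroom are all hypothetical.

```python
def next_bitrate_kbps(current_kbps, bandwidth_kbps, loss_rate,
                      min_kbps=150, max_kbps=4000):
    """Pick the next encoder target bitrate (integer kbps) from network feedback.

    current_kbps and bandwidth_kbps are integers; loss_rate is a fraction in [0, 1].
    All thresholds below are illustrative assumptions.
    """
    if loss_rate > 0.05:
        target = current_kbps * 85 // 100     # back off quickly on heavy loss
    elif loss_rate < 0.02:
        target = current_kbps * 105 // 100    # probe upward gently
    else:
        target = current_kbps                 # hold steady
    target = min(target, bandwidth_kbps * 9 // 10)   # stay below the estimated bandwidth
    return max(min_kbps, min(max_kbps, target))      # clamp to the allowed range
```

For instance, `next_bitrate_kbps(1000, 2000, 0.10)` backs off to 850 kbps, while `next_bitrate_kbps(1000, 800, 0.0)` is capped at 720 kbps by the bandwidth estimate. A fixed server-side configuration, as used for VOD and CSS, would skip this loop entirely.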
SDK Access Process
1. Evaluation and Trial: The customer provides system platform and requirement information and applies for a product trial.
● System platform: Android, iOS, Windows, macOS, etc.
● Use cases: live streaming, VOD
● Encoding specification: encoding format, resolution, frame rate, bitrate, latency requirements, etc.
● Optimization objectives: bitrate savings, image quality enhancement, CPU savings, and the corresponding assessment metrics (PSNR, SSIM, VMAF, etc.)
2. Development and Integration: Integrate the beta version of the SDK into the app for performance evaluation and custom optimization.
● Based on the customer's evaluation results and specific business scenario needs, in-depth optimization support is provided.
3. Launch and Release: Apply for a license, integrate the official version of the SDK with license authorization, then test and launch the app.
● If the license is about to expire or has expired, you can apply for a renewal.
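The evaluation in step 1 relies on objective metrics such as PSNR. As a rough illustration of what that metric measures, the sketch below computes PSNR between a reference and a distorted sequence of pixel values using the standard definition. The function name `psnr` and the flat-list input format are assumptions for this example; real evaluations normally run established tools over whole video frames.

```python
import math


def psnr(reference, distorted, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between corresponding pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf   # identical inputs
    return 10 * math.log10(peak ** 2 / mse)
```

A higher value means the encoded picture is closer to the source. At a fixed bitrate, a PSNR gain indicates better quality; at a fixed PSNR, a lower bitrate indicates better compression.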
SDK Integration
The video codec SDK is implemented in C, C++, and assembly, and exposes a unified C interface across system platforms.
Android
● Provides dynamic libraries for ARMv7 and ARMv8, which applications integrate through the NDK.
● Provides a Java wrapper whose interface closely follows MediaCodec, Android's hardware encoding API, so that it can be swapped in for MediaCodec with little effort.
iOS
● Provides an XCFramework for ARMv8 and x86_64.
macOS
● Provides a framework for ARMv8 and x86_64.
Windows
● Provides dynamic libraries for x86 and x86_64.
Basic Video Encoding Process
TSC Terminal Audio SDK
Product Overview
The terminal audio SDK provides audio encoding and enhancement capabilities, significantly improving audio quality by removing echo and noise.
The features of each edition are as follows:
| Feature | Standard Edition | Professional Edition | Premium Edition |
| --- | --- | --- | --- |
| Acoustic Echo Cancellation | Supported | Supported | Supported |
| Automatic Gain Control | Supported | Supported | Supported |
| Adaptive Noise Suppression | Supported | Supported | Supported |
| Echo Cancellation Music Mode | - | Supported | Supported |
| Volume Equalization | - | Supported | Supported |
| AI Intelligent Noise Reduction | - | Supported | Supported |
| Audio Encoding | - | - | Supported |
| AI Codec | - | - | Supported |
Real-Time Communication Audio 3A
Audio 3A is a set of basic audio signal processing features commonly used in real-time communication systems such as video conferencing, voice calls, and live streams with connected microphones. It ensures high-quality audio transmission and provides better call quality and a better listening experience. 3A stands for Adaptive Noise Suppression (ANS), Acoustic Echo Cancellation (AEC), and Automatic Gain Control (AGC).
Real-time communication audio link
ANS
The main function of ANS is to remove background noise from the voice signal, reducing interference and thereby improving speech intelligibility and perceived quality. Under the additive noise model, the audio captured by the microphone is treated as the sum of a clean voice signal and noise. By tracking and estimating the noise during non-voice segments and then subtracting the estimated noise energy from the voice segments, a clearer voice signal is obtained.
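The subtraction step described above can be sketched on per-frame band powers. This is a minimal spectral-subtraction illustration under several assumptions, not the SDK's algorithm: each frame is already a list of per-band power values, voice activity flags are supplied by the caller, and the names `estimate_noise` and `suppress` and the 5% spectral floor are hypothetical.

```python
def estimate_noise(frames, is_voice):
    """Average the per-band power over frames flagged as non-voice."""
    noise_frames = [f for f, v in zip(frames, is_voice) if not v]
    if not noise_frames:
        raise ValueError("need at least one non-voice frame")
    bands = len(noise_frames[0])
    return [sum(f[b] for f in noise_frames) / len(noise_frames) for b in range(bands)]


def suppress(frame, noise, floor=0.05):
    """Subtract the noise estimate from each band, keeping a small spectral floor."""
    return [max(p - n, floor * p) for p, n in zip(frame, noise)]


# Two noise-only frames followed by one voice frame, three bands each.
frames = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [9.0, 1.0, 5.0]]
is_voice = [False, False, True]
noise = estimate_noise(frames, is_voice)
cleaned = suppress(frames[2], noise)
```

The floor keeps a band from being driven to zero, which in practice limits the "musical noise" artifacts that plain subtraction tends to produce. A production system would also compute the spectra with an FFT and detect voice activity itself.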
AEC
AEC addresses the echo problem in audio communication. During a call, the sound played by the loudspeaker is captured by the microphone, either directly or after reflection, so the remote user hears their own voice. This can seriously degrade call quality. AEC processes the near-end signal using the far-end signal as a reference, effectively removing or reducing the echo and improving the call experience.
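One common textbook way to use the far-end reference is a normalized least-mean-squares (NLMS) adaptive filter, which learns the echo path and subtracts the predicted echo from the microphone signal. The sketch below is an assumption-laden illustration, not the SDK's method: the function name `nlms_echo_cancel`, the 8-tap filter, the step size, and the synthetic echo path are all hypothetical, and there is no near-end speech (no double-talk) in the test signal.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Return the mic signal with the estimated echo of far_end removed."""
    weights = [0.0] * taps
    history = [0.0] * taps               # recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate        # residual that would be sent to the remote side
        norm = sum(h * h for h in history) + eps
        # NLMS update: step along the reference, scaled by its energy.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        out.append(error)
    return out


rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Hypothetical echo path: the far-end signal arrives 3 samples late at 60% amplitude.
mic = [0.0] * 3 + [0.6 * s for s in far[:-3]]
residual = nlms_echo_cancel(far, mic)
```

Once the filter has adapted, the residual energy falls far below the echo energy in the microphone signal. Real echo paths are much longer and vary over time, and handling simultaneous near-end speech requires additional logic not shown here.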
AGC
AGC adjusts the volume of the audio signal during transmission. A source that is too quiet or too loud can significantly degrade the call experience. AGC automatically detects the loudness of the audio stream and dynamically adjusts the level to keep it within a comfortable range. This mitigates volume instability caused by differences in capture devices, in how loudly people speak, and in their distance from the microphone.
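The detect-and-adjust loop can be sketched as a per-frame gain that is smoothed over time so the level changes gradually. This is a simplified illustration rather than the SDK's implementation: the names `frame_rms` and `agc`, the target RMS of 0.1, the gain cap, and the smoothing factor are assumptions, and samples are taken to be floats in the range -1 to 1.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one non-empty frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def agc(frames, target_rms=0.1, max_gain=10.0, smoothing=0.9):
    """Scale each frame toward target_rms, smoothing the gain across frames."""
    gain = 1.0
    out = []
    for frame in frames:
        rms = frame_rms(frame)
        if rms > 1e-6:                                   # do not adapt on silence
            desired = min(target_rms / rms, max_gain)    # cap the boost for quiet input
            gain = smoothing * gain + (1 - smoothing) * desired
        out.append([s * gain for s in frame])
    return out
```

Fed a steady quiet input (RMS 0.02) or a steady loud one (RMS 0.5), the output level settles near the 0.1 target in both cases, which is the equalizing behavior the paragraph describes.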
Use Cases
The SDK can be applied as preprocessing before audio encoding on the uplink (stream push) and as post-processing after audio decoding on the downlink (stream pull) to improve sound quality. It currently supports Android, iOS, Windows, and macOS clients.
● Online teaching: Removing noise and echo makes speech clearer during lessons.
● In-game voice: Equalizing loud and quiet voices improves the listening and gaming experience for players.
● Live streaming: Noise reduction and gain control for the host's voice improve overall quality in voice chat rooms, singing rooms, and similar scenarios.
SDK API Calling Process
TSC Terminal Enhancement SDK
Product Overview
The terminal enhancement SDK builds on efficient image processing algorithms and AI model inference to provide on-device video super-resolution, image quality enhancement, frame interpolation, and other features.
The features of each edition are as follows:
| Feature | Standard Edition | Professional Edition | Premium Edition |
| --- | --- | --- | --- |
| Standard super-resolution | Supported | Supported | Supported |
| Standard super-resolution + enhancement parameters (contrast/color/brightness) | Supported | Supported | Supported |
| Professional super-resolution | - | Supported | Supported |
| AI image quality enhancement | - | Supported | Supported |
| AI frame interpolation enhancement | - | - | Supported |
The Standard Edition is optimized for performance: its algorithms deliver good super-resolution results with very low processing time and energy consumption, and it runs on almost all mobile phones regardless of performance tier.
The Standard Edition also offers image enhancement, with adjustable brightness, color saturation, and contrast.
The Professional Edition is optimized for visual quality. Using AI model inference, it can regenerate texture details missing from the original image, achieving the best image enhancement and super-resolution results. It places higher demands on device computing power and is recommended for mid-range to high-end mobile phones.
Use Cases
1. Enhance terminal-side players to improve the quality and smoothness of video playback.
2. Save costs by distributing video at a lower resolution and bitrate, then minimize the loss in viewing experience through playback enhancement on the terminal.
For example, in cloud gaming, real-time video super-resolution on the terminal reduces the computing power needed for cloud rendering and encoding, saves transmission bandwidth, and lowers costs. In one example, a game scene transmitted from the cloud at 720p (5.6 Mbps) is upscaled to 1080p in real time on the terminal. The result looks close to the same scene transmitted directly at 1080p (8.2 Mbps), saving roughly 30% of bandwidth.
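The arithmetic behind the bandwidth figure can be checked with a few lines. The bitrates come from the example above; the pixel dimensions assume the usual 1280x720 and 1920x1080 frame sizes, and the function name `savings_percent` is introduced here only for illustration.

```python
def savings_percent(baseline_mbps, reduced_mbps):
    """Relative bandwidth saving, in percent, of the reduced stream vs. the baseline."""
    return (baseline_mbps - reduced_mbps) / baseline_mbps * 100


saving = savings_percent(8.2, 5.6)             # 1080p stream vs. 720p stream
pixel_ratio = (1920 * 1080) / (1280 * 720)     # pixels the terminal must reconstruct
```

The saving works out to a little under 32%, consistent with the rounded figure in the text, while the terminal upscales each frame to 2.25 times as many pixels.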
SDK Integration
Compatibility
Android platform: Android 5.0 (API level 21) and later, with OpenGL ES 3.1 support.
iOS platform: iPhone 5s and later devices, running iOS 12 or later.
Package Size
Standard Edition: The Android AAR is approximately 0.3 MB (single arm64-v8a architecture), and the iOS framework is approximately 0.4 MB.
Professional Edition: The Android AAR is approximately 2.1 MB (single arm64-v8a architecture), and the iOS framework is approximately 1.9 MB.
Integration Guide
Please refer to the Android and iOS integration guides.