
Terminal SDK integration

Last updated: 2024-10-15 12:06:40
The terminal SDK is a suite of audio and video terminal product capabilities launched by Tencent Cloud. It encompasses three types of SDKs for video encoding, audio enhancement, and video enhancement. Tailored to meet diverse customer needs, it supports access from multiple terminals such as mobile, web, and PC.




Terminal Video Codec SDK

Tencent's Top Speed Codec (TSC) terminal video encoder is designed for terminal-side scenarios that require low computational cost, low latency, and high image quality. Compared with hardware encoding, its advantages include:
1. Stable, reliable, and quick to start.
2. At the same quality level, it saves bitrate, enhances transmission stability, reduces downlink distribution bandwidth, and saves on storage costs.
3. At the same bitrate, it improves image quality and enhances user experience.
4. A rich set of features meets diverse business needs, such as using Regions of Interest (ROI) encoding to improve the image quality in the face region and dynamically adjusting encoding configuration to adapt to network fluctuations.

Terminal Audio SDK

The client audio SDK provides audio encoding and enhancement capabilities. It achieves effects including adaptive noise suppression, acoustic echo cancellation, and automatic gain control, significantly improving audio quality by eliminating echo and noise.
For details, see TSC Terminal Audio SDK.

Terminal Enhancement SDK

The client enhancement SDK, based on efficient image processing algorithms and AI model inference capabilities, achieves terminal video super-resolution, image quality enhancement, frame interpolation, and other features.

TSC Terminal Video Codec SDK

Product Overview

Compared with Video on Demand (VOD) and Cloud Streaming Services (CSS) encoding, terminal-side encoding requires different solutions.
| Encoding Mode | VOD | CSS | Terminal-side Codec |
|---|---|---|---|
| Typical Business | WeTV, video account, and other mainstream on-demand services | Video account live streaming, Tencent sports live streaming, and other mainstream live streaming services | VooV Meeting, WeChat video call, and 5G remote control services |
| Latency Requirements | Pursues an extreme compression rate, with no latency requirements. | Pursues a high compression rate, allowing second-level latency. | Pursues a high compression rate while requiring zero latency. |
| Real-Time Requirements | Pursues an extreme compression rate, with no real-time requirements. | Allows multi-frame average real-time under multi-threading. | Requires real-time encoding under single-threading. |
| Network Condition Constraints | Encoding process is unrelated to network status, with fixed encoding configuration. | Encoding process is unrelated to network status, with fixed encoding configuration. | Encoding process is strongly related to network status, requiring dynamic adjustment of encoding configuration based on network status. |
| Scenario Characteristics | 1 -> N, no interaction | 1 -> N, no interaction | N <-> N, strong interaction |
| Solution | Server-side encoding | Server-side encoding | Terminal-side encoding |
Tencent's Top Speed Codec (TSC) terminal video encoder is designed for terminal-side scenarios that require low computational cost, low latency, and high image quality. Compared with hardware encoding, its advantages include:
1. Stable, reliable, and quick to start.
2. At the same quality level, it saves bitrate, enhances transmission stability, reduces downlink distribution bandwidth, and saves on storage costs.
3. At the same bitrate, it improves image quality and enhances user experience.
4. A rich set of features meets diverse business needs, such as using Regions of Interest (ROI) encoding to improve the image quality in the face region and dynamically adjusting encoding configuration to adapt to network fluctuations.
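
To make the ROI idea in item 4 concrete, here is a minimal sketch of one common way ROI encoding is expressed: a per-block quantization parameter (QP) offset map in which blocks that overlap the face rectangle get a negative offset (finer quantization, higher quality). This is not the TSC SDK interface; the function name `roi_qp_offsets`, the 16-pixel block size, and the offset of -4 are all hypothetical values chosen for illustration.

```python
from typing import List, Tuple


def roi_qp_offsets(width: int, height: int,
                   face_rect: Tuple[int, int, int, int],
                   block: int = 16, roi_delta: int = -4) -> List[List[int]]:
    """Build a per-block QP offset grid (hypothetical helper, not an SDK call).

    Blocks overlapping face_rect (x, y, w, h) get roi_delta; all others get 0.
    """
    x0, y0, w, h = face_rect
    cols = (width + block - 1) // block   # number of blocks per row
    rows = (height + block - 1) // block  # number of block rows
    grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            bx, by = c * block, r * block
            # A block is part of the ROI if it overlaps the rectangle at all.
            overlaps = (bx < x0 + w and bx + block > x0 and
                        by < y0 + h and by + block > y0)
            if overlaps:
                grid[r][c] = roi_delta
    return grid


# A 64x48 frame with a face at x=16..48, y=16..32.
offsets = roi_qp_offsets(64, 48, face_rect=(16, 16, 32, 16))
for row in offsets:
    print(row)
```

An encoder that accepts such a map would spend more bits on the face blocks and fewer elsewhere, which is the trade-off item 4 describes.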

SDK Access Process




1. Evaluation and Trial: Customers provide system platform and requirement information, and apply for a product trial.
● System platform: Android, iOS, Windows, macOS, etc.
● Use cases: live streaming, VOD
● Encoding specification: encoding format, resolution, frame rate, bitrate, latency requirements, etc.
● Optimization objectives: bitrate savings, image quality enhancement, CPU savings, and the respective assessment metrics (PSNR, SSIM, VMAF, etc.)
2. Development and Integration: Integrate the beta version of the SDK into the app for performance evaluation and custom optimization.
● In-depth optimization support is provided based on the customer's evaluation results and specific business scenario needs.
3. Launch and Release: Apply for a license, integrate the official version of the SDK with license authorization, then test and launch the app.
● If the license is about to expire or has expired, you can apply for a license renewal.
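
Of the assessment metrics named in step 1, PSNR is the simplest to compute by hand. The sketch below uses the standard definition, PSNR = 10 * log10(peak^2 / MSE), on flat lists of 8-bit sample values. It is only an illustration of the metric, not a tool shipped with the SDK; the function name `psnr` and the sample data are made up for the example.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], distorted: Sequence[int], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    # Identical inputs have zero error, so PSNR is unbounded.
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


original = [100, 120, 140, 160]
encoded = [101, 119, 142, 158]  # illustrative decoded samples
print(f"PSNR: {psnr(original, encoded):.2f} dB")
```

When comparing two encoders, the usual practice is to compare PSNR at matched bitrates, or bitrate at matched PSNR, which corresponds to advantages 2 and 3 listed earlier.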

SDK Integration

The video codec SDK is implemented in C/C++/Assembly, providing a unified C interface for various system platforms.

Android

● Provides dynamic libraries for ARMv7 and ARMv8, which the application integrates via the NDK.
● Provides a Java wrapper whose interface is largely consistent with MediaCodec, Android's hardware encoding API, making it straightforward to swap in for MediaCodec.

iOS

Provides an XCFramework for ARMv8 and x86_64.

macOS

Provides a framework for ARMv8 and x86_64.

Windows

Provides dynamic libraries for x86 and x86_64.

Basic Video Encoding Process





TSC Terminal Audio SDK

Product Overview

The client audio SDK provides audio encoding and enhancement capabilities, significantly improving audio quality by eliminating echo and noise.
Details of features for each edition are as follows:
| Feature Point | Standard Edition | Professional Edition | Premium Edition |
|---|---|---|---|
| Acoustic Echo Cancellation | Supported | Supported | Supported |
| Automatic Gain Control | Supported | Supported | Supported |
| Adaptive Noise Suppression | Supported | Supported | Supported |
| Echo Cancellation Music Mode | - | Supported | Supported |
| Volume Equalization | - | Supported | Supported |
| AI Intelligent Noise Reduction | - | Supported | Supported |
| Audio Encoding | - | - | Supported |
| AI Codec | - | - | Supported |

Real-Time Communication Audio 3A

Audio 3A technology is a set of basic features in sound signal processing, commonly used in real-time communication systems such as video conferencing, calls, and live microphone connections, to ensure high-quality audio signal transmission, and provide better communication quality and audio listening experience. 3A stands for Adaptive Noise Suppression (ANS), Acoustic Echo Cancellation (AEC), and Automatic Gain Control (AGC).
Real-time communication audio link

ANS
The main function of ANS is to remove background noise from the voice signal, reducing interference and thereby improving speech intelligibility and perceived quality. Under the additive noise model, the audio signal captured by the microphone is treated as the sum of the clean voice signal and noise. By tracking and estimating the noise in non-voice segments of the audio, and then subtracting the estimated noise energy from the voice segments, a clearer voice signal can be obtained.
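
The estimate-then-subtract idea can be sketched with plain spectral subtraction on per-frame power spectra. This is a textbook simplification, not the SDK's algorithm: the function names, the smoothing factor `alpha`, the spectral `floor`, and the assumption that a voice activity flag is already available for each frame are all hypothetical.

```python
from typing import List


def update_noise(noise: List[float], power: List[float], alpha: float = 0.9) -> List[float]:
    """Exponentially smooth the per-bin noise power estimate."""
    return [alpha * n + (1.0 - alpha) * p for n, p in zip(noise, power)]


def subtract_noise(power: List[float], noise: List[float], floor: float = 0.01) -> List[float]:
    """Subtract the noise power per bin, keeping a small floor to avoid negatives."""
    return [max(p - n, floor * p) for p, n in zip(power, noise)]


def suppress(frames: List[List[float]], is_voice: List[bool],
             alpha: float = 0.9, floor: float = 0.01) -> List[List[float]]:
    """Track noise in non-voice frames and subtract it from every frame."""
    noise = list(frames[0])  # assumption: the first frame contains noise only
    cleaned = []
    for power, voiced in zip(frames, is_voice):
        if not voiced:
            noise = update_noise(noise, power, alpha)
        cleaned.append(subtract_noise(power, noise, floor))
    return cleaned


# Two noise-only frames, then a frame where bin 0 also carries voice energy.
frames = [[1.0, 1.0], [1.0, 1.0], [5.0, 1.0]]
cleaned = suppress(frames, is_voice=[False, False, True])
print(cleaned[-1])
```

In the last frame the voiced bin keeps roughly its energy minus the noise estimate, while the noise-only bin is pushed down to the floor.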
AEC
AEC mainly addresses the echo problem in audio communication. During a call, the sound played by the local speaker is captured by the microphone, either directly or after reflection, so the remote user hears their own voice. This can seriously affect call quality. AEC processes the near-end signal using the far-end signal as a reference, effectively eliminating or reducing the echo and thereby enhancing the call experience.
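
One classic way to use the far-end reference is an adaptive filter that learns the echo path and subtracts the predicted echo from the microphone signal. The sketch below uses a normalized LMS (NLMS) filter as an illustration; the source does not say which algorithm the SDK uses, and the function name, tap count, and step size `mu` are assumptions. The test signal is a pure echo (the far-end signal delayed by one sample and scaled by 0.6) with no near-end speech.

```python
import random
from typing import List


def nlms_echo_cancel(far_end: List[float], mic: List[float],
                     taps: int = 4, mu: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Return the residual after subtracting the estimated echo (NLMS sketch)."""
    weights = [0.0] * taps   # estimated echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # what remains after removing the predicted echo
        norm = sum(h * h for h in history) + eps
        # Normalized update: step size scaled by the reference signal energy.
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(400)]
mic = [0.0] + [0.6 * x for x in far_end[:-1]]  # echo only: delay 1, gain 0.6
residual = nlms_echo_cancel(far_end, mic)
head = sum(e * e for e in residual[:50])
tail = sum(e * e for e in residual[-50:])
print(f"residual energy, first 50 samples: {head:.4f}, last 50 samples: {tail:.2e}")
```

As the filter converges, the residual energy in the last samples drops far below that of the first samples. A production canceller additionally has to handle near-end speech (double talk) and changing echo paths, which this sketch ignores.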
AGC
AGC is responsible for adjusting the volume of the audio signal during transmission. A source that is too quiet or too loud can significantly degrade the call experience. AGC automatically detects the loudness of the audio stream and dynamically adjusts the volume level to keep it within a comfortable range. This alleviates volume instability caused by differences in capture devices, how loudly each speaker talks, and the distance between the speaker and the microphone.
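
The detect-then-adjust loop can be sketched as a frame-based gain controller: measure each frame's RMS level, compute the gain that would bring it to a target level, and move the applied gain toward that value gradually. This is a simplified illustration, not the SDK's implementation; `target_rms`, `max_gain`, and `smoothing` are hypothetical parameters.

```python
import math
from typing import List


def rms(frame: List[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def agc(frames: List[List[float]], target_rms: float = 0.1,
        max_gain: float = 8.0, smoothing: float = 0.5) -> List[List[float]]:
    """Apply a slowly adapting gain so frame levels approach target_rms."""
    gain = 1.0
    out = []
    for frame in frames:
        level = rms(frame)
        if level > 1e-6:  # do not adapt on silence
            desired = min(target_rms / level, max_gain)
            gain += smoothing * (desired - gain)  # move part of the way each frame
        # Apply the gain and clip to the valid sample range.
        out.append([max(-1.0, min(1.0, s * gain)) for s in frame])
    return out


quiet = [[0.01, -0.01] * 80 for _ in range(20)]  # very soft talker
loud = [[0.5, -0.5] * 80 for _ in range(20)]     # very loud talker
print(f"quiet in 0.010 -> out {rms(agc(quiet)[-1]):.3f}")
print(f"loud  in 0.500 -> out {rms(agc(loud)[-1]):.3f}")
```

The soft talker is boosted until the gain cap is reached, and the loud talker is attenuated to the target level, which is the behavior described above.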

Use Cases

The SDK can be applied in the preprocessing of audio encoding in uplink push and the post-processing of audio decoding in downlink pull, to enhance sound quality. Currently, it supports Android, iOS, Windows, and macOS clients.




Online teaching scenario: Eliminating noise and echo enhances the clarity of sound during the teaching process.
In-game voice scenario: Equalizing loud and soft voices improves player listening experience and game experience.
Live streaming scenario: Noise reduction and gain control for the host's voice improve overall live streaming quality in voice chat rooms, singing rooms, and similar scenarios.

SDK API Calling Process






TSC Terminal Enhancement SDK

Product Overview

The client enhancement SDK, based on efficient image processing algorithms and AI model inference capabilities, achieves terminal video super-resolution, image quality enhancement, frame interpolation, and other features.
Details of features for each edition are as follows:
| Feature Point | Standard Edition | Professional Edition | Premium Edition |
|---|---|---|---|
| Standard super-resolution | Supported | Supported | Supported |
| Standard super-resolution + Enhancement parameters (Contrast/Color/Brightness) | Supported | Supported | Supported |
| Professional super-resolution | - | Supported | Supported |
| AI image quality enhancement | - | Supported | Supported |
| AI frame interpolation enhancement | - | - | Supported |






The advantage of the Standard Edition is performance: its algorithms achieve good super-resolution results with minimal time and energy consumption, and it is compatible with almost all mobile phones regardless of performance tier.
Additionally, the Standard Edition offers image enhancement features that adjust image brightness, color saturation, and contrast.
The advantage of the Professional Edition is visual quality. Using AI model inference, it can regenerate texture details missing from the original image, achieving the best image enhancement and super-resolution results. The Professional Edition places higher demands on the device's computational power and is recommended for mid-range to high-end mobile phones.
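
For readers unfamiliar with the three enhancement parameters, the sketch below shows a conventional per-pixel formulation of brightness, contrast, and saturation on RGB values in the range 0 to 1. It is a generic illustration and makes no claim about how the SDK implements these controls; the function name `enhance_pixel` and its parameter conventions are hypothetical.

```python
from typing import Tuple

Pixel = Tuple[float, float, float]


def enhance_pixel(rgb: Pixel, brightness: float = 0.0,
                  contrast: float = 1.0, saturation: float = 1.0) -> Pixel:
    """Adjust one RGB pixel (channels in 0..1); defaults leave it unchanged."""
    r, g, b = rgb
    luma = 0.299 * r + 0.587 * g + 0.114 * b  # BT.601 luma weights
    # Saturation: scale each channel's distance from the gray (luma) value.
    channels = [luma + saturation * (c - luma) for c in (r, g, b)]
    # Contrast: scale around mid-gray. Brightness: add a constant offset.
    channels = [(c - 0.5) * contrast + 0.5 + brightness for c in channels]
    # Clamp to the valid range.
    r2, g2, b2 = (min(1.0, max(0.0, c)) for c in channels)
    return (r2, g2, b2)


pixel = (0.2, 0.4, 0.6)
print(enhance_pixel(pixel))                  # unchanged with default parameters
print(enhance_pixel(pixel, saturation=0.0))  # fully desaturated: equal channels
print(enhance_pixel(pixel, contrast=1.2, brightness=0.05))
```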

Performance

Standard super-resolution
| System | Device Model | Device Configuration | Basic Super-Resolution Parameter | CPU (%) | Memory (MB) | Frame Rate | GPU (%) | Power Consumption (mAh) |
|---|---|---|---|---|---|---|---|---|
| Android | HUAWEI Mate50 (2022) | Chip: Snapdragon 8+ Gen1; CPU: 3.0 GHz; GPU: Adreno 730; Battery: 4272.8 mAh | 720P - Off | 2.8 | 48 | 59.9 | 5 | 138.01 |
| | | | 720P x 1.5 | 3 | 64 | 60.4 | 10 | 196.55 |
| | | | 576P x 1.25 | 3 | 60.1 | 59.9 | 7 | / |
| | | | 4K x 1.25 | 3 | 163.2 | 59.9 | 46.4 | / |
| Android | Sony Xperia 5 II (2020) | Chip: Snapdragon 865; CPU: 2.84 GHz; GPU: Adreno 650; Battery: 3104 mAh | 720P - Off | 1 | 135.9 | 59.1 | 4 | 133.78 |
| | | | 720P x 1.5 | 2 | 146.8 | 59.2 | 10 | 152.41 |
| | | | 576P x 1.25 | 2 | 139.2 | 59.2 | 6 | / |
| | | | 4K x 1.25 | 2 | 311.2 | 59.2 | 46.7 | / |
| Android | Xiaomi 6 (2017) | Chip: Snapdragon 835; CPU: 2.45 GHz; GPU: Adreno 540 | 720P x 1.5 | 2.9 | 119 | 60 | 18.9 | / |
| Android | Redmi Note 4 (2016) | Chip: MediaTek MT6797 Helio X20; CPU: MT6797 2.0 GHz; GPU: ARM Mali-T880 | 720P x 1.5 | 9.4 | 137.9 | 60.6 | 74.5 | / |
| Android | Honor 8 Youth Edition (2016, budget phone) | Chip: HiSilicon Kirin 655; CPU: HI6250 2.3 GHz; GPU: ARM Mali-T830 | 720P - Off | 2 | 77 | 58.8 | Not supported | / |
| | | | 720P x 1.5 | 2 | 83.4 | 58.1 | Not supported | / |
| iOS | iPhone 13 (2021) | CPU: 3.23 GHz; GPU: quad-core; Battery: 3065.65 mAh | 720P - Off | 5.9 | 54.4 | 59.5 | 15.9 | 64.99 |
| | | | 720P x 1.5 | 6 | 63.8 | 59.5 | 24 | 88.29 |
| | | | 576P x 1.25 | 4.7 | 57.3 | 59.5 | 18.9 | / |
| | | | 4K x 1.25 | 9.2 | 162.2 | 59.5 | 60.6 | / |
| iOS | iPhone 6P (2014) | CPU: Apple A9; GPU: PowerVR GT7600 | 720P - Off | 13 | 40.5 | 59.5 | 22.8 | / |
| | | | 720P x 1.5 | 18.8 | 49.4 | 59.6 | 50.2 | / |

Professional super-resolution
| System | Device Model | Device Configuration | Professional Super-Resolution Parameter | CPU (%) | Memory (MB) | Frame Rate | GPU (%) | Power Consumption (mAh) |
|---|---|---|---|---|---|---|---|---|
| Android | HUAWEI Mate50 (2022) | Chip: Snapdragon 8+ Gen1; CPU: 3.0 GHz; GPU: Adreno 730; Battery: 4272.8 mAh | 720P - Off | 3 | 66 | 60 | 3 | 138.01 |
| | | | 720P x 1.5 | 13 | 123 | 48 | 10 | 342.9 |
| | | | 576P x 1.25 | 13 | 105 | 60 | 7 | 333.13 |
| | | | 540P x 2 | 13 | 105 | 60 | 11 | 322.73 |
| Android | Sony Xperia 5 II (2020) | Chip: Snapdragon 865; CPU: 2.84 GHz; GPU: Adreno 650; Battery: 3104 mAh | 720P - Off | 1 | 142 | 59.1 | 3 | 133.78 |
| | | | 720P x 1.5 | 13 | 196 | 39 | 8 | 294.06 |
| | | | 576P x 1.25 | 13 | 148 | 58 | 8 | / |
| | | | 540P x 2 | 13 | 159 | 40 | 7 | / |
| iOS | iPhone 13 (2021) | CPU: 3.23 GHz; GPU: quad-core; Battery: 3065.65 mAh | 720P - Off | 6 | 73 | 60 | 14 | 64.99 |
| | | | 720P x 1.5 | 15 | 94 | 40 | 14 | / |
| | | | 576P x 1.25 | 10 | 84 | 60 | 16 | / |
| | | | 540P x 2 | 9 | 76 | 60 | 21 | / |

AI image quality enhancement
| System | Device Model | Device Configuration | Professional Enhancement Resolution | CPU (%) | Memory (MB) | Frame Rate | GPU (%) |
|---|---|---|---|---|---|---|---|
| Android | HUAWEI Mate50 (2022) | Chip: Snapdragon 8+ Gen1; CPU: 3.0 GHz; GPU: Adreno 730; Battery: 4272.8 mAh | 720P | 13 | 140 | 55 | 7 |
| | | | 576P | 13 | 126 | 74 | 5 |
| | | | 540P | 13 | 130 | 78 | 7 |
| Android | Sony Xperia 5 II (2020) | Chip: Snapdragon 865; CPU: 2.84 GHz; GPU: Adreno 650; Battery: 3104 mAh | 720P | 13 | 184 | 41 | 5 |
| | | | 576P | 13 | 174 | 59 | 5 |
| | | | 540P | 13 | 142 | 43 | 4 |
| iOS | iPhone 13 (2021) | CPU: 3.23 GHz; GPU: quad-core; Battery: 3065.65 mAh | 720P | 17 | 91 | 40 | 11 |
| | | | 576P | 12 | 70 | 60 | 11 |
| | | | 540P | 9 | 68 | 60 | 11 |

Use Cases

1. Enhance terminal players to improve video playback quality and smoothness.



2. Save costs by reducing the resolution and bitrate of video distribution, and then minimize experience loss through terminal playback enhancement.



For example, in cloud gaming scenarios, real-time video super-resolution on the terminal can reduce the computational load of cloud rendering and encoding, save transmission bandwidth, and lower costs. In the following example, a game scene transmitted from the cloud at 720P (5.6 Mbps) is upscaled to 1080P in real time on the terminal. The viewing effect is close to that of a scene transmitted directly from the cloud at 1080P (8.2 Mbps), saving roughly 30% of bandwidth.
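
The figures in this example can be checked with a few lines of arithmetic. The helper names below are made up for illustration, and the 1280 x 720 frame size is an assumption for what "720P" means here (the source does not give pixel dimensions); the bitrates are the ones quoted above.

```python
from typing import Tuple


def bandwidth_saving(direct_mbps: float, reduced_mbps: float) -> float:
    """Fraction of bandwidth saved by sending the lower-bitrate stream."""
    return (direct_mbps - reduced_mbps) / direct_mbps


def upscaled(width: int, height: int, factor: float) -> Tuple[int, int]:
    """Output frame size after super-resolution by the given factor."""
    return (round(width * factor), round(height * factor))


saving = bandwidth_saving(direct_mbps=8.2, reduced_mbps=5.6)
print(f"bandwidth saved: {saving:.1%}")    # prints "bandwidth saved: 31.7%"
print(f"720P x 1.5 -> {upscaled(1280, 720, 1.5)}")  # prints "720P x 1.5 -> (1920, 1080)"
```

The exact saving is (8.2 - 5.6) / 8.2, about 31.7%, which is consistent with the approximate 30% quoted in the text.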

SDK Integration

Compatibility

Android platform: Android 5.0 and later (API 21, OpenGL ES 3.1).
iOS platform: iPhone 5s and later devices, with a minimum system version of iOS 12.
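
An app might gate the enhancement feature on these minimums before loading the SDK. The sketch below only encodes the thresholds stated above as version comparisons; the function names are hypothetical, and how an app actually obtains the API level, OpenGL ES version, or iOS version is platform code that is out of scope here.

```python
from typing import Tuple

MIN_ANDROID_API = 21     # Android 5.0
MIN_GLES = (3, 1)        # OpenGL ES 3.1
MIN_IOS = (12, 0)        # iOS 12


def android_supported(api_level: int, gles_version: Tuple[int, int]) -> bool:
    """True if the device meets the stated Android minimums."""
    return api_level >= MIN_ANDROID_API and gles_version >= MIN_GLES


def ios_supported(ios_version: Tuple[int, int]) -> bool:
    """True if the system version (major, minor) meets the stated iOS minimum."""
    return ios_version >= MIN_IOS


print(android_supported(api_level=29, gles_version=(3, 2)))  # newer device
print(android_supported(api_level=21, gles_version=(3, 0)))  # GLES too old
print(ios_supported((11, 4)))                                # system too old
```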

Package Size

Standard Edition: The Android AAR is approximately 0.3 MB (arm64-v8a only), and the iOS framework is approximately 0.4 MB.
Professional Edition: The Android AAR is approximately 2.1 MB (arm64-v8a only), and the iOS framework is approximately 1.9 MB.

Integration Guide

Please refer to the Android and iOS integration guides.