Features
Last updated: 2022-08-10 15:36:52
MPS transcodes audio/video files to different bitrates and resolutions for smooth playback on various devices with different bandwidth options. It has the following features:
Audio/Video Transcoding
Transcoding is an offline task that changes the codec, resolution, bitrate, and other characteristics of an audio/video stream to suit different playback devices and network conditions. The benefits of transcoding include:
Benefit | Description |
Increased compatibility | A source video can be transcoded to formats (such as MP4) that are compatible with more types of devices for smooth playback. |
Increased bandwidth adaptability | A source video can be transcoded for output in multiple definitions such as LD, SD, HD, and UHD. End users can select the most appropriate bitrate depending on their network conditions. |
Improved playback efficiency | The moov atom can be moved from the end of an MP4 file to the beginning of the file, allowing the video to be played before it is entirely downloaded. |
Reduced bandwidth consumption | With a more advanced codec (such as H.265), the bitrate of a video can be substantially reduced while retaining the original quality, which helps reduce the bandwidth consumption. |
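The playback efficiency benefit in the table above depends on where the moov atom sits relative to the media data. As a minimal sketch (this is not MPS code; the function names `top_level_boxes` and `is_fast_start` are illustrative), the following Python walks the top-level boxes of an MP4 file and reports whether moov comes before mdat. It relies only on the standard box layout: a 4-byte big-endian size followed by a 4-byte type.

```python
import struct
from typing import BinaryIO, List, Tuple


def top_level_boxes(stream: BinaryIO) -> List[Tuple[str, int]]:
    """Return (type, size) for each top-level box in an MP4 stream."""
    boxes = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of file, or a truncated header
        size, box_type = struct.unpack(">I4s", header)
        name = box_type.decode("latin-1")
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            extended = stream.read(8)
            if len(extended) < 8:
                break
            size = struct.unpack(">Q", extended)[0]
            header_len = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            boxes.append((name, 0))
            break
        if size < header_len:
            raise ValueError(f"malformed box {name!r} with size {size}")
        boxes.append((name, size))
        stream.seek(size - header_len, 1)  # skip the box payload
    return boxes


def is_fast_start(stream: BinaryIO) -> bool:
    """True if the moov box appears before the mdat box."""
    order = [name for name, _ in top_level_boxes(stream)]
    if "moov" not in order or "mdat" not in order:
        return False
    return order.index("moov") < order.index("mdat")
```

To check a local file, open it in binary mode and pass the file object, for example `is_fast_start(open("video.mp4", "rb"))`, where `video.mp4` is a placeholder path.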
The parameters you can specify for transcoding include codec, resolution, bitrate, etc. For details, see the table below.
Category | Parameter | Description |
Input | Container format | 3GP, AVI, FLV, MP4, M3U8, MPG, ASF, WMV, MKV, MOV, TS, WebM, MXF |
| Video codec | AV1, AVS2, H.264/AVC, H.263, H.263+, H.265, MPEG-1, MPEG-2, MPEG-4, MJPEG, VP8, VP9, RealVideo, Windows Media Video, QuickTime |
| Audio codec | AAC, ADPCM, AMR, DSD, MP1, MP2, MP3, PCM, RealAudio, Windows Media Audio, Vorbis, AC-3 |
Output | Container format | Video: FLV, MP4, HLS (M3U8 + TS), MXF |
| | Audio: MP3, MP4, Ogg, FLAC, M4A |
| | Image: GIF, WebP |
| Video codec | H.264/AVC, H.265/HEVC, AV1 |
| Audio codec | MP3, AAC, FLAC, MP2, Vorbis |
Packaging | Delete video streams | If this is enabled, the transcoding result will contain only audio streams. |
| Delete audio streams | If this is enabled, the transcoding result will contain only video streams. |
Audio/Video Enhancement
MPS combines its image quality remastering and video enhancement modules with AI algorithms to support image noise removal, contour restoration, super-resolution reconstruction, and other features that improve video resolution. This makes it suitable for business scenarios such as UGC/PGC video quality improvement, digital remastering, and 4K video production.
Capability | Description |
Image noise removal | Removes the random noise introduced from the camera and the environment during video recording while maintaining details of the video image. |
Artifact (glitch) removal | Effectively repairs distortions caused by repeated compressions of videos during transcoding that compromise the visual quality, such as blocking artifacts, ringing artifacts, color contamination, and mosquito noise. |
Banding removal | Repairs banding and snow caused by various factors that affect the film during video recording, storage, or transfer. |
Detail enhancement | Makes the video image clearer by enhancing details which may have been compromised by the camera quality or during video saving or transcoding. |
Overall enhancement | Uses AI-based analysis to improve the overall image quality in videos by balancing image textures, removing compression artifacts, and enhancing key details. |
Super resolution | Enhances and restores details in low-resolution videos that can't meet today's high-definition requirements. It uses an AI model to output high-resolution videos with clearer details. |
Face enhancement | Uses face detection to enhance the detail and quality of faces in the video. |
Color enhancement | Restores video color that may have been distorted due to camera problems or video storage and enhances the color to make it more pleasing to viewers. |
Low-light enhancement | Due to the environmental conditions and the hardware limitations of the camera, the video image of certain scenes may lack brightness and contrast, leading to loss of details in dark areas. This feature automatically recognizes scenes and adaptively enhances the video image to increase details and contrast in dark image areas and improve the image quality, especially in low-light scenes. |
HDR | Converts general SDR videos to HDR videos. It can increase the color depth to 10 bits to get a wider gamut and display more color details, providing higher-quality video content. |
Frame interpolation | Adds additional video frames between the original video frames to offer a smoother visual effect, improving image quality in older videos shot at a low frame rate and reducing lag and jitter. |
Watermarking
Watermarking is an offline task that adds an image at the specified position of the video during video transcoding or screencapturing. MPS supports the following types of watermarks:
Static watermark: Non-animated watermark in PNG format, which can be the logo of a copyright owner or TV station, and is usually used as a copyright claim.
Animated watermark: Animated watermark in APNG format
MPS allows you to add multiple watermarks to a video or screenshot and specify the size and position of each watermark in the video or screenshot.
The parameters you can specify for watermarking include watermark type, position, size, etc. For details, see the table below.
Parameter | Description |
Type | The watermark type. Watermarks can be static or animated. |
Position | The relative position of a watermark in the video. |
ImageSize | The size of the watermark in the video. |
ImageContent | Binary data of a watermark. |
Video Screencapturing
Screencapturing is an offline task that captures a screenshot of a video at a certain point in time. MPS provides the following types of screenshots:
Time point screenshot: Screenshots taken at specified time points
Sampled screenshot: Screenshots taken at regular intervals
Image sprite: MPS can capture a set of screenshots of a video (subimages) at the specified time interval and splice them together to generate a large image (i.e., an image sprite).
The parameters you can specify for screencapturing include screenshot format, aspect ratio, etc. For details, see the tables below.
Time point screenshots
Parameter | Description |
Format | The screenshot format (only JPG is supported currently) |
Width | Screenshot width (px). Value range: 128-4096 |
Height | Screenshot height (px). Value range: 128-4096 |
FillType | The fill mode ( FillType ) specifies how the source video image is processed when its aspect ratio does not match the aspect ratio specified for the screenshot. The following fill modes are supported: Scale to fill: Source video images are stretched to match the aspect ratio of screenshots. This may cause images to appear distorted. Black bars: The aspect ratio of source video images is retained, and the empty spaces are painted black. White bars: The aspect ratio of source video images is retained, and the empty spaces are painted white. Gaussian blur: The aspect ratio of source video images is retained, and Gaussian blur is applied to the empty spaces. |
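The fill modes differ only in how the empty space is painted; the geometry is either a stretch or an aspect-preserving fit. The following Python is a rough sketch of that geometry, not the MPS implementation. The function name `place_image`, the `stretch` flag, and the centering of the image within the bars are assumptions made for illustration.

```python
from typing import Tuple


def place_image(src_w: int, src_h: int, out_w: int, out_h: int,
                stretch: bool = False) -> Tuple[int, int, int, int]:
    """Return (width, height, x_offset, y_offset) of the source image
    inside a screenshot of out_w x out_h pixels."""
    if stretch:
        # "Scale to fill": ignore the source aspect ratio entirely.
        return out_w, out_h, 0, 0
    # Bars or blur: keep the aspect ratio and fit inside the screenshot.
    # Cross-multiplying compares the two aspect ratios without floats.
    if src_w * out_h >= out_w * src_h:
        # Source is relatively wider: fit the width, pad top and bottom.
        w = out_w
        h = max(1, src_h * out_w // src_w)
    else:
        # Source is relatively taller: fit the height, pad left and right.
        h = out_h
        w = max(1, src_w * out_h // src_h)
    # Center the image (an assumption); the rest is bar or blur area.
    return w, h, (out_w - w) // 2, (out_h - h) // 2
```

For a 1920x1080 source and a 640x640 screenshot, the aspect-preserving modes yield a 640x360 image with 140-pixel bands above and below it, while scale to fill stretches the image to the full 640x640.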
Sampled screenshots
Parameter | Description |
Format | The screenshot format (only JPG is supported currently) |
Width | Screenshot width (px). Value range: 128-4096 |
Height | Screenshot height (px). Value range: 128-4096 |
SampleType | How sampling intervals are measured. Sampling intervals can be measured in two ways: By percent: Intervals are measured by percent. For example, if Interval is set to 5 (%), 20 screenshots will be generated for a video. By time: Intervals are measured by time. For example, if Interval is set to 10 (sec), the number of screenshots generated will depend on the video length. |
Interval | The sampling interval. If the interval measurement ( SampleType ) is by percent, this parameter is a percent value. If interval measurement is by time, this parameter is a time value (sec). |
FillType | The fill mode ( FillType ) specifies how the source video image is processed when its aspect ratio does not match the aspect ratio specified for the screenshot. The following fill modes are supported: Scale to fill: Source video images are stretched to match the aspect ratio of screenshots. This may cause images to appear distorted. Black bars: The aspect ratio of source video images is retained, and the empty spaces are painted black. White bars: The aspect ratio of source video images is retained, and the empty spaces are painted white. Gaussian blur: The aspect ratio of source video images is retained, and Gaussian blur is applied to the empty spaces. |
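To make the two interval measurements concrete, here is a small Python sketch that lists the sampling time points for a video. It is illustrative only: the function name `sample_times_ms` and the `by_percent` flag are hypothetical, and starting at the first frame while taking no screenshot at the very end is an assumption chosen so that a 5% interval produces 20 screenshots, as the table states.

```python
from typing import List


def sample_times_ms(duration_ms: int, interval: int, by_percent: bool) -> List[int]:
    """Return screenshot time points in milliseconds.

    interval is a percent value when by_percent is True,
    otherwise a number of seconds.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if by_percent:
        if interval > 100:
            raise ValueError("a percent interval cannot exceed 100")
        # Sample at 0%, interval%, 2*interval%, ... below 100%.
        return [duration_ms * p // 100 for p in range(0, 100, interval)]
    # Sample every `interval` seconds from the start of the video.
    return list(range(0, duration_ms, interval * 1000))
```

With a percent interval the number of screenshots is fixed regardless of video length, whereas with a time interval it grows with the duration.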
Image sprites
Parameter | Description |
Format | The format of the image sprite (only JPG is supported currently). |
Width | The width of the subimage in an image sprite. |
Height | The height of the subimage in an image sprite. |
Rows | The number of image rows in a sprite. |
Columns | The number of image columns in a sprite. |
SampleType | How sampling intervals are measured. Currently, only sampling by time is supported. |
Interval | The time interval for image sampling. |
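The sprite parameters interact: the subimage size multiplied by the grid size gives the sprite size, which must stay within the 128-4096 range given in the note for this section. The Python sketch below checks that constraint and estimates how many subimages and sprites a video would need. The function name `sprite_layout` is hypothetical, and both the rule of one subimage per interval starting at the first frame and the idea of starting a new sprite once a grid is full are assumptions, not documented MPS behavior.

```python
from typing import Dict


def sprite_layout(duration_s: int, interval_s: int, width: int, height: int,
                  rows: int, columns: int) -> Dict[str, int]:
    """Validate sprite dimensions and estimate subimage and sprite counts."""
    if min(duration_s, interval_s, width, height, rows, columns) <= 0:
        raise ValueError("all parameters must be positive")
    sprite_width = width * columns
    sprite_height = height * rows
    for name, value in (("sprite width", sprite_width),
                        ("sprite height", sprite_height)):
        if not 128 <= value <= 4096:
            raise ValueError(f"{name} {value} is outside 128-4096")
    # Assumption: one subimage per interval, starting at time 0.
    subimages = (duration_s + interval_s - 1) // interval_s
    per_sprite = rows * columns
    # Assumption: a new sprite is started once the grid is full.
    sprites = (subimages + per_sprite - 1) // per_sprite
    return {
        "sprite_width": sprite_width,
        "sprite_height": sprite_height,
        "subimages": subimages,
        "sprites": sprites,
    }
```

For example, 160x90 subimages in a 10x10 grid give a 1600x900 sprite, which is within range; 500-pixel-wide subimages in 10 columns would give a 5000-pixel-wide sprite and be rejected.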
Note:
The result of multiplying Width x Columns (i.e., sprite width) should be within the range of 128-4096.
The result of multiplying Height x Rows (i.e., sprite height) should be within the range of 128-4096.
Animated screenshots
Animated screenshot generation is an offline task that converts a video segment to an animated image in GIF or WebP format. An animated screenshot is a seamless cycle of continuous frames, which can deliver an animation effect with a small file size.
The parameters you can set for animated image generation include format, width, height, frame rate, etc. For details, see the table below.
Parameter | Description |
Format | The format of the animated image (only GIF and WebP are supported currently). |
Width | The animated image width. Value range: 128–4096 px. |
Height | The animated image height. Value range: 128–4096 px. |
FPS | The frame rate. Value range: 1–60 fps. |
Content Discovery
Content recognition
Based on the work of Tencent's research labs, content recognition recognizes various forms of video content such as people, speech, text, and frame tags and performs multidimensional structured analysis.
Recognition Type | Description |
Face recognition | Quickly recognizes facial information in a video based on deep learning and locates the frames in which a person is present as well as the position of the person's face. You can use custom person libraries or call the public person libraries provided by video AI to recognize faces. |
Speech recognition | Quickly recognizes the speech in a video and converts it to text based on deep learning. You can specify custom keywords and locate the time points in the video at which the keywords are spoken. |
Text recognition | Recognizes text in a video, including vertically oriented text, and automatically extracts keywords from the text. |
Frame tag recognition | Uses deep learning to automatically recognize tags in the video frames captured at the custom frame capturing interval, and locates the tags in the video. Frame tags are divided into nine categories, such as people, landscape, artificial object, building, plant, animal, and food, covering various aspects of daily life. You can use custom tags based on the tag system. It has transfer learning capabilities, so you can customize classifiers simply by providing the raw user data. In this way, it meets the requirements of different types of users and makes the tag system more flexible. |
Opening and ending credits recognition | Automatically recognizes and locates the time points of opening and ending credits of movies and TV series based on the video image characteristics, text, speech, and other information. |
Content analysis
Analysis Type | Description |
Category recognition | Recommends a category for the target video by analyzing the video content. Currently, it supports 19 categories, including food, travel, animation, and music. Custom categories are also supported as a paid feature. |
Video tag recognition | Intelligently recognizes the top five tags that best fit the video content based on Tencent's deep learning solution. It is suitable for video recommendation and search scenarios. You can customize the number of tags to be returned in the API. |
Intelligent thumbnail | Automatically generates a file thumbnail based on characteristic information such as video image texture and scene recognition. It allows you to output static thumbnails quickly, making it easier to create thumbnails for videos and improving video click rates. |
Smart Moderation
Smart moderation includes security moderation and quality moderation.
Security moderation uses AI to detect erotic, illegal, and non-compliant content in video images, audio, and text.
Quality moderation moderates the image frames and audio quality in live and on-demand videos. It supports 13 detection types such as blurred screen, black bar, pixelization, and noise. It also moderates and scores the overall quality of the video.
Moderation Type | Detection Type | Detection Item Description |
Security moderation | Video image moderation | Moderates the video image to detect erotic and non-compliant content, specifically including: Erotic content detection: `porn` (pornographic content), `vulgar` (vulgar content), `intimacy` (content that displays intimacy), `sexy` (content that displays sexiness). Illegal and non-compliant content detection: `guns` (weapons and guns), `bloody` (bloodiness), `explosion` (explosions and fires), `violation_photo` (banned icons). |
| Audio moderation | Moderates the speech in the audio based on the following: Erotic content detection: Analyzes speech in the audio to detect keywords related to erotic content. Illegal and non-compliant content detection: Analyzes speech in the audio to detect keywords related to illegal and non-compliant content. |
| Text moderation | Moderates the text in video images, specifically including: Erotic content detection: Analyzes text in the video image to detect keywords related to erotic content. Illegal and non-compliant content detection: Analyzes text in the video image to detect keywords related to illegal and non-compliant content. |
Quality moderation | Image quality | Detects the following in the video image: JitterResults: jitter; BlurResults: blur; AbnormalLightingResults: low light or overexposure; CrashScreenResults: blurred screen; BlackWhiteEdgeResults: black bar, white bar, black screen, white screen, and solid color screen durations; NoiseResults: noise; MosaicResults: pixelization; QRCodeResults: QR code. |
| Audio quality | Detects the following in the speech in the video: VoiceResults: Audio exceptions, including no sound, low volume level, and cracking |
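When handling video image moderation results, it can help to group the returned labels by the two detection categories listed in the table. The Python below is a minimal sketch of such a grouping. The label strings come from the table above, but the group keys, the `group_labels` function, and the idea of receiving labels as a plain list are assumptions made for illustration and do not describe the actual MPS response format.

```python
from typing import Dict, Iterable, List

# Label names as listed for video image moderation; group keys are illustrative.
LABEL_GROUPS = {
    "erotic": {"porn", "vulgar", "intimacy", "sexy"},
    "illegal_or_non_compliant": {"guns", "bloody", "explosion", "violation_photo"},
}


def group_labels(labels: Iterable[str]) -> Dict[str, List[str]]:
    """Sort moderation labels into their detection categories."""
    grouped: Dict[str, List[str]] = {group: [] for group in LABEL_GROUPS}
    grouped["unknown"] = []  # labels not listed in the table
    for label in labels:
        for group, members in LABEL_GROUPS.items():
            if label in members:
                grouped[group].append(label)
                break
        else:
            grouped["unknown"].append(label)
    return grouped
```

Keeping an explicit `unknown` bucket means a label that is not in the table is surfaced rather than silently dropped.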
Smart Editing
Based on AI and audio/video technologies developed by Tencent, smart editing analyzes video content across multiple dimensions and supports smart highlights generation and video splitting to assist with video content production.
Capability | Description |
Smart splitting | Performs structured analysis on the video content and intelligently splits the video into segments based on scene, speech, and text information. Currently, it is supported for news and ads. |
Smart highlights generation | Based on video temporal/spatial characteristics matching, scene recognition, target detection, and other technologies, it automatically collects video highlights in various video scenes such as soccer, basketball, PlayerUnknown's Battlegrounds, and Honor of Kings. Custom video scenes are supported on a paid basis. |
Editing and production | Allows you to clip and splice videos, convert images into videos, add roll images and text to videos, implement picture-in-picture, and edit audio. |