The Smart Captions and Subtitles feature offers real-time speech recognition for video files and live streams, converting speech into subtitles in multiple languages. It is ideal for live broadcasts and international video transcription, with customizable hotwords and glossary libraries for improved accuracy.
Key features
Comprehensive Platform Support: Offers processing capabilities for on-demand files, live streams, and RTC streams. Real-time captioning for live broadcasts supports both delayed steady-state and real-time dynamic modes, with a low barrier to integration and no changes required on the playback side.
High Accuracy: Utilizes large-scale models, and supports hotwords and glossary databases, achieving industry-leading accuracy.
Rich Language Variety: Supports hundreds of languages, including various dialects. Capable of recognizing mixed-language speech, such as combinations of Chinese and English.
Customizable Styles: Enables embedding open subtitles into videos, with customizable subtitle styles (font, size, color, background, position, etc.).
Scenario 1: Processing Offline Files
Method 1: Initiating a Zero-Code Task from the Console
Initiating a Task Manually
Log in to the Media Processing Service (MPS) console and click Create Task > Create VOD Processing Task.
1. Specify an input file.
You can choose a video file from a Tencent Cloud Object Storage (COS) bucket or provide a video download URL. The current subtitle generation and translation feature does not support using AWS S3 as an input file source.
2. Process the input file.
Select Create Orchestration and insert the "Smart Subtitle" node.
For example: recognize the English speech in the source video, translate it into Chinese, and generate an English-Chinese bilingual subtitle file.
3. Specify an output path.
Specify the storage path of the output file.
4. Initiate a task.
Click Create to initiate a task.
Automatically Triggering a Task Through the Orchestration (Optional)
If you want video files uploaded to the COS bucket to be processed with smart subtitles automatically according to preset parameters, do the following:
1. Go to On-demand Orchestration in the menu, click Create VOD Orchestration, select the smart subtitle node in the task configuration, and configure parameters such as the trigger bucket and directory.
2. Go to the On-demand Orchestration list, find the new orchestration, and turn on its Enable switch. From then on, any new video file added to the trigger directory automatically initiates a task according to the orchestration's preset process and parameters, and the processed files are saved to the output path configured in the orchestration.
Note:
It takes 3-5 minutes for the orchestration to take effect after being enabled.
Method 2: API Call
Option 1: Specifying a Template ID
Call the ProcessMedia API and initiate a task by specifying the Template ID. Example:
{
    "InputInfo":{
        "Type":"URL",
        "UrlInputInfo":{
            "Url":"https://test-1234567.cos.ap-guangzhou.myqcloud.com/video/test.mp4" // Replace it with the URL of the video to be processed.
        }
    },
    "SmartSubtitlesTask":{
        "Definition":122 // 122 is the ID of the preset "Chinese source video: generate Chinese and English subtitles" template. Replace it with the ID of a custom smart subtitle template as needed.
    },
    "OutputStorage":{
        "CosOutputStorage":{
            "Bucket":"test-1234567",
            "Region":"ap-guangzhou"
        },
        "Type":"COS"
    },
    "OutputDir":"/output/",
    "Action":"ProcessMedia",
    "Version":"2019-06-12"
}
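If the call succeeds, the response returns the ID of the created task, which you can use to query results or match callbacks. A sketch of the response (the values below are illustrative):
{
    "Response":{
        "TaskId":"24000022-WorkflowTask-b20a8exxxxxxx1tt110253", // Record this task ID for later queries.
        "RequestId":"46311b39-10ce-47eb-b2b6-7ce4bb88cxxx" // Unique ID of this request.
    }
}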
Option 2: Specifying an Orchestration ID
Call the ProcessMedia API and initiate a task by specifying the Orchestration ID. Example:
{
    "InputInfo":{
        "Type":"COS",
        "CosInputInfo":{
            "Bucket":"facedetectioncos-125*****11",
            "Region":"ap-guangzhou",
            "Object":"/video/123.mp4"
        }
    },
    "ScheduleId":12345, // Replace it with your orchestration ID. 12345 is a sample value with no practical significance.
    "Action":"ProcessMedia",
    "Version":"2019-06-12"
}
Note:
If a callback address is set, see the ParseNotification document for the format of the result packets.
Subtitle Application to Videos (Optional Capability)
Call the ProcessMedia API to initiate a transcoding task, specify the path of the VTT subtitle file, and set the subtitle application styles through the SubtitleTemplate field.
Example:
{
    "MediaProcessTask":{
        "TranscodeTaskSet":[
            {
                "Definition":100040, // Transcoding template ID. Replace it with the transcoding template you need.
                "OverrideParameter":{ // Override parameters used to flexibly overwrite some parameters in the transcoding template.
                    "SubtitleTemplate":{
                        "Path":"https://test-1234567.cos.ap-guangzhou.myqcloud.com/subtitle/test.vtt" // Replace it with the path of the generated VTT subtitle file. Font, size, color, position, and other style fields can also be set here.
                    }
                }
            }
        ]
    }
}
Querying Task Results in the Console
Navigate to VOD Tasks in the console, where the list displays the tasks that have just been initiated.
When the subtask status is "Successful", click View Result to preview the subtitles.
The generated VTT subtitle file can be found under Orchestration > COS Bucket > Output Bucket.
Sample Chinese-English subtitles:
Event Notification Callbacks
When initiating a media processing task with ProcessMedia, you can configure event callbacks through the TaskNotifyConfig parameter. Upon task completion, the results are sent to the configured callback address, and you can parse the notification with ParseNotification.
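For reference, a minimal TaskNotifyConfig that delivers notifications to an HTTP endpoint might look like the following sketch (the callback URL is a placeholder):
"TaskNotifyConfig":{
    "NotifyType":"URL", // Deliver the event notification through an HTTP callback.
    "NotifyUrl":"https://your-domain.com/mps/callback" // Replace it with your own callback address.
}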
Querying Task Results by Calling an API
Call the DescribeTaskDetail API and fill in the task ID (for example, 24000022-WorkflowTask-b20a8exxxxxxx1tt110253 or 24000022-ScheduleTask-774f101xxxxxxx1tt110253) to query task results. Example:
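A minimal request body, reusing the sample task ID above, might look like this:
{
    "TaskId":"24000022-WorkflowTask-b20a8exxxxxxx1tt110253", // Replace it with the ID of the task to query.
    "Action":"DescribeTaskDetail",
    "Version":"2019-06-12"
}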
Scenario 2: Live Streams
There are currently two solutions for using subtitles and translations in live streams: enable the subtitle feature through the Cloud Streaming Services (CSS) console, or use MPS to call back the recognized text and embed it into the live stream. Enabling the subtitle feature through the CSS console is recommended. Both solutions are introduced below.
Method 1: Enabling the Subtitle Feature in the CSS Console
Subtitles are displayed when the transcoded stream is played. To generate the transcoded stream address, append an underscore and the name of the transcoding template bound to the subtitle template to the corresponding live stream's StreamName. For detailed URL splicing rules, see Splicing Playback URLs.
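For instance, assuming a playback domain of play.example.com, an AppName of live, a StreamName of teststream, and a transcoding template named subtitle720p bound to the subtitle template (all placeholder values), the spliced playback address would look like:
http://play.example.com/live/teststream_subtitle720p.flv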
Note:
Currently, there are two forms of subtitle display: real-time dynamic subtitles and delayed steady-state subtitles. With real-time dynamic subtitles, the displayed text is corrected word by word as the speech is recognized, so the output subtitles change in real time. With delayed steady-state subtitles, the live broadcast is displayed with the configured delay, and complete sentences are shown, which provides a better viewing experience.
Method 2: Calling Back Text through MPS
Currently, the MPS console does not support initiating smart subtitle tasks for live streams; initiate them through the API instead.
Processing live streams with MPS currently requires an intelligent identification template that performs automatic speech recognition or speech translation. In the request, the Definition field specifies the template: 10101 is the preset Chinese subtitle template ID, which can be replaced with the ID of a custom intelligent identification template.