WebSocket Protocol for Recognition
Last updated: 2025-07-23 14:58:54Download PDF
WebSocket URL Format
The URL format is as follows:
wss://mps.cloud.tencent.com/wss/v1/<appid>?{request parameters}
<appid>
is the unique identifier (UInt64) of a Tencent Cloud user account. It can be obtained on the Account Center > Account Information page in the console.
The request parameter format is as follows:
key1=value2&key2=value2...(Both keys and values should be in the URL-encoded format.)
Request parameters are listed in the table below:
Name | Type | Required | Description | Example |
asrDst | string | No | Language for Automatic Speech Recognition (ASR). | zh |
transSrc | string | No | Source language for translation. | zh |
transDst | string | No | Target language for translation. | en |
fragmentNotify | int | No | 0: steady mode notification; 1: non-steady mode notification. Default value: 0. | 0 |
resultType | int | No | Whether to retain the punctuation at the end. 0: delete. 1: retain. Default value: 1. | 1 |
timeStamp | uint | Yes | Current Unix timestamp. Unit: seconds. | 1750217009 |
expired | uint | Yes | Expiration Unix timestamp. Unit: seconds. | 1750220609 |
timeoutSec | uint | No | Timeout. Unit: seconds. The connection is interrupted if no audio data has been received for a long time. Default value: 120. Maximum value: 300. | 120 |
secretId | string | Yes | Key ID. | - |
nonce | uint | Yes | 10-bit random integer. | 7549145852 |
signature | string | Yes | Generated signature. | - |
Note:
If
asrDst
is not left blank, transSrc
and transDst
do not take effect. At this point, only ASR is performed, and subtitles in the source language are generated.If
asrDst
is left blank, transSrc
and transDst
cannot be empty. At this point, both ASR and subtitle translation are performed.The value of
fragmentNotify
is 0 by default.Signature Generation
For example, sign the following URL:
wss://mps.cloud.tencent.com/wss/v1/1258344699?asrDst=zh&expired=1750220609&fragmentNotify=0&nonce=7549145852&secretId=<sid>&timeStamp=1750217009
<sid>
is the key ID.Step 1: Concatenating a Canonical Request String
CanonicalRequest =HTTPRequestMethod + '\n' +CanonicalURI + '\n' +CanonicalQueryString + '\n' +CanonicalHeaders + '\n' +SignedHeaders + '\n'
Field Name | Explanation |
HTTPRequestMethod | The value is fixed as post. |
CanonicalURI | URI parameter path. Format: /wss/v1/<appid>, where <appid> is the user's AppId. For example, /wss/v1/1258344699 . |
CanonicalQueryString | Query string in the URL of the initiated HTTP request. For example, asrDst=zh&expired=1750220609&fragmentNotify=0&nonce=7549145852&secretId=<sid>&timeStamp=1750217009 The parameters should be sorted alphabetically. Note: CanonicalQueryString should be in URL-encoded format as described in RFC 3986. (Special characters should be in uppercase after encoding.) |
CanonicalHeaders | Format: content-type:application/json;charset=utf-8\nhost:<host>\n. <host> is generally a domain name, such as mps.cloud.tencent.com . |
SignedHeaders | The value is fixed as content-type;host. |
Step 2: Concatenating a String for Signing
StringToSign =Algorithm + "\n" +RequestTimestamp + "\n" +CredentialScope + "\n" +HashedCanonicalRequest
Field Name | Explanation |
Algorithm | Signature algorithm. The value is currently fixed as TC3-HMAC-SHA256. |
RequestTimestamp | Timestamp in the URL. For example, 1750217009. |
CredentialScope | Credential scope. Format: <date>/mps/tc3_request. <date> is a date in UTC format, such as 2025-06-18. For example, 2025-06-18/mps/tc3_request . |
HashedCanonicalRequest | Hash value of the canonical request string concatenated in the previous step. Pseudocode for calculation: Lowercase(HexEncode(Hash.SHA256(CanonicalRequest))). |
Step 3: Calculating the Signature
1. Calculating the Derived Signature Key
The pseudocode is as follows:
SecretKey = "********************************"SecretDate = HMAC_SHA256("TC3" + SecretKey, Date)SecretService = HMAC_SHA256(SecretDate, Service)SecretSigning = HMAC_SHA256(SecretService, "tc3_request")
Field Name | Explanation |
SecretKey | Original SecretKey, which is masked with asterisks (*). |
Date | Information in the <date> field of CredentialScope. |
Service | The value is fixed as mps. |
2. Calculating the Signature
The pseudocode is as follows:
signature = HexEncode(HMAC_SHA256(SecretSigning, StringToSign))
The final generated URL is as follows:
wss://mps.cloud.tencent.com/wss/v1/1258344699?asrDst=zh&expired=1750220609&fragmentNotify=0&nonce=7549145852&secretId=<sid>&timeStamp=1750217009&signature=<signature>
This URL is used to establish a WebSocket persistent connection.
WebSocket Handshake Phase
After a WebSocket connection is established, the server performs the check and authentication and then returns the handshake result in JSON text message format. Example:
{"Code":0, //0: successful; values other than 0: failed."Message":"success", //Returned message."TaskId":"RnKu9FODFHK5FPpsrN" //Task ID, which is a unique identifier.}
Name | Type | Required | Description |
Code | int | Yes | 0: successful; values other than 0: failed. |
Message | string | Yes | Returned message. |
TaskId | string | Yes | Task ID, which is a unique identifier. |
Error Code
Error Code | Description |
0 | Successful |
4001 | Invalid parameter. |
4002 | Timed out. It is usually because no audio data has been received successfully for a long time. The default timeout is 2 minutes. It can be specified with a parameter. |
4003 | The format of the upload audio is invalid. |
4004 | The number of persistent connections that exist at the same time exceeds the limit, which is 2 by default. |
4005 | Invalid user status. It is usually because the user account is in arrears. |
4100 | Identity verification failed. |
4101 | Unauthorized access to APIs. |
4102 | Unauthorized access to resources. |
4104 | The SecretId does not exist. |
4105 | Incorrect session ID. |
4106 | MFA authentication failed. |
4110 | Authentication failed. |
4111 | Invalid AppId. |
4500 | Replay attacks, which are usually caused by QPS exceeding the limit. Such attacks may occur when too many WebSocket connections are established for the same AppId in a short time. |
5000 | Internal error. |
Audio Upload
After authentication is successful, the server receives audio data pushed by the client as a binary message. The message definition is shown in the table below and uses the network byte order.
Field | Type | Length | Description |
format | uint8 | 1 byte | Audio format. |
IsEnd | uint8 | 1 byte | 1: The user has no audio data during the follow-up period, and the recognition result is forcibly refreshed. 0: The user has audio data during the follow-up period. |
timeStamp | uint64 | 8 bytes | Timestamp. Unit: ms. |
userIdLen | uint16 | 2 bytes | User ID length. |
userId | string | Same as the value of userIdLen | User ID. It identifies an audio source in the a connection. |
extLen | uint16 | 2 bytes | Extension length. Default value: 0. |
extData | char[] | Same as the value of extLen | Extended data for future expansion. |
Audio | char[] | Other data in the binary message | Audio data. |
Note:
The value of
format
can only be 1 currently, indicating PCM 16-kHz s16 (16-bit) single-channel.Recognition Result Sending
After the recognition result is output, the server sends the recognition result in JSON text message format.
Translation Result Notification
{"Response": {"NotificationType": "AiRecognitionResult","TaskId": "1258344699-wsssubtitle-d482fa50-5e1c-4c5c-b5b5-1083430e0d54","AiRecognitionResultInfo": {"ResultSet": [{"Type": "TransTextRecognition","TransTextRecognitionResultSet": [{"Text":"How to ensure that global users can enjoy high-definition and smooth video content.""Trans": "How to ensure that global users can enjoy high-definition and smooth video content.","StartPtsTime": 0.2,"EndPtsTime": 4.6,"Confidence": 100,"SteadyState": true,"StartTime": "2025-06-18T12:01:54Z","EndTime": "2025-06-18T12:01:58Z","UserId": "123456"}]}]}}}
Recognition Result Notification
If only ASR is performed without translation, the result notification only contains
Text
, without Trans
.{"Response": {"NotificationType": "AiRecognitionResult","TaskId": "1258344699-wsssubtitle-ce42ecfe-0f70-4244-91e0-07e6c20a5ab1","AiRecognitionResultInfo": {"ResultSet": [{"Type": "AsrFullTextRecognition","AsrFullTextRecognitionResultSet": [{"Text":"How to ensure that global users can enjoy high-definition and smooth video content.""StartPtsTime": 0.2,"EndPtsTime": 4.6,"Confidence": 100,"SteadyState": true,"StartTime": "2025-06-18T12:00:41Z","EndTime": "2025-06-18T12:00:45Z","UserId": "123456"}]}]}}}
Ending Notification
{"Response": {"NotificationType": "ProcessEof","TaskId":"1258344699-wsssubtitle-033a7ae4-50ef-4d1f-a73f-0e51a28d3a68","ProcessEofInfo": {"ErrCode": 4002,"Message": "data timeout"}}}
Field Description
Name | Description |
NotificationType | Valid values: AiRecognitionResult and ProcessEof. |
TaskId | Task ID. |
Type | Notification type. Valid values: AsrFullTextRecognition, TransTextRecognition, and ProcessEof. |
Text | Recognized text. |
StartPtsTime | Start timestamp. Unit: seconds. It corresponds to the timeStamp field in Audio Upload. |
EndPtsTime | End timestamp. Unit: seconds. It corresponds to the timeStamp field in Audio Upload. |
StartTime | Time when the server receives the audio packet. It is a UTC time. |
EndTime | End time. It is a UTC time. |
Confidence | Confidence. Value range: 0–100. |
SteadyState | Steady status flag. It indicates that the result will not change. |
UserId | User ID. It corresponds to userId in Audio Upload. |
ErrCode | Same as the error code of the handshake phase. Only 4002 and 4003 are usually used. |
Message | Returned message. |