Real-Time Streaming (RTS) is built on the Web Real-Time Communication (WebRTC) signaling method. RTS supports low-latency live streaming by using the worldwide Alibaba Cloud points of presence (POPs) and the scheduling algorithms of Alibaba Cloud. This topic describes the specifications of the RTS signaling protocol and is intended for developers who are familiar with the basics of WebRTC.
Signaling process
The following figure shows the signaling process.
The client sends a request with a Session Description Protocol (SDP) offer.
Create an RTCPeerConnection object on the client, specify whether to receive or send audio and video signals, and then create an SDP offer.
// Specify whether to receive or send audio and video signals.
{ offerToReceiveVideo: true, offerToReceiveAudio: true }
Send a stream pulling request from the client to ApsaraVideo Live by using the HTTPS POST method. The request body is a JSON string. For more information about the request parameters, see the Definition of the RTS signaling protocol section of this topic.
Note: The version parameter specifies the version of the RTS signaling protocol. Set the value to 2. The sdk_version parameter specifies the version of the RTS SDK. You can set the parameter as needed.
Send the constructed request to ApsaraVideo Live based on the signaling URL by using the POST method. Specify the source URL in the JSON-formatted request body.
POST /app/streamname?auth=xxx HTTP/1.1
Host: domain
Connection: keep-alive
Content-Length: 2205
Content-Type: application/json
Note: A signaling URL is the same as the corresponding source URL except for the protocol header. For example:
Signaling URL: https://domain/app/streamname?auth=xxx
Source URL: artc://domain/app/streamname?auth=xxx
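The relationship between the two URLs can be sketched as a small helper. This is a minimal illustration, not part of the RTS SDK: the function name toSignalingUrl is hypothetical, and the sketch assumes that only the protocol header differs, as the note states.

```javascript
// Hypothetical helper (not an RTS SDK API): derive the HTTPS signaling URL
// from an artc:// source URL by swapping the protocol header.
function toSignalingUrl(sourceUrl) {
  var prefix = "artc://";
  if (sourceUrl.indexOf(prefix) !== 0) {
    throw new Error("Expected a source URL that starts with " + prefix);
  }
  // Keep the domain, path, and query string unchanged.
  return "https://" + sourceUrl.slice(prefix.length);
}
```

For example, toSignalingUrl("artc://domain/app/streamname?auth=xxx") yields the signaling URL shown in the note.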
The server returns a response with an SDP answer.
After the server of ApsaraVideo Live verifies the request, the server generates an SDP answer and returns a response that contains the information about the live streaming node to the client. For more information about the response parameters, see the Definition of the RTS signaling protocol section of this topic.
The client initiates Interactive Connectivity Establishment (ICE).
After the client receives the response with an SDP answer, set the SDP answer as the remote session description of the RTCPeerConnection object.
peerConnection.setRemoteDescription(new RTCSessionDescription(answer.jsep));
Use the RTCPeerConnection object to initiate ICE and Datagram Transport Layer Security (DTLS) encryption. After the connection is established, the client can pull streams from ApsaraVideo Live. This way, you can implement stream pulling and playback based on the WebRTC standards.
The client initiates a disconnection.
The client sends a DTLS alert message that initiates a disconnection to stop stream ingest or playback.
Sample code for the HTML5 player
// Create the peer connection and the local SDP offer.
// iceCandidateCallback, remoteStreamCallback, and errorHandler are assumed to be defined by the page.
var peerConnection = new RTCPeerConnection();
peerConnection.onicecandidate = iceCandidateCallback;
peerConnection.ontrack = remoteStreamCallback;
peerConnection.createOffer({ offerToReceiveVideo: true, offerToReceiveAudio: true })
    .then(signaling_pull).catch(errorHandler);

// Send the stream pulling request to ApsaraVideo Live.
function signaling_pull(offer_sdp) {
    console.log('local offer sdp', offer_sdp);
    peerConnection.setLocalDescription(offer_sdp).then(function() {
        // Get the signaling URL from the page.
        var stream_url = $("#stream_url").val();
        console.log("stream url:", stream_url);
        // Add the SDK and protocol versions.
        var protocol_version = 2;
        var sdk_version = "0.0.1";
        $.ajax({
            url: stream_url,
            type: "post",
            contentType: "application/json",
            data: JSON.stringify({
                mode: "live",
                version: protocol_version,
                sdk_version: sdk_version,
                // Send the local offer as the jsep object.
                jsep: { type: offer_sdp.type, sdp: offer_sdp.sdp }
            }),
            success: function(result) {
                // jQuery may already have parsed the JSON response.
                var signal = (typeof result === "string") ? JSON.parse(result) : result;
                peerConnection.setRemoteDescription(new RTCSessionDescription(signal.jsep)).then(function() {
                    console.log("get remote answer sdp: ", signal.jsep.sdp);
                }).catch(errorHandler);
            }
        });
    }).catch(errorHandler);
}
Definition of the RTS signaling protocol
The RTS signaling protocol establishes a short-lived connection based on HTTPS. The protocol uses messages in the JSON format. This section describes the request, response, and error codes based on the RTS signaling protocol.
Sample request
Request:
{
  "version": 2,
  "sdk_version": "0.0.1",
  "mode": "live",
  "pull_streams": [
    {
      "url": "artc://demo.aliyundoc.com/liveApp****/liveStream****",
      "amsid": ["rts audio"],
      "vmsid": ["rts video"]
    }
  ],
  "jsep": {
    "type": "offer",
    "sdp": "v=0\r\no=- 6839248142876176651 2 IN IP4 127.0.0.1\r\ns=-\r\n Omitted content"
  }
}
Parameter | Type | Required | Description |
mode | string | Yes | The mode of the stream. In this example, set the parameter to live. |
version | int | Yes | The version of the protocol. In this example, set the parameter to 2. |
push_stream | string | No | The ingest URL. |
pull_streams | []object | No | The streams that you want to pull. You can pull multiple streams at a time. For more information about the attributes of each object in the pull_streams parameter, see the following table. |
sdk_version | string | No | The version of the SDK. |
jsep.type | string | Yes | The type of the SDP message. In this example, set the parameter to offer. |
jsep.sdp | string | Yes | The description of the SDP message. |
Attribute | Type | Required | Description |
url | string | Yes | The source URL that starts with artc://. |
amsid | []string | Yes | The media stream ID (MSID) of the audio stream that you want to pull. In this example, set the parameter to ["rts audio"]. |
vmsid | []string | Yes | The MSID of the video stream that you want to pull. In this example, set the parameter to ["rts video"]. |
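The two tables above can be turned into a small request builder. The following is a sketch under stated assumptions, not an official implementation: buildPullRequestBody is a hypothetical name, the offer argument is assumed to be an object with type and sdp fields (for example, the result of createOffer), and the MSID values are the ones used in the sample request.

```javascript
// Hypothetical helper: build the JSON request body for pulling one stream.
// Field names follow the request parameter tables; the function itself is
// an illustration and not part of the RTS SDK.
function buildPullRequestBody(sourceUrl, offer, sdkVersion) {
  var body = {
    version: 2,                 // RTS signaling protocol version
    mode: "live",
    pull_streams: [
      {
        url: sourceUrl,         // source URL that starts with artc://
        amsid: ["rts audio"],   // MSID of the audio stream (sample value)
        vmsid: ["rts video"]    // MSID of the video stream (sample value)
      }
    ],
    jsep: { type: offer.type, sdp: offer.sdp }
  };
  // sdk_version is optional, so include it only when a value is provided.
  if (sdkVersion) {
    body.sdk_version = sdkVersion;
  }
  return JSON.stringify(body);
}
```

The returned string can be used as the body of the HTTPS POST request that is sent to the signaling URL.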
Sample success response
Response:
{
  "trace_id": "2_1591173296_101.227.XX.XX_702080732320_dec327eb6eed0e0b07b349c8a565****",
  "code": 200,
  "jsep": {
    "type": "answer",
    "sdp": "v=0\r\no=- 1591173291 2 IN IP4 127.0.0.1\r\n Omitted content"
  }
}
Parameter | Type | Required | Description |
code | int | Yes | The HTTP status code. If the request is successful, the code 200 is returned. For more information about status codes, see the "Status codes" section. |
trace_id | string | Yes | The globally unique ID (GUID) of the request. The GUID is generated by Alibaba Cloud CDN. Record the GUID so that you can provide it when you troubleshoot issues. |
jsep.type | string | Yes | The type of the SDP message. In this example, the value answer is returned. |
jsep.sdp | string | Yes | The description of the SDP message that is generated when POPs pull streams from the origin. |
Status code | Description |
403 | Indicates that the authentication failed. |
404 | Indicates that the stream does not exist. |
611 | Indicates that the client must play the stream over TCP. |
302 | Indicates that the client must send the request to a new address. |
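A client typically branches on the code field of the response. The sketch below maps the documented status codes to actions. The function name and the action labels are hypothetical, and because this topic does not describe which response field carries the new address for code 302, the sketch only reports that a redirect is required.

```javascript
// Hypothetical helper: map a parsed signaling response to a client action.
// The action names are illustrative and not defined by the RTS protocol.
function interpretSignalingResponse(response) {
  var result = { traceId: response.trace_id };
  switch (response.code) {
    case 200:
      result.action = "connect";        // use response.jsep as the SDP answer
      result.answer = response.jsep;
      break;
    case 302:
      result.action = "redirect";       // resend the request to a new address
      break;
    case 403:
      result.action = "auth_failed";    // authentication failed
      break;
    case 404:
      result.action = "stream_not_found";
      break;
    case 611:
      result.action = "use_tcp";        // the client must play over TCP
      break;
    default:
      result.action = "unknown";        // code not listed in the table
  }
  return result;
}
```

Keeping trace_id in the result makes it easier to log the GUID together with the outcome.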
Enhanced SDP negotiation
Messages are exchanged in the SDP format during signaling. SDP negotiation generally follows RFC 4566. RTS extends the SDP semantics to suit the characteristics of the live streaming industry: it supports more audio and video encoding formats and more communications protocols. This addresses the limitations that standard WebRTC supports only the Opus format for audio and does not support B-frames, and allows RTS to accommodate the growing number of streaming protocols.
AAC audio supported
RTS can transmit audio in various AAC formats over RTMP. The AAC formats include AAC-LC, HE-AACv1, and HE-AACv2. For more information about AAC formats, see ISO/IEC 14496-3.
RTS transmits audio in AAC formats by using the Low-overhead MPEG-4 Audio Transport Multiplex (LATM) container format. LATM can carry the encoding information about audio in in-band or out-of-band mode. In-band transmission sends the encoding information with each audio frame. Out-of-band transmission sends the encoding information only once. The muxConfigPresent flag of AudioMuxElement specifies whether the information in AudioSpecificConfig is transmitted in in-band or out-of-band mode. This makes LATM more flexible than Audio Data Transport Stream (ADTS). If the information in AudioSpecificConfig remains unchanged, the information in StreamMuxConfig can be transmitted in advance in an SDP message.
RTS parses the audio encoding information when the stream is ingested and returns the parsed information in the SDP answer during signaling, as shown in the following table.
SDP offer | SDP answer (AAC-LC) | SDP answer (HE-AACv1) | SDP answer (HE-AACv2) |
If SBR-enabled=1 is added to the fmtp attribute of MP4A-LATM, the AAC format is HE-AACv1. If SBR-enabled=1 and PS-enabled=1 are both added, the AAC format is HE-AACv2. HE-AACv1 extends AAC-LC with Spectral Band Replication (SBR), and HE-AACv2 extends HE-AACv1 with Parametric Stereo (PS). Therefore, the SBR and PS fields in the fmtp attribute are sufficient to indicate the AAC format. In addition, you can add config=StreamMuxConfig to the fmtp attribute. StreamMuxConfig is obtained from AudioSpecificConfig of the ingested stream and contains the detailed encoding parameters. The client can obtain the details as needed.
For more information, see AAC-LC / HE-AACv1 / HE-AACv2 Encoder Parameters.
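The SBR and PS rules described in this section can be sketched as a small parser on the client side. This is an illustration under assumptions: the function names are hypothetical, the fmtp line is assumed to use the common "a=fmtp:<payload type> key=value;key=value" layout, and the payload type and config value used in the usage note are made up.

```javascript
// Hypothetical helper: split the parameters of an fmtp line into a map.
// Assumes the layout "a=fmtp:<payload type> key=value;key=value".
function parseFmtpParams(fmtpLine) {
  var params = {};
  var spaceIndex = fmtpLine.indexOf(" ");
  if (spaceIndex === -1) {
    return params;
  }
  fmtpLine.slice(spaceIndex + 1).split(";").forEach(function (pair) {
    var eq = pair.indexOf("=");
    if (eq === -1) {
      return; // skip empty or malformed entries
    }
    params[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
  });
  return params;
}

// Hypothetical helper: derive the AAC format from the SBR and PS fields.
function aacFormatFromFmtp(fmtpLine) {
  var params = parseFmtpParams(fmtpLine);
  var sbr = params["SBR-enabled"] === "1";
  var ps = params["PS-enabled"] === "1";
  if (sbr && ps) {
    return "HE-AACv2";
  }
  if (sbr) {
    return "HE-AACv1";
  }
  return "AAC-LC";
}
```

For example, aacFormatFromFmtp("a=fmtp:120 SBR-enabled=1;PS-enabled=1;config=4000") returns "HE-AACv2", and a line without the SBR field is treated as AAC-LC.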
H.265 videos supported
RTS parses the encoding information of videos in the H.264 or H.265 format during stream ingest and returns the information about the videos in the H.264 or H.265 format in the SDP answer.
Encoding format | SDP offer | SDP answer |
H.265 | | |
Videos that contain B-frames supported
During signaling, the client can add a field to the SDP offer to specify whether it can decode videos that contain B-frames. For example, if the client adds BFrame-enabled=1 to the fmtp attribute, the client can decode videos that contain B-frames. In this case, RTP timestamp = PTS, which means that the client decodes frames in order of increasing sequence number. If the client does not support videos that contain B-frames, RTS can transcode the source streams to remove B-frames.
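One way to declare B-frame support is to edit the SDP offer string before it is sent in jsep.sdp. The following is a minimal sketch under assumptions: the function name is hypothetical, the field is appended as ";BFrame-enabled=1" to the fmtp lines of H.264 payload types only, and the SDP is assumed to use CRLF line endings. The exact placement that RTS expects may differ.

```javascript
// Hypothetical helper: append BFrame-enabled=1 to the fmtp lines of all
// H.264 payload types in an SDP offer. Assumes CRLF line endings.
function addBFrameSupport(sdp) {
  var lines = sdp.split("\r\n");
  var h264PayloadTypes = {};
  // First pass: collect the payload types that are mapped to H.264.
  lines.forEach(function (line) {
    var match = /^a=rtpmap:(\d+) H264\//i.exec(line);
    if (match) {
      h264PayloadTypes[match[1]] = true;
    }
  });
  // Second pass: extend the matching fmtp lines, skipping lines that
  // already carry the field so that the function is safe to call twice.
  return lines.map(function (line) {
    var match = /^a=fmtp:(\d+) /.exec(line);
    if (match && h264PayloadTypes[match[1]] && line.indexOf("BFrame-enabled=") === -1) {
      return line + ";BFrame-enabled=1";
    }
    return line;
  }).join("\r\n");
}
```

Lines that belong to other codecs are left unchanged, so the rest of the offer stays as generated by the browser.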
In addition, the server can return a composition timestamp (CTS). The client can then calculate the decoding timestamp (DTS) from the presentation timestamp (PTS) based on the following formula: PTS = DTS + CTS. If an SDP offer contains a=extmap:{$id} uri:webrtc:rtc:rtp-hdrext:video:CompositionTime, RTS adds a header extension whose identifier is {$id} to the first Real-time Transport Protocol (RTP) packet of each video frame. The value of the id variable is determined by the SDP offer that is sent by the client. The following figures show the partial content of the SDP offer and the packet capture during stream pulling:
RTS allows the client to determine whether to decode videos that contain B-frames and whether to return CTS information. This ensures general capabilities in communications.
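The CTS negotiation and the PTS = DTS + CTS formula can be sketched as follows. Both function names are hypothetical. The sketch assumes that the PTS and the CTS are expressed in the same clock units, and it does not parse the RTP header extension itself.

```javascript
var COMPOSITION_TIME_URI = "uri:webrtc:rtc:rtp-hdrext:video:CompositionTime";

// Hypothetical helper: return the extmap identifier that an SDP declares
// for the CompositionTime extension, or null if it is not declared.
function findCompositionTimeExtId(sdp) {
  var lines = sdp.split(/\r\n|\n/);
  for (var i = 0; i < lines.length; i++) {
    // Matches "a=extmap:<id> <uri>" with an optional "/<direction>" suffix.
    var match = /^a=extmap:(\d+)(?:\/\w+)? (\S+)/.exec(lines[i]);
    if (match && match[2] === COMPOSITION_TIME_URI) {
      return parseInt(match[1], 10);
    }
  }
  return null;
}

// Rearranged from PTS = DTS + CTS. Assumes both values use the same units.
function decodingTimestamp(pts, cts) {
  return pts - cts;
}
```

For example, a frame with a PTS of 9000 and a CTS of 3000 has a DTS of 6000.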
MSID mechanism
For more information about MSID, see The Msid Mechanism.