Specifications of the RTS signaling protocol

Real-Time Streaming (RTS) is built on Web Real-Time Communication (WebRTC) signaling. RTS delivers low-latency live streaming by using Alibaba Cloud points of presence (POPs) around the world together with the scheduling algorithms of Alibaba Cloud. This topic describes the specifications of the RTS signaling protocol and is intended for developers who are familiar with the basics of WebRTC.

Signaling process

The following figure shows the signaling process.

[Figure: Signaling process]

The client sends a request with a Session Description Protocol (SDP) offer.
1. Create an RTCPeerConnection object on the client, specify whether to receive or send audio and video signals, and then create an SDP offer.
```
// Specify whether to receive or send audio and video signals.
{ offerToReceiveVideo: true, offerToReceiveAudio: true }
```
2. Construct a stream pulling request that the client sends to ApsaraVideo Live by using the HTTPS POST method. The request body is a JSON string. For more information about the request parameters, see the Definition of the RTS signaling protocol section of this topic.
  Note
  - The version parameter specifies the version of the RTS signaling protocol. Set the value to 2.
  - The sdk_version parameter specifies the version of the RTS SDK. You can set the parameter as needed.
3. Send the constructed request to ApsaraVideo Live based on the signaling URL by using the POST method. Specify the source URL in the JSON-formatted request body.
```
POST /app/streamname?auth=xxx HTTP/1.1
Host: domain
Connection: keep-alive
Content-Length: 2205
Content-Type: application/json
```
  Note
  A signaling URL is the same as the corresponding source URL except for the scheme. The following URLs provide examples:
  - Signaling URL: https://domain/app/streamname?auth=xxx
  - Source URL: artc://domain/app/streamname?auth=xxx
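The relationship between the two URLs in the preceding note can be sketched as a small helper. This is a minimal illustration that assumes the URLs differ only in the scheme, as in the examples above. The name `toSignalingUrl` is hypothetical and is not part of any RTS SDK.

```javascript
// Hypothetical helper: derive the signaling URL from a source URL.
// Assumption: the two URLs differ only in the scheme (artc:// vs. https://).
function toSignalingUrl(sourceUrl) {
  var prefix = "artc://";
  if (sourceUrl.indexOf(prefix) !== 0) {
    throw new Error("The source URL must start with artc://");
  }
  return "https://" + sourceUrl.slice(prefix.length);
}

console.log(toSignalingUrl("artc://domain/app/streamname?auth=xxx"));
// Prints: https://domain/app/streamname?auth=xxx
```

The helper rejects any input that does not use the artc scheme instead of guessing, so a misconfigured player fails early rather than posting to an unrelated address.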
The server returns a response with an SDP answer.
After the server of ApsaraVideo Live verifies the request, the server generates an SDP answer and returns a response that contains the information about the live streaming node to the client. For more information about the response parameters, see the Definition of the RTS signaling protocol section of this topic.
The client initiates Interactive Connectivity Establishment (ICE).
1. After the client receives the response, set the SDP answer as the remote description of the RTCPeerConnection object.
```
peerConnection.setRemoteDescription(new RTCSessionDescription(answer.jsep));
```
2. Use the RTCPeerConnection object to initiate ICE and the Datagram Transport Layer Security (DTLS) handshake. After the connection is established, the client can pull streams from ApsaraVideo Live and play them by using standard WebRTC APIs.
The client initiates a disconnection.
The client sends a DTLS alert message that initiates a disconnection to stop stream ingest or playback.

Sample code for the HTML5 player

```
// Create the peer connection and the local SDP offer.
// iceCandidateCallback, remoteStreamCallback, and errorHandler are
// application-defined handlers that are not shown here.
var peerConnection = new RTCPeerConnection();
peerConnection.onicecandidate = iceCandidateCallback;
peerConnection.ontrack = remoteStreamCallback;
peerConnection.createOffer({ offerToReceiveVideo: true, offerToReceiveAudio: true })
      .then(signaling_pull).catch(errorHandler);

// Send the stream pulling request to ApsaraVideo Live.
function signaling_pull(offer_sdp) {
  console.log('local offer sdp', offer_sdp);

  peerConnection.setLocalDescription(offer_sdp).then(function() {
    // Get the URL for stream pulling.
    var stream_url = $("#stream_url").val();
    console.log("stream url:", stream_url);

    // Add the SDK version and the protocol version.
    var protocol_version = 2;
    var sdk_version = "0.0.1";

    $.ajax({url: stream_url, data: JSON.stringify({
          mode: "live",
          version: protocol_version,
          sdk_version: sdk_version,
          jsep: offer_sdp,  // The local offer: { type, sdp }.
      }),
      type: "post",
      contentType: "application/json",
      success: function(result) {
          // jQuery may already have parsed a JSON response into an object.
          var signal = (typeof result === "string") ? JSON.parse(result) : result;
          peerConnection.setRemoteDescription(new RTCSessionDescription(signal.jsep)).then(function() {
              console.log("get remote answer sdp: ", signal.jsep.sdp);
          }).catch(errorHandler);
      },
      error: errorHandler});
  }).catch(errorHandler);
}
```

Definition of the RTS signaling protocol

The RTS signaling protocol establishes a short-lived connection based on HTTPS. The protocol uses messages in the JSON format. This section describes the request, response, and error codes based on the RTS signaling protocol.

Sample request

Request:
```
{
    "version":2,
    "sdk_version":"0.0.1",
    "mode":"live",
    "pull_streams":[
        {
            "url":"artc://demo.aliyundoc.com/liveApp****/liveStream****",
            "amsid":[
                "rts audio"
            ],
            "vmsid":[
                "rts video"
            ]
        }
    ],
    "jsep":{
        "type":"offer",
        "sdp":"v=0\r\no=- 6839248142876176651 2 IN IP4 127.0.0.1\r\ns=-\r\n Omitted content"
    }
}
```

Parameter	Type	Required	Description
mode	string	Yes	The mode of the stream. In this example, set the parameter to live.
version	int	Yes	The version of the protocol. In this example, set the parameter to 2.
push_stream	string	No	The ingest URL.
pull_streams	[]object	No	The streams that you want to pull. You can pull multiple streams at a time. For more information about the attributes of each object in the pull_streams parameter, see Table 1.
sdk_version	string	No	The version of the SDK.
jsep.type	string	Yes	The type of the SDP message. In this example, set the parameter to offer.
jsep.sdp	string	Yes	The description of the SDP message.

Table 1. Attributes of the pull_streams parameter
Attribute	Type	Required	Description
url	string	Yes	The source URL, which starts with `artc://`.
amsid	[]string	Yes	The media stream ID (MSID) of the audio stream that you want to pull. In this example, set the parameter to `rts audio`.
vmsid	[]string	Yes	The MSID of the video stream that you want to pull. In this example, set the parameter to `rts video`.
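The request parameters in the preceding tables can be assembled as shown in the following sketch. The field names and sample values come from the sample request. The function name `buildPullRequest` and its argument order are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper: assemble the JSON body of a stream pulling request.
// Field names and values follow the sample request in this topic.
function buildPullRequest(sourceUrl, offer, sdkVersion) {
  var body = {
    version: 2,   // Version of the RTS signaling protocol.
    mode: "live",
    pull_streams: [
      { url: sourceUrl, amsid: ["rts audio"], vmsid: ["rts video"] }
    ],
    jsep: { type: offer.type, sdp: offer.sdp }
  };
  if (sdkVersion) {
    body.sdk_version = sdkVersion;  // Optional parameter.
  }
  return JSON.stringify(body);
}

var requestBody = buildPullRequest(
  "artc://demo.aliyundoc.com/liveApp****/liveStream****",
  { type: "offer", sdp: "v=0\r\n" },
  "0.0.1"
);
console.log(requestBody);
```

In a browser, the `offer` argument would typically be the object that `createOffer()` resolves with. The sketch copies only `type` and `sdp` so that the body contains exactly the two jsep fields that the tables list.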

Sample success response

Response:
```
{
    "trace_id":"2_1591173296_101.227.XX.XX_702080732320_dec327eb6eed0e0b07b349c8a565****",
    "code":200,
    "jsep":{
        "type":"answer",
        "sdp":"v=0\r\no=- 1591173291 2 IN IP4 127.0.0.1\r\n Omitted content"
    }
}
```

Parameter	Type	Required	Description
code	int	Yes	The status code of the response. If the request is successful, 200 is returned. For more information about status codes, see Table 2.
trace_id	string	Yes	The globally unique ID (GUID) of the request. Alibaba Cloud CDN generates the GUID, which can be used to troubleshoot issues. Record the GUID for later reference.
jsep.type	string	Yes	The type of the SDP message. In this example, the value answer is returned.
jsep.sdp	string	Yes	The description of the SDP message that is generated when POPs pull streams from the origin.

Table 2. Status codes
Status code	Description
403	Indicates that the authentication failed.
404	Indicates that the stream does not exist.
611	Indicates that the client must play the stream over TCP.
302	Indicates that the client must send the request to a new address.
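A client can branch on the `code` field of the response, as outlined in the following sketch. The codes and their meanings come from Table 2. The function name and the returned action labels are hypothetical. This topic does not describe how a 302 response conveys the new address, so the sketch only reports that a retry is needed.

```javascript
// Hypothetical helper: map the "code" field of a signaling response
// to the next step that the client should take.
function nextActionForCode(code) {
  switch (code) {
    case 200: return "apply-answer";        // Set jsep as the remote description.
    case 302: return "retry-new-address";   // Send the request to a new address.
    case 403: return "authentication-failed";
    case 404: return "stream-not-found";
    case 611: return "retry-over-tcp";      // The stream must be played over TCP.
    default:  return "unknown";
  }
}

console.log(nextActionForCode(200)); // Prints: apply-answer
console.log(nextActionForCode(611)); // Prints: retry-over-tcp
```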

Enhanced SDP negotiation

Messages are exchanged in the SDP format during signaling, and SDP negotiation generally follows RFC 4566. RTS extends the SDP semantics to fit the requirements of live streaming: it supports additional audio and video formats and additional communications protocols. These extensions resolve the issue that WebRTC supports only the Opus format for audio and does not support B-frames, and they allow RTS to accommodate a growing number of streaming protocols.

AAC audio supported

RTS can transmit audio in various AAC formats over RTMP. The AAC formats include AAC-LC, HE-AACv1, and HE-AACv2. For more information about AAC formats, see ISO/IEC 14496-3.

RTS transmits AAC audio in the Low-overhead MPEG-4 Audio Transport Multiplex (LATM) container format. LATM can carry the encoding information of the audio either in-band or out-of-band. In-band transmission sends the encoding information with each audio frame. Out-of-band transmission sends the encoding information only once. The muxConfigPresent parameter of AudioMuxElement specifies whether the information in AudioSpecificConfig is transmitted in-band or out-of-band, which makes LATM more flexible than Audio Data Transport Stream (ADTS). If the information in AudioSpecificConfig remains unchanged, the information in StreamMuxConfig can be transmitted once, in advance, in an SDP message.

During signaling, RTS parses the encoding information of the audio during stream ingest and returns the parsed information in the negotiation response, as shown in the following table.

SDP offer (the same for all three AAC formats):
`m=audio 9 UDP/RTP/AVPF 120 96`
`a=rtpmap:120 MP4A-LATM/44100/2`

SDP answer:
AAC format	AudioSpecificConfig	SDP answer
AAC-LC	`0x1210`	`a=rtpmap:120 MP4A-LATM/44100/2` `a=fmtp:120 cpresent=0;profile-level-id=1;object=2;config=400024203fc0`
HE-AACv1	`2b920800`	`a=rtpmap:120 MP4A-LATM/44100/2` `a=fmtp:120 cpresent=0;profile-level-id=1;object=2;config=4000572410003fc0;SBR-enabled=1`
HE-AACv2	`eb8a0800`	`a=rtpmap:120 MP4A-LATM/44100/2` `a=fmtp:120 cpresent=0;object=2;profile-level-id=1;config=4001d71410003fc0;PS-enabled=1;SBR-enabled=1`

If SBR-enabled=1 is present in the fmtp attribute of MP4A-LATM, the AAC format is HE-AACv1. If both SBR-enabled=1 and PS-enabled=1 are present, the AAC format is HE-AACv2. HE-AACv1 extends AAC-LC with spectral band replication (SBR), and HE-AACv2 extends HE-AACv1 with parametric stereo (PS). Therefore, the SBR and PS fields in the fmtp attribute are sufficient to indicate the AAC format. The fmtp attribute can also contain config=StreamMuxConfig. StreamMuxConfig is obtained from AudioSpecificConfig of the ingested stream and contains the detailed encoding parameters, which the client can read as needed.
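The rule above can be sketched as a small parser for the fmtp line of the SDP answer. The function names are hypothetical. The sketch assumes that parameters are separated by semicolons, as in the sample answers, and it treats any line without SBR-enabled=1 as AAC-LC, which is an assumption for inputs that this topic does not cover.

```javascript
// Hypothetical helper: parse an a=fmtp line into a payload type and a parameter map.
function parseFmtp(line) {
  var match = /^a=fmtp:(\d+)\s+(.*)$/.exec(line.trim());
  if (!match) {
    return null;
  }
  var params = {};
  match[2].split(";").forEach(function (pair) {
    var idx = pair.indexOf("=");
    if (idx > 0) {
      params[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
    }
  });
  return { payloadType: Number(match[1]), params: params };
}

// Hypothetical helper: infer the AAC format from the SBR and PS fields.
function detectAacFormat(fmtpLine) {
  var parsed = parseFmtp(fmtpLine);
  if (!parsed) {
    return null;
  }
  var sbr = parsed.params["SBR-enabled"] === "1";
  var ps = parsed.params["PS-enabled"] === "1";
  if (sbr && ps) { return "HE-AACv2"; }
  if (sbr) { return "HE-AACv1"; }
  return "AAC-LC";  // Assumption: no SBR flag means plain AAC-LC.
}

console.log(detectAacFormat(
  "a=fmtp:120 cpresent=0;profile-level-id=1;object=2;config=4000572410003fc0;SBR-enabled=1"
)); // Prints: HE-AACv1
```

The parsed parameter map also exposes the `config` value, so a client that needs the StreamMuxConfig bytes can read them from the same result.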

For more information, see AAC-LC / HE-AACv1 / HE-AACv2 Encoder Parameters.

H.265 videos supported

RTS parses the encoding information of H.264 or H.265 video during stream ingest and returns that information in the SDP answer.

Encoding format	SDP offer	SDP answer
H.265	`a=rtpmap:102 H265/90000`	`a=rtpmap:122 H265/90000` `a=fmtp:122`
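A client can check which codecs an SDP answer advertises by reading its a=rtpmap lines, as in the following sketch. The function names are hypothetical. The sketch only inspects rtpmap lines and does not interpret fmtp parameters.

```javascript
// Hypothetical helper: list the codecs that an SDP advertises in its a=rtpmap lines.
function listRtpmapCodecs(sdp) {
  var codecs = [];
  sdp.split(/\r?\n/).forEach(function (line) {
    var m = /^a=rtpmap:(\d+)\s+([^\/]+)\/(\d+)/.exec(line);
    if (m) {
      codecs.push({ payloadType: Number(m[1]), name: m[2], clockRate: Number(m[3]) });
    }
  });
  return codecs;
}

// Hypothetical helper: check whether a codec name appears in the SDP (case-insensitive).
function hasCodec(sdp, name) {
  return listRtpmapCodecs(sdp).some(function (codec) {
    return codec.name.toUpperCase() === name.toUpperCase();
  });
}

var answerSdp = "v=0\r\nm=video 9 UDP/TLS/RTP/SAVPF 122\r\na=rtpmap:122 H265/90000\r\n";
console.log(hasCodec(answerSdp, "H265")); // Prints: true
```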

Videos that contain B-frames supported

During signaling, the client can add a field to the SDP offer to indicate whether it can decode videos that contain B-frames. For example, BFrame-enabled=1 in the fmtp attribute indicates that the client can decode such videos. In this case, the RTP timestamp carries the PTS, and the client decodes frames in order of increasing sequence number. If the client does not support videos that contain B-frames, RTS can transcode the source streams to remove B-frames.
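The following sketch shows one way a client might add the B-frame field to an SDP offer before sending it. It relies on several assumptions that this topic does not confirm: the field is appended to every existing a=fmtp line in the video section, parameters are separated by semicolons as in the audio examples, and lines end with CRLF. The function name `addBFrameFlag` is hypothetical.

```javascript
// Hypothetical helper: append BFrame-enabled=1 to the fmtp lines of the video section.
// Assumptions: semicolon-separated parameters, CRLF line endings, and that every
// video payload type with an fmtp line should carry the field.
function addBFrameFlag(sdp) {
  var inVideoSection = false;
  return sdp.split("\r\n").map(function (line) {
    if (line.indexOf("m=") === 0) {
      // A new media section starts. Track whether it is the video section.
      inVideoSection = line.indexOf("m=video") === 0;
      return line;
    }
    var isFmtpWithParams = /^a=fmtp:\d+ ./.test(line);
    if (inVideoSection && isFmtpWithParams && line.indexOf("BFrame-enabled=") === -1) {
      return line + ";BFrame-enabled=1";
    }
    return line;
  }).join("\r\n");
}

var offerSdp = "v=0\r\nm=video 9 UDP/TLS/RTP/SAVPF 102\r\na=rtpmap:102 H264/90000\r\na=fmtp:102 packetization-mode=1\r\n";
console.log(addBFrameFlag(offerSdp));
```

Video payload types that have no fmtp line are left unchanged in this sketch. Whether they need a new fmtp line depends on server behavior that is not specified here.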

In addition, the server can return a composition timestamp (CTS), which allows the client to calculate the decoding timestamp (DTS) from the following relation: Presentation timestamp (PTS) = DTS + CTS, that is, DTS = PTS - CTS. If the SDP offer contains a=extmap:{$id} uri:webrtc:rtc:rtp-hdrext:video:CompositionTime, RTS adds a header extension with the identifier {$id} to the first Real-time Transport Protocol (RTP) packet of each video frame. The value of {$id} is determined by the SDP offer that the client sends. The following figures show part of the SDP offer and a packet capture during stream pulling:

[Figures: SDP offer snippet; packet capture during stream pulling]

RTS lets the client decide whether to decode videos that contain B-frames and whether to receive CTS information, so that clients with different decoding capabilities can use the same signaling process.
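The CTS negotiation and the timestamp relation can be sketched as follows. The extension URI is taken from this topic. The function names are hypothetical, and the sketch assumes that PTS and CTS are expressed in the same clock units. This topic does not describe how the CTS value is encoded inside the header extension, so decoding the extension payload is not shown.

```javascript
// Extension URI for the composition time, as given in this topic.
var COMPOSITION_TIME_URI = "uri:webrtc:rtc:rtp-hdrext:video:CompositionTime";

// Hypothetical helper: find the extmap identifier that the SDP assigns to the CTS extension.
// Returns null if the SDP does not declare the extension.
function findCompositionTimeId(sdp) {
  var lines = sdp.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    // a=extmap:<id>[/<direction>] <uri>
    var m = /^a=extmap:(\d+)(?:\/\S+)?\s+(\S+)/.exec(lines[i]);
    if (m && m[2] === COMPOSITION_TIME_URI) {
      return Number(m[1]);
    }
  }
  return null;
}

// PTS = DTS + CTS, so DTS = PTS - CTS.
// Assumption: both values use the same clock units.
function computeDts(pts, cts) {
  return pts - cts;
}

var sdpWithCts = "m=video 9 UDP/TLS/RTP/SAVPF 102\r\na=extmap:5 " + COMPOSITION_TIME_URI + "\r\n";
console.log(findCompositionTimeId(sdpWithCts)); // Prints: 5
console.log(computeDts(9000, 3000));            // Prints: 6000
```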

MSID mechanism

For more information about MSID, see The Msid Mechanism.