Audio/Video Transport WG M.M. Hannuksela Internet Draft Y.-K. Wang Intended status: Standards track Nokia Expires: January 2009 July 14, 2008 Session Multiplexing for SVC Video draft-hannuksela-avt-rtp-svc-01.txt Status of this Memo By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html This Internet-Draft will expire on January 14, 2009. Copyright Notice Copyright (C) The IETF Trust (2008). Hannuksela, Wang Expires January 14, 2009 [Page 1] Internet-Draft Session Multiplexing for SVC Video July 2008 Abstract This memo describes two alternative methods for decoding order recovery of the Network Abstraction Layer (NAL) units carried in multiple RTP sessions for Scalable Video Coding (SVC), which is defined in Annex G of the ITU-T Recommendation H.264 video codec that is technically identical to Amendment 3 of ISO/IEC International Standard 14496-10. The methods apply when non-interleaved transmission of NAL units using the Single NAL Unit packetization mode or the Non-Interleaved packetization mode defined in RFC 3984 is in use. 
Table of Contents

   Status of this Memo
   Copyright Notice
   Abstract
   Table of Contents
   1. Introduction
   2. Conventions
   3. Definitions and Abbreviations
      3.1. Definitions
         3.1.1. Definitions from the SVC Specification
         3.1.2. Definitions Specific to This Memo
      3.2. Abbreviations
   4. RTP Payload Format
      4.1. Design Principles
      4.2. RTP Header Usage
      4.3. Common Structure of the RTP Payload Format
      4.4. NAL Unit Header Usage
      4.5. Packetization Modes
         4.5.1. Packetization Modes for Multi-Session Transmission
      4.6. Decoding Order Number (DON)
      4.7. Identification of Access Units for Decoding Order Recovery
           in Multi-Session Transmission
         4.7.1. Access Unit Identifier (AUID) for the NI-A Mode
         4.7.2. Timestamp Difference (TSD) for the NI-TSD Mode
      4.8. Aggregation Packets
      4.9. Fragmentation Units (FUs)
      4.10. Payload Content Scalability Information (PACSI) NAL Unit
         4.10.1. PACSI NAL Unit Modifications for the NI-A Mode
         4.10.2. PACSI NAL Unit Modifications for the NI-TSD Mode
   5. Packetization Rules
      5.1. Packetization Rules for Multi-Session Transmission
         5.1.1. NI-A and NI-TSD MST Packetization Rules
         5.1.2. Packetization rules for non-VCL NAL units
         5.1.3. Packetization rules for Prefix NAL units
   6. De-Packetization Process
      6.1. De-Packetization Process for Multi-Session Transmission
         6.1.1. Decoding Order Recovery for the NI-A Mode
            6.1.1.1. Example 1 (Informative)
            6.1.1.2. Example 2 (Informative)
         6.1.2. Decoding Order Recovery for the NI-TSD Mode
            6.1.2.1. Example 1 (Informative)
            6.1.2.2. Example 2 (Informative)
         6.1.3. Informative Algorithm for NI-A and NI-TSD Decoding
                Order Recovery within an Access Unit
   7. Payload Format Parameters
      7.1. Media Type Registration
      7.2. SDP Parameters
      7.3. Examples
      7.4. Parameter Set Considerations
   8. Security Considerations
   9. Congestion Control
   10. IANA Consideration
   11. Informative Appendix: Application Examples
   12. References
      12.1. Normative References
      12.2. Informative References
   13. Authors' Addresses
   Intellectual Property Statement
   Disclaimer of Validity
   Copyright Statement

1. Introduction

Section 1 of draft-ietf-avt-rtp-svc-13 applies.

This memo specifies two alternative methods for decoding order
recovery of NAL units carried in a non-interleaved manner in multiple
RTP sessions, referred to as Multi-Session Transmission (MST). Either
of the two MST packetization modes introduced here could be used to
replace those specified in draft-ietf-avt-rtp-svc-13.

2. Conventions

Section 2 of draft-ietf-avt-rtp-svc-13 applies.

3. Definitions and Abbreviations

3.1. Definitions

3.1.1. Definitions from the SVC Specification

Section 3.1.1 of draft-ietf-avt-rtp-svc-13 applies.

3.1.2. Definitions Specific to This Memo

Section 3.1.2 of draft-ietf-avt-rtp-svc-13 applies with the following
addition.

access unit identifier (AUID): A variable that is derived for each
access unit when the single NAL unit packetization mode or the
non-interleaved packetization mode is in use in Multi-Session
Transmission. The value of AUID is identical for all NAL units of an
access unit regardless of the session in which the NAL units are
conveyed. The AUID values of consecutive access units differ
regardless of which sessions are decoded, but there are no other
constraints on AUID values.

3.2. Abbreviations

Section 3.2 of draft-ietf-avt-rtp-svc-13 applies with the following
additions.

   AUID: Access Unit Identifier
   TSD:  Timestamp Difference

4. RTP Payload Format

4.1. Design Principles

Section 5.1 of draft-ietf-avt-rtp-svc-13 applies.

4.2. RTP Header Usage

Section 5.2 of draft-ietf-avt-rtp-svc-13 applies.

4.3. Common Structure of the RTP Payload Format

Section 5.3 of draft-ietf-avt-rtp-svc-13 applies.

4.4. NAL Unit Header Usage

Section 5.4 of draft-ietf-avt-rtp-svc-13 applies.

4.5. Packetization Modes

Section 5.4 of RFC 3984 applies when MST is not in use. The
packetization modes specified in Section 5.4 of RFC 3984 are also
referred to as session packetization modes. When MST is in use, the
following applies in addition.

4.5.1. Packetization Modes for Multi-Session Transmission

This memo specifies two MST packetization modes for non-interleaved
MST:

   o Non-interleaved AUID-based mode (NI-A)

   o Non-interleaved timestamp-difference-based mode (NI-TSD)

In the NI-A and NI-TSD modes, NAL units in each RTP session are
transmitted in NAL unit decoding order.

NI-A or NI-TSD could be used instead of the MST packetization modes
NI-T, NI-C, and NI-TC specified in draft-ietf-avt-rtp-svc-13. The NI-A
and NI-TSD modes simplify the packetization rules compared to those of
NI-T, NI-C, and NI-TC. In the NI-A and NI-TSD modes, senders need not
add NAL units to the stream and receivers need not remove the added
NAL units, as must be done in the NI-T and NI-TC modes. Moreover, the
NI-MTAP packet introduced for the NI-T and NI-TC modes is not needed,
and hence one precious NAL unit type value (the last one left for use
in RTP payload specifications after the introduction of the PACSI NAL
unit in the SVC draft) is saved for future extensions. The decoding
order recovery process for the NI-A and NI-TSD modes does not require
the reception and processing of RTCP sender reports, which makes the
decoding order recovery process more straightforward than that of the
NI-T mode.

The operation of the NI-A mode is very similar to that of the NI-TSD
mode; the only difference is how access units are identified.
The NI-A mode labels each access unit with an identifier, while the
NI-TSD mode identifies access units by their RTP timestamp, which is
indicated relative to the current packet in order to avoid
dependencies on the random initial RTP timestamp. However, when the
NI-TSD mode is in use, the same initial RTP timestamp offset MUST be
used in each associated RTP session as proposed in
[I-D.lennox-avt-rtp-layered-encoding-timestamps]. As the NI-TSD mode
leaves less implementation freedom for senders and hence reduces the
likelihood of ill-behaving sender implementations, it is the preferred
mode proposed in this memo. However, as the usage of the same initial
RTP timestamp offset in all sessions as proposed in
[I-D.lennox-avt-rtp-layered-encoding-timestamps] has not been agreed
yet, both NI-A and NI-TSD are included in this memo.

This memo does not specify any MST mode for interleaved transmission,
which would allow transmission of NAL units out of NAL unit decoding
order in each RTP session.

The MST packetization mode in use is signaled by the pmode media type
parameter or by external means. The MST packetization mode in use
governs which session packetization modes are allowed in the involved
RTP sessions, which in turn govern which NAL unit types are allowed as
RTP payloads.

Table 3.1 summarizes the allowed session packetization modes for the
NI-A and NI-TSD MST packetization modes.

Table 3.1 Summary of allowed session packetization modes for the NI-A
and NI-TSD MST packetization modes (yes = allowed, no = disallowed)

   Session-Specific Mode    Base Session    Enhancement Session
   ----------------------------------------------------------
   Single NAL Unit Mode     yes             no
   Non-Interleaved Mode     yes             yes
   Interleaved Mode         no              no

Table 3.2 summarizes the allowed packet payload types for each allowed
session packetization mode of the NI-A and NI-TSD MST packetization
modes.
Table 3.2 Summary of allowed packet payload types for each session
packetization mode of the NI-A and NI-TSD MST packetization modes
(yes = allowed, no = disallowed, ig = ignore)

   Packet     Packet       Single NAL    Non-Interleaved
   Payload    Type         Unit Mode     Mode
   Type
   ---------------------------------------------------------
   0          undefined    ig            ig
   1-23       NAL unit     yes           yes
   24         STAP-A       no            yes
   25         STAP-B       no            no
   26         MTAP16       no            no
   27         MTAP24       no            no
   28         FU-A         no            yes
   29         FU-B         no            no (base session)
                                         yes (enh. session)
   30         PACSI        yes           yes
   31         undefined    ig            ig

Informative note: FU-B packets are allowed in enhancement sessions as
specified in Section 4.9.

The packet payload type values indicated as undefined in Table 3.2 are
reserved for future extensions. NAL units of those type values SHOULD
NOT be sent by a sender (as packet payloads in single NAL unit
packets, as aggregation units in aggregation packets, or in FU
packets) and MUST be ignored by a receiver.

Note that NAL unit types 30 and 31 are indicated as undefined in
RFC 3984; therefore, RFC 3984 receivers MUST ignore NAL units of these
types, if present.

4.6. Decoding Order Number (DON)

Section 5.5 of [RFC3984] applies when MST is not in use.

4.7. Identification of Access Units for Decoding Order Recovery in
     Multi-Session Transmission

The decoding order recovery process in the NI-A and NI-TSD MST
packetization modes proposed in this memo consists of three steps.
First, a set of candidate access units is formed by including the next
access unit in transmission order (in relation to the access unit that
has just been processed) in each of the sessions. Second, for each
candidate access unit, the previous access unit in decoding order in
the same or a lower session is identified by information in the
associated PACSI NAL unit or FU-B NAL unit. In the NI-A mode, the
Access Unit Identifier is used for the identification of the previous
access unit.
In the NI-TSD mode, the signed timestamp difference between the
current access unit and the previous access unit in the same or a
lower session is indicated. Third, the next access unit in decoding
order is the access unit in the highest session among the candidate
access units for which the indicated previous access unit is not a
candidate access unit.

4.7.1. Access Unit Identifier (AUID) for the NI-A Mode

When the NI-A MST packetization mode is in use, the packetization of
each session MUST be as specified in Section 5.1, and the following
applies.

The NI-A mode uses two fields, AUID and PAUID, for the recovery of the
decoding order of NAL units. AUID and PAUID are conveyed in PACSI NAL
units or in FU-B packets. AUID and PAUID MUST be conveyed in at least
one PACSI NAL unit or FU-B packet for each access unit in each
session.

AUID indicates the access unit identifier. The AUID value for all NAL
units having the same NALU-time MUST be identical. The AUID values of
consecutive access units in any set of sessions in the session
dependency order MUST differ.

PAUID indicates the access unit identifier of the previous access unit
in decoding order among the session containing the packet that
includes the PAUID field and the sessions below it in the session
dependency hierarchy specified according to
[I-D.ietf-mmusic-decoding-dependency].

AUID and PAUID are 8-bit unsigned integers.

4.7.2. Timestamp Difference (TSD) for the NI-TSD Mode

When the NI-TSD MST packetization mode is in use, the packetization of
each session MUST be as specified in Section 5.1, and the following
applies.

The NI-TSD mode uses the RTP timestamp and one field, TSD, for the
recovery of the decoding order of NAL units. TSD is conveyed in PACSI
NAL units or in FU-B packets. TSD MUST be conveyed in at least one
PACSI NAL unit or FU-B packet for each access unit in each session.
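Informative sketch (not part of the specification): TSD expresses the
RTP timestamp of the previous access unit relative to that of the
current access unit, in units of sprop-au-tick, with 32-bit RTP
timestamp wraparound taken into account. The Python below is a minimal
illustration under stated assumptions. The function names
compute_tsd and previous_timestamp are hypothetical, the divisibility
and 16-bit range checks are additions for illustration, and the
modulo-2^32 reconstruction in previous_timestamp is an assumption
about how a receiver would apply TS + TSD * AUTICK to wrapped
timestamps.

```python
WRAP = 2 ** 32   # RTP timestamps are 32-bit unsigned integers
HALF = 2 ** 31


def compute_tsd(ts_prev: int, ts_cur: int, autick: int) -> int:
    """Return the signed difference TS(p) - TS(c) in AU ticks.

    ts_prev is TS(p), ts_cur is TS(c), autick is the value of the
    sprop-au-tick parameter. Names and checks are illustrative only.
    """
    diff = ts_prev - ts_cur
    if diff > HALF:
        # TS(c) has wrapped past the 32-bit maximum, TS(p) has not.
        diff -= WRAP
    elif -diff > HALF:
        # TS(p) has wrapped past the 32-bit maximum, TS(c) has not.
        diff += WRAP
    if diff % autick != 0:
        raise ValueError("timestamp difference is not a multiple of AUTICK")
    tsd = diff // autick
    if not -(2 ** 15) <= tsd <= 2 ** 15 - 1:
        raise ValueError("TSD does not fit in a 16-bit signed integer")
    return tsd


def previous_timestamp(ts_cur: int, tsd: int, autick: int) -> int:
    """Receiver side: recover TS(p) from TS(c) and TSD (modulo 2^32, assumed)."""
    return (ts_cur + tsd * autick) % WRAP


# Previous access unit is one tick (3000 units at an assumed AUTICK) earlier.
print(compute_tsd(90000, 93000, 3000))          # -1
# Same relation across a wraparound of the current timestamp.
print(compute_tsd(WRAP - 3000, 0, 3000))        # -1
print(previous_timestamp(0, -1, 3000) == WRAP - 3000)  # True
```

A positive TSD arises when the previous access unit in decoding order
has a later sampling time than the current one, as happens with
hierarchical prediction structures.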
The TSD field SHALL be set as follows:

   TSD = (TS(p) - TS(c)) / AUTICK,         when abs(TS(p) - TS(c)) <= 2^31
   TSD = (TS(p) - 2^32 - TS(c)) / AUTICK,  when TS(p) - TS(c) > 2^31
   TSD = (TS(p) + 2^32 - TS(c)) / AUTICK,  when TS(c) - TS(p) > 2^31

where TS(p) is the RTP timestamp of the previous access unit
containing NAL units within this session (conveying the TSD field),
TS(c) is the RTP timestamp of the current access unit (conveying the
TSD field), and AUTICK is the value of the sprop-au-tick media type
parameter.

Informative note: The second and third equations above cover the cases
where TS(c) and TS(p), respectively, have wrapped over the maximum
value of a 32-bit unsigned integer, while the first equation covers
the cases where neither TS(p) nor TS(c) has wrapped over.

TSD is a 16-bit signed integer.

4.8. Aggregation Packets

Section 5.6 of draft-ietf-avt-rtp-svc-13 applies.

4.9. Fragmentation Units (FUs)

Section 5.7 of draft-ietf-avt-rtp-svc-13 applies with the following
modifications.

When fragmentation units are used in the NI-A mode, FU-B MUST be used
in enhancement sessions for the first fragmentation unit of a
fragmented NAL unit. The DON field of the FU-B header in enhancement
sessions is replaced by the AUID field followed by the PAUID field.
The AUID field MUST be equal to the AUID value for the access unit
containing the fragmented NAL unit. The semantics of the PAUID field
are specified in Section 4.7.1.

When fragmentation units are used in the NI-TSD mode, FU-B MUST be
used in enhancement sessions for the first fragmentation unit of a
fragmented NAL unit. The DON field of the FU-B header in enhancement
sessions is replaced by the TSD field. The semantics of the TSD field
are specified in Section 4.7.2.

4.10. Payload Content Scalability Information (PACSI) NAL Unit

Section 5.8 of draft-ietf-avt-rtp-svc-13 applies with the following
modifications.

4.10.1. PACSI NAL Unit Modifications for the NI-A Mode

The DONC field is replaced by the AUID field followed by the PAUID
field. The semantics of DONC are removed. The occurrences of "DONC"
are replaced with "AUID and PAUID".

The semantics of AUID and PAUID are specified as follows.

   o When present, the field AUID indicates the access unit identifier
     for all the NAL units in the aggregation packet (when the PACSI
     NAL unit is included in an aggregation packet) or the AUID of the
     next non-PACSI NAL unit in transmission order (when the PACSI NAL
     unit is included in a single NAL unit packet). The constraints in
     Section 4.7.1 apply to the AUID.

   o The semantics of the PAUID field are specified in Section 4.7.1.

4.10.2. PACSI NAL Unit Modifications for the NI-TSD Mode

The DONC field is replaced by the TSD field. The semantics of DONC are
removed. The occurrences of "DONC" are replaced with "TSD".

The semantics of TSD are specified in Section 4.7.2.

5. Packetization Rules

Section 6 of draft-ietf-avt-rtp-svc-13 applies.

5.1. Packetization Rules for Multi-Session Transmission

When MST is used, decoding order recovery for NAL units carried in the
associated RTP sessions is needed. The following packetization rules
ensure that the decoding order of NAL units carried in the associated
sessions can be correctly recovered for each of the MST packetization
modes according to the de-packetization process specified in
Section 6.1.

5.1.1. NI-A and NI-TSD MST Packetization Rules

When the NI-A or NI-TSD mode is in use, the following applies.

   o For each single NAL unit packet containing a non-PACSI NAL unit,
     if present, the previous packet MUST have the same RTP timestamp
     as the single NAL unit packet, and the following applies.
        If the NALU-time of the non-PACSI NAL unit is not equal to the
        NALU-time of the previous non-PACSI NAL unit in decoding
        order, the previous packet MUST contain a PACSI NAL unit
        containing the AUID and PAUID fields when the NI-A mode is in
        use or the TSD field when the NI-TSD mode is in use;

        Otherwise (the NALU-time of the non-PACSI NAL unit is equal to
        the NALU-time of the previous non-PACSI NAL unit in decoding
        order), the previous packet MAY contain a PACSI NAL unit
        containing the AUID and PAUID fields when the NI-A mode is in
        use or the TSD field when the NI-TSD mode is in use.

   o For each STAP-A packet, if present, if the RTP timestamp is
     different from the RTP timestamp of the previous STAP-A packet,
     the first NAL unit in the STAP-A packet MUST be a PACSI NAL unit
     containing the AUID and PAUID fields when the NI-A mode is in use
     or the TSD field when the NI-TSD mode is in use.

   o For each FU-A packet, if present, the previous packet MUST have
     the same RTP timestamp as the FU-A packet, and the following
     applies.

        If the FU-A packet is the start of the fragmented NAL unit,
        the following applies;

           If the NALU-time of the fragmented NAL unit is not equal to
           the NALU-time of the previous non-PACSI NAL unit in
           decoding order, the previous packet MUST contain a PACSI
           NAL unit containing the AUID and PAUID fields when the NI-A
           mode is in use or the TSD field when the NI-TSD mode is in
           use;

           Otherwise (the NALU-time of the fragmented NAL unit is
           equal to the NALU-time of the previous non-PACSI NAL unit
           in decoding order), the previous packet MAY contain a PACSI
           NAL unit containing the AUID and PAUID fields when the NI-A
           mode is in use or the TSD field when the NI-TSD mode is in
           use.

   o For each single NAL unit packet containing a PACSI NAL unit, if
     present, the PACSI NAL unit MUST contain the AUID and PAUID
     fields when the NI-A mode is in use or the TSD field when the
     NI-TSD mode is in use.

5.1.2. Packetization rules for non-VCL NAL units

Section 6.1.4 of draft-ietf-avt-rtp-svc-13 applies.

5.1.3. Packetization rules for Prefix NAL units

Section 6.1.5 of draft-ietf-avt-rtp-svc-13 applies.

6. De-Packetization Process

For single-session transmission, where a single RTP session is used,
the de-packetization process specified in Section 7 of [RFC3984]
applies.

For multi-session transmission, where more than one RTP session is
used to receive data from the same SVC bitstream, the
de-packetization process is specified in Section 6.1.

6.1. De-Packetization Process for Multi-Session Transmission

6.1.1. Decoding Order Recovery for the NI-A Mode

The following process SHALL be applied when the NI-A mode is in use.

The decoding order recovery SHOULD start from an access unit where NAL
units are present for the base session, herein referred to as access
unit F. Any packets preceding the first received packet of access unit
F in reception order SHOULD be discarded. The decoding order of NAL
units of access unit F is specified below.

For subsequent access units to be ordered, the following applies. Let
AUID(n) and PAUID(n) be the AUID and PAUID values, respectively, of
the first access unit in decoding order containing data in session n.
The first access unit in decoding order containing data in session n
can be identified by the smallest value of RTP sequence number within
session n (taking into account the potential wraparound of RTP
sequence numbers) among those packets whose payloads have not been
passed to the decoder yet.
Let a set of sessions S consist of those values of n for which NAL
units are present in the first access unit in decoding order
containing data in session n but are not present in a higher session
in the same access unit. In other words, the set of sessions S
contains the highest session of those access units that are candidates
for being next in decoding order. The next access unit in decoding
order is the candidate access unit of session m, where m is the
greatest value within the set of sessions S such that PAUID(m) is not
equal to AUID(i) for any value of i that is within the set of sessions
S and less than m. In other words, the next access unit in decoding
order is found by investigating the candidate access units in session
dependency order from the highest session to the lowest session
according to the highest session for which the candidate access units
contain NAL units. The next access unit in decoding order is the first
access unit in the above investigation order that is not indicated to
follow any candidate access unit in a lower session in decoding order.
The decoding order of NAL units of the access unit having AUID equal
to AUID(m) is specified below.

Informative note: In practical implementations, the set of sessions S
can be formed by considering only those access units that have arrived
within a certain inter-session jitter compensation period.
Consequently, it may not be necessary to wait for access units from
all sessions to arrive at a particular time for decoding order
recovery.

If several NAL units share the same value of AUID, the order in which
NAL units are passed to the decoder is specified as follows:

   o Collect all NAL units NU(y) associated with the same value of
     AUID.

   o Place the collected NAL units in the session dependency order
     specified according to [I-D.ietf-mmusic-decoding-dependency] and
     then in the consecutive order of appearance within each session
     into an access unit while satisfying the NAL unit order rules in
     SVC access units as specified in [SVC] and summarized as an
     informative algorithm in Section 6.1.3.

6.1.1.1. Example 1 (Informative)

The example shown in Figure 1 refers to three RTP sessions A, B and C
containing a multiplexed SVC bitstream. In the example, the dependency
signaling [I-D.ietf-mmusic-decoding-dependency] indicates that Session
A is the base RTP session, B is the first enhancement RTP session and
depends on A, and C is the second RTP enhancement session and depends
on A and B. In the example, Session A has the lowest frame rate and
Sessions B and C have the same, but a higher, frame rate (using a
hierarchical prediction structure). Arbitrary AUID values have been
used in the example.

Figure 1 shows an example for de-jitter buffering with different
jitters present in the sessions, i.e. at buffering startup not all
packets with the same timestamp are available in all the de-jittering
buffers.

Jitter between the sessions is first assumed to be compensated by
removing all NAL units preceding the NAL unit with AUID equal to 2
(TS[1]).

At the next step, the first access unit with data present in the base
session is identified. In this example, it is the access unit with
AUID equal to 4 (TS[8]). The preceding access units (with AUID equal
to 2 (TS[1]) and AUID equal to 5 (TS[3])) are removed.

NAL units of the access unit with AUID equal to 4 (TS[8]) are passed
to the decoder in layer dependency order.

The next access unit (with AUID equal to 6 (TS[6])) has NAL units
present in each session, hence it is selected as the next access unit
to be decoded.
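Informative sketch (not part of the specified process): the selection
rule of Section 6.1.1 can be written as a short Python routine and run
on the sessions of Figure 1, starting just after access unit F (AUID
equal to 4). The names AccessUnit, next_session and recover_order are
hypothetical. The sketch assumes that all access units are already
buffered, that no packets are lost, and that AUID values do not repeat
within the buffered window, so that looking ahead in a higher
session's buffer is enough to tell whether that session contains NAL
units of a given access unit.

```python
from collections import deque
from typing import Deque, List, NamedTuple, Optional


class AccessUnit(NamedTuple):
    auid: int   # AUID of this access unit
    pauid: int  # AUID of the previous access unit in this or a lower session


def next_session(queues: List[Deque[AccessUnit]]) -> Optional[int]:
    """Return the session m whose head access unit is next in decoding order.

    queues[0] is the base session; higher indices are higher sessions.
    Each queue holds undecoded access units in per-session decoding order.
    """
    heads = {n: q[0] for n, q in enumerate(queues) if q}
    # S: sessions whose head access unit has no NAL units in a higher session
    # (assumes AUIDs are unique within the buffered window).
    s = [n for n, head in heads.items()
         if not any(au.auid == head.auid for q in queues[n + 1:] for au in q)]
    # Investigate candidates from the highest session to the lowest.
    for m in sorted(s, reverse=True):
        if all(heads[m].pauid != heads[i].auid for i in s if i < m):
            return m
    return None


def recover_order(queues: List[Deque[AccessUnit]]) -> List[int]:
    """Drain the queues and return AUIDs in recovered decoding order."""
    order = []
    while True:
        m = next_session(queues)
        if m is None:
            return order
        auid = queues[m][0].auid
        order.append(auid)
        # The selected access unit is at the head of every session carrying it.
        for q in queues:
            if q and q[0].auid == auid:
                q.popleft()


AU = AccessUnit
session_a = deque([AU(6, 4), AU(9, 6)])
session_b = deque([AU(6, 4), AU(8, 6), AU(7, 8), AU(9, 7)])
session_c = deque([AU(6, 4), AU(8, 6), AU(7, 8), AU(9, 7)])
print(recover_order([session_a, session_b, session_c]))  # [6, 8, 7, 9]
```

In each of these steps the set S contains only session C, because
sessions A and B are never the highest session of their first
undecoded access unit, so the PAUID comparison is vacuous here. The
comparison becomes decisive when the sessions carry disjoint access
units, as in Example 2.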
Within independent sessions the next NAL units in decoding order
belong to the access unit with AUID equal to 8 (TS[5]) (in sessions B
and C) and to the access unit with AUID equal to 9 (TS[12]) (in
session A). As session B and session A are not the highest sessions
for the access units with AUID equal to 8 and 9, respectively, the set
of sessions S consists of only one session and the access unit with
AUID equal to AUID(C) is selected as the next access unit in decoding
order.

The decoding order recovery process then continues similarly for the
following access units.

Decoding order and dependency of NAL units per received RTP session
with different jitter in sessions at buffering startup time:

C: -------------(2,3)-(5,2)-(4,5)-(6,4)-(8,6)-(7,8)-(9,7)-
                  |     |     |     |     |     |     |
B: -(1,a)-(3,1)-(2,3)-(5,2)-(4,5)-(6,4)-(8,6)-(7,8)-(9,7)-
            |                 |     |                 |
A: -------(3,a)-------------(4,3)-(6,4)-------------(9,6)-
---------------------------------------------------------->
TS:  [4]   [2]   [1]   [3]   [8]   [6]   [5]   [7]   [12]

Key:
A, B, C                - RTP sessions
'( )'                  - (AUID, PAUID), a = any value in this example
'|'                    - indicates corresponding NAL units of the same
                         access unit AU(TS[..]) in the RTP sessions
Integer values in '[]' - media Timestamp (TS), sampling time as
                         derived from RTP timestamps associated to the
                         access unit AU(TS[..]).

Figure 1 Example for MST with different jitter in session at startup

6.1.1.2. Example 2 (Informative)

The example shown in Figure 2 refers to three RTP sessions A, B and C
containing a multiplexed SVC bitstream. In the example, the dependency
signaling [I-D.ietf-mmusic-decoding-dependency] indicates that Session
A is the base RTP session, B is the first enhancement RTP session and
depends on A, and C is the second RTP enhancement session and depends
on A and B. Sessions A, B and C represent different levels of temporal
scalability.
Arbitrary AUID values have been used in the example.

The initial de-jittering is assumed to be handled in the same way as
in the previous example and is not illustrated in Figure 2.

At the beginning, the first access unit with data present in the base
session is identified. In this example, it is the access unit with
AUID equal to 3 (TS[8]). The preceding access unit (with AUID equal to
2 (TS[3])) is removed.

The next NAL units in decoding order belong to the access units with
AUID equal to 9, 5, and 1 for sessions A, B, and C, respectively,
hence AUID(A)=9, PAUID(A)=3, AUID(B)=5, PAUID(B)=3, AUID(C)=1,
PAUID(C)=5. All three sessions are present in the set of sessions S.
As PAUID(C) is equal to AUID(B), the access unit with AUID equal to
AUID(C) is not selected as the next access unit in decoding order. As
PAUID(B) is not equal to AUID(A), the access unit with AUID equal to
AUID(B) is selected as the next access unit in decoding order.

The next NAL units in decoding order belong to the access units with
AUID equal to 9, 8, and 1 for sessions A, B, and C, respectively,
hence AUID(A)=9, PAUID(A)=3, AUID(B)=8, PAUID(B)=9, AUID(C)=1,
PAUID(C)=5. All three sessions are present in the set of sessions S.
As PAUID(C) is not equal to AUID(B) or AUID(A), the access unit with
AUID equal to AUID(C) is selected as the next access unit in decoding
order. After that, the access unit with AUID equal to 4 is selected
similarly as the next in decoding order.

The next NAL units in decoding order belong to the access units with
AUID equal to 9, 8, and 7 for sessions A, B, and C, respectively,
hence AUID(A)=9, PAUID(A)=3, AUID(B)=8, PAUID(B)=9, AUID(C)=7,
PAUID(C)=8. All three sessions are present in the set of sessions S.
As PAUID(C) is equal to AUID(B) and PAUID(B) is equal to AUID(A),
neither the access unit with AUID equal to AUID(C) nor the access unit
with AUID equal to AUID(B) is selected as the next access unit in
decoding order. As there is no session below session A, the access
unit with AUID equal to AUID(A) is selected as the next access unit in
decoding order.

The decoding order recovery process then continues similarly for the
following access units.

Decoding order and dependency of NAL units per received RTP session:

C: --(2,a)-------------(1,5)-(4,1)-------------(7,8)-(6,7)-
B: --------------(5,3)-------------------(8,9)-------------
A: --------(3,a)-------------------(9,3)-------------------
----------------------------------------------------------->
TS:   [3]   [8]   [6]   [5]   [7]   [12]  [10]  [9]   [11]

Key:
A, B, C                - RTP sessions
'( )'                  - (AUID, PAUID), a = any value in this example
'|'                    - indicates corresponding NAL units of the same
                         access unit AU(TS[..]) in the RTP sessions
Integer values in '[]' - media Timestamp (TS), sampling time as
                         derived from RTP timestamps associated to the
                         access unit AU(TS[..]).

Figure 2 Example for MST with different jitter in session at startup

6.1.2. Decoding Order Recovery for the NI-TSD Mode

The following process SHALL be applied when the NI-TSD
session-multiplexing packetization mode is in use.

The decoding order recovery SHOULD start from an access unit where NAL
units are present for the base session, herein referred to as access
unit F. Any packets preceding the first received packet of access unit
F in reception order SHOULD be discarded. The decoding order of NAL
units of access unit F is specified below.

For subsequent access units to be ordered, the following applies. Let
TS(n) and TSD(n) be the RTP timestamp and TSD values, respectively, of
the first access unit in decoding order containing data in session n.
The first access unit in decoding order containing data in session n can be identified by the smallest RTP sequence number within session n (taking into account the potential wraparound of RTP sequence numbers) among those packets whose payloads have not yet been passed to the decoder. Let the set of sessions S consist of those values of n for which NAL units are present in the first access unit in decoding order containing data in session n but are not present in a higher session in the same access unit. In other words, the set of sessions S contains the highest session of each access unit that is a candidate for being next in decoding order.

The next access unit in decoding order is the access unit with the greatest value of m for which TS(m) + TSD(m) * AUTICK (where AUTICK is the value of the sprop-au-tick media type parameter) is not equal to TS(i) for any i, where m is a value within the set of sessions S and i is a value less than m within the set of sessions S. In other words, the next access unit in decoding order is found by investigating the candidate access units in session dependency order from the highest session to the lowest session, according to the highest session for which each candidate access unit contains NAL units. The next access unit in decoding order is the first access unit in this investigation order that is not indicated to follow, in decoding order, any candidate access unit in a lower session. The decoding order of NAL units of the access unit having RTP timestamp equal to TS(m) is specified below.

Informative note: In practical implementations, the set of sessions S can be formed by considering only those access units that have arrived within a certain inter-session jitter compensation period.
Consequently, it may not be necessary to wait for access units from all sessions to arrive at a particular time for decoding order recovery.

If several NAL units share the same value of RTP timestamp, the order in which NAL units are passed to the decoder is specified as follows:

o Collect all NAL units NU(y) associated with the same value of RTP timestamp.

o Place the collected NAL units in the session dependency order specified according to [I-D.ietf-mmusic-decoding-dependency] and then in the consecutive order of appearance within each session into an access unit while satisfying the NAL unit order rules in SVC access units as specified in [SVC] and summarized as an informative algorithm in Section 6.1.3.

6.1.2.1. Example 1 (Informative)

The video stream in this example is identical to that of Section 6.1.1.1.

The example shown in Figure 3 refers to three RTP sessions A, B and C containing a multiplexed SVC bitstream. In the example, the dependency signaling [I-D.ietf-mmusic-decoding-dependency] indicates that Session A is the base RTP session, B is the first enhancement RTP session and depends on A, and C is the second enhancement RTP session and depends on A and B. In the example, Session A has the lowest frame rate, and Sessions B and C share the same frame rate, which is higher than that of Session A (using a hierarchical prediction structure).

Figure 3 shows an example for de-jitter buffering with different jitters present in the sessions, i.e. at buffering startup not all packets with the same timestamp are available in all the de-jittering buffers. Jitter between the sessions is first assumed to be compensated by removing all NAL units preceding the NAL unit with TS[1].

At the next step, the first access unit with data present in the base session is identified. In this example, it is the access unit with TS[8]. The preceding access units (with TS[1] and TS[3]) are removed.
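The selection rule of Section 6.1.2 can be illustrated with a short Python sketch. It is not part of the specification and rests on several assumptions: the function name recover_order and the au_tick argument (standing in for sprop-au-tick) are hypothetical, timestamps are given in the same units as the TS values of Figures 3 and 4, RTP timestamp and sequence number wraparound are ignored, and all packets are assumed to be buffered already, starting at access unit F.

```python
from collections import deque


def recover_order(sessions, au_tick=1):
    """Return the timestamps of access units in recovered decoding order.

    sessions: per-session lists of (ts, tsd) pairs in RTP sequence number
    order; index 0 is the base session, higher indices are higher sessions.
    au_tick: hypothetical stand-in for the sprop-au-tick value.
    """
    queues = [deque(s) for s in sessions]
    order = []
    while any(queues):
        # Candidate of session n: its first access unit not yet decoded.
        heads = {n: q[0] for n, q in enumerate(queues) if q}
        # S keeps session n only if no higher session carries the same AU.
        s_set = [n for n, (ts, _) in heads.items()
                 if not any(ts == t for q in queues[n + 1:] for t, _ in q)]
        # Investigate the candidates from the highest session to the lowest.
        for m in sorted(s_set, reverse=True):
            ts, tsd = heads[m]
            prev_ts = ts + tsd * au_tick
            if not any(prev_ts == heads[i][0] for i in s_set if i < m):
                break  # follows no lower candidate, so it is selected
        order.append(ts)
        # Remove the selected access unit from every session carrying it.
        for q in queues:
            if q and q[0][0] == ts:
                q.popleft()
    return order


# (TS, TSD) values read from Figure 3, starting at access unit F (TS[8]).
figure3 = [
    [(8, -6), (6, 2), (12, -6)],                   # session A (base)
    [(8, -5), (6, 2), (5, 1), (7, -2), (12, -5)],  # session B
    [(8, -5), (6, 2), (5, 1), (7, -2), (12, -5)],  # session C
]
# (TS, TSD) values read from Figure 4, starting at access unit F (TS[8]).
figure4 = [
    [(8, -4), (12, -4)],                  # session A (base)
    [(6, 2), (10, 2)],                    # session B
    [(5, 1), (7, -2), (9, 1), (11, -2)],  # session C
]
print(recover_order(figure3))
print(recover_order(figure4))
```

With these inputs the sketch prints [8, 6, 5, 7, 12] and [8, 6, 5, 7, 12, 10, 9, 11], which agree with the TS rows of Figure 3 and Figure 4 from TS[8] onwards. A practical receiver would form the set S only from access units that arrived within the inter-session jitter compensation period, as the informative note in Section 6.1.2 points out.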
NAL units of the access unit with TS[8] are passed to the decoder in layer dependency order. The next access unit (with TS[6]) has NAL units present in each session, hence it is selected as the next access unit to be decoded.

Within the individual sessions, the next NAL units in decoding order belong to the access unit with TS[5] (in sessions B and C) and to the access unit with TS[12] (in session A). As session B and session A are not the highest sessions for the access units with TS[5] and TS[12], respectively, the set of sessions S consists of only one session (session C) and the access unit with TS[5] is selected as the next access unit in decoding order.

The decoding order recovery process then continues similarly for the following access units.

Decoding order and dependency of NAL units per received RTP session with different jitter in sessions at buffering startup time:

C: -------------(1)---(-2)--(-5)--(2)---(1)---(-2)--(-5)--
                 |     |     |     |     |     |     |
B: -( )---(2)---(1)---(-2)--(-5)--(2)---(1)---(-2)--(-5)--
           |                 |     |                 |
A: -------(2)---------------(-6)--(2)---------------(-6)--
   ---------------------------------------------------------->
TS: [4]   [2]   [1]   [3]   [8]   [6]   [5]   [7]   [12]

Key:
A, B, C                - RTP sessions
'( )'                  - (TSD)
'|'                    - indicates corresponding NAL units of the same access unit AU(TS[..]) in the RTP sessions
Integer values in '[]' - media Timestamp (TS), sampling time as derived from RTP timestamps associated to the access unit AU(TS[..]).

Figure 3  Example for MST with different jitter in session at startup

6.1.2.2. Example 2 (Informative)

The video stream in this example is identical to that of Section 6.1.1.2.

The example shown in Figure 4 refers to three RTP sessions A, B and C containing a multiplexed SVC bitstream.
In the example, the dependency signaling [I-D.ietf-mmusic-decoding-dependency] indicates that Session A is the base RTP session, B is the first enhancement RTP session and depends on A, and C is the second enhancement RTP session and depends on A and B. Sessions A, B and C represent different levels of temporal scalability.

The initial de-jittering is assumed to be handled as in the previous example and is not illustrated in Figure 4.

At the beginning, the first access unit with data present in the base session is identified. In this example, it is the access unit with TS[8]. The preceding access unit (with TS[3]) is removed.

The next NAL units in decoding order belong to the access units with TS[12], TS[6], and TS[5] for sessions A, B, and C, respectively, hence TS(A)=12, TSD(A)=-4, TS(B)=6, TSD(B)=2, TS(C)=5, and TSD(C)=1. All three sessions are present in the set of sessions S. As TS(C) + TSD(C) = 5 + 1 = 6 = TS(B), the access unit with TS[5] is not selected as the next access unit in decoding order. As TS(B) + TSD(B) = 6 + 2 = 8 is not equal to TS(A), the access unit with TS[6] is selected as the next access unit in decoding order.

The next NAL units in decoding order belong to the access units with TS[12], TS[10], and TS[5] for sessions A, B, and C, respectively, hence TS(A)=12, TSD(A)=-4, TS(B)=10, TSD(B)=2, TS(C)=5, and TSD(C)=1. All three sessions are present in the set of sessions S. As TS(C) + TSD(C) = 5 + 1 = 6 is not equal to TS(A) or TS(B), the access unit with TS[5] is selected as the next access unit in decoding order. After that, the access unit with TS[7] is selected similarly as the next in decoding order.

The next NAL units in decoding order belong to the access units with TS[12], TS[10], and TS[9] for sessions A, B, and C, respectively, hence TS(A)=12, TSD(A)=-4, TS(B)=10, TSD(B)=2, TS(C)=9, and TSD(C)=1.
All three sessions are present in the set of sessions S. As TS(C) + TSD(C) = 9 + 1 = 10 = TS(B) and TS(B) + TSD(B) = 10 + 2 = 12 = TS(A), neither the access unit with TS[9] nor the one with TS[10] is selected as the next access unit in decoding order. As there is no session below session A, the access unit with TS[12] is selected as the next access unit in decoding order.

The decoding order recovery process then continues similarly for the following access units.

Decoding order and dependency of NAL units per received RTP session:

C: --(-2)--------------(1)---(-2)--------------(1)---(-2)-
B: --------------(2)---------------------(2)--------------
A: --------(-4)--------------------(-4)-------------------
   ---------------------------------------------------------->
TS:  [3]   [8]   [6]   [5]   [7]   [12]  [10]  [9]   [11]

Key:
A, B, C                - RTP sessions
'( )'                  - (TSD)
'|'                    - indicates corresponding NAL units of the same access unit AU(TS[..]) in the RTP sessions
Integer values in '[]' - media Timestamp (TS), sampling time as derived from RTP timestamps associated to the access unit AU(TS[..]).

Figure 4  Example for MST with different jitter in session at startup

6.1.3. Informative Algorithm for NI-A and NI-TSD Decoding Order Recovery within an Access Unit

Section 7.1.1.1 of draft-ietf-avt-rtp-svc-13 applies.

7. Payload Format Parameters

Section 8 of draft-ietf-avt-rtp-svc-13 applies.

7.1. Media Type Registration

Section 8.1 of draft-ietf-avt-rtp-svc-13 applies with the following modifications.

pmode: This parameter signals the properties of a NAL unit stream carried in more than one RTP session using MST or the capabilities of a receiver implementation. When the value of pmode is equal to "NI-A", the NI-A mode MUST be used.
When the value of pmode is equal to "NI-TSD", the NI-TSD mode MUST be used. This parameter MUST NOT be present when the "packetization-mode" parameter is present.

sprop-au-tick: This parameter indicates the number of 90-kHz clock ticks used as a multiplier in the NI-TSD mode. The parameter MUST NOT be present when pmode is not equal to "NI-TSD". If the parameter is not present and the NI-TSD mode is in use, sprop-au-tick is inferred to be equal to 1. The value of sprop-au-tick MUST be a positive integer.

7.2. SDP Parameters

Section 8.2 of draft-ietf-avt-rtp-svc-13 applies.

7.3. Examples

Section 8.3 of draft-ietf-avt-rtp-svc-13 applies.

7.4. Parameter Set Considerations

Section 8.4 of draft-ietf-avt-rtp-svc-13 applies.

8. Security Considerations

Section 9 of draft-ietf-avt-rtp-svc-13 applies.

9. Congestion Control

Section 10 of draft-ietf-avt-rtp-svc-13 applies.

10. IANA Consideration

Section 11 of draft-ietf-avt-rtp-svc-13 applies.

11. Informative Appendix: Application Examples

Section 12 of draft-ietf-avt-rtp-svc-13 applies.

12. References

12.1. Normative References

Section 13.1 of draft-ietf-avt-rtp-svc-13 applies with the following additions.

[I-D.ietf-avt-rtp-svc] Wenger, S., Wang, Y.-K., Schierl, T., and Eleftheriadis, A., "RTP payload format for SVC video", draft-ietf-avt-rtp-svc-13 (work in progress), July 2008.

[I-D.lennox] Lennox, J., Schierl, T., and Ganesan, S., "Real-Time Transport Protocol (RTP) Timestamps for Layered Encodings", draft-lennox-avt-rtp-layered-encoding-timestamps-00, June 2, 2008.

12.2. Informative References

Section 13.2 of draft-ietf-avt-rtp-svc-13 applies.

13. Authors' Addresses

Miska M. Hannuksela
Nokia Research Center
P.O. Box 1000
33721 Tampere
Finland

Phone: +358-7180-73151
EMail: miska.hannuksela@nokia.com

Ye-Kui Wang
Nokia Research Center
P.O. Box 1000
33721 Tampere
Finland

Phone: +358-50-466-7004
EMail: ye-kui.wang@nokia.com

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement

Copyright (C) The IETF Trust (2008).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.