SIPPING Working Group                                       V. Hilt (Ed.)
Internet-Draft                                   Bell Labs/Alcatel-Lucent
Intended status: Informational                               July 5, 2008
Expires: January 6, 2009


      Design Considerations for Session Initiation Protocol (SIP)
                            Overload Control
                  draft-hilt-sipping-overload-design-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 6, 2009.

Abstract

   Overload occurs in Session Initiation Protocol (SIP) networks when
   SIP servers have insufficient resources to handle all SIP messages
   they receive.  Even though the SIP protocol provides a limited
   overload control mechanism through its 503 (Service Unavailable)
   response code, SIP servers are still vulnerable to overload.  This
   document discusses models and design considerations for a SIP
   overload control mechanism.

Table of Contents

   1.  Introduction
   2.  Implicit vs. Explicit Overload Control
   3.  System Model
   4.  Degree of Cooperation
     4.1.  Hop-by-Hop
     4.2.  End-to-End
     4.3.  Local Overload Control
   5.  Topologies
   6.  Type of Overload Control Feedback
     6.1.  Rate-based Overload Control
     6.2.  Loss-based Overload Control
     6.3.  Window-based Overload Control
     6.4.  On-/Off Overload Control
   7.  Overload Control Algorithms
   8.  Self-Limiting
   9.  Security Considerations
   10. IANA Considerations
   Appendix A.  Contributors
   11. Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   As with any network element, a Session Initiation Protocol (SIP)
   [RFC3261] server can suffer from overload when the number of SIP
   messages it receives exceeds the number of messages it can process.
   Overload can pose a serious problem for a network of SIP servers.
   During periods of overload, the throughput of a network of SIP
   servers can be significantly degraded.  In fact, overload may lead
   to a situation in which the throughput drops to a small fraction of
   the original processing capacity.  This is often called congestion
   collapse.

   Overload is said to occur if a SIP server does not have sufficient
   resources to process all incoming SIP messages.  These resources may
   include CPU, memory, network bandwidth, input/output, or disk
   resources.

   For overload control, we only consider failure cases where SIP
   servers are unable to process all SIP requests due to resource
   constraints.  There are other cases where a SIP server can
   successfully process incoming requests but has to reject them due to
   other failure conditions.  For example, a PSTN gateway that runs out
   of trunk lines but still has plenty of capacity to process SIP
   messages should reject incoming INVITEs using a 488 (Not Acceptable
   Here) response [RFC4412].  Similarly, a SIP registrar that has lost
   connectivity to its registration database but is still capable of
   processing SIP messages should reject REGISTER requests with a 500
   (Server Internal Error) response [RFC3261].  Overload control does
   not apply to these cases, and SIP provides appropriate response
   codes for them.

   The SIP protocol provides a limited mechanism for overload control
   through its 503 (Service Unavailable) response code.  However, this
   mechanism cannot prevent the overload of a SIP server, and it cannot
   prevent congestion collapse.  In fact, the use of the 503 (Service
   Unavailable) response code may cause traffic to oscillate and to
   shift between SIP servers, thereby worsening an overload condition.
   A detailed discussion of the SIP overload problem, the problems with
   the 503 (Service Unavailable) response code and the requirements for
   a SIP overload control mechanism can be found in
   [I-D.rosenberg-sipping-overload-reqs].

   This document discusses the models, assumptions and design
   considerations for a SIP overload control mechanism.  The document
   is a product of the SIP overload control design team.

2.  Implicit vs. Explicit Overload Control

   Two fundamental approaches to overload control exist: implicit and
   explicit overload control.

   A key contributor to SIP congestion collapse
   [I-D.rosenberg-sipping-overload-reqs] is the regenerative behavior
   of overload in the SIP protocol.  Messages that are dropped by a SIP
   server due to overload are retransmitted and increase the offered
   load for the already overloaded server.  This increase in load
   worsens the severity of the overload condition and, in turn, causes
   more messages to be dropped.  The goal of implicit overload control
   is therefore to change the fundamental mechanisms of the SIP
   protocol such that this regenerative behavior is avoided.  In the
   ideal case, the overload behavior of SIP would be fully
   non-regenerative, which would lead to stable operation during
   overload.  Even if a fully non-regenerative behavior for SIP is
   challenging to achieve, changes to the SIP retransmission timer
   mechanisms can help to reduce the degree of regeneration during
   overload.  More work is needed to understand the impact of SIP
   retransmission timers on the regenerative overload behavior of SIP.
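   To illustrate the regenerative effect, the following sketch (Python,
   for illustration only; it is not part of this document's design and
   uses a deliberately simplified model) estimates how much the offered
   load is amplified when a client retransmits an INVITE over UDP
   according to the RFC 3261 Timer A schedule and the server silently
   drops each incoming copy with probability p.  Provisional responses,
   response loss and other message types are ignored.

     # Simplified illustration: retransmissions regenerate load under
     # overload.  Assumptions (not from this document): INVITE over UDP,
     # RFC 3261 Timer A schedule (original transmission plus up to 6
     # retransmissions), each incoming copy independently dropped with
     # probability p, and the client stops once one copy gets through.

     MAX_TRANSMISSIONS = 7  # 1 original INVITE + 6 Timer A retransmissions

     def expected_transmissions(p):
         """Expected number of copies sent per INVITE at drop probability p."""
         # The k-th copy (k = 0, 1, ...) is sent only if all earlier copies
         # were dropped, which happens with probability p**k.
         return sum(p ** k for k in range(MAX_TRANSMISSIONS))

     if __name__ == "__main__":
         for p in (0.0, 0.2, 0.5, 0.8):
             print("drop probability %.1f -> offered load amplified by %.2fx"
                   % (p, expected_transmissions(p)))

   Even at a 50% drop rate, the offered load roughly doubles in this
   model, which is exactly the regenerative behavior that implicit
   overload control tries to avoid.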
   For a SIP INVITE transaction to be successful, a minimum of three
   messages need to be forwarded by a SIP server, often five or more.
   If a SIP server under overload randomly discards messages without
   evaluating them, the chances that all messages belonging to a
   transaction are passed on decrease as the load increases.  Thus, the
   number of successful transactions will decrease even if the message
   throughput of a server remains high and the overload behavior is
   fully non-regenerative.  A SIP server might (partially) parse
   incoming messages to determine whether they are new requests or
   messages belonging to an existing transaction.  However, after
   having spent resources on parsing a SIP message, discarding this
   message becomes expensive, as the resources already spent are lost.
   The number of successful transactions will therefore decline with an
   increase in load, as fewer and fewer resources can be spent on
   forwarding messages.  The slope of the decline depends on the amount
   of resources spent to evaluate each message.

   The main idea of explicit overload control is to use an explicit
   overload signal to request a reduction in the offered load.  This
   enables a SIP server to adjust the offered load to a level at which
   it can perform at maximum capacity.

   Reducing the extent to which SIP server overload is regenerative and
   using an efficient explicit overload control mechanism to control
   the incoming load are two complementary approaches to improving SIP
   performance under overload.

3.  System Model

   The model shown in Figure 1 identifies the fundamental components of
   an explicit SIP overload control mechanism:

   SIP Processor:  The SIP Processor processes SIP messages and is the
      component that is protected by overload control.

   Monitor:  The Monitor measures the current load of the SIP processor
      on the receiving entity.  It implements the mechanisms needed to
      determine the current usage of the resources relevant for the SIP
      processor and reports load samples (S) to the Control Function.

   Control Function:  The Control Function implements the overload
      control algorithm.  It uses the load samples (S) to determine
      whether overload has occurred and whether a throttle (T) needs to
      be set to adjust the load sent to the SIP processor on the
      receiving entity.  The Control Function on the receiving entity
      sends load feedback (F) to the sending entity.

   Actuator:  The Actuator implements the algorithms needed to act on
      the throttles (T) and to adjust the amount of traffic forwarded
      to the receiving entity.  For example, a throttle may instruct
      the Actuator to reduce the traffic destined to the receiving
      entity by 10%.  The algorithms in the Actuator then determine how
      the traffic reduction is achieved, e.g., by selecting the
      messages that will be affected and determining whether they are
      rejected or redirected.

   The type of feedback (F) conveyed from the receiving to the sending
   entity depends on the overload control method used (i.e.,
   loss-based, rate-based or window-based overload control; see
   Section 6), the overload control algorithm (see Section 7), as well
   as other design parameters.  In any case, the feedback (F) enables
   the sending entity to adjust the amount of traffic forwarded to the
   receiving entity to a level that is acceptable to the receiving
   entity without causing overload.

      Sending                       Receiving
      Entity                        Entity
  +----------------+            +----------------+
  |    Server A    |            |    Server B    |
  |  +----------+  |            |  +----------+  |    -+
  |  | Control  |  |     F      |  | Control  |  |     |
  |  | Function |<-+------------+--| Function |  |     |
  |  +----------+  |            |  +----------+  |     |
  |     T |        |            |       ^        |     | Overload
  |       v        |            |       | S      |     | Control
  |  +----------+  |            |  +----------+  |     |
  |  | Actuator |  |            |  | Monitor  |  |     |
  |  +----------+  |            |  +----------+  |     |
  |       |        |            |       ^        |    -+
  |       v        |            |       |        |    -+
  |  +----------+  |            |  +----------+  |     |
<-+--|   SIP    |  |            |  |   SIP    |  |     | SIP
--+->|Processor |--+------------+->|Processor |--+->   | System
  |  +----------+  |            |  +----------+  |     |
  +----------------+            +----------------+    -+

           Figure 1: System Model for Overload Control
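   The following sketch (Python, for illustration only) shows one
   possible way to arrange the components of Figure 1 in code.  The
   class and method names, the target utilization and the use of a loss
   fraction as feedback are all invented for this sketch; this document
   does not define an API or a specific algorithm.

     # Illustration of the Figure 1 control loop; all names and values
     # below are invented for this sketch.
     import random

     class Monitor:
         """Reports load samples (S) for the SIP processor on the
         receiving entity."""
         def sample(self):
             # Stand-in for a real CPU, queue or memory measurement.
             return random.uniform(0.0, 1.0)

     class ControlFunction:
         """Maps load samples (S) to feedback (F); here, F is a
         requested loss fraction between 0.0 and 1.0."""
         def __init__(self, target_utilization=0.8):
             self.target = target_utilization
         def feedback(self, load_sample):
             overshoot = max(0.0, load_sample - self.target)
             return min(1.0, overshoot / (1.0 - self.target))

     class Actuator:
         """Applies the throttle (T) on the sending entity."""
         def __init__(self):
             self.loss_fraction = 0.0
         def apply_feedback(self, f):
             self.loss_fraction = f
         def admit(self):
             # True: forward the request; False: reject or redirect it.
             return random.random() >= self.loss_fraction

     # One iteration of the loop: the receiving entity measures its load
     # and sends feedback; the sending entity adjusts its throttle.
     monitor, control, actuator = Monitor(), ControlFunction(), Actuator()
     actuator.apply_feedback(control.feedback(monitor.sample()))
     print("forward next request?", actuator.admit())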
4.  Degree of Cooperation

   A SIP request is often processed by more than one SIP server on its
   path to the destination.  Thus, a design choice for overload control
   is where to place the components of overload control along the path
   of a request and, in particular, where to place the Monitor and
   Actuator.  This design choice determines the degree of cooperation
   between the SIP servers on the path.  Overload control can be
   implemented hop-by-hop, with the Monitor on one server and the
   Actuator on its direct upstream neighbor.  Overload control can be
   implemented end-to-end, with Monitors on all SIP servers along the
   path of a request and one Actuator on the sender.  In this case, the
   Monitors have to cooperate to jointly determine the current resource
   usage on this path.  Finally, overload control can be implemented
   locally on a SIP server if the Monitor and Actuator reside on the
   same server.  In this case, the sending entity and receiving entity
   are the same SIP server, and the Actuator and Monitor operate on the
   same SIP processor (although the Actuator typically operates on a
   pre-processing stage in local overload control).  These three
   configurations are shown in Figure 2.

   [Figure 2: Degree of Cooperation between Servers.  Servers A, B, C
   and D, where A forwards SIP requests to B and B forwards them to C
   and D (==> SIP request flow, <-- overload feedback loop), in three
   configurations: (a) local -- each server's feedback loop terminates
   at the server itself; (b) hop-by-hop -- C and D send feedback to B,
   and B sends feedback to A; (c) end-to-end -- the feedback of B, C
   and D is combined and sent to A.]

4.1.  Hop-by-Hop

   The idea of hop-by-hop overload control is to instantiate a separate
   control loop between all pairs of neighboring SIP servers that
   directly exchange traffic.  That is, the Actuator is located on the
   SIP server that is the direct upstream neighbor of the SIP server
   that has the corresponding Monitor.  Each control loop between two
   servers is completely independent of the control loops between other
   servers further up- or downstream.  In the example in Figure 2(b),
   three independent overload control loops are instantiated: A - B,
   B - C and B - D.  Each loop controls a single hop only.  Overload
   feedback received from a downstream neighbor is not forwarded
   further upstream.  Instead, a SIP server acts on this feedback, for
   example, by re-routing or rejecting traffic if needed.
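   As an illustration of the per-hop scope of this feedback, the sketch
   below (Python, for illustration only; all names and values are
   invented) shows the state a server such as B in Figure 2(b) might
   keep: one throttle per downstream neighbor, updated by feedback from
   that neighbor only and never propagated upstream.

     # Hop-by-hop illustration (invented names): server B keeps one
     # independent throttle per downstream neighbor and acts on
     # feedback locally instead of forwarding it upstream.
     import random

     downstream_throttle = {"C": 0.0, "D": 0.0}  # fraction of traffic to withhold

     def on_overload_feedback(neighbor, loss_fraction):
         """Called when a downstream neighbor reports overload feedback (F)."""
         downstream_throttle[neighbor] = loss_fraction  # not propagated upstream

     def handle_request(destination):
         """Forward, re-route or reject based only on local per-neighbor state."""
         if random.random() < downstream_throttle.get(destination, 0.0):
             alternates = [n for n, t in downstream_throttle.items()
                           if n != destination and t == 0.0]
             return "re-route to " + alternates[0] if alternates else "reject with 503"
         return "forward to " + destination

     on_overload_feedback("D", 0.3)  # D asks B to throttle 30% of its traffic
     print(handle_request("D"))      # some requests to D are re-routed or rejected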
   If the upstream neighbor of a server also becomes overloaded, it
   will report this problem to its own upstream neighbors, which again
   take action based on the reported feedback.  Thus, in hop-by-hop
   overload control, overload is always resolved by the direct upstream
   neighbors of the overloaded server, without the need to involve
   entities that are located multiple SIP hops away.

   Hop-by-hop overload control reduces the impact of overload on a SIP
   network and, in particular, can avoid congestion collapse.  In
   addition, hop-by-hop overload control is simple and scales well to
   networks with many SIP entities.  It does not require a SIP entity
   to aggregate a large number of overload status values or to keep
   track of the overload status of SIP servers it is not communicating
   with.

4.2.  End-to-End

   End-to-end overload control implements an overload control loop
   along the entire path of a SIP request, from UAC to UAS.  An
   end-to-end overload control mechanism consolidates overload
   information from all SIP servers along the way, including all
   proxies and the UAS, and uses this information to throttle traffic
   as far upstream as possible.  An end-to-end overload control
   mechanism has to be able to frequently collect the overload status
   of all servers on the potential path(s) to a destination and combine
   this data into meaningful overload feedback.

   A UA or SIP server only needs to throttle requests if it knows that
   these requests will eventually be forwarded to an overloaded server.
   For example, if D is overloaded in Figure 2(c), A should only
   throttle requests it forwards to B when it knows that they will be
   forwarded to D.  It should not throttle requests that will
   eventually be forwarded to C, since server C is not overloaded.  In
   many cases, however, it is difficult for A to determine which
   requests will be routed to C and which to D, since this depends on
   the local routing decision made by B.

   The main problem of end-to-end overload control is its inherent
   complexity, since a UAC or SIP server needs to monitor all potential
   paths to a destination in order to determine which requests should
   be throttled and which requests may be sent.  In addition, the
   routing decisions of a SIP server depend on local policy, which can
   be difficult to infer for an upstream neighbor.  Therefore,
   end-to-end overload control is likely to work well only in simple,
   well-known topologies (e.g., a server that is known to have only one
   downstream neighbor) or if a UA or server sends many requests to the
   exact same destination.

4.3.  Local Overload Control

   Local overload control does not require an explicit overload signal
   between SIP entities, as it is implemented locally on a SIP server.
   It can be used by a SIP server to determine, based on its current
   resource usage, when to reject incoming requests instead of
   forwarding them.  Local overload control can be used in conjunction
   with an explicit overload control mechanism and provides an
   additional layer of protection against overload, for example, when
   upstream servers do not support explicit overload control.  In
   general, servers should use an explicit mechanism, if available, to
   throttle upstream neighbors before falling back to local overload
   control as a mechanism of last resort.
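   A minimal sketch of such a last-resort local mechanism is shown
   below (Python, for illustration only; the utilization source and the
   thresholds are invented and would be deployment specific).

     # Local overload control illustration (invented thresholds): once
     # the server's own utilization crosses a threshold, a growing
     # fraction of new requests is rejected with 503 (Service
     # Unavailable); no signaling with neighboring SIP entities is
     # needed.
     import random

     REJECT_ABOVE = 0.85    # start shedding load at 85% utilization
     FULL_REJECT_AT = 0.98  # reject essentially everything near saturation

     def current_utilization():
         # Placeholder for a real CPU, queue or memory monitor.
         return random.uniform(0.0, 1.0)

     def admit_new_request():
         u = current_utilization()
         if u <= REJECT_ABOVE:
             return True
         # Ramp the rejection probability linearly between the thresholds.
         reject_prob = min(1.0, (u - REJECT_ABOVE) / (FULL_REJECT_AT - REJECT_ABOVE))
         return random.random() >= reject_prob

     print("admit" if admit_new_request() else "reject with 503")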
5.  Topologies

   The following topologies describe four generic SIP server
   configurations, each of which poses specific challenges for an
   overload control mechanism.

   In the "load balancer" configuration shown in Figure 3(a), a set of
   SIP servers (D, E and F) receives traffic from a single source A.  A
   load balancer is a typical example of such a configuration.  In this
   configuration, overload control needs to prevent server A (i.e., the
   load balancer) from sending too much traffic to any of its
   downstream neighbors D, E and F.  If one of the downstream neighbors
   becomes overloaded, A can direct traffic to the servers that still
   have capacity.  If one of the servers serves as a backup, it can be
   activated once one of the primary servers reaches overload.

   If A can reliably determine that D, E and F are its only downstream
   neighbors and that all of them are in overload, it may choose to
   report overload upstream on behalf of D, E and F.  However, if the
   set of downstream neighbors is not fixed or only some of them are in
   overload, then A should not report overload to its own upstream
   neighbors, since A can still forward the requests destined to
   non-overloaded downstream neighbors.  These requests would be
   throttled as well if A used overload control towards its upstream
   neighbors.

   In the "multiple sources" configuration shown in Figure 3(b), a SIP
   server D receives traffic from multiple upstream sources A, B and C.
   Each of these sources can contribute a different amount of traffic,
   which can vary over time.  The set of active upstream neighbors of D
   can change, as servers may become inactive and previously inactive
   servers may start contributing traffic to D.  If D becomes
   overloaded, it needs to generate feedback to reduce the amount of
   traffic it receives from its upstream neighbors.  D needs to decide
   by how much each upstream neighbor should reduce its traffic.  This
   decision can require considering the amount of traffic sent by each
   upstream neighbor, and it may need to be re-adjusted as the traffic
   contributed by each upstream neighbor varies over time.

   In many configurations, SIP servers form a "mesh" as shown in
   Figure 3(c).  Here, multiple upstream servers A, B and C forward
   traffic to multiple alternative servers D and E.  This configuration
   is a combination of the "load balancer" and "multiple sources"
   scenarios.

   [Figure 3: Topologies.  (a) load balancer: server A forwards
   requests to servers D, E and F; (b) multiple sources: servers A, B
   and C forward requests to server D; (c) mesh: servers A, B and C
   each forward requests to servers D and E; (d) edge proxy: a large
   number of individual sources a, b, c, ..., z forward requests to
   server D.]

   Overload control that is based on reducing the number of messages a
   sender is allowed to send is not suited for servers that receive
   requests from a very large population of senders, each of which only
   infrequently sends a request.  This scenario is shown in
   Figure 3(d).  An edge proxy that is connected to many UAs is a
   typical example of such a configuration.  Since each UA typically
   contributes only a few requests, which are often related to the same
   call, it cannot decrease its message rate to resolve the overload.
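   A back-of-the-envelope calculation (Python, with invented example
   numbers; not part of this document) illustrates why per-sender rate
   caps achieve little in this topology:

     # Invented example numbers: many UAs, each sending only a handful
     # of requests, so even a strict per-UA rate cap removes almost no
     # load from the edge proxy.
     num_uas = 100000
     requests_per_ua_per_hour = 2.0   # e.g., one call setup and one teardown

     aggregate = num_uas * requests_per_ua_per_hour / 3600.0
     print("aggregate load: %.1f requests/second" % aggregate)

     per_ua_cap_per_hour = 60.0       # "at most one request per minute"
     capped = num_uas * min(requests_per_ua_per_hour, per_ua_cap_per_hour) / 3600.0
     print("load after per-UA cap: %.1f requests/second (unchanged)" % capped)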
   In such a configuration, a SIP server can resort to local overload
   control by rejecting a percentage of the requests it receives with
   503 (Service Unavailable) responses.  Since there are many upstream
   neighbors that contribute to the overall load, sending 503 (Service
   Unavailable) to a fraction of them can gradually reduce load without
   entirely stopping all incoming traffic.  However, using 503 (Service
   Unavailable) towards individual sources cannot prevent overload if a
   large number of users place calls at the same time.

   Note: The requirements of the "edge proxy" topology are different
   from those of the other topologies, which may require a different
   method for overload control.

6.  Type of Overload Control Feedback

   The type of feedback generated by a receiving entity to limit the
   amount of traffic it receives is an important aspect of the design.
   We discuss the following three types of overload control feedback:
   rate-based, loss-based and window-based overload control.

6.1.  Rate-based Overload Control

   The key idea of rate-based overload control is to limit the rate at
   which an upstream element is allowed to forward requests to its
   downstream neighbor.  If overload occurs, a SIP server instructs
   each upstream neighbor to send at most X requests per second.  Each
   upstream neighbor can be assigned a different rate cap.

   The rate cap ensures that the number of requests received by a SIP
   server never increases beyond the sum of all rate caps granted to
   its upstream neighbors.  It can protect a SIP server against
   overload even during load spikes, as long as no new upstream
   neighbors start sending traffic.  New upstream neighbors need to be
   factored into the assigned rate caps as soon as they appear.  The
   current overall rate cap used by a SIP server is determined by an
   overload control algorithm, e.g., based on system load.

   An algorithm for the sending entity to implement a rate cap of X
   requests per second is request gapping.  After transmitting a
   request to a downstream neighbor, a server waits for 1/X seconds
   before it transmits the next request to the same neighbor.  Requests
   that arrive during the waiting period are not forwarded and are
   either redirected, rejected or buffered.

   A drawback of this mechanism is that it requires a SIP server to
   assign a certain rate cap to each of its upstream neighbors during
   an overload condition, based on its overall capacity.  Effectively,
   a server assigns a share of its capacity to each upstream neighbor
   during overload.  The server needs to ensure that the sum of all
   rate caps assigned to upstream neighbors is not (significantly)
   higher than its actual processing capacity.  This requires a SIP
   server to keep track of the set of upstream neighbors and to adjust
   the rate caps if a new upstream neighbor appears or an existing
   neighbor stops transmitting.  If the caps assigned to upstream
   neighbors are too high, the server may still experience overload.
   However, if the caps are too low, the upstream neighbors will reject
   requests even though they could have been processed by the server.

   A SIP server can evaluate the amount of load it receives from each
   upstream neighbor and assign each neighbor a rate cap that is
   suitable for it without limiting it too much.  This way, the SIP
   server can re-allocate resources that one upstream neighbor does not
   use, because it is sending fewer requests than allowed by its rate
   cap, to another upstream neighbor.
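   The request gapping algorithm described above can be sketched as
   follows (Python, for illustration only; the class name and the
   example rate cap are invented):

     # Request gapping illustration: for a rate cap of X requests per
     # second, the sending entity forwards at most one request per 1/X
     # seconds to a given downstream neighbor; requests arriving within
     # the gap must be rejected, redirected or buffered.
     import time

     class RequestGapper:
         def __init__(self, rate_cap_per_second):
             self.gap = 1.0 / rate_cap_per_second  # minimum spacing between forwards
             self.next_allowed = 0.0

         def try_forward(self, now=None):
             """Return True if a request may be forwarded now."""
             now = time.monotonic() if now is None else now
             if now >= self.next_allowed:
                 self.next_allowed = now + self.gap
                 return True
             return False

     # A cap of 2 requests/second allows at most one forwarded request
     # every 500 ms.
     gapper = RequestGapper(rate_cap_per_second=2.0)
     for t in (0.0, 0.1, 0.6, 0.9, 1.2):
         action = "forward" if gapper.try_forward(now=t) else "reject/redirect/buffer"
         print("t=%.1fs -> %s" % (t, action))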
   An alternative technique for allocating a rate cap to each upstream
   neighbor is to use a fixed proportion of a control variable X, where
   X is initially equal to the capacity of the SIP server.  The server
   then increases or decreases X until the workload arrival rate
   matches the actual server capacity.  Usually, this means that the
   sum of the rate caps sent out by the server (i.e., X) exceeds its
   actual capacity, but it enables upstream neighbors that are not
   generating more than their fair share of the work to remain
   effectively unrestricted.  An advantage of this approach is that the
   server only has to measure the aggregate arrival rate and that the
   calculation of the individual rate caps is fairly simple.

6.2.  Loss-based Overload Control

   A loss percentage enables a SIP server to ask an upstream neighbor
   to reduce the number of requests it would normally forward to this
   server by X percent.  For example, a SIP server can ask an upstream
   neighbor to reduce the number of requests this neighbor would
   normally send by 10%.  The upstream neighbor then redirects or
   rejects X percent of the traffic that is destined for this server.

   An algorithm for the sending entity to implement a loss percentage
   is to draw a random number between 1 and 100 for each request to be
   forwarded.  The request is not forwarded to the server if the random
   number is less than or equal to X.

   An advantage of loss-based overload control is that the receiving
   entity does not need to track the set of upstream neighbors or the
   request rate it receives from each of them.  It is sufficient to
   monitor the overall system utilization.  To reduce load, a server
   can ask its upstream neighbors to lower the traffic they forward by
   a certain percentage.  The server calculates this percentage by
   combining the loss percentage that is currently in use (i.e., the
   loss percentage the upstream neighbors are currently applying when
   forwarding traffic), the current system utilization and the desired
   system utilization.  For example, if the server load approaches 90%
   and the current loss percentage is set to a 50% traffic reduction,
   then the server can decide to increase the loss percentage to 55% in
   order to get to a system utilization of 80%.  Similarly, the server
   can lower the loss percentage if the system utilization permits.

   The main drawback of percentage throttling is that the throttle
   percentage needs to be adjusted to the current number of requests
   received by the server.  This is particularly important if the
   number of requests received fluctuates quickly.  For example, if a
   SIP server sets a throttle value of 10% at time t1 and the number of
   requests increases by 20% between time t1 and t2 (t1