Archive for the ‘TCP’ Category

Max-Min Fairness

Sunday, October 25th, 2009

What is max-min fairness? It is a fairness scheme in which capacity is always allocated to the lowest rate flow(s) before allocating remaining capacity to the rest sharing an output link. But, what do we meant by lowest rate flow? Flow with the lowest rate? How do we determine that? Let’s consider a case in which ten flows show an output link and every flow has much information to send. Then it become obvious that a fair scheme is to assign each flow one tenth of the output bandwidth. This is a particular case of max-min fairness. On the other hand, suppose one of the flows has little information to send and chooses to transmit at one twentieth of the bandwidth. Then this flow is allocated its bandwidth required and the excess will be shared by the other nine more aggressive flows. In other words, the rest of the flows will receive one ninth of remaining bandwidth according to definitions of max-min fairness.

How do we compute the max-min fairness bandwidth of a flow if it passes through a few shared links?  Recall that max-min fairness implies that bandwidth is always allocated to the flow with lowest rate first. Therefore, the short-cut for computing bandwidth allocation is always to compute the flow(s) with lowest rate first. Suppose two flows share a link rate R1 before these two flows share another link of rate R2 with another 5 flows. The rate allocations at link 1 and link 2 independently are R1/2 and R2/7 respectively.  Which one is smaller? Allocate the smaller rate first.  If  R2/7 < R1/2, then all flows get R2/7 each. If R2/7 > R2/2, then allocate R/2 to the two flows first and capacity left at link 2 to the other 5 flows, that is, (R2 – R1)/5 each.  Have fun.

TCP Fairness

Sunday, October 25th, 2009

What is TCP fairness? Is there real fairness in TCP?  Like many good anwers, the answer is yes and no!  Note that a TCP Reno sender adjust its rate based on the AIMD algorithm.  To understand TCP fairness, consider two flow sharing an output link. In steady-state periods, the window of a flow, operating in congestion avoidance phase, increases linearly by 1 every RTT. When packet loss is detected, the flow exercises congestion control by halving its window size and then continue with congestion avoidance. Note that the SS threshold is also equal to half the window size when packet loss is detected. Let’s consider the case in which both flows have the same RTT. Then they operate in a similar manner. But, the two flows can be out-of-phase with flow 1 having a larger window than flow 2.  Will there be fairness? If the output link has capacity R, then the fair rate for each flow is R/2. Because transmission rate can be approximated by CW/RTT, we need both congestion window to be approximately equal for fair queueing.  Let CW1 and CW2 be the window sizes of flow 1 and 2 respectively and CW1 > Cw2 when a loss occur. Note that CW1/2 > CW2/2 meaning that flow 1 decreses its window size faster than flow 2. Then both flow increase then window size by 1 per RTT.  CW1 and CW2 will increase until packet losses are detected and CW1 and Cw2 are halved again. With this cycle repeating, Cw1 and CW2 tend towards CW1 = CW2 meaning that both flows share the bandwidth fairly.

What happen if RTTs are different?  TCP is a self clocking protocol that utlizes ACKs received in rate adjustment. Let’s consider flow 1 and flow 2 with RTT1 and RTT2, where RTT1 > RTT2. If packet loss is detected in a flow, the CW is halved followed by congestion avoidance. Notably, ACKs for flow 1 will take longer to reach its sender than flow 2. Therefore, flow 2 will icrease its window faster than flow 1. Let RTT1 = 2xRTT2. Then flow 1’s rate increase is appromately half that of flow 2. Because of TCP’s self clocking design,  flow 2 will get higher rate than flow 1. When RTT1 >> RTT2, then  flow 1’s rate  <<  flow 2’s rate, where unfairness arises.

TCP

Wednesday, October 21st, 2009

This is a summary of my lectures on TCP and not meant to be exhaustive in the coverage. The key purpose of the transport layer (TL) is to provide end-to-end delivery.  There are two well-known TL protocol used today. They are Transmission Control Protocol (TCP) and User Datagram Protocol (UDP).  UDP is designed to provide simple no-frill delivery service while TCP for end-to-end reliable data transfer.

Note that both protocols assume an underlying best-effort network. A best-effort network attempts to deliver packets with whatever resources under its control. However, it may not have sufficient resources from time to time. As it turns out, some packets at a node will be lost when its output buffer overflows when too much traffic arrives at its link. While best-effort network is simple and scalable, we are not sure whether a particle packet will arrive at its destination.

While this is acceptable for video streaming as small amount of packet loss is tolerable or  information stream with sufficient redundancy, best effort service is not acceptable for transmission of information that requires high reliability. This requirement led to the development of TCP, a protocol for end-to-end reliabile data transfer. TCP and UDP are two protocols developed for reliable data transfer and no-frill (unreliable) data transfer respectively.

Just like every good thing comes at a price, TCP is more complex than UDP. TCP is a connection oriented while UDP is a non-connection oriented. TCP is a stateful and UDP is stateless. Obviously, UDP is more efficient.

Transport layer communications is based on sockets. A socket is a unique identifier that maps to a process. Because a host usually have more than one process running, a few sockets can be active and each of these sockets must be unique  so that the dsignated process is activated for the incoming information.

TCP uses a 4-tuple socket and UDP a 2-tuple socket. For TCP, the socket consists of the combination of source IP address, source port, destination IP address and destination port and UDP a combination of destination address and destination port. In other words, a socket is a unique identifier for the communications between two processes in two host computer. If and only if each socket identifier is unique to a process, it is not possible to identify the process to activate.

A comprehensive review of TCP follows. A best-effort network does not guarantee its delivery service provided to upper layer. As a packet traverses the network, it may encounter many events such as node failure, link failure, insufficient transmission capacity, buffer full or the packet becomes corrupted. For a corrupted packet, which field is corrupted us an important question. Does the corruption occur in the address field or in the payload? Address corruption will lead to packet lost and payload corruption allows the corrupted packet to reach its destination.

What is sent is what is received in a Reliable Data Transfer (RDT) model. In this case, there is no error due to packet lost or corruption in the communications channel. The channel is error free.  The RDT model assumes an ideal channel!  TCP  is supposed to emulate a RDT model even though the underlying network provides only best effort service.

The  first TCP feature we consider is packet acknowledgement. Positive (ACK) and negative (NACK) acknowledgements can be used but TCP uses only ACKs with sequence no. By means of the sequence no, the sender is notified whether a packet is correctly received. A duplicated ACK in TCP implies that the packet following this packet is assumed lost. In order to be sure that 2 packets after the lost packet reaches the receiver, TCP retransmits a lost packet when two duplicated ACKs are received.

To keep the best-effort network simple, functionalities of the network are kept to routing a packet from the sender to the receiver. It does not deal with congestion except for discarding packets when its output buffer is full, making the best-effort network highly scalable.

The Finite State Machine (FSM) is a tool in which all possible states of a protocol may be modelled to eliminate unknown states that may lead to communications failures. FSM allows an analyst to model the operation of a communications protocol clearly. In so doing, any undefined state may be identified and hence software bugs are grossly reduced by means of this tool.

There are four different events due to transmission errors that a sender or receiver may encounter.  The foure events are packet corruption, packet loss, acknowledgement loss, duplicated packet and duplicated acknowledgement.  To provide end-to-end reliability, the Transport Layer must take specific actions. For packet corruption, ACK with packet sequence no is normally used. This feedback confirms which packets are received correctly.  When a packet is lost, the lack of acknowledgement over a reasonable period of time greater than the round-trip-time (RTT) allows the sender to deduce that the packet is lost. RTT is the time between the transmission of a packet to the time when an ACK is received. The packet would have traversed many nodes and links in this return trip. Assuming that the return path is uncongested, the RTT estimates of how congested the forward channel is.  The duration that the sender waits before taking further action is called the “time-out” duration. Therefore, a lost packet is recovered by packet retransmission when the timeout event occurs.  Sometimes, a missing ACK could be due to the loss an ACK in the return path. The sender will not be able to differentiate between a lost packet or a lost ACK. In this case, the sender recovers the loss by retransmitting the packet in question.

If the actual event is a lost packet, no duplicated packet will arrive at the receiver. The receiver will reponse with an ACK to the sender and normal operation continues. If the actual event is due to a lost ACK, then a duplicated packet will arrive at the receiver. The receiver must be able to response to duplicated packet appropriately. For a duplicated packet, the receiver will discard the packet but send a duplicated ACK. A duplicated packet received implies that the previous ACK is lost.

In summary, the transport layer uses feedback  to sender in the form of ACK with packet no or the lack of feedback for reliable data transfer. Lack of feedback indicates that a packet is probably lost that will activate sender to retransmit the packet involved. This event is triggered by a timeout counter. However, such retransmission may be due to an overly delayed ACK and result in duplicated packet at the receiver. Selection an appropriate value for timeout ensures that only a small number of duplicated packets.

Both selective repeat and Go-back-N ARQ may be used for packet recovery. Selective repeat ARQ uses a timer for each packet and individualised acknowledgement allows retranmission of missing packets only. Selective repeat limits the number of packet retransmissions in an error prone channel and improves overall transmission efficiency. Go-back-N ARQ uses cummulation acknowledgement. An ACK acknowledges the packet and all preceding packets. A timeout event will trigger retransmission of all unacknowledgement packets. This will introduce many duplicated packets in a connection with a large window size that lowers connection efficiency.

TCP has to deal with a real channel. Packets can therefore be corrupted, lost or duplicated. Similarly, these events applies to ACKs. Therefore, the finite state machine becomes a useful tool to model all possible states that a connection may enter and how the  connection moves from one state to another. One key advantage of FSM is a deadlock state may be uncovered.

TCP use ACKs with sequence numbers that specify the bytes rather than packets received. Hence the numbering in a packet sent is in terms of number of the first byte in the packet transmitted or the expected first byte number of the next packet in an ACK. TCP uses four bytes for sequence number.  Because a duplicated packet may buffered in the network,  sequence numbers cannot be reused unless they exceed their expected livetime in the network. Otherwise, the receiver will not be able to differentiate them. In TCP, the time limit is 3 minutes after the termination of a connection.

Sequence numbers

Because a receiver does not know what was sent by its sender, a set of sequence numbers for a connection has to be chosen carefully. First the sequence number has to be at least twice the window size used. Otherwise, a series of packet loss can lead to an indeterminate condition. The receiver cannot ascertain whether a packet received is a retransmitted packet or a new packet. Another indeterminate scenario can be due to recently terminated connection. Because the network may buffer packets that appear sometime later, reuse of a set of sequence numbers is allowed after a lapse of 3 minutes.

Transmission Control Protocol (TCP) was designed during the 1980s era when link bandwidth was small. This has interesting implications on the way how parameters of TCP was set. Because the underlying best-effort network does not response to congestion, sender TCP was designed response to congestion via rate control.   TCP transmission rate depends on two parameters: congestion window (CW) size and RTT. CW size limits the number of packets that a flow may transmit without receiving an ACK. A large window size implies that many packets can be transmit continuously meaning that an avalanche of packets could be released. Such burstiness can overwhelmed the buffers in routers leading to many discarded packets. Recovery of these packets could be costly. On the contrary, using a small window size implies that the transmission rate will be small. Therefore choosing an appropriate CW size for effiicient operation.

Tahoe and Reno

The early versions of TCP are TCP Tahoe and TCP Reno. TCP Tahoe and Reno use feedback in terms of ACK packets for rate adjustment. For a connection, the ACK to a packet sent depends largely on the RTT of the connection. RTT of a connection will increase with congestion. As the path of the connection gets more congested, its RTT will increase and ACK feedback  will take a much longer time to arrive.  Under this scenario, we have to adapt the CW so that all connections sharing a link share the bandwidth fairly.

Because an ACK will be sent for every packet received after an RTT duration, TCP rate can be approximated by CW/RTT for a connection. Therefore, TCP rate can be increase by CW increase or RTT decrease.

TCP is a greedy protocol and attempts to use whatever bandwidth available. This is accomplished by a so-called “slow-start” phase when the CW begins with a value of 1 and doubles every RTT till a congestion in terms of a lost packet is detected. Tahoe is the first version of TCP and it enters slow-start when a packet is lost. How does TCP detect a lost packet? There are basically two ways that a TCP connection detects a lost packet. Because a TCP receiver response with an ACK with the next expected packet for every packet received, a lost packet in a stream of packets can trigger a stream of duplicated ACKs. Upon receipt of three duplicated ACKs, TCP assumes that the expected packet is lost. Note that this scenario is very different from the sceneario of a timeout event. When three other packets, following the lost packet, were successful in reaching the receiver, the congestion is not so serious as compared to a timeout event.  TCP Tahoe does not differentiate between these two events. The TCP Tahoe will enter slow-start phase when it detects a lost packet.

In contrast, TCP Reno treats the two events separately. Because timeout is a serious congestion event, Reno connection will enter the slow-start phase. In the event of triple duplicated ACKs, Reno uses a so-called Fast-Retranmission (FR) procedure. In FR, the congestion window is halved and the connection proceeds with its recovery and transmission.

Slow-start threshold is another important control parameter in a TCP connection. This threshold marks the beginning of a congestion avoidance phase. Note that  the robustness of  TCP is due to its congestion control and avoidance actions. Detecting a packet loss, TCP will enter congestion control phase by reducing its current window to half its value. The SS threshold is also set to this value.  At the beginning of a connection, the SS threshold is normally set to a high value above the CW size at which a packet lost is detected. At a congestion control event, the SS threshold is set to CW/2. When the CW of a connection reaches its SS threshold, it enters a congestion avoidance phase by increasing its CW by 1 for every RTT, allowing the connection to increase  its rate very slowly, the property of the congestion avoidance phase. In other words, both TCP Tahoe or Reno will enter congestion avoidnce phase if CW is equal to SS threshold.

More details on TCP Reno will be provided as it is the most popular implementation. It is obvious that Reno is more efficient than TCP Tahoe because it uses FR whenever a lost packet is detected via three duplicated ACKs.

Suppose a TCP Reno connection is set up in a network. After connection establishment, TCP Reno will start transmission with slow-start operations. The window size starts with 1 and doubles every RTT. Very soon, the CW will exceed the path bandwidth as CW/RTT rises rapidly as CW doubles every RTT. Congestion starts and output buffers overflow followed by a timeout event at the sender. The sender will set its SS threshold to CW/2 and repeat slow-start. The CW will reach the SS threshold value and enters congestion avoidance phase in which the CW will increase by 1 for every RTT.  This slow linear increase will still cause congestion but not as serious as that due to slow-start. This is the phase where triple duplicated ACKs would be received and the CW and SS threshold are then both set to current CW/2 and fast retransmit is executed. The connection will then alternate between congestion control and congestion avoidance till a timeout event caused by an avalanche of packets from other connections and the above cycle is repeated.

Finite State Machine (FSM)

FSM is a useful repesentation of different states that a process may enter and therefore a uesful tool for protocol design. Because a connection will enter a specific state  according to the prevailing conditions, a state diagram helps to eliminate unknown states. The general conventions for FSM are: a dotted arrow points to the initial state; an arrow will show the possible transition due to an activation, shown above a horizontal line,  an action if any below it. Sometimes, the arrow is looped back to the original state meaning that the activation does not warrant a transition to another state. Therefore, it is possible to trace all the states of a protocol by means of this FSM tool.

An analyst will start with the initial state and sketch all other possibe states the sender or receiver will enter during the operations of their communications protocol.  For example, for a receiver FSM diagram, waiting for the first packet is its initial state. There are many possible conditions at thearrival of a packet. Is it corrupted, non-corrupted, the expected packet, an out-of-sequence packet and within range of sequence number used?  Do we need additional states in the FSM diagram?  The activation and action along with the transition arrow are then added according to the condition of a packet received. All possible activations must be consider otherwise, your protocol will fail if it enters an unknown state.