This is a summary of my lectures on TCP and not meant to be exhaustive in the coverage. The key purpose of the transport layer (TL) is to provide end-to-end delivery. There are two well-known TL protocol used today. They are Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). UDP is designed to provide simple no-frill delivery service while TCP for end-to-end reliable data transfer.
Note that both protocols assume an underlying best-effort network. A best-effort network attempts to deliver packets with whatever resources under its control. However, it may not have sufficient resources from time to time. As it turns out, some packets at a node will be lost when its output buffer overflows when too much traffic arrives at its link. While best-effort network is simple and scalable, we are not sure whether a particle packet will arrive at its destination.
While this is acceptable for video streaming as small amount of packet loss is tolerable or information stream with sufficient redundancy, best effort service is not acceptable for transmission of information that requires high reliability. This requirement led to the development of TCP, a protocol for end-to-end reliabile data transfer. TCP and UDP are two protocols developed for reliable data transfer and no-frill (unreliable) data transfer respectively.
Just like every good thing comes at a price, TCP is more complex than UDP. TCP is a connection oriented while UDP is a non-connection oriented. TCP is a stateful and UDP is stateless. Obviously, UDP is more efficient.
Transport layer communications is based on sockets. A socket is a unique identifier that maps to a process. Because a host usually have more than one process running, a few sockets can be active and each of these sockets must be unique so that the dsignated process is activated for the incoming information.
TCP uses a 4-tuple socket and UDP a 2-tuple socket. For TCP, the socket consists of the combination of source IP address, source port, destination IP address and destination port and UDP a combination of destination address and destination port. In other words, a socket is a unique identifier for the communications between two processes in two host computer. If and only if each socket identifier is unique to a process, it is not possible to identify the process to activate.
A comprehensive review of TCP follows. A best-effort network does not guarantee its delivery service provided to upper layer. As a packet traverses the network, it may encounter many events such as node failure, link failure, insufficient transmission capacity, buffer full or the packet becomes corrupted. For a corrupted packet, which field is corrupted us an important question. Does the corruption occur in the address field or in the payload? Address corruption will lead to packet lost and payload corruption allows the corrupted packet to reach its destination.
What is sent is what is received in a Reliable Data Transfer (RDT) model. In this case, there is no error due to packet lost or corruption in the communications channel. The channel is error free. The RDT model assumes an ideal channel! TCP is supposed to emulate a RDT model even though the underlying network provides only best effort service.
The first TCP feature we consider is packet acknowledgement. Positive (ACK) and negative (NACK) acknowledgements can be used but TCP uses only ACKs with sequence no. By means of the sequence no, the sender is notified whether a packet is correctly received. A duplicated ACK in TCP implies that the packet following this packet is assumed lost. In order to be sure that 2 packets after the lost packet reaches the receiver, TCP retransmits a lost packet when two duplicated ACKs are received.
To keep the best-effort network simple, functionalities of the network are kept to routing a packet from the sender to the receiver. It does not deal with congestion except for discarding packets when its output buffer is full, making the best-effort network highly scalable.
The Finite State Machine (FSM) is a tool in which all possible states of a protocol may be modelled to eliminate unknown states that may lead to communications failures. FSM allows an analyst to model the operation of a communications protocol clearly. In so doing, any undefined state may be identified and hence software bugs are grossly reduced by means of this tool.
There are four different events due to transmission errors that a sender or receiver may encounter. The foure events are packet corruption, packet loss, acknowledgement loss, duplicated packet and duplicated acknowledgement. To provide end-to-end reliability, the Transport Layer must take specific actions. For packet corruption, ACK with packet sequence no is normally used. This feedback confirms which packets are received correctly. When a packet is lost, the lack of acknowledgement over a reasonable period of time greater than the round-trip-time (RTT) allows the sender to deduce that the packet is lost. RTT is the time between the transmission of a packet to the time when an ACK is received. The packet would have traversed many nodes and links in this return trip. Assuming that the return path is uncongested, the RTT estimates of how congested the forward channel is. The duration that the sender waits before taking further action is called the “time-out” duration. Therefore, a lost packet is recovered by packet retransmission when the timeout event occurs. Sometimes, a missing ACK could be due to the loss an ACK in the return path. The sender will not be able to differentiate between a lost packet or a lost ACK. In this case, the sender recovers the loss by retransmitting the packet in question.
If the actual event is a lost packet, no duplicated packet will arrive at the receiver. The receiver will reponse with an ACK to the sender and normal operation continues. If the actual event is due to a lost ACK, then a duplicated packet will arrive at the receiver. The receiver must be able to response to duplicated packet appropriately. For a duplicated packet, the receiver will discard the packet but send a duplicated ACK. A duplicated packet received implies that the previous ACK is lost.
In summary, the transport layer uses feedback to sender in the form of ACK with packet no or the lack of feedback for reliable data transfer. Lack of feedback indicates that a packet is probably lost that will activate sender to retransmit the packet involved. This event is triggered by a timeout counter. However, such retransmission may be due to an overly delayed ACK and result in duplicated packet at the receiver. Selection an appropriate value for timeout ensures that only a small number of duplicated packets.
Both selective repeat and Go-back-N ARQ may be used for packet recovery. Selective repeat ARQ uses a timer for each packet and individualised acknowledgement allows retranmission of missing packets only. Selective repeat limits the number of packet retransmissions in an error prone channel and improves overall transmission efficiency. Go-back-N ARQ uses cummulation acknowledgement. An ACK acknowledges the packet and all preceding packets. A timeout event will trigger retransmission of all unacknowledgement packets. This will introduce many duplicated packets in a connection with a large window size that lowers connection efficiency.
TCP has to deal with a real channel. Packets can therefore be corrupted, lost or duplicated. Similarly, these events applies to ACKs. Therefore, the finite state machine becomes a useful tool to model all possible states that a connection may enter and how the connection moves from one state to another. One key advantage of FSM is a deadlock state may be uncovered.
TCP use ACKs with sequence numbers that specify the bytes rather than packets received. Hence the numbering in a packet sent is in terms of number of the first byte in the packet transmitted or the expected first byte number of the next packet in an ACK. TCP uses four bytes for sequence number. Because a duplicated packet may buffered in the network, sequence numbers cannot be reused unless they exceed their expected livetime in the network. Otherwise, the receiver will not be able to differentiate them. In TCP, the time limit is 3 minutes after the termination of a connection.
Sequence numbers
Because a receiver does not know what was sent by its sender, a set of sequence numbers for a connection has to be chosen carefully. First the sequence number has to be at least twice the window size used. Otherwise, a series of packet loss can lead to an indeterminate condition. The receiver cannot ascertain whether a packet received is a retransmitted packet or a new packet. Another indeterminate scenario can be due to recently terminated connection. Because the network may buffer packets that appear sometime later, reuse of a set of sequence numbers is allowed after a lapse of 3 minutes.
Transmission Control Protocol (TCP) was designed during the 1980s era when link bandwidth was small. This has interesting implications on the way how parameters of TCP was set. Because the underlying best-effort network does not response to congestion, sender TCP was designed response to congestion via rate control. TCP transmission rate depends on two parameters: congestion window (CW) size and RTT. CW size limits the number of packets that a flow may transmit without receiving an ACK. A large window size implies that many packets can be transmit continuously meaning that an avalanche of packets could be released. Such burstiness can overwhelmed the buffers in routers leading to many discarded packets. Recovery of these packets could be costly. On the contrary, using a small window size implies that the transmission rate will be small. Therefore choosing an appropriate CW size for effiicient operation.
Tahoe and Reno
The early versions of TCP are TCP Tahoe and TCP Reno. TCP Tahoe and Reno use feedback in terms of ACK packets for rate adjustment. For a connection, the ACK to a packet sent depends largely on the RTT of the connection. RTT of a connection will increase with congestion. As the path of the connection gets more congested, its RTT will increase and ACK feedback will take a much longer time to arrive. Under this scenario, we have to adapt the CW so that all connections sharing a link share the bandwidth fairly.
Because an ACK will be sent for every packet received after an RTT duration, TCP rate can be approximated by CW/RTT for a connection. Therefore, TCP rate can be increase by CW increase or RTT decrease.
TCP is a greedy protocol and attempts to use whatever bandwidth available. This is accomplished by a so-called “slow-start” phase when the CW begins with a value of 1 and doubles every RTT till a congestion in terms of a lost packet is detected. Tahoe is the first version of TCP and it enters slow-start when a packet is lost. How does TCP detect a lost packet? There are basically two ways that a TCP connection detects a lost packet. Because a TCP receiver response with an ACK with the next expected packet for every packet received, a lost packet in a stream of packets can trigger a stream of duplicated ACKs. Upon receipt of three duplicated ACKs, TCP assumes that the expected packet is lost. Note that this scenario is very different from the sceneario of a timeout event. When three other packets, following the lost packet, were successful in reaching the receiver, the congestion is not so serious as compared to a timeout event. TCP Tahoe does not differentiate between these two events. The TCP Tahoe will enter slow-start phase when it detects a lost packet.
In contrast, TCP Reno treats the two events separately. Because timeout is a serious congestion event, Reno connection will enter the slow-start phase. In the event of triple duplicated ACKs, Reno uses a so-called Fast-Retranmission (FR) procedure. In FR, the congestion window is halved and the connection proceeds with its recovery and transmission.
Slow-start threshold is another important control parameter in a TCP connection. This threshold marks the beginning of a congestion avoidance phase. Note that the robustness of TCP is due to its congestion control and avoidance actions. Detecting a packet loss, TCP will enter congestion control phase by reducing its current window to half its value. The SS threshold is also set to this value. At the beginning of a connection, the SS threshold is normally set to a high value above the CW size at which a packet lost is detected. At a congestion control event, the SS threshold is set to CW/2. When the CW of a connection reaches its SS threshold, it enters a congestion avoidance phase by increasing its CW by 1 for every RTT, allowing the connection to increase its rate very slowly, the property of the congestion avoidance phase. In other words, both TCP Tahoe or Reno will enter congestion avoidnce phase if CW is equal to SS threshold.
More details on TCP Reno will be provided as it is the most popular implementation. It is obvious that Reno is more efficient than TCP Tahoe because it uses FR whenever a lost packet is detected via three duplicated ACKs.
Suppose a TCP Reno connection is set up in a network. After connection establishment, TCP Reno will start transmission with slow-start operations. The window size starts with 1 and doubles every RTT. Very soon, the CW will exceed the path bandwidth as CW/RTT rises rapidly as CW doubles every RTT. Congestion starts and output buffers overflow followed by a timeout event at the sender. The sender will set its SS threshold to CW/2 and repeat slow-start. The CW will reach the SS threshold value and enters congestion avoidance phase in which the CW will increase by 1 for every RTT. This slow linear increase will still cause congestion but not as serious as that due to slow-start. This is the phase where triple duplicated ACKs would be received and the CW and SS threshold are then both set to current CW/2 and fast retransmit is executed. The connection will then alternate between congestion control and congestion avoidance till a timeout event caused by an avalanche of packets from other connections and the above cycle is repeated.
Finite State Machine (FSM)
FSM is a useful repesentation of different states that a process may enter and therefore a uesful tool for protocol design. Because a connection will enter a specific state according to the prevailing conditions, a state diagram helps to eliminate unknown states. The general conventions for FSM are: a dotted arrow points to the initial state; an arrow will show the possible transition due to an activation, shown above a horizontal line, an action if any below it. Sometimes, the arrow is looped back to the original state meaning that the activation does not warrant a transition to another state. Therefore, it is possible to trace all the states of a protocol by means of this FSM tool.
An analyst will start with the initial state and sketch all other possibe states the sender or receiver will enter during the operations of their communications protocol. For example, for a receiver FSM diagram, waiting for the first packet is its initial state. There are many possible conditions at thearrival of a packet. Is it corrupted, non-corrupted, the expected packet, an out-of-sequence packet and within range of sequence number used? Do we need additional states in the FSM diagram? The activation and action along with the transition arrow are then added according to the condition of a packet received. All possible activations must be consider otherwise, your protocol will fail if it enters an unknown state.