This issue occurs because every subsequent IPsec-secured packet is larger than the original link maximum transmission unit (MTU). In this scenario, the operating systems complete the Transmission Control Protocol (TCP) handshake in clear text before the IPsec main mode and quick mode security associations (SAs) are established.
During the TCP handshake, or when changes occur later in the network, TCP queries the maximum segment size (MSS) to use for each TCP segment and subtracts the IPsec overhead. The main mode or quick mode negotiation then completes (according to the RFC, this takes 63 seconds). Next, the encapsulation mode of the main mode or quick mode SAs that were first established for the TCP session is updated to ESP encapsulation mode. However, the network does not treat the mid-stream switch from clear text to ESP encapsulation as a change, so the current MSS is not updated. Therefore, all subsequent IPsec-secured mid-stream traffic cannot travel in the network layer.
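The following Python sketch shows the arithmetic behind this behavior: a segment sized for clear text exceeds the link MTU after ESP encapsulation is added. It is an illustration only. The header sizes and the cipher suite (IPv4 without options, ESP transport mode, AES-CBC with HMAC-SHA1-96, no NAT traversal) are assumptions and are not taken from this article. The function names are hypothetical. The actual overhead depends on the negotiated policy.

```python
# All sizes are in bytes. Every constant below is an assumption made for
# illustration; real values depend on the negotiated IPsec policy.
IPV4_HEADER = 20   # assumed: IPv4 header without options
TCP_HEADER = 20    # assumed: TCP header without options
ESP_HEADER = 8     # SPI (4) + sequence number (4)
ESP_IV = 16        # assumed: AES-CBC initialization vector
ESP_BLOCK = 16     # assumed: AES-CBC block size, used for padding alignment
ESP_ICV = 12       # assumed: HMAC-SHA1-96 integrity check value


def clear_text_mss(mtu):
    """MSS that TCP would use when no IPsec overhead is considered."""
    return mtu - IPV4_HEADER - TCP_HEADER


def esp_packet_size(tcp_payload):
    """Size of the IP packet after ESP transport-mode encapsulation."""
    inner = TCP_HEADER + tcp_payload
    # The ESP trailer adds padding plus a 1-byte pad length and a 1-byte
    # next-header field; the total is rounded up to a whole cipher block.
    padded = -(-(inner + 2) // ESP_BLOCK) * ESP_BLOCK
    return IPV4_HEADER + ESP_HEADER + ESP_IV + padded + ESP_ICV


def max_secured_mss(mtu):
    """Largest TCP payload whose ESP-encapsulated packet still fits the MTU."""
    for payload in range(clear_text_mss(mtu), 0, -1):
        if esp_packet_size(payload) <= mtu:
            return payload
    return 0


if __name__ == "__main__":
    mtu = 1500
    mss = clear_text_mss(mtu)
    print("Clear-text MSS:", mss)
    print("Packet size after ESP:", esp_packet_size(mss))
    print("MSS that fits after ESP:", max_secured_mss(mtu))
```

Under these assumptions and a 1,500-byte MTU, the clear-text MSS is 1,460 bytes, the same segment becomes a 1,544-byte packet after ESP encapsulation, and the largest segment that still fits is 1,418 bytes. If the MSS is not lowered when the session switches to ESP mid-stream, full-size segments exceed the link MTU.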
Note: When two computers start a TCP session for some unicast traffic, the initial TCP handshake may complete in clear text before the necessary main mode and quick mode negotiations are complete. This behavior occurs if one or both of the computers use a boundary policy such as the IPsec request inbound mode. Both main mode and quick mode SAs must be established before a TCP session can be secured by using the appropriate encapsulation method.