Saturday, 18 July 2015

TCP ACK generation rate


A simple but also very impotent question of TCP, how often the TCP ACK should be generated?

According the RFC, the ACK SHOULD be generated for at least every second full-sized segment (to be correct this comes from RFC5681 which obsoletes RFC1122, because in old version come with confusion, saying in one place MUST in other SHOULD). Sound simple, but if look to Wireshark (after making iperf test) we would see that the ACK generation rate is much lower than TCP DATA (segment) rate.


First  of all i have to note that this is not a problems of system or TCP implementations because according the RFC it's allow to send TCP ACK less often (the ACK SHOULD be generated...). And in kernel code (i am only speaking about Linux Kernel) it is defined to send one ACK after two full size TCP data segments. 
Not going deep in exploitation t the TCP ACK rate difference is caused be two conditions :



1. TCP offloading to the NIC  - almost all modern NIC allow to offload some basic functions to NIC card (like TCP segmentation, IP CRC check and etc).
2. Due to high TCP receiving rate  - in same cases if the receiving rate is overfill the TCP receiver buffer it reduce the TCP ACK generation rate.

TCP offloading to the NIC 

TCP offloading to the NIC is mainly used to reduce the CPU load, because many minor check and calculations are made on NIC interface. It not only reduce system load but also allows faster transmission (It has also a dark side in it :). To control NIC offloading we use ethtool  command in linux.

To see current setting we use:

ethtool -k eth1
Features for eth1:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
......

......

In our case we would like to turn off the following ofloading feaures:
tcp-segmentation-offload - tso
generic-segmentation-offload - gso
generic-receive-offload - gro
 
To change current NIC offloding setting we use:

ethtool -K eth1 gro off tso off  gso off 

(we should make on both system - sender and receiver)

We also turned off both the receiving offloading because with some NIC's  will see not aggregated TCP packets.
The results of the TCP ACK and TCP Data with segmentation offloading turned can be seen in picture beloow:




Now it's more like to say that TCP ACK generation rate is more likely according to RFC. 
But be turning off the TCP ACK offloading we will increase to system CPU and  reduce the data performance of the system.

In testing purpose it is good to turn off the pause frame support (not allowing the receiver or network node to reduce sending rate due to heavy system load of the receiver)

ethtool  -A eth0 trx off tx off autoneg off

To check the status if the pause frame was turned off:

ethtool  -a eth0

TCP acknowledgment in Linux Kernel

To understand the TCP ACK generation in Linux kernel first we have to look to source code of TCP stack. The function responsible for ACK generation is call  __tcp_ack_snd_check, it check if ACK should be send now ( tcp_send_ack) or should be delayed (  tcp_send_delayed_ack ). To full code of function is showed bellow:


4817 static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
4818 {
4819         struct tcp_sock *tp = tcp_sk(sk);
4820
4821             /* More than one full frame received... */
4822         if (((tp->rcv_nxt - tp->rcv_wup) > inet_csk(sk)->icsk_ack.rcv_mss &&
4823              /* ... and right edge of window advances far enough.
4824              * (tcp_recvmsg() will send ACK otherwise). Or...
4825              */
4826              __tcp_select_window(sk) >= tp->rcv_wnd) ||
4827             /* We ACK each frame or... */
4828             tcp_in_quickack_mode(sk) ||
4829             /* We have out of order data. */
4830             (ofo_possible && skb_peek(&tp->out_of_order_queue))) {
4831                 /* Then ack it now */
4832                 tcp_send_ack(sk);
4833         } else {
4834                 /* Else, send delayed ack. */
4835                 tcp_send_delayed_ack(sk);
4836         }
4837 }


The tcp_ack_snd_check function checks if TCP acknowledgment must be send now , execute   tcp_send_ack() function or can  be delayed tcp_send_delayed_ack(). To allow the kernel to send the TCP ACK packet now 4 conditions must be meet:


  1. The first condition that the received data segment or unacknowledged data must by more than one  maximum segment size for defined in session icsk_ack.rcv_mss variable. This comes from RFC1122 or STD003 specification (4.2.3.2 section), saying that an ACK SHOULD be generated for at least every second full-sized segment (to be correct this comes from RFC5681 which obsoletes RFC1122, because in old version come with confusion, saying in one place MUST in other SHOULD).
    (tp->rcv_nxt - tp->rcv_wup) > inet_csk(sk)->icsk_ack.rcv_mss &&__tcp_select_window(sk) >= tp->rcv_wnd)

    This logical condition has two parts, the first part we seen that the unacknowledged data must be more then
    inet_csk(sk)->icsk_ack.rcv_mss value, theoretically it is not according the RFC, but in practice this condition is met after receiving the second TCP segment. That is very important here is that the RFC define that the ACK “SHOULD”, but not “MUST” be generated. So basically RFC allows us to generate the ACK more rarely.
  2. In the second conditions says that the TCP receive window or usable buffer space must be bigger the the receive windows or advertised TCP window by the server to the client. This is done to overcome the the Silly windows syndrome (SWS) problem, which was first defined in RFC813. It occurs due the bad system implementation of TCP flow control or due to the slow system, which consumes data slowly or can’t handle the received information. In such conditions the receive windows rcv_wnd is filled with data much faster  than it can handle it (clean up the receive buffer). In such condition the kernel must reduce the advertised window, by sending the update size to the client. This condition would go on until the receive window is set to minimal allowed size, making the data transmission ineffective. By forbidding the server to send  TCP ACK packet, we reduce the packet flow rate, the client must wait for ACK for sending more data and reduce the server load. This condition also satisfies the RFC5681 that “an ACK SHOULD be  generated for at least every second full-sized segment”.
  3. The third condition check if tcp_in_quickack_mode(sk) any data are ping-ponged back to the client, in such case the TCP connection in interactive state and an ACK packet must be send immediately (like telnet or remote data access information applications). In the other way the kernel would wait up to 500ms before sending an ACK message.
  4. The fourth and final conditions check if  the server receives out of order data, by checking the ofo_possible variable and the looking to receive queue tp->out_of_order_queue, to see if any out-of-order packet are received. This must be done for faster data recovery and improves TCP recovery time after a loss RFC5681 - “A TCP receiver SHOULD send an immediate duplicate ACK when an out-   of-order segment arrives.  The purpose of this ACK is to inform the   sender that a segment was received out-of-order and which sequence   number is expected.”This condition usually happens if packet loss or corruption occurs in the link between the client and server.

After checking these conditions (the first and second are in conjunction), kernel can send ACK message immediately if one of three conditions are true, else the the sending of ACK message must be delayed, by calling the tcp_send_delayed_ack function, which adjust the sending time based on RTT value and system MIN and MAX delay values. In case where is no packet drop or the TCP session has not ping-pong data going back to the sender, the TCP acknowledgment generation directly relay on first two conditions (if more than one full size segment is received and the we have enough space in receive buffer). 

Results/Outcome :

The TCP ACK generation rate mostly depends from two things, the NIC (TCP) offloading, which reduce TCP ACK rate (also reduce the system load) and the Linux Kernel TCP generation function tcp_ack_snd_check() function. According one condition of the function (__tcp_select_window(sk) >= tp->rcv_wnd) the ACK  must be delayed until the buffter get freed.

 
 

 

No comments:

Post a Comment