The default system settings are usually fine in most cases, but default settings are boring :) We all want more power, higher throughput, lower latency and so on. Increasing TCP performance in Linux is easy; the more interesting part is finding the bottleneck in your system on your own :)
What we need:
OS - Debian ( Jessie/Wheezy) - Because Debian ROCKS...
Hardware - 2 PC, Ethernet cable
Tools : tc (Traffic Control), iperf, tcpdump, wireshark, ethtool
PC1 1Gbit/s PC2
|192.168.1.10|------>------Ethernet------>------|192.168.1.100|
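If you want to reproduce the setup, a minimal sketch of the addressing (assuming the interface is eth0 on both machines and is not managed by other tools) could be:
# on PC1
ip addr add 192.168.1.10/24 dev eth0 && ip link set eth0 up
# on PC2
ip addr add 192.168.1.100/24 dev eth0 && ip link set eth0 up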
How we test:
First of all we must know the maximum performance of our system setup without any changes - the default configuration (hard for my systems :)
Data is sent from PC1 --> PC2 :
PC1 1Gbit/s PC2
|192.168.1.10|------>------Ethernet------>------|192.168.1.100|
PC1: iperf -c 192.168.1.100 -i 1          PC2: iperf -s -i 1
------------------------------------------------------------
Client connecting to 192.168.1.100, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.10 port 49372 connected with 192.168.1.100 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 110 MBytes 921 Mbits/sec
[ 3] 1.0- 2.0 sec 109 MBytes 915 Mbits/sec
[ 3] 2.0- 3.0 sec 111 MBytes 930 Mbits/sec
[ 3] 3.0- 4.0 sec 111 MBytes 928 Mbits/sec
[ 3] 4.0- 5.0 sec 111 MBytes 934 Mbits/sec
[ 3] 5.0- 6.0 sec 110 MBytes 924 Mbits/sec
[ 3] 6.0- 7.0 sec 111 MBytes 931 Mbits/sec
[ 3] 7.0- 8.0 sec 111 MBytes 931 Mbits/sec
[ 3] 8.0- 9.0 sec 111 MBytes 933 Mbits/sec
[ 3] 9.0-10.0 sec 110 MBytes 926 Mbits/sec
[ 3] 0.0-10.0 sec 1.08 GBytes 927 Mbits/sec
Good, we get ~1Gbit/s (the difference goes to the Ethernet, IP and TCP headers). Note that the measured throughput can deviate depending on the system and setup.
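As a rough sanity check (assuming the standard 1500-byte MTU and TCP timestamps enabled), the theoretical TCP goodput over Gigabit Ethernet is:
# TCP payload per frame: 1500 - 20 (IP) - 20 (TCP) - 12 (timestamps) = 1448 bytes
# bytes on the wire per frame: 7 preamble + 1 SFD + 14 Ethernet + 1500 payload + 4 FCS + 12 IFG = 1538 bytes
# max goodput ~ 1000 Mbit/s * 1448 / 1538 ~ 941 Mbit/s
so the ~927 Mbit/s measured by iperf is in the expected range.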
Now let's make the conditions a little bit worse and delay the traffic going from PC1 to PC2 by 10ms with the tc command (a daemon of the ancient world :)
And the manual page to rule it -->
tc qdisc add dev eth0 root netem delay 10ms
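A quick way to confirm the delay is really applied (a sketch, run from PC1) is to ping PC2 before and after adding the qdisc; the RTT should jump from well under 1ms to roughly 10ms:
ping -c 5 192.168.1.100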
Now we rerun the test:
------------------------------------------------------------
Client connecting to 192.168.1.100, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.10 port 49425 connected with 192.168.1.100 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 84.9 MBytes 712 Mbits/sec
[ 3] 1.0- 2.0 sec 91.2 MBytes 765 Mbits/sec
[ 3] 2.0- 3.0 sec 90.9 MBytes 762 Mbits/sec
[ 3] 3.0- 4.0 sec 91.2 MBytes 765 Mbits/sec
[ 3] 4.0- 5.0 sec 90.5 MBytes 759 Mbits/sec
[ 3] 5.0- 6.0 sec 91.0 MBytes 763 Mbits/sec
[ 3] 6.0- 7.0 sec 90.9 MBytes 762 Mbits/sec
[ 3] 7.0- 8.0 sec 91.2 MBytes 765 Mbits/sec
[ 3] 8.0- 9.0 sec 90.4 MBytes 758 Mbits/sec
[ 3] 9.0-10.0 sec 91.8 MBytes 770 Mbits/sec
[ 3] 0.0-10.0 sec 904 MBytes 758 Mbits/sec
Now we only get an average of 758 Mbit/s - not so bad considering we increased the latency from 1ms to 10ms. When making such changes it's very important to check that the performance degradation was caused only by the delay and not by system instability (system load and other factors).
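For the system-load part of that check, a simple sketch (assuming the interface is eth0; the exact ethtool counter names depend on the NIC driver) is to watch the CPU and the NIC drop counters while iperf is running:
# CPU should not be saturated and the box should not be swapping
vmstat 1
# NIC-level drop/error counters should stay at 0
ethtool -S eth0 | grep -i -E "drop|err"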
First we check tc for information about the queuing discipline (it does all the hard work of delaying our traffic):
tc -s qdisc show
qdisc netem 8001: dev eth0 root refcnt 2 limit 1000 delay 10.0ms
Sent 991282850 bytes 655137 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
No drops, overlimits or requeues - good, so delaying the traffic has not dropped any packets. The second (and, I would say, more important) check is to see whether any TCP packets were dropped, by sniffing the traffic.
We can capture on either system (PC1 or PC2); it does not matter which one, as long as we can see whether any packets were dropped.
tcpdump -i eth0 -s 80 -w /tmp/as.cap
Here -i selects the interface, -s sets the snapshot length (capture 80 bytes of every packet) and -w writes the capture to a file.
The -s (snaplen) option specifies how many bytes of every packet to capture: the smaller it is, the fewer system resources are needed, but if it is set too low we lose header data. Setting it lower than 74 bytes is a bad idea; I use 80 bytes here, but setting it to 100 bytes is a safe choice.
Now let's look at the TCP session details in Wireshark (Analyze -> Expert Info); if we don't see errors there, we don't have drops - good.
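The same check can be done from the command line with tshark (a sketch; older tshark versions use -R instead of -Y for the display filter):
tshark -r /tmp/as.cap -Y "tcp.analysis.retransmission || tcp.analysis.lost_segment" | wc -l
# 0 means no retransmissions or lost segments were seen in the trace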
Now let's make the delay a bit bigger and change it from 10ms to 100ms:
tc qdisc change dev eth0 root netem delay 100ms limit 10000
We have added an additional parameter, limit 10000, which defines the queue length of netem; it is needed when the delay value or the packet rate is high (I should write a separate post about this -->).
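A rough rule of thumb for sizing the limit (a sketch assuming full-size 1514-byte frames): the netem queue must hold at least a full bandwidth-delay product worth of packets, otherwise netem itself starts dropping:
# packets in flight ~ 1 Gbit/s * 0.1 s / (1514 bytes * 8) ~ 8256 packets
# so the default limit of 1000 would not be enough at 100ms, while limit 10000 is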
If we want to remove the tc configuration, we can do it with:
tc qdisc del dev eth0 root
More info on the netem module can be found here (I love this tool) -->
So here are the TCP throughput results with a 100ms delay - not good at all :) ~94 Mbit/s:
------------------------------------------------------------
Client connecting to 192.168.1.100, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.10 port 50200 connected with 192.168.1.100 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 5.75 MBytes 48.2 Mbits/sec
[ 3] 1.0- 2.0 sec 11.8 MBytes 98.6 Mbits/sec
[ 3] 2.0- 3.0 sec 12.5 MBytes 105 Mbits/sec
[ 3] 3.0- 4.0 sec 12.4 MBytes 104 Mbits/sec
[ 3] 4.0- 5.0 sec 12.4 MBytes 104 Mbits/sec
[ 3] 5.0- 6.0 sec 10.8 MBytes 90.2 Mbits/sec
[ 3] 6.0- 7.0 sec 12.8 MBytes 107 Mbits/sec
[ 3] 7.0- 8.0 sec 11.5 MBytes 96.5 Mbits/sec
[ 3] 8.0- 9.0 sec 11.2 MBytes 94.4 Mbits/sec
[ 3] 9.0-10.0 sec 11.9 MBytes 99.6 Mbits/sec
[ 3] 0.0-10.0 sec 113 MBytes 94.4 Mbits/sec
The packet trace did not show any drops, so as before the impact on TCP performance/throughput comes only from the delay value (in my case).
To find the bottleneck we must go back to Wireshark (Statistics -> TCP StreamGraph -> Window Scaling Graph). It is important to select a packet with source IP 192.168.1.100 first:
In our case (not always) the bottleneck looks to be the window size: after about 1s the window stops growing and stays constant until the end of the test, at a value of 3145728 bytes.
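If you prefer the command line, the advertised (scaled) receive window from PC2 can also be extracted from the capture with tshark (a sketch; it assumes the capture contains the TCP handshake so the window scale factor is known):
tshark -r /tmp/as.cap -Y "ip.src==192.168.1.100" -T fields -e tcp.window_size | sort -n | tail -1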
So let's check what performance we should get with such a window by calculating the BDP (more info -->):
Bandwidth = tcp_rwin / Delay = (3145728 bytes * 8) / 0.1 s = 251658240 bit/s ~ 240 Mbit/s
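The same arithmetic in shell form (a small sketch: 3145728 bytes is the window seen in the graph, 100ms the configured delay):
WINDOW=3145728; RTT_MS=100
echo $(( WINDOW * 8 * 1000 / RTT_MS ))      # 251658240 bit/s ~ 240 Mbit/s
# and the window needed to fill the whole 1 Gbit/s link at 100ms:
echo $(( 1000000000 * RTT_MS / 1000 / 8 ))  # 12500000 bytes ~ 12.5 MBytes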
But we only get 94 Mbit/s, so let's check the sending side tcp_wmem value.
TCP send and receive buffer sizes on my PC1:
# check the TCP send buffer size - values passed to the TCP protocol
# in our case we must look at the last number (maximum buffer size in bytes)
cat /proc/sys/net/ipv4/tcp_wmem
4096 16384 4194304
# check the TCP receive buffer size - values passed to the TCP protocol
# in our case we must look at the last number (maximum buffer size in bytes)
cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 6291456
# the maximum OS socket send buffer size for all connections and protocols; it caps what an application can request with SO_SNDBUF
cat /proc/sys/net/core/wmem_max
212992
# the maximum OS socket receive buffer size for all connections and protocols; it caps what an application can request with SO_RCVBUF
cat /proc/sys/net/core/rmem_max
212992
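The same values can be read in one go with the sysctl command:
sysctl net.ipv4.tcp_wmem net.ipv4.tcp_rmem net.core.wmem_max net.core.rmem_max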
Let's increase the tcp_wmem maximum to a higher value:
echo "4096 16384 6291456" > /proc/sys/net/ipv4/tcp_wmem
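Note that values written under /proc are lost on reboot; to make the change persistent you can add it to /etc/sysctl.conf (or a file under /etc/sysctl.d/) and reload, for example:
echo "net.ipv4.tcp_wmem = 4096 16384 6291456" >> /etc/sysctl.conf
sysctl -p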
With the increased tcp_wmem we get slightly better results:
------------------------------------------------------------
Client connecting to 192.168.1.100, TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.10 port 54642 connected with 192.168.1.100 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 6.88 MBytes 57.7 Mbits/sec
[ 3] 1.0- 2.0 sec 19.9 MBytes 167 Mbits/sec
[ 3] 2.0- 3.0 sec 17.9 MBytes 150 Mbits/sec
[ 3] 3.0- 4.0 sec 18.5 MBytes 155 Mbits/sec
[ 3] 4.0- 5.0 sec 17.1 MBytes 144 Mbits/sec
[ 3] 5.0- 6.0 sec 18.1 MBytes 152 Mbits/sec
[ 3] 6.0- 7.0 sec 16.4 MBytes 137 Mbits/sec
[ 3] 7.0- 8.0 sec 19.5 MBytes 164 Mbits/sec
[ 3] 8.0- 9.0 sec 17.8 MBytes 149 Mbits/sec
[ 3] 9.0-10.0 sec 18.0 MBytes 151 Mbits/sec
[ 3] 0.0-10.0 sec 170 MBytes 143 Mbits/sec
After increasing it further (up to 62914560 ~ 60M) we get 239 Mbit/s, close to our calculated value. Checking it in Wireshark (Statistics -> IO Graph) we see:
We use filters for the incoming and outgoing traffic and a tick interval of 0.01s:
Filter for TCP Data : ip.src==192.168.1.10
Filter for TCP ACK : ip.src==192.168.1.100
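The same per-interval view can be produced on the command line with tshark (a sketch using the capture file and the same two filters):
tshark -r /tmp/as.cap -q -z io,stat,0.01,"ip.src==192.168.1.10","ip.src==192.168.1.100"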
The interesting part is the gap with no traffic - the idle time. This idle period is "eating" our throughput, and in our case it happens because of a small receive-side TCP window (TCP_RWIN). After increasing it to 12.5 MBytes (the value calculated with the BDP formula) we get only 456 Mbit/s; only after increasing it further, up to 24MB, do we get our maximum throughput. So the 24MB needed on the receiving side (PC2) is not the value the formula gave us - where was I wrong?
The only logical answer is the system delay, which we forgot to include in our delay calculation (I still have to prove it).
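One way to check that hypothesis (a sketch): measure the effective RTT while the iperf transfer is running instead of trusting the configured netem value; if the RTT during the transfer is noticeably above 100ms (queuing, interrupt and scheduling delays), the real BDP is larger than the one we calculated:
ping -c 10 192.168.1.100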
Results:
1. TCP throughput depends on the send and receive side TCP window sizes (tcp_wmem on the sender and tcp_rmem on the receiver).
2. The TCP window size calculated with the BDP formula is not always correct, because the equation uses only the network delay (RTT) value.
3. The best way to investigate TCP throughput issues is Wireshark (one interesting thing we missed in the last graph is the TCP ACK rate - according to the RFC, a TCP ACK SHOULD be sent after every second full-size data segment. But more about this next time).