Most current congestion control algorithms rely on packet loss as the signal to slow down. According to [1], this is ill-suited for today's networks.
The BBR congestion control algorithm is an alternative used by Google, designed so it “reacts to actual congestion, not packet loss or transient queue delay, and is designed to converge with high probability to a point near the optimal operating point.” [1]
To use BBR on Ubuntu 16.04, the first step is to make sure your kernel is >= 4.9.
If your kernel is 4.4 (as mine was), the easiest way to get a newer kernel is to enable Ubuntu’s “LTS Enablement Stack” [2]. On Ubuntu Server 16.04 this is simply done with:
# apt install --install-recommends linux-generic-hwe-16.04
For me, this installed 4.10.0.
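If you are unsure which kernel you are currently running, uname will tell you:
$ uname -r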
Next we need to enable BBR as the congestion control algorithm, and we also need to change the packet scheduler to fq [3] (though this is no longer strictly required since patch [5]):
Append the following two lines to /etc/sysctl.conf (or put them in a new file in /etc/sysctl.d/):
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Reboot, and verify with:
$ sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = bbr
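To double-check the rest of the configuration, the default qdisc and the list of congestion control algorithms the kernel offers can be read the same way (the tcp_bbr module should be loaded automatically once bbr is selected). If you ever need to re-apply the settings without a reboot, sysctl --system re-reads /etc/sysctl.conf and /etc/sysctl.d/:
$ sysctl net.core.default_qdisc
$ sysctl net.ipv4.tcp_available_congestion_control
# sysctl --system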
While attending the BornHack camp, I noticed that even though we had a 1 Gbit/s internet connection, I couldn’t download at more than 30 Mbit/s from https://mirrors.dotsrc.org.
After some testing with iperf3, the low throughput seemed to be caused by packet loss on downstream traffic. On a single TCP stream I could only download at around 30 Mbit/s, but upload (which had no packet loss) at around 600 Mbit/s.
I could saturate the entire 1 Gbit/s link by telling iperf3 to use multiple TCP streams (e.g. iperf3 -P 30).
Having a 1 Gbit/s link but being unable to fill it with a single stream has bugged me since BornHack. Then I stumbled upon the BBR congestion control algorithm while reading a blog post from Dropbox [4], and decided to try it out.
So let’s see if we can do better with BBR. Step 1 is to create a link with conditions similar to those at BornHack. I used an LXC container connected to the same switch as the server hosting mirrors.dotsrc.org, and added some RTT and packet loss.
On my client (the LXC container), I added around 40 ms of delay using:
# tc qdisc change dev eth0 root netem delay 40ms 4ms
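To confirm the emulated delay is in effect, a quick ping to the server (130.225.254.107, the machine used in the iperf3 runs below) and a look at the qdisc parameters should do:
$ ping -c 4 130.225.254.107
$ tc qdisc show dev eth0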
And made it drop incoming packets with a probability of 0.003, i.e. 0.3% (this gave me the ~30 Mbit/s I was aiming for):
# iptables -A INPUT -m statistic --mode random --probability 0.003 -j DROP
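The packet counters on the rule give a rough idea of how much is actually being dropped, and once the testing is done the rule can be removed again by repeating it with -D instead of -A:
# iptables -L INPUT -v -n
# iptables -D INPUT -m statistic --mode random --probability 0.003 -j DROP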
iperf3 results with the defaults (net.core.default_qdisc = pfifo_fast and net.ipv4.tcp_congestion_control = cubic):
ato@kvaser:~$ iperf3 -c 130.225.254.107 -N
Connecting to host 130.225.254.107, port 5201
[  4] local 130.225.254.116 port 55802 connected to 130.225.254.107 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  3.16 MBytes  26.5 Mbits/sec   15   91.9 KBytes
[  4]   1.00-2.00   sec  2.42 MBytes  20.3 Mbits/sec    9   76.4 KBytes
[  4]   2.00-3.00   sec  2.05 MBytes  17.2 Mbits/sec    0   97.6 KBytes
[  4]   3.00-4.00   sec  2.73 MBytes  22.9 Mbits/sec    0    113 KBytes
[  4]   4.00-5.00   sec  2.61 MBytes  21.9 Mbits/sec    1   96.2 KBytes
[  4]   5.00-6.00   sec  2.42 MBytes  20.3 Mbits/sec   12   80.6 KBytes
[  4]   6.00-7.00   sec  2.24 MBytes  18.8 Mbits/sec    6   69.3 KBytes
[  4]   7.00-8.00   sec  1.80 MBytes  15.1 Mbits/sec    0   87.7 KBytes
[  4]   8.00-9.00   sec  2.61 MBytes  21.9 Mbits/sec    0    106 KBytes
[  4]   9.00-10.00  sec  2.42 MBytes  20.3 Mbits/sec    2   91.9 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  24.5 MBytes  20.5 Mbits/sec   45             sender
[  4]   0.00-10.00  sec  23.9 MBytes  20.0 Mbits/sec                  receiver

iperf Done.
With the new BBR congestion control enabled:
ato@kvaser:~$ iperf3 -c 130.225.254.107 -N
Connecting to host 130.225.254.107, port 5201
[  4] local 130.225.254.116 port 55930 connected to 130.225.254.107 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  34.5 MBytes   289 Mbits/sec  1139   5.71 MBytes
[  4]   1.00-2.00   sec  57.5 MBytes   482 Mbits/sec    76   6.04 MBytes
[  4]   2.00-3.00   sec  57.5 MBytes   482 Mbits/sec   117   2.48 MBytes
[  4]   3.00-4.00   sec  45.0 MBytes   377 Mbits/sec   129   5.92 MBytes
[  4]   4.00-5.00   sec  48.8 MBytes   409 Mbits/sec   135   2.49 MBytes
[  4]   5.00-6.00   sec  53.8 MBytes   451 Mbits/sec   105   5.81 MBytes
[  4]   6.00-7.00   sec  58.8 MBytes   493 Mbits/sec    69   5.99 MBytes
[  4]   7.00-8.00   sec  50.0 MBytes   419 Mbits/sec    81   5.71 MBytes
[  4]   8.00-9.00   sec  53.8 MBytes   451 Mbits/sec   103   5.84 MBytes
[  4]   9.00-10.00  sec  47.5 MBytes   398 Mbits/sec   111   2.61 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   507 MBytes   425 Mbits/sec  2065             sender
[  4]   0.00-10.00  sec   506 MBytes   424 Mbits/sec                  receiver

iperf Done.
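As a quick sanity check that BBR was really used for these connections, ss on the server should show the congestion control algorithm per socket while a transfer is running; look for “bbr” in the line for the iperf3 connection:
# ss -tin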
From 20 Mbit/s to 400 Mbit/s. That’s just awesome. I’ll keep it enabled on https://mirrors.dotsrc.org, which should help people on links with a bit of packet loss download the content we host much faster 🙂
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f8782ea14974ce992618b55f0c041ef43ed0b78
[2] https://wiki.ubuntu.com/Kernel/LTSEnablementStack
[3] “NOTE: BBR *must* be used with the fq qdisc (“man tc-fq”) with pacing enabled, since pacing is integral to the BBR design and implementation. BBR without pacing would not function properly, and may incur unnecessary high packet loss rates.” [1]
[4] https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/
[5] https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=218af599fa635b107cfe10acf3249c4dfe5e4123