As some of you might know, we had an outage yesterday. We believe that every mistake holds something to learn from, so we write a post-mortem after each outage. Usually we do this internally because the issues we run into are very specific to our infrastructure.
This time we ran into a rather nasty issue that could affect anyone running a Linux system with a lot of sessions on it, and we thought you might be interested to know about that pitfall.
At 4:40pm CEST, we got reports about Yikes (503/504 errors) on SoundCloud. Around the same time, our monitoring alerted on a high number of 503s at our caching layer, and right after that one of our L7 routing nginx instances was reported down.
We were still able to log into that system. dmesg showed:
Aug 13 14:46:52 ams-mid006.int.s-cloud.net kernel: [8623919.136122] nf_conntrack: table full, dropping packet.
Aug 13 14:46:52 ams-mid006.int.s-cloud.net kernel: [8623919.136138] nf_conntrack: table full, dropping packet.
N.B.: Our systems are set to UTC timezone
That wasn’t expected. The first thought was: “Someone must have changed the sysctl tunings for that”. Then we realized that this system has no need for connection tracking, so nf_conntrack shouldn’t be loaded at all. As a quick countermeasure we raised net.ipv4.netfilter.ip_conntrack_max. This fixed the immediate problem and brought the service back up.
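To see how close a host is to the limit before packets get dropped, the current table usage can be compared against the maximum. The following is a minimal sketch, assuming a kernel that exposes the counters under /proc/sys/net/netfilter/ (older kernels may use ip_conntrack names instead); the function name conntrack_usage_percent is made up for this illustration.

```shell
#!/bin/sh
# Sketch: report how full the connection tracking table is.
# Assumption: the kernel exposes nf_conntrack_count and nf_conntrack_max
# under /proc/sys/net/netfilter/. The files only exist while the
# nf_conntrack module is loaded.

conntrack_usage_percent() {
    # $1 = current number of tracked connections, $2 = table maximum
    if [ "$2" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( $1 * 100 / $2 ))
}

count_file=/proc/sys/net/netfilter/nf_conntrack_count
max_file=/proc/sys/net/netfilter/nf_conntrack_max

if [ -r "$count_file" ] && [ -r "$max_file" ]; then
    count=$(cat "$count_file")
    max=$(cat "$max_file")
    echo "conntrack: $count/$max ($(conntrack_usage_percent "$count" "$max")% full)"
else
    echo "conntrack counters not available (module probably not loaded)"
fi
```

On a host that is not supposed to track connections at all, the "not available" branch is the expected result.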
Why did it happen?
After bringing the site back up, we investigated what caused the kernel to enable connection tracking. Running lsmod showed that the connection tracking and iptables modules were actually loaded. Another look at dmesg revealed that the ip_tables netfilter module was loaded right before the outage:
Aug 13 14:38:27 ams-mid006.int.s-cloud.net kernel: [8623415.007818] ip_tables: (C) 2000-2006 Netfilter Core Team
Aug 13 14:38:35 ams-mid006.int.s-cloud.net kernel: [8623422.444931] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
So what happened? One of our engineers was doing some preparations for scaling that layer of our infrastructure. To verify that we don’t use any specific iptables rules on that system, he ran:
iptables -L
iptables -t nat -L
Those commands themselves are pretty harmless. They just list the configured iptables rules: the first one lists the rules in the filter table, the second one those in the nat table. Nothing which should change any system configuration, right? Nope. Let’s try to reproduce it. Just boot up some system (I’ve tried it on my Ubuntu laptop). No iptables module should be loaded:
root@apollon:~# lsmod|grep ipt
Now just list your iptables rules:
iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination
And check again for loaded modules:
root@apollon:~# lsmod|grep ipt
iptable_filter 12810 0
ip_tables 27473 1 iptable_filter
x_tables 29846 2 iptable_filter,ip_tables
Okay, that loaded some iptables modules to make it possible to add filter rules via iptables. This shouldn’t cause any problems, since without any actual rules the impact on the kernel is negligible. But now check your nat table:
root@apollon:~# iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target prot opt source destination

Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
Completely empty as well. But now look at your kernel modules:
root@apollon:~# lsmod|grep ipt
iptable_nat 13229 0
nf_nat 25891 1 iptable_nat
nf_conntrack_ipv4 19716 3 iptable_nat,nf_nat
nf_conntrack 81926 3 iptable_nat,nf_nat,nf_conntrack_ipv4
iptable_filter 12810 0
ip_tables 27473 2 iptable_nat,iptable_filter
x_tables 29846 3 iptable_nat,iptable_filter,ip_tables
Just listing the iptables rules for the nat table made the kernel load nf_conntrack, which enabled connection tracking. See dmesg:
[75024.007681] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
On your laptop you probably don’t care; it’s even quite convenient. On a production server that handles a large number of connections, the fairly small default nf_conntrack table will overflow quite fast and cause dropped connections.
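Since the modules get loaded silently, it can help to have a check that notices when connection tracking appears on a host that should not need it. The sketch below assumes the kernel lists loaded modules in /proc/modules with the module name in the first column; the function names and the optional file argument (used here to make the check testable) are illustrative, not part of any existing tool.

```shell
#!/bin/sh
# Sketch: warn when connection tracking modules are loaded on a host
# that is not supposed to track connections.

module_loaded() {
    # $1 = module name, $2 = module list to read (defaults to /proc/modules)
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

check_conntrack_absent() {
    # $1 = module list to read (optional).
    # Prints one warning per unexpected module; returns non-zero if any is loaded.
    status=0
    for mod in nf_conntrack nf_conntrack_ipv4 nf_nat iptable_nat; do
        if module_loaded "$mod" "${1:-/proc/modules}"; then
            echo "WARNING: $mod is loaded"
            status=1
        fi
    done
    return $status
}
```

Calling check_conntrack_absent from a monitoring job and alerting on a non-zero exit status would be one way to use it.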
How do we prevent it?
iptables doesn’t load nf_conntrack itself; it only loads ip_tables, which in turn loads the modules it depends on via the kernel’s module loader. But since that module loader uses user-space helpers like modprobe, the auto-loading process will honour modprobe.d/ settings. Unfortunately there is no easy way to disable loading of a module altogether, but there is a workaround for that.
Since we don’t need iptables at all on that system, we’ve created a /etc/modprobe.d/netfilter.conf like this:
alias ip_tables off
alias iptable off
alias iptable_nat off
alias iptable_filter off
alias x_tables off
alias nf_nat off
alias nf_conntrack_ipv4 off
alias nf_conntrack off
This will make modprobe try to load a module named off instead of the actual kernel module.
Running any iptables command should now give you:
iptables -t nat -L
FATAL: Module off not found.
iptables v1.4.12: can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
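If the same workaround is needed on several hosts, the alias file can be generated from a list of module names instead of being written by hand. This is only a sketch: the function name emit_off_aliases is made up for illustration, and the module list simply repeats the one from the netfilter.conf shown earlier.

```shell
#!/bin/sh
# Sketch: print modprobe.d alias lines that point each given module
# name at the non-existent module "off".

emit_off_aliases() {
    for mod in "$@"; do
        printf 'alias %s off\n' "$mod"
    done
}

# Prints to stdout; redirect to /etc/modprobe.d/netfilter.conf as root to apply.
emit_off_aliases ip_tables iptable iptable_nat iptable_filter \
    x_tables nf_nat nf_conntrack_ipv4 nf_conntrack
```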