conntrackd - synchronising state to kernel table on backup instance instead of using external cache

Posted by AccomplishedComplex8@reddit | linuxadmin

I am building HA Linux firewalls following this article. There are a few other similar ones on the internet. However, I did not want to just copy-paste configs, but rather use the article as a reference and understand how it works. Everything is clear except for one conntrack question. I am hoping your Linux kernel expertise can help me understand something.

Basically, conntrackd can synchronise conntrack data between two machines: one is active, and the other is a standby backup.

My understanding is that conntrackd can keep states in a) the kernel table or b) an external cache (a file, correct me if I am wrong). This depends on the DisableExternalCache config parameter; see the documentation.

For the external cache to work, I need to use fail-over scripts in keepalived. Looking at that script, it can notify conntrackd to become either the backup or the primary instance. When it tells conntrackd to become the primary instance, conntrackd loads the conntrack data from the external cache into the kernel table.
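To make the fail-over flow concrete, here is a minimal Python sketch of what I understand the stock `primary-backup.sh` script to do for each keepalived state. The `conntrackd` flags (`-c`, `-f`, `-R`, `-B`, `-t`, `-n`) are real ones from the man page, but the binary path, the function names and the exact ordering are my assumptions, so treat it as an illustration rather than a drop-in replacement.

```python
import subprocess

# Assumed install path; adjust for your distribution.
CONNTRACKD = "/usr/sbin/conntrackd"

# Steps per keepalived state, as I read the stock fail-over script:
#   -c  commit the external cache into the kernel table
#   -f  flush the internal and external caches
#   -R  resync the internal cache with the kernel table
#   -B  send a bulk update to the other node
#   -t  shorten kernel conntrack timers to expire stale entries
#   -n  request a resync from the other node
STEPS = {
    "primary": ["-c", "-f", "-R", "-B"],
    "backup": ["-t", "-n"],
    "fault": ["-t"],
}


def conntrackd_steps(state):
    """Return the conntrackd command lines for a keepalived state."""
    if state not in STEPS:
        raise ValueError(f"unknown keepalived state: {state!r}")
    return [[CONNTRACKD, flag] for flag in STEPS[state]]


def run(state, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in conntrackd_steps(state):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `run("primary")` only prints the four commands. The `-c` step is the one that loads the external cache into the kernel table, and it is the step that would become unnecessary with the external cache disabled.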

If I disable the external cache, then conntrack data will be present in the kernel tables and should be the same on both firewalls.
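For reference, this is roughly where the switch lives in `conntrackd.conf`, assuming FTFW sync mode. It is only a fragment: the transport section and everything else is left out, and the value shown is the non-default one I am considering. As far as I can tell from the comments in the example config shipped with conntrack-tools, the external cache is kept in conntrackd's user-space memory.

```
Sync {
    Mode FTFW {
        # Option "a)": inject synced states straight into the kernel
        # table on the backup. The default is Off (external cache used).
        DisableExternalCache On
    }
    # transport section (Multicast/UDP/...) omitted in this sketch
}
```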

The author's concern is that keeping conntrack entries in the kernel table (on a backup instance) will use kernel table space unnecessarily(?) and will increase CPU usage. My firewalls are high-spec servers with 16-core CPUs, doing nothing but iptables, so I am not worried about performance.

Considering my case, and given that with "b)" the conntrack data must be loaded into the kernel anyway, would it not be simpler to just use option "a)"? That would also simplify my keepalived deployment.

As for the limit on conntrack entries in the kernel: if I run out of entries on the backup firewall, the same issue will happen on the active firewall, would it not? In any case, I do not think I will run out of conntrack entries. The current default limit in Debian is high enough, and none of my firewalls (Linux and Cisco) has ever reached even half of that number of connections.
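A quick way to check how close a firewall is to the limit is to compare the kernel's current entry count with its maximum. The sketch below assumes the `nf_conntrack` module is loaded, so that `nf_conntrack_count` and `nf_conntrack_max` exist under `/proc/sys/net/netfilter`; the function names are mine.

```python
from pathlib import Path

# Standard location of the conntrack sysctls when nf_conntrack is loaded.
PROC_DIR = Path("/proc/sys/net/netfilter")


def usage_percent(count, maximum):
    """Percentage of the conntrack table currently in use."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return 100.0 * count / maximum


def read_conntrack_usage(proc_dir=PROC_DIR):
    """Return (count, maximum, percent) read from the given directory."""
    count = int((proc_dir / "nf_conntrack_count").read_text())
    maximum = int((proc_dir / "nf_conntrack_max").read_text())
    return count, maximum, usage_percent(count, maximum)
```

Running `read_conntrack_usage()` on both the active and the backup node, with the external cache disabled, should show whether the backup's kernel table tracks the active one and how much headroom is left.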

I guess the only way for me to find out is to actually test the option without the external cache ("a)") myself, and be prepared to fall back to the external cache and fail-over scripts (option "b)").

Could it be that when conntrackd development started, powerful CPUs were not common and the documentation has not changed since then? Or does the documentation simply cover the most fail-safe scenario?