conntrackd - synchronising state to kernel table on backup instance instead of using external cache
Posted by AccomplishedComplex8@reddit | linuxadmin | 2 comments
I am building HA Linux firewalls following this article. There are a few other similar ones on the internet. However, I did not want to just copy-paste configs, but rather use the article as a reference and understand how it works. Everything is clear except one conntrack question. I am hoping your Linux kernel expertise can help me understand something.
Basically, conntrackd can synchronise conntrack data between two machines: one is active, and the other is a standby backup.
My understanding is that conntrackd can keep states in a) the kernel table, or b) an external cache (a file, correct me if I am wrong). This depends on the DisableExternalCache config parameter, see documentation.
For the external cache to work, I need to use fail-over scripts in keepalived. Looking at that script, it can notify conntrackd to become either the backup or the primary instance. When it tells conntrackd to become the primary instance, conntrackd loads the conntrack data from the external cache into the kernel table.
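To make the script's behaviour concrete, here is a minimal Python sketch of which conntrackd calls are made for each keepalived state. The flags (`-c`, `-f`, `-R`, `-B`, `-t`, `-n`) are the ones used by the `primary-backup.sh` example shipped with conntrack-tools. The function name, the paths, and the idea that only the `-c` commit step drops out when the external cache is disabled are my assumptions, not something the documentation spells out in this form.

```python
# Sketch of the conntrackd calls a keepalived notify script makes per state.
# Flags mirror the primary-backup.sh example from conntrack-tools; the
# function name and the paths below are hypothetical, adjust to your system.

CONNTRACKD = "/usr/sbin/conntrackd"          # assumed binary location
CONFIG = "/etc/conntrackd/conntrackd.conf"   # assumed config location


def commands_for_state(state, external_cache=True):
    """Return the list of conntrackd argv lists to run for a keepalived state."""
    base = [CONNTRACKD, "-C", CONFIG]
    if state == "primary":
        flags = []
        if external_cache:
            # Commit the external cache into the kernel table.
            # Assumption: nothing to commit when the external cache is disabled.
            flags.append("-c")
        flags += [
            "-f",  # flush the internal and external caches
            "-R",  # resynchronise the internal cache with the kernel table
            "-B",  # send a bulk update to the backup
        ]
    elif state == "backup":
        flags = [
            "-t",  # shorten kernel conntrack timers to expire stale entries
            "-n",  # request resynchronisation from the primary
        ]
    elif state == "fault":
        flags = ["-t"]
    else:
        raise ValueError(f"unknown keepalived state: {state!r}")
    return [base + [flag] for flag in flags]


if __name__ == "__main__":
    for argv in commands_for_state("primary"):
        print(" ".join(argv))
```

In a real notify script each argv list would be passed to something like `subprocess.run(argv, check=True)`; the sketch only builds the commands so it can be inspected without touching a running daemon.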
If I disable the external cache, then the conntrack data will be present in the kernel tables and should be the same on both firewalls.
The author's concern is that keeping conntrack entries in the kernel table (on a backup instance) will use kernel table space unnecessarily(?) and will increase CPU usage. My firewalls are high-spec servers with 16-core CPUs, doing nothing but iptables, so I am not worried about performance.
Considering my case, and since with option "b)" the conntrack data must be loaded into the kernel anyway, would it not be simpler to just use option "a)"? That would also simplify my deployment of keepalived.
As for the limit on conntrack entries in the kernel: if I run out of entries on the backup firewall, the same issue will happen on the active firewall, would it not? Anyhow, I do not think I will run out of conntrack entries. The current default limit in Debian is high enough, and none of my firewalls (Linux and Cisco) has ever reached even half of that number of connections.
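A quick way to check that headroom on either firewall is to compare the live entry count with the limit. This is a small sketch, assuming the nf_conntrack module is loaded so that `nf_conntrack_count` and `nf_conntrack_max` exist under `/proc/sys/net/netfilter`; the function names are made up for illustration.

```python
# Sketch: how full is the kernel conntrack table?
# Assumes nf_conntrack is loaded; function names are hypothetical.
from pathlib import Path

PROC_NETFILTER = Path("/proc/sys/net/netfilter")


def usage_ratio(count, maximum):
    """Fraction of the conntrack table in use (0.0 to 1.0)."""
    if maximum <= 0:
        raise ValueError("nf_conntrack_max must be positive")
    return count / maximum


def read_conntrack_usage(proc_dir=PROC_NETFILTER):
    """Read the current entry count and the limit, return (count, max, ratio)."""
    count = int((Path(proc_dir) / "nf_conntrack_count").read_text())
    maximum = int((Path(proc_dir) / "nf_conntrack_max").read_text())
    return count, maximum, usage_ratio(count, maximum)


if __name__ == "__main__":
    count, maximum, ratio = read_conntrack_usage()
    print(f"{count} of {maximum} entries in use ({ratio:.1%})")
```

Running it on both the active and the backup machine with the external cache disabled should show whether the backup's table really tracks the active one.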
I guess the only way for me to find out is to actually test the option without the external cache ("a)") myself, and be prepared to fall back to the option using the external cache and fail-over scripts ("b)").
Could it be that when conntrackd development started, powerful CPUs were not common, and the documentation has not changed since? Or does the documentation simply cover the most fail-safe scenario?
Suyash_t151@reddit
I am concerned about the backup machine: it is supposed to be a backup, but here it constantly listens to messages from the primary machine, which makes it look like an active-active setup.
AccomplishedComplex8@reddit (OP)
Depends on what your definition of active-active is.
Maybe at the time of writing I was guided by the documentation and configs.
You can use various terms: active-passive, active-standby, primary-secondary, primary with warm standby.
From the traffic point of view, traffic uses only one firewall, the primary (active) one. It does not use the second machine, so that machine does not participate in forwarding traffic or enforcing the rules.
However, the second machine is always ready to take over the job.
In the active-active sense, your traffic would have to flow via both machines. In that case you need to synchronise the MAC address between the two machines so that they both answer ARP. But that is more complicated to implement across 2 different machines, for me personally. I would not want to do it myself with plain Linux.