The effect DNS TTLs have on DKIM and SPF email authentication
Posted by lolklolk@reddit | sysadmin | 12 comments
If you're still on the fence about DNS TTLs and how they can affect DKIM or SPF evaluation and email delivery, here's why you shouldn't be.
See this timeline, which starts with extremely low TTLs on DKIM CNAME records in DNS, and the effect those TTLs have on receivers' authentication results.
The first graph shows the timeline for all DMARC reports from receivers other than Microsoft. There we saw a very positive effect from increasing the TTLs on the DKIM CNAMEs and their respective targets. DKIM failures are now at almost negligible levels with all of these receivers.
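Why the targets matter too: a resolver caches each record in the chain separately, so once the shortest-lived record expires it has to go upstream again to finish the DKIM key lookup. A long TTL on the CNAME is wasted if the TXT record it points to has a 5-minute TTL. Here's a minimal sketch of that reasoning. It assumes all records in the chain were fetched together, and the selector and ESP names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One record in a DNS resolution chain (CNAME hops, then the final TXT)."""
    name: str
    rtype: str
    ttl: int  # seconds


def effective_ttl(chain: list[Link]) -> int:
    """Longest time the whole chain can be answered from cache after one full lookup.

    Each record is cached independently, so when the shortest-lived one
    expires the resolver has to query upstream again to complete the lookup.
    """
    if not chain:
        raise ValueError("empty resolution chain")
    return min(link.ttl for link in chain)


# Hypothetical DKIM setup: a selector CNAME at the sending domain pointing
# at a key TXT record hosted by an ESP.
chain = [
    Link("s1._domainkey.example.com", "CNAME", 86400),
    Link("s1.dkim.esp.example.net", "TXT", 300),
]
print(effective_ttl(chain))  # 300: the 24-hour CNAME TTL buys nothing here
```

In other words, raise the TTL on every hop, not just the one in your own zone.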
In the second graph, covering Microsoft OLC and M365, the effect is not nearly as obvious, because they currently have a bug in how Windows DNS (which the Defender antispam and Outlook consumer services use) evaluates DKIM (and also SPF).
So, in general, you should set the TTLs on your DKIM/SPF records to at least 1 hour. If they don't change often, you can go higher, to 6 hours or even 24 hours. The non-Microsoft results for the 24-hour TTL in that timeline speak for themselves in terms of temperror reduction.
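To put rough numbers on that: a resolver cache that sees steady mail from a sender refetches the record about once per TTL, so it makes roughly 86400 / TTL upstream lookups per day. Each of those lookups is a chance for a transient DNS failure to show up as a temperror. This is a back-of-the-envelope sketch that assumes steady traffic. `ttl_advice` is a hypothetical helper that just encodes the 1-hour minimum and the 6 to 24 hour guidance above.

```python
SECONDS_PER_DAY = 86_400


def upstream_lookups_per_day(ttl: int) -> float:
    """Approximate refreshes per day for one resolver cache under steady traffic."""
    if ttl <= 0:
        raise ValueError("TTL must be positive")
    return SECONDS_PER_DAY / ttl


def ttl_advice(ttl: int) -> str:
    """Hypothetical helper: classify a DKIM/SPF TTL against the guidance above."""
    if ttl < 3600:
        return "too low: raise to at least 1 hour"
    if ttl < 6 * 3600:
        return "ok: consider 6-24 hours if the record rarely changes"
    return "good"


for ttl in (300, 3600, 6 * 3600, SECONDS_PER_DAY):
    print(f"{ttl:>6}s  {upstream_lookups_per_day(ttl):>5.0f} lookups/day  {ttl_advice(ttl)}")
```

A 5-minute TTL means 288 refreshes per resolver per day. That drops to 24 at 1 hour and to 1 at 24 hours.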
If you're curious about total volume in terms of numbers, this is based on 2.1 billion total direct (non-forwarded) emails in the last 90 days.
TL;DR For email authentication, more DNS cache = more better
pdp10@reddit
We sometimes get stakeholders arguing that very low TTLs never actually matter to them, so they refuse to go anywhere near one hour. They think they're preserving their own agility at no cost to themselves, but the cost is just externalized.
Gtapex@reddit
Not trying to pick a fight, but I’m not seeing a real trend on that top chart (linked to TTL at least)
Or am I reading the chart wrong?
lolklolk@reddit (OP)
That chart shows DKIM failures over time. Fewer failures = less traffic in that chart (e.g. the 24-hour TTL period, where there are almost no failures).
Gtapex@reddit
Ahh… I saw the legend showing gray as “delivered messages” and thought that was just regular delivered messages… with yellow being the DKIM failures.
lolklolk@reddit (OP)
Yeah, I can see how it might be confusing. Just pay attention to the filter at the top left of that chart; it shows exactly what the data is filtered on.
thegacko@reddit
Thanks for this - this is really useful
Is there any public "master thread" for this bug/issue with DKIM DNS resolution in Office 365? It's really causing major problems, and I'm wondering what is being done about it.
It causes constant problems: senders get flagged as DMARC failures even though, when checked independently, there is an aligned DKIM signature that passes perfectly, so there should be no problem. Yet if the sender has an enforced DMARC policy, the message goes straight to the bin when Office 365 receives it.
They even do this for their own DKIM signatures (Office to Office), which is ridiculous. We see this a lot with Amazon SES too.
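For context on why a DNS hiccup turns into a DMARC failure: under DMARC (RFC 7489), a message passes if DKIM or SPF both passes and aligns with the From domain. A DKIM temperror, for example from a timed-out key lookup, is not a pass. So when SPF only aligns with the ESP's bounce domain, the message fails DMARC even though the signature would verify fine on retry. Here's a minimal sketch of that decision using relaxed alignment. `naive_org_domain` is a deliberately simplified, hypothetical stand-in for the Public Suffix List lookup real receivers use, and all domains are placeholders.

```python
from typing import Literal

Result = Literal["pass", "fail", "temperror", "permerror", "none"]


def naive_org_domain(domain: str) -> str:
    """HYPOTHETICAL simplification: keep the last two labels.

    Real DMARC uses the Public Suffix List (this breaks on e.g. co.uk).
    """
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(auth_domain: str, from_domain: str) -> bool:
    """Relaxed alignment: organizational domains must match."""
    return naive_org_domain(auth_domain) == naive_org_domain(from_domain)


def dmarc_passes(from_domain: str,
                 dkim: list[tuple[str, Result]],  # (d= domain, result) per signature
                 spf: tuple[str, Result]) -> bool:  # (MAIL FROM domain, result)
    """DMARC passes if any DKIM or SPF result both passes and aligns."""
    dkim_ok = any(r == "pass" and aligned(d, from_domain) for d, r in dkim)
    spf_domain, spf_result = spf
    spf_ok = spf_result == "pass" and aligned(spf_domain, from_domain)
    return dkim_ok or spf_ok


# ESP-sent mail: SPF passes, but for the ESP's bounce domain (not aligned).
print(dmarc_passes("example.com", [("example.com", "pass")],
                   ("bounces.esp.example.net", "pass")))  # True
print(dmarc_passes("example.com", [("example.com", "temperror")],
                   ("bounces.esp.example.net", "pass")))  # False
```

The only difference between the two calls is whether the receiver managed to fetch the DKIM key.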
lolklolk@reddit (OP)
Unfortunately the cases I know about are Microsoft tickets that have been opened by customers themselves. There hasn't been direct public acknowledgement yet, outside of a few quips from the PM over Exchange Online/Outlook infra. But many people have been noticing and posting about this problem recently.
From what I've heard, Microsoft made several adjustments to DKIM retry intervals during October to mitigate the issue, but they've had limited impact. They allegedly have a tentative fix slated for Nov. 18, but I wouldn't be surprised if that date got pushed out.
charmingpea@reddit
Interesting. Isn’t the default TTL in AWS Route 53 set to 5 minutes?
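As far as I know, the Route 53 console does pre-fill 300 seconds for new records, so it's worth checking your TXT records explicitly. In Route 53 the TTL is set per record set, so raising it is one UPSERT. Note that UPSERT replaces the whole record set, so every TXT value at that name (SPF plus any verification strings) has to be included. Here's a minimal sketch that only builds the `ChangeBatch` payload. It assumes the values contain no quotes or backslashes that would need escaping, and `txt_upsert_batch` is a hypothetical helper.

```python
def txt_upsert_batch(name: str, values: list[str], ttl: int = 3600) -> dict:
    """Build a Route 53 ChangeBatch that rewrites a TXT record set with a new TTL.

    Route 53 expects TXT data in double quotes; strings longer than 255
    characters (e.g. 2048-bit DKIM keys) are split into several quoted chunks.
    """
    records = []
    for value in values:
        chunks = [value[i:i + 255] for i in range(0, len(value), 255)] or [""]
        records.append({"Value": " ".join(f'"{c}"' for c in chunks)})
    return {
        "Comment": f"Raise TTL on {name} to {ttl}s",
        "Changes": [{
            "Action": "UPSERT",  # replaces the entire TXT record set at this name
            "ResourceRecordSet": {
                "Name": name,
                "Type": "TXT",
                "TTL": ttl,
                "ResourceRecords": records,
            },
        }],
    }


batch = txt_upsert_batch("example.com.", ["v=spf1 include:_spf.example.net -all"])
print(batch["Changes"][0]["ResourceRecordSet"]["TTL"])
```

With boto3 you would pass it as `boto3.client("route53").change_resource_record_sets(HostedZoneId=ZONE_ID, ChangeBatch=batch)`, where `ZONE_ID` is a placeholder for your hosted zone ID.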
ElectroSpore@reddit
As a general rule there are only about two good reasons to ever set ANY DNS record to something less than 1 hour
Gtapex@reddit
#3 and then completely forget about it
SuppA-SnipA@reddit
Always this
lolklolk@reddit (OP)
Agreed. It seems to be largely a problem when the architecture teams dealing with DNS, and ESPs in general, don't think about TTLs. The records in the graph I showed belong to a major ESP: their TTL was 5 minutes on the actual TXT record holding the key, and yet they wondered why their DKIM failures were so elevated.