Microsoft blocked my CPA client's emails the day before the tax deadline
Posted by Lord_Amoux@reddit | sysadmin | 105 comments
I've been fighting with Microsoft support for 24 hours trying to have a tenant-wide email block lifted for a tax office client of mine. (NDR 5.7.705)
Microsoft does not even know why the block happened. They still have been unable to remove it. There has been no spam sent, they are nowhere near the sent email threshold, and no accounts have been compromised. All have MFA. DNS for the domain is all correct (SPF, DKIM, DMARC). Security defaults, enabled.
I've spoken to a technician and their manager, and the manager's manager, and they still are unable to figure out why the block is in effect.
Fucking Microsoft.
thatirishguyyyyy@reddit
Outlook app wasn't working recently. My authenticator crashed a few times too.
Clients told me they had app issues with Outlook as well.
This week one of my websites sent a form receipt, automatically, to the office secretary and somehow Microsoft added a random employee in my client's organization to the email chain.
Literally treated as a reply inside an existing Microsoft email thread.
Fuck Microslop.
Secret_Account07@reddit
I get the reasoning folks are giving in the comments but let’s focus on the fact that even MS doesn’t know. If they don’t know how tf is OP supposed to know? That’s the BS part. If 3 MS techs are confused then enable the damn tenant. It’s that simple imo
enfiniti27@reddit
The reason for this is that over the years MSFT has slowly moved the "hard stuff" in support to be handled by the development teams. So now most support engineers are complete morons that just open tickets to the dev team for anything harder than "Did you restart it yet?"
Source: I watched it happen over the last 8 years.
InflateMyProstate@reddit
Throughout my career I think I’ve had Microsoft Support resolve maybe 1 ticket. Most of the time you submit the ticket, get stuck in the holding pattern, do a bunch of research and find the fix yourself and then they close the ticket. It’s really become a self service platform over the years and most of the time there’s a hidden menu or setting that resolves it. I’ve rarely not been able to fix an issue myself
tdhuck@reddit
I had an issue with a new firewall, it was 100000% a bug. I submitted a ticket with the vendor, I was on the phone with them, had a screen share going, showed them the bug, replicated it 5 times, logged into an older firewall still in prod, it did not have the same bug (both firewalls had the same rules, objects, groups, etc) it worked in the old but not in the new. Then I logged into another brand new sonicwall with the same bug at the same firmware level.
The support tech literally told me 'they don't see the bug and it is working fine' and I was so irritated that they had 0 comprehension of what I just spent 30 minutes showing to them and explaining to them.
Finally they accepted that it was a bug, pulled logs, and said they would get back to me. The next day they replied confirming the bug, but had no timeline for when the firmware would be ready.
I understand that nobody is perfect, even level 1 and level 2 support, but how ignorant do you have to be to not even look at the replicated issue right in front of your face?
Win_Sys@reddit
I’m not kidding, I literally said to myself that’s gotta be a Sonicwall before you mentioned it. I have had very similar experiences.
tdhuck@reddit
It was sonicwall. That's funny.
Win_Sys@reddit
Their support was never amazing but over the past few years it’s just become a complete dumpster fire where I can’t trust they can properly support their own products. I normally only get involved with troubleshooting firewall issues if no one else can figure it out (normally only handle switching and routing) but I can think of 3-4 times over the last 2 years where their support missed what should have been obvious to anyone who is a lead support engineer for their product but they blamed it on something else.
tdhuck@reddit
Thankfully I don't have to call them that often. I was fine with this bug, but I was on a remote session with my boss when I spotted it and he told me to call and ask them about it. I think a lot of it had to do with these devices being new and not fully in prod so if we needed to test a firmware or let sonicwall reboot them, etc, it wasn't a big deal.
Ironically, odds are high he would have forgotten, after about a week, that he told me to call it in and I could have gotten away with not calling, but that's not how I do things. I wouldn't want someone to ignore me like that, so I followed through.
Secret_Account07@reddit
So idk if it’s cuz I work for a larger org but MS premier support has always been solid…or whatever they changed the name to, I still call it premier. That’s paid support.
I always get good engineers but we rarely open MS tickets. I have a suspicion smaller orgs get subpar support
If I open a P1 I’ll get an engineer within an hour, I think SLA for a P1 is actually 4 hours though
willdeleteacct1year@reddit
you started pretty late then. I started with o365 pretty much the day it launched and their support was great for the first couple years, just like any other new product.
InflateMyProstate@reddit
I started around 2014-2015, but was a lowly desktop support agent at that time. We did have ADFS and Hybrid up and running though, I believe our admins had a much better experience with support and onboarding during that time.
KingKnux@reddit
Even better when the functionality exists but the menu either doesn’t exist or says that won't work
But for whatever reason doing the exact same thing and hitting the same API over powershell works fine
countsachot@reddit
Man lately half the time they look at the wrong domain for me.
I don't think they have any techs left, they've got actors reading scripts.
The_Amen_Corner@reddit
Yep. Last support ticket I gave them the easy fix on a silver platter. Still took them a week and a half of escalations.
FerretBusinessQueen@reddit
I called about an Intune reporting issue that we and a couple of our clients in one Azure region were having (our clients out in the west/central/mountain areas weren’t affected), and I got some information on Reddit that other people in the same region were also having the same issue. We were a premier client (whatever the fuck that even means anymore), had to give the same information multiple times, and spent about 10 hours on remote sessions with them, all while asking for and not receiving any credible escalation to an engineer.
FINALLY after a month they recognized that there was an issue with reporting on a regional cluster and that they were deploying a fix for the issue. I asked how come they weren’t alerted to the fact that such an essential function had gone down, got an absolutely dismissive answer saying they’d see if they could implement a solution down the road. I asked for an RCA and was told those are internal.
The people working as Microsoft support do not seem to have a clue, and I 100% blame Microsoft.
Lars_Galaxy@reddit
A great many companies gutted their support departments, even at the enterprise level, in the last year or two. I was one of the layoffs at a cloud company. In my case, they offshored to cheap European countries and India, plus AI as the front line. They also made it increasingly difficult and costly for clients to talk to an actual person, and the people who are left have little to no experience actually solving problems, since the experienced, competent staff were let go. There's only a handful of people left in the US, and that's only because some clients' security requirements only allow US people to work on their stuff.
Arudinne@reddit
With Microsoft, this has been an issue for much longer than a year or two.
Like many large companies, they outsourced the majority of their call center volume to the cheapest place they could find and hired people that just read from scripts and get punished if they deviate even slightly.
At this point I am fairly sure a Copilot AI agent handling the call actually would do a better job.
countsachot@reddit
At the very least it would know what domain to type into mx toolbox...
Arudinne@reddit
It might also actually read the text and analyze the screenshots describing the issue instead of asking me to do that again after someone begrudgingly takes the ticket.
countsachot@reddit
Lol!
nemec@reddit
Agents. They're called agents now /s
StateOfAmerica@reddit
I could've sworn it's copilots
Starkoman@reddit
Blind co-pilots.
Kichigai@reddit
I'd feel safer with Otto Pilot.
edbods@reddit
i'm reminded of that 4chan greentext where the OP claimed to have worked in Microsoft on the Windows 10 UI and it was an absolute clusterfuck and went into detail about how painful it was to just add a dropdown element to the new windows 10 control panel
NaturalIdiocy@reddit
I believe the story is that they were all in different, practically siloed, departments that would require change authorizations up and down the chain.
deonteguy@reddit
Do they still use Mindtree? I live in the same condo building as their manager of US-based support so I heard tons of complaints about their employees being literally illiterate. Last I saw him, he said they had started to use AI with employees with low reading comprehension so their support replies were getting even worse.
I got to know him after Azure kept losing my company's virtual machine images. For over a decade, our build system has created and deleted thousands of those images on Amazon AWS AMI, and only once did we have one disappear. Amazon confirmed that happened, but that's only once out of dozens per day most days. The test account I created on Azure to move Jenkins and our images to Azure lost seven images in the same day and continued to lose one here or there for days. First level Mindtree support didn't seem to understand the concept of an image. I sent them screenshots of error messages showing various errors including the most common "Your virtual machine is unavailable due to a disk failure." error message. I don't think I ever got hold of anyone that understood the serious problem even with the head of US support helping.
Ferretau@reddit
It's probably been outsourced to enable an exec to get a huge performance bonus for finding "savings"
ElectroSpore@reddit
Copy and pasting Co-pilot outputs. I wish they were using scripts
The_Real_Meme_Lord_@reddit
I'm pretty sure the guy I talked to was using Copilot, as it took him roughly 2-4 minutes to work through each issue. Plus I think he was working from home, as I could hear kids in the background.
dnev6784@reddit
Better than hearing literal chicken and goats I suppose... Been there.. cough cough.. GoTo Connect
automounter@reddit
This. It is *painful*. We have GCP, AWS and Azure and Microsoft is the worst BY FAR.
ExoticBump@reddit
100% I tell them all can we please stop reading the script. Just talk to me like a normal person
BatemansChainsaw@reddit
Go to the cloud, they said.
It'll be easy, they said.
moffetts9001@reddit
Still better than on prem exchange.
BatemansChainsaw@reddit
I've been running an on-prem exchange for almost 15 years. It's not that hard if you know what you're doing.
moffetts9001@reddit
You know what’s even easier? Office 365.
BatemansChainsaw@reddit
That's like saying it's easier when you have someone else do 99% of the work. gasp
moffetts9001@reddit
Yeah... you don't get bonus points for doing things the hard way. Enjoy Exchange SE, though.
flecom@reddit
ya op should move to the cloud, i've heard all your problems go away in the cloud
St0nywall@reddit
Check their zone and DNS. Make sure nothing has been changed in the last 30 days.
I sure hope their zone record hasn't been hijacked.
Lord_Amoux@reddit (OP)
I checked on the nameserver side, DNS history, etc but nothing has been changed. The domain settings in Microsoft are also still validating correctly as well
SuperfluousJuggler@reddit
If they use a 3rd-party sender, or that sender sends on behalf of the originator, they may need SPF or CNAME records added/updated. That would be on them, not you. Do you have RUA/RUF set up on your domain with p=quarantine or p=none to see the errors, or are you running p=reject?
Lord_Amoux@reddit (OP)
This specific domain is running p=reject
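For anyone following along: the tags being discussed here (p, rua, ruf) live in the domain's _dmarc TXT record. A minimal Python sketch of splitting such a record into its tags, assuming a made-up record for example.com. The function name parse_dmarc is hypothetical, not from any library, and this is not a full validator.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record into a dict of tag -> value."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            # Split on the first "=" only, so values such as mailto: URIs stay intact.
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


# Illustrative record only; not the OP's actual DNS data.
example = "v=DMARC1; p=reject; rua=mailto:dmarc@example.com"
policy = parse_dmarc(example)
print(policy["p"], policy.get("rua"), policy.get("ruf"))
```

If rua is missing from the result, no aggregate reports are being requested, so there would be nothing to review.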
SuperfluousJuggler@reddit
That lends credence to them not having proper DNS settings, and your p=reject would not let them land if that is true.
Do you collect your RUA and RUF reports? Can you look for the sender in the XML file they give you and find the issue?
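A rough Python sketch of what "look for the sender in the XML file" could look like. The element names follow the DMARC aggregate report layout from RFC 7489, but SAMPLE_REPORT is fabricated illustrative data and failing_sources is a hypothetical helper. Real reports usually arrive compressed and carry more fields, so treat this as a starting point only.

```python
import xml.etree.ElementTree as ET

# Fabricated two-record report for illustration; IPs are documentation ranges.
SAMPLE_REPORT = """<feedback>
  <record>
    <row>
      <source_ip>203.0.113.7</source_ip>
      <count>12</count>
      <policy_evaluated>
        <disposition>reject</disposition>
        <dkim>fail</dkim>
        <spf>fail</spf>
      </policy_evaluated>
    </row>
  </record>
  <record>
    <row>
      <source_ip>198.51.100.20</source_ip>
      <count>180</count>
      <policy_evaluated>
        <disposition>none</disposition>
        <dkim>pass</dkim>
        <spf>pass</spf>
      </policy_evaluated>
    </row>
  </record>
</feedback>"""


def failing_sources(report_xml):
    """Return (source_ip, count, disposition) for rows where both DKIM and SPF failed."""
    root = ET.fromstring(report_xml)
    failures = []
    for record in root.iter("record"):
        row = record.find("row")
        evaluated = row.find("policy_evaluated")
        dkim = evaluated.findtext("dkim", default="")
        spf = evaluated.findtext("spf", default="")
        # DMARC passes when either aligned check passes, so only double failures are listed.
        if dkim != "pass" and spf != "pass":
            failures.append(
                (row.findtext("source_ip"), int(row.findtext("count")), evaluated.findtext("disposition"))
            )
    return failures


for source_ip, count, disposition in failing_sources(SAMPLE_REPORT):
    print(source_ip, count, disposition)
```

Any source IP that shows up here and isn't one of your known senders is the thing to chase.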
Lord_Amoux@reddit (OP)
I can double-check those, however Microsoft told us that they checked the DNS from their side and it is correctly configured
oaomcg@reddit
just because their DNS records look right doesn't mean the email will come through... are they sending through an application server that is not captured by their allowed senders settings?
Lord_Amoux@reddit (OP)
Emails are sent directly from Outlook desktop and the bounceback email error states that the tenant threshold has been reached.
We initially thought Avanan could be causing issues (even though it's been in place for months) so we disabled it and removed the connectors, but even after that the emails are still blocked.
CeC-P@reddit
What are the odds they're over on licenses or didn't pay a renewal so their outgoing email limit is a bit lower? No idea if that can happen or not.
Lord_Amoux@reddit (OP)
A valid thing to check, they actually have extra licensing available for their tenant. We pay the licensing via PAX8 every month so we would have been alerted if there was a payment issue, but we double checked that too.
viquzsa@reddit
Unbelievable that our profession still trusts Microsoft.
thortgot@reddit
You always get an error. What's the error?
Lord_Amoux@reddit (OP)
Remote server returned '550 5.7.705 Service unavailable. Access denied, tenant has exceeded threshold.'
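If you're collecting bounces across several tenants, it can help to pull the codes out of the NDR text mechanically. A small Python sketch, assuming the bounce contains an SMTP reply code followed by an enhanced status code in the class.subject.detail form (RFC 3463). The parse_ndr name and the returned keys are made up for illustration.

```python
import re

# SMTP reply code (e.g. 550) followed by an enhanced status code (e.g. 5.7.705).
NDR_PATTERN = re.compile(
    r"\b(?P<reply>[245]\d{2})[ -](?P<status>[245]\.\d{1,3}\.\d{1,3})\b"
)


def parse_ndr(text):
    """Extract the reply code and enhanced status code from bounce text, or None."""
    match = NDR_PATTERN.search(text)
    if match is None:
        return None
    status = match.group("status")
    return {
        "reply_code": int(match.group("reply")),
        "enhanced_status": status,
        # Class 5 is a permanent failure; class 4 would be transient.
        "permanent": status.startswith("5."),
    }


bounce = "Remote server returned '550 5.7.705 Service unavailable. Access denied, tenant has exceeded threshold.'"
print(parse_ndr(bounce))
```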
lart2150@reddit
Makes sense they would hit rate limits around the filing deadline https://techcommunity.microsoft.com/blog/exchange/introducing-exchange-online-tenant-outbound-email-limits/4372797
Lord_Amoux@reddit (OP)
The 24-hour limit for this specific domain is 10,820. The amount of emails this tenant sent out was around 200 in the week before.
thortgot@reddit
It's pretty clearly a limit issue.
Lord_Amoux@reddit (OP)
If it's a limit issue then it's either that 10,000 emails have been sent out invisibly or the limit is not really what Exchange Admin Center says it is.
thortgot@reddit
If I had to guess a connector got left open.
I assume you've checked mail trace outbound activity?
Lord_Amoux@reddit (OP)
Yes. In terms of connectors, the only existing connectors were for Avanan inline inspection
St0nywall@reddit
What if Avanan is blocking the email?
Lord_Amoux@reddit (OP)
I disabled avanan in the tenant while testing
St0nywall@reddit
Did you check their logs before disabling them? Did they show you anything?
dnev6784@reddit
I'm pretty sure there's ways to manipulate powershell assuming they have access to it, to send emails that aren't going to be in the audit log
JeroenPot@reddit
You could use a third party smtp service to bypass the block.
InflateMyProstate@reddit
Oof, that is a tough one. How much outbound email is sent from their domain? Do they use an external mailing service at all for marketing, etc? Any connectors as well? I would definitely spend some time in the outbound anti-spam settings to see if anything is being blocked there for any reason.
dnev6784@reddit
With only five users it would be hard to believe that they would hit their send limit based on average use. Maybe if they used it to mass email every single client, but even then they would have to have a thousand clients which would be pretty hard to believe for a small five person firm
InflateMyProstate@reddit
Totally agree, but obviously something is going wrong here and these things need to be checked. I’ve seen issues with outbound connectors, printer direct-send, and a myriad of other settings in Exchange Online that could cause Microsoft to block outbound sending. It’s worth it to check these things and review all the outbound email logs while waiting on Microsoft to figure out their left foot from the right shoe. 99% of the time there’s an obscure setting that needs to be adjusted.
shokzee@reddit
This is unfortunately not rare. Microsoft's automated reputation systems flag tenants with zero warning and zero explanation, and their support org has no visibility into why it happened. It's a black box even to their own people.
5.7.705 is a tenant-level outbound block. Usually triggered by their anti-spam heuristics detecting something "anomalous" even if nothing actually malicious happened. A spike in outbound volume around tax season from a small tenant is exactly the kind of pattern that trips it.
Two things I'd do right now: open a case through your Microsoft partner channel if you have one (way faster than admin center tickets), and set up a secondary sending path through a transactional provider like Postmark or SES so your client can actually send time-sensitive stuff while Microsoft figures their own system out. We had this exact issue with a client last year and it took Microsoft 5 days to resolve. Five days.
Longer term, we monitor all our client domains through Suped so we catch reputation and deliverability shifts before they turn into full blocks like this. Doesn't help you today, but worth having in place so you're not blindsided next April.
tootallfortheliking@reddit
This. Several months after Microsoft had their IPs blacklisted by Spamcop a few years ago, one of our tenants was suddenly blocked with this same code. After 6 weeks of arguing with various South Asian and South African Microsoft contractors, we finally escalated enough to get actual Microsoft engineers on the phone.
They conveyed that after the Spamcop incident they started tightening up the rules their ML was using to monitor spam, and eventually admitted that they tightened it too far and it was throwing false positives. (I'm paraphrasing, of course.)
What really stood out to me was when he described how the ML works in relation to decision making that leads to an outbound block.
Basically, it monitors sending frequency, number of users, etc. What really cooked my noodle was the part about if a domain suddenly starts sending to 10-20% more different domains from what was "typical", that's when it would get flagged and blocked.
Not that any of what I've just said has any bearing on OP's situation; I just saw 5.7.705 and had a flashback.
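To make the "10-20% more different domains" idea concrete, here is a toy Python sketch. It only illustrates the paraphrased description above and is not Microsoft's actual model. The function names, the 10% default threshold, and the example addresses are all assumptions.

```python
def distinct_domains(recipients):
    """Return the set of lower-cased recipient domains."""
    return {addr.rsplit("@", 1)[1].lower() for addr in recipients if "@" in addr}


def domain_growth(baseline_recipients, current_recipients):
    """Fractional change in the number of distinct recipient domains vs. the baseline."""
    baseline = distinct_domains(baseline_recipients)
    current = distinct_domains(current_recipients)
    if not baseline:
        return None  # no history to compare against
    return (len(current) - len(baseline)) / len(baseline)


def would_flag(baseline_recipients, current_recipients, threshold=0.10):
    """True when distinct-domain growth meets the (assumed) threshold."""
    growth = domain_growth(baseline_recipients, current_recipients)
    return growth is not None and growth >= threshold


# A "typical" period with 10 client domains, then a busy period with 2 new ones.
typical = [f"contact@client{i}.example" for i in range(10)]
busy = typical + ["new@client10.example", "new@client11.example"]
print(domain_growth(typical, busy), would_flag(typical, busy))
```

With numbers this small, a couple of new clients in a busy week is enough to cross a threshold like that, which is the seasonal-business concern.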
cvc75@reddit
Only 10-20%? That’s just a recipe for disaster for orgs that work seasonally.
And another reason to send any automated mail or marketing campaigns over an external service, not through MS.
tootallfortheliking@reddit
Funny you should mention that. For the first 5 weeks or so of our battle with them, they were insisting that our client was sending bulk mail. Ultimately Microsoft advised if you need to send bulk mail in any way, use a third party service. I couldn't believe that was their official position. Additionally, the HVE they introduced that allows up to 100k emails per day, is only for internal emails. Bizarre.
Frothyleet@reddit
For external bulk mail, the MS service they will point you to is Azure Communication Services.
HVE was originally going to be for both internal and external when it was announced, but they changed it to internal-only prior to general availability. Possibly because of ACS already existing.
drashna@reddit
This is what happens when you use AI/LLM to do all of the work that HUMANS should be doing.
deonteguy@reddit
Anything made to block traffic because of higher demand will cause outages. Cloudflare has caused so many of our outages because they think our normal traffic is an attack.
DerkvanL@reddit
Check your tenant block lists.
Honky_Town@reddit
Stopped reading after "Microsoft support" nothing good comes after those 2 words!
We feel you. Keep in mind your job ends when you clock out, and your responsibility is to take your problems to the proper resolver groups. Yeah, we do not get paid for solutions but to forward those to the resolver.
Helps a lot to get a good night's sleep and keep a healthy company. No, it's an MS fuckup, not incompetence on IT's part...
NotMedicine420@reddit
You pay for shit service you receive shit service.
zer04ll@reddit
Mxtoolbox is your friend, use it. If they got hacked and their domain got blacklisted, then they are fucked. This is why you use subdomains for marketing or any system that can send emails instead of the primary domain: once blacklisted, you are cooked.
Thyg0d@reddit
Until marketing decides to tag a campaign wrong in hubspot.. OMFG what a cluster f*ck that became..
Lord_Amoux@reddit (OP)
As of right now they appear on zero blacklists according to Mxtoolbox
zer04ll@reddit
That's good! Best of luck then
Itsme2020_uk@reddit
Hi,
I've had this twice now on customer Office 365 sites, usually after a license change or expiry. It fixes itself the next day it seems, also check for any outstanding invoices or expired cards, that also seems to trigger it.
Lord_Amoux@reddit (OP)
We provide their licensing via PAX8; their licenses have been the same (Business Standard) for the entire time we've sold them
corbeth@reddit
Pax8 is a massive provider, they have a significant Microsoft presence and should have leverage to help get this solved. Have them reach out to their CSAM to get this case prioritized. If they can’t help then find a direct provider who can and move your customers there. This is the kind of thing that you have to push Microsoft on or nothing will get done.
Lord_Amoux@reddit (OP)
We had a critical ticket open with pax8 today. After conversation they told us that because it was a tenant level block only Microsoft can fix it and they’re in the same position as us
corbeth@reddit
Right, but they have a relationship with Microsoft that they can leverage on your behalf. I’m a direct provider so I know it can be done. Have they escalated with their Microsoft account team?
QuerulousPanda@reddit
fucking LOL
farva_06@reddit
Long shot, but does reverse DNS lookup of the IP resolve to their mail domain?
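A quick way to script that check in Python, as a sketch. ptr_matches_domain is a hypothetical helper, and the lookup parameter exists only so it can be exercised without live DNS. By default it calls socket.gethostbyaddr from the standard library. The IP and hostnames below are placeholders.

```python
import socket


def ptr_matches_domain(ip, mail_domain, lookup=socket.gethostbyaddr):
    """True if the PTR hostname for ip equals mail_domain or is a host under it."""
    try:
        hostname, _aliases, _addresses = lookup(ip)
    except OSError:
        # Covers socket.herror / socket.gaierror: no PTR record or lookup failure.
        return False
    hostname = hostname.rstrip(".").lower()
    domain = mail_domain.rstrip(".").lower()
    return hostname == domain or hostname.endswith("." + domain)


# Stand-in resolver so the example runs offline; replace with the default for real use.
def fake_lookup(ip):
    return ("mail.example.com", [], [ip])


print(ptr_matches_domain("203.0.113.7", "example.com", lookup=fake_lookup))
```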
RikiWardOG@reddit
Vibe coding their web filter
radialmonster@reddit
is the server ip on dns black lists?
alexandreracine@reddit
Two weeks.
dnev6784@reddit
Make sure to log into the Exchange admin PowerShell and disable Direct Send or any of those features that allow mail into a mailbox outside of audit logging. I don't remember the exact name, but several months ago emails were being inserted into people's mailboxes via that feature, and it was a big deal, I think back in October and November. It's an easy thing to do if they're not using any features that are linked to it. I think there were several posts about Proofpoint needing some additional changes, but I saw you were using Avanan, so you should be good to turn it off.
0xDesecrator@reddit
Either the tenant has a compromised user or the SPF is misconfigured. Do you have E5? Can you look at the mail volumes in Defender?
ExceptionEX@reddit
Probably sending out mail that is not CAN-SPAM compliant. I see it all the time, people just bulk sending out of their tenant. Not following any of the rules.
An expensive and costly lesson, but any business that sends emails out at scale should be doing it via a 3rd party on a subdomain or separate domain, and not on their primary domain via their MS tenant.
P.S. Just because you aren't being told why doesn't mean Microsoft doesn't know, it just means the level of support you're reaching doesn't have access to it.
Lord_Amoux@reddit (OP)
In a case where mass emails are sent normally, yes, some sort of third-party service should definitely be used. This specific tenant is on the smaller side and had fewer than 200 emails sent out in the week before they were blocked.
My main issue with Microsoft is that they've stated a couple of times in this ticket that they "ran a command" to unblock the tenant; yet the latest comment from them is that they don't have access to the necessary functions to unblock a tenant. Now, the engineering team is working on it.
dnev6784@reddit
For sure, one of the mailboxes was compromised. Reset all passwords, sign out of all sessions for each user, reset 2FA for admins, look for rules in each box, etc, etc
They're blocked because something was sending and it's possible it was an account that has POP and SMTP enabled.
skeetgw2@reddit
Smtp enabled is a good catch. Nicely done.
Lord_Amoux@reddit (OP)
Authenticated SMTP isn't enabled for any users. Web app, MAPI, Exchange, and IMAP are allowed
skeetgw2@reddit
Oh i just meant that I wouldn’t have immediately jumped to that avenue. It was a good idea.
Lord_Amoux@reddit (OP)
We have Huntress ITDR in place for the tenant in addition to Avanan Advanced Protect. At the very least Huntress has told us in the past if there are any sort of rules in mailboxes that it finds. Also there's only 5 users in this tenant and I've done a check for each of them myself too
RCTID1975@reddit
How did you confirm they're nowhere near the threshold, no spam was sent, and no one was compromised?
Lord_Amoux@reddit (OP)
Threshold : Exchange admin center. 24-hour average + total emails sent. Also, Avanan mail explorer. The Tenant Outbound External Recipients report in the EAC displays the 24-hour limit for the tenant (in this case, 10820)
Spam check - since there are only 5 users we were able to check all the mailboxes and message trace to see if there's any bulk messaging happening.
Tying into that, we have Huntress identity protection and Avanan account monitoring to log suspicious sign-ins and account activity. Also went through Entra admin center to look at sign-in logs, etc.
profesionalec@reddit
Are there any unusual recipients visible in Message trace? Are there any suspicious connectors?
Only the EOP/anti-spam backend team can unlock the domain. Try to search for "Exchange Online blocked error 5.7.705" in the help widget and open the ticket from there. I would send something like:
"Outbound mail is blocked tenant-wide with NDR 550 5.7.705 - Access denied, tenant has exceeded threshold. We have completed full remediation: no compromised accounts (all MFA enforced, sign-in logs clean), no suspicious connectors, no open relays, DNS records (SPF/DKIM/DMARC) are verified correct. Client is a tax office with business-critical email needs. Please escalate to the EOP/anti-spam team for immediate tenant unblock."
dio1994@reddit
Since maybe a year ago, there has been a bizarre formula rolling out, where you basically need to be a math major, that determines how many emails and addresses your tenant can send to during a rolling 24-hour period. It's also a good idea to put a limit in place internally at the user level; the recommended level is 500, but if you enable that rule I believe the default is 1000 per user. It's a bit confusing because the count is per address on an email (contact groups and distros are exploded out), and addresses can count multiple times from different emails. Did someone have a huge contact list that they blasted? The block is a hard 24 hours that you need to wait out. But they don't want you sending marketing and bulk email from Exchange.
https://techcommunity.microsoft.com/blog/exchange/introducing-exchange-online-tenant-outbound-email-limits/4372797
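A back-of-the-envelope Python sketch of the counting rule as described above: every address on every message counts, repeats included, within a rolling 24-hour window. The message list is invented sample data and recipients_in_window is a hypothetical helper. The 10,820 figure is the tenant limit OP quoted from the EAC report. The sketch assumes groups have already been expanded to individual addresses and makes no attempt to separate internal from external recipients.

```python
from datetime import datetime, timedelta


def recipients_in_window(messages, now, window=timedelta(hours=24)):
    """Count recipient addresses on messages sent within the window ending at now.

    messages: iterable of (sent_at, [recipient addresses]); repeats are counted each time.
    """
    start = now - window
    return sum(len(rcpts) for sent_at, rcpts in messages if start < sent_at <= now)


now = datetime(2025, 4, 14, 12, 0)
messages = [
    (now - timedelta(hours=1), ["a@client.example", "b@client.example", "c@other.example"]),
    (now - timedelta(hours=2), ["a@client.example", "d@other.example"]),  # a@ counts again
    (now - timedelta(hours=25), ["old@client.example"] * 50),  # outside the window
]

TENANT_LIMIT = 10820  # value OP reported for this tenant
used = recipients_in_window(messages, now)
print(used, f"{used / TENANT_LIMIT:.2%} of limit")
```

Running the same count over an exported message trace is one way to sanity-check what the admin center reports.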