I don't want to do it
Posted by Adept-Following-1607@reddit | sysadmin | View on Reddit | 164 comments
I know I'm a little late with this rant but...
We've been migrating most of our clients off our data center to Azure and M365 because of "poor infrastructure handling" and "frequent outages", and because we didn't want to deal with another DC.
Surprise surprise!!!! Azure was experiencing issues on Friday morning, and 365 was down later that same day.
I HAVE LIKE A MILLION MEETINGS ON MONDAY TO PRESENT A REPORT TO OUR CLIENTS AND EXPLAIN WHAT HAPPENED ON FRIDAY. HOW TF DO I EXPLAIN THAT AFTER THEY SPENT INSANE AMOUNTS ON MIGRATIONS TO REDUCE DOWNTIME AND ALL THAT BULLSHIT, THEY JUST GET THIS SHIT SHOW ON FRIDAY.
Any antidepressant recommendations to enjoy with my Monday morning coffee?
desmond_koh@reddit
I 100% agree with the comments re: expectations not being managed. But I also disagree with the "move everything to Azure/AWS" approach.
Servers in a data center are in the cloud. Where do we think Microsoft, Amazon, and Google keep their servers?
There is no reason why we cannot build our own highly reliable hosting infrastructure in a data center.
Now, if we don't want to have to deal with servers, storage arrays, etc. then fine. But building your own cloud is a perfectly doable, reasonable, and modern approach too.
anobjectiveopinion@reddit
We did. By hiring sysadmins who knew what they were doing.
lost_signal@reddit
Also, datacenters plural. Have a DR site you replicate to, and practice regular failover testing.
thortgot@reddit
A self-hosted cloud has all the same break points, just with less scale and less expertise.
Secret_Account07@reddit
Plus I can easily do things like take a snapshot in 2 clicks.
We don’t have a ton of VMs in Azure/AWS, but it blows my mind how complicated something as simple as taking a snapshot is in Azure.
This is why I prefer our VMware environment. Hate Azure
thortgot@reddit
Snapshots aren't that complicated to do, but they are intentionally difficult because they want to discourage you from using the same workflows as on-prem.
Secret_Account07@reddit
Yeah, we have a good snapshot policy and alerting for our on-prem VMs. Customers use them for quick change-and-test work, but I still haven't found a good way to do a full VM snapshot in Azure.
I have a script that does it through PowerCLI, but it just seems overly complicated (a Python equivalent is sketched below).
Just simple stuff like that makes me hate public cloud. I get that they don't want hypervisor access or customers breaking stuff, but man, there are a hundred small examples where I just don't get why they can't implement some of this stuff.
Great excuse for enterprise techs to want VMware and other private clouds.
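For reference, a minimal sketch of scripting that on the Azure side with the Python SDK, assuming the azure-identity and azure-mgmt-compute packages; the subscription, resource group, and VM names are placeholders, not anyone's real environment. It also illustrates the gripe: Azure snapshots are per managed disk, not a single VM-level operation like VMware's.

```python
# Illustrative sketch only: snapshot the OS disk of one Azure VM with the
# Python SDK. All names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder
RESOURCE_GROUP = "rg-example"
VM_NAME = "vm-example"

client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Look up the VM and grab its managed OS disk resource ID.
vm = client.virtual_machines.get(RESOURCE_GROUP, VM_NAME)
os_disk_id = vm.storage_profile.os_disk.managed_disk.id

# Create an incremental snapshot of that single disk. Note this is per-disk:
# a VM with data disks needs one snapshot per disk, and there is no built-in
# memory/consistency grouping like a VMware VM-level snapshot.
poller = client.snapshots.begin_create_or_update(
    RESOURCE_GROUP,
    f"{VM_NAME}-osdisk-snap",
    {
        "location": vm.location,
        "incremental": True,
        "creation_data": {
            "create_option": "Copy",
            "source_resource_id": os_disk_id,
        },
    },
)
print("Snapshot provisioned:", poller.result().name)
```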
Secret_Account07@reddit
This is why my org makes this distinction
Private vs public cloud
The default should always be our data center unless there is a really good reason to put something in the public cloud.
ESxCarnage@reddit
100% this. We recently did a migration to Azure for part of our environment because the node it was on was dying. Could we have bought new equipment and gotten it standing again? Sure, but the higher-ups didn't want to pay for an actual cluster so we could survive an issue like this in the future. So we decided we no longer wanted to troubleshoot hardware issues and moved it to the cloud. It's definitely expensive, but the VMware licensing we save on pays it off every year.
desmond_koh@reddit
We're a Hyper-V shop and run Datacenter Edition on everything. All our non-Windows workloads, of which we have quite a few, also run on Hyper-V.
ESxCarnage@reddit
We have another cluster that is dual-hosting (some of our VMs, and some of our parent company's VMs), which is running fine. It's just more the cost of equipment and the time to acquire it at the moment. We probably will have some sense of on-prem in the future, but we're trying to see realistically what that will be. For context, we are a government contractor, so the failing equipment was holding the VMs that cannot be on the same physical host as our foreign parent company's for compliance reasons. If this was a normal company, things would be a lot simpler.
g-rocklobster@reddit
And not at all uncommon.
Friendly_Ad5044@reddit
You forgot one of the most basic tenets of IT: "The 'cloud' is really just someone else's data center"
BetamaxTheory@reddit
Some years ago now I was an M365 Contractor for one of the big British Supermarket chains.
The first big M365 outage they encountered post-migration, I’m hauled into a PIR to explain the what and the why. Microsoft had declared the issue was due to a bad change that they rolled back.
Senior Manager had a list of Approved Changes on the screen and was fuming as to why Microsoft “had carried out an unauthorised change”.
Genuinely, somehow Senior Management were expecting Microsoft to submit Change Requests to this Supermarket’s IT Department…
LinguaTechnica@reddit
I've got a small one-man band type lawyer client with the same mindset. Baffling.
sbadm1@reddit
That’s hilarious 😂😂
Case_Blue@reddit
The difference is: expectations were not managed.
The cloud CAN go down, the cloud CAN fail.
It's just when it fails, you have tons of engineers and techs working day and night fixing it for everyone.
mahsab@reddit
"What are you going to do to prevent this from happening in the future?"
Exactly
Case_Blue@reddit
That's the nature of cloud computing: you have given up your right to touch your own hardware.
And that's fine, but please do explain to people that WHEN the cloud fails, you have downtime. That's... to be expected.
Sudden_Office8710@reddit
No, but M365 is asinine: you have to bring your own spam filtering and your own backup. Then you still have to pay extra for conditional access.
F Microsoft all to hell. I'm standing up a MIAB installation just because Microsoft is not M365, it's more like M359.
lordjedi@reddit
Your own spam filtering? Since when? Exchange Online has had a spam filter for years. You only need an additional one if you want something that does even more, like ProofPoint or Abnormal.
Sudden_Office8710@reddit
M365's spam filtering is absolute garbage. Yes, Proofpoint, Abnormal, Mimecast, you need one of those in front of M365. At least we do, because Microsoft 🤷♀️ just shrugs their shoulders at our problems. Maybe you don't do the kind of volume that we do, so maybe you're OK with M365 off the rack, but we've found it to be sub-par.
lordjedi@reddit
My point was simply that it has a spam filter. So you don't have to "buy extra".
GWS has one too, but we also put one in front of it.
So the overall point is that no one does it perfectly unless they're in the business of strictly spam filtering.
Crumby_Bread@reddit
Have you tried actually licensing Defender for Office and tuning it and all of its features? It works great to the point we’re moving our customers off of Proofpoint.
hubbyofhoarder@reddit
Same experience here, although we're not an MSP. Have had Barracuda as our spam/security filter for years, and Defender for Office is quantitatively better
Sudden_Office8710@reddit
We have Defender too. We have to have something at the perimeter prior to getting into M365 once it gets to Defender it’s already too late.
TheIncarnated@reddit
That's because those practices are antiquated. M365 doesn't require backups. It is replicated across data centers and you use litigation holds for things that are not supposed to be deleted due to... Litigation and regulations, wild fucking concept, I know.
We have no spam filter in front of our system, we use the built in one and activate all the security features. We are a 100 year old business without any spam issues.
This sounds like a skill/fear issue. Good luck!
pinkycatcher@reddit
Backups serve more purposes than what you're implying.
TheIncarnated@reddit
It is not the responsibility of IT to back up users' important emails. That is up to them. Outside of litigation and logs, I don't care about the rest, nor is IT required to.
Inform your CIO/CTO if you are required otherwise. Anyways, good luck paying those added costs!
pinkycatcher@reddit
lol.
It's absolutely IT's job to provide and implement the technical tools the business requires to meet business needs.
TheIncarnated@reddit
Like providing archiving? Wild, we do that.
So you are just ignorant, love it!
noiro777@reddit
So you are just arrogant and love being rude to people when their opinions differ from yours, love it!
pinkycatcher@reddit
Dude's the reason IT people are disliked and he doesn't even realize it.
TheIncarnated@reddit
Aww your feefees are hurt. Do you want a floppy disk to cry into?
(By the way, I owe you nothing, random person on the Internet. You started the rude behavior.)
I give out what is given to me. Wild concept, I'm sure, to someone like you
pinkycatcher@reddit
Why are you so angry over such a dumb reason?
TooOldForThis81@reddit
Guy has to be on something, or maybe he just needs a hug. Anyways, off to let my users go and manage their backups /s
TheIncarnated@reddit
The key thing is, I'm not lol
I'm having fun now, since I responded to a joke comment with a half-serious one and the entire community, including yourself, jumped my shit. So enjoy the consequences of your actions; I'm going to enjoy being a troll.
Also, it's a Saturday night, dude/dudette/dudeother. Lighten up and have fun. Work will be there on Monday (and that's when I'll return to being serious).
Sudden_Office8710@reddit
If you are hit with ransomware, all your fault tolerance goes along with it. We were told that we need separate backup and cyber insurance to be proactive. All your legal hold horseshit is meaningless if your entire instance is fucked.
This is from Microsoft, whose security team is a bunch of clueless millennials who thought I was talking about Mountain Dew when I mentioned Code Red of the early 2000s 🤣
TheIncarnated@reddit
Damn, that sucks they didn't even know what Code Red was... since every single millennial actually knows what that is. So you are ageist and prejudiced.
I now know why you don't understand
Sudden_Office8710@reddit
The entire industry is ageist. Don't trust anyone over 30. Yeah, I'm just telling you what I've experienced. I know people that were let go from Google after getting pregnant. Sorry to burst your bubble, those are the cold hard facts of the industry. So sorry I triggered you.
TheIncarnated@reddit
Lmao, the one triggered here is you, brother.
I couldn't give a fuck. I have never wanted, and will never want, to work for FAANG or the like. They are fucked-up places to work.
I've experienced those things outside of tech, shocking, I know...
timbotheny26@reddit
Right? How young does this person think Millennials are? The youngest of us turned or are about to turn 29 this year, at least according to every chart/graph I've seen of the birth years.
Sk1tza@reddit
"M365 doesn't require backups"
lol. I hope you don't have any input into anything that matters.
TheIncarnated@reddit
I do! I'm a cloud architect. Anyways, it's wild what you can do with a product, when you understand how it works. Maybe, just possibly, when presented with a different world view, you try to understand.
Also, highly recommend going and getting proper training, good luck!
steaminghotshiitake@reddit
Whelp that's terrifying.
TheIncarnated@reddit
Lol... Keep living in fear???
Are y'all seriously not doing user training?
Hell, are y'all even properly trained on the services backend?
Sudden_Office8710@reddit
They are absolutely necessary for ransomware, human error protection, compliance implications, and business continuity implications. We spend more on M365 than most companies make in revenue in a year. If you don't have any of the above requirements, then yes, you don't require backups, but we do, plus E&O and cybersecurity insurance.
Sudden_Office8710@reddit
Per Microsoft, if your instance is hit with ransomware, it is your responsibility to have your own backup. Per Microsoft, your spam filtering is your responsibility and your problem. It's not a skill problem, it's an "M365 is a giant piece of shit" problem.
iama_bad_person@reddit
I mean, having another backup makes sense, 3-2-1 and all, but your own spam filtering? Fuck that.
SarcasticFluency@reddit
And what you can touch, is very very controlled.
rodface@reddit
Go cloud, pay money to giant software vendor. When problems arise, you get to wait and see if the team of employees on the vendor's payroll can pull an ace out of the proverbial sleeve, and solve the problem quickly.
Or...
You stay on-prem, pay money to a team of employees that are on your payroll, and hopefully they pull an ace out of their sleeve(s). You have the benefits of:
I could go on, but shoot, isn't having your own IT staff great, instead of paying the big corp$ more money and getting to twiddle your thumbs when things are going south?
Maybe I'm just biased.
tigglysticks@reddit
Most of your points are still there with the cloud. You're still just a drop in their bucket and they do not care about you or your business.
Except now you have zero choice. You're beholden to them and their timeline.
I can have my team up in < 4 hours no matter what the failure is. I'll stick with on-prem.
uzlonewolf@reddit
Yeah, but when you outsource, you can shift the blame when things go down. "We didn't do anything wrong, they are the ones who went down!"
Case_Blue@reddit
ding ding!
7FootElvis@reddit
And frankly, significant outages are so rare for Azure.
wazza_the_rockdog@reddit
Yep. If OP's previous data center had frequent outages, then just compare the uptime of their DC vs Azure/365 and show customers that, while it sucks that they hit an outage so soon after migrating, the reliability of Azure/365 is still massively better.
iruleatants@reddit
I mean, I can just give them the writeup from Microsoft regarding the cause of the downtime and how they will prevent it in the future.
I've yet to work for a single company willing to spend extra to ensure there is zero downtime. Never had an SLA that didn't account for downtime.
It's still much less likely for Azure to go down than it is for an on-prem environment to go down.
We once had our primary and secondary firewall die at the same time and cause an outage, the game plan from leadership wasn't "we should buy four firewalls to make sure it doesn't go down again."
mahsab@reddit
They don't even bother with those anymore. It's just a generic one-liner: "We're reviewing our xxxxx procedure to identify and prevent similar issues with yyyyyy moving forward."
I don't believe anyone is talking about zero downtime.
Only if your DC is available globally. Otherwise, I disagree.
Yes, Microsoft has much better hardware infrastructure than most of us ever could have. They have a lot of redundancy and protections for every scenario you can imagine. Some new DCs will even have their own nuclear power plants.
But they also have a LOT of software (management, accounting ...) layers on top of the basic services, and they are constantly mucking with them, regularly breaking things.
Azure never goes down completely, but from the perspective of a single user/tenant/DC, e.g. me, my on-prem environment has had much higher uptime (or fewer outages) than Azure. I can schedule all my maintenance during periods of lowest or even no activity (can't do shit about MS doing maintenance on the primary and secondary ExpressRoute during my peak hours). If I break something during maintenance, I will know immediately; I don't need to wait hours for the issue to be traced back to the team and the change that caused it. Power or internet outages will affect users anyway, but with on-prem they can still access resources locally.
iruleatants@reddit
So you just don't use Azure services, then? They already have their Preliminary Post Incident Review out that documents the incident with Azure Front Door, the root cause, how they responded, and what they are doing to prevent this from happening in the future. It's definitely not a one-liner.
Pretty sure we are, but whatever.
You think that Microsoft doesn't provide post incident reports, and yet they do, so I'm sure you'll disagree.
Most companies have a lot of software and they constantly mess with it. That's how business and technology works, unless you are a tiny company.
Ah, so you are the one true sysadmin. Never once made a change that silently broke something that wasn't discovered until down the line? All problems are immediately visible and fixed.
Give it some time. You'll update software for a security vulnerability one day and it will take down some critical business component that shouldn't have been impacted.
bigdaddybodiddly@reddit
Deploying to geographically diverse zones with quick failover or load sharing?
uzlonewolf@reddit
Doesn't help when your cloud provider accidentally deletes your account/cloud (as UniSuper found out) or the provider has an infrastructure bug that takes everything out (as Microsoft found out). You really do need multiple cloud providers for high-uptime requirements, though problems coordinating them can cause outages too.
AlexEatsBurgers@reddit
Exactly. It's an opportunity to sell additional redundancy to the client. Azure guarantees 99.99% uptime to at least one VM if you deploy two instances of the VM across redundant availability zones. Azure is already extremely reliable, but if it's that critical to a business, they can pay money for 99.99% guaranteed uptime and above.
chapel316@reddit
This is the only real answer.
Sufficient_Yak2025@reddit
The likelihood of it happening again compared to your local DC is minuscule. Migrating (some) resources to Azure from a local DC is overall a good choice.
mahsab@reddit
I disagree about the chances; we are talking about your DC's availability to you, not globally.
Azure is extremely resilient against things like catching fire, but much less so when it comes to configuration and management changes that break access to their services. They have so many layers of management on top of and around their services, things are bound to break as they tinker with them.
lordjedi@reddit
Sure. And then the CEO flies to another state or country and, for whatever reason, the VPN (or whatever else) doesn't function and he/she suddenly can't reach their email. Now your DC being available locally to YOU is meaningless.
Sufficient_Yak2025@reddit
OP literally said “frequent outages” as their reason for migrating. Azure boasts five 9s for a large number of their services. Enable some geo-replication/backups, or even go cross-cloud and run some infra in AWS/GCP, and outages shouldn't be a problem ever again.
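To make the geo-replication suggestion concrete, here's a minimal sketch using the Azure Python SDK to create a geo-redundant (GRS) storage account, assuming the azure-identity and azure-mgmt-storage packages; the subscription, resource group, and account names are placeholders.

```python
# Illustrative sketch: create a geo-redundant (GRS) storage account so blob
# data is replicated to the paired region. Names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder
RESOURCE_GROUP = "rg-example"
ACCOUNT_NAME = "examplegrsaccount"  # must be globally unique, lowercase alphanumerics

client = StorageManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

poller = client.storage_accounts.begin_create(
    RESOURCE_GROUP,
    ACCOUNT_NAME,
    {
        "location": "eastus",
        "kind": "StorageV2",
        # Standard_GRS asynchronously replicates to the paired region;
        # Standard_ZRS instead spreads copies across availability zones
        # within one region. Pick based on the failure mode you care about.
        "sku": {"name": "Standard_GRS"},
    },
)
account = poller.result()
print("Created:", account.name, account.sku.name)
```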
HunnyPuns@reddit
Laughs in AWS East.
Loudergood@reddit
Peregrin Took: "We've had one, yes. What about second cloud?"
dinominant@reddit
Sometimes a cloud outage has no fix and your data is gone forever. Make sure you have a way to pivot if/when the cloud destroys your data or workflows.
Case_Blue@reddit
Well obviously you still need to consider some disaster plans, but how often have you "lost everything" on a major cloud player? Honest question, I've never had this happen yet.
dinominant@reddit
Me personally? In the last 2 years I had a Google account that was impacted. It took weeks to sort that out. It does happen, and sometimes to very large systems. It's frequently in the news.
Fallingdamage@reddit
"Listen, to get this into the cloud, its going to cost you more than overhauling your entire infrastructure. The cloud will be unstable and nothing will work faster than your internet connection can handle. Expect some type of weekly outage. All your capitol expenditures will be the same except you wont need a physical server anymore. We will also need to bill you for a ton of remote work and a sluggish ticketing system that we pretend to pay attention to. Once you get comfortable with the inconveniences, our owner will sell offshore all support, fire the good technicians, and sell the company to a VC firm and go on a cruise. But trust us, this is going to be better in the long run."
Case_Blue@reddit
Yup, pretty much.
It's risk outsourcing.
Adept-Following-1607@reddit (OP)
Yeah, yeah, I know, but try explaining this to a stubborn 65-year-old who calls you to extract a zipped folder because "it's too much work" (they pay my bills so I can't really complain, but maaaaannnnn)
Darkk_Knight@reddit
Or needs help converting a JPEG to a PDF so they can upload it to a document system.
ImALeaf_OnTheWind@reddit
Or needs help scanning a doc to the server, but the scanner is malfunctioning. The kicker is that they printed out this doc from a digital file in the first place!
awful_at_internet@reddit
Solution: check the scanner document feed for plastic dinosaurs.
You might be thinking "haha that's funny but would never happen. Our users are all adults."
So are ours, friend.
Sceptically@reddit
Our users are all fully grown children.
Especially the IT staff.
awful_at_internet@reddit
Hmmmm. Now that you mention it, our office might be full of 3D-printed pokemon, dinosaurs, fidget toys, and other random bits and bobs.
It wasn't one of us, though!
Adept-Following-1607@reddit (OP)
😭😭😭
somesketchykid@reddit
Don't explain. Just show him the cost of replicating everything in a separate availability zone in Azure, and then another estimate with the cost of having a third replica idle and waiting to be spun up in AWS.
Show him the time it would take to complete that failover exercise.
Once he sees the cost in money and labor to ensure 100% uptime no matter what, he will shut up.
CraigAT@reddit
Clicked Refresh, a lot!
countsachot@reddit
CAN=WILL
Icedman81@reddit
Let me rephrase that for you:
It's never a matter of "can". It will go down. It is, after all, just someone else's computer.
Case_Blue@reddit
Agreed
Traditional-Fee5773@reddit
"Everything fails, all the time" - AWS CTO (but I suspect he was talking about Azure)
blbd@reddit
He was talking about one-alarm fires. The big cloud providers are so huge it's effectively statistically impossible for them not to have a handful of equipment failures in every single facility every single second and minute of the year. So they responded by engineering in the fault tolerance for those cases.
Because of that, multi-alarm fires are surprisingly improbable, and they usually happen because of abjectly bizarre failures from cross-facility common code pushes far more often than because of any hardware problem, even a horrible one.
Case_Blue@reddit
Eh, he wasn't wrong.
Somewhat related: I once had a call with a partner who manages the Nutanix clusters in our datacenter.
He refused to come online at 3AM because "we... didn't change anything "
"Well shit, neither did we, so let's all go home then!"
MaelstromFL@reddit
You all had expectations? /s
GoBeavers7@reddit
I've been managing M365 and Azure for the last 10 years for multi-location companies across the US and Canada. In that time there have been 2 outages, both recovered in less than 2 hours. Prior to moving services to the cloud, the outages were more frequent and took much longer to resolve, especially as the hardware aged.
The cost to recreate M365 and Azure is simply not affordable.
ScroogeMcDuckFace2@reddit
when in the cloud, expect rain.
Netghod@reddit
There is no cloud. It’s just someone else’s computer.
Outages happen. They need to be planned for in one way or another.
As for the meeting: a timeline of the failure(s), and a clear explanation of what happened in the cloud.
And recommendations on next steps based on lessons learned.
Don’t play the blame game…
Dermotronn@reddit
Never do shit on a Friday evening if the company works 9-5. Most companies have weekends off or absolute bare-minimum staff, so running into an issue leaves you devoid of backup support.
PrimaryDry5614@reddit
You can't control what you can't control; the truth will set you free. Even Fortune 5 cloud solutions have outages. It's the nature of the beast, nothing has 100% uptime.
No_Match_6578@reddit
How does migrating actually work? I keep hearing about it but never had to do it. How does it go, what is needed? I can't understand something I've never had to do, and things I don't know drive me crazy.
CPAtech@reddit
Was it your decision? If not, then you just give straight facts.
If the expectation was that there would be no outages in 365, then whoever made the decision did zero research and should be called out on it. If that's you, good luck.
No_Anybody_3282@reddit
This is the way.
Snackopotamus@reddit
Tbh, if you didn't sign off on the decision, don't carry the blame. Own the report, not the original call. Phrase it like "we recommend" instead of "we failed." Keeps you professional.
neucjc@reddit
This.
L3TH3RGY@reddit
Agree.
Valkeyere@reddit
The difference is number of 9s uptime.
More redundancy just means more 9s, and cost scales up exponentially.
It's rare for Azure to have global outages, just major regions. So you need your estate replicated across regions, data sovereignty allowing.
Actually, I'm not sure if you can have Entra exist across two regions; surely you can, but I don't know for sure.
Even then it's not 100%, it's number of 9s.
And the '5 mins a year' they'll never really meet.
As others have said, if someone on your end sold them 100% uptime, they lied. But Microsoft is going to provide higher uptime, at a more reasonable cost, than you can manage with on-prem or a third-party data center, just due to economy of scale. An outage doesn't counter this.
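To put rough numbers on the "number of 9s" point, here is a quick back-of-the-envelope sketch in Python. It assumes redundant copies fail independently, which real-world correlated outages (shared control plane, global config pushes, identity) routinely violate, so treat the combined figure as an upper bound.

```python
# Back-of-the-envelope math for "number of 9s": allowed downtime per year
# for a given SLA, and the combined availability of N redundant copies
# under the (big) assumption that their failures are independent.
HOURS_PER_YEAR = 365 * 24

def downtime_per_year_hours(availability: float) -> float:
    """Hours of allowed downtime per year at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR

def combined_availability(availability: float, copies: int) -> float:
    """Availability of N independent replicas where any one can serve."""
    return 1 - (1 - availability) ** copies

for a in (0.99, 0.999, 0.9999, 0.99999):
    # Five 9s works out to roughly the "5 mins a year" mentioned above.
    print(f"{a:.5%} -> {downtime_per_year_hours(a) * 60:8.1f} minutes/year")

# Two independent regions at 99.9% each would, on paper, give ~99.9999%;
# correlated failures are what eat into that in practice.
print(f"2x 99.9%: {combined_availability(0.999, 2):.6%}")
```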
tigglysticks@reddit
And this is why I have been pushing against the cloud since its inception.
Ok_Discount_9727@reddit
No better demonstration that the cloud "is just another data center and can go down like any other" than this.
AugieKS@reddit
Anti-depressant recommendation, I got that. Venlafaxine, aka Effexor, has been great for me. It is an SNRI, so it blocks re-uptake of both serotonin and norepinephrine. Does wonders for my depression and my anxiety.
Downside, though: it has legitimate withdrawal symptoms that kick in within as little as an hour after missing a dose. Pretty bad ones, too, considered the worst by many doctors and patients who have been on many different therapies. Having been on at least one of the other big ones, Paxil, as well as venlafaxine, venlafaxine is worse by far imo. It's like having the flu, but a really bad case, and it takes a few hours or more after taking your meds to fade. You do get a little warning before the worst sets in, though; GI upset usually comes first for me, and if I don't take my meds once that sets in, I am in for a rough day, but it will subside if I catch it then.
But if you are good at taking your meds on time, don't skip doses, don't forget to get your refills, it's pretty good.
Adept-Following-1607@reddit (OP)
that is... detailed.
Grrl_geek@reddit
Venlafaxine sucks.
Assumeweknow@reddit
Why didn't you have multi-data-center redundancy? Just asking...
Kyp2010@reddit
"The cloud is just another data center, in the end. It is and has always been subject to outages despite promises from salespeople."
acniv@reddit
Wait until the cost of cloud-flation starts to kick in. The senior staff want less IT and less IT infra onsite, and then they start to bitch about how much the fees are increasing. Never seen the self-storage bait-and-switch model used so effectively outside of self-storage... they get what they deserve.
jimlahey420@reddit
I love when non-technical people in positions of power look at our 99.9% uptime with on-prem and say "how do we get to 100%?" and then float the "cloud" as a solution to that "issue".
Timzy@reddit
Honestly, since I created a database that scrapes scheduled changes for cloud platforms, I highlight any that may be of concern. Any other issues are squarely on them. If they don't have an RCA in place, then it's them going to these meetings.
AmbassadorDefiant105@reddit
What are the SLAs for the client(s)? If your stakeholders are expecting 95 to 99% uptime, then tell them to pay up for a DR site.
stonecoldcoldstone@reddit
We went through that process with our catering provider; they wanted their system in the cloud rather than on the on-prem VM host.
Surprise surprise, there is an advantage to on-prem with cloud sync rather than having every transaction connect to the cloud in real time.
After moaning about their till speed for a year, we had them migrate back. They tried to blame the broadband, and it took quite a long time to convey that "you'll never have the connection to yourself; if you want to make money quicker, move back on prem."
1a2b3c4d_1a2b3c4d@reddit
Just give them the facts. No emotions, no conclusions, no opinions.
Just describe what happened, and back it with Microsoft's official explanation.
stevenm_83@reddit
Was Azure actually down? Or only the portal?
Nguyen-Moon@reddit
That no SLA has 100% availability and there was a pretty big outage last week.
https://azure.status.microsoft/en-us/status/history/
Geminii27@reddit
Say how long Azure was down. Maybe mention well-known other Azure outages from the past year or two. If you start getting thrown under the bus, you can say that the decision to switch to Azure was not made by the company IT department; it was only handed to IT as something to be implemented without argument. (And, assuming there is proof, that the IT department argued against it at the time due to, in part, known issues with the reliability of third-party service providers. And were overruled.)
No point in bringing that up until and unless there's an attempt to put blame on IT, though.
ocdtrekkie@reddit
My Exchange server is historically at least twice as reliable as Microsoft's. "The more they overthink the plumbing, the easier it is to stop up the drain."
Industry's gone crazy.
ghostalker4742@reddit
Souvenirs, from one surgeon to another :)
Sam98961@reddit
I call it, "Failover Friday." Let's just test that HA.
Antique_Grapefruit_5@reddit
The great part about the cloud is that it costs much more than your on-prem solution, support sucks, and when it breaks it's still your problem, but your hands are tied and all you can do is sit there and get kicked in the goodies until it's fixed...
Tall-Geologist-1452@reddit
If you lift and shift, 100%. If you re-architect, then no. The cloud (Azure) is not on-prem and cannot be managed the same way, even though a lot of the skill set does migrate.
realityhurtme@reddit
Everyone loves M363.5, except when they don't. We are also moving our secondary data centre to Azure to increase resiliency (save a line item for the building at the expense of a huge subscription bill). Friday was not abnormal: your tenancy and Azure may be up, but good luck accessing it when some other part of their infra goes tits up.
trueppp@reddit
And then they often forget the on-prem infrastructure outages and downtime. I am way happier getting yelled at on the rare occasion M365 goes down than all the evenings I spent fixing corrupt Exchange databases, installing security patches, installing CUs (when you have 200+ Exchange servers to update, you really have your work cut out for you...).
TreborG2@reddit
Give them an explanation of the difference in uptime vs. costs:
multiple locations requiring multiple high-speed access lines
multiple servers with multiple connection points
... with each instance of the word "multiple", your costs to maintain and support this go exponentially upward.
But by being in the cloud, the complexity and costs for local staff and IT go down, and issues get higher visibility with the cloud's engineers and people specifically trained to work toward resolution.
So .. same services at 15 to 20 times the cost?
trueppp@reddit
It all depends on your needs and size.
BoilerroomITdweller@reddit
Microsoft is so bad about their outages: they have “everything is running fine” on their status pages while things go down for days that they won't admit to. I mean, they can't beat CrowdStrike, but they are 2nd in line.
We can't rely on them because we run patient-saving software and we cannot just have patients die.
The problem is Microsoft doesn't have ANY failover. An outage affects everyone at once.
We use Hybrid Join so we can use Entra if needed but it fails over to the domain. We have VPN. They use OneDrive with local backup though.
trueppp@reddit
What.....
marafado88@reddit
There's no bulletproof ecosystem, that's the hard truth.
Pitiful_Caterpillar4@reddit
We have a bingo!
DevinSysAdmin@reddit
These are meetings? That's an email.
jeffrey_f@reddit
Go find the statement from Microsoft about this, post what they said, and make sure you explain that nothing about the outage had anything to do with you or the company. Furthermore, if they want more information they should call Microsoft directly.
bbqwatermelon@reddit
When a doctor sold his practice to a big-city practice, they immediately moved the electronic medical record software from the local server (which I had upgraded with all-flash storage after identifying it as a bottleneck) to hosted software used over RDP or RDWeb, and the whole firm then complained about performance. The doctor who sold the practice was still on for a year in consulting, and he took me aside and begged me to bring the EMR back in house. I "begrudgingly" and "sympathetically" shrugged my shoulders and informed him I could do nothing about it. Learn to enjoy having less responsibility.
neucjc@reddit
“You did it wrong”.
Deep-Trick7995@reddit
LRS, GRS or ZRS?
Deep-Trick7995@reddit
Oops… global!
BarronVonCheese@reddit
Just hand them the MS outage report and tell them that’s all we’ll ever know, welcome, to THE CLOUD!
Pyrostasis@reddit
A little Wild Turkey or some Old Grand-Dad works for me.
wired43@reddit
Cloud is a scam.
It looks attractive in the short term because of low monthlies if configured in a cheap way.
However, they can never live up to their promises of uptime.
lordjedi@reddit
Doesn't MS give some kind of after action or status page? Give them that report.
Then you can recommend that they keep their data in multiple regions. Yep, it'll cost more, but it'll result in less downtime.
HunnyPuns@reddit
The report should include quips about the sky falling.
Fallingdamage@reddit
Just tell them your boss thinks that lift and shift makes for more billable hours and expensive service contracts than keeping anything on prem. That convincing them to spend tens of thousands in the hope that their capex would be reduced by maybe 15% while opex goes through the roof is the grift that pays the bills.
beigemore@reddit
You just tell them the truth and move on with your life. Things happen that we cannot control.
My company just migrated from on-prem Cisco Call Manager to Teams phones on Wednesday and then the outage happened, so that was fun.
MasterTater02@reddit
MS claims five 9s of uptime. Frankly, my mileage varies.
Holiday_Voice3408@reddit
Lion's mane is pretty dope.
icanhazausername@reddit
With an on-premises environment, there is a neck to choke when something goes down. There is no neck to choke for a cloud outage. If you are setting expectations for the cloud experience, keep in mind you generally can't call Microsoft or AWS, yell at them to fix it, and ask when it will be back up.
itmgr2024@reddit
Nothing is perfect. If the downtime is less than before, they should be happy. If they want perfect, tell them to pay out the wazoo for real-time replication and standbys for everything.
chandleya@reddit
Your org oversold the fuck out of their SLAs lol
ne1c4n@reddit
Did you add redundant/failover systems in other regions? Are they willing to pay for that? Azure does have downtime, but it's usually limited to a region or two, not Azure-wide. Also, you could have the same redundancy on AWS, paired with Azure, if you really want. They simply need to pay more if they want 100% uptime.
olizet42@reddit
I guess they have chosen the cheapest stuff. Cloud is expensive if you are doing it right.
Cormacolinde@reddit
Exactly what my take would be. Azure will have failures, what’s your HA/redundancy/DR plan when it happens?
Askew_2016@reddit
We have the same issues with pushing all reporting from MicroStrategy, Cognos, and Tableau to Power BI. Yes, it is cheaper, but the reports are completely unstable and only run a small percentage of the time.
They need to stop looking at software/data platform $$ in a vacuum. A lot of the time, the cheaper they are, the worse they function.
JerryRiceOfOhio2@reddit
My place went from on-site to cloud. When there were issues on-site, everyone lost their minds and everyone ran to fix the problem. With cloud, when there's an issue, everyone just shrugs and plays on their phone until things work. So there's that benefit. Maybe just present a shrug emoji to your customers and say it's not your fault.
tfn105@reddit
I think we all get it - it sucks when you’re in the middle of a production outage.
When the dust settles, here are some things your firm needs to consider (not just you)…
On-prem or cloud… they just impose different requirements when designing your platform to be resilient.
Cloud world, Azure/AWS/GCP are responsible for delivering their data centres up to spec and providing you multiple DCs in a given region that can’t have correlated failures. Your responsibility is to design and deploy your services to take advantage of this.
On prem, you have the same software obligations except you also have to build your data centres to the same level of operational planning as the cloud.
Chocol8Cheese@reddit
Still better than some self hosted nonsense. Get an o365 outage report for the last 12 months vs the old data center. Shit happens, like when your fiber gets dug up for the third time in three years.
Forumschlampe@reddit
Just tell them Microsoft is the hyperscaler with the biggest outages:
https://azure.status.microsoft/en-us/status/history/
No, this will not be the only outage you will experience, and there is nothing you can do as long as you rely on Azure.
expiro@reddit
Calm down, sysadmin. This is inevitable. This is our fate. You can't solve any fucking thing with this emotion. Don't explain the downtime itself to people. Instead, explain why it happened and who is to blame (Microsoft)… and make whoever tf was responsible for that Azure migration feel a bit uncomfortable. All with nice, calm speaking. They will leave you alone and look for the problem in their own decisions ;)
Asleep_Spray274@reddit
Hold up, hold up, are you saying that even the cloud can have down time?
But I don't have to fix it you say 🤔
Helpjuice@reddit
The less downtime you want, the more you have to pay for it and distribute what needs to be kept available. Multi-cloud and private data center solutions would reduce the probability of downtime problems.
Instead of putting all of your eggs in one basket, your services should be hosted on-premises and in multiple cloud providers (hybrid), in locations at least 150 miles apart, in case a region becomes unavailable. If you are in the USA, best practice (if budget allows) is to host your content in the West, Central, and East parts of the country.
Some things to help enable real uptime:
- All content should be served over a CDN (can and probably should be many, in case one goes down).
- Edge nodes should be set up in various locations of importance, including PoPs.
- Internal data center to cloud private links should be set up to speed up non-internet-based traffic.
- Global load balancing should be the default.
- Flash storage should be the default for hot systems that need to serve content fast.
- Spinning disks should potentially be in the mix for massive storage if all-flash is not an option.
- Firewalls should be kept up to date, hardened, and monitored remotely.
- Layered defenses and advanced technology should be put in place to proactively detect threats and operational issues before they become outages.
If you cannot cut the link to a data center and have your operations continue running smoothly, then there is work to be done if uptime is of the highest importance (a rough probe sketch follows below). Things will fail, but the company can pay to reduce the impact to the business when things do fail, provided information systems and security are strategically and properly set up, maintained, and upgraded continuously.
Present the risks of not doing so in your meeting; tell them their risk acceptance of using a single cloud provider, with no other options, increased the risk of outages impacting the business. The better approach would be multiple cloud providers and a hybrid approach. If there is any pushback, let them accept the risk in writing and deal with it.
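A rough sketch of the "cut the link and keep running" check described above: probe the same health endpoint across on-prem and each cloud provider, and warn when failover headroom is gone. The endpoint URLs and the /healthz path are hypothetical placeholders; this is illustrative only, not a production monitoring setup.

```python
# Probe the same service across on-prem and multiple cloud providers and
# report which copies are answering. URLs below are hypothetical placeholders.
import urllib.error
import urllib.request

ENDPOINTS = {
    "on-prem": "https://app.dc1.example.internal/healthz",
    "azure":   "https://app-eastus.example.com/healthz",
    "aws":     "https://app-us-west-2.example.com/healthz",
}

def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Treat any HTTP 2xx within the timeout as healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    statuses = {name: is_healthy(url) for name, url in ENDPOINTS.items()}
    for name, ok in statuses.items():
        print(f"{name:8s} {'UP' if ok else 'DOWN'}")
    # If fewer than two copies are up, the next failure is an outage.
    if sum(statuses.values()) < 2:
        print("WARNING: redundancy exhausted -- failover headroom is gone")
```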
nixerx@reddit
Word. Cloud is sold as always-on. NOTHING is always on.
Traditional-Fee5773@reddit
Sorry to say that Azure was the wrong choice if reliability was a key factor, it's well known for frequent and fairly long outages, often global.
AseGod-Ulf@reddit
Set realistic expectations based on the terms of the contract. Also set the understanding that 100 percent uptime isn't truly realistic. The focus sets the perfect example of how an outage can be resolved by Microsoft same day versus... Human expectations and personality will be the sell on this.
Due_Peak_6428@reddit
You need to set expectations; downtime is inevitable.