[15YoE] How normal is it to never have worked on high-availability systems?

Posted by TempleBarIsOverrated@reddit | ExperiencedDevs | 74 comments

I'm interviewing and preparing some system design stages which made me think about my career so far (Europe) and I realized I never really worked on systems that require special approaches to handle the load. Now I'm wondering if I somehow missed the boat on gaining some experience solving technical problems.

Started my career as a simple backend web developer where the entire team was writing SQL queries in the procedural PHP website. No need for any high load capabilities.

Next up was part of a team tasked with a rewrite of an old C# WPF application into "microservices" where somehow people decided we needed roughly 10 machines to replace PART of the WPF application to handle the same load. Again no need for any high load, rather just working on cleaning up the WTF stuff.

After that I became tech lead for a while in a small shop where again most of the time was spent stopping colleagues from doing dumb shit and spent a lot of time building pipelines and setting limits on what could be done manually (we used to spend an afternoon each sprint with a "code freeze" so the previous team lead could merge all SVN branches..). Again no need for any high load code, rather just raising the floor on what we can do as a team.

Last job I was part of a team working on "microservices" where most of the logic was in stored procedures in a SQL db that was owned by an overseas team. Again no need for any high performance code since the main perf bottleneck was known: SQL db with a team that doesn't want to let go of the control.

And to top it all off: my current job is to get rid of most of the legacy stuff. We have some decent load but all of it is spent asynchronously (web scraping at night). Here again I'm running into workload capacity issues where we're a 3 man team for 5 applications. You can imagine there's no space to work on performance improvements.

So after all of that I'm left 15 years older, and ne'er a chance to work on low latency projects, or deploying microservices in a gradual manner, or any of the stuff that system design tackles. Is this a normal career or did I miss the boat somewhere?

[-]

hundo3d@reddit

Normal. I’m 8 years in. About to start my first big-ish tech job, which will be my first time working on a system estimated to handle significant load.

[-]

MrMo1@reddit

Imo I would say it's normal. Ugly truth is that most software systems out there don't need to handle load levels that can't be handled by your regular monolith design.

A lot of them go the distributed/microservice way to pad CVs and attract talent, or just copy what the very few big-name companies with an actual use case for these things do.

[-]

TempleBarIsOverrated@reddit (OP)

That's the feeling I've had for the past 2 jobs which is where "microservices" were chosen. It's especially egregious when many of those services directly connect to the same database, meaning you can't even deploy any changes easily and have to coordinate within your own team.

95% of the workload they did could be handled in a single monolith.

[-]

gyroda@reddit

> It's especially egregious when many of those services directly connect to the same database,

This is why each microservice needs to own its own data storage.

I've built two microservices that are pretty closely related, but neither directly accesses the other's data storage, instead you just invoke API endpoints.

[-]

stringbeans25@reddit

I’ll be honest, my opinion’s changed on this. I used to think one service, one data store, but that is a very blurry line. Logical separation is really the only thing that matters: separate users/roles/schemas. Data store per service is just another way to do it. The other big thing I’ve started following is only one writer, but anyone can read.

[-]

TempleBarIsOverrated@reddit (OP)

I'm fighting a losing battle at my current job regarding this, which is a big part of why I'm prepping interviews again.

We have tons of these microservices synchronously connecting to other services to get some information that can easily be duplicated and stored. Unfortunately we don't have a single event bus or similar that could help with this (and I've given up trying to introduce one due to no mgmt buy-in). This leads to terrible performance in some public API endpoints, but everyone keeps pointing fingers to "legacy"...
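
To make the "duplicate and store" idea concrete, here is a minimal sketch of what I mean. Everything in it is hypothetical: `EventBus` is an in-process stand-in for a real broker, and the service and topic names are made up for illustration. The owning service is the only writer and publishes changes; the consuming service keeps its own local copy, so serving a request needs no synchronous call.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Tiny in-process stand-in for a real message broker (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver the event to every handler registered for this topic.
        for handler in self._subscribers[topic]:
            handler(event)


class CustomerService:
    """Owns customer data and is the only writer for it."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._customers: dict[int, str] = {}

    def rename(self, customer_id: int, name: str) -> None:
        self._customers[customer_id] = name
        self._bus.publish("customer.renamed", {"id": customer_id, "name": name})


class OrderService:
    """Keeps a local, read-only copy of the customer names it needs."""

    def __init__(self, bus: EventBus) -> None:
        self._customer_names: dict[int, str] = {}
        bus.subscribe("customer.renamed", self._on_customer_renamed)

    def _on_customer_renamed(self, event: dict) -> None:
        self._customer_names[event["id"]] = event["name"]

    def order_summary(self, order_id: int, customer_id: int) -> str:
        # Reads only local state: no call to CustomerService at request time.
        name = self._customer_names.get(customer_id, "<unknown>")
        return f"order {order_id} for {name}"


bus = EventBus()
customers = CustomerService(bus)
orders = OrderService(bus)
customers.rename(42, "Ada")
print(orders.order_summary(7, 42))
```

The trade-off is that the copy is only eventually consistent with the owner, which is fine for data like display names and not fine for things like account balances.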

[-]

jenkinsleroi@reddit

This is probably typical, because people are afraid to duplicate data and distribute state. And can't get past CRUD.

[-]

Clem_l-l_Fandango@reddit

The old tightly coupled, low cohesion pattern, gotta love it 😆

[-]

TempleBarIsOverrated@reddit (OP)

Unrelated, but love the username.

[-]

gfivksiausuwjtjtnv@reddit

Distributed monolith. Worse than either option. And there’s no realistic way you can untangle that into an actual distributed system.

[-]

JaySocials671@reddit

There is. The realistic way is patience.

[-]

Stephonovich@reddit

The main problems with that approach are the high likelihood of data duplication without any hope of referential integrity, and the vanishingly small chance that the teams have any clue how to properly model a schema, write performant queries, etc.

[-]

scodagama1@reddit

The issue with "micro"services as done by companies that actually need these architectures (Amazon comes to mind, I used to work there) is that there's nothing micro about them. A typical microservice that's part of Amazon's or AWS's product is a fully featured application operated by a 5 person team.

So unless the company has 50 engineers who can be split into 10 teams (8 owning one "micro"service each and the remaining 2 building tooling to survive that craziness), this architecture is pointless. Microservices are a solution to organizational problems, not some magic architecture. IIRC Amazon mandated the switch to that architecture when the compilation time of their monolith started approaching the 20-hour mark, a clear indicator that something had to change if development was to continue at a fast pace.

[-]

zuilli@reddit

> compilation time of their monolith started approaching 20 hours mark

Holy shit, and I thought the 1.5h build time of one of the projects I worked on was bad

[-]

stringbeans25@reddit

What language are you all using? These build times are insane!

[-]

gorliggs@reddit

Same. We have "microservices" to the same database with a "schema sync". Notice how those who implement these designs never stay and somehow get "staff" and "principal" roles.

It's rare to work on a system since you get stuck maintaining the cluster the last person left.

[-]

Bstochastic@reddit

Sounds like a distributed monolith. Worst of both worlds.

[-]

Clem_l-l_Fandango@reddit

Ugh, microservices for style points are painful. There are a few unique circumstances that could warrant them, maybe if you have some heavy calculations, or a rules-engine type of service that’s too heavy to run on the main or background threads of your main app.

I laughed pretty hard when Amazon Prime announced they reduced their cloud bill by ~80% by going back to being a monolith (apparently sending video data across thousands of wires was much more expensive than just scaling resources).

[-]

This-Layer-4447@reddit

those are not microservices, just SOA

[-]

DizzyAmphibian309@reddit

At our company we call that "Promotion-driven design" and it's incredibly hard to deal with. Sure you can say things like "it seems like overkill to me" but it's hard when you're dealing with bad faith actors. I had to fight for years to get a "cellular rearchitecture" project shut down because the "selling points" in the proposal could be addressed by much easier, cheaper, and lightweight alternatives. It wasn't until I found a hard blocker to the proposal that the important people started listening. That and of course, money. I crunched the numbers and suddenly adding in some extra "deployment safety" instead of rearchitecting the system to reduce the blast radius of bad deployments sounded a lot better.

[-]

Stephonovich@reddit

It’s worse than that, IMO. The increased overhead from all of the orchestration layers and the increased latency from all of the remote calls often leads people to mistakenly assume that their app is very demanding, and they’ll use that as proof of the necessity of further splitting up of services. That, and the upsettingly small percentage of devs who profile their code at all.

This is how you get shit like the Vercel crowd complaining about their $5000/month bill, and when you look at their consumed compute and bandwidth, it could’ve been run on an RPi without much fuss.

[-]

Sparaucchio@reddit

Microservices also help consultancy companies bill lots more hours.

[-]

355_over_113@reddit

The interview industry is geared towards young people who have time to practice and regurgitate the interview materials. This is because the industry favors young people who would be compliant to management instructions because they are too inexperienced to see their managers' shortcomings or mistakes.

[-]

anuj_pardeshi_19@reddit

This is an incredibly normal career. In fact, you just described the reality for 90% of senior developers who aren't at a massive tech company or a high-frequency trading shop.

You're looking at this as "missing the boat" on high-load systems. I see it differently. You've spent 15 years doing the actual hard work: raising the floor, stopping teams from doing dumb shit, and cleaning up messes left by others. That's a much more stressful and arguably more valuable skill than just tweaking a load balancer. It's the kind of work that leads directly to burnout because you're fighting entropy, not just writing code.

The challenge isn't your technical skill; it's managing the immense cognitive load of being the designated "fixer" for over a decade. I actually built an interactive protocol for that specific kind of stress. DM me if you're curious about it. Looking back, which of those "clean up" jobs was the most draining for you?

[-]

puremourning@reddit

HA and load are not … the same .. or even related. I think not getting that distinction is much more of a concern.

[-]

ScoobyDoobyGazebo@reddit

It's quite common in FAANG to have systems that need to be highly available, planet scale, and low latency everywhere.

In practice, the requirements occur together often enough that I think lots of folks just treat it as one generalized grab bag of stuff to know about system design. It doesn't seem like a huge deal.

[-]

kohossle@reddit

Thanks for this question. The discussion gave me some perspective, since I'm currently studying for system design interviews.

[-]

rundef@reddit

If you want to work on more advanced systems, you actually have to find a job where you will do that (and prep accordingly)

15YoE here as well. I spent the first 11 years working on web stuff just like you.

I started working for a trading firm 4 years ago. I now work on distributed systems and low-latency stuff.

[-]

kohossle@reddit

Is your work more difficult now? Or more fulfilling?

[-]

TempleBarIsOverrated@reddit (OP)

I'm interested in how you got in. Prepped the hell out of it with DSA and system designs, or did you take a less intensive approach?

[-]

rundef@reddit

Honestly, my timing was good: it was during the 2021 hiring frenzy! My preparation was similar: DSA, system designs, easy/medium leetcodes and a few good stories to tell during interviews. Sounds like you already prepared more than me at the time.

Good luck!

[-]

Senior-Secret-7113@reddit

It's very normal. Most engineers never have to work on such things. It's only when you're working for a cutting-edge startup or at Big Tech with billions of users that you need to think about these things.

[-]

ScoobyDoobyGazebo@reddit

It's completely, totally normal. Don't impostor syndrome yourself over random stuff like that.

> most of the time was spent stopping colleagues from doing dumb shit and spent a lot of time building pipelines and setting limits on what could be done manually

> we're a 3 man team for 5 applications

These could all be apt descriptions for 80% of FAANG jobs, too.

Working at scale and/or chasing lots of nines comes down to a handful of technical skills and techniques, much like everything else. Once you get into the team and ramp up on some of that stuff, you'll be surprised how much it feels like every other shit SWE job.

[-]

Schmittfried@reddit

Pretty normal. Most software is boring.

[-]

TempleBarIsOverrated@reddit (OP)

Honestly, that's ok for me. Preparing for interviews and doing them just gives this big feeling of "shit, I've never needed any of this in over a decade.".

Learning new things just to never use them is quite demotivating. I've stopped doing that for a few years now.

[-]

thephotoman@reddit

And this is why we keep saying that the process is broken.

The real problem is that we're unwilling to try to do better when we're on the hiring side. We keep coming up with leetcode questions because we, for whatever reason, have decided that this is what we "should" do.

But it doesn't work. We know it doesn't work. And we refuse to try to do better.

[-]

NeuralHijacker@reddit

Personally I made sure that I get on to working on the critical stuff that requires all this tech as soon as I could. My concern over working on 'boring' systems is that it's a lot easier to find that your skills have become commoditised and replaced by AI/Indians.

People are a lot less willing to take a chance with money saving tricks like outsourcing when they have to work to a 100 ms latency SLA with 5 nines of availability required.
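
For a sense of what "5 nines" actually leaves you, here is a quick back-of-the-envelope sketch. The function name is my own and the only assumption is a 365-day year; it just converts an availability target into an allowed-downtime budget.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def downtime_budget_seconds(availability: float,
                            period_seconds: int = SECONDS_PER_YEAR) -> float:
    """Maximum downtime allowed in the period for a given availability target."""
    return (1.0 - availability) * period_seconds


for label, availability in [
    ("three nines", 0.999),
    ("four nines", 0.9999),
    ("five nines", 0.99999),
]:
    minutes = downtime_budget_seconds(availability) / 60
    print(f"{label}: {minutes:.1f} minutes of downtime per year")
```

Five nines works out to a bit over five minutes a year, which is less than a single careless deploy usually costs.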

[-]

Clem_l-l_Fandango@reddit

That’s actually brilliant. Buying time until we need to fix all the AI slop that slipped in 😂

[-]

NeuralHijacker@reddit

A few years ago I set myself a few rules for my career:

Work on products that are being sold, not cost centres
Work in regulated industries where there is the possibility executives could face criminal sanctions if they're sloppy with things like security or data privacy
Work in technical areas that are considered leading edge, then move on to the next thing before they become commoditised (used to be DevOps, then big data, now AI)
Reassess the above every 12 months

It's not foolproof but it's done me pretty well so far without having to go through the whole FAANG grind

[-]

BDHarrington7@reddit

I’m re-evaluating my career and have come to the same conclusion about what I should be targeting.

I noticed crypto / web 3 is not in your list, how did you assess that trend?

[-]

NeuralHijacker@reddit

Crypto and web3 are something of an autistic special interest of mine, I've made pretty decent money speculating on them. I did look into working in them as a career but because of the crazy fluctuations in value they've struggled to gain consistent traction with companies that are not just startups until now. The problem is that getting large projects moving in big companies takes time and political capital, and if you do that with a crypto project and then a situation like FTX comes along, it's all for nothing.

I prefer to work in areas that are not surfing the hype cycle. The sort of AI that I work in is not llms for the most part but it enables me to pivot to that easily.

I actually think that crypto is becoming a much more serious proposition now the fluctuations have calmed down a bit. Stable coins are absolutely massive already. People are still very prejudiced against it because of the stuff that's happened before but a lot of people in finance are now taking it extremely seriously.

[-]

Abadabadon@reddit

Just watch some system design interviews. "Oh yea we'll need a load balancer here, a service that will cache, maybe replicate the DB" yada yada yada

[-]

Abject-Kitchen3198@reddit

Same. There are areas that you can't really learn without working on real systems that need them. Scalability and availability are two of them. Sure, I can click auto scale and multi AZ on a cloud provider and claim that I can provide 99.99% availability and infinite scalability.

[-]

LondonPilot@reddit

At my previous job (company is now in administration due to complete incompetence - tech was not our core business, but the incompetence in the core business was enough to send us into administration and the tech incompetence was even worse) they decided to hire an entire brand new DevOps team, where the devs had managed perfectly well without one before.

One of the first things the new team wanted to do was move my software to Kubernetes.

I pointed out that we have less than a dozen staff members using the system simultaneously. And a few thousand clients, of whom only a very small percentage would use the system simultaneously. It was still in development so we didn’t yet have any actual production stats, but it was literally impossible for it to ever have loads that would need any kind of horizontal scaling unless it was really poorly written, and we were able to produce load tests that stressed it way beyond what it would ever experience in production, on a single instance, with no issues.

It took me several weeks to win that argument. I only won it by compromising on letting them use Docker, which was also completely unnecessary in a mostly-CRUD .NET application that had no external dependencies other than those we downloaded from NuGet. By using Docker, we lost the ability to use Azure App Service deployment slots, which meant the DevOps team had to implement their own blue/green deployment system (and spend many hours fixing bugs in it) that did exactly the same job. But at least we didn’t have to use Kubernetes.

The idea that everything has to be highly scalable is complete nonsense. Having said that, there are lots of systems that do need scalability, and I wonder if some of the people who push to support scalability where it’s not needed are indulging in CV-driven development (and others simply don’t know what they’re doing, and really do believe it’s necessary.)

[-]

dogo_fren@reddit

And hot garbage :)

[-]

RogueJello@reddit

And produced by FAANG companies.

[-]

Schmittfried@reddit

Obviously no?

[-]

RogueJello@reddit

Google graveyard says otherwise.....

[-]

dontquestionmyaction@reddit

Google kills more expensive and better made projects daily than most people ever make in their career.

[-]

Deranged40@reddit

Most? No.

Yeah, they produce plenty, but no, not anywhere close to most.

[-]

Existing_Customer392@reddit

I'd say that high availability means keeping the system working whenever the business needs it. If that's 9-5, Monday to Friday, then you have your guidance about availability.

As everyone pointed out, most of the software is pretty boring and with medium to low demand.

[-]

bwainfweeze@reddit

It’s true that there’s more than one way to lose availability and in particular there is a large set of things you can’t do because planned outages stop being a thing.

To the point where you really need generators, battery backup, and to prove that the two companies providing you internet aren’t both using the same trench outside the building where a single backhoe can cost you millions of dollars.

Upgrading hardware at 7 pm stops being a thing. Or like at my last place, it was still a thing but only because maintenance and performance regressions were more tenable outside peak hours (a situation I pushed back on for years until many tasks could also be done during daylight hours).

And there are tasks you need to do right the first time because even like organizing the network cables or power cords could result in an outage.

[-]

Intelligent_Click_41@reddit

I’ve been in critical infrastructure system development since my internship 7 years ago. I find it pretty common that most software developers don’t ever have to think about the rigorous planning and design that comes with these kinds of systems.

The software we develop usually has very strict requirements for availability. Typically each software component in our system has at least redundancy and high availability. Usually these kinds of systems also need to have non-repudiation on event logs and replay capabilities.

One sort of downside to this is that regardless of the typical tutorials/guides you read on stuff, you’ll end up having to deep dive the documentation to see if whatever you want to use supports all these HA requirements.
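
As a rough illustration of why per-component redundancy matters, here is a small sketch using the textbook formulas. The big assumption, which real systems often violate, is that failures are independent; the function names and the example numbers are made up.

```python
from math import prod


def replicated(availability: float, replicas: int) -> float:
    """Availability of N redundant replicas where any single one can serve.

    Assumes independent failures: the group is down only if all replicas are.
    """
    return 1.0 - (1.0 - availability) ** replicas


def in_series(*availabilities: float) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


single = 0.99
print(f"one instance:       {single:.4%}")
print(f"two replicas:       {replicated(single, 2):.4%}")
print(f"three-stage chain:  {in_series(single, single, single):.4%}")
print(f"chain of HA pairs:  {in_series(*[replicated(single, 2)] * 3):.4%}")
```

A chain is always weaker than its weakest link, so a single non-redundant component drags the whole system down, which is why the requirement ends up applying to every component.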

And of course let’s not forget about the joy of air gapped deployments and installations. No cloud is the best cloud.

[-]

bwainfweeze@reddit

About ten years ago I mentioned to someone that I took a distributed computing class in college. Not only had they not, but they also didn’t recall it even being an option. So then I asked everyone in earshot and got the same answer. I took that class in 1994, and it’s still top 4 of “classes I actually used after college”. I have of course had to half keep up with the evolution of things over time but they all are extensions of those principles. Set theory hasn’t changed much in comparison.

I’ve since talked to quite a lot of 30 year olds who say the same thing. No such class. WTF is going on out there?

[-]

SeriousDabbler@reddit

In my experience this is something that developers really care about, and some care enough to do it well. I'm not sure how you can get a chance to practice this stuff, but there are some great enabling technologies you could learn to get ahead of the next opportunity: things like Docker, nginx, cloud-hosted load balancing, container hosting, and elastic compute resources. Spending a bunch of time getting familiar with them could give you an edge when interviewing for your next role.

[-]

MaiMee-_-@reddit

I think you missed being intentional on what you learn and what experiences you pick up along your career path.

I think a lot of people's advice is to "move when you stop learning new stuff"?

If you ask how normal it is... I think there are people who go 10 years stuck in the same place, with no good mentors, no challenging systems, no good engineering practices. And they look... mediocre... to me. But their title/grade reflects that: nothing more than a "senior".

If you aren't in that position, I think you're in a better place. Is it the best place? Probably not. Is it acceptable for your career plans ahead, would be my question.

I was pretty religious about shopping for the right experiences when I was still trying to climb the seniority ladder. Which I think I should probably get back to on my next career move. Hm... Anyway. I've just been in this for 3 years so.. shouldn't know more than you.

[-]

lokaaarrr@reddit

By hours worked, the vast majority of software engineering is for small scale / internal only systems, generally with pretty lax requirements.

Very few people (overall) work on large complex systems serving large numbers of users.

[-]

SongFromHenesys@reddit

I'm jealous. My last 2 years have revolved around designing a backend that will take gigantic spikes of load due to the way our product works (and its marketing).. Caching, indexes, and lookup tables/containers are all I'm thinking about lately, along with begging for upgrades of CPU/RAM on some service plans...

[-]

diablo1128@reddit

I think it's pretty normal when you look at the industry at large and not just big tech companies. 15 YOE here as well and I have only worked on embedded devices for my career.

I have never needed to think about scalability, microservices, or anything like that since there is only 1 person using a device at a time. We did have cloud connectivity on these devices, but that was all handled by a separate team. I would learn these things on the job if necessary, but I have zero interest in doing that in my free time.

Our situations are not exactly the same, but in my case I find it limiting on the jobs I can apply to. I just enjoy working on physical devices over web stuff so I just deal with it.

[-]

daredeviloper@reddit

Just like we aren’t inverting binary trees, we are also not building high scale globally fault tolerant distributed systems that require specialized patterns.

But the companies want it just in case?

[-]

Stephonovich@reddit

It’s vitally important that you realize that while there are a ton of services and layers you will need to mention in the interview, it’s entirely performative, and few of the interviewers could actually run the systems they describe on paper.

Also, knowledge of how much load a given application will actually put on a server - and how much load that server can handle - is near zero.

I had to fix one of our system design interviews recently, because as part of the normal “now scale it up” steps, it stated that the RDBMS needed to be able to handle the dizzying number of… 1 million rows. Oh no, how will my DB handle this massive table that fits entirely into memory, with several GiB to spare?!
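
If you want to sanity-check that kind of claim yourself, the arithmetic is trivial. The row width and server RAM below are assumptions I picked for illustration (1 KiB per row is generous for most tables), not numbers from the actual interview.

```python
GIB = 2 ** 30


def table_size_gib(rows: int, bytes_per_row: int) -> float:
    """Rough raw data size, ignoring indexes and storage overhead."""
    return rows * bytes_per_row / GIB


ROWS = 1_000_000
BYTES_PER_ROW = 1024      # assumed average row width
SERVER_RAM_GIB = 16       # assumed modest database server

size = table_size_gib(ROWS, BYTES_PER_ROW)
print(f"table size: {size:.2f} GiB")
print(f"headroom:   {SERVER_RAM_GIB - size:.2f} GiB of {SERVER_RAM_GIB} GiB")
```

Even if you double it for indexes, the whole thing sits comfortably in the buffer pool of a small instance.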

[-]

snafoomoose@reddit

Very late in my career and I have never worked in high availability.

In fact our group has been pushing back on external contractors who were brought in to implement load balancers and hot backups because our tools simply do not need that kind of expense and trouble and we are going to have to waste our time and effort to undo what the contractors set up because it is not needed.

(The hot backups were needed by other tools in our organization, but the 'one-size-fits-all' mentality had them applying the same solution to our set of tools that just did not fit.)

[-]

Clem_l-l_Fandango@reddit

It depends a lot on how many companies you have worked for; it’s pretty easy to go that long and not have to deal with it.

I would say it’s very important to learn, to understand good strategies if you are faced with it, but also knowing that you might never need it. Ultimately it’s another tool in the chest, and the more of those you have, the better off you are

[-]

Legal-Trust5837@reddit

It's not normal but it's expected if you haven't been intentional in your career moves.

It becomes a problem when you're looking for a higher calibre position than what you've done up to today, as the expectation from you is different, and so is the average level of your colleagues.

For actionable steps, I'd recommend you read: 1) Designing Data-Intensive Applications by Martin Kleppmann 2) System Design Interview by Alex Xu

[-]

Aira_@reddit

Good advice but the book recommendation is garbage, sorry. DDIA is way too deep and the Alex Xu one is just shallow stuff for interviewing. Try Understanding Distributed Systems by Roberto Vitillo.

[-]

Legal-Trust5837@reddit

I gave these two books specifically because I don't know the OP's technical expertise so I knew he'd be able to get value from one of them.

It's hilarious I get downvoted for saying the uncomfortable truth. Guess it rubbed some people the wrong way. Maybe a good look in the mirror is due for some

[-]

throwaway_0x90@reddit

Completely normal depending on what you want in your career. Do you actually care about becoming an SRE one day? If not, then you don't need to care about specifically working on high-availability systems.

[-]

AppointmentDry9660@reddit

I too started at a PHP shop

[-]

thevoid__@reddit

For high-impact positions (Staff Engineer / Architect) it will be a limiting factor. At our company I wouldn't hire someone without that experience for any of those roles, because it's a must-have.

Our scale is not out of this world but there is some scale and there are things you only know how to handle once you have faced them.

[-]

bcolta@reddit

I would say it depends on your career path.

If you work on high-availability systems you learn at a faster pace. For me, when I transitioned to a big project like this, my knowledge grew exponentially.

[-]

UnbeliebteMeinung@reddit

You are confusing/swapping high-availability and high performance a lot in your post.

[-]

Agitated_Run9096@reddit

And yet it more accurately defines what the OP is asking. Welcome to the world of SLAs.

[-]

Dear_Philosopher_@reddit

Absolutely not

[-]

Idea-Aggressive@reddit

It’s normal. Everyone has their own experience; at the end of the day it comes down to business needs. Of course there are false expectations, which lead to teams failing to fill a position for months because they’re expecting a fantasy. Do your best and that’s it.