What's a system design mistake you made in your career?

[-]

thisismyfavoritename@reddit

as tempting as it might seem a full rewrite is probably never the right thing to do.

Often you can only generate value/gain any traction once you have feature parity with the product you are replacing, while you also need to plan for and support other new features (which are the reason why the rewrite happened in the first place).

[-]

spacemoses@reddit

I've seen some pretty wicked game dev projects where I think a rewrite is justified. Either that or it will be multiple rounds of heavy refactoring.

[-]

thisismyfavoritename@reddit

in some cases there's no other way, but the business will have to be ok with 2 things: there will be no benefit/profit made from the rewrite until feature parity is achieved. This more or less forces "waterfall" development, because there won't be any interest in using the unfinished product over the working one and thus you have all the downsides of waterfall, specifically the rewrite might not even be needed by the time it's done.

Secondly the business should be ok with some "API" breakage, especially when some of the issues caused by the bad design leak through the "API".

I put API in quotes because it isn't just API in the usual sense, it's ANYTHING observable that your process exposes. I've found one very painful example is byproducts like files.

[-]

MusikPolice@reddit

This is especially true when test suites, comprehensive documentation, and experienced developers are missing from the project that is being replaced.

In those cases, there’s no way to test that the replacement system actually does what it’s supposed to do, and no way to learn about all of the edge cases and bugs that have been patched out of the legacy system.

[-]

kutjelul@reddit

In my career I’ve dealt with countless ‘seniors’ who’s first solution to anything is a proposed rewrite. They completely overlook the point you mention

[-]

dweezil22@reddit

Deeply and honestly answering: "What is valuable about this system that prevents us from just quickly rewriting it?" is something that almost never happens, which is a shame.

You'll see ill-fated rewrites that don't fail b/c they only discover this stuff after the fact. But you'll also see ill-fated non-rewrites that keep the legacy system out of pure fear, rather than an understanding of why.

[-]

Mr_J90K@reddit

This is because "we need a rewrite" is typically said when the original developers are either unavailable or overwhelmed, and the current team hasn't yet acquired enough tribal knowledge to manage the system effectively. As a result, they often can't distinguish which parts are valuable enough to keep and which represent past mistakes.

[-]

ShroomSensei@reddit

Do a medium refactor/rewrite of our business logic framework right now. Completely regret it. Not because it wasn’t the right thing to do but I simply am not given enough time to commit to it so it’s starting to get rushed and some of the foundations are starting to not be laid correctly.

[-]

la_cuenta_de_reddit@reddit

But that's the reason they were bad to begin with.

[-]

ShroomSensei@reddit

Nah the reasons it’s bad (not even bad, just something we can’t deal with anymore) is because of unknown unknowns. We didn’t know it was going to blow up into dozens of microservices, we didn’t know our support team would get laid off, we didn’t know our company would end up canning tools we used heavily, etc etc.

[-]

doteka@reddit

I feel this on an emotional level. We embarked on a rearchitecting project that made a ton of sense when we had 6 teams and 40 engineers. It makes much less sense with 3 teams and 20 engineers, but now we’re already in limbo.

[-]

ThePoopsmith@reddit

The second system problem was described in “the mythical man month” literally 50 years ago. Yet tech leaders so often still think their project will be the exception. It’s always been a mess whenever I’ve seen it.

[-]

undo777@reddit

I actually had a highly successful rewrite recently, but it was a very isolated and rather small component. The issue with the original implementation was that a few system design mistakes made at the beginning severely handicapped the ability to make it work the way it should, and over time people added hacks to get around those issues, which made it even more difficult to maintain. One example was that the parallelization didn't take into account that a part of the work was more efficient as a single process. What did folks do to get around that? Added a semaphore, of course! Well, now you have a multi-process system with semi-random serialization on that semaphore, good luck figuring out why it is being slow in some cases.

My rewrite fixed this and a bunch of other random issues - also carefully throwing out some of the bells and whistles that people thought "would be useful some day" - and yielded a major improvement (latency, resource use, debugging). Kind of a unicorn situation and I hate to take quite a few stabs at it due to those bells and whistles + a conservative dev on the team, but it does happen once or twice in a lifetime.

[-]

sarnobat@reddit

Using neo4j instead of plaintext files for data storage

[-]

mrfoozywooj@reddit

Allowing developers to write their own infrastructure code instead of simply modifying and caring for infra code handed off by the cloud engineers / devops teams.

literally every developer written piece of infra code ive seen is a total shitshow of dependency issues, maintainability issues or just flaky, crappy, unsecure infrastructure.

[-]

donatj@reddit

We had a system that ingested large JSON blobs, made some simple decisions based on their content and forwarded them on. It was very old, creaky, and written in PHP. I was insistent that a Go rewrite would be faster.

I was given the chance to build a little prototype, and the initial pass using the standard library JSON parser was roughly 3x slower than the current PHP version. Undeterred I tried many different JSON libraries claiming improved performance. After a week or so of fiddling with the idea the best I could achieve was still just slightly slower than the current version.

I went back, tail between my legs, and explained. We had a pretty good atmosphere though that allowed experimentation and failure, so there was no real bad outcome.

I believe the PHP version is still in use today, surprisingly difficult to beat.

[-]

dweezil22@reddit

Where's the mistake? You had an idea, honestly prototyped it, and accepted your data driven results rather than putting your thumb on the scale. This is something to be proud of!

[-]

donatj@reddit

The mistake was in my insistence that it would be faster/better. It wasn't a component we had any good reason to really touch. It was doing its job. I was just insistent I could rewrite it and Go and it would just magically be faster, and I presented it as such repeatedly.

[-]

apartment-seeker@reddit

The mistake was in my insistence that it would be faster/better without any real evidence.

How would you have gathered evidence without building it?

I am surprised the PHP version was faster, that's very unexpected.

[-]

dweezil22@reddit

Makes sense. Btw this is like the perfect interview answer to a number of shitty interview questions like "Tell me about a time that you failed" or some such.

[-]

tonygoold@reddit

I don’t consider that a shitty question, although I prefer to ask about “a project or design decision you were involved in” that went badly, because I want to hear how they analyze a situation like that. Anyone senior or higher should have mistakes they can reflect on.

[-]

dweezil22@reddit

Fair, I forgot the sub we're on. For a sufficiently experienced candidate I agree that it includes a valuable test of humility.

OTOH I've seen it used interchangeably w/ juniors, and I don't think it's appropriate there.

I'll never forget the dick from Lockheed Martin that asked me "What's your greatest weakness" during my loop there and when I said "Inexperience" used it against me to suggest they shouldn't hire me (despite it literally being a college undergrad new hire loop).

In all cases it's a dangerous question, b/c a bad faith interviewer can use it to gain ammo to sink you. As the interviewer, I know I'm working in good faith, but I recognize the candidate in my power doesn't know that. Which means I try to reserve it's use for cases where I think that humility signal is truly important (typically staff+ or TL roles).

[-]

s0ulbrother@reddit

Way to hard on yourself here. I don’t think any dev has never gone “we should rewrite this thing “. I’m actually surprised they said go for it. They probably agreed with you in theory

[-]

smacksbaccytin@reddit

Yeah this is the perfect outcome. In my experience most engineers wouldn't even test it, just rewrite it and blindly say its faster and better and deploy it and someone else learns the hard way and has to deal with it.

[-]

BetterWhereas3245@reddit

PHP's json_decode is just a wrapper on a highly optimised C library.
There's a nontrivial amount of PHP language features that are just a thin wrapper of a C library, and the language itself is also written in C.

[-]

coyoteazul2@reddit

Maybe the php version is not parsing the json, but doing keywords search instead? That'd make the most sense. No need to parse the whole thing if you are not going to use the whole thing.

[-]

donatj@reddit

The PHP version used json_decode into associative arrays. It's just surprisingly quick at it.

[-]

AaronBonBarron@reddit

Because it's a thin wrapper over a highly optimised json parser with next to zero error recovery written in C

GitHub link

[-]

coyoteazul2@reddit

That documentation mentions it has depth, so it really isn't parsing the whole thing (depending on what depth you gave it)

[-]

donatj@reddit

Setting a max depth just results in NULL and an error, not a limited parse. json_decode is just dang fast.

https://3v4l.org/F9lUj#vnull

[-]

tooparannoyed@reddit

Default depth is 512. It’s crazy fast. Based on my own experience, you’ll have to go with c if you want better performance. There’s also python libraries that might be better depending on the use case.

[-]

Intelnational@reddit

It’s an interesting case, how come PHP was faster than Go though? I did not think such a thing could be possible in any script?

[-]

plhk@reddit

Why not? Php’s json decoding is written in C and dicts are a native type for the language

[-]

missing-comma@reddit

This is quite old now. But I'm a bit curious, did you check if those Golang libraries attempted to do zero-copy parsing?

My first impression is that the slowness might be caused mainly by too many unnecessary allocations and copying. Other than this, I would imagine the Go code is/was just not being optimized enough by the compiler or something.

Quite interesting and unexpected at a first glance.

[-]

Internal_Outcome_182@reddit

go lang json parsing is worse.. than any other language, so that's understandable.

[-]

casey-primozic@reddit

Hmmm... is this trure?

[-]

await_yesterday@reddit

The parsers for JSON, YAML, and XML have serious security issues: https://blog.trailofbits.com/2025/06/17/unexpected-security-footguns-in-gos-parsers/

[-]

cant-find-user-name@reddit

I don't know about any other language, but go's json performance is atrocious

[-]

behusbwj@reddit

This is a perfect example of why algorithms and data structures matter. Many people will simply think the “faster” language is better, but a bad/unoptimized algorithm can make any language slow.

[-]

tooparannoyed@reddit

Quite interesting and unexpected

Good or bad, that’s an apt description of PHP.

[-]

son_ov_kwani@reddit

If it works don’t touch it.

[-]

dealmaster1221@reddit

You tried simdjson? Bet that would be way faster?

[-]

klowny@reddit

I see a lot of newbies believing the "go is faster" trap. It's pretty slow for a compiled language, it only seems fast by comparison if you're coming from interpreted languages.

But interpreted languages usually have golden paths that are offloaded to c with little overhead, which is very very fast. JSON parsing commonly is.

Go has pretty limited ability to offload to c, so it often loses to "slower" languages for these tasks.

[-]

Low-Tip-2403@reddit

So many feels in this! I’ve done the same…

The lesson I learned is actually none of the “new” stuff is faster… almost every time legacy wins

[-]

dchahovsky@reddit

The mistake of having too many micro services. Having a micro services per single api or a function. In some cases it has benefits, but the lifecycle, version and other management of too many entities is usually awful. And many deployable entities add a lot of additional (system) strain on the resources. Don't split logic to separate deployable entities without a good reason (e.g. different scaling, etc), just modularize it inside and be prepared to split.

[-]

xabrol@reddit

Yep, in the process of converting 20 microservices back into a mono repo as we speak. Its one product, never needed them to be separated, added tons of maintenance costs.

[-]

baklaFire@reddit

microservices in mono repo are still microservices

[-]

xabrol@reddit

We are unrolling a lot of apis into one

[-]

pfc-anon@reddit

Ah, this is my wheelhouse, I once inherited a project with 32 μServices, most of those were jank and could've existed as a library or part of a larger service. My predecessor went batshit crazy with all of this unnecessary complexity. This was from breaking the monolith they were working with. I had to propose re-monolithing of these to make the developer's life easier. Before I left we were down to 18, hopefully the team and my successor got it under control ✌️

[-]

MusikPolice@reddit

The related issue here is that developers don’t know what they don’t know.

It’s almost impossible to intuit the microservices that might eventually be needed when the project is still in the design phase. I’ve seen plenty of cases where the wrong divisions have been made, which can lead to a production system that is less optimal than a monolith would have been.

In my hard won experience, it’s always better to start with a heavily instrumented monolith, and then to split as needed based on observed bottlenecks rather than trying to intuit the correct splits up front.

[-]

thesame3@reddit

This. I’m currently maintaining 13 micros services as a single developer. That system was built by a team of 3 people. No micro service receive more than 1k requests a day.

[-]

paynoattn@reddit

I worked for a company that had a microservice for CRUD operations around phone numbers. I argued it can just be a code library, nope they really wanted a microservice.

[-]

Hziak@reddit

At my first job, about 4 months in, we decided to build an in-house CRM. About a month later, every other developer in the company quit. They asked me if I could continue on my own, I was too scared to say no to management, so here I was, 5 months into my career as the sole developer on a brand new PHP web application. I had never built an API or any other kind of web app before.

To this day (apparently, over a decade later, they’re still using it), there’s still no back-end authentication on any requests, including the many hosted resources that generate lists of every lead, job completed and financials for the company. The company has an extreme churn rate of people who take what they learned and start competing companies, as well, and require people to use their own personal devices for the job. Anyone with the most basic web development knowledge could very easily bookmark the I’ll for a daily list of leads filtered by geographic location and poach the entire marketing and sales funnel for their own business or sell it to whomever.

Oops…

[-]

agk23@reddit

Damn. I sure want to avoid working for this company. Can you share a link to them so I know to avoid them?

[-]

Hziak@reddit

Unfortunately, the product is hosted somewhere totally different from the company’s branding and is only really known internally. While it is available to the public Internet, the link you’re looking for would be pretty annoying to find without either direct FTP access to the web server - at which point, just take the plaintext mySQL credentials from the DAL file… - or having worked there long enough to unlock the ability to not get redirected from the page that makes that request 🤣🤣

[-]

Miniwa@reddit

Once I implemented a kind of "behavior-as-configuration" system where you could modify and add layouts, menus, data sources and add "transformation filters" on the data, straight from a json file. The benefit, in my head, was that administrators and users could change what they needed without getting a developer involved. This kind of "meta configuration" turns out to be really hard to maintain, and also is a headache to work with because you have data migration issues on top. And the benefits are illusory because no user will want to learn your complex system that lacks tooling and documentation anyway. So in the end you're the one implementing changes anyway.

Now I believe code should stay code, and that configuration should be thought of as another type of API aimed in a different direction from your user facing API. Design it to be as simple as possible, but not simpler.

I tend to err towards "specific" rather than "abstraction" these days. Good abstractions are VERY useful but early on its so hard to predict where you will want them.

Oh and not thinking about data early enough. Code mistakes are easy to fix. Data mistakes not so much.

[-]

BetterWhereas3245@reddit

I'm so glad I was able to discourage management at my previous job of doing something like this, for SQL queries! They wanted to build an unsanitized SQL query input field into some admin panels and it was a massive maintenance and security nightmare.
They knew it would be a serious security flaw that would not pass an audit. What got them on my side was explaining that the intended users would never be smart enough to use the system on their own and we would be doing that maintenance work in the end.

[-]

MusikPolice@reddit

Heh. I’m working with a similar system right now. It uses JSON files to define a data and API schema that are used to dynamically codegen a cluster of microservices.

In theory, it lets customer teams quickly set up a cluster of web services that do exactly what is required. In practice, the learning curve is steep as a half pipe, and the system fights you any time you try to stray from the narrow path that it was designed to service.

[-]

await_yesterday@reddit

This is the "configuration complexity clock": http://mikehadlow.blogspot.com/2012/05/configuration-complexity-clock.html

[-]

csanon212@reddit

I worked with something very similar. Another team had a JSON config that allowed you to drive a page layout with dynamically built components. There was no room for custom components. Our business requirements called for a table with multi select. We came back to that team who said it was not possible and they added it to their backlog and said it would be 8 weeks. We needed a UI built yesterday. I made my own multi select table and made the whole site in 2 days. I kind of ruffled some feathers as now that team had one less "success story" to trot out as I "went rogue". The UI was the last thing on this project which drove 7 figure revenue over the next year. The One Generator to Rule Them All project got killed like 3 months later.

[-]

Lmhjpn@reddit

Same thing!! A talented junior engineer convinced leadership to implement this Json config for web forms and they ate it up thinking it would allow "self serve" and scaling of adding a lot of different forms. It is much more complex than writing the web code and of course doesn't handle all the UIs we want to add and needs maintenance. Very few people understand how it works and it has definitely not made things faster. I completely agree with code is not config.

[-]

Punk-in-Pie@reddit

Wow. As an engineer with 5 YoE currently, that Jr was me on my team previously... Good to know I'm not the only one that over-engineered in this way.

[-]

Potato-Engineer@reddit

I worked on an internal product that served about a half dozen teams at first, and the product leaders went for a JSON-configured system "so teams could set up their own pages quickly."

I talked to the UI's team lead later; he firmly believes we could have gotten going faster and more reliably by just directly building the pages those other teams wanted, rather than building a system and then configuring it.

[-]

BDHarrington7@reddit

Data mistakes not so much.

This is one of many reasons why any other sql db >>> SQLite. The latter will happily accept a string in a column defined as an int, by default.

[-]

gnuban@reddit

This is very common to see, and I think it's really easy to end up in this trap. The tendency of a very generic system to become sort of a bad version of the original development environment is sometimes called "the inner platform effect". There's a Wikipedia article on it and some funny anecdotal stories on RheDailyWTF.

[-]

horserino@reddit

Code mistakes are easy to fix. Data mistakes not so much.

This should be printed and put on the walls of every software shop

[-]

ShroomSensei@reddit

Including Kafka in the first iteration of a feature. Made it stupidly complex for no reason and ended up being the complete downfall. All we really had to do was work the PM and reduce the scope of the feature. Like maybe we shouldn’t allow customers to request 25gb+ of csv data…

[-]

BroBroMate@reddit

Yeah Kafka is one of those technologies where you trade complexity for what it can do.

And generally, if you're not moving multiple megabytes of data per second, the complexity isn't worth it.

But when you need it due to throughput, then Kafka is a godsend.

I got hired for my current position for my Kafka experience, and the first thing I realised in the new role is "... you don't need Kafka".

But the VP had read a white paper, so my opinion was disregarded, so I spent my time trying to teach people how to work with Kafka, and how to mitigate the complexity.

A few years on, the company still doesn't need Kafka lol.

[-]

meisyal@reddit

This is interesting.

Could you share a bit about how to mitigate the complexity?

[-]

BroBroMate@reddit

Step 1. Explain that Kafka is not a message queue. Step 2. Thoroughly explain how consumer groups work. Step 3. Explain the various strategies around offset committing. Step 4. Explain how producer batching increases throughput. Step 5. Explain how Kafka maintains absolute ordering on a partition basis, and how key based partition assignment works.

[-]

Stephonovich@reddit

Step 1. Explain that Kafka is not a message queue.

THANK YOU. This seems to be so hard for devs to grasp.

[-]

Pristine-Pride-3538@reddit

Do you know of any source (besides their respective documentation) that really drives this point home?

I've been researching this topic for a use case at work (message queue to provide some buffering if services go down) and my colleagues seem very intent on reaching for Kafka, probably because they hear its an industry giant.

I already know that Kafka is an appended log and not a relational database like RabbitMQ. That logged messages in Kafka aren't removed from the log when consumed, unlike RabbitMQ. Is it that last part that is crucial when distinguishing it from a message queue? Or are there other factors that are significant?

See, my colleagues don't seem to think that those factors I mentioned disqualifies Kafka from our use case.

Might I also add, we are absolutely not going to process hundreds of thousands of messages per second. Ideally, we'd like to have a separate message queue service for like a dozen API services. Therefore, I'm also thinking about overhead.

[-]

BroBroMate@reddit

Yeah a great way to explain the difference is what MQs can do, that Kafka can't - I mean, you can try to accomplish these things with Kafka, but it's clunky as fuck.

The obvious one is synchronous produce/consume, where one app wants to wait until another app confirms successful processing of a message.

Another is message routing - e.g., if the message headers contain Foo, then only Bar app should get it.

Or guaranteed delivery, with retry and a DLQ.

You can do these things in Kafka, but they're hard and often involve intermediate apps.

[-]

hooahest@reddit

Huh. Good to know, those are all fundamental features in RabbitMQ

[-]

BroBroMate@reddit

Yeah, so when someone wants to implement synchronous messaging on Kafka with two topics and a producer/consumer on each side, I give them a pamphlet on RabbitMQ.

[-]

BroBroMate@reddit

Best way I can explain it is that it's basically a series of big dumb tubes.

[-]

Stephonovich@reddit

Oh, so it’s like the internet!

[-]

towncalledfargo@reddit

Step 4. Explain how producer batching increases throughput.

We actually ended up implementing batching on the consume side of things, i.e keep consuming messages with same DateTime created, event type etc and then when you see a message that doesn't follow that trend, commit the offset.

[-]

meisyal@reddit

Thanks for sharing. You are really explaining the Kafka fundamentals to them.

[-]

silentxxkilla@reddit

I told us not to upgrade the TF version from 0.x because it would introduce mega breaking changes, and our deadline was looming urgently. The system is still on 0.x. Stuck there forever likely.

[-]

jake_morrison@reddit

Not my decision, but my client’s. The non technical founder asked his friends in Silicon Valley what tech stack he should use, and chose Django and MongoDB. This was the early days when Mongo had just been released, and he wanted to be “web scale” for his social restaurant guide website.

Storing restaurants and related data as a single blob was a performance problem. Adding a review to a restaurant meant reading everything from the db, adding a line of text, then writing everything back. If two people were trying to comment on the same discussion, there would be conflicts.

In order to get its high performance numbers in benchmarks, Mongo by default used “running with scissors” mode, where it would not sync to disk immediately. Turned out that the Django driver for Mongo would silently discard errors. The result was a bad performance, lost data, and ultimately a badly corrupted database.

I still have PTSD from that project.

[-]

csanon212@reddit

My retirement side income is going to be going through legacy apps built on NoSQL databases and converting them to SQL

[-]

considerfi@reddit

And yet we're supposed to always pretend in system design interviews that we considered noSQL for the main database.

[-]

thatssomecheese8@reddit

Goodness, I hate that. I really badly want to just say “I’m gonna use Postgres because it just works” for every single case

[-]

considerfi@reddit

Yeah seriously. I just want to say "I'm going to use postgres." Then pause, stare them in the eye and say "Because."

[-]

Cube00@reddit

But schemas limit sprint velocity /s

[-]

Stephonovich@reddit

Now you’ve made every DBA reading this violently twitch, good job.

[-]

enygmata@reddit

Do they still exist?

[-]

Stephonovich@reddit

They do at companies who don’t want to fall apart as scale. Sometimes they’re called DBREs.

[-]

NaBrO-Barium@reddit

Aye, and if you have a problem with that we’re not a good fit. Peace

[-]

ikeif@reddit

I don’t think I have ever seen something in Postgres -> NoSQL. But I have seen a lot of NoSQL -> Postgres/MySQL.

[-]

catch_dot_dot_dot@reddit

You can introduce them for a reason. Key-value, columnar, and graph DBs have their place if you do an analysis and determine the performance/usability increase is worth the extra maintenance. Unfortunately the maintenance is usually underestimated.

[-]

ikeif@reddit

I feel like "that's a future problem!" is the usual thought in the matter.

I'm currently working on migrating an old DocumentDB -> PostGres (and also python -> Golang, but that is for company alignment, not because python couldn't perform)

[-]

stringbeans25@reddit

To be fair there is a certain point where a single Postgres instance might not be worth the maintenance/complexity overhead. I feel like if your app is truly going to see consistent >100k IOPS, you should consider NoSQL options.

[-]

Stephonovich@reddit

Why? An NVMe drive can hit millions of IOPS, and Postgres can make use of it. Source: I’ve ran precisely that.

[-]

stringbeans25@reddit

I’m actually interested in a write up if you have one!

No argument on my part from what IOPS an NVMe can hit. >100k IOPS is just a general guideline I have in my head for when to even start thinking about NoSql. 99% of applications won’t hit anywhere near that with human traffic.

[-]

Stephonovich@reddit

tl;dr EnterpriseDB Postgres BDR active-active mesh with 5-7 shards (I forget exactly how many), each primary node having N vanilla Postgres read replicas attached to it. The primaries had io2.BlockExpress drives, and the read replicas were all i4i instances with local NVMe. Total mesh peak traffic was something like 1.5 - 2 million QPS.

I don’t particularly recommend anyone do this, as it’s a huge pain in the ass to administer, but it was also the only thing keeping the extremely chatty app from falling over.

[-]

stringbeans25@reddit

This is an awesome setup! I’ve only setup single primary with a single read replica myself which lessened the maintenance overhead.

My original comment was geared towards single instance setups but definitely a good callout that multi-instance Postgres is an option!

[-]

meltbox@reddit

I mean sure but can we stop pretending that those nosql solutions aren’t just optimized sql-like solutions that fit your use case more precisely?

I mean if you need the relations then you still have to encode them in some way. You don’t magically obviate them by using magic nosql.

[-]

ings0c@reddit

I mean if you need the relations then you still have to encode them in some way. You don’t magically obviate them by using magic nosql.

Most data is relational. That’s not a factor when choosing SQL vs NoSQL.

If your access patterns are known at design time, you can build efficient data structures ahead of time which captures those relations, avoiding runtime joins.

For truly write heavy scenarios that would benefit from horizontal scaling, they can be a better choice than a SQL database, but rarely are. Nearly everyone who thinks they need that degree of horizontal scaling doesn’t.

[-]

stringbeans25@reddit

They are typically entirely different underlying data structures so I think optimized sql-like is a bit reductive. I do 100% agree you still need relations and the NoSql solutions I’ve seen work typically have very well defined use cases and you build your API’s very specifically around those use cases.

[-]

illuminatedtiger@reddit

That's the correct answer. If you're proposing MongoDB, in 2025, as part of any solution you're being willfully negligent.

[-]

casey-primozic@reddit

Serious question. Do interviewers deduct points from you if you choose Postgres? WTF kind of bullshit is this?

[-]

Bakoro@reddit

It is incredibly dependent on who is interviewing you.
Reasonable people just want you to be able to justify whatever decision you make and know that you are thinking about how to use the right tool for the job.
Some people have their favorite thing, and will absolutely deduct points for not doing their favorite thing.

[-]

thatssomecheese8@reddit

They usually want you to “justify” why SQL is good and NoSQL is bad for the situation, or vice versa.

[-]

casey-primozic@reddit

Serious question. Do interviewers deduct points from you if you choose Postgres? WTF kind of bullshit is this?

[-]

considerfi@reddit

No not that directly but it seems common to discuss "why you chose postgres" with interview prep materials suggesting that if you have unstructured data you would use noSQL. But like.... All data is "unstructured" until you structure it. And anyway postgres can handle json, so you can mix structured and unstructured data anyway so it's perfectly fine to go with postgres even if most of your data is unstructured.

[-]

ashvy@reddit

LowSQL or HalfASSQL when??

[-]

old_man_snowflake@reddit

you can just say mysql it's ok.

[-]

NaBrO-Barium@reddit

No… it’s oursql comrade

[-]

Max_Svjatoha@reddit

Pronouncing sql as "sickle" from now on ⚒️

[-]

audentis@reddit

HalfASSQL

Feels like something to put on my resume to filter the bad recruiters out.

[-]

SpiderHack@reddit

That's why I love android, I use sqlite like a sane person with an ORM on top, but can write my own custom sqlite helper class if needed. -done, correct answer.

Nothing else is really acceptable cause they have such good sqlite built in for free to all apps

[-]

meltbox@reddit

Is there even a use case where the main database should be nosql outside of “we don’t know what we need so we used nosql so that we can make it someone’s nightmare later.”

[-]

Punk-in-Pie@reddit

I think I may be an outlier here, but I do like NoSQL at MVP for start-ups for exactly the tongue-in-cheek reason you stated. Being able to add in columns adhoc is really nice while the business is finding its way. Once things stabilize and you know what the final(ish) form of your data is then you can refactor into whatever fits best. However I think it's also important to have that plan be very clear on the team.

[-]

considerfi@reddit

"i heard it was the new cool thing so I'm gonna be new and cool"

[-]

SuaveJava@reddit

For the simple yet high-scale systems in those interviews, a key-value store is sufficient. That's also the case in a lot of real-world systems. Yet frankly, most systems won't ever reach the scale where Postgres becomes insufficient.

[-]

considerfi@reddit

That's another thing we always pretend, that their startup is definitely going to need to scale to Instagram level, and we'd best make sure we plan for that today, with 1000 DAU.

[-]

heavymetalengineer@reddit

more microservices than customers

[-]

Wheezy04@reddit

Lmao. My last job inexplicably used ddb for the most relational-style data imaginable and then on top of that used the same ddb table to store like 12 entirely different data structures and used an awful complex prefixing strategy on the sort keys to allow searching for the different data types. All of the queries other than single lookups would have been massively faster with access to a table join.

[-]

twnbay76@reddit

Doing that now lol

[-]

casey-primozic@reddit

Or in 2025 terms

My retirement side income is going to be going through legacy apps built with vibe coding and making them work

[-]

mb2231@reddit

NoSQL is like a plague.

It's a tool that has a pretty specific use case that overzealous engineers try and use because it's a shiny object. Literally 95% of the issues I see in RDBMS's are just poor overall design or poor query optimization.

[-]

meltbox@reddit

Nosql seems so idiotic to me. I don’t work with databases but why would I need a database to store unstructured data…. It boggles the mind.

I mean it’s basically just a giant map in extended memory I guess, but why doesn’t anyone actually just say that. Instead every answer about what you would use one is very vague and never actually gives a concrete use case.

To me nosql is just a bad term. It’s basically “database that can do anything that isn’t exactly sql but could include sql like relations”.

[-]

Gxorgxo@reddit

I had a similar experience, Rails and Mongo. The stack was decided by the first engineer that worked a few months and then left for Google. The application worked with very relational data so we had to create a whole infrastructure in Ruby to make Mongo work as if it were SQL. Eventually with a lot of effort we migrated to Postgres.

[-]

casey-primozic@reddit

That is the stupidest shit ever. Rails and Postgres have been working together so well for more than a decade. That engineer should have been fired on the spot. They didn't know what they were doing.

[-]

Gxorgxo@reddit

To be fair, almost nobody at the company knew what they were doing

[-]

meltbox@reddit

From webscale to functional.

[-]

Cautious_Implement17@reddit

oh no, not the “mongodb is webscale” video.

[-]

Potential_Owl7825@reddit

Thanks for putting me on to that video, it was amazing to watch 😂😭 I didn’t know dev/null was web scalable and supported sharding

[-]

whisperwrongwords@reddit

Definitely supports sharting your data, that's for sure

[-]

ikeif@reddit

I remember when Mongo was released and another dev was pushing hard on using it on a project - and your scenario is the exact reason I fought against it.

Still glad I didn’t bend on that.

[-]

racefever@reddit

MongoDBEngine can rot in hell

[-]

ashultz@reddit

"future proofing" abstractions

most of the time the future does something different and now your abstractions are in the way

better to plan for a rewrite than try to avoid it

[-]

paynoattn@reddit

Yes!! Want an abstraction or polymorphism? We'll do it if that need exists here today. One of the first things I was taught in coding was "don't repeat yourself" but I feel like I should have been taught the opposite.

[-]

Gofastrun@reddit

Now instead of DRY we use AHA (Avoid Hasty Abstractions)

[-]

MixedTrailMix@reddit

AHA!! so good. I will be using this. Nothing bothers me more than nearly unused interfaces

[-]

Punk-in-Pie@reddit

I feel this in my soul

[-]

xgme@reddit

I was writing the replication part of a key value store serving exabytes of data. I didn’t know that within every couple of billion rows you’d get random corruptions. I investigated this for weeks until I learned about it. The bad parts were:

My rows didn’t have crc fields to validate the data. So I didn’t even know which rows were corrupted.
I didn’t have a version field for the rows. So updating the schema wasn’t that easy. Note that this was before thrift/proto was invented; I was doing raw data parsing. I eventually added a version field but “technically” there was a chance that ab old row would have matching beginning.

This was such a nasty issue to fix. It took me (and another engineer was assigned to work with me) months. I was super embarrassed but at the same time, during reviews nobody pointed this out.

Now, anytime someone introduces a schema, I immediately think about these two fields.

[-]

behusbwj@reddit

Avoiding redundancy of data across microservices. I had only seen it done wrong

[-]

kareesi@reddit

Can you expand on this a bit more? What kind of issues did you see when it was done wrong?

We’re running up against data redundancy across microservices on my team often and I’d love to learn more about anti patterns and what not to do.

[-]

MusikPolice@reddit

It was my first time out as a technical lead and I was an arrogant shit. I had been hired to build a scalable backend for an IoT company because I had some experience with big data processing.

I was so blinded by the idea of scalability that it didn’t occur to me that our company ran a high price, low volume kind of business. Consequently, scale was inherently limited by the number of devices that they sold. We’re talking thousands, rather than tens or hundreds of thousands of client devices that would ever have to be served at one time.

Anyway, I picked DynamoDb for a datastore, which was fine while we were in AWS, but turned into a data migration nightmare when some customer came along with a big bag of cash and a desire to run the system on-prem 😞

Two big lessons learned on that job: 1. Understand how the system you’re being asked to build will need to scale. You probably don’t have Netflix problems 2. Avoid using solutions that are proprietary to your hosting provider if at all possible

[-]

GoTheFuckToBed@reddit

Bringing in too much new technology on a small team. Even a simple database like postgres needs knowledge and maintenance.

Now I make sure resources (time, knowledge, humans) are available before spending. (sometimes call this innovation tokens)

[-]

MusikPolice@reddit

Innovation Tokens is a really great way to think about this problem. I’ll have to remember that one

[-]

Groove-Theory@reddit

Not one single thing, but usually I've always been burned by not having enough logging or visibility to whatever my system was doing.

Anytime my system does something weird and you don't have the receipts or redundancy or logs to debug or observe what's going on always limits to knowing if your system design is good or not.

Espescially when you work on payment systems.....

[-]

MusikPolice@reddit

We pay an arm and a leg to send traces to DataDog. I love that flame graph view that shows me exactly where execution time is being spent.

[-]

gnuban@reddit

The worst part is that you're never logging the very thing that goes wrong. It's line you say, you can never log enough, literally never.

[-]

donalmacc@reddit

I made the typical “use mongo when you should just use sql” mistake. We had a project where the data was logically key value, our access patterns were key value, and there was absolutely no plans for any relational data. We also didn’t have a schema for the data so mongo let the domain be “flexible” with what it supports.

About 6 months into this, we haven’t changed the schema of the data we’re storing once, and all of a sudden we need to with versioning and migration of old data in our dev DB. App team are complaining that their code should just work, when they wrote the serialisation into mongo in the first place.

Then when we started scaling it and benchmarking it, we saw enormous amounts of redundant re reads, over and over again. Turns out in basically every interaction the other team did “iterate through every key that I know about, fetch the data and store it in the app data, and then filter by a specific field”

We replaced it with MariaDB over about 2 weeks with “minimal” data loss, all our performance issues went away with 2 filtereing endpoints, and we also fixed a bunch of bugs around atomicity when writing that required a whole load of patch up code to be run to roll back partial updates.

I’ve not used mongo since, unsurprisingly

[-]

morswinb@reddit

I don't see how this was an issue with mongo itself.

The iterate through all the data and rewrite all the data is a pattern that I managed to fight my boss against before implementing. Mongo works just fine after almost a decade now.

[-]

neurorgasm@reddit

Lots of the "problems with mongo" posted on this sub are usually people who didn't want to learn how to use mongo, then roll their eyes when a postgres-shaped peg doesn't fit in a mongo-shaped hole. Same with graphql.

[-]

kbielefe@reddit

It's sort of the same mindset issue as with static vs dynamic typing. NoSQL data still has schemas, they're just not enforced by the database at write time. "Not wanting to deal with schemas" is a bad reason to choose it.

[-]

donalmacc@reddit

Mongo themselves disagree with you - On their website they repeatedly talk about storing unstructured data, and specifically say

the developer controls the database schema. Developers adjust and reformat the database schema as the application evolves without the help of a database administrator

They also talk repeatedly about getting started quickly and “evolving quickly” - all of this is (IMO) is saying “don’t worry about a schema, we’ll handle it).

Storing unstructured (or maybe more accurately “loosely structured”) data is _ the_ reason to use mongo

[-]

kbielefe@reddit

My point is this doesn't remove the burden of knowing what the structure is. It shifts the burden from the database to the app. That feels easier, until it's not. If one reader decides a field is required, every writer must remember to add it every time.

[-]

donalmacc@reddit

I did learn how to use Mongo. We architected our API to use mongo effectively. But the problem is that everyone else wants to use a postgres shaped peg.

[-]

paynoattn@reddit

Not to pile on here, but I'm confused by your statements. Mongo has a query mechanism that is just as powerful or more powerfult than SQL. It also has schema enforcement mechanisms - though optional, and versioning mechainsms. I don't see how a SQL DB would have fixed anything here.

You're problem is having another team executing bad queries on your DB. They should have been using your API to query and filter data.

[-]

donalmacc@reddit

The point was we used mongo when we just needed sql. We did backflips to design things that weren’t relational to use a nosql database and ended up with a simpler faster approach when we used the right tool for the job.

Like I’ve said three times in this thread

[-]

rat9988@reddit

Most complaints about mango are usually bad usage, using it in its early years when it was a bad db, or just repeating what others said.

Mango has grown a lot since its inception.

[-]

donalmacc@reddit

We made a shitty relational database api out of a nosql database and app logic. We would have had the same problem with redis or anything else - fundamentally we wanted a relational DB

[-]

Then-Boat8912@reddit

I used an Oracle product.

[-]

tecedu@reddit

Not as complex as the other guys here, but was writing a forecasting software, initial scope was only 100 sites with 3 scenarios. So i decided to load the data before in batch, decided to load the dataframes into dictionaries and loading into memory. Which was fine, then I was told scale and omg the compute time and RAM kept ballooning up. It went from 100gb of ram to 900gb when we decided to add 900 more sites.

So currently trying to get it fixed now with a proper database and stop loading everything into batch

[-]

rincewinds_dad_bod@reddit

Mongo, 2 months in I tried to get is to switch and continued that effort for an entire year. 😭 Later after I. Left the project that idea actually for some traction 😑

[-]

cyriou@reddit

Using dynamodb instead of relational database for a startup.

[-]

Wide-Gift-7336@reddit

Not me but I remember being apart of a product where it’s suppose to be low power but we had two separate chips to handle Bluetooth and audio processing separately.

What’s funny is technically the Bluetooth chip could do it all, the dsp, audio output, etc. so we had a parasitic chip essentially because of some decisions that were made and we were forced to work with them.

I once had to implement another interface between two Linux SoCs. Lots of people were pushing for finishing the v4l2 usb peripheral implementation. But I thought it was better just to use a Linux rndis network adapter usb peripheral implementation.

Then to send video we would just send it as udp packets of compressed h265 or whatever video data.

Turns out that implementing that was super hard, perhaps just as hard as finishing the kernel work to get a camera peripheral (to share video quality). But I ended up being right anyway because our peripheral SoC only supported a few usb endpoints at once anyway,

[-]

Master-Guidance-2409@reddit

i have consistently tried to eliminate duplicate code by creating a lot of abstractions and creating "magic" defaults that attempt to the right thing, if specific config/details are not set. its always backed fire on me.

i seen this work in communities like ruby on rails, laravel etc. but it works in these communities because this is the expectation and its how everything is done since forever.

the issue is if you dont do the work of communicating and documenting all the magic; people break it in all kinds of ways unknowingly.

a lot of times direct, explicit, repetitive, duplicated code is the best way to move forward and easier to change once the proper abstractions are discovered.

i also waste all my fucking time name and renaming things trying to find the perfect balance between not too implicit and not too verbose.

im older now so i dont suffer from these ailments as often, but every now and then i relapse. abstractions are a hell of drug.

[-]

Straight_Waltz_9530@reddit

Not pushing the team harder to use Postgres instead of MySQL. I've made this mistake twice now.

[-]

temakiFTW@reddit

Why Postgres over MySQL? Is it generally a better database, or did it fit the use case better for your project?

[-]

paynoattn@reddit

Not OP but postgres has a lot of pros over MySQL. From a speed perspective they are usually neck and neck, but Postgres has a lot of stuff out of the box that mysql / mariahDB does not. Internal caching and text searching (no need to use elasticsearch), jsonb support with field querying and indexing; additional money type database fields with validation; also a huge list of plugins that can add things like oauth2 user auth, graphql, vertices. etc.

[-]

son_ov_kwani@reddit

Postgre internal cache is quite slow and I’d not recommend it.

[-]

Straight_Waltz_9530@reddit

https://www.sql-workbench.eu/dbms_comparison.html

[-]

No_Grand_3873@reddit

used mysql in a large project and a lot of performane issues, not happening now that im using postgres in a different project, maybe skill issue

[-]

wdr1@reddit

Choosing PHP as the language to make Yahoo's tribute site for the first anniversary of 9/11.

As you can guess, this was September 2002. Around 9/1, the company decided it wanted to do something & put out a call for volunteers. The idea was to make a "virtual quilt". It was inspired by the AIDS quilt, with the idea, each person could make a custom quilt (an image + text) to add to the virtual quilt, which could then be browsed.

Our leadership had decided we would use PHP going forward, but it hadn't been announced yet. (Notably we hadn't hired Rasmus yet.) There was a team of about 5 of us, none who had used PHP before. We were all experienced eng and definitely knew how to make high scale websites, but a lot of infrastructures & best practices wouldn't work with vanilla PHP. Notably, unlike mod_perl or other Apache modules, you couldn't persist data between requests. Rasmus would later tell us it was for security, but it made it impossible to cache certain data. If I remember right, we solved it by writing Perl scripts to query MySQL & generate PHP data files as a workaround.

It ended up working just fine. The site itself was a huge success. Coverage on CNN, etc. and 60 million tiles created (which, creating how many people were online in 2002, was a lot).

But man, to this day, I still fucking hate PHP.

[-]

son_ov_kwani@reddit

Was still a baby then so I can’t really relate. At graduation 2018 PHP got me my first job, first pay check. I used to hate it but now I’ve grown to love it. ❤️

[-]

gnuban@reddit

you couldn't persist data between request

IIRC people were using memcached for that kind of persistence back in the day, but it seems like it only came out in 2003. Proper early!

[-]

XenonBG@reddit

You probably know this, but PHP has improved a lot in the past several years. I understand why people hate it, and for some reason Microsoft hates it as well, but it certainly doesn't deserve the reputation it achievement 10-15 years ago.

[-]

titpetric@reddit

At some point I made every mistake. The most common pattern example would be cases where to transition the mindset from no design/intuitive domain design to technical requirements design (HA, traffic patterns, infrastructure, etc.). Essentially "we need a v2".

This applies to common SWE problems like cache invalidation (iterated), timeline queries with 200K+ users (mini twitter/blog platform/...), job queues that took hours which we then optimized/parallelized. Usually the performance hits would point out a system design issue (rather than a mistake, mostly just sub optimal code).

Due to systems maturing over time, new concerns are formalized and sometimes those carry design changes. Ensuring compliance with new standards like soc2, or adding observability, those sometimes carry design changes if the original design if any did not account for those. A lot of these I consider a baseline of design for serious software, and objectively not a lot of OSS meets my criteria in this regard. It's more typical that these things are unmet if there are "smallest change" policies and iteration is discouraged. Top down, product development organisations make a lot more mistakes than service development organisations, usually struggling to update dependencies and perform maintenance tasks towards structure and style conformation.

[-]

magichronx@reddit

I was tasked with building a fairly sophisticated metrics logging/reporting/monitoring system of time-series data. The project was a company experiment / side-project, and I was the sole developer on the project so all design decisions were up to me. Unfortunately I had never wrangled such a large amount of time-series data, so the first thing I reached for was InfluxDB ...aaaand it ended up being a huge mistake. The cost was prohibitive and InfluxDB has query limitations that prevented me from producing the reports I needed.

After I realized the entire data persistence solution wasn't going to be a good fit I ended up having to spend a whole bunch of time refactoring a ton of the codebase to make use of self-hosted TimescaleDB (which is basically Postgres with a time-series extension). The refactoring delay caused the company's interest in the idea to plummet and it was eventually abandoned.

In hindsight I should have done more research and cost-calculations before locking in on InfluxDB, but the project specs were very nebulous when that decision was made. Plus I was swamped with a million other decisions to make because I was responsible for building frontend/backend of a customer-facing dashboard, an internal admin dashboard, a data-ingestion API, and a system application that cross compiles to windows/linux/mac that can be remotely installed/configured/updated.

TLDR: Choose your DBMS carefully

[-]

Gofastrun@reddit

Moving from a monolith to micro-services.

We thought it would improve developer experience but then we just ended up with data boundary issues, a graphql layer that only senior engineers could understand, a bunch of N+1 queries, and coordinated deployments

[-]

angrynoah@reddit

I insisted on Kafka instead of SQS. Without actually trying SQS to prove it couldn't meet our throughput+latency needs.

Turns out SQS definitely absolutely could have met our needs, none of the extra features of Kafka added any value, running it was an operational nightmare, and the cost was probably 100x what SQS would have been.

I just fell for the hype, and convinced myself it was necessary, and disregarded any evidence to the contrary. Classic confirmation bias.

I think of this error often.

[-]

metaconcept@reddit

The real question though.... which one looks better on your resume?

[-]

No_Grand_3873@reddit

also a lot of jobs ask for xp in sqs

[-]

dchahovsky@reddit

The mistake was to pick more expensive service over less expensive, without any specific gains of the former. I completely agree with that. But I think you shouldn't call Kafka itself "expensive", as you probably mean not Kafka, but MSK (managed Kafka), which is indeed expensive.

[-]

angrynoah@reddit

MSK didn't exist then.

[-]

paynoattn@reddit

Redis also makes a really good open source alternative for kafka. Usually quite a bit cheaper and has most of the same features - consumer groups, compaction, infinite TTL, avro support, self hosted, etc that most cloud alternatives don't have. Most people think redis is hella expensive because it runs on ram but Eventhubs (the azure alternative to SQS) costs my company almost $250k a month due to needing premium namespaces because standard ones only allow 25 avro schemas. We could easily replace this with $50k of redis clusters but everytime I bring it up I hear about "cloud native" bullshit.

[-]

nshkaruba@reddit

We had 3 microservices, and we needed them all to have separate networks for security concerns (compromising one of them is a huge company risk)

We were rushing to deploy our startup to a cloud provider, so we didn't really have time to think, and our architect guy suggested to put them all to separate infra (separate clouds, folders, compute nodes, k8s clusters and etc). Separate infra means automatically separate networks. I didn't have a better idea at the time, and our management really rushed us to see the app in prod, so I agreed.

Half a year later I discovered Cilium :S Yeah. From that moment we've been dealing with x3 work every time a DevOps task is here. Now we're deploying a second installation, meaning we'll have 3 more infra components: 6 clouds instead of 2 💀

I wish I had more systems design experience back then. But well, it was a good learning experience, and our app is kinda popular :D

[-]

paynoattn@reddit

Thanks for pointing out Cilium to me, but for clarification purproses are you saying you deployed to three different cloud providers? That's insane. That architect really wanted to ensure they had job security.

[-]

nshkaruba@reddit

Naah, it's a single cloud provider, but basically 3 separate infrastructures in it. And we tried to achieve separate networks with that decision, which can be achieved with Cilium

[-]

PianoDogg@reddit

Learned very early that when sending email, one should really only do it zero or one times.

[-]

The_Rockerfly@reddit

Storing data column bound data in a nested JSON object. It made sense at the start of the project to make things simple and reduce the number of tables. We load a single record and then write out multiple front end records. Cheap, a huge reduction in DB calls and we could make changes to the schema easily.

Then we needed to start filtering data for the front end, on nested data post the query calls. Immediately, all savings were lost. Plus someone wanted to start recording the data for the warehouse and we don't have a lake house and had to monitor the pipeline. So any change that was a simple change for us was a breaking change.

[-]

gnuban@reddit

Many such cases. Similarly, when people tell you to "pick a NoSQL db that fits your use-case", you better have a very narrow use-case :D I can see it working for a single purpose frontend, but normalized relational data is so incredibly versatile in comparison.

[-]

osiris679@reddit

Assuming that actual mobile devices could parallel request 10 remote files at a time like my mobile emulator setup (needed for a specific use case with file access policies), when in fact most devices throttle to 2-3 parallel requests at the chip level.

Painful lesson.

[-]

oddthink@reddit

I was implementing some financial calculations, simulations effectively. Generating random sets of future interest rate paths was expensive, so we cached them. When the calc servers woke up, they'd read the interest rate data and do their calculation. It worked great! We had some compute servers in NYC, had the rates cached in their own servers, no problem.

Then someone decided to run the calculations on the servers in London, and we promptly saturated the data pipe between NYC and London by all the London servers slurping down rates from NYC.

I used to tell this as a ha-ha, this was a terrible failure, but it clearly wasn't my fault, kind of a story. No one asked me about running things in London, after all.

After a few more years, though, it stopped sounding so funny. Had I documented anywhere that we should really only run this in NYC? No. Did I test that the data and the compute were in the same geographic region? No. Did I set up any kind of graceful fallback (like switching to manually computing the rate paths if latency got too high)? No.

But after that, I did remember that location actually does matter, even on the internet.

[-]

soundman32@reddit

I tried to fit 9kb of code into an 8kb eeprom. It took weeks to work out why. The code ran fine on the emulator (which had 64kb).

[-]

kanzenryu@reddit

Well not with that attitude

[-]

daedalus_structure@reddit

I see web developers make a similar mistake with "local performance" all the time... "what do you mean 50 round trips to the back end is bad to render the home screen?" or the more subtle "what do you mean the SQL query is slow".

Yeah, works great on your machine where the network runs on loopback and you have 200 un-indexed rows not 2 million.

[-]

undo777@reddit

Oh gosh I hate tooling that does this kind of thing to you. How on earth is this not a trivial error? Is that because eeprom programmers have no way to check for boundary conditions - out-of-bound write isn't even a failure?

[-]

Eire_Banshee@reddit

We you work at that low of a level the error abstractions don't always exist. Similar to how OOM or SEGFAULT errors are always lacking detail.

[-]

undo777@reddit

I mean I can hand-wave all day too, I'm curious about the specific technical details that led to this situation.

[-]

soundman32@reddit

You all seem to be thinking in a 21st century mindset. This was the mid 90s with a custom compiler, and crappy eprom burners that were little more than wiggling pins in the right order. The idea that there was enough intelligence in the burner to even care what the user is doing is way beyond what was available 30 years ago at the low end of the market.

[-]

A764B9289D@reddit

You all seem to be thinking in a 21st century mindset.

Tangential but I I dream of a world where everyone I work with thinks in a 21st century mindset. I know people who would passionately argue that it’s the fault of the user and not their system’s responsibility to prevent such situations.

[-]

undo777@reddit

I don't think the mid-90s is even far enough for that kind of mindset. Specific mechanisms for exception handling were being developed since the 1950s with standardized support by programming languages in the 80s - just an illustration that people were very conscious about the benefits of propagating errors to the caller for a long time. I kind of suspect that the "crappy eeprom burners" was the more important driving factor there, as well as not being able to prioritize tooling improvements as there was so much other work and not enough talent in the booming industry.

[-]

undo777@reddit

Sounds like a lot of fun! Haha

[-]

undo777@reddit

Of course we default to the modern mindset - your original comment didn't suggest in any way this happened long ago. We still routinely run into these kinds of situations these days though, like when error propagation was deemed unnecessary and then you end up wasting time first figuring out wtf is going on and then dealing with it in unnecessarily creative ways.

[-]

subma-fuckin-rine@reddit

i get caught with these kinds of issues all the time, always some small detail that SHOULDN'T cause an issue but does. definitely source of most of my frustrations lol

[-]

stillavoidingthejvm@reddit

Been there. 🫂

[-]

NotAllWhoWander42@reddit

Working on evaluating a replacement wifi chip for our embedded product, had to write the MAC address. I was told that the chips had eeprom memory. Found out the hard way they had write once memory that just had a bit of extra “buffer” bits that made it seem like eeprom until you exhausted the buffer.

Cooked a handful of wifi modules figuring that one out…

[-]

rawrgulmuffins@reddit

I worked on a hardware product that was network enabled but my company didn't have access to our customers networks. We depended on customs to upgrade their systems following our direction and if things went wrong we sometimes had to fly engineers out to solve them on site.

I argued that a security issue was bad enough that we needed to patch it on all systems.

Management didn't want to pay for a full patch but they were willing to go with a "security patch" which was really just loading a kernel module for a particular os version. I said this was so bad we needed to fix it even if we're doing this the dumb way.

By the time I left that company our test matrix needed to be run against almost 100 os versions.

[-]

Chevaboogaloo@reddit

Not so much a choice I made but a company I worked for did a rewrite and went with microservices.

I was a junior at the time so it seemed like it made sense. But in hindsight we had serious velocity problems because of it.

There were less than a dozen devs in the company and over a dozen services. Nowhere near the scale that would justify it.

[-]

XenonBG@reddit

That's my life right now. My team of two developers has been assigned six microservices in the rewrite.

[-]

DragoBleaPiece_123@reddit

RemindMe! 1 week

[-]

RemindMeBot@reddit

I will be messaging you in 7 days on 2025-07-12 21:04:03 UTC to remind you of this link

CLICK THIS LINK to send a PM to also be reminded and to reduce spam.

^(Parent commenter can ) ^(delete this message to hide from others.)

^(Info)	^(Custom)	^(Your Reminders)	^(Feedback)

[-]

Low-Tip-2403@reddit

Blazor. React. Angular. Everything ui framework in existence.

You’ll have to pry alpinejs and htmx out of my cold dead hands…..

The over complications of ui development in general.

Is probably never coming up with a way to define. “Ok now I should add e2e tests”

Oh man I can go on.. this is a good topic

[-]

orzechod@reddit

this is just a list of stuff you don't like. where's the system-design mistake?

[-]

Low-Tip-2403@reddit

The mistake was over engineering a UI sorry thought that was clearer!

[-]

orzechod@reddit

ohhhhh

[-]

Low-Tip-2403@reddit

I should probably add another design mistake I always run into…. Context…. I always forget to add more context….

[-]

hopbyte@reddit

Not me but our “architect”. He went all in on Model Driven Architecture and code generation. How do we store a new Contact? Well obviously you’d generate an immutable Plain Old Java Object source code that extends a Contact interface with getters to its properties from a UI using the base distribution of Eclipse and then have them click deploy that compiles this new Contact, sends the bytecode to the server, and hot deploys the jar.

What’s that customer, UI performance is terrible!? Oh, we’ll just have our architect look into optimizing the comp… nevermind, he quit.

I quit shortly after.

[-]

Logical-Error-7233@reddit

Back in the early Java 2 days serialization was all the rage. We realized we could save a ton of overhead by simply serializing our objects and storing them as a blob in the database vs trying to convert them to SQL and map them back and forth. This was stine age pre-orm days when everything was straight jdbc. We were already serializing things to send across the wire so it made perfect sense.

Worked great until our next release when every single object that was updated now threw an incompatibility error upon deserialization. Whoops.

Super obvious in hindsight but I know for a fact we're not the only team to come with this idea and get wrecked.

[-]

superluminary@reddit

Multiple microservices talking to one db.

[-]

paynoattn@reddit

Having multiple microservices connect to the same database. Also sharding SQL can often lead to deadlocks unless properly implemented.

But the hugest system design mistake I see people make is having huge fights over programming languages saying Go or Rust will make your application 100x faster. If you look at the call stack of your app you'll see 80-90% of the request time is spent in the database. So changing your backend language will only affect 10-20ms of the 100ms, not the 80-90ms where your code is just sitting there waiting for a response. If you want speed, start by creating indexes, doing query plans, looking at your DB dashboard for longest running queries, etc before you ever consider switching your language. If you really want speed improvements, you can stay in python/php/node and switch to a cache like Redis or NoSQL like Cassandra. Only after that should you think about about a rewrite.

[-]

horserino@reddit

Jumping on the rust bandwagon for a parser and runtime for an in-house programming language that needed to run on both the frontend and backend in the context of a relatively successful startup 6-7 years ago.

Turns out writing a fast parser in Rust was far from trivial, so the resulting parser and runtime wasn't even faster. Loading the wasm made the first load way slower and all in all the typescript version was good enough for our context.

A lot of wasted effort way too early in the company's context. Didn't make much of a difference and we could've spent that time actually improving the languange and runtime themselves. Oh well.

I do wonder if the Rust barrier of entry for something like what we were trying to do is way lower nowadays.

[-]

gruehunter@reddit

an in-house programming language

Isn't this the bigger architectural mistake?

[-]

Low-Tip-2403@reddit

Yeah that feels like it’s casually glanced over and would be the real issue lol

[-]

Potato-Engineer@reddit

I've used a DSL that was the right decision. I've also used a DSL which was a godawful decision, and a third that was a mediocre decision (could have been a good business decision, but I wasn't privy to the data behind it).

The good decision was "user writes code, we need to convert it into four different languages." (I don't know a guy's alternative for that.)

The mediocre decision was "there's a lot of cheap JS devs out there, so let's make an internal platform for feature phones that runs on JS." (I'm not sure how much money they saved, but it's hard to imagine it was enough.)

The bad decision was some prick who didn't want to be blamed when the server crashed, so he wrote a DSL that was an XML wrapper over a sunset of Java, have it some exhaustive (?) tests, and could deflect blame from himself.

[-]

Low-Tip-2403@reddit

Oh…my…

[-]

horserino@reddit

Maaybe.

But that wasn't my design mistake lol

[-]

horserino@reddit

(fwiw, I don't think it was a bad choice, apps with small simple DSLs can be a great way to allow non programmer domain experts to encode their domain knowledge in the context of an application.)

[-]

tikhonjelvis@reddit

No idea on how it is now, but my first Rust project like 6–7 years ago was a parser for a simple binary format using the Nom parser combinator library and, while I did not do it in a particularly idiomatic way, it was pretty easy to get something fast working.

I've never tried doing WASM stuff in Rust though.

[-]

yodal_@reddit

I had the same experience with writing a few parsers in Rust.

WASM support years ago was not great, but it was an inherent problem with the WASM spec at the time and not the fault of any language. I think this sort of thing would be faster nowadays, but going from WASM to JavaScript will always be the bottleneck.

[-]

uns0licited_advice@reddit

As a junior dev in the early 2000s, I was tasked to develop a signature verification feature for a banking system. I had it refresh the whole database of signatures each time a user looked up a signature. This worked fine in test but when they deployed it at banks with thousands of customers it would take several minutes to look up a single signature. It's funny now that I think about it.

[-]

PocketBananna@reddit

Rolled our own auth service.

[-]

SlechtValk2@reddit

We have a big Java client application (started in 2002, so lots of legacy). Big part of it is a map viewer that uses ancient technology and only works with map tiles stored on local disk. We needed to modernize it to support map tiles server from a SaaS service.

After some research I decided that we should replace the existing map viewer with one based on a modern open source GIS library I used before with some success. After a lot of work by me and an other talented developer we still haven't managed feature parity with the old map viewer. And at the same time we ran into more and more problems caused by all the legacy stuff in the application and bugs and performance issues in the library.

Other developers had advised me to think about redesigning the whole application using web-frontend technology, but I thought of every possible argument against it to convince them and myself that my idea (my way) was the only right way forward, without really listening to their arguments.

In hindsight I think I made the wrong decision, so now after more than a year spend on a dead end road we are going to research the possibilities and challenges of the complete redesign...

[-]

dedservice@reddit

Wow, I love that half the other answers in this thread are "I shouldn't have done a total rewrite", while your answer is "I should've done a total rewrite".

[-]

SlechtValk2@reddit

Java Swing is ancient by now and hasn't really been updated since Java 5. JavaFx is a failed experiment that never went anywhere, SWT is also effectively dead. So staying with a Java desktop client is a dead end road.

It has served us for many years, but it is time for something new. Our biggest problem will be that our users are pretty conservative and very resistant to change. That is why I think we need to write something new and not just try to rewrite our client in newer technology.

Designing it will be a big challenge for me, as I am very familiar with the Java/JVM landscape, but pretty much a novice in web frontend stuff. I will need to use the knowledge and experience of other developers in our organization that know this stuff.

[-]

VictoryMotel@reddit

Inheritance

[-]

ikeif@reddit

Not me, but a former boss.

He used TinyInt for the primary key in several databases for several clients.

I inherited one of his projects when everything broke, and discovered that I could switch it from TinyInt to fix it, and then discovered that a TON of generated PDFs were never being cleaned up - that had a LOT of PII.

[-]

abe_mussa@reddit

Having more microservices than actual users

[-]

private_final_static@reddit

All of them

[-]

DeterminedQuokka@reddit

I needed an admin ui for an eta product and I didn’t want to build it. So instead I basically jailbroke Django admin and rewrote a ton of the internals. Then I wrote several scripts that would write like 100 files to set up everything. It was a mess if you had to edit anything. You had to delete most of it and modify the generators then start from scratch. It would have been easier to just build a real ui.

[-]

monsoon-man@reddit

Wrote a firmware which took input from the user, a 3 character code often passed as CLI option or from config file. Some day, the system will occasionally stop, and it took me a few days to figure it out. I allocated 3 bytes for the code, but someone used a seemingly 3 character codes, but the last character was empty space, making it 4 characters long.

Sanitize your inputs.

[-]

Efficient_Sector_870@reddit

Refactoring a god class monster of a reporting system and realised too late it used a connection pool and temporary tables, which I didn't have access to in the other connections

[-]

adfaratas@reddit

I tried to emulate java in python. Also tried to follow the clean code book to the T. It was a good abstraction but was too impractical.

[-]

birdparty44@reddit

I was vaguely told how to implement an app, in terms of architecture pattern, as a freelancer for an agency. I asked for example code / tempate I could follow. Provided none. Implemented it my own way. They’re on a deadline and go with what you know.

I was forbidden from using the only persistence framwework I knew. So I encapsulated the implementation that used it. 🤷‍♂️

Lead tells me at the end I cost him weeks in rewrites. I told the lead that sometimes it’s worth making the time you claim not to have in order to steer unknown freelancer in the direction you need, otherwise do a quick review early on instead of waiting till the end and then being upset.

As much as I regret not being able to deliver what he had hoped, I have a clear conscience about his having to work overtime at christmas due to a failing in his “lead” work. It’s not hard nor time consuming to throw a readonly repo link at someone and say “have a look at these 5 files to get a sense of how we implement this pattern.”

[-]

stillavoidingthejvm@reddit

Tried to shoehorn a relational database into elasticsearch. Ended up implementing super expensive application level joins, then trashing the entire project in favor of pg

[-]

malthuswaswrong@reddit

Took over a project from consultants that really mucked things up. Fixed a lot of their bad design, but direct access to the database wasn't corrected. I sped thing up dramatically, but never built an abstraction layer between the client and the database, and every client made direct queries.

This was an internal background application, so no users were involved, but I kept tuning the SQL queries to be faster, more concurrent, avoid locking, etc.

I made everything work and was quite proud of myself while I was doing it. Now I look back and realize if I had stood up an API and banged against that I could have saved myself a lot of pain and had a more secure and scalable design.

[-]

Top_Bumblebee_7762@reddit

Did a non responsive website a year after Ethan Marcotte's book had come out. At that time I thought a m subdomain was the way to go.

[-]

nath1as@reddit

you seem not to have learned anything at all...

[-]

mckenny37@reddit

As a junior dev I had 0 oversight from other devs I was tasked with making a web page to create and track forms for an Equipment Release Checklist.

Made everything as generic and reusable as possible. Attempted to create a 5NF normalized database structure. Table layout was overly complicated and pretty much had to be updated through a stored proc.

Values were tied to a specific place based on an id coming form the layout and was stored in 1 of 4? different tables based on datatype. I don't think I even stored the datatype anywhere so it just had to check all 4 tables to retrieve it.

Made the table so it could hold data of multiple different forms and use multiple different structures.

Ended up using this structure to make 3 different tracking systems and of course we stored each in a different database table, so the generic part didn't matter at all.

The code interacting with the tables had to be very specialized since the table was so generic.

Apologized profusely when I left the company 3 years ago. Feel very sorry for who has to/had to figure out how to extend that system.

[-]

justUseAnSvm@reddit

It's a bit of a long story, but the granular aspect are this:

We had a processing application, that was using streaming between the services. That was a huge mistake, since we were streaming individual items, with no sense of the batch, and there wasn't an easy way to add the notion of a batch.

My idea, was to basically work within that streaming system, and aggregate the results at the end using a commutative process that would mask the effects of not having a notion of "batch complete". The better idea, was to just bite the bullet, switch off streaming, and use a distributed lock system.

Anyway, it worked out for me: the team lead who had us use streaming left, I got the job, and a lot of credit for calling out the issues with streaming, and driving us towards a solution.

[-]

Oakw00dy@reddit

WCF. Seemed like a good idea at the time.

[-]

Vizioso@reddit

Wrote an ORM framework modeled fairly closely after hibernate for a custom database layer. The mistake I made was trying to idiot proof literally everything in the initial release. When you do this for something as ambiguous as an ORM, you realize there’s a lot of things to proof, and you start going down rabbit hole after rabbit hole. Stuff like cyclical dependency mapping for eager fetching was a big one that I tried to solve, and only stopped banging my head against the table when I realized that hibernate also got to a point where they said screw it and just let it run until the database errors out. To my credit I did something wherein I threw an error about cyclical mapping in the hopes something like that never saw the light of production.

[-]

spelunker@reddit

VERY early on in my career, insisting that rewriting one of the web apps to use the new hottest Java Enterprise tech because it will make life so much easier.

That was when I learned rewriting is almost never worth it!

[-]

foufers@reddit

Using a singleton pattern on a database object, and then forgetting about it until we added a replica database to the system. Application switched the connection string as needed. Could not figure out why records kept intermittently put into the wrong db

[-]

germansnowman@reddit

Using HTML as a text document format, and trying to write my own text editing logic on top. It’s so much more complex than people may think at first glance. (Also, HTML is a terrible markup language.)

[-]

SoggyGrayDuck@reddit

Yeah I highly recommend avoiding small software. I'm using yellow brick and I'm always frustrated they changed things from postgress. Sure some commands are simpler but I already learned the old ones!