Why Event-Driven Systems are Hard?

[-]

germansnowman@reddit

Off-topic, but it really bothers me even as a non-native speaker: Can people no longer ask questions correctly? I see this all the time in Reddit titles. It should either be “Why are event-driven systems hard?” or “Why event-driven systems are hard” as a statement.

[-]

imdrunkwhyustillugly@reddit

A more illustrious title would perhaps be

hard? Event-driven systems why why why

[-]

EqualDatabase@reddit

lol

[-]

Plank_With_A_Nail_In@reddit

What bothers me is supposedly intelligent people getting faux confused over perfectly understandable English sentences. There is no confusion over what was being conveyed by this title. The article's content works for either a statement or a question.

I think it's just dullards wanting to mansplain the conventions of the English language under the guise of the rest of us not knowing them. News flash: we all fucking know already.

Learning the common conventions (there are no rules) of the English language might have been the highlight of your life but for the rest of us they are trivial and not something we get so excited over.

[-]

JMBourguet@reddit

What bothers me is supposedly intelligent people getting faux confused over perfectly understandable English sentences.

Non-native speakers are both more prone to making certain kinds of errors and more sensitive to them. The first is obvious. The second is because we wonder whether the erroneous structure is actually something correct that we don't know about, and thus whether it brings a change of meaning.

[-]

thesituation531@reddit

Grammar exists for a reason.

as long as the information gets communicated we are cool.

And proper grammar makes that easier.

[-]

germansnowman@reddit

I appreciate good writing and would like to see a high level of literacy in our society. Go ahead with your ad hominems and the watering down of standards; I will not be a part of that.

[-]

thesituation531@reddit

I'm a native English speaker, and it greatly bothers me too.

[-]

AvidStressEnjoyer@reddit

There is a surge of second language English speakers moving into dev with varying English language skills.

All I know is that they speak more languages than me and do so more capably.

[-]

nerd5code@reddit

I prefer “Does it be that event-driven systems do be hard, or doesn’t do be doing being?” personally.

[-]

nepios83@reddit

Interestingly, in Chinese writing, embedded questions are supposed to have a trailing question-mark. Thus, one would write: "Yesterday he asked me why I bought a new car?"

[-]

germansnowman@reddit

That is indeed interesting, thanks!

[-]

CherryLongjump1989@reddit

The sentence was grammatically correct and is perfectly fine English.

[-]

germansnowman@reddit

No, it isn’t. If you put the “are” after the subject, it makes it a statement. If you want to ask a question, the “are” must go before the subject.

[-]

CherryLongjump1989@reddit

I realized it immediately after but Reddit's delete function is broken. They must be using events.

[-]

germansnowman@reddit

Fair enough

[-]

CichyK24@reddit

Probably because for a non-native speaker the wrong order in "Why Event-Driven Systems are Hard?" sounds totally fine (especially if your native language allows such an order), and you could keep asking questions like that for your whole (English-speaking) life without anyone bothering to correct you. Really, the only place where I was corrected about such wrong order was when doing Duolingo and translating Spanish sentences to English :D

[-]

seunosewa@reddit

At some point it should be incorporated into the grammar.

[-]

CherryLongjump1989@reddit

It already is - and was. The sentence is grammatically fine.

[-]

bunk3rk1ng@reddit

If it wasn't phrased as a question it wouldn't get any clicks.

[-]

tao_of_emptiness@reddit

It’s just a sort of editorial/colloquial shorthand for “reasons why x is hard.”

[-]

germansnowman@reddit

That makes it even worse, as it looks even less like a question.

[-]

drislands@reddit

It's especially egregious because judging by the username, OP is associated with the website in the link. So they wrote it right once, then fucked it up on Reddit. What the hell?

[-]

germansnowman@reddit

As I wrote elsewhere, I did check the website when writing my original comment, and it matched the title. I think it has been edited since.

[-]

ForgettableUsername@reddit

If you deliberately make a minor spelling or grammatical error in the title of a post, a certain number of people will rush to be the first to correct you. This counts as early engagement and boosts the visibility of your post.

[-]

ptoki@reddit

I think it is one of the side products of the language's popularity across many other cultures.

You probably have to accept it. It was indeed a surprise to me that even natives started to ask questions in that non-question form. I just concluded that this is something English got from the world in exchange for being popular.

And if you understand this form, then it means it's working.

[-]

NoInkling@reddit

I used to get annoyed by this too, but after experiencing what it's like to learn another language I just assume they're an ESL speaker and have become a lot more tolerant.

(I swear though, if someone talks about "web scrapping" one more time I might actually lose my sanity)

[-]

gyroda@reddit

(I swear though, if someone talks about "web scrapping" one more time I might actually lose my sanity)

Autocorrect and swipey keyboards on phones account for most of my typos. Often some very strange ones.

Fun side thing: one of the exam boards for the A level course in computing (OCR, in case anyone's curious) had a typo where they called it "disk threshing" rather than "disk thrashing". They were seemingly incapable of fixing this typo for years, as it would keep appearing in their exam papers over the years. I looked into it and the only people who were using the term were specifically making content for that exam.

[-]

germansnowman@reddit

I do understand that, but as an ESL speaker myself I feel I pay even more attention to English grammar than most native speakers. Not to say I don’t make mistakes, but I make a conscious effort not to import German grammar into English.

[-]

NSNick@reddit

The really hard rules are the ones native speakers don't realize are rules until they're broken. Things like:

Vowel sound order: e.g. "tick tock" sounds right, but "tock tick" sounds wrong.
Adjective order: e.g. "a beautiful small red gem" sounds right, but "a red small ball" sounds wrong.

[-]

GrinQuidam@reddit

The trick to English is all the rules are lies and if you understand what someone said, they're communicating correctly.

Properness is very static and does not accommodate the culture of language

[-]

HoushouCoder@reddit

Ironically, the actual title of the article is "Why are Event-Driven Systems Hard?" which is correct

[-]

germansnowman@reddit

I don’t think it was originally, I wish I had made a screenshot.

[-]

Immotommi@reddit

I think part of it is the fact that the statement is valid. People see the Why at the start of the sentence and think they need to include a question mark at the end

[-]

FullPoet@reddit

The level of literacy in the US (at least) is plummeting.

[-]

nemec@reddit

OP is not a native English speaker, either.

[-]

germansnowman@reddit

I expected as much.

[-]

OrchidLeader@reddit

If they have dyslexia, then yeah, it’s difficult knowing when they’ve swapped words around in a sentence like this.

I’m super paranoid about doing it and end up checking my wording several times, and I still sometimes get it wrong.

[-]

germansnowman@reddit

Fair enough. It seems to me though that most people never, ever check their titles.

[-]

RetiredApostle@reddit

Seems like a rhetorical question?

[-]

germansnowman@reddit

That does not matter – my point is that the grammar is wrong, rhetorical question or not.

[-]

Substantial-Reward70@reddit

That’s because you see languages as fixed rules that will always be the same, but they're not. Languages are constantly changing: people adopt terms and new words like trends, and we even change the meaning of existing words. You adapt, or be ready to be upset every day listening to/reading people…

I’m a native Spanish speaker, so it’s the same with my language too…

[-]

wildjokers@reddit

Biggest challenge I have run across is event discovery. Haven’t yet found a good automated way for a service to document what events it fires and what events it cares about.

[-]

steven_dev42@reddit

God I’m running into this at my current job. A whole new influx of devs so I’m updating our eventing documentation. Thoroughly documenting which events are published and consumed by which micro services. But I just know in 6 months after implementing new features it will be out of date

[-]

hala102@reddit

I've worked in similar environments. That’s why I decided to create a platform that does exactly that. So far we have delivered documentation of GitHub repos, and we are working on automating the whole workflow mapping of technical systems.

[-]

Reasonable-Steak-723@reddit

Totally. Do you have any ideas how this can be solved? I created an open source project called EventCatalog to help, but I'm always looking at ways to make it better.

[-]

pkmn_is_fun@reddit

I like pact

We integrated it as part of our test suite, and because we test the actual publisher/consumer, they're usually up to date after they're implemented.

[-]

imdrunkwhyustillugly@reddit

There's AsyncAPI, which is basically OpenAPI for events. One could have some kind of automation based on reading such a spec from a feed - a lazy option could be to just have a snapshot test in the consumer that fails on any changes to the document.

For tracking consumers, (OTEL) logging/metrics that include message contract type, version, and consumer. Some libraries (e.g. NServiceBus, but think hard before you commit to a vendor lock-in) have this built in.

Also, some transport topologies use a single-topic approach, where all events are published in one place and then fanned out to subscribers based on filter rules. So in theory one could read consumers based on those rules alone, but the granularity of said rules could be very coarse (wildcard namespace filters, for example).
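
A minimal sketch of the "lazy" snapshot-test option mentioned above, assuming the spec has already been fetched and parsed into a dictionary. The function names and the sample document are made up for illustration; only the standard library is used, and nothing here is tied to a particular spec format.

```python
import hashlib
import json


def spec_fingerprint(spec: dict) -> str:
    """Return a stable hash of a parsed spec document (key order does not matter)."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def spec_matches_snapshot(current_spec: dict, stored_fingerprint: str) -> bool:
    """True if the producer's spec is unchanged since the snapshot was recorded."""
    return spec_fingerprint(current_spec) == stored_fingerprint


# Hypothetical spec contents, used only to demonstrate the comparison.
recorded = spec_fingerprint({"events": {"order.placed": {"fields": ["id", "total"]}}})
changed = {"events": {"order.placed": {"fields": ["id"]}}}
print(spec_matches_snapshot(changed, recorded))  # the consumer's test would fail here
```

The consumer commits the recorded fingerprint next to its tests, so any change to the producer's document forces someone to look at it and re-record.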

[-]

ptoki@reddit

log all calls. ALL.

Then run a query on the logs and ask what called what. You will not get full coverage, but you will get everything that actually runs.

But you need to code the logging.
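
A rough sketch of the "query the logs" step, assuming every publish and consume is logged as one JSON line with `service`, `action`, and `event` fields. That log shape is an assumption made for this example, not something any particular logger produces by default.

```python
import json
from collections import defaultdict


def build_event_map(log_lines):
    """Group observed events by which services published and consumed them."""
    event_map = defaultdict(lambda: {"publish": set(), "consume": set()})
    for line in log_lines:
        record = json.loads(line)
        event_map[record["event"]][record["action"]].add(record["service"])
    return event_map


# Hypothetical log lines in the assumed format.
logs = [
    '{"service": "orders", "action": "publish", "event": "order.placed"}',
    '{"service": "billing", "action": "consume", "event": "order.placed"}',
    '{"service": "shipping", "action": "consume", "event": "order.placed"}',
]
for event, parties in build_event_map(logs).items():
    print(event, sorted(parties["publish"]), sorted(parties["consume"]))
```

As noted above, this only shows what actually ran during the logged period, so rarely fired events can be missing from the map.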

[-]

seunosewa@reddit

Sounds like what a profiler does.

[-]

ptoki@reddit

Yeah, but it may not be able to tell how frequently a function is used.

You would not run it on prod.

[-]

International_Cell_3@reddit

Discovery usually requires a duplex protocol and most event driven services don't have the notion of being both a source and sink for events. If you define a service such that it can always send and receive events then it's easy to add a "discovery" layer to each service, where they can first handshake before streaming events and include what events those services support.

The other option is to put a CRUD layer on top of the service, which is usually just nice for logging and management. So you can have your event stream doing its event streaming things while also having a REST API to query information about it (including metrics/telemetry/etc).

In the actual service implementation you have a method called register_event_type(...) or something that takes a description of the event, and send_event(...) needs to have an assertion failure if you try and send an event whose type was not registered so the programmer knows they fucked up when they debug in their test env.
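
Something like the following is one way to read that last paragraph. It is an in-process toy: `EventService`, its fields, and the `describe` method are invented for illustration, and a real service would hand the event to a transport instead of appending to a list.

```python
from dataclasses import dataclass, field


@dataclass
class EventService:
    """Toy service that refuses to send event types it never registered."""

    registered: dict = field(default_factory=dict)
    sent: list = field(default_factory=list)

    def register_event_type(self, name: str, description: str) -> None:
        self.registered[name] = description

    def send_event(self, name: str, payload: dict) -> None:
        # Fail loudly in the test environment if the type was never registered.
        assert name in self.registered, f"event type {name!r} was never registered"
        self.sent.append((name, payload))

    def describe(self) -> dict:
        # What a discovery handshake or a REST endpoint could return.
        return dict(self.registered)


service = EventService()
service.register_event_type("order.placed", "An order was accepted")
service.send_event("order.placed", {"id": 1})
print(service.describe())
```

Because registration is the only way to get an event out, the `describe` output cannot drift from what the service really sends.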

[-]

zamN@reddit

Seems like good tracing would solve this? Trace your emit calls and handlers

[-]

Cualkiera67@reddit

The ones it cares about should be in a single file called subscriptions or something.

The ones it fires, you can create a file called pubs that exports a list of names. Then all calls to publish should use one of them

[-]

sarhoshamiral@reddit

One option would be to put all events in the same namespace across the libraries and rely on completion to enumerate them including documentation.

That way you don't have to keep extra documentation around.

[-]

atehrani@reddit

At my last job, this was the major hurdle.

Designing user interfaces that account for the delay.

Designers and PMs could not understand eventual consistency. They wanted to create UIs for a strongly consistent system (classic). These different paradigms do not integrate well.

[-]

notyourancilla@reddit

First question that pops to mind when I hear stuff like this is if product/design wanted to create something X why did engineering create Y?

Too often I see systems built based on what engineering wanted to create (distributed asynchronous messaging system) instead of what was needed (a simple crud app).

[-]

pelrun@reddit

There's a lot of "engineering created Y because product/design explicitly requested Y when actually wanting X" out there too.

[-]

grauenwolf@reddit

Where I work, the problem is that the Y in "product/design explicitly requested Y" is microservices, an event bus, and the top 3 product offerings from Azure or AWS.

[-]

mirvnillith@reddit

XSLT can generate any text. I’ve used it, professionally, to generate SQL for populating test data.

[-]

grauenwolf@reddit

SQL doesn't care about extra whitespace.

[-]

mirvnillith@reddit

True, but any ”unwanted” extra space would come from the data being transformed and not the text being added/injected/provided by XSLT. So it would be an input and not output problem.

[-]

grauenwolf@reddit

Still a problem.

[-]

mirvnillith@reddit

But not with XSLT being able to output XML. You can still have functions to sanitize spaces.

[-]

grauenwolf@reddit

Sure, if your goal is to output XML then XSLT is great.

My objection is in trying to force-fit it into all text processing tasks.

[-]

mirvnillith@reddit

The right tool for any job, surely. And XSLT is a tool for turning XML into something else, but not the only one.

[-]

josefx@reddit

XSLT, which doesn't give a damn about spaces because it generates XML.

Are you confusing XML with HTML? Whitespace may not be relevant to the XML structure itself, but the parser won't randomly strip spaces from your data.

[-]

grauenwolf@reddit

No, but it doesn't care much about randomly adding in spaces.

[-]

josefx@reddit

And you have examples of this happening where it isn't caused by the programmer?

[-]

sleepless-deadman@reddit

Also, it's generating flat files... just write a custom function to pad/truncate and call that for the fields? I don't see what the inherent issue in using XSLT is.

The only thing XSLT won't care about is extra whitespace outside the tags in the source, and if you have to care about that, it's not even XML, so I could understand the issue there.
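
For reference, the pad/truncate idea looks roughly like this. It is sketched in Python rather than XSLT, and the field widths and alignments are invented for the example.

```python
def fixed_width(value: str, width: int, align: str = "left") -> str:
    """Truncate to `width`, then pad with spaces so the field is exactly `width` long."""
    text = value[:width]
    return text.ljust(width) if align == "left" else text.rjust(width)


def render_record(columns) -> str:
    """Build one positional line from (value, width, align) triples."""
    return "".join(fixed_width(value, width, align) for value, width, align in columns)


# Hypothetical layout: an 8-character name followed by a 5-character amount.
line = render_record([("Smith", 8, "left"), ("42", 5, "right")])
print(repr(line), len(line))
```

Every field goes through the same helper, so the line length is always the sum of the declared widths regardless of the input data.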

[-]

grauenwolf@reddit

You sound like the manager who fired me and then wasted another 4 months failing to get it to work.

All the while ignoring the working positional file generator that I offered instead.

[-]

sleepless-deadman@reddit

Sounds like he couldn't deliver. He should've chosen the working option instead if that was already compatible with your ecosystem.

My team does create xslts semi-regularly for data transforms, we mostly generate c/psvs but a few flat positional files as well. Never had a problem. But hey, don't know the context or how complicated mappings you needed.

[-]

I_AM_AN_AEROPLANE@reddit

Why does product / design have an opinion on how?! That's insane.

[-]

CherryLongjump1989@reddit

Usually it's a knee-jerk reaction to incompetent engineering.

[-]

grauenwolf@reddit

Yes it is. But I work in the world of consulting, so the paycheck helps me swallow my professional pride.

[-]

nerd5code@reddit

I thought plaintext was one of the supported output formats? Though IDR whether that was a 2.0 addition or not, I guess, and anything whitespace-sensitive was extra-miserable to begin with.

[-]

grauenwolf@reddit

Plain text sure, but not 100% position sensitive plain text.

[-]

Head-Criticism-7401@reddit

Here it's the reverse. Engineering (me) wants to create a direct connection between the systems. Yet, some person in management has heard of event driven architecture, and now, we need to REWRITE our entire backend, and our 3 ERP systems for it.

The entire project is doomed, doomed from the start.

[-]

grauenwolf@reddit

CRUD is boring.

[-]

Asyncrosaurus@reddit

As soon as an Engineer starts a project with the phrase "wouldn't it be cool if...", expect an overengineered mess and colossal waste of dev hours to work on.

[-]

lemmsjid@reddit

Agreed. The limiting factor on a strongly consistent system is often (not always) cost. Because optimizing for cost adds complexity and slows down time to market, there should be a very clear negotiation with product on the decision making and tradeoffs.

[-]

Fiennes@reddit

See, this is why I like what Amazon does. You place an order, it confirms it after a brief check. Then, their back-end processes do their thing. If there's problems, you'll get an email about it.

[-]

atehrani@reddit

Agreed. Some websites do it well to the point where you don't notice it.

I tried to explain to them that e-mail is similar to an eventually consistent system. It just never stuck

[-]

throwaway490215@reddit

There are two paths towards "Senior engineer". Become irreplaceable, or learn how to put problems into words for others ~~to understand~~ to parrot without thinking about it.

[-]

RiverboatTurner@reddit

That's true for Senior Engineer without the air quotes. To be a "senior engineer" all you need is roughly 2.5 years of experience listed on your resume.

[-]

grauenwolf@reddit

My first job, other than some solo consulting, was as a senior analyst. I didn't need no 2.5 years experience.

[-]

gyroda@reddit

I feel attacked.

[-]

Tasgall@reddit

Please tell my manager(s) that 🙃

[-]

Cakeking7878@reddit

I think Walmart also does that, but a while back I hit an issue where every time I placed an online order, it would place, then immediately cancel and send back to me with “there’s been an issue”. Sometimes I would like more of that up-front processing to happen immediately, so when I get an “it’s placed” message it’s actually locked in and not canceling randomly 20 minutes later with no explanation as to why.

[-]

mattgen88@reddit

Amazon's cart had a fun eventual consistency bug for us a few months ago.

We had a large order of stuff pre tariffs. A bed frame for my daughter, some cabinets, bulk cleaners and what not. About 1k USD.

My wife went to check out. Pays. Comes back to the home screen and the cart was still populated as if she had cancelled the order. So she tried again... 2k dollars later...

Few days later I'm flagging down the FedEx driver to refuse delivery of a second bed to try and get my money back because Amazon said they couldn't do anything about it.

[-]

Sweet_Television2685@reddit

opposite to my online food order, the platform confirmed restaurant started cooking, cancelled it later, turned out the restaurant had closed

some of those statuses are assumptions, end user wont know the difference

[-]

josefx@reddit

If there's problems, you'll get an email about it.

Getting a "payment confirmed" in the UI at the same time as a "your payment is fucked please fix" per email confused the hell out of me the first time I ran into it. Got the same result trying to "fix" it and gave up after several rounds. Turns out my card didn't have online transactions enabled, so no amount of "fixing" could make the transaction happen.

[-]

OneMillionSnakes@reddit

Yeah, sadly a lot of people want all the perks of eventual consistency, but are unwilling to accept any limitations.

[-]

rcls0053@reddit

People are so tuned to synchronous behavior that I'm currently working with a system where we use RabbitMQ for communication but somehow wrap asynchronous calls with a sync RPC wrapper...

[-]

CherryLongjump1989@reddit

Because these two concepts have nothing to do with one another.

[-]

CpnStumpy@reddit

Seen people try this several times.

It's fucking asinine. It's always the dumbest worst thing ever and gets replaced by something shitty because even a shitty alternative ends up working better

[-]

CherryLongjump1989@reddit

This has to do with asynchronicity, it has nothing to do with eventing.

[-]

TwentyCharactersShor@reddit

I've had product people argue that you can make an async process synchronous. Something somewhere has to wait and no, I can't magic it to go any faster.

[-]

MarsupialMisanthrope@reddit

You can (and you can go the other way too), but you can’t fix the wait that’s the whole reason the call was made async in the first place.

I can do a lot of things in code, but instantaneous over the network ACID isn’t one of them.
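
To make that concrete, here is a small sketch using Python's `asyncio`. `slow_remote_call` is a stand-in for whatever the real async operation is, with a sleep in place of network latency.

```python
import asyncio
import time


async def slow_remote_call(delay: float) -> str:
    """Stand-in for a remote operation; the sleep plays the role of the network."""
    await asyncio.sleep(delay)
    return "done"


def call_sync(delay: float, timeout: float) -> str:
    """Blocking facade over the async call. The caller still pays the full wait."""
    return asyncio.run(asyncio.wait_for(slow_remote_call(delay), timeout))


start = time.monotonic()
result = call_sync(delay=0.05, timeout=1.0)
print(result, f"{time.monotonic() - start:.2f}s elapsed")
```

The wrapper changes who waits and how it looks in code, not how long the work takes. If the call outlasts the timeout, `asyncio.wait_for` raises `asyncio.TimeoutError`, and the caller has to decide what to show the user.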

[-]

MrBlackWolf@reddit

That's a very good point. Non technical people don't understand eventual consistency. Both users and business stakeholders. On the other side, engineering KPIs push for fast endpoints and high scalability.

[-]

troublemaker74@reddit

It's not horrible if you're using GraphQL (subscriptions) or listening to websocket events.

[-]

Careless_Detail_2318@reddit

To be fair, designers and PMs live off in some fairytale land of their own making and don't understand the practical side of things

[-]

rom_romeo@reddit

If I learned one thing about the UI and the eventual consistency, it could be probably summed up in this sentence: You can either lie and be fast, or “tell the truth” and be slower.

[-]

ZukowskiHardware@reddit

Live view solves that. What you are explaining is more a problem of JavaScript and react where you have to explicitly define every component that needs to update.

[-]

Fiennes@reddit

Javascript has nothing to do with it, I think you misunderstand the process.

[-]

pikapp336@reddit

That’s not how that works

[-]

duderduderes@reddit

None of these are problems exclusively of event-driven systems. Microservices suffer from all the exact same issues: breaking API changes, debugging across many service boundaries, retries and dropping calls. And all the same strategies for handling these issues apply across both.

The real reason to use one or the other is if you want to decouple processing from action.

[-]

svix_ftw@reddit

aren't many microservices event driven tho?

Synchronous microservices I think are less common, since you can just go monolith at that point.

[-]

CherryLongjump1989@reddit

If you just want to shove things into a queue to handle them later, you just need a queue. You don't need events.

[-]

duderduderes@reddit

Let me rephrase. Events are good at decoupling something happening from the processing of that thing into some action or business process as those processes can be long running, asynchronous, varied (1:N) so it tends to better evoke the contract between systems.

[-]

CherryLongjump1989@reddit

Decoupling is a tricky business because it has specific criteria that must be met. In the loosest sense, it is about reducing the number of assumptions one component makes about another in order to function. So how does eventing meet those criteria? If anything, it makes it worse. Why?

You're taking something that is a business logic concern and you're placing it into the infrastructure, at the service boundary. So now, instead of a service implementing a queue internally and exposing it through an API, it forces everyone else to communicate via some vendor-specific messaging implementation. Which has all sorts of nasty implications for coupling.

Second, by shoving data into service boundaries, you are now coupling these services across time. Instead of one component owning its own schema for an internal queue that it fully owns, you've now got multiple components that must be aware of schema evolution -- which couples them, in some cases, literally to the deployment schedule of every other service that is consuming or producing events at this service boundary.

We could go on all day - but I just don't see decoupling as a real thing here.

[-]

MWilbon9@reddit

Interesting take

[-]

CherryLongjump1989@reddit

I’m interested as to why? To me it seems obvious - like one of those things that you can’t unsee after you see it. I might also point out that the ability to perform tasks asynchronously is not “decoupling”, otherwise cron jobs would be considered decoupling. The sort of idea that one network request means coupling, but two network requests means decoupling, is a mental model that I can’t wrap my head around.

[-]

CherryLongjump1989@reddit

It's just common sense.

[-]

Optimal_Platypus1910@reddit

Event-driven systems are hard because they require you to think in terms of asynchronous flows, not simple step-by-step logic. Debugging becomes tricky since events may trigger in unexpected orders, and tracking state across multiple services is challenging. On top of that, you need robust monitoring and error handling to avoid silent failures. That’s why many teams look for eco event solutions that simplify orchestration, observability, and scalability, so the system remains efficient and sustainable in the long run.

[-]

maxinstuff@reddit

I find this mostly becomes a problem when UX expectations are naively mapped onto architecture/technical implementation. Your users should not have to think about this, and your engineers should not naively map what users say onto the architecture.

In fact, you should never have to explain to a user what “eventual consistency” is - if you find yourself having this discussion, it’s probably already gone off the rails.

Their experience should just be that the application works.

An action should simply complete fast enough that my next dependent action can see that change faster than I can perform it. That’s the only requirement. As far as the user is concerned, that is “real time”.

[-]

SquirrelOtherwise723@reddit

Distributed systems are hard.

[-]

pauloyasu@reddit

as a former gamedev now working on enterprise bs development because it pays more, work less and is orders of magnitude easier, event driven is a breeze

[-]

CherryLongjump1989@reddit

Events ≠ message queues. He treats “event-driven” as if it’s a property of the infrastructure (“we have RabbitMQ → we are event-driven”). Wrong. TCP, pipes, sockets, whatever — they’re all asynchronous message systems. Eventing is just a way you choose to interpret messages.
Schema versioning is not unique to eventing.

You add/remove fields? That’s API evolution.

gRPC, REST, protobufs, JSON APIs all have the exact same problem. He’s smuggling a general distributed systems problem under the “event-driven is hard” banner.

Observability/debugging again isn’t special.

Correlation IDs exist in RPC tracing, too.

The “string of calls vs. cut-up events” is just tracing in a fan-out system.

This isn’t an eventing issue, it’s any distributed system issue.

Failures, retries, DLQs. That’s queue semantics. They show up whether you call your messages “events,” “jobs,” or “requests.” Nothing event-specific here.
Idempotency. Same deal: RPC calls must be idempotent if retried. This isn’t eventing, it’s networking.
Eventual consistency. Again, not unique to event-driven. Any system with multiple data copies faces it. He’s acting like it’s an inherent tax of “event-driven,” when in reality it’s the tax of distribution.
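
On the idempotency point, the usual shape is the same whether the retried thing is called an event or an RPC: remember which message IDs were already applied. A minimal sketch, with the class and field names invented for the example and the "seen" set kept in memory (a real handler would persist it):

```python
class IdempotentHandler:
    """Applies each message at most once, keyed by its ID."""

    def __init__(self):
        self.seen_ids = set()
        self.balance = 0

    def handle(self, message: dict) -> bool:
        if message["id"] in self.seen_ids:
            return False  # duplicate delivery or retry: ignore it
        self.seen_ids.add(message["id"])
        self.balance += message["amount"]
        return True


handler = IdempotentHandler()
deposit = {"id": "msg-1", "amount": 10}
handler.handle(deposit)
handler.handle(deposit)  # redelivered after a timeout
print(handler.balance)
```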

[-]

Ok_Dust_8620@reddit

Agree - these problems aren’t unique to event-driven architecture. The point is that they become pretty much unavoidable once you choose events and this level of indirection between services. With a distributed system using RPCs, you can, for example, still have strong consistency if your database architecture supports it. So it’s more like: these are problems you’ll definitely encounter - not that other architectures can’t introduce similar challenges.

[-]

CherryLongjump1989@reddit

With a distributed system ~~using RPCs~~, you can, for example, still have strong consistency if your database architecture supports it.

It does not make a difference if you are using an RPC or an event.

[-]

Ok_Dust_8620@reddit

With events, besides using backward-compatible schema updates (which aren’t always possible), you could also maintain multiple streams - similar to how we often support several versions of the same API, at least during the migration period until all clients are on the latest version.
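
A sketch of the multiple-streams idea, assuming a breaking change from a float `total` to an integer `total_cents`. The stream names, the field names, and the use of plain lists as streams are all invented for illustration.

```python
def publish_order_placed(order: dict, streams: dict) -> None:
    """During the migration, emit the same fact in both the old and new shape."""
    # v1 consumers still expect a float total in currency units.
    streams["orders.v1"].append({"id": order["id"], "total": order["total_cents"] / 100})
    # v2 consumers get the new integer field.
    streams["orders.v2"].append({"id": order["id"], "total_cents": order["total_cents"]})


streams = {"orders.v1": [], "orders.v2": []}
publish_order_placed({"id": "o-1", "total_cents": 2000}, streams)
print(streams)
```

Once the last v1 consumer has moved over, the first `append` and the old stream can be dropped.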

[-]

Ok-Breakfast-3742@reddit

Not if you spend time to construct a state diagram to understand the system as the first step. I’ve done it plenty.

[-]

EasyBig9261@reddit

The first part about formatting is simply bullshit. For example, in Java you can configure your object mapper to not fail on extra fields.

[-]

sickhippie@reddit

That would be the "backwards compatibility" strategy he talks about right after it. It's not "bullshit" really. If a downstream consumer gets a field it doesn't expect, it won't know what to do with it. It can be configured to skip it, but that's not really "doing" something with it. If it expects a field that doesn't come in or comes in with the wrong data type, though? That's where the bugs really start to flow.
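
The asymmetry is easy to see in a hand-rolled parser. This is a Python sketch rather than a Java object mapper configuration, and the `OrderPlaced` event with its two fields is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OrderPlaced:
    order_id: str
    total_cents: int


def parse_order_placed(payload: dict) -> OrderPlaced:
    """Ignore unknown fields, but reject missing or mistyped expected ones."""
    expected = {"order_id": str, "total_cents": int}
    values = {}
    for name, expected_type in expected.items():
        if name not in payload:
            raise ValueError(f"missing field: {name}")
        if not isinstance(payload[name], expected_type):
            raise ValueError(f"wrong type for field: {name}")
        values[name] = payload[name]
    return OrderPlaced(**values)  # anything else in the payload is skipped


# An extra field from a newer producer is harmless.
print(parse_order_placed({"order_id": "o-1", "total_cents": 500, "coupon": "X"}))
```

Skipping `coupon` keeps the consumer running, but a payload that drops `total_cents` or sends it as a string fails at the boundary instead of deep inside the business logic.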

[-]

Rambo_11@reddit

They're not.

Workflows are hard.

[-]

_predator_@reddit

It's very rare to be event-driven and not require sagas, or is my perception just skewed? The very basic order shipping use case that people love to use for EDA demos would be a hot mess for everything but the happy path.

[-]

grauenwolf@reddit

I use events such as "Hey background process, wake up and go check the database. There's work to be done." or for sending pricing updates to a desktop application.

The idiots at my work want to use it for "I'm the UI and I want the first 10 customer records."

[-]

Few_Source6822@reddit

It's very rare to be event-driven and not require sagas, or is my perception just skewed?

I'd draw a distinction between "require from a technical standpoint to ensure sane transaction management" and "required as a way to ensure we are able to consistently present a clean user experience that matches their expectations and doesn't lead to us needing to support the consequences of downstream problems with our support teams".

In my experience, having worked at companies both small and large, you might be surprised at how many organizations simply don't bother with things like sagas or two-phase commits as a way to build distributed systems and instead just... kind of wing it. Plenty of them are happy getting the benefits of the looser coupling between systems without dealing with the mess of consequences that comes with not fully managing those interactions sanely. Sometimes just getting your teams to be more autonomous and not dead-ending your user with an ugly error is good enough, compared to making sure that what you're presenting to them is actually correct.

I'm not defending it.

[-]

markoNako@reddit

So they would just let the systems continue to work without consistency guarantee? I wonder in such cases wouldn't that bring some serious bugs and issues in the application? I assume the type of work the app is doing is also very important (in finance and healthcare that would be a disaster, compared to something else where availability is what mostly matters), but even then it's hard for me to imagine how that actually works.

[-]

ptoki@reddit

So they would just let the systems continue to work without consistency guarantee?

Sometimes "good enough, and we will tackle this if it becomes a problem" works well enough that nobody cares.

Because the issue may happen just 3 times a year, and with all the other issues it will be 30 times a year, fixable by a human.

The extreme case is like Skip the Dishes or Uber, where it seems the edge cases and unexpected scenarios happen like 30% of the time...

[-]

Few_Source6822@reddit

I wonder in such cases wouldn't that bring some serious bugs and issues in the application?

It sure can. Not every bug or problem is as reputation-damaging as the examples you laid out, like a bank not properly recording your paycheck being deposited or a doctor's cancer diagnosis and notes not being added to your chart such that your regular doctor can coordinate with your oncologist.

Fact is, if you've got a product that people want to use, they'll actually tolerate more problems than you might think. I've seen companies literally factor in error rates and customer churn into their business model over problems that at their core could be addressed by more robust distributed transaction handling, but it just made more sense to prioritize other work, or it was too hard/time consuming to build up staff to learn how to do more advanced handling.

That's what customer support teams that issue credits/refunds are for. And ultimately, many businesses know they're going to need them anyway, so they'd rather just use them and focus on other things. Sometimes if the problem is bad enough, a dev or two gets tagged in to build a more specific list of impacted users and a sense of the impact to help fix it.

Things like sagas are hard not just because they're a more advanced engineering problem, but oftentimes because what you actually need in your saga is happening between teams, and that coordination is not obvious for many organizations out there.

[-]

Deep-Thought@reddit

I think there's an argument to be made that there are some cases where using sagas/orchestration slows you down enough that, given the tiny number of affected requests, it can make business sense to just swallow the financial impact of paying back for any errors instead.

[-]

Few_Source6822@reddit

Oh for sure.

The example I was thinking of was a company that knew that it should but simply didn't/couldn't because coordinating between teams was too difficult. I suspect that's often the more common reason why that doesn't happen.

[-]

BosonCollider@reddit

You can use a message bus with transactional semantics to simplify the error handling in some cases, especially if your scale is small enough that you can just use something like pgmq and use Postgres for both queues and relational data.
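To make the "queue and relational data in the same database" idea concrete, here is a minimal sketch. It deliberately swaps in Python's built-in sqlite3 for Postgres/pgmq so it runs anywhere, and the `orders` and `queue` tables and the `place_order` helper are made up for illustration. The point is only that the business row and the outgoing message commit or roll back together.

```python
import json
import sqlite3

# In-memory stand-in for a database that holds both business data and the queue.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL);
    CREATE TABLE queue  (msg_id INTEGER PRIMARY KEY AUTOINCREMENT,
                         payload TEXT NOT NULL);
""")

def place_order(conn, item, fail=False):
    """Write the order row and its event message in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        payload = json.dumps({"type": "order_placed", "order_id": cur.lastrowid})
        conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))
        if fail:
            raise RuntimeError("simulated crash before commit")

def count(conn, table):
    """Row count for one of the two hypothetical tables above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

place_order(conn, "book")
try:
    place_order(conn, "lamp", fail=True)
except RuntimeError:
    pass  # both inserts for "lamp" were rolled back together

print(count(conn, "orders"), count(conn, "queue"))
```

Because the failed call leaves neither an order nor a message behind, there is no "order exists but the event was never sent" state to clean up afterwards.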

Alternatively, if your language has a good concurrency story, you can have a big coroutine procedure do the whole thing instead of breaking it up. The trend in most programming languages has been to replace event-driven programming with breakpoints in "normal" synchronous functions. Imo something similar will eventually happen to EDA on top of a broker; Apache Pulsar has a really nice concept of Pulsar Functions, for example.
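As a rough illustration of the "one big coroutine" approach, here is a sketch using Python's asyncio. The step names (`reserve_stock`, `charge_card`, `release_stock`) are hypothetical stand-ins for calls to other services; the only point is that the whole flow, including the undo step, reads top to bottom as one procedure.

```python
import asyncio

# Hypothetical steps; in a real system each would await a remote call.
async def reserve_stock(log, order):
    log.append(f"reserve {order}")

async def release_stock(log, order):
    log.append(f"release {order}")

async def charge_card(log, order, decline=False):
    if decline:
        raise RuntimeError("card declined")
    log.append(f"charge {order}")

async def checkout(log, order, decline=False):
    """The whole flow as one procedure; each await is a suspension point."""
    await reserve_stock(log, order)
    try:
        await charge_card(log, order, decline)
    except RuntimeError:
        # The compensation sits right next to the step it undoes.
        await release_stock(log, order)
        return "cancelled"
    return "completed"

log = []
ok = asyncio.run(checkout(log, "A1"))
bad = asyncio.run(checkout(log, "B2", decline=True))
print(ok, bad, log)
```

This only covers the in-process case. If the process dies between steps, nothing here resumes the flow, which is the gap that durable brokers or workflow engines are meant to fill.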

[-]

ptoki@reddit

Not really.

The key is usually either an arbiter (single entity solving the collisions/conflicts) or a form of subscription where even if something is missing now it will be delivered/created later and the flow will be able to continue.

It's just extra steps, not locally in code but somewhere else.

The challenge is in predicting whether the chosen flow/technology can handle all the edge cases, or in limiting those. Which is usually a non-coding problem and just requires some businessman beating.

[-]

RetiredApostle@reddit

Sagas for sagas are harder.

[-]

scruffles360@reddit

We solved this problem in a unique way: services are configured to receive messages by specifying a target (usually SNS) and a GraphQL subscription query. Each service gets its own data format, as requested. We can consult the configuration when making API changes to see which apps would be affected. Haven't seen any problems since we launched it at least 5 years ago.

[-]

drislands@reddit

OP, why did you change the title to be grammatically incorrect for the Reddit post when it's correct in the article?

[-]

farsightxr20@reddit

Every system is event-driven. At the OS internals level, it's all events in the form of messages to/from hardware devices (keyboard, network, etc.).

On top of these low-level events we build higher-level abstractions based on semantic relationships between events. Good abstractions simplify reasoning and information flow in the majority of cases, e.g. you don't need to think about the TCP handshake process or congestion control when you request a file from the network, it's all just one higher-level fetch operation. There will always be niche cases that benefit from lower-level control, which requires breaking the abstraction and ideally, introducing a new purpose-built abstraction so that complexity doesn't proliferate through the entire system.

The mistake I see most often is people starting with events and never building any higher abstraction (massive spaghetti). An "event-driven" architecture is often just a euphemism for "no architecture".

[-]

davidalayachew@reddit

They aren't hard, they just scale in complexity about as well as they scale in performance. Imo, they're just completely over-valued as a solution for performance/throughput problems.

Event-driven systems exchange simplicity for throughput/performance, like the article said. Several things that you get "for free" in a Strongly Consistent setup, you have to either abandon or recreate in an Eventually Consistent setup.

The problem is, people see the pretty performance numbers of Eventual Consistency, then assume that the cost of abandoning or recreating some of the necessary benefits of Strong Consistency is small in comparison. It's not, and the cost shoots up very quickly. Even more so when you are distributed.

The article lists an example -- the concept of a Correlation ID. This is an example of recreating the benefit you would get from a simple stack trace (to use Java terminology) if you were Strongly Consistent.

And while implementing and enforcing a Correlation ID is quite easy, weaving all of the relevant events with the same Correlation ID together into a single tree view (again, recreating a benefit) can range from non-trivial to quite difficult. It's not just SELECT * FROM EVENT_TABLE WHERE CORRELATION_ID = '123'. What makes things messy is identifying the parent-child relationship between tasks. Identifying that relationship with Strong Consistency is almost free.
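To show what the "tree view" step involves beyond the SELECT, here is a small sketch. It assumes every event also records a `parent_id` pointing at the event that caused it; that field, along with `event_id` and the sample event names, is hypothetical, and getting every producer to fill it in reliably is exactly the hard part.

```python
from collections import defaultdict

# Hypothetical event records; only correlation_id comes from the discussion.
events = [
    {"event_id": "e1", "parent_id": None, "correlation_id": "123", "name": "OrderPlaced"},
    {"event_id": "e2", "parent_id": "e1", "correlation_id": "123", "name": "PaymentRequested"},
    {"event_id": "e3", "parent_id": "e1", "correlation_id": "123", "name": "StockReserved"},
    {"event_id": "e4", "parent_id": "e2", "correlation_id": "123", "name": "PaymentCaptured"},
    {"event_id": "e9", "parent_id": None, "correlation_id": "999", "name": "Unrelated"},
]

def render_tree(events, correlation_id):
    """Return indented lines showing the causal tree for one correlation ID."""
    # Step 1: the easy part, equivalent to the WHERE CORRELATION_ID = ... filter.
    children = defaultdict(list)
    for e in events:
        if e["correlation_id"] == correlation_id:
            children[e["parent_id"]].append(e)

    # Step 2: walk from the roots (no parent) down through each child.
    lines = []
    def walk(parent_id, depth):
        for e in children.get(parent_id, []):
            lines.append("  " * depth + e["name"])
            walk(e["event_id"], depth + 1)
    walk(None, 0)
    return lines

tree = render_tree(events, "123")
print("\n".join(tree))
```

Without a trustworthy `parent_id`, the filter still returns the right rows, but they come back as a flat list with no way to tell which event triggered which.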

So, again -- it's a game of tradeoffs. It's just that the costs are not that obvious, hence why I think this programming style is overblown. People get into it for genuinely good reasons, underestimate the costs until later, and then it's the sunk cost fallacy until things become untenable.

Imo, event-driven systems are at their best when the Cartesian Product between possible types of events and possible queues is "low".

For example, in most UI Frameworks, there is usually an event queue, which is a single queue that processes all user interactions for the entire GUI. Cool, 1 multiplied by X is X, so as long as you don't have too many of X (different types of events), then this gives you both good performance and a relatively simple user model.

Alternatively, if your situation demands many events and many queues, then using a State Transition Diagram to model your whole system's state, where certain events can ONLY originate from one system state, makes even a giant number of events and queues not too hard to wrangle.
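A state transition diagram of this kind can be written down as a plain lookup table. The sketch below is one possible shape, with made-up states and event names for an order; any event that is not listed for the current state is rejected instead of being silently processed.

```python
from enum import Enum, auto

class State(Enum):
    PENDING = auto()
    PAID = auto()
    SHIPPED = auto()

# Each event is only legal from exactly one state (hypothetical example).
TRANSITIONS = {
    (State.PENDING, "payment_captured"): State.PAID,
    (State.PAID, "parcel_dispatched"): State.SHIPPED,
}

def apply(state, event):
    """Return the next state, or raise if the event is illegal here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.name}") from None

state = apply(State.PENDING, "payment_captured")
state = apply(state, "parcel_dispatched")
print(state.name)
```

The table doubles as documentation: the full set of legal (state, event) pairs is visible in one place, which is what keeps a large number of events manageable.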

To explain it in simpler terms, you can actually have many queues and many events, but as long as they are siloed off such that only ABC-related Events touch ABC-related queues, you can keep the complexity quite low. That's because you'd be summing up the Cartesian product of each "domain" (in this case, ABC). And if the sum total of all those Cartesian products is still "low", then you're golden. Just beware of crossing the wires. Once you have too many couplings, it's not the sum of 2 Cartesian products anymore, it's just one big one that you need to consider. That's because these 2 domains are no longer separate, but 1 kind-of-coupled jumbo domain.
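The arithmetic behind that is easy to check. In this sketch the counts are invented purely for illustration: two domains with 5 event types on 2 queues and 4 event types on 3 queues, first kept separate and then merged into one coupled domain.

```python
def pairings(domains):
    """Sum of (event types x queues) over each independent domain."""
    return sum(event_types * queues for event_types, queues in domains)

# Two siloed domains: each event type only ever touches its own domain's queues.
siloed = [(5, 2), (4, 3)]

# The same system once the domains are coupled: any event may reach any queue.
coupled = [(5 + 4, 2 + 3)]

print(pairings(siloed))   # 5*2 + 4*3 = 22
print(pairings(coupled))  # 9*5 = 45
```

Same number of events and queues in both cases, but the coupled version has roughly twice as many event/queue combinations to reason about, and the gap widens as more domains get merged.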

So again -- it's all about tradeoffs. Just know that it's not a silver bullet for your performance problems. Use it only if you know that you can avoid the costs of it easily, even far into the future.

[-]

VictoryMotel@reddit

Why this thing that not true?

[-]

WhyJustSlightly@reddit

skill issue?