It's especially egregious because judging by the username, OP is associated with the website in the link. So they wrote it right once, then fucked it up on Reddit. What the hell?
If you deliberately make a minor spelling or grammatical error in the title of a post, a certain number of people will rush to be the first to correct you. This counts as early engagement and boosts the visibility of your post.
I think it is one of the side products of the language's popularity across many other cultures.
You probably have to accept it. It was indeed a surprise to me that even natives started to ask questions in that non-question form. I just concluded that this is something English got from the world in exchange for being popular.
And if you understand this form, then it means it's working.
I used to get annoyed by this too, but after experiencing what it's like to learn another language I just assume they're an ESL speaker and have become a lot more tolerant.
(I swear though, if someone talks about "web scrapping" one more time I might actually lose my sanity)
Autocorrect and swipey keyboards on phones account for most of my typos. Often some very strange ones.
Fun side thing: one of the exam boards for the A level course in computing (OCR, in case anyone's curious) had a typo where they called it "disk threshing" rather than "disk thrashing". They were seemingly incapable of fixing this typo for years, as it would keep appearing in their exam papers over the years. I looked into it and the only people who were using the term were specifically making content for that exam.
I do understand that, but as an ESL speaker myself I feel I pay even more attention to English grammar than most native speakers. Not to say I don’t make mistakes, but I make a conscious effort not to import German grammar into English.
I think part of it is the fact that the statement is valid. People see the Why at the start of the sentence and think they need to include a question mark at the end
That’s because you see languages as fixed rules that will always be the same, but they’re not. Languages are constantly changing: people adopt terms and new words like trends, and we even change the meaning of existing words. You adapt, or be ready to be upset every day listening to/reading people…
I’m a native Spanish speaker, and it’s the same with my language too…
Designing user interfaces that account for the delay.
Designers and PMs could not understand eventual consistency. They wanted to create UIs for a strongly consistent system (classic). These different paradigms do not integrate well.
First question that pops to mind when I hear stuff like this is if product/design wanted to create something X why did engineering create Y?
Too often I see systems built based on what engineering wanted to create (distributed asynchronous messaging system) instead of what was needed (a simple crud app).
Here it's the reverse. Engineering (me) wants to create a direct connection between the systems. Yet, some person in management has heard of event driven architecture, and now, we need to REWRITE our entire backend, and our 3 ERP systems for it.
The entire project is doomed, doomed from the start.
Where I work, the problem is that the Y in "product/design explicitly requested Y" is microservices, an event bus, and the top 3 product offerings from Azure or AWS.
I thought plaintext was one of the supported output formats? Though IDR whether that was a 2.0 addition or not, I guess, and anything whitespace-sensitive was extra-miserable to begin with.
Also, it's generating flat files... just write a custom function to pad/truncate and call that for the fields? I don't see what the inherent issue in using XSLT is.
The only thing XSLT won't care about is extra whitespace outside the tags in the source, and if you have to care about that, it's not even XML, so I could understand the issue there.
As soon as an Engineer starts a project with the phrase "wouldn't it be cool if...", expect an overengineered mess and colossal waste of dev hours to work on.
Agreed. The limiting factor on a strongly consistent system is often (not always) cost. Because optimizing for cost adds complexity and slows down time to market, there should be a very clear negotiation with product on the decision making and tradeoffs.
See, this is why I like what Amazon does. You place an order, it confirms it after a brief check. Then, their back-end processes do their thing. If there's problems, you'll get an email about it.
There are two paths towards "Senior engineer". Become irreplaceable, or learn how to put problems into words for others ~~to understand~~ to parrot without thinking about it.
That's true for Senior Engineer without the air quotes. To be a "senior engineer" all you need is roughly 2.5 years of experience listed on your resume.
I think Walmart also does that, but a while back I hit an issue where every time I placed an online order, it would place, then immediately cancel and send back to me with “there’s been an issue”. Sometimes I would like more of that up-front processing to happen immediately, so when I get an “it’s placed” message it’s actually locked in and not canceling randomly 20 minutes later with no explanation as to why.
Amazon's cart had a fun eventual consistency bug for us a few months ago.
We had a large order of stuff pre tariffs. A bed frame for my daughter, some cabinets, bulk cleaners and what not. About 1k USD.
My wife went to check out. Pays. Comes back to the home screen and the cart was still populated, as if she had cancelled the order. So she tried again... 2k dollars later...
Few days later I'm flagging down the FedEx driver to refuse delivery of a second bed to try and get my money back because Amazon said they couldn't do anything about it.
If there's problems, you'll get an email about it.
Getting a "payment confirmed" in the UI at the same time as a "your payment is fucked please fix" per email confused the hell out of me the first time I ran into it. Got the same result trying to "fix" it and gave up after several rounds. Turns out my card didn't have online transactions enabled, so no amount of "fixing" could make the transaction happen.
People are so tuned to synchronous behavior that I'm currently working with a system where we use RabbitMQ for communication but somehow wrap asynchronous calls with sync RPC wrapper...
It's fucking asinine. It's always the dumbest worst thing ever and gets replaced by something shitty because even a shitty alternative ends up working better
I've had product people argue that you can make an async process synchronous. Something somewhere has to wait and no, I can't magic it to go any faster.
That's a very good point. Non technical people don't understand eventual consistency. Both users and business stakeholders. On the other side, engineering KPIs push for fast endpoints and high scalability.
If I learned one thing about the UI and eventual consistency, it could probably be summed up in this sentence: You can either lie and be fast, or “tell the truth” and be slower.
Live view solves that. What you are explaining is more a problem of JavaScript and react where you have to explicitly define every component that needs to update.
Events ≠ message queues.
He treats “event-driven” as if it’s a property of the infrastructure (“we have RabbitMQ → we are event-driven”). Wrong. TCP, pipes, sockets, whatever — they’re all asynchronous message systems. Eventing is just a way you choose to interpret messages.
Schema versioning is not unique to eventing.
You add/remove fields? That’s API evolution.
gRPC, REST, protobufs, JSON APIs all have the exact same problem.
He’s smuggling a general distributed systems problem under the “event-driven is hard” banner.
Observability/debugging again isn’t special.
Correlation IDs exist in RPC tracing, too.
The “string of calls vs. cut-up events” is just tracing in a fan-out system.
This isn’t an eventing issue, it’s any distributed system issue.
Failures, retries, DLQs.
That’s queue semantics. They show up whether you call your messages “events,” “jobs,” or “requests.” Nothing event-specific here.
Idempotency.
Same deal: RPC calls must be idempotent if retried. This isn’t eventing, it’s networking.
Eventual consistency.
Again, not unique to event-driven. Any system with multiple data copies faces it. He’s acting like it’s an inherent tax of “event-driven,” when in reality it’s the tax of distribution.
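To make the idempotency point concrete, here is a minimal sketch of a handler that tolerates redelivery. It is transport-agnostic on purpose: the same check applies to a retried RPC or a redelivered event. The `IdempotentHandler` class, the `id`/`amount` fields, and the in-memory set are all assumptions for illustration; a real system would keep the processed IDs in durable storage.

```python
class IdempotentHandler:
    """Hypothetical consumer that ignores redelivered messages."""

    def __init__(self):
        self.processed_ids = set()  # stand-in for a durable store
        self.balance = 0

    def handle(self, message: dict) -> bool:
        """Apply the message once; return False if it was a duplicate."""
        msg_id = message["id"]
        if msg_id in self.processed_ids:
            return False
        self.balance += message["amount"]
        self.processed_ids.add(msg_id)
        return True


handler = IdempotentHandler()
deposit = {"id": "msg-1", "amount": 50}
handler.handle(deposit)
handler.handle(deposit)  # retry or redelivery of the same message
```

The second call is a no-op, so the balance only reflects the deposit once.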
Agree - these problems aren’t unique to event-driven architecture. The point is that they become pretty much unavoidable once you choose events and this level of indirection between services. With a distributed system using RPCs, you can, for example, still have strong consistency if your database architecture supports it. So it’s more like: these are problems you’ll definitely encounter - not that other architectures can’t introduce similar challenges.
With events, besides using backward-compatible schema updates (which aren’t always possible), you could also maintain multiple streams - similar to how we often support several versions of the same API, at least during the migration period until all clients are on the latest version.
That would be the "backwards compatibility" strategy he talks about right after it. It's not "bullshit" really. If a downstream consumer gets a field it doesn't expect, it won't know what to do with it. It can be configured to skip it, but that's not really "doing" something with it. If it expects a field that doesn't come in or comes in with the wrong data type, though? That's where the bugs really start to flow.
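A rough sketch of that asymmetry, assuming a made-up contract with two required integer fields (`REQUIRED_FIELDS` and `parse_order_event` are hypothetical names, not from the article): unknown fields are skipped, while missing or mistyped fields are rejected at the boundary instead of leaking bugs downstream.

```python
REQUIRED_FIELDS = {"order_id": int, "total_cents": int}  # hypothetical contract


def parse_order_event(raw: dict) -> dict:
    """Keep only the known fields; reject missing or mistyped ones."""
    parsed = {}
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in raw:
            raise ValueError(f"missing required field {name!r}")
        value = raw[name]
        if not isinstance(value, expected_type):
            raise ValueError(f"field {name!r} should be {expected_type.__name__}")
        parsed[name] = value
    # Fields the consumer does not know about are skipped, so a producer
    # can add new ones without breaking this consumer.
    return parsed


order = parse_order_event({"order_id": 1, "total_cents": 500, "coupon": "X"})
```

Here `coupon` is dropped without complaint, whereas sending `total_cents` as a string or omitting it raises immediately.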
It's very rare to be event-driven and not require sagas, or is my perception just skewed? The very basic order shipping use case that people love to use for EDA demos would be a hot mess for everything but the happy path.
I use events such as "Hey background process, wake up and go check the database. There's work to be done." or for sending pricing updates to a desktop application.
The idiots at my work want to use it for "I'm the UI and I want the first 10 customer records."
It's very rare to be event-driven and not require sagas, or is my perception just skewed?
I'd draw a distinction between "require from a technical standpoint to ensure sane transaction management" and "required as a way to ensure we are able to consistently present a clean user experience that matches their expectations and doesn't lead to us needing to support the consequences of downstream problems with our support teams".
In my experience, having worked at companies both small and large, you might be surprised at how many organizations simply don't bother with things like sagas or two-phase commits as a way to build distributed systems and instead just... kind of wing it. Plenty of them are happy getting the benefits of the looser coupling between systems without dealing with the mess of consequences that come with not fully managing those interactions sanely. Sometimes just getting your teams to be more autonomous and not dead-ending your user with an ugly error is good enough, even over making sure that what you're presenting to them is actually correct.
So they would just let the systems continue to work without consistency guarantees? I wonder in such cases wouldn't that bring some serious bugs and issues in the application? I assume that the type of work the app is doing is also very important (in finance and healthcare that would be a disaster) compared to something else where mostly availability is important, but even then it's hard for me to imagine how that actually works.
I wonder in such cases wouldn't that bring some serious bugs and issues in the application?
It sure can. Not every bug or problem is as reputation damaging as the example you laid out, like a bank not properly recording your paycheck being deposited or a doctor's cancer diagnosis and notes not being added to your chart such that your regular doctor can coordinate with your oncologist.
Fact is, if you've got a product that people want to use, they'll actually tolerate more problems than you might think. I've seen companies literally factor in error rates and customer churn into their business model over problems that at their core could be addressed by more robust distributed transaction handling, but it just made more sense to prioritize other work, or it was too hard/time consuming to build up staff to learn how to do more advanced handling.
That's what customer support teams that issue credits/refunds are for. And ultimately, many businesses know they're going to need them anyway, so they'd rather just use them and focus on other things. Sometimes if the problem is bad enough, a dev or two gets tagged in to build a more specific list of impacted users and a sense of the impact to help fix it.
Things like sagas are hard not just because they're a more advanced engineering problem, but often times because what you actually need in your saga is happening between teams, and that coordination is not obvious for many organizations out there.
I think there's an argument to be made that there are some cases where using sagas/orchestration slows you down enough that, given the tiny number of affected requests, it can make business sense to just swallow the financial impact of paying back for any errors instead.
The example I was thinking of was a company that knew that it should but simply didn't/couldn't because coordinating between teams was too difficult. I suspect that's often the more common reason why that doesn't happen.
You can use a message bus with transactional semantics to simplify the error handling in some cases, especially if your scale is small enough that you can just use something like pgmq and use postgres for both queues and relational data.
Alternatively if your language has a good concurrency story you can have a big coroutine procedure do the whole thing instead of breaking it up. The trend in most programming languages has been to replace event driven programming with breakpoints in "normal" synchronous functions. Imo something similar will eventually happen to EDA on top of a broker, apache pulsar has a really nice concept of pulsar functions for example.
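A minimal sketch of the transactional-queue idea from the first paragraph above. To stay self-contained it swaps in SQLite from the Python standard library for Postgres and uses a plain table as the queue; it does not use pgmq's actual API, and the table names and `place_order` function are hypothetical. The point is only that the business row and the queued message commit or roll back together.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)")
conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)")


def place_order(order_id: int, total_cents: int) -> None:
    # `with conn` commits on success and rolls back if an exception escapes,
    # so the queued message and the order row are written atomically.
    with conn:
        body = json.dumps({"type": "order.placed", "order_id": order_id})
        conn.execute("INSERT INTO queue (body) VALUES (?)", (body,))
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))


place_order(1, 500)
try:
    place_order(1, 900)  # duplicate primary key: the queued message is rolled back too
except sqlite3.IntegrityError:
    pass

queued = conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
```

After the failed second call there is still exactly one message in the queue, so consumers never see an event for an order that was not stored.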
The key is usually either an arbiter (single entity solving the collisions/conflicts) or a form of subscription where even if something is missing now it will be delivered/created later and the flow will be able to continue.
Just extra steps but not locally in code but somewhere else.
The challenge is in predicting if the used flow/technology can handle all the edge cases or limiting those. Which is usually a non coding problem and just requires some businessman beating.
We solved this problem in a unique way: services are configured to receive messages by specifying a target (usually sns) and a graphql subscription query. Each service is getting their own data format as requested. We can consult the configuration when making api changes to see which apps would be affected. Haven’t seen any problems since we launched it at least 5 years ago
Every system is event-driven. At the OS internals level, it's all events in the form of messages to/from hardware devices (keyboard, network, etc.).
On top of these low-level events we build higher-level abstractions based on semantic relationships between events. Good abstractions simplify reasoning and information flow in the majority of cases, e.g. you don't need to think about the TCP handshake process or congestion control when you request a file from the network, it's all just one higher-level fetch operation. There will always be niche cases that benefit from lower-level control, which requires breaking the abstraction and ideally, introducing a new purpose-built abstraction so that complexity doesn't proliferate through the entire system.
The mistake I see most often is people starting with events and never building any higher abstraction (massive spaghetti). An "event-driven" architecture is often just a euphemism for "no architecture".
They aren't hard, they just scale in complexity about as well as they scale in performance. Imo, they're just completely over-valued as a solution for performance/throughput problems.
Event-driven systems exchange simplicity for throughput/performance, like the article said. Several things that you get "for free" in a Strongly Consistent setup, you have to either abandon or recreate in an Eventually Consistent setup.
The problem is, people see the pretty performance numbers of Eventual consistency, then assume that the cost of abandoning or recreating some of the necessary benefits of Strong Consistency is small in comparison. It's not, and the cost shoots up very quickly. Even more so when you are distributed.
The article lists an example -- the concept of a Correlation ID. This is an example of recreating the benefit you would get from a simple stack trace (to use Java terminology) if you were Strongly Consistent.
And while implementing and enforcing a Correlation ID is quite easy, weaving all of the relevant events with the same Correlation ID together into a single tree view (again, recreating a benefit) can range from non-trivial to quite difficult. It's not just SELECT * FROM EVENT_TABLE WHERE CORRELATION_ID = '123'. It's also being able to identify the parent-child relationship between each task that causes things to be messy. Identifying the parent-child relationship with Strong consistency is almost free.
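As a rough illustration of why the plain `SELECT` is not enough: the tree can only be rebuilt if each event also records what caused it. The sketch below assumes every event carries its own `id` and a `parent_id`, which goes beyond the Correlation ID the article describes; those field names and the sample events are made up.

```python
from collections import defaultdict

# Hypothetical event log rows; `parent_id` is an assumption beyond the correlation ID.
events = [
    {"id": "a", "parent_id": None, "correlation_id": "123", "name": "order.placed"},
    {"id": "b", "parent_id": "a", "correlation_id": "123", "name": "payment.charged"},
    {"id": "c", "parent_id": "a", "correlation_id": "123", "name": "stock.reserved"},
    {"id": "d", "parent_id": "c", "correlation_id": "123", "name": "shipment.created"},
    {"id": "x", "parent_id": None, "correlation_id": "999", "name": "unrelated"},
]


def render_tree(events: list[dict], correlation_id: str) -> list[str]:
    """Return indented lines showing which event caused which."""
    children = defaultdict(list)
    for event in events:
        if event["correlation_id"] == correlation_id:
            children[event["parent_id"]].append(event)

    lines = []

    def walk(parent_id, depth):
        for event in children[parent_id]:
            lines.append("  " * depth + event["name"])
            walk(event["id"], depth + 1)

    walk(None, 0)  # roots are the events with no parent
    return lines


tree = render_tree(events, "123")
```

Without the parent link, the filter on `correlation_id` yields a flat bag of events, and the ordering between siblings and descendants has to be guessed from timestamps.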
So, again -- it's a game of tradeoffs. It's just that the costs are not that obvious, hence why I think this programming style is overblown. People get into it for genuinely good reasons, make bad estimates about the costs until later, and then it's the sunk cost fallacy until things become untenable.
Imo, event-driven systems are at their best when the Cartesian Product between possible type of events and possible queues is "low".
For example, in most UI Frameworks, there is usually an event queue, which is a single queue that processes all user interactions for the entire GUI. Cool, 1 multiplied by X is X, so as long as you don't have too many of X (different types of events), then this gives you both good performance and a relatively simple user model.
Alternatively, if your situation demands many events and many queues, then using a State Transition Diagram to model your whole system's state, where certain events can ONLY originate from one system state, makes even a giant number of events and queues not too hard to wrangle.
To explain it in simpler terms, you can actually have many queues and many events, but as long as they are siloed off such that only ABC-related Events touch ABC-related queues, you can keep the complexity quite low. That's because you'd be summing up the Cartesian product of each "domain" (in this case, ABC). And if the sum total of all those Cartesian products is still "low", then you're golden. Just beware crossing the wires. Once you have too many couplings, it's not the sum of 2 Cartesian products anymore, it's just one big one that you need to consider. That's because these 2 domains are no longer separate, but 1 kind-of-coupled jumbo domain.
So again -- it's all about tradeoffs. Just know that it's not a silver bullet for your performance problems. Use it only if you know that you can avoid the costs of it easily, even far into the future.
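A quick worked example of the counting argument, with made-up numbers. The three domains and their event/queue counts are purely hypothetical, and "complexity" here just means the number of (event type, queue) pairs someone has to reason about.

```python
# Hypothetical domains: (number of event types, number of queues).
domains = {"billing": (4, 2), "shipping": (5, 3), "catalog": (3, 1)}

# Siloed: each domain's events only touch that domain's queues,
# so the per-domain products are summed.
siloed = sum(events * queues for events, queues in domains.values())

# Coupled: any event may reach any queue, so the domains merge into
# one big domain and the totals are multiplied instead.
total_events = sum(events for events, _ in domains.values())
total_queues = sum(queues for _, queues in domains.values())
coupled = total_events * total_queues
```

With these numbers the siloed layout has 8 + 15 + 3 = 26 pairs to consider, while the fully coupled one has 12 × 6 = 72.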
Qt’s signals and slots mechanism deals with many of the issues discussed in the article (e.g. signal signatures declare argument types and any mismatches are compile-time errors) for C++ and Python.
Curious if there are any JS frameworks out there that use this mechanism?
duderduderes@reddit
None of these are problems exclusively of event driven systems. Microservices suffer from all the exact same issues: breaking API changes, debugging across many service boundaries, retries, and dropped calls. And all the same strategies for handling these issues apply across both.
The real reason to use one or the other is if you want to decouple processing from action.
CherryLongjump1989@reddit
If you just want to shove things into a queue to handle them later, you just need a queue. You don't need events.
duderduderes@reddit
Let me rephrase. Events are good at decoupling something happening from the processing of that thing into some action or business process as those processes can be long running, asynchronous, varied (1:N) so it tends to better evoke the contract between systems.
CherryLongjump1989@reddit
Decoupling is a tricky business because it has specific criteria that must be met. In the loosest sense, it is about reducing the number of assumptions one component makes about another in order to function. So how does eventing meet those criteria? If anything, it makes it worse. Why?
You're taking something that is a business logic concern and you're placing it into the infrastructure, at the service boundary. So now, instead of a service implementing a queue internally and exposing it through an API, it forces everyone else to communicate via some vendor-specific messaging implementation. Which has all sorts of nasty implications for coupling.
Second, by shoving data into service boundaries, you are now coupling these services across time. Instead of one component owning its own schema for an internal queue that it fully owns, you've now got multiple components that must be aware of schema evolution -- which couples them, in some cases, literally to the deployment schedule of every other service that is consuming or producing events at this service boundary.
We could go on all day - but I just don't see decoupling as a real thing here.
wildjokers@reddit
Biggest challenge I have run across is event discovery. Haven’t yet found a good automated way for a service to document what events it fires and what events it cares about.
International_Cell_3@reddit
Discovery usually requires a duplex protocol and most event driven services don't have the notion of being both a source and sink for events. If you define a service such that it can always send and receive events then it's easy to add a "discovery" layer to each service, where they can first handshake before streaming events and include what events those services support.
The other option is to put a CRUD layer on top of the service, which is usually just nice for logging and management. So you can have your event stream doing its event streaming things while also having a REST API to query information about it (including metrics/telemetry/etc).
In the actual service implementation you have a method called register_event_type(...) or something that takes a description of the event, and send_event(...) needs to have an assertion failure if you try and send an event whose type was not registered so the programmer knows they fucked up when they debug in their test env.
ptoki@reddit
log all calls. ALL.
Then run a query on logs and ask what called what. You will not get full coverage but you will get everything that actually runs.
But you need to code the logging.
seunosewa@reddit
Sounds like what a profiler does.
zamN@reddit
Seems like good tracing would solve this? Trace your emit calls and handlers
Reasonable-Steak-723@reddit
Totally. Do you have any ideas how this can be solved? I created an open source project called EventCatalog to help, but I'm always looking at ways to make it better.
imdrunkwhyustillugly@reddit
There's AsyncAPI, which is basically OpenAPI for events. One could have some kind of automation based on reading such a spec from a feed - a lazy option could be to just have a snapshot test in the consumer that fails on any changes to the document.
For tracking consumers, use (OTEL) logging/metrics that include message contract type, version, and consumer. Some libraries (e.g. NServiceBus, but think hard before you commit to vendor lock-in) have this built in.
Also, some transport topologies use a single-topic approach, where all events are published in one place and then fanned out to subscribers based on filter rules. So in theory one could read consumers based on those rules alone, but the granularity of said rules could be very coarse (wildcard namespace filters, for example).
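The "lazy option" of a snapshot test could look roughly like this. It is a sketch under assumptions: the spec is taken to be already parsed into a dict, the `published` structure is a trimmed stand-in rather than valid AsyncAPI, and `spec_fingerprint`/`spec_changed` are hypothetical helper names.

```python
import hashlib
import json


def spec_fingerprint(spec: dict) -> str:
    """Hash a canonical JSON rendering so key order does not matter."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def spec_changed(spec: dict, snapshot: str) -> bool:
    """True when the producer's spec no longer matches the stored snapshot."""
    return spec_fingerprint(spec) != snapshot


# Hypothetical, heavily trimmed stand-in for a producer's published spec.
published = {"channels": {"order.placed": {"payload": {"order_id": "integer"}}}}
snapshot = spec_fingerprint(published)  # stored alongside the consumer's tests
```

A consumer's test suite would fetch the current spec and fail when `spec_changed(...)` returns true, forcing someone to look at the diff before updating the snapshot.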
Cualkiera67@reddit
The ones it cares about should be in a single file called subscriptions or something.
The ones it fires, you can create a file called pubs that exports a list of names. Then all calls to publish should use one of them
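A minimal sketch of this, combined with the register_event_type(...)/send_event(...) suggestion earlier in the thread. Everything here is hypothetical: the `EventBus` class, the event name, and the in-memory `sent` list stand in for whatever broker client a real service would use.

```python
from dataclasses import dataclass, field


@dataclass
class EventBus:
    """Hypothetical in-process bus that only sends declared event types."""

    registered: dict[str, str] = field(default_factory=dict)
    sent: list[tuple[str, dict]] = field(default_factory=list)

    def register_event_type(self, name: str, description: str) -> None:
        # Declaring events up front gives one place that lists what a service fires.
        self.registered[name] = description

    def send_event(self, name: str, payload: dict) -> None:
        # Fail loudly in a test environment when an undeclared event is sent.
        assert name in self.registered, f"event type {name!r} was never registered"
        self.sent.append((name, payload))

    def describe(self) -> list[str]:
        # Could back a discovery endpoint or generated documentation.
        return [f"{name}: {desc}" for name, desc in sorted(self.registered.items())]


bus = EventBus()
bus.register_event_type("order.placed", "An order was accepted by checkout")
bus.send_event("order.placed", {"order_id": 1})
```

Sending an event that was never registered trips the assertion, and `describe()` yields the list of published events without anyone maintaining separate documentation.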
sarhoshamiral@reddit
One option would be to put all events in the same namespace across the libraries and rely on completion to enumerate them including documentation.
That way you don't have to keep extra documentation around.
pauloyasu@reddit
as a former gamedev now working on enterprise bs development because it pays more, work less and is orders of magnitude easier, event driven is a breeze
germansnowman@reddit
Off-topic, but it really bothers me even as a non-native speaker: Can people no longer ask questions correctly? I see this all the time in Reddit titles. It should either be “Why are event-driven systems hard?” or “Why event-driven systems are hard” as a statement.
Plank_With_A_Nail_In@reddit
What bothers me is supposedly intelligent people getting faux confused over perfectly understandable English sentences. There is no confusion over what was being conveyed by this title. The article's content works as either a statement or a question.
I think it's just dullards wanting to mansplain the conventions of the English language under the guise of the rest of us not knowing them. News flash: we all fucking know already.
Learning the common conventions (there are no rules) of the English language might have been the highlight of your life but for the rest of us they are trivial and not something we get so excited over.
JMBourguet@reddit
Non-native speakers are both more prone to making some kinds of errors and more sensitive to errors. The first is obvious. The second is because we wonder whether the erroneous structure is actually something correct that we don't know about, and thus brings a change of meaning.
thesituation531@reddit
Grammar exists for a reason.
And proper grammar makes that easier.
germansnowman@reddit
I appreciate good writing and would like to see a high level of literacy in our society. Go ahead with your ad hominems and the watering down of standards; I will not be a part of that.
thesituation531@reddit
I'm a native English speaker, and it greatly bothers me too.
AvidStressEnjoyer@reddit
There is a surge of second language English speakers moving into dev with varying English language skills.
All I know is that they speak more languages than me and do so more capably.
nerd5code@reddit
I prefer “Does it be that event-driven systems do be hard, or doesn’t do be doing being?” personally.
nepios83@reddit
Interestingly, in Chinese writing, embedded questions are supposed to have a trailing question-mark. Thus, one would write: "Yesterday he asked me why I bought a new car?"
germansnowman@reddit
That is indeed interesting, thanks!
CherryLongjump1989@reddit
The sentence was grammatically correct and is perfectly fine English.
germansnowman@reddit
No, it isn’t. If you put the “are” after the object, it makes it a statement. If you want to ask a question, the “are” must go before the object.
CherryLongjump1989@reddit
I realized it immediately after but Reddit's delete function is broken. They must be using events.
germansnowman@reddit
Fair enough
CichyK24@reddit
Probably because for a non-native speaker the wrong order in "Why Event-Driven Systems are Hard?" sounds totally fine (especially if your native language allows such an order), and you could keep asking questions like that for your whole (English-speaking) life without anyone bothering to correct you. Really, the only place where I was corrected about such wrong order was when doing Duolingo and translating Spanish sentences to English :D
seunosewa@reddit
At some point it should be incorporated into the grammar.
CherryLongjump1989@reddit
It already is - and was. The sentence is grammatically fine.
bunk3rk1ng@reddit
If it wasn't phrased as a question it wouldn't get any clicks.
tao_of_emptiness@reddit
It’s just a sort of editorial/colloquial shorthand for “reasons why x is hard.”
germansnowman@reddit
That makes it even worse, as it looks even less like a question.
drislands@reddit
It's especially egregious because judging by the username, OP is associated with the website in the link. So they wrote it right once, then fucked it up on Reddit. What the hell?
germansnowman@reddit
As I wrote elsewhere, I did check the website when writing my original comment, and it matched the title. I think it has been edited since.
ForgettableUsername@reddit
If you deliberately make a minor spelling or grammatical error in the title of a post, a certain number of people will rush to be the first to correct you. This counts as early engagement and boosts the visibility of your post.
ptoki@reddit
I think it is one of the side products of the language's popularity across many other cultures.
You probably have to accept it. It was indeed a surprise to me that even natives started to ask questions in that non-question form. I just concluded that this is something English got from the world in exchange for being popular.
And if you understand this form, then it means it's working.
imdrunkwhyustillugly@reddit
A more illustrious title would perhaps be
NoInkling@reddit
I used to get annoyed by this too, but after experiencing what it's like to learn another language I just assume they're an ESL speaker and have become a lot more tolerant.
(I swear though, if someone talks about "web scrapping" one more time I might actually lose my sanity)
gyroda@reddit
Autocorrect and swipey keyboards on phones account for most of my typos. Often some very strange ones.
Fun side thing: one of the exam boards for the A level course in computing (OCR, in case anyone's curious) had a typo where they called it "disk threshing" rather than "disk thrashing". They were seemingly incapable of fixing this typo for years, as it would keep appearing in their exam papers over the years. I looked into it and the only people who were using the term were specifically making content for that exam.
germansnowman@reddit
I do understand that, but as an ESL speaker myself I feel I pay even more attention to English grammar than most native speakers. Not to say I don’t make mistakes, but I make a conscious effort not to import German grammar into English.
NSNick@reddit
The really hard rules are the ones native speakers don't realize are rules until they're broken. Things like:
GrinQuidam@reddit
The trick to English is all the rules are lies and if you understand what someone said, they're communicating correctly.
Properness is very static and does not accommodate the culture of language
HoushouCoder@reddit
Ironically, the actual title of the article is "Why are Event-Driven Systems Hard?" which is correct
germansnowman@reddit
I don’t think it was originally, I wish I had made a screenshot.
Immotommi@reddit
I think part of it is the fact that the statement is valid. People see the Why at the start of the sentence and think they need to include a question mark at the end
FullPoet@reddit
The level of literacy in the US (at least) is plummeting.
nemec@reddit
OP is not a native English speaker, either.
germansnowman@reddit
I expected as much.
OrchidLeader@reddit
If they have dyslexia, then yeah, it’s difficult knowing when they’ve swapped words around in a sentence like this.
I’m super paranoid about doing it and end up checking my wording several times, and I still sometimes get it wrong.
germansnowman@reddit
Fair enough. It seems to me though that most people never, ever check their titles.
RetiredApostle@reddit
Seems like a rhetorical question?
germansnowman@reddit
That does not matter – my point is that the grammar is wrong, rhetorical question or not.
Substantial-Reward70@reddit
That’s because you see languages as fixed rules that will always be the same, but they’re not. Languages are constantly changing: people adopt terms and new words like trends, and we even change the meaning of existing words. You adapt or be ready to be upset every day listening to/reading people…
I’m Spanish native so it’s the same with my language too…
atehrani@reddit
At my last job, this was the major hurdle.
Designers and PMs could not understand eventual consistency. They wanted to create UIs for a strongly consistent system (classic). These different paradigms do not integrate well.
notyourancilla@reddit
First question that pops to mind when I hear stuff like this is if product/design wanted to create something X why did engineering create Y?
Too often I see systems built based on what engineering wanted to create (distributed asynchronous messaging system) instead of what was needed (a simple crud app).
Head-Criticism-7401@reddit
Here it's the reverse. Engineering (me) wants to create a direct connection between the systems. Yet, some person in management has heard of event driven architecture, and now, we need to REWRITE our entire backend, and our 3 ERP systems for it.
The entire project is doomed, doomed from the start.
pelrun@reddit
There's a lot of "engineering created Y because product/design explicitly requested Y when actually wanting X" out there too.
grauenwolf@reddit
Where I work, the problem is that the Y in "product/design explicitly requested Y" is microservices, an event bus, and the top 3 product offerings from Azure or AWS.
I_AM_AN_AEROPLANE@reddit
Why does product / design have an opinion on how?! That's insane.
CherryLongjump1989@reddit
Usually it's a knee-jerk reaction to incompetent engineering.
grauenwolf@reddit
Yes it is. But I work in the world of consulting, so the paycheck helps me swallow my professional pride.
nerd5code@reddit
I thought plaintext was one of the supported output formats? Though IDR whether that was a 2.0 addition or not, I guess, and anything whitespace-sensitive was extra-miserable to begin with.
grauenwolf@reddit
Plain text sure, but not 100% position sensitive plain text.
josefx@reddit
Are you confusing XML with HTML? Whitespace may not be relevant to the XML structure itself, but the parser won't randomly strip spaces from your data.
sleepless-deadman@reddit
Also, it's generating flat files... just write a custom function to pad/truncate and call that for the fields? I don't see what the inherent issue in using XSLT is.
The only thing XSLT won't care about is extra whitespace outside the tags in the source, and if you have to care about that, it's not even XML, so I could understand the issue there.
grauenwolf@reddit
CRUD is boring.
Asyncrosaurus@reddit
As soon as an Engineer starts a project with the phrase "wouldn't it be cool if...", expect an overengineered mess and colossal waste of dev hours to work on.
lemmsjid@reddit
Agreed. The limiting factor on a strongly consistent system is often (not always) cost. Because optimizing for cost adds complexity and slows down time to market, there should be a very clear negotiation with product on the decision making and tradeoffs.
Fiennes@reddit
See, this is why I like what Amazon does. You place an order, it confirms it after a brief check. Then, their back-end processes do their thing. If there are problems, you'll get an email about it.
atehrani@reddit
Agreed. Some websites do it well to the point where you don't notice it.
I tried to explain to them that e-mail is similar to an eventually consistent system. It just never stuck
throwaway490215@reddit
There are two paths towards "Senior engineer". Become irreplaceable, or learn how to put problems into words for others ~~to understand~~ to parrot without thinking about it.
RiverboatTurner@reddit
That's true for Senior Engineer without the air quotes. To be a "senior engineer" all you need is roughly 2.5 years of experience listed on your resume.
grauenwolf@reddit
My first job, other than some solo consulting, was as a senior analyst. I didn't need no 2.5 years experience.
gyroda@reddit
I feel attacked.
Tasgall@reddit
Please tell my manager(s) that 🙃
Cakeking7878@reddit
I think Walmart also does that, but a while back I hit an issue where every time I placed an online order, it would place, then immediately cancel and send back to me with “there’s been an issue”. Sometimes I would like more of that up-front processing to happen immediately, so when I get an “it’s placed” message it’s actually locked in and not canceling randomly 20 minutes later with no explanation as to why.
mattgen88@reddit
Amazon's cart had a fun eventual consistency bug for us a few months ago.
We had a large order of stuff pre tariffs. A bed frame for my daughter, some cabinets, bulk cleaners and what not. About 1k USD.
My wife went to check out. Pays. Comes back to the home screen and the cart was still populated as if she had cancelled the order. So she tried again... 2k dollars later...
Few days later I'm flagging down the FedEx driver to refuse delivery of a second bed to try and get my money back because Amazon said they couldn't do anything about it.
Sweet_Television2685@reddit
opposite to my online food order, the platform confirmed restaurant started cooking, cancelled it later, turned out the restaurant had closed
some of those statuses are assumptions, end user won't know the difference
josefx@reddit
Getting a "payment confirmed" in the UI at the same time as a "your payment is fucked please fix" per email confused the hell out of me the first time I ran into it. Got the same result trying to "fix" it and gave up after several rounds. Turns out my card didn't have online transactions enabled, so no amount of "fixing" could make the transaction happen.
OneMillionSnakes@reddit
Yeah, sadly a lot of people want all the perks of eventual consistency, but are unwilling to accept any limitations.
rcls0053@reddit
People are so tuned to synchronous behavior that I'm currently working with a system where we use RabbitMQ for communication but somehow wrap asynchronous calls with sync RPC wrapper...
CherryLongjump1989@reddit
Because these two concepts have nothing to do with one another.
CpnStumpy@reddit
Seen people try this several times.
It's fucking asinine. It's always the dumbest worst thing ever and gets replaced by something shitty because even a shitty alternative ends up working better
CherryLongjump1989@reddit
This has to do with asynchronicity, it has nothing to do with eventing.
TwentyCharactersShor@reddit
I've had product people argue that you can make an async process synchronous. Something somewhere has to wait and no, I can't magic it to go any faster.
MarsupialMisanthrope@reddit
You can (and you can go the other way too), but you can’t fix the wait that’s the whole reason the call was made async in the first place.
I can do a lot of things in code, but instantaneous over the network ACID isn’t one of them.
MrBlackWolf@reddit
That's a very good point. Non technical people don't understand eventual consistency. Both users and business stakeholders. On the other side, engineering KPIs push for fast endpoints and high scalability.
troublemaker74@reddit
It's not horrible if you're using GraphQL (subscriptions) or listening to websocket events.
Careless_Detail_2318@reddit
To be fair, designers and PMs live off in some fairytale land of their own making and don't understand the practical side of things
rom_romeo@reddit
If I learned one thing about the UI and the eventual consistency, it could be probably summed up in this sentence: You can either lie and be fast, or “tell the truth” and be slower.
ZukowskiHardware@reddit
Live view solves that. What you are explaining is more a problem of JavaScript and React, where you have to explicitly define every component that needs to update.
Fiennes@reddit
Javascript has nothing to do with it, I think you misunderstand the process.
pikapp336@reddit
That’s not how that works
CherryLongjump1989@reddit
Events ≠ message queues. He treats “event-driven” as if it’s a property of the infrastructure (“we have RabbitMQ → we are event-driven”). Wrong. TCP, pipes, sockets, whatever — they’re all asynchronous message systems. Eventing is just a way you choose to interpret messages.
Schema versioning is not unique to eventing.
You add/remove fields? That’s API evolution.
gRPC, REST, protobufs, JSON APIs all have the exact same problem. He’s smuggling a general distributed systems problem under the “event-driven is hard” banner.
Correlation IDs exist in RPC tracing, too.
The “string of calls vs. cut-up events” is just tracing in a fan-out system.
This isn’t an eventing issue, it’s any distributed system issue.
Failures, retries, DLQs. That’s queue semantics. They show up whether you call your messages “events,” “jobs,” or “requests.” Nothing event-specific here.
Idempotency. Same deal: RPC calls must be idempotent if retried. This isn’t eventing, it’s networking.
Eventual consistency. Again, not unique to event-driven. Any system with multiple data copies faces it. He’s acting like it’s an inherent tax of “event-driven,” when in reality it’s the tax of distribution.
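To illustrate the idempotency point: the same dedup wrapper works whether the thing being retried is called an event, a job, or an RPC request. This is a minimal sketch; the `IdempotentHandler` name and the in-memory `set` are assumptions for illustration, and a real system would persist the seen IDs.

```python
class IdempotentHandler:
    """Wrap a handler so redelivered messages are applied at most once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # in-memory only; assumed for illustration

    def handle(self, message_id, payload):
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(payload)
        # Mark as seen only after the handler succeeds, so a failure can be retried.
        self._seen.add(message_id)
        return True
```

Nothing in it knows or cares how the message arrived.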
Ok_Dust_8620@reddit
Agree - these problems aren’t unique to event-driven architecture. The point is that they become pretty much unavoidable once you choose events and this level of indirection between services. With a distributed system using RPCs, you can, for example, still have strong consistency if your database architecture supports it. So it’s more like: these are problems you’ll definitely encounter - not that other architectures can’t introduce similar challenges.
CherryLongjump1989@reddit
It does not make a difference if you are using an RPC or an event.
Ok_Dust_8620@reddit
With events, besides using backward-compatible schema updates (which aren’t always possible), you could also maintain multiple streams - similar to how we often support several versions of the same API, at least during the migration period until all clients are on the latest version.
Ok-Breakfast-3742@reddit
Not if you spend time to construct a state diagram to understand the system as the first step. I’ve done it plenty.
EasyBig9261@reddit
The first part about formatting is simply bullshit. For example, in Java you can configure your object mapper to not fail on extra fields.
sickhippie@reddit
That would be the "backwards compatibility" strategy he talks about right after it. It's not "bullshit" really. If a downstream consumer gets a field it doesn't expect, it won't know what to do with it. It can be configured to skip it, but that's not really "doing" something with it. If it expects a field that doesn't come in or comes in with the wrong data type, though? That's where the bugs really start to flow.
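A small sketch of that asymmetry, using plain dicts rather than any particular mapper library. The schema, field names, and `parse_event` helper are hypothetical: unknown fields are skipped, while missing or mistyped fields fail loudly.

```python
# Hypothetical contract for one event type: field name -> expected Python type.
ORDER_PLACED_SCHEMA = {"order_id": str, "amount_cents": int}


def parse_event(raw, schema):
    """Keep only the fields the consumer knows; reject missing or mistyped ones."""
    missing = schema.keys() - raw.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    parsed = {}
    for name, expected_type in schema.items():
        value = raw[name]
        if not isinstance(value, expected_type):
            raise TypeError(f"{name} should be {expected_type.__name__}")
        parsed[name] = value
    return parsed  # extra fields in raw are silently dropped
```

Skipping the extra field keeps the consumer running, but as noted above it is not the same as doing something useful with it.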
Rambo_11@reddit
They're not.
Workflows are hard.
_predator_@reddit
It's very rare to be event-driven and not require sagas, or is my perception just skewed? The very basic order shipping use case that people love to use for EDA demos would be a hot mess for everything but the happy path.
grauenwolf@reddit
I use events such as "Hey background process, wake up and go check the database. There's work to be done." or for sending pricing updates to a desktop application.
The idiots at my work want to use it for "I'm the UI and I want the first 10 customer records."
Few_Source6822@reddit
I'd draw a distinction between "require from a technical standpoint to ensure sane transaction management" and "required as a way to ensure we are able to consistently present a clean user experience that matches their expectations and doesn't lead to us needing to support the consequences of downstream problems with our support teams".
In my experience, having worked at companies both small and large, you might be surprised at how many organizations simply don't even bother with things like sagas or two-phase commits as a way to build distributed systems and instead just... kind of wing it. They are happy getting the benefits of the looser coupling between systems without dealing with the mess of consequences that come with not fully managing those interactions sanely. Sometimes just getting your teams to be more autonomous and not dead-ending your user with an ugly error is good enough, over making sure that what you're presenting to them is actually correct.
I'm not defending it.
markoNako@reddit
So they would just let the systems continue to work without a consistency guarantee? I wonder whether, in such cases, that wouldn't bring some serious bugs and issues into the application. I assume the type of work the app is doing is also very important (in finance and healthcare that would be a disaster, compared to something else where mostly availability is important), but even then it's hard for me to imagine how that actually works.
ptoki@reddit
Sometimes "good enough, and we will tackle this if it becomes a problem" works well enough that nobody cares.
Because the issue may happen just 3 times a year and with all the other issues it will be 30 times a year, fixable by human.
The extreme case is like skip the dishes or uber where it seems the edgecases and unexpected scenarios happen in like 30% of times...
Few_Source6822@reddit
It sure can. Not every bug or problem is as reputation damaging as the example you laid out, like a bank not properly recording your paycheck being deposited or a doctor's cancer diagnosis and notes not being added to your chart such that your regular doctor can coordinate with your oncologist.
Fact is, if you've got a product that people want to use, they'll actually tolerate more problems than you might think. I've seen companies literally factor in error rates and customer churn into their business model over problems that at their core could be addressed by more robust distributed transaction handling, but it just made more sense to prioritize other work, or it was too hard/time consuming to build up staff to learn how to do more advanced handling.
That's what customer support teams that issue credits/refunds are for. And ultimately, many businesses know they're going to need them anyway, so they'd rather just use them and focus on other things. Sometimes if the problem is bad enough, a dev or two gets tagged in to build a more specific list of impacted users and a sense of the impact to help fix it.
Things like sagas are hard not just because they're a more advanced engineering problem, but often times because what you actually need in your saga is happening between teams, and that coordination is not obvious for many organizations out there.
Deep-Thought@reddit
I think there's an argument to be made that there are some cases where using sagas/orchestration slows you down enough that, given the tiny amount of affected requests, it can make business sense to just swallow the financial impact of paying back for any errors instead.
Few_Source6822@reddit
Oh for sure.
The example I was thinking of was a company that knew that it should but simply didn't/couldn't because coordinating between teams was too difficult. I suspect that's often the more common reason why that doesn't happen.
BosonCollider@reddit
You can use a message bus with transactional semantics to simplify the error handling in some cases, especially if your scale is small enough that you can just use something like pgmq and use postgres for both queues and relational data.
Alternatively, if your language has a good concurrency story, you can have a big coroutine procedure do the whole thing instead of breaking it up. The trend in most programming languages has been to replace event-driven programming with breakpoints in "normal" synchronous functions. Imo something similar will eventually happen to EDA on top of a broker; Apache Pulsar has a really nice concept of Pulsar Functions, for example.
ptoki@reddit
Not really.
The key is usually either an arbiter (single entity solving the collisions/conflicts) or a form of subscription where even if something is missing now it will be delivered/created later and the flow will be able to continue.
Just extra steps but not locally in code but somewhere else.
The challenge is in predicting if the used flow/technology can handle all the edge cases or limiting those. Which is usually a non coding problem and just requires some businessman beating.
RetiredApostle@reddit
Sagas for sagas are harder.
scruffles360@reddit
We solved this problem in a unique way: services are configured to receive messages by specifying a target (usually sns) and a graphql subscription query. Each service is getting their own data format as requested. We can consult the configuration when making api changes to see which apps would be affected. Haven’t seen any problems since we launched it at least 5 years ago
drislands@reddit
OP, why did you change the title to be grammatically incorrect for the reddit post when it's correct in the article?
farsightxr20@reddit
Every system is event-driven. At the OS internals level, it's all events in the form of messages to/from hardware devices (keyboard, network, etc.).
On top of these low-level events we build higher-level abstractions based on semantic relationships between events. Good abstractions simplify reasoning and information flow in the majority of cases, e.g. you don't need to think about the TCP handshake process or congestion control when you request a file from the network, it's all just one higher-level fetch operation. There will always be niche cases that benefit from lower-level control, which requires breaking the abstraction and ideally, introducing a new purpose-built abstraction so that complexity doesn't proliferate through the entire system.
The mistake I see most often is people starting with events and never building any higher abstraction (massive spaghetti). An "event-driven" architecture is often just a euphemism for "no architecture".
davidalayachew@reddit
They aren't hard, they just scale in complexity about as well as they scale in performance. Imo, they're just completely over-valued as a solution for performance/throughput problems.
Event-driven systems exchange simplicity for throughput/performance, like the article said. Several things that you get "for free" in a Strongly Consistent setup, you have to either abandon or recreate in an Eventually Consistent setup.
The problem is, people see the pretty performance numbers of Eventual consistency, then assume that the cost of abandoning or recreating some of the necessary benefits of Strong Consistency is small in comparison. It's not, and the cost shoots up very quickly. Even moreso when you are distributed.
The article lists an example -- the concept of a Correlation ID. This is an example of recreating the benefit you would get from a simple stack trace (to use Java terminology) if you were Strongly Consistent.
And while implementing and enforcing a Correlation ID is quite easy, weaving all of the relevant events with the same Correlation ID together into a single tree view (again, recreating a benefit) can range from non-trivial to quite difficult. It's not just `SELECT * FROM EVENT_TABLE WHERE CORRELATION_ID = '123'`. It's also being able to identify the parent-child relationship between each task that causes things to be messy. Identifying the parent-child relationship with Strong consistency is almost free.
So, again -- it's a game of tradeoffs. It's just that the costs are not that obvious, hence why I think this programming style is overblown. People get into it for genuinely good reasons, make bad estimates about the costs until later, and then it's the sunk cost fallacy until things become untenable.
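To make the parent-child point concrete, here is a rough sketch of rebuilding the tree. It assumes every event also records a hypothetical `parent_id` (the event that caused it), which the Correlation ID alone does not give you; the field names are made up.

```python
from collections import defaultdict


def build_trace_tree(events, correlation_id):
    """Nest event IDs under their parents for a single correlation ID."""
    children = defaultdict(list)
    roots = []
    for event in events:
        if event["correlation_id"] != correlation_id:
            continue  # the flat SELECT only gets you this far
        if event["parent_id"] is None:
            roots.append(event["event_id"])
        else:
            children[event["parent_id"]].append(event["event_id"])

    def subtree(event_id):
        # Recursively nest each child under the event that caused it.
        return {child: subtree(child) for child in children.get(event_id, [])}

    return {root: subtree(root) for root in roots}
```

If producers don't reliably populate something like `parent_id`, you are left with a flat list of events, and that is where the "non-trivial to quite difficult" part starts.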
Imo, event-driven systems are at their best when the Cartesian Product between possible type of events and possible queues is "low".
For example, in most UI Frameworks, there is usually an event queue, which is a single queue that processes all user interactions for the entire GUI. Cool, 1 multiplied by X is X, so as long as you don't have too many of X (different types of events), then this gives you both good performance and a relatively simple user model.
Alternatively, if your situation demands many events and many queues, then using a State Transition Diagram to model your whole system's state, where certain events can ONLY originate from one system state, makes even a giant number of events and queues not too hard to wrangle.
To explain it in simpler terms, you can actually have many queues and many events, but as long as they are siloed off such that only ABC-related Events touch ABC-related queues, you can keep the complexity quite low. That's because you'd be summing up the Cartesian product of each "domain" (in this case, ABC). And if the sum total of all those Cartesian products is still "low", then you're golden. Just beware crossing the wires. Once you have too many couplings, it's not the sum of 2 Cartesian products anymore, it's just one big one that you need to consider. That's because these 2 domains are no longer separate, but 1 kind-of-coupled jumbo domain
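A toy calculation of what I mean, with made-up numbers. Each domain is an `(event types, queues)` pair; `interaction_count` and `merge` are just illustrative helpers, not a real metric.

```python
def interaction_count(domains):
    """Sum of (event types x queues) over domains that stay siloed."""
    return sum(events * queues for events, queues in domains)


def merge(domains):
    """Model coupling: all domains collapse into one jumbo domain."""
    total_events = sum(events for events, _ in domains)
    total_queues = sum(queues for _, queues in domains)
    return [(total_events, total_queues)]


# Two siloed domains: 3 event types on 2 queues, and 4 event types on 2 queues.
siloed = [(3, 2), (4, 2)]
```

Siloed, that is 3×2 + 4×2 = 14 combinations to reason about. Cross the wires and it becomes one domain of 7 event types on 4 queues, so 28.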
So again -- it's all about tradeoffs. Just know that it's not a silver bullet for your performance problems. Use it only if you know that you can avoid the costs of it easily, even far into the future.
Spitfire1900@reddit
The place I’m working at now originally picked up queuing because there was poor support for HTTPTimeouts and async http calls on Java 6
CopyEdits@reddit
How to grammar?
Immotommi@reddit
Statement starting with why is what?
NightlyWave@reddit
Qt’s signals and slots mechanism deals with many of the issues discussed in the article (e.g. signal signatures declare argument types and any mismatches are compile-time errors) for C++ and Python.
Curious if there are any JS frameworks out there that use this mechanism?
VictoryMotel@reddit
Why this thing that not true?
WhyJustSlightly@reddit
skill issue?