Atomic Idempotency: A Practical Approach to Exactly-Once Execution

Posted by ymz-ncnk@reddit | programming | 16 comments

aka-rider@reddit

The problem is stated correctly, but the solution is incorrect.

The only question one needs to ask is, "What would happen if the server was struck by lightning?" Go or not, lightweight or not, it won't work atomically; the $100 withdrawal will be made.

The Saga pattern (not in this article, but in general) is an even worse solution. If the server goes down during the rollback stage, it's a disaster, and in between, you're left with nasty split-brain and Byzantine Generals problems. The server might have done the job and committed the results, but the ACK to the client got lost.

The simplest solution we have at the moment is ACID, which relies on ARIES, for anyone who is interested.
Raft consensus and friends try to solve the problem for multiple nodes.

ymz-ncnk@reddit (OP)

Edit: for application-specific durability, one may use a journal, similar to a DBMS:

- I’m going to withdraw $100 (begin transaction)

- I have withdrawn $100 successfully (commit)

If the second step is missing, there is not much you can do. Maybe you withdrew and failed to store the result in the journal, or maybe the transaction failed. You can at least tell that the transaction was incomplete and apply an application-specific recovery step.
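
A minimal sketch of such a journal, assuming an in-memory list stands in for durable storage. The `Journal` class, its method names, and the transaction ids are hypothetical and only illustrate the begin/commit idea described above:

```python
from dataclasses import dataclass, field


@dataclass
class Journal:
    """Append-only journal; a real one would be fsynced or replicated."""
    records: list = field(default_factory=list)

    def begin(self, tx_id: str, description: str) -> None:
        self.records.append(("BEGIN", tx_id, description))

    def commit(self, tx_id: str) -> None:
        self.records.append(("COMMIT", tx_id, None))

    def incomplete(self) -> list:
        """Return ids that have a BEGIN record but no COMMIT record."""
        begun = [tx for kind, tx, _ in self.records if kind == "BEGIN"]
        committed = {tx for kind, tx, _ in self.records if kind == "COMMIT"}
        return [tx for tx in begun if tx not in committed]


journal = Journal()

# tx-1 completes normally.
journal.begin("tx-1", "withdraw $100")
journal.commit("tx-1")

# tx-2: the process dies after BEGIN, so COMMIT is never written.
journal.begin("tx-2", "withdraw $100")

# On restart, the journal cannot say whether tx-2's money moved,
# only that tx-2 needs an application-specific recovery step.
needs_recovery = journal.incomplete()
print(needs_recovery)
```

The journal narrows the problem to a known set of uncertain transactions; it does not resolve the uncertainty itself.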

If by journal you mean a distributed log (otherwise it’d get hit by the same lightning as the local DB), the service becomes responsible for business logic, idempotency, and durable result persistence. For each operation it must:

1. Check the log to see if it should run.
2. Write its intent.
3. Write the result.

That’s a lot of interactions with the log — expensive and slow for a single operation.

An alternative approach is to make the service purely idempotent and delegate durability to the caller (for example, an orchestrator). This keeps the service simple and fast, without requiring it to interact with the external system.

Another case of using idempotency is when the service polls possibly repeated events from a log. In that scenario, it can rely on a local DB to avoid executing the same operation twice.

aka-rider@reddit

I don't want to discourage you from finding your own solutions, nor from writing about them, by the way. When people confront my blog posts, I treat it as an opportunity to learn something new about engineering, or perhaps about writing.

Regarding the problem, I advise you to read about the Byzantine Generals problem.

When you make the call to withdraw $100 — whether with a central database or not — at this point, you are dealing with distributed consensus.

Now, the server that you made the API call to didn't respond. What has happened?

- It failed to withdraw.
- It withdrew but failed to respond.

Central DB or not, you don't know.

So, if this server is idempotent, you don't need anything else — just retry.

If it's not, your central DB doesn't change anything.
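
A rough sketch of the "just retry" case, assuming the server deduplicates on a client-supplied key. `IdempotentServer`, `LostAck`, and the `drop_next_ack` test hook are all hypothetical, and the in-memory dict ignores durability entirely:

```python
class LostAck(Exception):
    """The request may or may not have been applied; the client cannot tell."""


class IdempotentServer:
    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.results = {}           # idempotency key -> stored result
        self.drop_next_ack = False  # test hook simulating a lost response

    def withdraw(self, key: str, amount: int) -> int:
        if key not in self.results:
            self.balance -= amount
            self.results[key] = self.balance
        if self.drop_next_ack:
            self.drop_next_ack = False
            raise LostAck(key)
        return self.results[key]


def withdraw_with_retry(server, key, amount, attempts=3):
    for _ in range(attempts):
        try:
            return server.withdraw(key, amount)
        except LostAck:
            continue  # outcome unknown: retry with the SAME key
    raise RuntimeError("gave up; outcome still unknown")


server = IdempotentServer(balance=500)
server.drop_next_ack = True  # first call applies the withdrawal, ACK is lost
remaining = withdraw_with_retry(server, "req-42", 100)
print(remaining, server.balance)
```

The client never learns which of the two failure cases occurred, and with an idempotent server it does not need to: the retry returns the stored result and the balance is debited once.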

ymz-ncnk@reddit (OP)

I'm already aware of the Byzantine Generals problem, Paxos, and Raft — thanks.

>There are no alternatives to consistency.

Absolutely agree. This problem should be effectively solved by the distributed database or log.

What we’re really discussing here is which component is responsible for writing data into that storage. There are multiple valid approaches, for example:

- The services themselves (as you suggest).
- An orchestrator.
- Or even the user side (when services consume messages from a broker).

In all these cases, idempotency remains a key property and can be handled in different ways.

aka-rider@reddit

>What we’re really discussing here is which component is responsible for writing data into that storage.

It's not an architectural choice.

If "Withdraw $100" is atomic (same transaction, same consensus) or idempotent, no additional solution is required. Or rather, any solution would work.

But according to the post itself, "Withdraw $100" happens outside of the platform, so no solution would make it consistent: not an ACID DB, not a Saga. It must be part of the distributed consensus, or it will be inconsistent.

ymz-ncnk@reddit (OP)

Let’s look at an example:

1. We send a “withdraw $100” message to the message broker.
2. The broker appends it to a distributed log (which already uses consensus internally).
3. The service polls and applies the message from the broker. With atomic idempotency, it guarantees the same message won’t be applied twice.
4. If the service’s data is lost, it can simply re-poll and reapply the messages from the log.

Within this flow (broker + service), consistency holds — the service’s state is always reproducible from the log, and consensus is handled by the broker.
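
A small sketch of steps 3 and 4, assuming the only effect of a message is a change to the service's own state (no external side effects). The log contents, message ids, and `apply_log` helper are hypothetical, and a plain list stands in for the broker's log:

```python
def apply_log(log, state=None, applied=None):
    """Apply (message_id, account, delta) entries, skipping seen ids."""
    state = {} if state is None else state
    applied = set() if applied is None else applied
    for message_id, account, delta in log:
        if message_id in applied:
            continue  # duplicate delivery: already applied
        state[account] = state.get(account, 0) + delta
        applied.add(message_id)
    return state, applied


# Hypothetical log contents; "m2" is delivered twice.
log = [
    ("m1", "alice", +500),
    ("m2", "alice", -100),
    ("m2", "alice", -100),
    ("m3", "alice", -50),
]

state, applied = apply_log(log)

# Simulate losing the service's data, then re-polling from the start.
rebuilt, _ = apply_log(log)

print(state, rebuilt == state)
```

In a real service the state change and the record of the applied id would have to be persisted together, otherwise a crash between the two reintroduces duplicates.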

aka-rider@reddit

You either implement it all in one ACID DB, or you use one of the consensus protocols (see the CAP theorem), or you use application-specific recovery logic, e.g. the "withdraw $100" API must also be idempotent, and every other API call too.

There are no alternatives to consistency.

ymz-ncnk@reddit (OP)

This approach uses a single transaction to check whether the Idempotency Key exists and, if not, update the business data and store the key.

If the server crashes, the transaction either fully commits or rolls back, ensuring atomicity.

Such atomic idempotency can be considered a building block for a safe Saga pattern because operations can be repeated without causing duplicates.
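
A minimal sketch of the single-transaction check described above, using Python's built-in `sqlite3` as a stand-in for the real store. The table names, the `idem_key` column, and the `crash_before_commit` flag are assumptions made for illustration; this is not the article's implementation:

```python
import sqlite3


def withdraw(conn, key, account, amount, crash_before_commit=False):
    """Apply a withdrawal at most once per idempotency key."""
    # "with conn" commits on success and rolls back on any exception.
    with conn:
        seen = conn.execute(
            "SELECT 1 FROM idempotency_keys WHERE idem_key = ?", (key,)
        ).fetchone()
        if seen:
            return False  # already applied; do nothing
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account),
        )
        conn.execute(
            "INSERT INTO idempotency_keys (idem_key) VALUES (?)", (key,)
        )
        if crash_before_commit:
            raise RuntimeError("simulated crash before commit")
    return True


def balance(conn, account):
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE idempotency_keys (idem_key TEXT PRIMARY KEY)")
conn.execute("INSERT INTO accounts VALUES ('alice', 500)")
conn.commit()

try:
    withdraw(conn, "req-1", "alice", 100, crash_before_commit=True)
except RuntimeError:
    pass
after_crash = balance(conn, "alice")            # update and key both rolled back

first = withdraw(conn, "req-1", "alice", 100)   # the retry applies it
second = withdraw(conn, "req-1", "alice", 100)  # the duplicate is ignored
final = balance(conn, "alice")
```

The guarantee covers only what lives inside that one database transaction. The primary key on `idem_key` also acts as a backstop if two concurrent requests both pass the `SELECT` check.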

aka-rider@reddit

>If the server crashes, the transaction either fully commits or rolls back, ensuring atomicity.

If the server is fried by lightning, all data is gone, and you have no idea what happened to the "withdraw $100" transaction.

ymz-ncnk@reddit (OP)

In that extreme case, Disaster Recovery (DR) must be performed first to restore the data state (for example, using a distributed log or an event store). This is an architecturally separate task from transaction handling.

aka-rider@reddit

That is my point exactly.

The problem is stated correctly, but the proposed solution is incorrect.

ymz-ncnk@reddit (OP)

Could you elaborate on why the proposed solution is incorrect, or suggest the preferred approach for handling this scenario?

ymz-ncnk@reddit (OP)

> The server might have done the job and committed the results, but the ACK to the client got lost.

Exactly — if the operation is idempotent, a lost ACK isn’t a problem. You can safely retry once the server is available, and atomic idempotency ensures the operation is applied only once.
