Ditch your (Mut)Ex, you deserve better

[-]

Digitalunicon@reddit

feels less like “ditch mutexes” and more like “ditch lazy concurrency.”

[-]

Peppy_Tomato@reddit

Good read, but just in case you didn't notice, the account balances in your system cannot be Zeroed out. Users can't close their bank accounts without leaving behind a penny 😉.

! Meant to be light-hearted. A trivial observation. !<

[-]

falconfetus8@reddit

THIS is the kind of content I want to see more of on this sub!

[-]

Mutexes scale incredibly well. In fact, all other solutions are usually worse when you benchmark them at scale. If your state is so large a mutex isn't appropriate you're at the point you need the ability to scale horizontally, at which point you need a database or message queue.

It's no surprise one of the key things that hyperscalers have that you don't are distributed locks.

[-]

poelzi@reddit

[ ] you used NUMA systems [ ] you used multi core systems > 100 cores [ ] you understabd lockless design patterns, message passing, etc

[-]

trailing_zero_count@reddit

Mutexes absolutely do not scale incredibly well. Wait-free atomic implementations of data structures absolutely destroy mutex implementations past even a relatively small number of threads.

To be clear, I'm talking about in-memory, in-process mutexes. If you're talking about something else (a "distributed lock") then fine.

[-]

tsimionescu@reddit

Atomic reads&writes typically scale quite poorly with contention, because they require a busy loop to retry. So if you have 64 threads trying to update the same memory location on a 32-core processor, it will typically be better to have a mutex than to have all of the cores stuck in a busy loop trying to do a CAS update.

Conversely, if you have low contention (read-heavy scenarios with only occasional writes) then a mutex will bring much more overhead than doing an atomic read and the occasional CAS loop in the writers. So this is very much use-case dependent.

[-]

trailing_zero_count@reddit

There are plenty of lock-free algorithms that don't require any retries. If you use fetch_add to get an index, for example, you're guaranteed to have a usable result when it returns.

[-]

International_Cell_3@reddit

If you use fetch_add to get an index, for example, you're guaranteed to have a usable result when it returns.

Unless the buffer you're indexing into is full. In fact fetch_add is not the ideal way to implement a lock-free fifo, which is only wait-free if you can tolerate messages being dropped (or overwritten).

[-]

trailing_zero_count@reddit

Your assumption that the wait-free FIFO must be bounded is outdated. Please read https://dl.acm.org/doi/10.1145/2851141.2851168

Spinning can be avoided by suspending the consumer asynchronously (`co_await chan.pull()`) if there's no data ready in the slot after fetch_add is called. https://github.com/tzcnt/TooManyCooks/blob/main/include/tmc/channel.hpp#L1216

[-]

International_Cell_3@reddit

An unbounded queue cannot be wait-free except in the academic sense.

[-]

trailing_zero_count@reddit

I'm tired of arguing with you. You are making absolute statements with nothing to back them up. This will be the last time I respond to you.

I assume you're not talking about the consumer side, because if the queue is empty, you're going to have to wait *somehow* - whether that be sleeping the thread, suspending a coroutine, spin waiting, or returning and checking again later.

On the producer side, it's pretty easy to make it wait-free. Starting from the top level call:

First, you fetch_add to get your write index.
Then you find the right block (only needed if the latest block has moved ahead since you last wrote). If you need to allocate a new block, races against other producers are resolved with "if cmpxchg" and not "while cmpxchg".
Then you write the data.
Finally you mark the data as ready. If the consumer started waiting for you during the operation, you get the consumer instead. Once again this uses "if cmpxchg".
If you raced with a consumer during the last step, you wake the waiting consumer now.

There are absolutely no waits, spins, or sleeps during this operation. It is guaranteed to complete in a fixed, countable number of atomic operations.

[-]

International_Cell_3@reddit

An unbounded queue cannot be wait free because memory allocation on practical systems is not wait free unless the allocator itself is bounded.

If you need to allocate a new block, races against other producers are resolved with "if cmpxchg" and not "while cmpxchg

new is not wait-free.

[-]

trailing_zero_count@reddit

You're moving the goalposts here because this discussion is about mutexes vs atomics. How does using a mutex help solve this problem?

Your original comment was about "hyperscalers" and now you've switched to talking about domains where "being wait-free actually matters" - embedded and/or HFT. In those domains you won't be using a mutex either.

[-]

International_Cell_3@reddit

This depends heavily on the workload but in general, lock-free algorithms are worse in aggregate than mutexes for even moderate amounts of contention.

"Wait-free atomic implementations of data structures" are rare and hard to implement correctly (usually backed on assumptions like magic allocation/deallocation or treating garbage collection as free). Even a wait free queue is rare due to the need for backoff and retry when the queue is full (or using linked lists to handle unbounded queueing). All of this is complex and does not "destroy mutex implementations."

All modern mutex implementations are essentially free when uncontended with a single syscall when there are pending waiters, which is very cheap compared to the minimum number of atomic instructions and cache thrashing of lock-free data structures.

[-]

trailing_zero_count@reddit

Having personally written a decent number of lock-free data structures, I'm not sure how to respond to this.

Mutexes always try an atomic lock first, which would cause the "cache thrashing" you are talking about. But then they fall back to a syscall and then the kernel often has to do a spinlock too, under high contention. Ever heard of `native_queued_spin_lock_slowpath`?

For example I maintain a set of benchmarks for high-performance runtimes. https://github.com/tzcnt/runtime-benchmarks I'll give you one guess which implementation uses a mutex for its internal task queue... (hint: you'll need to scroll the Markdown table to the right)

In my opinion, the most appropriate usage for a mutex is if you need to execute a complex operation as if it were atomic. So if you need to read and write multiple fields without anyone interfering, and the data set is too large to fit into a single variable, then that's a good time to use a Mutex. Because this operation *cannot* be expressed in a wait-free manner.

[-]

lelanthran@reddit

Wait-free atomic implementations of data structures absolutely destroy mutex implementations past even a relatively small number of threads.

Aren't the implementation of mutexes in things like pthreads done by first attempting acquisition on a wait-free lock?

[-]

trailing_zero_count@reddit

If you fail to get the lock and you have to spin, then syscall and sleep, that's not wait-free.

Wait-free is something like fetch_add, which is guaranteed to return a usable value after it returns.

[-]

Confident_Ad100@reddit

Mutex scales well enough for redis, DynamoDB and Postgres.

Yeah, it would be great if you can use atomic data structures, but they have limitations and don’t compose well.

It would be even better if you don’t have to block at all 🤷

Usually you need a mutex when you are in a distributed system. There is a reason why you don’t see people use mutex on javascript.

[-]

sammymammy2@reddit

Mutexes are annoying because they don't compose. You need a total mutex acquisition order to avoid deadlocks, hooray.

[-]

syklemil@reddit

Unlike a bathroom key however, mutexes are only conceptual locks, not real locks, and as such they operate on the honor system.

If the programmer forgets to lock the mutex the system won't stop them from accessing the data anyways, and even then there's no actual link between the data being locked and the lock itself, we need to trust the programmers to both understand and respect the agreement. A risky prospect on both counts.

NB: This again varies by language. I'm pretty fine with the way mutexes work in Rust, where they wrap the value, something along the lines of:

{
    let mut balance = account.balance.lock().unwrap();
    let has_funds = balance >= amount;
    if has_funds {
        *balance -= amount;
        true
    } else {
        false
    }
} // lock goes out of scope here, unlocking the mutex

or, since locking the balance might fail, it might be possible possible to do something more monadic along the lines of (where I'm inventing some methods and ignoring the type signatures of real methods, e.g. checked_sub is real but doesn't modify anything, etc)

account
    .balance
    .lock()
    .and_modify(|mut balance|
         balance.checked_sub(amount))

There's also the RwLock construct which, like the Rust reference rules, allows multiple read-only locks XOR one write-lock to exist.

My only problem with it is that IMO the mutex terminology is kind of backwards for this kind of API, as the value is normally locked away in the mutex and protected from reads & writes, and I wind up thinking of it as needing to unlock the mutex to access the value within.

Or, my second problem with it is that it's pretty habit-forming, so going back to something like Go-style mutexes feels like some sort of joke, where they've only vaguely understood the assignment and seriously botched it.

[-]

ToaruBaka@reddit

I really like Rust's Mutex implementation (well, poisoning is weird), as it draws a distinction between a data mutex and a critical path (or code) mutex.

The good old non-value-wrapping mutex's purpose was to prevent access to a region of code by more than one thread, whereas a data mutex's purpose is to prevent concurrent access to data. You can obviously use a data mutex as a code mutex, but as the default mutex is a data mutex, it encourages much finer-grained locking (which has its own problems, but tends to be easier to reason about IMO).

[-]

syklemil@reddit

(well, poisoning is weird)

I swear I've seen some github issue about non-poisoning mutexes, though I get the impression that's gonna be a tradeoff around which sort of error situations you'd rather deal with

[-]

ToaruBaka@reddit

Yeah, there are tradeoffs to poisoning. It's nice that you can get insights into whether the lock is borked due to a crashing thread, but most of the time there's no way to recover from that state anyway (unless you built your algorithm with poisoning in mind from the start).

[-]

levodelellis@reddit

IMO if you lock (outside of a threading library) your code is wrong. Same with atomics. The codebase I'm currently on has a nice way to do threads, but I'm not sure if it drives most people crazy. There's no 'async' function, there's a (thread safe) work object you pass around and put messages into. It's 'easy' because the interface and how everything works is straightforward, but it's 'hard' because there's no await/a way to say wait until X message is done.

[-]

lelanthran@reddit

There's no 'async' function, there's a (thread safe) work object you pass around and put messages into. It's 'easy' because the interface and how everything works is straightforward, but it's 'hard' because there's no await/a way to say wait until X message is done.

That thread safe work object doesn't have a wait() method?

The codebase I'm currently on has a nice way to do threads, but I'm not sure if it would drive most people crazy.

I have a safe way to use threads - a thread-safe queue: any code, from anywhere, at any time, can thread-safely insert work items (i.e. a pointer to an object) into the queue, and a bunch of threads remove (in a thread-safe manner) pointers from the queue, operate on them and then caller a destroy or similar function.

Absolutely fearless concurrency!

[-]

levodelellis@reddit

That thread safe work object doesn't have a wait() method?

No. Just a sleep for incoming messages

I have a safe way to use threads

Erlang figured it out 40yrs ago :P
This lib isn't exactly that but I'm not going to complain about something that has never gotten in my way and never caused a concurrency problem

[-]

Primary_Ads@reddit

this may be a good article but i will never know due to the awful CSS making ut illegable to me

[-]

Potterrrrrrrr@reddit

Is that a colour blind thing? The yellow on black is an odd choice but reads fine to me

[-]

Primary_Ads@reddit

yes

[-]

guepier@reddit

Interesting, the WebAIM Contrast Checker gives the chosen colour contrast the all-clear (same for the code snippets; only level 2 headings fail).

[-]

gpfault@reddit

I think STM doesn't get a whole lot of love because there's not a lot of situations where STM is preferable to just putting all your shared mutable state into a database. Most applications need their shared state to be persistent anyway so STM isn't even an option.

[-]