been staring at this volatile keyword stuff for hours and I genuinely cannot figure out where my understanding breaks

vowelqueue@reddit

A common mental model is that each thread executes its instructions sequentially, but that when multiple threads are active those instructions can be interleaved or happen at the same time in unpredictable ways. This model can help to explain why you need synchronization, but is not accurate.

Rather, the mental model you should have is that without the use of volatile or other synchronization, there is no temporal relationship between what happens on one thread and what happens on another.

For example, say one thread has two non-volatile variables “a” and “b”. It increments variable “a” and then increments variable “b”.

You might think that if you read those variables from a different thread, you might see neither of them incremented. Or you might see “a” but not “b”. Or you might see both. But in actuality you might also see “b” incremented but not “a”.

The use of volatile does something very important: not only does it make data updates visible to other threads, it makes the ordering of instructions that you expect in a single thread visible to other threads.
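To make that concrete, here's a minimal sketch of the usual "plain data plus volatile flag" pattern. The class and field names are made up for illustration, and it assumes Java 9+ for `Thread.onSpinWait()`. The writer increments two plain fields and then sets a volatile flag; once the reader has seen the flag, the memory model guarantees it also sees both increments, in the order the writer made them.

```java
// Hypothetical sketch: class and field names are illustrative only.
class VolatileOrderingSketch {
    static int a = 0;                      // plain (non-volatile) field
    static int b = 0;                      // plain (non-volatile) field
    static volatile boolean done = false;  // volatile flag that publishes a and b

    // Runs one writer and one reader; returns what the reader saw as {a, b}.
    static int[] runOnce() {
        a = 0;
        b = 0;
        done = false;
        int[] seen = new int[2];

        Thread writer = new Thread(() -> {
            a++;          // ordinary write
            b++;          // ordinary write
            done = true;  // volatile write: both writes above happen-before it
        });
        Thread reader = new Thread(() -> {
            while (!done) {           // volatile read, repeated until the flag is seen
                Thread.onSpinWait();
            }
            // Having read done == true, the reader must see a == 1 and b == 1.
            seen[0] = a;
            seen[1] = b;
        });

        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] seen = runOnce();
        System.out.println("a=" + seen[0] + " b=" + seen[1]); // prints "a=1 b=1"
    }
}
```

If `done` were a plain field, none of this would be guaranteed: the reader could spin forever, or leave the loop and still read stale values of `a` or `b`.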

teraflop@reddit

I'm not really clear on what your actual question is.

> it seemed straightforward at first just forces reads from main memory instead of cache simple enough.

Sorry, but this is a common misunderstanding. volatile is not about memory versus cache at all. Modern CPUs ensure cache coherence, which means the data in the CPU caches is guaranteed to be consistent with the data in memory, even when multiple CPUs are accessing memory concurrently.

volatile is really about code generation and instruction ordering. Firstly, it tells the compiler to generate code that actually performs a memory access every time you reference the variable. So for instance, if you read the variable twice, the compiler must actually generate two instructions that load that memory address, instead of just loading it once into a register and then using that register twice.

And secondly, a volatile access usually generates a "memory barrier" instruction which tells the CPU not to try to optimize the memory access by re-ordering it relative to other instructions in the pipeline. (Like I said, this doesn't have anything to do with the cache, because whenever the CPU does actually issue the memory access, cache coherence ensures that both the main memory and the cache are kept in sync with each other.)
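Here's a rough sketch of the "load it every time" part, using a stop flag. The names are hypothetical and this isn't meant as a definitive pattern. Because `running` is volatile, the loop condition has to be re-read on every iteration, so the worker notices when the main thread clears it. With a plain field, the JIT would be allowed to read the flag once, keep it in a register, and spin forever.

```java
// Hypothetical sketch: class, field, and method names are illustrative only.
class StopFlagSketch {
    // volatile forces a fresh read of the flag on every loop iteration.
    static volatile boolean running = true;

    // Written by the worker just before it exits, read by the caller after join().
    static long iterations = 0;

    // Starts a busy worker, asks it to stop, and reports whether it stopped in time.
    static boolean stopsWithinMillis(long timeoutMillis) {
        running = true;
        Thread worker = new Thread(() -> {
            long n = 0;
            while (running) {   // volatile read each time around the loop
                n++;
            }
            iterations = n;
        });
        worker.setDaemon(true); // don't keep the JVM alive if the worker never stops
        worker.start();
        try {
            Thread.sleep(50);   // let the worker spin for a moment
            running = false;    // volatile write, visible to the worker's next read
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("stopped: " + stopsWithinMillis(5_000));
        System.out.println("iterations: " + iterations);
    }
}
```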

> I tried tracing through a two thread example on paper and i still couldn't figure out exactly where my mental model is off.

If you don't say exactly what the example was, and what specifically you had trouble understanding, I'm not sure how to help you with this.

> I know i++ isn't atomic but i still expected visibility guarantees to make things more predictable across threads and apparently that assumption was completely wrong.

volatile does make things more predictable, because with volatile you can rely on the inter-thread ordering guarantees that the Java memory model provides, and without volatile you have practically no guarantees at all. But that doesn't mean that just adding volatile makes your thread safety problems go away.
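As a sketch of what "doesn't make your problems go away" looks like (names are made up for illustration): several threads increment both a volatile int and an `AtomicInteger`. The volatile counter is always visible, but `volatileCount++` is still a separate read, add, and write, so two threads can read the same value and one update gets lost. `AtomicInteger.incrementAndGet()` does the whole thing as one atomic step.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: class, field, and method names are illustrative only.
class CounterSketch {
    static volatile int volatileCount = 0;                        // visible, but ++ is not atomic
    static final AtomicInteger atomicCount = new AtomicInteger(); // atomic read-modify-write

    // Each of `threads` workers increments both counters `perThread` times.
    static void run(int threads, int perThread) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    volatileCount++;               // read, add, write: updates can be lost
                    atomicCount.incrementAndGet(); // single atomic increment
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        run(4, 100_000);
        System.out.println("atomic:   " + atomicCount.get()); // always 400000
        System.out.println("volatile: " + volatileCount);     // at most 400000, often less
    }
}
```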

Bear in mind that most of the time, programmers shouldn't need to use volatile at all. They can just use higher-level synchronization building blocks (mutexes, condition variables, queues, monitors, AtomicIntegers, etc.) which have more understandable behavior. If you're using volatile, that's a sign that you're trying to do something trickier than normal.

So instead of trying to understand all the possible ways volatile can affect a program's behavior, I would suggest you learn about the algorithms that are used to implement synchronization primitives. You'll see that the algorithms only work if memory operations are performed in a specific order, and that's why volatile (or something like it) is needed to implement them correctly. Most operating systems textbooks will discuss synchronization primitives, e.g. Silberschatz's Operating System Concepts.
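For a taste of what that looks like, here's a sketch of Peterson's algorithm, a classic two-thread lock that textbooks commonly cover. The class and field names are my own, it assumes Java 9+ for `Thread.onSpinWait()`, and it's for study only (real code should use `synchronized` or `java.util.concurrent` locks). The algorithm depends on "set my flag, then set turn, then check the other flag" happening in exactly that order, which is what the volatile fields provide.

```java
// Illustrative sketch of Peterson's two-thread lock; all names are hypothetical.
class PetersonSketch {
    // These three fields carry the algorithm. They must be volatile: with plain
    // fields the accesses below may be reordered and both threads could enter.
    static volatile boolean want0 = false;
    static volatile boolean want1 = false;
    static volatile int turn = 0;

    static int shared = 0; // plain field, protected only by the lock

    static void lock(int me) {
        if (me == 0) {
            want0 = true;   // announce interest
            turn = 1;       // give priority to the other thread
            while (want1 && turn == 1) {
                Thread.onSpinWait();
            }
        } else {
            want1 = true;
            turn = 0;
            while (want0 && turn == 0) {
                Thread.onSpinWait();
            }
        }
    }

    static void unlock(int me) {
        if (me == 0) {
            want0 = false;
        } else {
            want1 = false;
        }
    }

    // Two threads each increment `shared` perThread times under the lock.
    static int run(int perThread) {
        shared = 0;
        want0 = false;
        want1 = false;
        turn = 0;
        Thread[] workers = new Thread[2];
        for (int id = 0; id < 2; id++) {
            final int me = id;
            workers[id] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock(me);
                    shared++;   // critical section
                    unlock(me);
                }
            });
        }
        for (Thread w : workers) {
            w.start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(run(20_000)); // prints 40000
    }
}
```

Note that `shared` itself is not volatile. It is safe only because every access to it sits between a volatile write in `lock` and a volatile write in `unlock`, which is the same ordering guarantee the real primitives are built on.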

pixel293@reddit

The CPU pipeline (and compilers) assume that only one thread is accessing a piece of memory. volatile tells both of them that this is not true.

synchronized also forces both to get the values from memory, or at least from the CPU cache; otherwise you might not see the new values.

Volatile in C is also used if the hardware is updating a memory location.