Learn concurrency - a deep dive into multithreading with Python
Posted by pmz@reddit | Python | 10 comments
The article explains concurrency in Python, covering multithreading, multiprocessing, race conditions, and synchronization mechanisms such as locks. It then takes a deep dive into switching off the GIL to enable *real* multithreading in Python, highlighting the differences, the benefits, and the gotchas with clear code examples.
https://blog.geekuni.com/2026/04/python-concurrency.html?m=1
tedivm@reddit
If you're looking for an easy way to handle multiprocessing, I have a library, QuasiQueue, that is both simple and powerful.
gdchinacat@reddit
Looks pretty cool, thanks for sharing!
I was a bit surprised that reader and writer are limited to str|int. Consider making QueueRunner a generic class so it can be used with whatever type the user wants (within reason...you need to pass values across process boundaries so obviously not every conceivable type will work).
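Something like this would do it (sketching freely here; the signatures are made up, not QuasiQueue's actual API):

```python
# Hypothetical sketch, not QuasiQueue's real API: parameterize the
# runner over the message type T. T still has to be picklable, since
# items cross process boundaries.
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")

class QueueRunner(Generic[T]):
    def __init__(
        self,
        producer: Callable[[], Iterable[T]],  # yields items to enqueue
        consumer: Callable[[T], None],        # handles one dequeued item
    ) -> None:
        self.producer = producer
        self.consumer = consumer

# A type checker can then verify usage like:
# runner: QueueRunner[tuple[str, int]] = QueueRunner(my_producer, my_consumer)
```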
The reader()/writer() naming and docs were also confusing. The typical names for these roles are consumer/producer. And writer() is not 'responsible for adding new items to the queue'; it just produces an iterable whose items are enqueued.
These are both documentation and semantic issues that can be cleaned up without changing the design in any way. Overall it looks great.
As for design, I'm curious whether you considered using ProcessPoolExecutor (https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor) for the process management. If so, why did you choose not to use it? I have very limited experience with it (I built a service on top of it, but someone else built the service framework, so I didn't use it directly). Specifically, are there any gotchas or limitations that wouldn't have worked for your use case?
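For reference, the stdlib pattern I mean is roughly this (minimal sketch, nothing QuasiQueue-specific):

```python
from concurrent.futures import ProcessPoolExecutor

def work(n: int) -> int:
    return n * n  # stand-in for a CPU-bound task

if __name__ == "__main__":  # guard required for process spawning
    with ProcessPoolExecutor(max_workers=4) as pool:
        # map() fans the inputs out across worker processes and
        # yields the results back in input order.
        print(list(pool.map(work, range(10))))
```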
busybody124@reddit
Really nice writeup. Out of curiosity (maybe I missed it), why do you switch to a thread pool executor in the last snippet as opposed to manual thread management? I understand that the API is more ergonomic, but is it actually needed for the solution?
pmz@reddit (OP)
ThreadPoolExecutor is the modern way of managing threads and makes it easy to collect return values.
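For example (a minimal sketch, not the article's exact snippet):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> int:
    return len(url)  # stand-in for real I/O work (HTTP request, file read)

urls = ["https://example.com/a", "https://example.com/bb"]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns each worker's return value in input order;
    # bare threading.Thread has no return channel, so you'd have to
    # collect results through a shared list or queue and join manually.
    lengths = list(pool.map(fetch, urls))

print(lengths)  # [21, 22]
```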
gdchinacat@reddit
Even with the GIL it is not safe to do += on shared variables. The issue is that the global is loaded onto the stack, incremented, then stored back to the global, and the GIL can be released between any of these steps; if the code that runs in the meantime performs any of the same steps, the value will not be what was intended.
LOAD_GLOBAL copies the shared state, BINARY_OP increments the value, then STORE_GLOBAL updates the shared state with the result. If thread A does a LOAD_GLOBAL, the GIL is released, and thread B does the same, they will both increment the same value, both STORE_GLOBAL back, and an increment will be lost.
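A minimal repro (plain `+=`; note that recent CPython versions may only hand off the GIL at loop back-edges, which makes this particular demo hard to trigger even though the operation is still not guaranteed atomic):

```python
import threading

counter = 0

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        # Compiles to LOAD_GLOBAL / BINARY_OP / STORE_GLOBAL; the GIL
        # can be handed off between these steps, losing increments.
        counter += 1

threads = [threading.Thread(target=increment, args=(500_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Expected 4_000_000; when the race fires, the result is smaller
# and differs from run to run.
print(counter)
```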
saucealgerienne@reddit
the thing that tripped me up early was thinking threads would help with CPU-bound work. the GIL makes that basically useless in CPython. once I understood the I/O vs CPU distinction, asyncio ended up being the cleaner choice most of the time for what I was building.
TheseTradition3191@reddit
Worth adding asyncio to the picture since the article focuses on threading/multiprocessing. For I/O-bound work - HTTP calls, DB queries, file reads - async/await handles thousands of concurrent operations with a single thread and zero locking complexity. Less conceptual overhead than threading, more predictable than managing a process pool for network-heavy code.
The rule of thumb I use: asyncio for I/O concurrency, multiprocessing for CPU parallelism, threading mostly for legacy code or C-extension interop where you can't easily go async. The GIL-removal story is interesting, but for most application code the async path is the right default for concurrency, and you don't have to think about memory bandwidth ceilings at all.
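A minimal sketch of that single-threaded fan-out, with asyncio.sleep standing in for real network I/O:

```python
import asyncio

async def fetch(i: int) -> int:
    await asyncio.sleep(0.1)  # stand-in for an HTTP call or DB query
    return i

async def main() -> None:
    # 1000 concurrent waits on one thread, no locks involved.
    results = await asyncio.gather(*(fetch(i) for i in range(1000)))
    print(len(results))  # finishes in ~0.1s, not ~100s

asyncio.run(main())
```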
Maggie7_Him@reddit
The memory bandwidth ceiling is real: I hit it doing parallel screenshot capture across 50 browser instances. CPU sat at 15% but RAM bandwidth was maxed out. Switching from 50 threads to a process pool with 8 workers doubled throughput.
Ha_Deal_5079@reddit
free-threading is dope but the single thread perf hit keeps me on the default build for most things. maybe 3.14 will bridge the gap
quant_macro_daily@reddit
Good timing on covering GIL removal. It's worth noting that even with the GIL disabled (the free-threaded build in Python 3.13+), most CPU-bound workloads won't automatically see linear scaling; the bottleneck shifts to memory bandwidth and cache contention between threads pretty quickly.
For pure CPU parallelism, multiprocessing with shared memory (multiprocessing.shared_memory) is still the more predictable path on most workloads. Threading shines most when you're I/O-bound or waiting on external calls, which is probably 80% of real-world Python use cases anyway.
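A minimal sketch of the shared-memory handoff (stdlib only, toy payload):

```python
from multiprocessing import Process, shared_memory

def worker(name: str) -> None:
    # Attach to the existing block by name; writes go straight into
    # the shared buffer, with no pickling round-trip.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[0] = 42
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=16)
    p = Process(target=worker, args=(shm.name,))
    p.start()
    p.join()
    print(shm.buf[0])  # 42: the child's write is visible in the parent
    shm.close()
    shm.unlink()  # free the block once everyone is done with it
```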