I Replaced Redis Locks with Database Atomicity and You Should Too
Posted by soap94@reddit | programming | View on Reddit | 43 comments
Simple_Horse_550@reddit
Cache stampede checks….
PiotrDz@reddit
Wouldn't redis locks be faster?
fkukHMS@reddit
OP has no idea what they are talking about. Both solutions are fast, both are highly scalable (redis more, but irrelevant unless OP is in a FANG sized company), neither will save OP if they don't know how to implement a basic distributed lock.
Old_Pomegranate_822@reddit
I find it hard to believe that the time difference is measurable when compared to the time taken to run the jobs they were processing. You don't win anything by shaving a few milliseconds off a job that lasts a minute
PiotrDz@reddit
Yeah but the title says "replace the locks", so there is one important factor to consider.
pdpi@reddit
"Faster" only matters insofar as you're not sacrificing basic correctness.
PiotrDz@reddit
What do you mean by "correctness"? A Redis lock behaves similarly to a DB lock. Maybe it isn't stored on disk in a consistent manner, but when the connection to the DB/Redis instance is lost, the lock is effectively lost either way. I don't think the consistent disk storage is that useful.
pdpi@reddit
Correctly implemented, redis locks are fine. The point is that it's more complexity, and more things to fuck up. If you're synchronizing access to one service (the DB) by using another service (Redis) instead of using the first service's own sync primitives, you're adding a whole bunch of new failure modes you didn't have to begin with.
Treebro001@reddit
This should be the core takeaway of the entire thread I'd say.
PiotrDz@reddit
I don't think it matters. If you want your locks to also be part of a DB transaction, then that is a different problem. I am thinking about locks in a more generic sense: synchronising access between clustered instances of an app.
TldrDev@reddit
Vibe coders in shambles.
dpark@reddit
I agree the lock in SQL is the better option, but I still want to understand where the bug was in the Redis solution. That flow looks like it should work. How did they have two workers processing the same task if only one lock succeeded?
“The winner sometimes processed tasks that were already being handled”
This implies at least two winners.
ProtectHerEgao@reddit
It sounds like their main issue is managing distributed locks rather than an issue with redis itself. If I had to guess, their redis is sharded while their database is not which leads to these issues. The author mentions distributed locks at some points. If their database is sharded, I would imagine there to be similar issues.
If their redis is sharded, they need to use Redlock or another distributed locking mechanism instead of just writing it to the master shard. Replication lag or failover situations might cause the lock to be acquired by two processes.
Their ghost lock issue can be easily fixed by setting a TTL on the key, something Redis supports natively (sketch below).
I also have some doubts about putting higher load on the database, especially for something high-frequency like locking.
Databases are not magic and redis isn't incapable. Things just have to be designed properly.
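On the TTL point above, a minimal sketch of what that looks like with redis-py (the key name, value, and TTL are invented for illustration):

```python
import redis

r = redis.Redis()

# SET with NX + EX is atomic: the key is created only if it doesn't
# already exist, and Redis expires it after 30 seconds on its own,
# so a crashed worker can't leave a "ghost" lock behind forever.
acquired = r.set("lock:task:123", "worker-a", nx=True, ex=30)
if acquired:
    try:
        pass  # do the work here
    finally:
        # Naive release; a production version would only delete the key
        # if the stored value still matches (e.g. via a Lua script).
        r.delete("lock:task:123")
```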
non3type@reddit
I generally agree with everything you're saying, but I can't help but think that if you need an ACID-compliant transactional database, you start with a traditional RDBMS. This felt like they chased the new shiny without any awareness of its strengths/weaknesses going in.
IQueryVisiC@reddit
If you want ACID at the start of your project, do you set the RDBMS isolation level to "serializable", or what really counts as ACID? Is snapshot or any of the weaker forms ACID? Do they behave like in OP's scenario? What is a practical example of the effect of isolation? Does serializable lock the whole DB, even other tables and records?
GergelyKiss@reddit
That's a very good point, but it doesn't matter in OP's scenario because the usage is so simplistic: one update, no reads, on one specific row (hopefully identified by its PK, even).
Whether a DB engine supports ACID and how it does exactly is vendor-dependent, but all the ones I've seen so far would do an update like this atomically, even with the most lenient isolation level.
Real problems start when there are selects and inserts, and lots of them in one transaction...
IQueryVisiC@reddit
Two inserts (on different tables) are the default example for a transaction, to make sure that the amount of money in a bank stays constant (invariant?) and accounts don't drop below 0. Isolation level can be changed in MS SQL Server, so it does not depend on the vendor (= Microsoft). Row lock, yeah, seems to be just a performance optimization.
ZirePhiinix@reddit
This is typically how most problems arise. Misunderstanding the tech and using it wrong.
robberviet@reddit
I agree on the sharding. About 12 years ago I worked with a PHP codebase that used memcached locks for sharded MySQL (10), only for payment transactions. It worked fine. It also used TTLs. Not sure if the memcached was sharded, but I guess not.
That was right after graduation so I didn't understand much; only now do I understand what it did.
Treebro001@reddit
A Redis lock doesn't wait for the transaction to actually commit on the DB end before releasing; this is not the case for a DB row lock. (At least this is what I've had issues with in the past with bad Redis locks in similar situations.)
Redis locks should never be used to ensure transactional integrity within a DB because of this.
TheMoonMaster@reddit
Yeah, the article implied (as read) that they abandoned redis because they didn’t understand the problem, which isn’t a good message to send.
That omission makes this post lose a ton of its value. What was the mistake? How can others avoid it? Why did a db solve it when you clearly said the logic was sound?
Unfortunate because I was really curious
wtfbbq81@reddit
You're not my supervisor
codesnik@reddit
there's nothing in ACID that would guarantee your "rowcount" to be correct all the time. Use a transaction, SELECT FOR UPDATE, and maybe SKIP LOCKED instead of selecting an exact record by id.
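For anyone who wants the concrete shape of that, a rough sketch with psycopg2 (the table and column names are invented):

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical connection string

with conn, conn.cursor() as cur:
    # FOR UPDATE locks the claimed row until the transaction ends;
    # SKIP LOCKED makes concurrent workers skip rows someone else has
    # already claimed instead of blocking on them.
    cur.execute(
        """
        SELECT id FROM tasks
        WHERE status = 'pending'
        ORDER BY id
        LIMIT 1
        FOR UPDATE SKIP LOCKED
        """
    )
    row = cur.fetchone()
    if row:
        # ... do the work for row[0] here, then mark it done;
        # the row lock is released when the transaction commits.
        cur.execute("UPDATE tasks SET status = 'done' WHERE id = %s", (row[0],))
```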
Bayakoo@reddit
Isn’t the issue that we are using locks on system A to safeguard access to system B? You can’t rule out potential edge cases in such a solution.
Since the exclusive-access system already supports locks, use those locks, and you guarantee 100% no edge cases (except application bugs).
kernel_task@reddit
This is a classic producer/consumer problem and you are just reinventing the wheel. Go with Pub/Sub or something and you can distribute the work much more efficiently than a bum-rush of lock contention. You can then impose quotas on the producers to solve your fairness issue.
Soccer_Vader@reddit
I totally agree the SQL lock is the better option than re-inventing the wheel, but I have some questions:
Santrhyl@reddit
Postgres will release the lock as soon as the transaction is done or the connection is closed.
So no need for a TTL, but you should consider what happens if my cron crashes.
I agree they missed that
Soccer_Vader@reddit
I don't think they were using Postgres.js' built-in transaction lock though? These are the example queries:
As far as I know you need to explicitly use SKIP LOCKED with the SELECT query to avoid getting rows that are currently locked by another transaction. What they are using is the simple atomicity of Postgres, which Redis already guarantees. That is why my belief is that Redis was never the problem, their architecture was, and when they migrated to Postgres they probably refactored, which "fixed" the problem, but they will probably re-introduce it once they start tinkering with this piece of code again.
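For reference, the conditional-UPDATE-plus-rowcount pattern being described looks roughly like this; a sketch with psycopg2 and made-up table/column names, not the article's actual code:

```python
import psycopg2

def try_claim(conn, task_id):
    """Return True if this worker won the race to claim the task."""
    with conn, conn.cursor() as cur:
        # Postgres applies the conditional UPDATE atomically, so only one
        # concurrent worker can flip the row from 'pending' to 'processing';
        # everyone else sees rowcount == 0 and backs off.
        cur.execute(
            "UPDATE tasks SET status = 'processing' "
            "WHERE id = %s AND status = 'pending'",
            (task_id,),
        )
        return cur.rowcount == 1

conn = psycopg2.connect("dbname=app")  # hypothetical connection string
if try_claim(conn, 123):
    pass  # process task 123
```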
Santrhyl@reddit
Yeah, my bad, I read too many lock comments and forgot they didn't use that ... (advisory locks)
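For anyone unfamiliar, a transaction-scoped advisory lock looks something like this (sketch; the lock key is arbitrary):

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical connection string

with conn, conn.cursor() as cur:
    # pg_try_advisory_xact_lock returns true/false immediately, and the
    # lock is released automatically when the transaction ends, so a
    # crashed worker can never hold it forever.
    cur.execute("SELECT pg_try_advisory_xact_lock(%s)", (42,))
    if cur.fetchone()[0]:
        pass  # we hold the lock; do the critical section here
```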
xeio87@reddit
Did I miss something? How did they solve tasks that crash on the SQL version without a timeout on the processing lock?
Or I guess the real problem is they were locking at the user level (on Redis) rather than the task level, and don't care if individual tasks never finish?
Santrhyl@reddit
Postgre will release the lock as soon as the transaction is done or if the connection is closed.
xeio87@reddit
None of the examples in the article actually used database locks as near as I can tell, just UPDATE ... WHERE and row-count checks to see if the update grabbed the "lock".
Santrhyl@reddit
Oh yes you’re right :-) too many locks comments :-)
crap-with-feet@reddit
Let me add two more lessons to learn from this:
1. Do a little research up front. 99% of the time your problem is not only already a solved problem but there’s probably a standard for the solution.
2. The “new shiny” (Redis locks in this case) is most likely a solution looking for a problem, or a solution for a problem different from your problem. If the new solution wasn’t designed for your specific problem, trying to adapt it to fit will often end in tears. The designers of the new thing haven’t tested it with your problem.
“New” isn’t automatically bad but neither is “old”. Old exists because it works and has survived the test of time.
jimbojsb@reddit
I mean Redis locking has been going on with extreme success for what…15 years? Longer? This is just a classic “distributed systems are hard” problem and unfortunately we’ve proven time and time again you can’t teach that, everyone has to fail at it in their own special way.
ub3rh4x0rz@reddit
Yeah, I keep chuckling at the repeated mention of Redis locking as the "new shiny" when it's one of the most boring, well-documented ways you can accomplish locking (and distributed locking, with Redlock).
"Do everything in postgres" works a lot better and longer than many people think, but redis is standard fare and I'd put them both in the category of "you almost certainly already use them, and you can use them for this other thing and get pretty good results, so don't add a new db". Redis is a better fit for basic locking IMO, but not as a replacement for properly designing your relational database schema and using its builtin features properly
Sweaty-Link-1863@reddit
That cat looks like it just debugged production code
PritchardBufalino@reddit
The author completely fucked up his implementation of a distributed lock then calls them bad. Intellectually irresponsible
ImNotHere2023@reddit
I read "user level locking" and instantly knew everything that came after would be garbage designs. Was not disappointed.
OutOfDiskSpace44@reddit
Not going to comment on the technical aspect here... the bigger question for me is how long this took (sounds like a few weeks), and it isn't clear what scale they're operating at, so it's hard to know whether that's a good trade-off of time for some kind of solution.
RoomyRoots@reddit
Kinda funny this trend of pushing NoSQL solutions and people returning to DBs as they find that they just werk.
angellus@reddit
The problem was not Redis locks; in fact the author is not even using Redis locks. The problem is they tried to roll their own solution and fucked it up because they forgot to add a timeout to their "lock".
Having an extra layer of protection on top of atomic database queries is not a bad idea, especially in those cases where you must not have race conditions. Redis locks are also great when you need distributed locks that automatically get cleaned up.
inputwtf@reddit
Author clearly doesn't know about redis-py lock
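For context, redis-py ships a lock helper that handles the timeout and safe-release details; something like this (key name and timeouts invented):

```python
import redis

r = redis.Redis()

# The Lock helper stores a random token with a TTL and only releases
# the key if that token still matches, so a worker whose lock already
# expired can't clobber someone else's lock. timeout is the TTL;
# blocking_timeout bounds how long we wait to acquire it.
with r.lock("lock:task:123", timeout=30, blocking_timeout=5):
    pass  # critical section
```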