accidentally built a leaky bucket instead of a token bucket (my scraper was 10x too slow)
Posted by jalilbouziane@reddit | Python | 1 comment
I've been working on an integration with a legacy API that is extremely strict about rate limits.
I wrote a RateLimiter class to handle the timing precisely using time.monotonic() to avoid drift.
Here is the code I was using:
    import time

    class StrictLimiter:
        def __init__(self, rate_limit_per_sec):
            self.interval = 1.0 / rate_limit_per_sec
            self.last_check = 0

        def wait(self):
            # The "Clever" Trap: enforcing a rigid gap between every request
            now = time.monotonic()
            elapsed = now - self.last_check
            if elapsed < self.interval:
                sleep_time = self.interval - elapsed
                time.sleep(sleep_time)
            self.last_check = time.monotonic()
It looks clean, but the throughput was terrible. The API actually allows bursts (up to 10 requests instantly), as long as the average stays at 1/sec.
My code was enforcing a rigid 1-second gap between every single request.
I had implemented a traffic smoother (effectively a leaky bucket) when I needed a token bucket. On a batch of 50 items, my code took 50 seconds; a token bucket takes ~40 seconds because it consumes the first 10 tokens instantly.
So, I had to rewrite this to actually bank time when the scraper was idle.
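A minimal sketch of the token-bucket approach (class and parameter names are my own, not from the original code): tokens accumulate at the allowed rate while the scraper is idle, up to a capacity, so a burst of up to `capacity` requests goes through with no sleep at all.

```python
import time

class TokenBucket:
    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity            # start full: the first burst is free
        self.last_refill = time.monotonic()

    def _refill(self):
        # Bank the time elapsed since the last refill as fractional tokens.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

    def wait(self):
        self._refill()
        if self.tokens < 1:
            # Sleep only long enough for one token to accumulate.
            time.sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1
```

With `rate_per_sec=1, capacity=10`, the first 10 calls to `wait()` return immediately and the rest pace out at one per second, which is where the ~40 seconds for 50 items comes from.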
I put together a simulation to prove the difference. It mocks an API that allows a burst of 10 but bans you if you exceed the average.
- The StrictLimiter (above) survives but is too slow.
- A naive loop gets banned.
- The TokenBucket hits the sweet spot.
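A toy version of that mock, assuming (as described above) the server itself runs a token bucket: a burst allowance refilled at the average rate, and any request that arrives with no token available triggers a ban. All names here are illustrative, not from the original simulation.

```python
import time

class MockAPI:
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()
        self.banned = False

    def request(self):
        # Server-side refill, capped at the burst allowance.
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1:
            self.banned = True        # client exceeded the allowed average
            return False
        self.tokens -= 1
        return True
```

A naive tight loop burns through the burst allowance and gets banned on request 11; a client that spaces requests at (or above) `1/rate` never runs dry.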
If you want to test whether your implementation handles the banked-time logic correctly without triggering a ban:
Challenge (No login required to run the code).
Ok_Tap7102@reddit
The only hard and fast rule is 1 req/sec.
Why don't you just time.sleep(1), and then another time.sleep(1) if you encounter an HTTP 429?
Over any appreciable time duration, the 10 request burst becomes negligible.
If it took you longer than 10 seconds to come up with your complex idea and implement it, you've saved negative time.
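The commenter's approach, sketched out for comparison: a fixed one-second sleep per request, plus an extra back-off sleep on a 429. `fetch` is a hypothetical stand-in for whatever request function the scraper uses; the `delay` parameter is mine, so the pacing is configurable.

```python
import time

def polite_get(fetch, url, delay=1.0, max_retries=3):
    """Fixed-delay pacing: sleep after every request, sleep again on a 429."""
    for _ in range(max_retries):
        response = fetch(url)
        time.sleep(delay)             # hard 1-req-per-delay pacing
        if response.status_code != 429:
            return response
        time.sleep(delay)             # extra back-off after a rate-limit hit
    return response
```

It gives up the burst entirely, but as the comment notes, over a long run the 10-request head start is a rounding error.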