accidentally built a leaky bucket instead of a token bucket (my scraper was 10x too slow)
Posted by jalilbouziane@reddit | Python | 1 comment
I've been working on an integration with a legacy API that is extremely strict about rate limits.
I wrote a RateLimiter class to handle the timing precisely using time.monotonic() to avoid drift.
Here is the code I was using:
    import time

    class StrictLimiter:
        def __init__(self, rate_limit_per_sec):
            self.interval = 1.0 / rate_limit_per_sec
            self.last_check = 0

        def wait(self):
            # The "Clever" Trap: enforcing a rigid gap between every request
            now = time.monotonic()
            elapsed = now - self.last_check
            if elapsed < self.interval:
                sleep_time = self.interval - elapsed
                time.sleep(sleep_time)
            self.last_check = time.monotonic()
It looks clean, but the throughput was terrible. The API actually allows bursts (up to 10 requests instantly), as long as the average stays at 1/sec.
My code was enforcing a rigid 1-second gap between every single request.
I had implemented a traffic smoother (effectively a leaky bucket) when I needed a token bucket. On a batch of 50 items, my code took 50 seconds; a token bucket takes ~40 seconds because it consumes the first 10 tokens instantly.
So, I had to rewrite this to actually bank time when the scraper was idle.
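A minimal sketch of the token-bucket approach (class and parameter names are my own, not from the original code): tokens accumulate at the allowed rate while the scraper is idle, up to a capacity, so a burst of up to `capacity` requests goes through with no sleep at all.

```python
import time

class TokenBucket:
    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity            # start full: the first burst is free
        self.last_refill = time.monotonic()

    def _refill(self):
        # Bank the time elapsed since the last refill as fractional tokens.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

    def wait(self):
        self._refill()
        if self.tokens < 1:
            # Sleep only long enough for one token to accumulate.
            time.sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1
```

With `rate_per_sec=1, capacity=10`, the first 10 calls to `wait()` return immediately and the rest pace out at one per second, which is where the ~40 seconds for 50 items comes from.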
I put together a simulation to prove the difference. It mocks an API that allows a burst of 10 but bans you if you exceed the average.
- The StrictLimiter (above) survives but is too slow.
- A naive loop gets banned.
- The TokenBucket hits the sweet spot.
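A toy version of that mock, assuming (as described above) the server itself runs a token bucket: a burst allowance refilled at the average rate, and any request that arrives with no token available triggers a ban. All names here are illustrative, not from the original simulation.

```python
import time

class MockAPI:
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()
        self.banned = False

    def request(self):
        # Server-side refill, capped at the burst allowance.
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1:
            self.banned = True        # client exceeded the allowed average
            return False
        self.tokens -= 1
        return True
```

A naive tight loop burns through the burst allowance and gets banned on request 11; a client that spaces requests at (or above) `1/rate` never runs dry.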
If you want to test whether your implementation handles the banked-time logic correctly without triggering a ban:
Challenge (No login required to run the code).
Ok_Tap7102@reddit
The only hard and fast rule is 1 req/sec.
Why don't you just time.sleep(1), and then another time.sleep(1) if you encounter an HTTP 429?
Over any appreciable time duration, the 10 request burst becomes negligible.
If it took you longer than 10 seconds to come up with your complex idea and implement it, you've saved negative time.
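The commenter's approach, sketched out for comparison: a fixed one-second sleep per request, plus an extra back-off sleep on a 429. `fetch` is a hypothetical stand-in for whatever request function the scraper uses; the `delay` parameter is mine, so the pacing is configurable.

```python
import time

def polite_get(fetch, url, delay=1.0, max_retries=3):
    """Fixed-delay pacing: sleep after every request, sleep again on a 429."""
    for _ in range(max_retries):
        response = fetch(url)
        time.sleep(delay)             # hard 1-req-per-delay pacing
        if response.status_code != 429:
            return response
        time.sleep(delay)             # extra back-off after a rate-limit hit
    return response
```

It gives up the burst entirely, but as the comment notes, over a long run the 10-request head start is a rounding error.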