accidentally built a leaky bucket instead of a token bucket (my scraper was 10x too slow)

Posted by jalilbouziane@reddit | Python

I've been working on an integration with a legacy API that is extremely strict about rate limits.

I wrote a StrictLimiter class to handle the timing precisely, using time.monotonic() so that wall-clock adjustments can't skew the intervals.

Here is the code I was using:

import time

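class StrictLimiter:
    """Enforces a fixed minimum gap between calls; idle time is never banked."""

    def __init__(self, rate_limit_per_sec):
        self.interval = 1.0 / rate_limit_per_sec  # minimum seconds between requests
        self.last_check = 0  # monotonic timestamp of the previous call

    def wait(self):
        # The "Clever" Trap: Enforcing strict spacing
        now = time.monotonic()
        elapsed = now - self.last_check

        # Sleep off whatever is left of the interval since the last call
        if elapsed < self.interval:
            sleep_time = self.interval - elapsed
            time.sleep(sleep_time)

        # Any idle time beyond one interval is thrown away here
        self.last_check = time.monotonic()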

It looks clean, but the throughput was terrible. The API actually allows bursts (up to 10 requests instantly), as long as the average stays at 1/sec.

My code was enforcing a rigid 1-second gap between every single request.

I had implemented traffic smoothing (a leaky bucket) when I needed a token bucket. On a batch of 50 items, my code took about 50 seconds. A token bucket takes ~40 seconds because it sends the first 10 instantly, and only the remaining 40 have to wait for the 1/sec refill.
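To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes requests themselves take zero time, the first request goes out at t=0, and the bucket starts full; the function names are made up for illustration.

```python
def strict_finish_time(n, rate=1.0):
    """Send time of the n-th request with a fixed 1/rate gap (first one at t=0)."""
    return (n - 1) / rate


def bucket_finish_time(n, rate=1.0, burst=10):
    """Send time of the n-th request when the first `burst` go out instantly
    (assumes a full bucket at t=0) and the rest wait for one refill each."""
    return max(0, n - burst) / rate


print(strict_finish_time(50))   # 49.0, i.e. roughly the 50 seconds I measured
print(bucket_finish_time(50))   # 40.0
```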

So, I had to rewrite this to actually bank time when the scraper was idle.
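A minimal sketch of what such a rewrite could look like. This is not necessarily the exact code I ended up with: the class name, the `capacity` parameter, the injectable `clock`/`sleep` arguments (there only to make it testable) and the choice to start with a full bucket are all assumptions.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests while averaging `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate                # tokens refilled per second
        self.capacity = capacity        # maximum banked tokens (burst size)
        self.tokens = float(capacity)   # assumption: start with a full bucket
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        # Bank the time that passed since the last call, capped at capacity
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now

    def wait(self):
        """Block until one token is available, then consume it."""
        self._refill()
        if self.tokens < 1:
            # Sleep just long enough for the missing fraction of a token
            self._sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1
```

Usage would be something like `limiter = TokenBucket(rate=1, capacity=10)` followed by `limiter.wait()` before each request. Note that this sketch is single-threaded; sharing it between threads would need a lock around `wait()`.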

I put together a simulation to prove the difference. It mocks an API that allows a burst of 10 but bans you if you exceed the average.

The StrictLimiter (above) survives but is too slow.

A naive loop gets banned.

The TokenBucket hits the sweet spot.
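Here is a stripped-down sketch of that kind of comparison, run in virtual time so nothing actually sleeps. The mock server's ban rule (a server-side bucket of 10 refilled at 1/sec, ban on the first request that finds it empty) is my assumption about how such an API might behave, and every function name is hypothetical.

```python
def mock_api_accepts(send_times, rate=1.0, burst=10):
    """Hypothetical server: returns False (banned) if any request finds no token."""
    tokens, last = float(burst), 0.0
    for t in send_times:
        tokens = min(burst, tokens + (t - last) * rate)  # refill since last request
        last = t
        if tokens < 1 - 1e-9:  # small tolerance for float rounding
            return False
        tokens -= 1
    return True


def naive_times(n):
    """Naive loop: fire everything at t=0."""
    return [0.0] * n


def strict_times(n, rate=1.0):
    """Strict spacing: one request every 1/rate seconds."""
    return [i / rate for i in range(n)]


def bucket_times(n, rate=1.0, burst=10):
    """Token bucket starting full: first `burst` instantly, then one per refill."""
    return [max(0, i + 1 - burst) / rate for i in range(n)]


strategies = {
    "naive": naive_times(50),
    "strict": strict_times(50),
    "token bucket": bucket_times(50),
}
for name, times in strategies.items():
    status = "ok" if mock_api_accepts(times) else "banned"
    print(f"{name}: last request at t={times[-1]}s, {status}")
```

Under these assumptions the naive loop is banned on its 11th request, the strict schedule passes with its last request at t=49, and the token bucket schedule passes with its last request at t=40.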

If you want to test if your implementation handles the banked time logic correctly without triggering a ban:

Challenge (No login required to run the code).