Is mitigating FastAPI event loop I/O overhead via PyO3 worth the FFI complexity? (Benchmarks inside)

Posted by mordechaihadad@reddit | Python | 10 comments

When you run high-concurrency rate limiting inside FastAPI, you are usually forcing Python's single-threaded event loop to spend precious time on network I/O just to verify a token before the request even reaches the application logic.

I wanted to see how cleanly I could isolate the Redis network layer outside of Python, so I built rustgate using PyO3 and a multi-threaded Tokio runtime.

Disclaimer: This is a proof of concept. It's tied to another experimental crate I am working on (axum-rate-limiter), so it's not very configurable or abstracted as of now. Could you use it in production? Probably, but why?

That being said, the raw performance under a 100-concurrency flood on a heavy, dynamically rerouted endpoint turned out pretty efficient:

Pushed 1,128 req/sec without dropping a connection.

Fastest response hit 15.3 ms.

Fails closed instantly with immediate 429 rejections to protect downstream application logic.
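
To make "fails closed" concrete, here is a minimal sketch of one reading of it: the gate answers 429 both when the limiter says no and when the limiter backend itself is unreachable, so the request never reaches application logic. The names `gate`, `check`, and `unreachable_backend` are hypothetical and this is not rustgate's actual code.

```python
from http import HTTPStatus


def gate(check, client_id):
    """Return the status to answer with before any application logic runs.

    `check` is a hypothetical callable that returns True when the client is
    still within its limit and may raise if the limiter backend is down.
    """
    try:
        allowed = check(client_id)
    except Exception:
        # Fail closed: if the limiter cannot decide, reject instead of
        # letting the request through (assumed behaviour, see lead-in).
        return HTTPStatus.TOO_MANY_REQUESTS
    return HTTPStatus.OK if allowed else HTTPStatus.TOO_MANY_REQUESTS


def unreachable_backend(client_id):
    """Simulates a limiter backend that cannot be reached."""
    raise ConnectionError("limiter backend down")
```

A fail-open variant would return `HTTPStatus.OK` in the `except` branch instead, which trades protection for availability.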

The cool part: I benched a naked, no-op /health endpoint (literally just returning {"status": "ok"}) on the same machine, and it maxed out at 1,496 req/sec.

The fact that crossing FFI boundaries, handling memory pinning, and doing a multi-threaded Tokio-to-Redis round trip costs only ~370 req/s suggests that the Rust integration added very little overhead.
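
For reference, the arithmetic behind that gap can be worked out from the two throughput figures above. This is only a sketch: it assumes both runs used the same load settings, and the two endpoints do different work, so the difference is not purely FFI cost.

```python
# Throughput figures quoted in the post.
baseline_rps = 1496  # no-op /health endpoint
gated_rps = 1128     # rate-limited endpoint

# Absolute and relative throughput drop.
drop_rps = baseline_rps - gated_rps
drop_pct = 100 * drop_rps / baseline_rps

# Amortized wall-clock time per completed request (1 / throughput), in ms.
baseline_ms = 1000 / baseline_rps
gated_ms = 1000 / gated_rps
extra_ms = gated_ms - baseline_ms

print(f"drop: {drop_rps} req/s ({drop_pct:.1f}%)")
print(f"amortized cost per request: +{extra_ms:.3f} ms")
```

So the gap is 368 req/s, roughly a quarter of the no-op throughput, or about 0.22 ms of extra amortized time per request.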

I’ve dropped the GitHub link and the core architectural layout in the comments section below to keep this thread focused on the performance discussion.

Actual__Wizard@reddit

Do you understand the concept of multiplexing?

Ok_Tap7102@reddit

Can you ELI5: what problem were you encountering that this solves? ie do you genuinely require 1,000 requests per second to a remote GPT API server?

How would you pro/con this approach against say bifrost?

mordechaihadad@reddit (OP)

An oversight on my part: when I looked up bifrost to answer your question, I missed the fact that it supports token-aware rate limiting. I will have to give you a proper answer again.

mordechaihadad@reddit (OP)

Hey, first, you are absolutely correct: no one needs 1,000 req/s.
Second, this is a stress test of the gateway layer, not of the LLM provider. I am not a competitor to bifrost by any means. This is a proof of concept showing rate limiting in Python with Rust and PyO3 under the hood, either to stop malicious traffic from hijacking your event loop or simply to rate limit.

I wrote axum-rate-limiter as a POC (it was initially built for SurrealDB), and decided today to hook that crate into this new project and see how it does in the context of AI infra (I don't really pay for AI, so I get rate limited quite a lot and was curious).

Hopefully this answers your question.

teerre@reddit

These numbers don't make any sense without a baseline

mordechaihadad@reddit (OP)

My apologies, you are right. I will have to update the post once I get to writing the pure python equivalent.
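
As a rough idea of what that baseline could look like, here is a minimal sketch of a fixed-window limiter running entirely on the event loop. `InMemoryStore` is a hypothetical stand-in for Redis with INCR-like semantics, and `allow` and `demo` are illustrative names, not part of rustgate. A real version would await a Redis client and set an expiry on each window key.

```python
import asyncio
import time


class InMemoryStore:
    """Hypothetical stand-in for Redis, exposing only an INCR-like call."""

    def __init__(self):
        self._counts = {}

    async def incr(self, key):
        # A real Redis call would await network I/O on the event loop here.
        await asyncio.sleep(0)
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]


async def allow(store, client_id, limit, window_s, now=None):
    """Fixed-window check: True while the client is within `limit` per window."""
    now = time.time() if now is None else now
    window = int(now // window_s)  # all requests in the same window share a key
    count = await store.incr(f"rl:{client_id}:{window}")
    return count <= limit


async def demo():
    store = InMemoryStore()
    # Six concurrent requests from one client, limit of 5 per 60 s window.
    checks = (
        allow(store, "client-a", limit=5, window_s=60, now=1_000.0)
        for _ in range(6)
    )
    return await asyncio.gather(*checks)
```

Running `asyncio.run(demo())` lets five of the six requests through and rejects one. Benchmarking this path against the PyO3 path under the same load would give the comparison asked for above.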

riksi@reddit

How would this work when you have multiple FastAPI processes? (assuming one per core, say, 16)

Can they all talk to the same Rust process or do you need one Rust process per FastAPI process?

SeniorScienceOfficer@reddit

If you’re looking to speed up an ASGI API application, just use Starlette. While FastAPI has syntax sugar and shortens development time, it does so at the cost of performance because of extra imports, validation, setup, etc.

According to these ASGI performance benchmarks, Starlette is significantly more performant in transactions per second: https://gist.github.com/patx/0c64c213dcb58d1b364b412a168b5bb6#results-table

latkde@reddit

Many production deployments address event loop overhead by switching from the standard library's event loop implementation to alternatives like uvloop.

In particular, the widely used Uvicorn ASGI server will automatically select uvloop if available: https://uvicorn.dev/concepts/event-loop/

That page documents further alternatives, some of which are also based on a Rust/Tokio stack.
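
For anyone who wants to try the event loop swap outside of Uvicorn, a minimal sketch follows. It assumes uvloop may or may not be installed and falls back to the standard library loop; `new_loop` and `ping` are illustrative names.

```python
import asyncio


def new_loop():
    """Return (loop, name), preferring uvloop when it is installed."""
    try:
        import uvloop  # third-party and optional
    except ImportError:
        return asyncio.new_event_loop(), "asyncio"
    return uvloop.new_event_loop(), "uvloop"


async def ping():
    """Trivial coroutine, just to show the chosen loop runs normal asyncio code."""
    await asyncio.sleep(0)
    return "ok"


loop, loop_name = new_loop()
try:
    result = loop.run_until_complete(ping())
finally:
    loop.close()

print(loop_name, result)
```

With Uvicorn itself none of this is needed: the `--loop` option (or the `loop` argument to `uvicorn.run`) accepts `auto`, `asyncio`, or `uvloop`.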

mordechaihadad@reddit (OP)

https://github.com/MordechaiHadad/rustgate