What’s a low memory way to run a Python http endpoint?
Posted by alexp702@reddit | Python | View on Reddit | 72 comments
I have a simple process with a single endpoint that needs exposing over http. Nothing fancy, but it needs to run in a container using minimal memory. Currently running with uvicorn, which needs ~600Mb of RAM on startup. This seems crazy.
I have also tried Granian, which seems to have similar usage.
For perspective a Nodejs container uses 128mb, and a full phpmyadmin uses 20!
I realise you shouldn’t compare but a 30x increase in memory is not a trivial matter with current ram pricing!
fiskfisk@reddit
uvicorn should not use 600MB by itself. Are you allocating memory in your application to handle requests?
Bjoern is commonly mentioned as a low memory use http server for Python:
https://github.com/jonashaag/bjoern
I'd just evaluate bottle.py and the built-in http server as well. Not sure about gunicorn's requirements.
alexp702@reddit (OP)
bjoern seems very old - 4 years since update!
fiskfisk@reddit
If it works, it works. It's not like the basic http protocol has changed in 25+ years.
ra_men@reddit
And this is why we have mass supply chain vulnerabilities.
fiskfisk@reddit
Something can be stable without security issues. Software that doesn't evolve tends to be more stable and have fewer issues, but of course you should always check whether there are any known issues.
ra_men@reddit
Before this year I’d agree with you. Now we’re seeing so many 10-20 year old pieces of software have security gaps due to ai pen testing that everything feels up for grabs.
Fabulous-Possible758@reddit
Sure, age isn’t a perfectly reliable indicator of security. But it’s the same with any bug: new code can introduce bugs, regardless of who wrote it, and up to now, people using the software and reviewing the code over time were the only things that could catch them.
So far AI has managed to catch a couple holes which no one noticed before, and this is a pretty recent development. It looks like most of the people doing the research with it are doing the responsible thing and letting developers know, and over time AI pen testing will just be part of the development process for secure software.
ra_men@reddit
Again, I agree with you, but that doesn’t address the main concern here, which is the lack of updates. 4 years without updates for an http server is a quick no-go in any professional environment; that’s hundreds of potentially relevant CVEs. Obviously, for OP’s case here, I doubt that matters.
alexp702@reddit (OP)
It does matter to me. We’re building a system for the future, and whilst this component is not large or high-frequency in use, it is important.
ra_men@reddit
I’m not sure what people’s issue is with the concern over lack of updates, it’s definitely a legitimate concern.
murdoc1024@reddit
I don't get why you're being downvoted, your viewpoint is perfectly valid.
unkz@reddit
Bottle's security surface is pretty damn small.
ra_men@reddit
Bjoern (if that’s what you meant) is pretty small, but that doesn’t mean the surface is small. It’s actually fairly broad, as are all http servers.
alexp702@reddit (OP)
Yes, but bjoern uses the older V2 WSGI protocol which is now (apparently according to my AI) WSGI V3. Personally I don't go for stuff that's not obviously maintained and relatively active - it causes problems down the line.
yerfatma@reddit
But not this one.
kaszak696@reddit
Your AI is hallucinating to you, and you'd know that if you'd spent a minute reading the original spec instead of just lazily having a robot lie to you.
alexp702@reddit (OP)
Not doing a full implementation, just rapidly prototyping a solution to see the memory usage. If the AI can get it up and running in 10 minutes, warts and all, that’s good enough for this purpose. I’m impressed: in about 2hrs it has let me test almost every proposal here. I did just spot in the output that the 4-years-unmaintained problem still stands.
UsefulOwl2719@reddit
With this approach, you will always be disappointed in the long run. I look for deps that I would be happy using if they never received another commit. Recent churn indicates a project is still in development or is trying to design itself around a moving target.
axonxorz@reddit
If you're concerned about having up-to-date implementations, your LLM has failed you. You should be targeting ASGI, even if you aren't using its concurrency advantages.
japherwocky@reddit
they're running it in a Docker container; it's Docker, not uvicorn
Huge-Habit-6201@reddit
How about cherrypy?
jkh911208@reddit
Use Go
nggit@reddit
Try https://github.com/nggit/tremolo . In my Docker setup it never eats > 50MB for simple usage. It was written with minimal cyclic references, so memory stays low over long runs. But it's not the fastest or best-known server.
thegoz@reddit
have you considered not using Python? if it’s a simple process it might be worth it. I don’t know if you have any AI at your disposal, but rewriting it into a different language is kinda doable
WJMazepas@reddit
It does seem crazy, because I have dev servers with 512MB of RAM and a medium FastAPI application uses 300MB on startup
Is this Node comparison running a similar program?
LoreBadTime@reddit
I was greasy, and used half the AWS free SSD as swap. Not that I need to use the server that much
tRfalcore@reddit
is that proncounced like greddy like bread
alexp702@reddit (OP)
I am running on a Mac - Arm64 image for all. Node seems to have a baseline of 128MB - seems a well documented thing to do with the garbage collector. You can reduce it with some command line flags, but it then starts to become unstable. My actual python program is using about 70Mb on start up (possibly due to libraries) - which I can live with. My surprise is how hard it seems to be to serve this without eating up RAM.
We have a bunch of 16Gb Macs, developing a docker based system. Most code based containers are Node, with a Python one stuffed in there. I want to make sure the team has as much ram left as possible, which began this investigation. Disk isn't a problem - just ram as we grow the number of services.
hstarnaud@reddit
I noticed Docker for Mac isn't really good at freeing up memory. It will request memory when it needs it, but then the container keeps that memory allocated until it shuts down, even if it's not using it anymore.
So maybe it's not your memory at server start-up that's at play here, but rather your peak memory usage within a request.
WJMazepas@reddit
Hmmm, maybe it's running Docker on macOS that is using all that memory.
Is the node code also running in Docker?
alexp702@reddit (OP)
Yes, everything is pretty similar to a Windows PC. Side note: we were using Amd64 images on Arm Macs; they use about 30% more RAM to emulate.
Will pick up tomorrow I think!
makinggrace@reddit
Is that RAM usage the containerized usage? How much of it is just starting the container?
alexp702@reddit (OP)
I am using uv run in the container - I think it may be part of the problem, as no matter what I try, it stubbornly wants 512MB.
makinggrace@reddit
Oh you're running it in Docker. Docker is not a lightweight container. Can you use something like Podman?
corey_sheerer@reddit
UV should not be installed in your runtime. You should create a multi-stage build and copy the installs into your runtime step. Search for recommended container builds for UV and you should find something to improve upon what you have.
nickN42@reddit
Which base image are you using?
alexp702@reddit (OP)
Currently python:3.13-slim
yerfatma@reddit
Do you have to use Docker? That may be a lot of your overhead. What is this single endpoint doing?
AreWeNotDoinPhrasing@reddit
They are using Docker on Arm Macs, emulating amd64 in the containers lol. They have no idea what they’re doing and won’t listen to reason (based on the other comment chains they are replying to).
nickN42@reddit
I think you need to look into your service first. I just tried to run a hello-world server with uvicorn (using `uv run`) on a slim Python image -- although I went with the uv-provided one, based on trixie-slim -- and according to `podman stats` it takes about 30 megs of RAM running.
JohnDisinformation@reddit
Have you trimmed it down?
Stick that in your docker file.
cheerycheshire@reddit
OP has a problem with ram usage, not image size...
Also, for python, it's better to use multi-stage builds - use non-slim version to build libs (can apt install everything needed, no need to uninstall and purge stuff later), then copy built libs over to a slim image of same OS (where you don't use apt at all, unless of course you need some system tool) and your project files.
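A sketch of that multi-stage pattern, assuming a uv-managed project (the app name, lockfile layout, and uvicorn command here are placeholders, not OP's actual setup):

```dockerfile
# Builder stage: uv exists only here, never in the runtime image.
FROM python:3.13 AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev

# Runtime stage: slim base, just the venv and project files copied in.
FROM python:3.13-slim
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
COPY . .
ENV PATH="/app/.venv/bin:$PATH"
# Run uvicorn directly from the venv instead of through `uv run`.
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

Launching the server binary from the copied venv (rather than `uv run`) also removes uv itself from the process tree at runtime.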
geravalas@reddit
That won't reduce memory usage or image size unless you squash.
Feeling_Ad_2729@reddit
600MB is almost certainly your application's imports, not uvicorn itself. Uvicorn alone uses maybe 30-40MB. The usual culprits: numpy/pandas at import time, heavy ML libraries, pydantic v1 vs v2 (v2 is much leaner).
Profile what's actually using memory first:
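A minimal stdlib-only sketch of that: start tracemalloc before your application imports, then look at which lines allocated the most. The two imports below are hypothetical stand-ins for whatever the app actually pulls in.

```python
import tracemalloc

# Start tracing *before* the imports you want to measure.
tracemalloc.start()

# Hypothetical stand-ins for the application's real imports:
import json          # noqa: F401
import http.client   # noqa: F401

snapshot = tracemalloc.take_snapshot()
# Top allocation sites by file:line, largest first.
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)
```

If the top entries are library import machinery rather than request handling, swapping HTTP servers won't move the needle.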
For genuinely lightweight Python HTTP:
- bottle — single file, zero dependencies, runs on the stdlib WSGI server or gunicorn
- falcon — built for low-overhead APIs, much lighter than FastAPI
- aiohttp — if you need async but not FastAPI's ecosystem
- flask + waitress — simple, predictable memory
If you genuinely need minimal footprint (IoT, serverless cold starts), bottle + gunicorn in a slim Docker image usually lands around 50-80MB total. But fix the import problem first — swapping the HTTP framework won't help if you're importing pandas in your handler.
alexp702@reddit (OP)
Thanks - I will check that as some of that may be happening.
lanupijeko@reddit
how did you check it's python?
If you are checking container memory usage, could it be the container's context?
alexp702@reddit (OP)
Htop shows the usage all in the “uv run uvicorn” process.
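For anyone reproducing the measurement without htop, per-process resident memory can be checked inside the container like this (assumes a Linux container with procps available):

```shell
# RSS is reported in kilobytes; sort descending and show the top consumers.
ps -eo rss,args --sort=-rss | head -n 5
```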
CatolicQuotes@reddit
Nothing fancy? What exactly is this "nothing fancy"? What packages does it use? Python loads everything into memory.
paperlantern-ai@reddit
This is almost certainly not uvicorn itself - a bare uvicorn app should sit around 30-40MB. The fact that you're seeing 512MB+ regardless of which server you try points to something else in your container setup. Since you mentioned using `uv run` inside the container, that's likely a big contributor - uv should only be in your build stage, not your runtime. Try a multi-stage Dockerfile: build/install deps with uv in the first stage, then copy just the venv into a clean `python:3.13-slim` final stage. You'll probably land around 80-100MB total.
Mr-Cas@reddit
600MB is crazy. That must be a misconfiguration. My full fledged feature rich full stack software projects with large Flask APIs, hosted using waitress, consume about 40-60MB.
jvlomax@reddit
Have you tried `python -m http.server`?
aisingiorix@reddit
Not recommended for production: https://docs.python.org/3/library/http.server.html#http-server-security
jvlomax@reddit
Sure, I wouldn't use it for critical production infrastructure. But I don't know what OP wants so it may be an option
ra_men@reddit
This is similar to the recent Axios compromise fyi
VEMODMASKINEN@reddit
Use Go, it's easier to build and the image will be less than 10mb.
alexp702@reddit (OP)
I need python - various libraries on the end point require it, unless there is some trick to Go <-> python?
WJMazepas@reddit
Nah, it won't be a good integration between them.
If it was an endpoint just checking some stuff and returning a value, then yeah, Go would be fantastic for that.
If you need a lot of Python stuff, then stick with Python.
Go also has a lot of libraries similar to Python's, but porting Python code to Go can be a good amount of work.
thekicked@reddit
You may want to use memray to see which parts of the server are taking up a lot of memory
https://bloomberg.github.io/memray/
AlexMTBDude@reddit
You need to make up your mind about the units that you use: is it millibits (mb), megabits (Mb) or megabytes (MB)?
ReadyAndSalted@reddit
What in the world is a millibit? mb means megabyte.
npisnotp@reddit
I think he meant "mib" which is the abbreviation for "mebibyte".
"Megabyte" is computed as 10⁶, while "mebibyte" is computed as 2²⁰; as you can imagine, the curve for "megabyte" rises slower than for "mebibyte".
This is the reason why when you buy a 500 "gigabyte" disk it only has 465: it really has 500000000000 bytes, not 536870912000. Unfortunately the correct unit terms are not widely used.
cheerycheshire@reddit
No, the person above is just being pedantic - SI prefixes go m for milli (1e-3) and M for mega (1e6).
Also, it would be MiB - again, mebi is Mi, byte is B (small b is bit). mi prefix doesn't exist, and Mib would be mebibit, not mebibyte.
AlexMTBDude@reddit
Nope: https://en.wikipedia.org/wiki/Megabyte
Ever taken a comp sci course?
duskhat@reddit
Have you? In what world is it realistic to ask this question for anything other than Megabytes?
jmelloy@reddit
Absolutely nobody read his message and thought that “mb” meant millibits.
alexp702@reddit (OP)
Sorry all Megabytes. All the others are irrelevant to me ;-)
corey_sheerer@reddit
I'm not sure, but I'd assume 600mb isn't a requirement, as I have deployed a few FastAPI services to Kubernetes with quota limits at 500mb. I haven't tried to lower it, but it seems plausible to get it down to maybe 200mb. It seems like something else may be at play. How did you install FastAPI into your image?
sys_exit_0@reddit
I will
UpsetCryptographer49@reddit
You can do it with Apache HTTP Server and mod_wsgi. This needs about 20mb for Python + 20mb for wsgi + your app. Apache needs about 10mb for the master process + 25mb per worker process.
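For reference, the app side of a mod_wsgi deployment is just a plain WSGI callable; mod_wsgi looks for a module-level `application` by default. A minimal sketch (the `/status` route and payload are made up):

```python
import json


def application(environ, start_response):
    """Minimal WSGI callable served by mod_wsgi (or any WSGI server)."""
    if environ.get("PATH_INFO") == "/status":
        body = json.dumps({"ok": True}).encode()
        status = "200 OK"
        ctype = "application/json"
    else:
        body = b"not found"
        status = "404 Not Found"
        ctype = "text/plain"
    start_response(status, [
        ("Content-Type", ctype),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The same callable also runs unchanged under waitress or gunicorn, so it is easy to benchmark the memory of each server against the same app.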
Particular-Plan1951@reddit
Another option is using FastAPI with careful dependency control.
In some setups the memory footprint is actually pretty small.
But the container base image can also make a big difference.
kaszak696@reddit
Do you really need a full-blown production-grade web server for your use case? The Python standard library has a very basic module for simple http serving. Hard to say whether it's suitable for you without knowing what exactly are your needs.
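That stdlib module can serve a single endpoint in a few lines with no third-party dependencies. A sketch, with the caveat the docs themselves give about production use (route, port, and payload are placeholders):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/status":
            body = json.dumps({"ok": True}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
        else:
            body = b"not found"
            self.send_response(404)
            self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    # Blocks forever; single-threaded, one request at a time.
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

As noted elsewhere in the thread, the Python docs explicitly warn this server implements only basic security checks, so it fits internal or low-stakes endpoints rather than internet-facing ones.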
TheMagicTorch@reddit
FastAPI? Flask?