What is the local LLM alternative to Codex?
Posted by Euphoric_North_745@reddit | LocalLLaMA | View on Reddit | 25 comments
OpenAI Codex got a lot of updates recently; it now does a lot of things on your computer. I tried a few of them, not all, and based on my experience with OpenAI, they usually lean heavily on propaganda.
Anyway, what is the local LLM alternative to Codex? I mean, at Codex's level.
Zealousideal-Lie8829@reddit
Code Llama (Meta), StarCoder / StarCoder2
chibop1@reddit
Codex itself is free and open source, and it supports local models. It even has a built-in --oss flag for their gpt-oss models.
https://github.com/openai/codex
You can hook it up to any local engine that exposes an OpenAI-compatible API (quick sketch below).
Qwen-3.6-27b works great!
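For example, here's a minimal sketch of hitting a local OpenAI-compatible server with the standard openai Python client; the URL, port, and model name are placeholders for whatever you run locally, not anything Codex-specific:

```python
# Minimal sketch: any local engine that exposes an OpenAI-compatible API
# (llama-server, LM Studio, etc.) can be queried by overriding base_url.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # placeholder: your local server's endpoint
    api_key="not-needed",                 # local servers usually ignore the key
)

resp = client.chat.completions.create(
    model="qwen-coder-local",             # placeholder model name
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
)
print(resp.choices[0].message.content)
```

Codex talks to the same kind of endpoint; it's just a matter of pointing its provider config at your local base URL.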
Alternative_You3585@reddit
None; there's simply nothing at GPT-5.5 level that you can run locally without a mini datacenter.
ttkciar@reddit
That's an exaggeration. You can run GLM-5.1 locally on a single eight-GPU server, no need for a "mini datacenter".
fuckable-switcher@reddit
You can also run a 1-trillion-parameter model on a 1 TB Intel Optane drive.
And they are dirt cheap and abundant
Alternative_You3585@reddit
GLM 5.1 can't compete with GPT 5.5.
I was thinking of a few Kimi agents working in parallel. If you Google the threshold for a mini datacenter, it'll spit out:
A mini (or micro) data center generally starts at 5 kW [...]
You'd need something like 8 RTX 6000 Pros to reach that, which is somewhat realistic if you operate at high context (quick math below).
And yeah, context is another point: neither GLM nor Kimi has 1M context; the nearest would be MiMo (but it's benchmaxxed) or DeepSeek (V4 has hallucination rates like gpt-oss-20b).
So no, it's not an exaggeration.
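Quick math on that threshold (600 W per RTX 6000 Pro class card and the host overhead figure are rough assumptions, not measured numbers):

```python
# Back-of-the-envelope power draw for an 8-GPU box.
gpu_count = 8
watts_per_gpu = 600        # assumed per-card draw under load
host_overhead_watts = 800  # assumed CPU, RAM, fans, PSU losses

total_kw = (gpu_count * watts_per_gpu + host_overhead_watts) / 1000
print(f"Estimated draw: {total_kw:.1f} kW")  # ~5.6 kW, over the 5 kW threshold
```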
fuckable-switcher@reddit
Grok also has a high context limit, and so does Llama 4.
fuckable-switcher@reddit
No you don’t
A single Arc Pro B70, a Threadripper 700-series CPU, 256 GB of RAM,
and then an Intel Optane drive.
ttkciar@reddit
It doesn't beat GPT-5.5, but it certainly competes with it, at least for codegen. Benchmarks put it only about 8% to 10% lower for codegen tasks.
Thanks for catching me up on the threshold for "mini datacenter". Usually that implies at least a couple of 40U racks, to me, but if we're only talking about a 5 kW threshold, that covers a lot of hobbyist homelabs, and even a single modern multi-GPU server can reach that.
Still, when we are talking about hardware requirements for local inference, "mini datacenter" can be offputting, especially to neophytes. It might be more helpful to specify VRAM requirements or expenditure estimates.
fuckable-switcher@reddit
Several models do beat ChatGPT and are free and open source.
Fi3nd7@reddit
What? Disagree there. The closest open-source thing is DeepSeek, and even then GPT is better. And DeepSeek V4 requires like $70k of hardware at full quant.
Perfect-Campaign9551@reddit
OpenCode with Qwen 3.6 27b
Euphoric_North_745@reddit (OP)
How is the 27B LLM doing? Can it write code, build it, and resolve coding errors?
fuckable-switcher@reddit
Yes, it can, quite well.
Almost any local model will outperform ChatGPT thanks to something called fine-tuning, y'know.
fuckable-switcher@reddit
OpenCode is free and open source.
croninsiglos@reddit
You can use Codex with local models.
Euphoric_North_745@reddit (OP)
Isn't it all on the server side? Is it the same Codex?
croninsiglos@reddit
Both the codex-cli and the Codex app can use local models through llama.cpp, LM Studio, Ollama, etc.
SM8085@reddit
There's codex-cli, which is the local client for models such as the GPT Codex series, though the models themselves are on OpenAI's servers.
IMO they make it a pain in the ass to run locally. I had a different bot look at the config; it made a 'model catalog' and edited it into the config. But now it works with my llama-server running Qwen3.6-35B-A3B.
ttkciar@reddit
Codegen applications like Codex, Claude Code, and OpenCode have two parts, following the conventional client/server paradigm:
The application, which runs locally on your computer as a client,
The inference service, which can be running anywhere as a server -- OpenAI's API, Claude Code's API, or llama.cpp's API
If you have llama.cpp's llama-server running on your own computer, you can use it as the inference service, so that both sides of the client/server system are running locally.
For example, you could run llama-server on your own computer so that you are using Qwen3.6-35B-A3B or Gemma-4-26B-A4B locally instead of OpenAI or Claude, and then point the Codex application at it, so that Codex is using your local model instead of a commercial service.
To get an experience similar to the commercial services, though, you would need to host a very large, advanced model like GLM-5.1, which requires on the order of 512GB of VRAM, even with Q4_K_M quantization (rough math below).
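As a rough sanity check on that VRAM figure (the parameter count below is a placeholder assumption, Q4_K_M averages roughly 4.85 bits per weight in llama.cpp, and KV cache and activation overhead are ignored):

```python
# Back-of-the-envelope weight-memory estimate for a quantized model.
params = 800e9          # hypothetical parameter count, not a published figure
bits_per_weight = 4.85  # approximate average for Q4_K_M

weight_gb = params * bits_per_weight / 8 / 1e9
print(f"Weights alone: {weight_gb:.0f} GB")  # ~485 GB before KV cache and overhead
```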
PhilWheat@reddit
Did you try OpenCode?
Euphoric_North_745@reddit (OP)
I did not, but I think it's time to try it. I'm not sure it's as good as Codex, which is why I'm asking people who have tested similar software.
PhilWheat@reddit
I find it useful - not perfect, but none of them has been. One of the issues I run into is that most of the tools require bash and I use PowerShell. OpenCode has been the only one I haven't hit major issues with. Probably not relevant to your situation, but it's the reason I can't give you a proper evaluation of it against the others.
Euphoric_North_745@reddit (OP)
I love PowerShell. Codex uses bash and makes PowerShell mistakes all the time; I have PowerShell as my default shell on Linux, but when running Codex, the script runs inside bash 😄
Elkal277@reddit
Tried a bunch; CodeQwen1.5 and DeepSeek Coder V2 are solid. Not quite on par, but close enough for local setups.