What is the local LLM alternative to Codex?
Posted by Euphoric_North_745@reddit | LocalLLaMA | View on Reddit | 25 comments
OpenAI Codex got a lot of updates recently; it now does a lot of things on your computer. I tried a few of them, not all, and based on my experience with OpenAI, they usually lean heavily on propaganda.
Anyway, what is the local LLM alternative to Codex? I mean, at Codex's level.
Zealousideal-Lie8829@reddit
Code Llama (Meta), StarCoder / StarCoder2
chibop1@reddit
Codex itself is free and open source, and it supports local models. It even has a built-in --oss flag for their gpt-oss models.
https://github.com/openai/codex
You can hook it up to any local engine that exposes an OpenAI-compatible API (quick sketch below).
Qwen-3.6-27b works great!
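For example, here's a minimal sketch of hitting a local OpenAI-compatible server with the standard openai Python client; the URL, port, and model name are placeholders for whatever you run locally, not anything Codex-specific:

```python
# Minimal sketch: any local engine that exposes an OpenAI-compatible API
# (llama-server, LM Studio, etc.) can be queried by overriding base_url.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # placeholder: your local server's endpoint
    api_key="not-needed",                 # local servers usually ignore the key
)

resp = client.chat.completions.create(
    model="qwen-coder-local",             # placeholder model name
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
)
print(resp.choices[0].message.content)
```

Codex talks to the same kind of endpoint; it's just a matter of pointing its provider config at your local base URL.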
Alternative_You3585@reddit
None; there's simply nothing at GPT-5.5 level that you can run locally without a mini datacenter.
ttkciar@reddit
That's an exaggeration. You can run GLM-5.1 locally on a single eight-GPU server, no need for a "mini datacenter".
fuckable-switcher@reddit
You can also run a 1-trillion-parameter model on a 1 TB Intel Optane drive.
And they are dirt cheap and abundant
Alternative_You3585@reddit
GLM 5.1 can't compete with GPT 5.5.
I was thinking of a few Kimi agents working in parallel. If you Google the threshold for a mini datacenter, it'll spit out:
A mini (or micro) data center generally starts at 5 kW [...]
You'd need something like 8 RTX 6000 Pros to reach that, which is somewhat realistic if you operate at high context (quick math below).
And yeah, context is another point: neither GLM nor Kimi has 1M context; the nearest would be MiMo (but it's benchmaxxed) or DeepSeek (V4 has hallucination rates like gpt-oss-20b).
So no, it's not an exaggeration.
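Quick math on that threshold (600 W per RTX 6000 Pro class card and the host overhead figure are rough assumptions, not measured numbers):

```python
# Back-of-the-envelope power draw for an 8-GPU box.
gpu_count = 8
watts_per_gpu = 600        # assumed per-card draw under load
host_overhead_watts = 800  # assumed CPU, RAM, fans, PSU losses

total_kw = (gpu_count * watts_per_gpu + host_overhead_watts) / 1000
print(f"Estimated draw: {total_kw:.1f} kW")  # ~5.6 kW, over the 5 kW threshold
```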
fuckable-switcher@reddit
Grok also has a high context limit, and so does Llama 4.
fuckable-switcher@reddit
No you don’t
A single Arc Pro B70, a Threadripper 700-series CPU, 256 GB of RAM,
and then an Intel Optane drive.
ttkciar@reddit
It doesn't beat GPT-5.5, but it certainly competes with it, at least for codegen. Benchmarks put it only about 8% to 10% lower for codegen tasks.
Thanks for catching me up on the threshold for "mini datacenter". Usually that implies at least a couple of 40U racks, to me, but if we're only talking about a 5 kW threshold, that covers a lot of hobbyist homelabs, and even a single modern multi-GPU server can reach that.
Still, when we are talking about hardware requirements for local inference, "mini datacenter" can be offputting, especially to neophytes. It might be more helpful to specify VRAM requirements or expenditure estimates.
fuckable-switcher@reddit
Several models do beat ChatGPT and are free and open source.
Fi3nd7@reddit
What? Disagree there. The closest open-source thing is DeepSeek, and even then GPT is better. And DeepSeek V4 requires like $70k of hardware at full quant.
Perfect-Campaign9551@reddit
OpenCode with Qwen 3.6 27b
Euphoric_North_745@reddit (OP)
How is the 27B LLM doing? Can it write code, build it, and resolve coding errors?
fuckable-switcher@reddit
Yes, it can, quite well.
Almost any local model will outperform ChatGPT thanks to something called fine-tuning, y'know.
fuckable-switcher@reddit
OpenCode is free and open source.
croninsiglos@reddit
You can use Codex with local models.
Euphoric_North_745@reddit (OP)
Isn't it all on the server side? Is it the same Codex?
croninsiglos@reddit
Both the codex-cli and the Codex app can use local models through llama.cpp, LM Studio, Ollama, etc.
SM8085@reddit
There's codex-cli, which is the local client for models such as the GPT Codex series, though the models themselves are on OpenAI's servers.
IMO they make it a pain in the ass to run locally. I had a different bot look at the config; it made a 'model catalog' and edited it into the config. But now it works with my llama-server running Qwen3.6-35B-A3B.
ttkciar@reddit
Codegen applications like Codex, Claude Code, and OpenCode have two parts, following the conventional client/server paradigm:
The application, which runs locally on your computer as a client,
The inference service, which can be running anywhere as a server -- OpenAI's API, Claude Code's API, or llama.cpp's API
If you have llama.cpp's llama-server running on your own computer, you can use it as the inference service, so that both sides of the client/server system are running locally.
For example, you could run llama-server on your own computer so that you are using Qwen3.6-35B-A3B or Gemma-4-26B-A4B locally instead of OpenAI or Claude, and then point the Codex application at it, so that Codex is using your local model instead of a commercial service.
To get an experience similar to the commercial services, though, you would need to host a very large, advanced model like GLM-5.1, which requires on the order of 512GB of VRAM, even with Q4_K_M quantization (rough math below).
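As a rough sanity check on that VRAM figure (the parameter count below is a placeholder assumption, Q4_K_M averages roughly 4.85 bits per weight in llama.cpp, and KV cache and activation overhead are ignored):

```python
# Back-of-the-envelope weight-memory estimate for a quantized model.
params = 800e9          # hypothetical parameter count, not a published figure
bits_per_weight = 4.85  # approximate average for Q4_K_M

weight_gb = params * bits_per_weight / 8 / 1e9
print(f"Weights alone: {weight_gb:.0f} GB")  # ~485 GB before KV cache and overhead
```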
PhilWheat@reddit
Did you try OpenCode?
Euphoric_North_745@reddit (OP)
I did not, but I think it's time to try it. I'm not sure it's as good as Codex, which is why I'm asking people who have tested similar software.
PhilWheat@reddit
I find it useful - not perfect, but none of them has been. One of the issues I run into is that most of the tools require bash and I use PowerShell. OpenCode has been the only one I haven't hit major issues with. Probably not relevant to your situation, but it's the reason I can't give you a proper evaluation of it against the others.
Euphoric_North_745@reddit (OP)
I love PowerShell. Codex uses bash and makes PowerShell mistakes all the time; I have PowerShell as my default shell on Linux, but when running Codex, the script runs inside bash 😄
Elkal277@reddit
Tried a bunch; CodeQwen1.5 and DeepSeek Coder V2 are solid. Not quite on par, but close enough for local setups.