Wanna try the best coding model with my rtx 3090, not sure where to start. I believe Qwen3.5-27B-UD-Q4_K_XL would be the best? If so, should I use ollama with it?
Posted by dreamer_2142@reddit | LocalLLaMA | 18 comments
I've already searched, but the information gets updated every week, so it's really hard to get an answer. I really hope some of you guys can give me some tips. And can I use an agent with it to enhance the code? Love to hear your setup.
Thanks!
b1231227@reddit
https://huggingface.co/DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF
I recommend this model. I'm currently using the IQ3_M quant to modify the llama.cpp code, and it handles automated operation quite well.
dreamer_2142@reddit (OP)
If I want to use it with Ollama, I need a "Modelfile", I assume? Where can I get a template to make one for this model?
vick2djax@reddit
You're gonna get half the speed and half the performance in Ollama. Just save yourself the confusion. I started with Ollama and wasted so much time. It's garbage.
Anbeeld@reddit
Qwen 3.6, not 3.5
LirGames@reddit
Forget ollama, use llama.cpp or llama-swap (which uses llama.cpp anyway). Unsloth Q4_K_XL is perfectly fine. You can run it with 80K context if you keep the vision part active on the GPU, or you can offload it to RAM (or disable it) and easily go up to 96K context with a Q8 KV cache.
If you don't understand any of this, just drop the message into Gemini/Claude and ask for help setting everything up (Docker highly recommended); they'll figure it out.
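For reference, a launch command along those lines looks roughly like this (the model filename is a placeholder, and exact flag spellings vary between llama.cpp builds, so check llama-server --help):

    # all layers on the 3090, ~96K context, Q8 KV cache
    # (flash attention is required for the quantized V cache)
    llama-server -m Qwen3.5-27B-UD-Q4_K_XL.gguf -ngl 99 -c 98304 -fa \
      --cache-type-k q8_0 --cache-type-v q8_0 --port 8080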
dreamer_2142@reddit (OP)
Thanks a lot, will do.
kosnarf@reddit
Check out llama-swap. It's been performing much better than ollama.
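If you try it, the whole setup is one small YAML file, roughly like this (model name and path are placeholders; the llama-swap README has the exact schema):

    models:
      "qwen-coder":
        # llama-swap fills in ${PORT} and starts/stops the server on demand
        cmd: >
          llama-server --port ${PORT}
          -m /models/Qwen3.5-27B-UD-Q4_K_XL.gguf -ngl 99 -c 32768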
sine120@reddit
Skip Ollama, just learn to build llama.cpp. 27B Q4 is a good pick. Use llama-server and hook it up to opencode or the Pi coding agent. Opencode if you just want something that works, Pi if you want to speed up prompt processing.
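The build is only a few commands on a machine with CUDA installed, something like:

    git clone https://github.com/ggml-org/llama.cpp
    cd llama.cpp
    cmake -B build -DGGML_CUDA=ON       # CUDA backend for the 3090
    cmake --build build --config Release -j
    ./build/bin/llama-server --help     # binaries end up in build/bin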
T0nd3@reddit
Good starting point. A few things to sharpen:
On the model: Qwen3-32B at Q4_K_M fits in your 24GB (around 19-20GB loaded) and is arguably the best coding model you can run locally right now. The "UD" unsloth quants are generally high quality — if you see Qwen3-32B-UD-Q4_K_XL that's a solid pick. If you want headroom for longer contexts, Qwen3-30B-A3B (the MoE variant) uses less VRAM at similar quality.
On Ollama: Yes, start with Ollama. It handles model management cleanly and exposes an OpenAI-compatible API, which is important for the agent step. One thing to set: the default context window is 2048 tokens, which is far too small for coding tasks.
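If you stay on Ollama, the usual fix is a two-line Modelfile (the FROM path is whatever GGUF you downloaded):

    FROM ./Qwen3.5-27B-UD-Q4_K_XL.gguf
    PARAMETER num_ctx 32768

Then ollama create qwen-coder -f Modelfile and ollama run qwen-coder.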
On agents: Two setups worth trying:
aider: aider --model ollama/qwen3:32b
Continue (the VS Code extension), pointed at Ollama's OpenAI-compatible API.
For Qwen3 specifically, enable thinking mode in Continue's system prompt or via /think in Ollama; it noticeably improves code quality on harder tasks.
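Wiring Continue to Ollama is a few lines of config, roughly this in the older config.json schema (newer versions use config.yaml, so check Continue's docs):

    {
      "models": [
        { "title": "Qwen3 32B local", "provider": "ollama", "model": "qwen3:32b" }
      ]
    }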
Then-Topic8766@reddit
Damn bots! You will need Qwen 3.6 (27B or 35B-A3B).
dreamer_2142@reddit (OP)
I've tried to use the downloaded model "Qwen3.5-27B-UD-Q4_K_XL" with Ollama, but it gives unrelated answers to my questions. I assume I need to download a specific model from their library and not just any model I find on Hugging Face?
Then-Topic8766@reddit
Try to use llama.cpp directly. Much better experience than ollama.
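Recent builds can even pull a GGUF straight from Hugging Face, and llama-server uses the chat template baked into the GGUF, which often fixes exactly that "unrelated answers" problem (the repo/quant tag below is just an example):

    llama-server -hf unsloth/Qwen3.5-27B-GGUF:Q4_K_XL --port 8080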
dreamer_2142@reddit (OP)
Ok, thanks. Any quant you'd recommend for 24GB? And a recommendation for an AI agent?
Then-Topic8766@reddit
Install llama.cpp. Depending on the context size you want, choose a quant. I think Q4_K_XL should work with a 3090. Try both 27B and 35B-A3B. The first is smarter, but the second is faster. And with the second you can offload to RAM and get a bigger quant.
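For the RAM offload on the A3B model, newer llama.cpp builds have a dedicated flag for keeping MoE expert weights on the CPU (filename is a placeholder; older builds use -ot tensor-override regexes instead):

    # everything on the GPU except the expert tensors of the first 16 layers
    llama-server -m Qwen3.6-35B-A3B-Q4_K_XL.gguf -ngl 99 --n-cpu-moe 16 -c 32768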
sagiroth@reddit
Bot account, don't trust this one. Recommending old setups.
dreamer_2142@reddit (OP)
Almost fell for it. Thanks, but where can I find the club 3090 on GitHub?
sagiroth@reddit
https://github.com/noonghunna/club-3090
dreamer_2142@reddit (OP)
Thanks m8!