Tried Qwen3-27B-UD-Q6_K_XL.gguf with Claude Code, well, I can't believe it but it's usable

Posted by Clasyc@reddit | LocalLLaMA

So I ran Qwen3-27B-UD-Q6_K_XL.gguf with a 200K context on my RTX 5090 using llama.cpp. I'm getting around 50 tok/s, which is fine I guess; I don't really know this stuff, so it might be improvable. But what I want to say is: I hadn't tried local models for coding in quite a long time, and hell, I can't believe we're at the point where it's actually usable. Of course it's not the same first-class experience as Opus 4.7, but damn, we're getting closer and closer.
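For reference, the launch looks roughly like this (a sketch from memory; the model path and port are placeholders, and the exact flags assume a recent llama.cpp build):

```bash
# -c sets the context window (200K tokens), -ngl 99 offloads every layer
# to the GPU, and --jinja enables the model's chat template so tool calls work.
llama-server \
  -m ~/models/Qwen3-27B-UD-Q6_K_XL.gguf \
  -c 200000 \
  -ngl 99 \
  --jinja \
  --host 127.0.0.1 --port 8080
```

That gives you an OpenAI-compatible API on port 8080 that you can point your coding tooling at.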

I gave it a fairly difficult task, not casual CRUD stuff, to see whether it could even put together a plan that makes some sense, and it did very well on the first try.

Of course that's just a general first impression and I haven't done real day-to-day coding with it, but I like what I see so far, and it looks much more promising than my earlier experiences with other models, which would start producing total nonsense at some point.