
Did anyone compare this model to the full Qwen coder? It claims to give almost identical performance at 60B

Posted by Significant_Fig_7581@reddit | LocalLLaMA | 21 comments


21 Comments

DeProgrammer99@reddit

I tried it *just* for you. Using the same settings and prompt to compare Q6\_K of this to the last quant I tried, UD-Q6\_K\_XL, whose output had [4 compile errors](https://www.reddit.com/r/LocalLLaMA/comments/1r3weq3/comment/o581ehs/) (ignoring the ones I said I can't blame it for) on my "make an entire TypeScript minigame" vibe check...

First of all, the model file is 28% smaller, making it a bit faster because more of it fits on my GPUs. Also keep in mind this GGUF is a static quant--no importance matrix.

Like the original model was fond of doing, it kinda-cheated by leaving out all the imports and only defined a few instances of the main gameplay data (plant seeds, in this case) with "..." after the first three. And ignoring the one "function called on the wrong object" error because my spec doesn't say that `fullSave` belongs to `GameState` and not `City`, it crapped out 22 compile errors:

- 2 uninitialized properties
- 1 use of a nonexistent type (`Difficulty`, accounting for 4 errors)
- 5 undeclared properties (accounting for 8 errors)
- 2 undefined functions
- 4 objects passed into functions that can't accept that type (calls concentrated in one function)
- 1 complete duplicate assignment of a property (specifically, an event handler) in one object initializer
- 1 function called on the wrong object (the spec *does* say it's a `City` method for this one, but the generated code tried to call it on `uiManager`)

Keep in mind this pruned version was intended to retain math and coding skills *specifically* (second paragraph of the [model card](https://huggingface.co/bknyaz/Qwen3-Coder-Next-REAM)), so the performance degradation should be more noticeable on other kinds of prompts.

I conclude that this REAM version is *pretty* *bad* compared to the original model, at least for TypeScript.
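For anyone wanting to repeat this kind of compile-error count on their own generated code, here is a minimal Python sketch that tallies errors per TypeScript error code. It assumes the plain-text diagnostics you get from something like `tsc --noEmit --pretty false`. The `tally_tsc_errors` helper and the `SAMPLE` text are hypothetical illustrations, not the actual output from the run above.

```python
import re
from collections import Counter

# Matches tsc's non-pretty diagnostic line format, e.g.
#   src/game.ts(12,5): error TS2339: Property 'x' does not exist on type 'Y'.
TSC_ERROR = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+),(?P<col>\d+)\): "
    r"error (?P<code>TS\d+): (?P<msg>.*)$"
)


def tally_tsc_errors(output: str) -> Counter:
    """Count compile errors per TS error code in tsc's text output."""
    counts = Counter()
    for raw in output.splitlines():
        match = TSC_ERROR.match(raw.strip())
        if match:
            counts[match.group("code")] += 1
    return counts


# Hypothetical sample output for illustration only.
SAMPLE = """\
src/game.ts(10,3): error TS2564: Property 'seeds' has no initializer and is not definitely assigned in the constructor.
src/game.ts(42,18): error TS2304: Cannot find name 'Difficulty'.
src/game.ts(57,22): error TS2304: Cannot find name 'Difficulty'.
src/ui.ts(8,14): error TS2339: Property 'fullSave' does not exist on type 'UIManager'.
"""

if __name__ == "__main__":
    # Print the most frequent error codes first.
    for code, n in tally_tsc_errors(SAMPLE).most_common():
        print(code, n)
```

Grouping by error code only approximates the hand-made categories above (one missing type can produce several errors), so the totals are a starting point for a manual review rather than a replacement for it.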

ClimateBoss@reddit

Can you compare MXFP4 to Q8, bruh?

Significant_Fig_7581@reddit (OP)

Thank you so much!!!!

Mkengine@reddit

Also on swe-rebench it seems to do better than Qwen-Coder-480B: https://swe-rebench.com/

rainbyte@reddit

Great review, thanks! This is the kind of content that makes this community so good :)

nima3333@reddit

We should have more of these qualitative reviews, thank you.

Round_Mixture_7541@reddit

Damn, this is the best review I've read so far. Kudos!

Sufficient-Rent6078@reddit

I have indeed been using the model for about a week (together with the [b7972](https://github.com/ggml-org/llama.cpp/releases/tag/b7972) llama.cpp release). I definitely prefer the `mradermacher/Qwen3-Coder-Next-REAM-GGUF:Q4_K_M` variant for coding in Python over last year's `Qwen3-Coder-30B-A3B-Instruct`: it is aware of a number of relatively new language features that last year's models never got right, and it gave satisfying answers in a light debugging session. On a dual `4090` system I still have about 3GB of VRAM headroom left on each card with `--ctx-size 120000` at 95 tokens/s. I have used `Qwen3-Coder-Next` a few times over API and definitely noticed a significant difference when trying to use it in my native language (German). There the API model is already quite bad, but the `REAM` model generated multiple grammatical errors.

Achso998@reddit

How does the REAM model compare in coding to the normal model?

Sufficient-Rent6078@reddit

Hard to say, as I did not use the normal model that much. I find that `minimax-2.5`, `gemini-3-flash-preview`, `GLM-5` and `Kimi-K2.5` all sit in a more attractive price/performance spot when used via API, so I don't have that much of a comparison. I have noticed that `Qwen3-Coder-Next` has more of a hallucination problem than the above models (though I can't tell so far whether there are quantization- or REAM-specific differences). It also shows some of the self-correction behavior you'd find in the thinking process of thinking models, which makes the outputs a bit verbose.

BetaOp9@reddit

To be honest, there are so many freaking models and variations of everything that it's a bit overwhelming to stay on top of them all, let alone benchmark them. Then, performance aside, the actual usability and coherence can vary from task to task and from user to user.

StardockEngineer@reddit

https://www.instagram.com/reel/DUoPDfUjFud/?igsh=MzRlODBiNWFlZA==

cleverusernametry@reddit

Yes. For your sanity's sake, the rule of thumb is to avoid shortcuts to the maximum possible extent. Use the biggest, least quantized, non-abliterated/non-pruned model that can run at the slowest acceptable pace on your hardware. Favor quality tokens over more tokens faster.

AcePilot01@reddit

This is why I just use whatever the latest and greatest is and ignore the rest... then I don't even look for a while until I'm bored, and then I check for updates again. It's just too much tbh; you can only have so many hobbies, fewer when they are fast-paced.

hieuphamduy@reddit

Yeah, my thoughts as well! This is also how I feel whenever I hear praise for the usability of low quants of huge models, which got me hooked enough to download and try them. Most of the time, the results seem decent on some simple test prompts. But as soon as I put them to use on a pretty technical, detail-driven task, they always fail miserably.

mouseofcatofschrodi@reddit

I tried it at Q2\_K... No bueno: [https://www.reddit.com/r/LocalLLaMA/comments/1r4k79m/comment/o5c984o/](https://www.reddit.com/r/LocalLLaMA/comments/1r4k79m/comment/o5c984o/) The answers are worthless. But I didn't get loops or nonsensical words, which is already something. I can't run any better quant to test.

Significant_Fig_7581@reddit (OP)

Thank you 🙏

mouseofcatofschrodi@reddit

Just as info, the Q3\_K\_M was already able to make the game Flappy Bird in one shot. That's nothing out of the ordinary anymore, but it's a huge jump from the Q2, which was close to worthless.

Blizado@reddit

Q2_K is problematic for longer answers and works best for shorter ones. I always try to use imatrix Q4_K_M; it's always the best compromise for any model. Of course, that's only if you have enough VRAM and RAM (24+64 here).

Significant_Fig_7581@reddit (OP)

Well, I could run this one at Q4, but the internet here is super slow 😅

Chromix_@reddit

Those are not imatrix quants yet. Aside from that, the token embedding and output weights are quantized heavily, which can hurt coding models. I'll wait for a quant that was created using imatrix and also keeps the token embedding and output weights at Q8 or higher. That's usually the case for Bartowski and Unsloth L / XL quants.
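One way to check how a given GGUF quantizes those two tensors is to read its tensor list, sketched here in Python under a few assumptions. It relies on the `gguf` package published from the llama.cpp repo, and it assumes the tensors are named `token_embd.weight` and `output.weight`, which is common but not universal (some models tie or rename them). The helper names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Tensor names commonly used for these weights in llama.cpp-style GGUF files.
# This is an assumption: some architectures tie or rename them.
KEY_TENSORS = ("token_embd.weight", "output.weight")


def key_tensor_types(tensors: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Pick out the quantization type of the embedding and output tensors."""
    return {name: qtype for name, qtype in tensors if name in KEY_TENSORS}


def read_tensor_types(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (tensor name, quantization type name) pairs from a GGUF file."""
    # Imported lazily so the pure helper above works without the package
    # installed (pip install gguf).
    from gguf import GGUFReader

    reader = GGUFReader(path)
    for tensor in reader.tensors:
        yield tensor.name, tensor.tensor_type.name


# Example call (the path is a placeholder):
#   print(key_tensor_types(read_tensor_types("model.gguf")))
```

If either tensor reports a type well below Q8\_0, that matches the concern raised here. A missing `output.weight` entry usually just means the model reuses the embedding tensor for its output.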