Has anyone compared this model to the full Qwen coder? It claims to give almost identical performance at 60B.

DeProgrammer99@reddit

I tried it *just* for you. I used the same settings and prompt to compare the Q6\_K of this model against the last quant I tried, UD-Q6\_K\_XL, whose output had [4 compile errors](https://www.reddit.com/r/LocalLLaMA/comments/1r3weq3/comment/o581ehs/) (ignoring the ones I said I can't blame it for) on my "make an entire TypeScript minigame" vibe check. First of all, the model file is 28% smaller, which makes it a bit faster because more of it fits on my GPUs. Also keep in mind that this GGUF is a static quant (no importance matrix).

As the original model was fond of doing, it kind of cheated by leaving out all the imports and defining only a few instances of the main gameplay data (plant seeds, in this case), with "..." after the first three. Ignoring the one "function called on the wrong object" error (my spec doesn't say that `fullSave` belongs to `GameState` and not `City`), it crapped out 22 compile errors:

- 2 uninitialized properties
- 1 use of a nonexistent type (`Difficulty`, accounting for 4 errors)
- 5 undeclared properties (accounting for 8 errors)
- 2 undefined functions
- 4 objects passed into functions that can't accept that type (calls concentrated in one function)
- 1 complete duplicate assignment of a property (specifically, an event handler) in one object initializer
- 1 function called on the wrong object (the spec *does* say it's a `City` method for this one, but the generated code tried to call it on `uiManager`)

Keep in mind that this pruned version was intended to retain math and coding skills *specifically* (second paragraph of the [model card](https://huggingface.co/bknyaz/Qwen3-Coder-Next-REAM)), so the performance degradation should be more noticeable on other kinds of prompts. I conclude that this REAM version is *pretty* *bad* compared to the original model, at least for TypeScript.
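As a quick sanity check on the arithmetic, the per-category counts in the breakdown above add up to the stated 22. The category labels below are paraphrased from the comment; this only sums the reported numbers and does not parse any compiler output.

```python
# Compile errors per category, as reported in the comment above.
# The excluded fullSave/GameState error is deliberately not counted.
error_counts = {
    "uninitialized properties": 2,
    "nonexistent type (Difficulty)": 4,
    "undeclared properties": 8,
    "undefined functions": 2,
    "wrong argument types": 4,
    "duplicate event-handler assignment": 1,
    "function called on the wrong object": 1,
}

total = sum(error_counts.values())
print(f"total compile errors: {total}")
```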

ClimateBoss@reddit

Can you compare MXFP4 to Q8, bruh?

Significant_Fig_7581@reddit (OP)

Thank you so much!!!!

Mkengine@reddit

Also, on SWE-rebench it seems to do better than Qwen-Coder-480B: https://swe-rebench.com/

rainbyte@reddit

Great review, thanks! This is the kind of content that makes this community so good :)

nima3333@reddit

We should have more of these qualitative reviews, thank you.

Round_Mixture_7541@reddit

Damn, this is the best review I've read so far. Kudos!

Sufficient-Rent6078@reddit

I've indeed been using the model for about a week (together with the [b7972](https://github.com/ggml-org/llama.cpp/releases/tag/b7972) llama.cpp release). For coding in Python I definitely prefer the `mradermacher/Qwen3-Coder-Next-REAM-GGUF:Q4_K_M` variant over last year's `Qwen3-Coder-30B-A3B-Instruct`: it is aware of a number of relatively new language features that last year's models never got right, and it gave satisfying answers in a light debugging session. On a dual `4090` system I still have about 3GB of VRAM headroom left on each card with `--ctx-size 120000`, at 95 tokens/s. I have used `Qwen3-Coder-Next` a few times over API and definitely noticed a significant difference when trying to use it in my native language (German): the API model is already quite bad there, but the `REAM` model produced multiple grammatical errors.

Achso998@reddit

How does the REAM model compare to the normal model for coding?

Sufficient-Rent6078@reddit

Hard to say, as I did not use the normal model that much. I find that `minimax-2.5`, `gemini-3-flash-preview`, `GLM-5` and `Kimi-K2.5` all sit in a more attractive price/performance spot when used via API, so I don't have much of a comparison. I have noticed (though I can't tell yet whether this is specific to quantization or REAM) that `Qwen3-Coder-Next` has more of a hallucination problem than the models above. It also shows some of the self-correction behavior you'd find in the thinking process of thinking models, which makes its outputs a bit verbose.

BetaOp9@reddit

To be honest, there are so many freaking models and variations of everything that it's a bit overwhelming to keep up with them all, let alone benchmark them. And performance aside, the actual usability and coherence can vary from task to task and from user to user.

StardockEngineer@reddit

https://www.instagram.com/reel/DUoPDfUjFud/?igsh=MzRlODBiNWFlZA==

cleverusernametry@reddit

Yes. For your sanity's sake, the rule of thumb is to avoid shortcuts as much as possible. Use the biggest, least-quantized, non-abliterated/non-pruned model that still runs at the slowest pace you find acceptable on your hardware. Favor quality tokens over more tokens faster.

AcePilot01@reddit

This is why I just use whatever the latest and greatest is and ignore the rest... then I don't even look for a while until I'm bored, and then I check for updates again. It's just too much, tbh; you can only have so many hobbies, even fewer when they're fast-paced.

hieuphamduy@reddit

Yeah, my thoughts exactly! That's also how I feel whenever I hear praise for the usability of low quants of huge models, which got me hooked and made me download and try them. Most of the time the results seem decent on some simple test prompt, but as soon as I put them to use on a fairly technical, detail-driven task, they always fail miserably.

mouseofcatofschrodi@reddit

I tried it at Q2\_K... No bueno: [https://www.reddit.com/r/LocalLLaMA/comments/1r4k79m/comment/o5c984o/](https://www.reddit.com/r/LocalLLaMA/comments/1r4k79m/comment/o5c984o/). The answers are worthless, but I didn't get loops or nonsense words, which is already something. I can't run a better quant to test.

Significant_Fig_7581@reddit (OP)

Thank you 🙏

mouseofcatofschrodi@reddit

Just FYI, the Q3\_K\_M was already able to make Flappy Bird in one shot. That's not out of the ordinary anymore, but it's a huge jump from the Q2, which was close to worthless.

Blizado@reddit

Q2_K is problematic on longer answers and works best for shorter ones. I always try to use an imatrix Q4_K_M; it's always the best compromise, whatever the model. That's assuming you have enough VRAM and RAM, of course (24 + 64 GB here).
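For a rough sense of how much memory these quant levels need, here is a back-of-the-envelope estimate of the weight size alone. It assumes 60B parameters (the figure from the post) and uses the fixed bits-per-weight of llama.cpp's uniform block formats (bytes per block × 8 / weights per block). Mixed recipes such as Q4_K_M or the UD-*_XL quants keep some tensors at higher precision, so real files will be somewhat larger, and KV cache and runtime buffers come on top. The 24 + 64 budget is taken from the comment above and treated as GiB here.

```python
# Back-of-the-envelope weight size for a model of a given parameter count.
# Bits per weight of llama.cpp's uniform block quant formats:
# (bytes per block * 8) / (weights per block).
BITS_PER_WEIGHT = {
    "Q2_K": 84 * 8 / 256,   # 2.625
    "Q3_K": 110 * 8 / 256,  # 3.4375
    "Q4_K": 144 * 8 / 256,  # 4.5
    "Q6_K": 210 * 8 / 256,  # 6.5625
    "Q8_0": 34 * 8 / 32,    # 8.5
}

GIB = 1024 ** 3


def weight_size_gib(n_params: float, quant: str) -> float:
    """Approximate size of the weights alone (no KV cache, no runtime buffers)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / GIB


if __name__ == "__main__":
    n_params = 60e9       # assumption: the "60B" figure from the post
    budget_gib = 24 + 64  # assumption: 24 VRAM + 64 RAM, as in the comment above
    for quant in BITS_PER_WEIGHT:
        size = weight_size_gib(n_params, quant)
        print(f"{quant}: ~{size:.1f} GiB of weights "
              f"({size / budget_gib:.0%} of a {budget_gib} GiB budget)")
```

The uniform formats are used because their sizes are exact per block; a mixed quant like Q4_K_M lands somewhere between its base type and the higher-precision types it promotes some tensors to.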

Significant_Fig_7581@reddit (OP)

Well, I could run this one at Q4, but the internet here is super slow 😅

Chromix_@reddit

Those aren't imatrix quants yet. Aside from that, the token embedding and output weights are heavily quantized, which can hurt coding models. I'll wait for a quant that was created using an imatrix and also keeps the token and output weights at Q8 or higher. That's usually the case for Bartowski's and Unsloth's L / XL quants.
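If you want to check a downloaded GGUF against these two criteria yourself, a minimal sketch is below. It assumes the `gguf` Python package (llama.cpp's gguf-py, `pip install gguf`) and its `GGUFReader`, and it assumes the imatrix is recorded in `quantize.imatrix.*` metadata keys, as recent `llama-quantize` builds do; treat a missing key as a hint rather than proof. The tensor names `token_embd.weight` and `output.weight` follow llama.cpp's naming, and the set of "Q8 or better" types is my own choice.

```python
# Check a GGUF for (1) imatrix metadata and (2) the storage type of the
# token-embedding and output tensors. Assumes the `gguf` package (gguf-py).
import sys

# Types treated as "at least Q8" for this check (an assumption, not a standard).
HIGH_PRECISION_TYPES = {"Q8_0", "F16", "BF16", "F32"}
KEY_TENSORS = ("token_embd.weight", "output.weight")


def summarize(tensor_types: dict[str, str], metadata_keys: set[str]) -> dict[str, object]:
    """Pure helper, so the logic can be checked without a GGUF file."""
    # Models with tied embeddings have no separate output.weight,
    # so only the key tensors actually present are checked.
    present = [name for name in KEY_TENSORS if name in tensor_types]
    return {
        "imatrix": any(k.startswith("quantize.imatrix.") for k in metadata_keys),
        "key_tensor_types": {name: tensor_types[name] for name in present},
        "key_tensors_q8_or_better": bool(present)
        and all(tensor_types[name] in HIGH_PRECISION_TYPES for name in present),
    }


def inspect_gguf(path: str) -> dict[str, object]:
    from gguf import GGUFReader  # imported lazily so summarize() works without gguf

    reader = GGUFReader(path)
    # tensor_type is a GGMLQuantizationType enum; .name gives e.g. "Q6_K".
    tensor_types = {t.name: t.tensor_type.name for t in reader.tensors}
    return summarize(tensor_types, set(reader.fields.keys()))


if __name__ == "__main__" and len(sys.argv) > 1:
    print(inspect_gguf(sys.argv[1]))
```

Usage would be something like `python check_gguf.py model.gguf` (the script name is hypothetical). Splitting out `summarize()` keeps the decision logic testable without downloading a multi-gigabyte file.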
