APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier

Posted by mudler_it@reddit | LocalLLaMA | 35 comments

Quick follow-up on APEX, the MoE-aware mixed-precision quant strategy. The original post was just about Qwen 3.5 35B-A3B ( https://www.reddit.com/r/LocalLLaMA/comments/1s9vzry/apex_moe_quantized_models_boost_with_33_faster/ ); since then the collection has grown to 30+ MoEs across most major families. Plus a new ultra-compressed tier landed.

Feedback so far

The reports coming back have honestly been better than I expected!

Thanks to everyone who shared results; that feedback is what justifies pushing further into the low-bit tiers below.

Models added since the first post

Grouped by family below; most are 30-70B-class MoEs that fit on one consumer GPU at I-Mini/I-Compact:

Qwen lineage

Frontier-size MoEs (rented a Blackwell GPU to quantize these)

Hybrid Mamba / SSM MoEs

Gemma 4 family

Community MoE merges

New tier: I-Nano (IQ2_XXS)

Pushes mid-layer routed experts down to 2.06 bpw (IQ2_XXS), near-edge layers to IQ2_S, edge layers to Q3_K, and keeps shared experts at Q5_K. It comes out roughly 20% smaller than I-Mini and is viable only on MoE models thanks to sparse per-token expert activation (each routed expert handles only a fraction of tokens, so it tolerates heavier compression). Requires an imatrix.
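
To make the tier concrete, here's a minimal Python sketch of the layer-to-quant mapping described above. The position thresholds, the Q4_K fallback, and the function name are illustrative assumptions, not the actual APEX recipe; the tensor-name patterns just follow the usual llama.cpp GGUF naming (blk.<i>.ffn_*_exps for routed experts, ffn_*_shexp for shared experts).

```python
# Illustrative sketch of an I-Nano-style per-tensor assignment.
# NOT the actual APEX code: the 10%/20% edge thresholds and the Q4_K
# fallback for non-expert tensors are assumptions for demonstration only.

def inano_quant_type(tensor_name: str, layer: int, n_layers: int) -> str:
    """Pick a quant type for one tensor under an I-Nano-like recipe."""
    frac = layer / max(n_layers - 1, 1)  # 0.0 = first block, 1.0 = last block

    if "ffn_" in tensor_name and "_shexp" in tensor_name:
        return "Q5_K"          # shared experts stay at higher precision

    if "ffn_" in tensor_name and "_exps" in tensor_name:
        # Routed experts: precision depends on depth in the stack.
        if frac < 0.10 or frac > 0.90:
            return "Q3_K"      # edge layers
        if frac < 0.20 or frac > 0.80:
            return "IQ2_S"     # near-edge layers
        return "IQ2_XXS"       # mid layers, ~2.06 bpw

    return "Q4_K"              # everything else (attention etc.), assumed here


if __name__ == "__main__":
    # Quick check over a hypothetical 48-block MoE.
    for i in (0, 4, 10, 24, 40, 47):
        print(i, inano_quant_type(f"blk.{i}.ffn_down_exps.weight", i, 48))
```

In practice a mapping like this is applied at quantization time (e.g., as per-tensor type overrides in llama.cpp), with the imatrix guiding which weights can tolerate the lowest bit widths.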

Examples:

Links

If you've used APEX quants and have feedback, comments welcome!