When are we gonna get more 1-bit models (medium & large sizes)?
Posted by pmttyji@reddit | LocalLLaMA | 9 comments
Obviously this thought came after Prism ML's recent Bonsai 8B model.
This thread has some honest feedback on the Bonsai-8B model. A few people mentioned hallucinations happening a few times. Hopefully future 1-bit models come with improvements.
There's a recent thread on simulation for Qwen3.5 models. That looks awesome for tiny GPUs. I also mentioned the size ratio for medium/big/large models (on some other thread), which seems nice. Pasting the size ratio below, with a quick sketch of the math after the list.
(Parameters in billions : size in GB)
- 8: 1.5 (Bonsai 8B)
- 30: 5.625
- 50: 9.375
- 70: 13.125
- 100: 18.75
- 120: 22.5 (Qwen3.5-122B, GLM-4.5-Air, Step-3.5-Flash, Devstral-2-123B, Mistral-Small-4-119B)
- 200: 37.5
- 250: 46.875 (MiniMax-M2.5, Qwen3-235B-A22B)
- 300: 56.25 (GLM-4.7, Qwen3.5-397B-A17B, MiMo-V2-Flash, Trinity-Large-Thinking)
- 400: 75 (Llama-3.1-405B, Qwen3-Coder-480B-A35B, Llama-4-Maverick-17B-128E)
- 500: 93.75 (LongCat-Flash-Chat)
- 600: 112.5 (DeepSeek-V3/R1, Mistral-Large-3-675B)
- 700: 131.25 (GLM-5, GigaChat3.1-702B-A36B)
- 1000: 187.5 (Kimi-K2.5, Ling-2.5-1T, Ring-2.5-1T)
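The ratio works out to a flat 1.5 bits per parameter (0.1875 GB per billion). Here's a minimal sketch that reproduces the table, assuming that flat rate and ignoring real-file overhead like metadata, embeddings, and quantization scales (the function name is just illustrative):

```python
# Reproduce the table above, assuming a flat 1.5 bits per parameter.
# Real model files add metadata, embeddings, and per-block scales,
# so actual downloads run somewhat larger.

def bitnet_size_gb(params_billion: float, bits_per_param: float = 1.5) -> float:
    """Approximate weight size in GB: billions of params times bits, converted to bytes."""
    return params_billion * bits_per_param / 8

for p in [8, 30, 50, 70, 100, 120, 200, 250, 300, 400, 500, 600, 700, 1000]:
    print(f"{p:4d}B -> {bitnet_size_gb(p):7.3f} GB")
```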
Wouldn't it be nice to have more 1-bit models in the above sizes? Like I could run 50B models on a 12GB card, 100B models with just 24GB of VRAM ... which seems like a miracle.
Our dude is cooking something for us. Hope we get some soon.
"Qwen 3 8B. I'm cooking the 397B right now, since you guys have such an appetite for bitnets." - u/Party-Special-5177
Anyone else cooking something like this? Please share.
Silver-Champion-4846@reddit
1.58-bit might be better for CPU
DistanceSolar1449@reddit
1.58-bit is often stored as 2 bits per weight, so it's half as space-efficient as true 1-bit
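For illustration, a rough sketch of where that overhead comes from (the helper names here are hypothetical): ternary weights carry log2(3) ≈ 1.58 bits of information each, a naive 2-bit encoding fits only 4 weights per byte, while base-3 packing fits 5 per byte (3^5 = 243 ≤ 256), i.e. 1.6 bits per weight:

```python
# Sketch of ternary (-1/0/+1) weight storage, for illustration.
# Naive: 2 bits per weight -> 4 weights per byte.
# Base-3: 5 weights per byte (3^5 = 243 <= 256) -> 1.6 bits per weight.

def pack_2bit(weights):
    """Pack ternary weights at 2 bits each (values -1/0/+1 mapped to 0/1/2)."""
    out = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= (w + 1) << (2 * j)
        out.append(byte)
    return bytes(out)

def pack_base3(weights):
    """Pack 5 ternary weights per byte as base-3 digits."""
    out = bytearray()
    for i in range(0, len(weights), 5):
        val = 0
        for w in reversed(weights[i:i + 5]):
            val = val * 3 + (w + 1)
        out.append(val)
    return bytes(out)

ws = [1, -1, 0, 1, 0, -1, 1, 1, 0, -1] * 100  # 1000 weights
print(len(pack_2bit(ws)), "bytes naive vs", len(pack_base3(ws)), "bytes base-3")
```

Real formats often keep 2-bit storage anyway because power-of-two widths unpack faster, which is presumably the tradeoff being weighed here.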
Silver-Champion-4846@reddit
If it's half as space-efficient, how much better is the quality? Is the tradeoff worth it or not?
Serious-Log7550@reddit
Btw, llama.cpp added 1-bit Q1 support for Vulkan in a recent release: https://github.com/ggml-org/llama.cpp/releases/tag/b8742
nrauhauser@reddit
The one-bit models are intriguing. What will really make them take off is native support for what they need in CPU vector processing. If they take root, we should look for additions to Intel's AVX, a similar effort from AMD, and perhaps those functions appearing in ARM chips, rather than as a separate coprocessor.
I'm not sure about just linearly scaling things in your head ... one bit didn't arrive in a vacuum; it was half a step behind TurboQuant. Who knows what other amazing advances we'll see. I expect one bit to be a bigger deal at the bottom end rather than in the larger models.
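For intuition, here's a minimal sketch of the kind of op those vector extensions would accelerate, assuming fully binarized ±1 weights and activations (an XNOR-style kernel; this is an assumption for illustration, not necessarily what any shipping 1-bit model uses). With sign bits packed into machine words, a dot product reduces to XOR plus popcount:

```python
# Sketch: dot product of ±1 vectors via XOR + popcount (assumed XNOR-style
# kernel, for illustration only). With signs packed into integer words,
# dot(x, w) = N - 2 * popcount(bits(x) ^ bits(w)); this is the operation
# that VPOPCNT-class vector instructions speed up on CPUs.

def pack_signs(vec):
    """Pack a list of ±1 values into one int; bit i is 1 when vec[i] == -1."""
    bits = 0
    for i, v in enumerate(vec):
        if v == -1:
            bits |= 1 << i
    return bits

def binary_dot(x, w):
    """Dot product of two ±1 vectors using one XOR and one popcount."""
    mismatches = (pack_signs(x) ^ pack_signs(w)).bit_count()  # Python 3.10+
    return len(x) - 2 * mismatches  # matching signs add +1, mismatches add -1

x = [1, -1, 1, 1, -1, -1, 1, -1]
w = [1, 1, -1, 1, -1, 1, 1, -1]
assert binary_dot(x, w) == sum(a * b for a, b in zip(x, w))
```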
silentus8378@reddit
This is the dream.
EffectiveCeilingFan@reddit
Hard to say. Prism ML treats the technology as proprietary, and even gives the models a special name instead of just calling them quantized Qwen3-8B like everyone else does.
Though I have a sneaking suspicion that it's just a slightly modified Microsoft BitNet and is actually incredibly simple.
LagOps91@reddit
I doubt it will be possible to do this for MoE models; they're harder to (re)train. Getting decent results at all for Qwen3 8B is already quite impressive.
Silver-Champion-4846@reddit
Isn't it just some Qwen3-8B?