When are we gonna get more 1-bit models (medium & large sizes)?
Posted by pmttyji@reddit | LocalLLaMA | 9 comments
Obviously this thought came after Prism ML's recent Bonsai 8B model.
This thread has some honest feedback on the Bonsai-8B model. A few people mentioned hallucinations happening a few times. Hopefully future 1-bit models come with improvements.
There's a recent thread on simulation for Qwen3.5 models. That looks awesome for tiny GPUs. I also mentioned the size ratio for medium/big/large models (on some other thread), which seems nice. Pasting the size ratio below, with a quick sketch of the math after the list.
(Parameters in billions : size in GB)
- 8: 1.5 (Bonsai 8B)
- 30: 5.625
- 50: 9.375
- 70: 13.125
- 100: 18.75
- 120: 22.5 (Qwen3.5-122B, GLM-4.5-Air, Step-3.5-Flash, Devstral-2-123B, Mistral-Small-4-119B)
- 200: 37.5
- 250: 46.875 (MiniMax-M2.5, Qwen3-235B-A22B)
- 300: 56.25 (GLM-4.7, Qwen3.5-397B-A17B, MiMo-V2-Flash, Trinity-Large-Thinking)
- 400: 75 (Llama-3.1-405B, Qwen3-Coder-480B-A35B, Llama-4-Maverick-17B-128E)
- 500: 93.75 (LongCat-Flash-Chat)
- 600: 112.5 (DeepSeek-V3/R1, Mistral-Large-3-675B)
- 700: 131.25 (GLM-5, GigaChat3.1-702B-A36B)
- 1000: 187.5 (Kimi-K2.5, Ling-2.5-1T, Ring-2.5-1T)
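The ratio works out to a flat 1.5 bits per parameter (0.1875 GB per billion). Here's a minimal sketch that reproduces the table, assuming that flat rate and ignoring real-file overhead like metadata, embeddings, and quantization scales (the function name is just illustrative):

```python
# Reproduce the table above, assuming a flat 1.5 bits per parameter.
# Real model files add metadata, embeddings, and per-block scales,
# so actual downloads run somewhat larger.

def bitnet_size_gb(params_billion: float, bits_per_param: float = 1.5) -> float:
    """Approximate weight size in GB: billions of params times bits, converted to bytes."""
    return params_billion * bits_per_param / 8

for p in [8, 30, 50, 70, 100, 120, 200, 250, 300, 400, 500, 600, 700, 1000]:
    print(f"{p:4d}B -> {bitnet_size_gb(p):7.3f} GB")
```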
Wouldn't it be nice to have more 1-bit models in the above sizes? Like I could run 50B models on a 12GB card, 100B models with just 24GB of VRAM ... which seems like a miracle.
Our dude is cooking something for us. Hope we get some soon.
"Qwen 3 8B. I'm cooking the 397B right now, since you guys have such an appetite for bitnets." - u/Party-Special-5177
Anyone else cooking something like this? Please share.
Silver-Champion-4846@reddit
1.58-bit might be better for CPU
DistanceSolar1449@reddit
1.58-bit is often stored as 2 bits per weight, so it's half as space-efficient as true 1-bit
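For illustration, a rough sketch of where that overhead comes from (the helper names here are hypothetical): ternary weights carry log2(3) ≈ 1.58 bits of information each, a naive 2-bit encoding fits only 4 weights per byte, while base-3 packing fits 5 per byte (3^5 = 243 ≤ 256), i.e. 1.6 bits per weight:

```python
# Sketch of ternary (-1/0/+1) weight storage, for illustration.
# Naive: 2 bits per weight -> 4 weights per byte.
# Base-3: 5 weights per byte (3^5 = 243 <= 256) -> 1.6 bits per weight.

def pack_2bit(weights):
    """Pack ternary weights at 2 bits each (values -1/0/+1 mapped to 0/1/2)."""
    out = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= (w + 1) << (2 * j)
        out.append(byte)
    return bytes(out)

def pack_base3(weights):
    """Pack 5 ternary weights per byte as base-3 digits."""
    out = bytearray()
    for i in range(0, len(weights), 5):
        val = 0
        for w in reversed(weights[i:i + 5]):
            val = val * 3 + (w + 1)
        out.append(val)
    return bytes(out)

ws = [1, -1, 0, 1, 0, -1, 1, 1, 0, -1] * 100  # 1000 weights
print(len(pack_2bit(ws)), "bytes naive vs", len(pack_base3(ws)), "bytes base-3")
```

Real formats often keep 2-bit storage anyway because power-of-two widths unpack faster, which is presumably the tradeoff being weighed here.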
Silver-Champion-4846@reddit
If it's half as space-efficient, how much better is the quality? Is the tradeoff worth it or not?
Serious-Log7550@reddit
Btw, llama.cpp added 1-bit Q1 support for Vulkan in a recent release: https://github.com/ggml-org/llama.cpp/releases/tag/b8742
nrauhauser@reddit
The one-bit models are intriguing. What will really make them take off is native support for what they need in CPU vector processing. If they take root, we should look for additions to Intel's AVX, a similar effort from AMD, and perhaps those functions appearing in ARM chips, rather than as a separate coprocessor.
I'm not sure about just linearly scaling things in your head ... one bit didn't arrive in a vacuum; it was half a step behind TurboQuant. Who knows what other amazing advances we'll see. I expect one bit to be a bigger deal at the bottom end rather than in the larger models.
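For intuition, here's a minimal sketch of the kind of op those vector extensions would accelerate, assuming fully binarized ±1 weights and activations (an XNOR-style kernel; this is an assumption for illustration, not necessarily what any shipping 1-bit model uses). With sign bits packed into machine words, a dot product reduces to XOR plus popcount:

```python
# Sketch: dot product of ±1 vectors via XOR + popcount (assumed XNOR-style
# kernel, for illustration only). With signs packed into integer words,
# dot(x, w) = N - 2 * popcount(bits(x) ^ bits(w)); this is the operation
# that VPOPCNT-class vector instructions speed up on CPUs.

def pack_signs(vec):
    """Pack a list of ±1 values into one int; bit i is 1 when vec[i] == -1."""
    bits = 0
    for i, v in enumerate(vec):
        if v == -1:
            bits |= 1 << i
    return bits

def binary_dot(x, w):
    """Dot product of two ±1 vectors using one XOR and one popcount."""
    mismatches = (pack_signs(x) ^ pack_signs(w)).bit_count()  # Python 3.10+
    return len(x) - 2 * mismatches  # matching signs add +1, mismatches add -1

x = [1, -1, 1, 1, -1, -1, 1, -1]
w = [1, 1, -1, 1, -1, 1, 1, -1]
assert binary_dot(x, w) == sum(a * b for a, b in zip(x, w))
```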
silentus8378@reddit
This is the dream.
EffectiveCeilingFan@reddit
Hard to say. Prism ML treats the technology as proprietary, and even gives the models a special name instead of just calling them quantized Qwen3-8B like everyone else does.
Though I have a sneaking suspicion that it's just a slightly modified Microsoft BitNet and is actually incredibly simple.
LagOps91@reddit
I doubt it will be possible to do this for MoE models; they're harder to (re)train. Getting decent results at all for Qwen3 8B is already quite impressive.
Silver-Champion-4846@reddit
Isn't it just some Qwen3-8B?