GLM-4.5-Air llama.cpp experiences?
Posted by DorphinPack@reddit | LocalLLaMA | 41 comments
ik_llama.cpp too! I’d love to hear how people are running it (hardware, CLI flags, use case, etc.)
Bandwidth constraints and having a single 3090 are giving me a bit of analysis paralysis choosing a quant to start. I’m a patient hybrid inference gal, as long as it’s not seconds per token 😂. Workload is usually long-context document work and coding (still looking for a local Roo/aider to go steady with).
From what I’ve seen, ~70GB for Q4 would be a good fit with the typical MoE CPU/GPU setup, as I have >70GB of RAM to play with. I’m afraid to go too low with so few active parameters. Or is that guiding principle more bound to total parameters?
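For anyone who wants to sanity-check the fit, here is a rough back-of-the-envelope sketch in Python. It assumes 106B total parameters for GLM-4.5-Air, a 24GB 3090, and 70GB of RAM as a lower bound. The bits-per-weight values, the 8GB reserve for KV cache and the OS, and the helper names `gguf_size_gb` and `fits_hybrid` are illustrative assumptions, not measured numbers or anything from llama.cpp.

```python
def gguf_size_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes).

    total_params_b is the parameter count in billions, so
    params * bits / 8 bytes, divided by 1e9, simplifies to this.
    """
    return total_params_b * bits_per_weight / 8


def fits_hybrid(model_gb: float, vram_gb: float, ram_gb: float,
                reserve_gb: float = 8.0) -> bool:
    """True if the weights fit in VRAM + RAM combined, after holding back
    reserve_gb for KV cache, compute buffers, and the OS (assumed value)."""
    return model_gb <= vram_gb + ram_gb - reserve_gb


TOTAL_PARAMS_B = 106  # assumed: GLM-4.5-Air total parameters, in billions
VRAM_GB = 24          # single RTX 3090
RAM_GB = 70           # lower bound taken from ">70GB of RAM"

# Assumed average bits per weight; real GGUF quants mix tensor types,
# so actual file sizes will differ.
for bpw in (3.5, 4.5, 5.5):
    size = gguf_size_gb(TOTAL_PARAMS_B, bpw)
    verdict = "fits" if fits_hybrid(size, VRAM_GB, RAM_GB) else "too big"
    print(f"{bpw:.1f} bpw -> ~{size:.0f} GB ({verdict})")
```

This only covers capacity. It says nothing about speed, which in a hybrid setup depends mostly on how many of the active parameters have to be read from system RAM per token.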
I’m surprised I haven’t seen more yet, but with gpt-oss dropping the same morning the GLM GGUFs did, I get why.