Qwen3.6 GGUF is so good for debugging.
Posted by _BigBackClock@reddit | LocalLLaMA | View on Reddit | 19 comments
Using an Unsloth dynamic quant on 16 GB VRAM + 32 GB DRAM, with a 200k q8_0 KV cache (context window).
_BigBackClock@reddit (OP)
UPDATE: configured ik_llama with proper CPU offloading and CPU KV cache; now getting 38.98 tok/s (249.71 tok/s in, 18.83 tok/s out).
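For context, a minimal sketch of the kind of launch command this setup implies. The model filename, layer count, and tensor-override pattern are assumptions, not OP's exact flags; `-ctk`/`-ctv` (KV-cache quant type) and `-ot` (tensor override) exist in mainline llama.cpp, and ik_llama's fork inherits equivalents:

```shell
# Hypothetical llama-server invocation; paths and patterns are assumptions.
# -ctk/-ctv q8_0 quantizes the KV cache, -ngl offloads layers to the GPU,
# -ot pins the large MoE expert tensors to system RAM (CPU offloading).
./llama-server \
  -m Qwen-UD-Q4_K_M.gguf \
  -c 200000 \
  -ctk q8_0 -ctv q8_0 \
  -ngl 99 \
  -ot "ffn_.*_exps=CPU"
```

The trade-off is the one OP measured: keeping expert tensors in DRAM frees VRAM for the KV cache at the cost of per-token CPU work, which is why a recent CPU matters.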
spvn@reddit
Which quant? Are you using ik_llama with such low VRAM?
sagiroth@reddit
What's the difference versus llama.cpp?
volleyneo@reddit
ik_llama is better for mixed GPU/CPU setups, since 16 GB of VRAM is not enough for that model.
R_Duncan@reddit
The advantage over using iso4 KV cache in llama-cpp-turboquant shrinks if you don't have a recent CPU.
_BigBackClock@reddit (OP)
UD-Q4_K_M, and no, I'm not using ik_llama.
_BigBackClock@reddit (OP)
will try it tonight
DarthLoki79@reddit
What are you using to serve this? llama.cpp?
_BigBackClock@reddit (OP)
yes
metover@reddit
Which OS is this? I like your top status bar.
_BigBackClock@reddit (OP)
Fedora with Hyprland. The top bar is Waybar.
CardinalRedwood@reddit
Context? Tokens per second? Cool!
_BigBackClock@reddit (OP)
200k q8_0 context window; it's pretty slow though, 10-12 tok/s.
9r4n4y@reddit
3.6 is really damn good :)
viperx7@reddit
Just waiting for the 27B
KAPMODA@reddit
Need more pixels please
_BigBackClock@reddit (OP)
Zoom in, it does have all the pixels
SM8085@reddit
Opencode 1.4.10 is out now, by the way. In another thread we worked out that it only auto-updates patch releases, i.e. the third number.
_BigBackClock@reddit (OP)
will update, thanks