GLM 4.7 Flash: Huge performance improvement with -kvu
Posted by TokenRingAI@reddit | LocalLLaMA | 72 comments
TL;DR: Try passing -kvu to llama.cpp when running GLM 4.7 Flash.
On an RTX 6000, my generation speed on an 8K-token output rose from 17.7 t/s to 100 t/s.
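For anyone who wants to try this from a script, here is a minimal sketch of how the launch command could be assembled and how the speedup works out. It assumes the `llama-server` binary from llama.cpp is on your PATH and that `-kvu` is the short form of `--kv-unified` in your build (check `llama-server --help` to confirm). The model filename is a hypothetical placeholder, not the actual file used in the post.

```python
import shlex


def build_llama_server_cmd(model_path, unified_kv=True):
    """Build an argv list for llama-server, optionally adding -kvu.

    model_path is whatever GGUF file you have locally (placeholder here).
    """
    cmd = ["llama-server", "-m", model_path]
    if unified_kv:
        # -kvu: assumed short form of --kv-unified in llama.cpp
        cmd.append("-kvu")
    return cmd


def speedup(before_tps, after_tps):
    """Ratio of tokens/second after the change to before it."""
    return after_tps / before_tps


if __name__ == "__main__":
    # Hypothetical filename; substitute your own GLM 4.7 Flash GGUF.
    cmd = build_llama_server_cmd("GLM-4.7-Flash.gguf")
    print(shlex.join(cmd))
    # Numbers taken from the post: 17.7 t/s before, 100 t/s after.
    print(f"{speedup(17.7, 100.0):.2f}x")  # prints 5.65x
```

The argv list can be handed to `subprocess.run(cmd)` to actually start the server. Results will vary with hardware, quantization, and llama.cpp version, so benchmark with and without the flag on your own setup.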
Also, check out the one-shot Zelda game it made, pretty good for a 30B:
[https://talented-fox-j27z.pagedrop.io](https://talented-fox-j27z.pagedrop.io)