KV cache fix for GLM 4.7 Flash
Posted by jacek2023@reddit | LocalLLaMA | 73 comments
tl;dr: remove Air from GLM 4.7 Flash
KV cache uses a lot of VRAM. GLM 4.7 Flash doesn’t even use V in the KV cache. With long contexts, this means gigabytes of VRAM saved, so you can run much longer context on the same setup.
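For a rough sense of scale, here is a back-of-the-envelope sketch of the arithmetic. The layer count, context length, and row widths below are made-up illustration values, not GLM 4.7 Flash's actual config, and the formula assumes a plain cache that stores one K row and one V row per layer per token at 16-bit precision.

```python
def kv_cache_bytes(n_layers, n_ctx, k_width, v_width, bytes_per_elem=2):
    """Rough cache size: one K row and one V row per layer per cached token."""
    return n_layers * n_ctx * (k_width + v_width) * bytes_per_elem


# Hypothetical dimensions, chosen only for illustration.
# These are NOT the real GLM 4.7 Flash hyperparameters.
N_LAYERS = 48
N_CTX = 131072   # 128k tokens of context
K_WIDTH = 512    # elements stored per token per layer for K
V_WIDTH = 512    # elements stored per token per layer for V

with_v = kv_cache_bytes(N_LAYERS, N_CTX, K_WIDTH, V_WIDTH)
without_v = kv_cache_bytes(N_LAYERS, N_CTX, K_WIDTH, 0)  # V not stored at all
saved_gib = (with_v - without_v) / 2**30

print(f"with V:    {with_v / 2**30:.1f} GiB")
print(f"without V: {without_v / 2**30:.1f} GiB")
print(f"saved:     {saved_gib:.1f} GiB")
```

With these made-up numbers, dropping V halves the cache. The real saving depends on the model's actual dimensions and on the cache type you run with, so plug in your own values.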