server: fix checkpoints creation by jacekpoplawski · Pull Request #22929 · ggml-org/llama.cpp
Posted by jacek2023@reddit | LocalLLaMA | 40 comments
Imagine you are using a local model for agentic coding. You discuss the idea (50k tokens), then say “implement it”. The agent reads files, writes files, runs commands, produces another 20k tokens, and the code is ready. Then your next prompt is just “thank you”, and... nothing happens. You have to wait for "something".
What is happening is that some tools, like opencode, try to be smart and optimize the context. They modify something in the conversation history. In the best case, llama.cpp has to reprocess everything from that point. In the worst case, it has to reprocess the entire context (70k tokens) and you get “forcing full prompt re-processing...”
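To make the cost concrete, here is a minimal sketch of prefix reuse, assuming the server can keep cached state for the longest token prefix shared by the old and new prompt. The function names and integer token IDs are hypothetical, chosen for illustration; this is not llama.cpp's actual code.

```python
def common_prefix_len(cached, new):
    """Number of leading tokens shared by the cached and the new prompt."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def tokens_to_reprocess(cached, new):
    """Best case: only tokens after the shared prefix need processing."""
    return len(new) - common_prefix_len(cached, new)


# Hypothetical 70k-token conversation, with tokens as plain integers.
history = list(range(70_000))
thank_you = [-1, -2, -3]

# The client leaves the history untouched: only the new message is processed.
untouched = tokens_to_reprocess(history, history + thank_you)

# The client rewrites one token at position 50,000: everything after it is
# reprocessed, even though the rest of the history is identical.
edited = history[:50_000] + [-9] + history[50_001:]
rewritten = tokens_to_reprocess(history, edited + thank_you)

print(untouched, rewritten)
```

A single changed token early in the history is enough to invalidate everything after it, which is why context rewriting by the client is so expensive.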
To avoid that, I switched from opencode to pi. Not because pi has some magical features, but because it does not do that kind of context rewriting.
Another issue is the model being smart by removing reasoning from the context. In the best case, llama.cpp only has to reprocess the last run (20k tokens). In the worst case, again, it has to reprocess everything (70k).
To avoid that, you can enable “preserve thinking”, at least with Qwen 3.6.
The goal of this PR is to avoid the worst case (full prompt re-processing) and get closer to the best case, where llama.cpp only reprocesses what actually changed. I have been using this code for about two weeks and in my opinion agentic coding is now more responsive.
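As a rough illustration of why checkpoints matter, the sketch below assumes that the server can only resume from saved checkpoints at known token positions, not from an arbitrary position. That assumption, the function names, and the checkpoint positions are all hypothetical; the PR's actual logic is not shown in this post.

```python
from bisect import bisect_right


def restart_position(checkpoints, divergence):
    """Latest checkpoint at or before the first changed token, else 0.

    `checkpoints` is a sorted list of token positions with saved state.
    Returning 0 corresponds to full prompt re-processing.
    """
    i = bisect_right(checkpoints, divergence)
    return checkpoints[i - 1] if i > 0 else 0


def reprocess_cost(prompt_len, checkpoints, divergence):
    """Tokens that must be processed again after restoring a checkpoint."""
    return prompt_len - restart_position(checkpoints, divergence)


# The history changed at token 50,000 of a 70,000-token prompt.
with_checkpoint = reprocess_cost(70_000, [50_000, 60_000], 50_000)
without_checkpoint = reprocess_cost(70_000, [], 50_000)

print(with_checkpoint, without_checkpoint)
```

Under these assumptions, a checkpoint at the start of the last run turns the 70k worst case into the 20k best case, while a missing checkpoint forces a restart from position 0.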