Context checkpoint erasure in llama.cpp?

Posted by SimilarWarthog8393@reddit | LocalLLaMA | View on Reddit | 7 comments

Has anyone been able to solve or mitigate context checkpoints being erased during single-user inference, specifically when function calling is part of the chat history? I've been using Qwen 3.5 35B A3B for some time (now using 3.6), tested in Cherry Studio & Open WebUI, and in every case checkpoints get erased between prompts within the same chat session. Is this because tool call content is not being passed back? I thought it could also be the CoT content not being preserved, but even with preserve_thinking: true for Qwen 3.6 I get the same issue.

I use 128 checkpoints and 16 GiB of cache RAM, so I'm not running out of checkpoints or RAM. Suggestions would be appreciated (:
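For reference, here's roughly how those settings map onto a llama-server launch. The flag names (`--ctx-checkpoints`, `--cache-ram`) are my assumption based on recent llama.cpp builds and the model path is a placeholder, so check `llama-server --help` on your version:

```shell
# Sketch of a llama-server launch matching the settings above.
# --ctx-checkpoints: max number of context checkpoints kept per slot (assumed flag)
# --cache-ram: prompt-cache RAM limit in MiB, 16384 MiB = 16 GiB (assumed flag)
llama-server \
  -m /path/to/model.gguf \
  --ctx-checkpoints 128 \
  --cache-ram 16384
```

Even with these raised well above the defaults, the erasure happens, which is why I suspect the checkpoints are being invalidated by the history changing (tool calls / reasoning content) rather than by running out of slots or memory.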