llama.cpp speculative checkpointing was merged

Posted by AdamDhahabi@reddit | LocalLLaMA | 89 comments

https://github.com/ggml-org/llama.cpp/pull/19493

Some prompts get a speedup, others don't (when the acceptance streak is low).
For coding, I got roughly a 10-15% speedup with these params:

`--spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64`
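
For anyone curious why acceptance matters: n-gram speculation proposes draft tokens by matching the most recent tokens against earlier context, and the draft only pays off when the target model accepts a long run of them. A minimal Python sketch of the general idea (this is a hypothetical prompt-lookup-style illustration, not llama.cpp's actual implementation; `propose_draft` and its parameters are made up for this example):

```python
# Toy n-gram draft proposal, the basic idea behind ngram-based speculation.
# Hypothetical illustration only, not llama.cpp's implementation.

def propose_draft(tokens, ngram_size=3, draft_max=8):
    """Find the most recent earlier occurrence of the last `ngram_size`
    tokens; if found, propose the tokens that followed it as a draft."""
    if len(tokens) < ngram_size:
        return []
    key = tokens[-ngram_size:]
    # Scan backwards from the most recent candidate position,
    # excluding the trailing n-gram itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == key:
            cont = tokens[start + ngram_size:start + ngram_size + draft_max]
            if cont:
                return cont
    return []

# Repetitive context (common in code): the continuation seen after the
# earlier "a b c" is reused as the draft.
ctx = ["a", "b", "c", "d", "e", "a", "b", "c"]
print(propose_draft(ctx, ngram_size=3, draft_max=4))  # ['d', 'e', 'a', 'b']
```

The target model then verifies the drafted tokens in one batch and keeps the longest accepted prefix, which is why repetitive prompts (coding, edits of existing text) speed up while novel prose often doesn't.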