Speculative decoding question, 665% speed increase

Posted by GodComplecs@reddit | LocalLLaMA | View on Reddit | 40 comments

Im using these settings in llama.cpp: --spec-type ngram-map-k --spec-ngram-size-n 24 --draft-min 12 --draft-max 48

Whats the real reason for lets say the prompt is for "minor changes in code", whats differing between models:
Gemma 4 31b: Doubles in tks gen so 100%
Qwen 3.6: Only 40% more speed
Devstrall small: 665% increase in speed (what?)