b9200 released - potential mtp pp increase
Posted by Bulky-Priority6824@reddit | LocalLLaMA | 5 comments
testing in progress ...
https://github.com/ggml-org/llama.cpp/releases/tag/b9200
u/am17an commented 13 hours ago:
Overview: Avoid copying the logits for every token in the batch when doing prompt processing for MTP, since it only requires the pre-norm. This reduces memory traffic quite a bit and in turn increases PP speed with MTP.
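To get a feel for why skipping the per-token logits copy could matter, here is a rough back-of-the-envelope sketch. It is not taken from the llama.cpp code. It assumes that "the pre-norm" means a hidden-state-sized row per token, that values are 4-byte floats, and all three sizes (`N_PROMPT_TOKENS`, `N_VOCAB`, `N_EMBD`) are made-up placeholders rather than the real figures for any model.

```python
def copy_bytes(n_tokens: int, row_width: int, bytes_per_value: int = 4) -> int:
    """Bytes moved when one row of `row_width` values is copied per token."""
    return n_tokens * row_width * bytes_per_value


# All three values are hypothetical placeholders, not measurements.
N_PROMPT_TOKENS = 8192   # tokens in the prompt being processed
N_VOCAB = 150_000        # assumed width of one logits row
N_EMBD = 2048            # assumed width of one hidden-state row

logits_bytes = copy_bytes(N_PROMPT_TOKENS, N_VOCAB)
hidden_bytes = copy_bytes(N_PROMPT_TOKENS, N_EMBD)

print(f"logits rows:       {logits_bytes / 2**20:.1f} MiB")
print(f"hidden-state rows: {hidden_bytes / 2**20:.1f} MiB")
print(f"ratio:             {logits_bytes / hidden_bytes:.1f}x")
```

With any realistic vocabulary size the logits row is far wider than the hidden-state row, so under these assumptions the copy that gets skipped is the expensive one.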
Bulky-Priority6824@reddit (OP)
Here's a comparison table showing the improvement:
b9180 vs b9203: Qwen3.6-35B MTP vs Base
[Table values not preserved. Sections: Prompt Processing (PP t/s), Token Generation (TG t/s), PP Gap (Base vs MTP)]
The headline is the PP gap closing from 63-78% down to 15-22%.
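The thread does not say how the PP gap is computed. One plausible reading, assumed here, is the relative slowdown of MTP against the base model in prompt-processing throughput. A minimal sketch with made-up throughput numbers (the function name `pp_gap_percent` is hypothetical too):

```python
def pp_gap_percent(base_tps: float, mtp_tps: float) -> float:
    """Relative PP slowdown of MTP vs base, in percent (assumed definition)."""
    if base_tps <= 0:
        raise ValueError("base_tps must be positive")
    return (base_tps - mtp_tps) / base_tps * 100.0


# Hypothetical t/s values chosen only to illustrate the formula.
before = pp_gap_percent(base_tps=1000.0, mtp_tps=300.0)
after = pp_gap_percent(base_tps=1000.0, mtp_tps=820.0)

print(f"gap before: {before:.0f}%")
print(f"gap after:  {after:.0f}%")
```

If the gap was measured some other way (for example relative to the MTP figure), the percentages would differ, so treat this only as a way to reproduce the comparison on your own runs.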
CircularSeasoning@reddit
It increases pee-pee speed with empty pee? Genius.
apoptosist@reddit
I still get crashes when using vision and MTP. Anybody else?
Bulky-Priority6824@reddit (OP)
No, not anymore? Did you grab the correct mmproj for the model?
apoptosist@reddit
Yep, I let LM Studio do my downloading. The mmproj files all work fine without MTP, but they crash with MTP. There was a simple fix for the PR, but now I see the code is different.