b9200 released - potential mtp pp increase

Posted by Bulky-Priority6824@reddit | LocalLLaMA | 5 comments

testing in progress ...

https://github.com/ggml-org/llama.cpp/releases/tag/b9200

am17an commented 13 hours ago:

Overview: Avoid copying the logits for every token in the batch when doing prompt processing for MTP, since it only requires the pre-norm. This reduces memory traffic quite a bit and in turn increases PP speed with MTP.
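For a rough sense of why this could matter, here is a back-of-the-envelope sketch in Python. It is not llama.cpp code. It assumes "pre-norm" means one hidden-state row of width `n_embd` per token, that values are 32-bit floats, and the batch, vocabulary, and hidden sizes are made-up illustrative numbers, not taken from the release or any specific model.

```python
def bytes_copied(n_tokens: int, width: int, bytes_per_value: int = 4) -> int:
    """Bytes moved when copying one row of `width` values for each token."""
    return n_tokens * width * bytes_per_value


# Hypothetical sizes, for illustration only (not from the release notes).
N_TOKENS = 2048     # tokens in one prompt-processing batch
N_VOCAB = 150_000   # vocabulary size, i.e. width of one logits row
N_EMBD = 4096       # hidden size, assumed width of one pre-norm row

# Copying full logits for every token in the batch.
logits_bytes = bytes_copied(N_TOKENS, N_VOCAB)
# Copying only the (assumed) pre-norm hidden state for every token.
prenorm_bytes = bytes_copied(N_TOKENS, N_EMBD)

print(f"logits per batch:   {logits_bytes / 2**20:.0f} MiB")
print(f"pre-norm per batch: {prenorm_bytes / 2**20:.0f} MiB")
print(f"ratio: {logits_bytes / prenorm_bytes:.1f}x")
```

Under these assumptions the ratio is simply `N_VOCAB / N_EMBD`, so the saving grows with vocabulary size. Whether that shows up as a real PP gain depends on the model, backend, and hardware, so it still needs benchmarking.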