Why does GLM on llama.cpp have no MTP?

Posted by Expensive-Paint-9490@reddit | LocalLLaMA | 9 comments

I have searched through the repo's discussions and PRs but can't find any references. GLM models have embedded layers for multi-token prediction (MTP) and speculative decoding. They can be used with vLLM, if you have hundreds of GB of VRAM, of course. Does anybody know why llama.cpp chose not to support this feature?