StepFun 3.5 MTP by pwilkin · Pull Request #23274 · ggml-org/llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | 11 comments

so we have StepFun MTP before Gemma MTP :)

XccesSv2@reddit

I love the llama.cpp devs. Does this also work for 3.7?

[-]

Yep, I downloaded the Q8 from here https://huggingface.co/notSnix/Step-3.7-Flash-Q4_K_M-MTP-GGUF and used it with the Unsloth quants via --model-draft Step-3.7-Flash-MTP-Q8_0.gguf --spec-type draft-mtp --spec-draft-n-max 2 --spec-draft-p-min 0.60

So yes, it works, and it feels very nice.
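For readers wondering what a draft length cap and a minimum draft probability usually do in speculative decoding, here is a toy sketch. It is not llama.cpp's code: the function names, the callable signatures, and the mapping of `n_max` and `p_min` to the --spec-draft-n-max and --spec-draft-p-min flags above are assumptions based on the flag names only.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins, not llama.cpp APIs:
# a draft step returns (token, probability) for a given prefix,
# a target step returns the token the main model would emit for a prefix.
DraftStep = Callable[[List[int]], Tuple[int, float]]
TargetStep = Callable[[List[int]], int]


def propose_draft(prefix: List[int], draft_step: DraftStep,
                  n_max: int = 2, p_min: float = 0.60) -> List[int]:
    """Draft at most n_max tokens, stopping early once the draft's
    confidence for the next token falls below p_min."""
    drafted: List[int] = []
    for _ in range(n_max):
        token, prob = draft_step(prefix + drafted)
        if prob < p_min:
            break
        drafted.append(token)
    return drafted


def verify_draft(prefix: List[int], drafted: List[int],
                 target_step: TargetStep) -> List[int]:
    """Greedy verification: keep drafted tokens while they match the target.
    At the first mismatch, emit the target's token instead and stop.
    If every drafted token matches, append one extra target token."""
    accepted: List[int] = []
    for token in drafted:
        expected = target_step(prefix + accepted)
        if token != expected:
            accepted.append(expected)
            return accepted
        accepted.append(token)
    accepted.append(target_step(prefix + accepted))
    return accepted
```

Under these assumptions, a small cap like 2 limits how much work is wasted when the draft is wrong, and a threshold like 0.60 stops drafting as soon as the draft head is unsure.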

ilintar@reddit

Sweet!

jacek2023@reddit (OP)

I hope it will work with both 3.5 and 3.7, because I preferred 3.5 in my local tests.

dampflokfreund@reddit

Has Gemma 4 support been added?

jacek2023@reddit (OP)

link is in the post

mr_zerolith@reddit

Nice, any idea what the additional RAM requirement is?

pmttyji@reddit

Nice.

Dumb question: does this require a new GGUF, or is the existing one fine to play with? Qwen3.6 models required new GGUFs, hence the question.

Then-Topic8766@reddit

It seems so. I just built it and tried it with a StepFun GGUF. It throws an error: 'llama_init_from_model: context type MTP requested but model doesn't contain MTP layers'

Due_Net_3342@reddit

From what I gather from the discussion, this is more like a light version of the MTP implementation, with a proper one coming from StepFun?