Which Qwen models can do FIM (Fill in the middle) for autocompletion?
Posted by 0xbeda@reddit | LocalLLaMA | 9 comments
I cannot find a definitive answer. I think the following should be able to do FIM:
- Qwen 2.5 coder
- Qwen 3 coder
- Qwen 3 2507 refresh instruct models
- Qwen 3.5 instruct (intuition, please check)
- Qwen 3.6 instruct (intuition, please check)
What I verified:
- Qwen3-32B: no
- Qwen3-4B-Instruct-2507: yes
Tested with unsloth GGUFs in llama.cpp. All expose identical FIM PRE/SUF/MID metadata (shared tokenizer, IDs 151659–151664), so metadata proves nothing.
Is there any official statement?
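For reference, the PRE/SUF/MID metadata mentioned above corresponds to the prompt layout FIM models expect. Here's a minimal sketch of assembling such a prompt by hand, using the token strings from the Qwen2.5-Coder convention (verify them against your model's actual tokenizer before relying on this):

```python
# Sketch: building a Qwen-style FIM prompt in PSM (prefix-suffix-middle) order.
# The token strings below follow Qwen2.5-Coder's documented convention;
# check your model's tokenizer metadata, since other models use different strings.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """The model is asked to generate the code between prefix and suffix."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(2, 3))\n",
)
print(prompt)
```

As the OP notes, a model exposing these tokens in its metadata doesn't prove it was actually trained on the FIM objective — that's the whole question.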
Impossible_Art9151@reddit
we have FIM working here with qwen3.5-4b (replacing good old qwen2.5-coder). Haven't configured it myself but can confirm it works.
H3PO@reddit
Which ide and extension? I have the backend working and continue.dev gets autocomplete responses from vllm running Qwen3.6-35B-A3B but the extension doesn't show the suggestion.
u/DinoAmino Qwen3.5/3.6 seems to understand the FIM tokens just fine, but it needs the "You are a code completion assistant" system prompt. I'm hacking the prompt into the template, since continue.dev doesn't support sending a system prompt to an autocomplete model.
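A rough sketch of the hack described here: prepend the system prompt to the FIM prompt yourself, since the autocomplete client won't. The `<|im_start|>`/`<|im_end|>` chat markers and FIM token strings are assumptions based on Qwen's usual conventions — check the actual chat template of the model you're serving:

```python
# Hypothetical illustration of injecting a system prompt ahead of the FIM
# tokens when the client can't send one. The exact template layout is an
# assumption; inspect your model's chat template before copying this.
SYSTEM = "You are a code completion assistant"

def wrap_with_system(prefix: str, suffix: str) -> str:
    fim = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
    return (
        f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
        f"<|im_start|>user\n{fim}"
    )
```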
DinoAmino@reddit
Would love to know what was done to make this work, since it doesn't have any evidence of FIM support.
usrlocalben@reddit
There is an official statement here:
https://github.com/QwenLM/Qwen3-Coder?tab=readme-ov-file#fill-in-the-middle-with-qwen3-coder
usrlocalben@reddit
Prompt caching is a must for real-time FIM, so I continue to use Qwen3-Coder-30B-A3B-Instruct: it's the latest model that doesn't have the less-desirable caching properties of linear attention/SWA/delta-net.
synw_@reddit
Qwen 2.5 coder 1.7b q8 has served me well for fast autocomplete. I should probably try Qwen 3.5 2b to compare
kevin_1994@reddit
Hol up, I thought FIM support ended with qwen 2.5 coder and qwen 3 coder. You're telling me I should try the new qwen models?
Btw, another series of models I can verify works with FIM is Granite 4, though the quality of its code completions is terrible.
0xbeda@reddit (OP)
If you can run Qwen3-Coder-30B-A3B, you should definitely try Qwen3.6-35B-A3B, at least for chat. (I can't tell completion performance yet.)
astyagun@reddit
I’ve tried qwen3-coder and it works, in llama.cpp + llama.vim.
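The llama.cpp path mentioned here can also be exercised directly: llama-server exposes an /infill endpoint (the same route llama.vim uses) that takes the prefix and suffix as separate fields. A sketch, assuming a server on localhost:8080 — the URL and `n_predict` value are placeholders:

```python
# Hedged sketch of calling llama-server's /infill endpoint directly.
# Requires llama-server running with a FIM-capable model; the URL and
# n_predict below are placeholder assumptions.
import json
import urllib.request

def build_infill_payload(prefix: str, suffix: str, n_predict: int = 64) -> dict:
    """Request body for llama-server's /infill route."""
    return {"input_prefix": prefix, "input_suffix": suffix, "n_predict": n_predict}

def infill(prefix: str, suffix: str, url: str = "http://localhost:8080/infill") -> str:
    data = json.dumps(build_infill_payload(prefix, suffix)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

This bypasses any editor extension, which makes it a handy way to check whether a given GGUF actually produces sensible middles before blaming the IDE plugin.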