Qwen3.6-27B-FP8 - JS file is too long and causing JSON truncation
Posted by poobear_74@reddit | LocalLLaMA | View on Reddit | 8 comments
Apologies in advance if this is a newbie question. When running Qwen3.6-27B-FP8 in opencode, served with the command below on an Nvidia RTX PRO 5000, I am seeing errors such as: "The issue is that the JS file is too long and causing JSON truncation. Let me split it into multiple files.", "The file is too long for the write tool. Let me use bash to write it instead.", "The heredoc approach is also failing because the content is too long and getting truncated.", "The base64 approach works but it's tedious. Let me try a Python approach instead", "Let me take a different approach — write a Python script that generates the JS file, then run it.".
vllm serve Qwen/Qwen3.6-27B-FP8 --host 0.0.0.0 --port 8000 --max-model-len 65536 --download-dir /workspace/models --enable-auto-tool-choice --tool-call-parser qwen3_xml --max-num-seqs 4 --enable-prefix-caching --enable-chunked-prefill --max-num-batched-tokens 16384 --trust-remote-code
When I change tool-call-parser to qwen3_parser, I get a whole lot of different errors:
⚙ invalid [tool=write, error=Invalid input for tool write: JSON parsing failed: Text: {"filePath": "/tmp/spaceinvaders/index.html".
⚙ invalid [tool=write, error=Invalid input for tool write: JSON parsing failed: Text: { "content": "
I'd appreciate guidance.
poobear_74@reddit (OP)
I am going to answer my own question. To fix tool calling, it helps to specify a fixed Qwen 3.5/3.6 chat template when vllm is started. Download the chat template from https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates and add it to your vllm startup params like so:
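A minimal sketch of what that startup might look like, passing the downloaded template via vLLM's `--chat-template` flag (the template path and filename here are assumptions; use wherever you saved the file from the Hugging Face repo above):

```shell
# Serve the model with an explicit chat template instead of the bundled one.
# /workspace/templates/qwen3_fixed.jinja is an example path, not a real filename
# from the repo — point this at the template you actually downloaded.
vllm serve Qwen/Qwen3.6-27B-FP8 \
  --host 0.0.0.0 --port 8000 \
  --max-model-len 65536 \
  --download-dir /workspace/models \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_xml \
  --chat-template /workspace/templates/qwen3_fixed.jinja \
  --max-num-seqs 4 \
  --enable-prefix-caching \
  --enable-chunked-prefill \
  --max-num-batched-tokens 16384 \
  --trust-remote-code
```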
Slacker1540@reddit
I found this nearly always fixed it, but once in a while I still get the error.
poobear_74@reddit (OP)
Even with the chat template, I still can't get it to work reliably. I've spent several days on it now. Lots of errors like this: "JSON Parse error: Unterminated string] Thinking: The tool is having issues with the large file content. Let me write it in smaller pieces using the task tool to delegate this work." Clearly, tooling is broken. What a pity.
Slacker1540@reddit
Same issue here. It's less common than without the template, but it still makes the experience frustrating.
NNN_Throwaway2@reddit
It's qwen3_coder, not qwen3_parser. That's the problem.
poobear_74@reddit (OP)
Sorry, that's clearly a typo. I would have used qwen3_coder; otherwise vllm wouldn't have started, right?
poobear_74@reddit (OP)
I can confirm that when setting tool-call-parser to qwen3_coder, I am still getting "⚙ invalid [tool=write, error=Invalid input for tool write: JSON parsing failed:
Sticking_to_Decaf@reddit
Oddly, I found qwen3_coder works better than qwen3_xml. I reverted to qwen3_coder and tool calling works much better.