Anthropic merges consecutive same-role messages, OpenAI doesn't (+4 tokens), anyone token-counted this on open-weight models?

Posted by dmpiergiacomo@reddit | LocalLLaMA

I build context/harness optimization tooling, so provider-side serialization quirks actually matter to me: if you're optimizing over prompts, you need to know exactly what hits the model. So I checked whether two consecutive user messages serialize the same as a single joined message, token-counting both forms:

split  = [{"role":"user","content":"Some text."},
          {"role":"user","content":"Some other text."}]
joined = [{"role":"user","content":"Some text.\nSome other text."}]
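For reference, here's roughly what the check looks like, sketched against the official anthropic and openai Python SDKs; the model names are placeholders, swap in whatever you're testing. Anthropic exposes a dedicated token-counting endpoint, while for OpenAI you have to read usage off a real (minimal) call:

import anthropic
import openai

split = [{"role": "user", "content": "Some text."},
         {"role": "user", "content": "Some other text."}]
joined = [{"role": "user", "content": "Some text.\nSome other text."}]

# Anthropic: pre-flight token counter, no generation needed.
ac = anthropic.Anthropic()
for name, msgs in [("split", split), ("joined", joined)]:
    count = ac.messages.count_tokens(
        model="claude-sonnet-4-5",  # placeholder model id
        messages=msgs,
    )
    print(f"anthropic {name}: {count.input_tokens} input tokens")

# OpenAI: no pre-flight counter for chat messages, so make the cheapest
# possible call and read the prompt token count off the usage field.
oc = openai.OpenAI()
for name, msgs in [("split", split), ("joined", joined)]:
    resp = oc.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model id
        messages=msgs,
        max_tokens=1,
    )
    print(f"openai {name}: {resp.usage.prompt_tokens} prompt tokens")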

Results on the closed APIs:

Clean split by provider: both Anthropic models merge consecutive same-role messages (token-identical to a \n join), while neither OpenAI model does. The +4 is the role-delimiter scaffold for the second turn, and the split form even nudges the model to treat the inputs as separate items (gpt-5.5 enumerates them "1." / "2.").
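To make that +4 concrete: OpenAI doesn't publish the server-side serialization for current models, so treat this strictly as a mental model, but under the classic ChatML framing the split form pays for a full second turn envelope while only saving the one \n of the join:

# Illustrative only: assumes the classic ChatML framing, which OpenAI has
# not confirmed for current models.
def chatml(messages):
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

split = [{"role": "user", "content": "Some text."},
         {"role": "user", "content": "Some other text."}]
joined = [{"role": "user", "content": "Some text.\nSome other text."}]

print(chatml(split))   # ...Some text.<|im_end|>\n<|im_start|>user\nSome other text...
print(chatml(joined))  # ...Some text.\nSome other text...

# The split form swaps the join's single "\n" for a whole
# "<|im_end|>\n<|im_start|>user\n" turn boundary: a small fixed
# scaffold, consistent with a small constant overhead like +4.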

Here's where this sub comes in: with the closed APIs you can only observe this behavior, you can't see the rule. With open weights you can read it directly off the chat template in the tokenizer config — tokenizer.apply_chat_template() will show you exactly whether consecutive same-role messages get merged, error, or pass through, and with what separator. I'd bet there's real variation across Llama, Qwen, Mistral, Gemma, etc. since each ships its own template.
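Something like this, sketched with transformers (repo ids are just examples, and some are gated on the Hub):

from transformers import AutoTokenizer

split = [{"role": "user", "content": "Some text."},
         {"role": "user", "content": "Some other text."}]
joined = [{"role": "user", "content": "Some text.\nSome other text."}]

for repo in ["meta-llama/Llama-3.1-8B-Instruct",
             "Qwen/Qwen2.5-7B-Instruct",
             "mistralai/Mistral-7B-Instruct-v0.3"]:
    tok = AutoTokenizer.from_pretrained(repo)
    try:
        s_text = tok.apply_chat_template(split, tokenize=False, add_generation_prompt=True)
        j_text = tok.apply_chat_template(joined, tokenize=False, add_generation_prompt=True)
        s_ids = tok.apply_chat_template(split, add_generation_prompt=True)
        j_ids = tok.apply_chat_template(joined, add_generation_prompt=True)
        print(f"{repo}: merged={s_text == j_text} delta={len(s_ids) - len(j_ids):+d} tokens")
    except Exception as e:
        # Some templates (e.g. older Mistral/Llama-2 style) raise a
        # "roles must alternate" error instead of merging or passing through.
        print(f"{repo}: template rejects consecutive same-role messages ({e})")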

Has anyone here actually checked? Specifically:

- which templates merge consecutive same-role messages, which error out, and which pass them through as separate turns
- what separator the merging templates use
- the token delta between the split and joined forms

Would be genuinely useful to have a table of this across the common open-weight templates. Happy to share my test scripts for the API side if anyone wants to extend them to local models.