Anthropic merges consecutive same-role messages, OpenAI doesn't (+4 tokens), anyone token-counted this on open-weight models?

Posted by dmpiergiacomo@reddit | LocalLLaMA

I build context/harness optimization tooling, so provider-side serialization quirks actually matter to me: if you're optimizing over prompts, you need to know exactly what hits the model. So I checked whether two consecutive user messages serialize the same as a single joined message, token-counting both forms:

split  = [{"role":"user","content":"Some text."},
          {"role":"user","content":"Some other text."}]
joined = [{"role":"user","content":"Some text.\nSome other text."}]
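For reference, here's roughly what the check looks like, sketched against the official anthropic and openai Python SDKs; the model names are placeholders, swap in whatever you're testing. Anthropic exposes a dedicated token-counting endpoint, while for OpenAI you have to read usage off a real (minimal) call:

import anthropic
import openai

split = [{"role": "user", "content": "Some text."},
         {"role": "user", "content": "Some other text."}]
joined = [{"role": "user", "content": "Some text.\nSome other text."}]

# Anthropic: pre-flight token counter, no generation needed.
ac = anthropic.Anthropic()
for name, msgs in [("split", split), ("joined", joined)]:
    count = ac.messages.count_tokens(
        model="claude-sonnet-4-5",  # placeholder model id
        messages=msgs,
    )
    print(f"anthropic {name}: {count.input_tokens} input tokens")

# OpenAI: no pre-flight counter for chat messages, so make the cheapest
# possible call and read the prompt token count off the usage field.
oc = openai.OpenAI()
for name, msgs in [("split", split), ("joined", joined)]:
    resp = oc.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model id
        messages=msgs,
        max_tokens=1,
    )
    print(f"openai {name}: {resp.usage.prompt_tokens} prompt tokens")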

Results on the closed APIs:

Clean split by provider: both Anthropic models merge consecutive same-role messages (token-identical to a \n join), while neither OpenAI model does. The +4 is the role-delimiter scaffold for the second turn, and the split form even nudges the model to treat the inputs as separate items (gpt-5.5 enumerates them "1." / "2.").
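To make that +4 concrete: OpenAI doesn't publish the server-side serialization for current models, so treat this strictly as a mental model, but under the classic ChatML framing the split form pays for a full second turn envelope while only saving the one \n of the join:

# Illustrative only: assumes the classic ChatML framing, which OpenAI has
# not confirmed for current models.
def chatml(messages):
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

split = [{"role": "user", "content": "Some text."},
         {"role": "user", "content": "Some other text."}]
joined = [{"role": "user", "content": "Some text.\nSome other text."}]

print(chatml(split))   # ...Some text.<|im_end|>\n<|im_start|>user\nSome other text...
print(chatml(joined))  # ...Some text.\nSome other text...

# The split form swaps the join's single "\n" for a whole
# "<|im_end|>\n<|im_start|>user\n" turn boundary: a small fixed
# scaffold, consistent with a small constant overhead like +4.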

Here's where this sub comes in: with the closed APIs you can only observe this behavior, you can't see the rule. With open weights you can read it directly off the chat template in the tokenizer config — tokenizer.apply_chat_template() will show you exactly whether consecutive same-role messages get merged, error, or pass through, and with what separator. I'd bet there's real variation across Llama, Qwen, Mistral, Gemma, etc. since each ships its own template.
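Something like this, sketched with transformers (repo ids are just examples, and some are gated on the Hub):

from transformers import AutoTokenizer

split = [{"role": "user", "content": "Some text."},
         {"role": "user", "content": "Some other text."}]
joined = [{"role": "user", "content": "Some text.\nSome other text."}]

for repo in ["meta-llama/Llama-3.1-8B-Instruct",
             "Qwen/Qwen2.5-7B-Instruct",
             "mistralai/Mistral-7B-Instruct-v0.3"]:
    tok = AutoTokenizer.from_pretrained(repo)
    try:
        s_text = tok.apply_chat_template(split, tokenize=False, add_generation_prompt=True)
        j_text = tok.apply_chat_template(joined, tokenize=False, add_generation_prompt=True)
        s_ids = tok.apply_chat_template(split, add_generation_prompt=True)
        j_ids = tok.apply_chat_template(joined, add_generation_prompt=True)
        print(f"{repo}: merged={s_text == j_text} delta={len(s_ids) - len(j_ids):+d} tokens")
    except Exception as e:
        # Some templates (e.g. older Mistral/Llama-2 style) raise a
        # "roles must alternate" error instead of merging or passing through.
        print(f"{repo}: template rejects consecutive same-role messages ({e})")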

Has anyone here actually checked? Specifically:

- which templates merge consecutive same-role messages, which error out, and which pass them through as separate turns
- what separator the merging templates use
- the token delta between the split and joined forms

Would be genuinely useful to have a table of this across the common open-weight templates. Happy to share my test scripts for the API side if anyone wants to extend them to local models.