I tested Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B and Gemma 4 on the same real architecture-writing task on an RTX 5090

Posted by Gazorpazorp1@reddit | LocalLLaMA | 46 comments

I ran a pretty simple but revealing local-LLM test.

At first I was only going to post about the two Qwens and Gemma4 and go to bed, but what do you know: I go on Reddit and see a post that Qwen3.6-27B dropped. Oh well...

Models tested: Qwen3.6-27B, Qwen3.6-35B-A3B, Qwen3.5-27B, and Gemma 4.

Context: I’m working on a fairly complex tool that takes noisy evidence and turns it into a structured “truth report.”

I gave the same Hermes writing agent (“Scribe”) the same task:

take 2 architecture blueprint docs (v1 baseline + v2 expansion) describing the "truth engine" and produce a unified `Masterplan.md` explaining:

- what the product is

- the user problem

- UX/product shape

- UVP/moat

- pipeline

- agent roles

- architecture

- trust/legal/provenance posture

- what changed between plan V1 and V2

Input size: V1 is \~16k tokens and V2 is \~4.6k tokens, so \~20.6k tokens combined.

Then I ran the full workflow locally on my RTX 5090 with all 4 models:

- **Gemma4**
- **Qwen3.6-35B**
- **Qwen3.5-27B**
- **Qwen3.6-27B**

To make it fair and push the models, each model got:

  1. initial draft

  2. second-pass revision

  3. final polish

Each stage was directed and reviewed by my GPT-5.4 agent Manny, so this wasn’t just “ask once and compare vibes.”
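The draft, revise, polish loop above can be sketched roughly like this. It is a minimal sketch under stated assumptions, not the actual Hermes/Scribe or Manny setup: `run_pipeline`, `RunLog`, `Writer`, `Reviewer` and the prompt layout are all hypothetical, and the writer (the local model under test) and reviewer (the directing agent) are plain callables so the example runs offline with stubs.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical interfaces, not the real Hermes/Scribe or Manny APIs:
# a writer turns a prompt into a draft; a reviewer turns (stage, draft) into notes.
Writer = Callable[[str], str]
Reviewer = Callable[[str, str], str]

# The three passes each model got, in order.
STAGES = ("initial draft", "second-pass revision", "final polish")


@dataclass
class RunLog:
    """Keeps every intermediate draft and review note for later comparison."""
    drafts: list[str] = field(default_factory=list)
    feedback: list[str] = field(default_factory=list)


def run_pipeline(sources: str, writer: Writer, reviewer: Reviewer) -> RunLog:
    """Run draft -> revision -> polish, feeding each pass the previous draft and notes."""
    log = RunLog()
    draft, notes = "", ""
    for stage in STAGES:
        prompt = (
            f"Stage: {stage}\n"
            f"Source blueprints:\n{sources}\n"
            f"Previous draft:\n{draft}\n"
            f"Reviewer notes:\n{notes}\n"
            "Produce Masterplan.md."
        )
        draft = writer(prompt)          # local model under test
        notes = reviewer(stage, draft)  # directing/reviewing agent
        log.drafts.append(draft)
        log.feedback.append(notes)
    return log


def fake_writer(prompt: str) -> str:
    """Offline stub: echoes the stage line instead of calling a model."""
    return f"draft after: {prompt.splitlines()[0]}"


def fake_reviewer(stage: str, draft: str) -> str:
    """Offline stub: returns a placeholder review note."""
    return f"notes on {stage}"


if __name__ == "__main__":
    log = run_pipeline("v1 baseline + v2 expansion", fake_writer, fake_reviewer)
    print(log.drafts[-1])  # draft after: Stage: final polish
```

Keeping the reviewer fixed and swapping only the writer is what keeps the comparison between models like-for-like.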

## What I/Manny scored

- **Clarity**

- **Completeness**

- **Discipline**

- **Usefulness**

## Final results

### Clarity

- Gemma4: **9.4**

- Qwen3.6-27B: **8.8**

- Qwen3.6-35B: **8.1**

- Qwen3.5-27B: **7.4**

**Winner: Gemma4** (at a cost; more on that below)

Gemma was the best editor. Cleanest structure, best pacing, strongest restraint.

---

### Completeness

- Qwen3.6-35B: **9.6**

- Qwen3.5-27B: **9.1**

- Qwen3.6-27B: **8.7**

- Gemma4: **7.9**

**Winner: Qwen3.6-35B**

The 35B Qwen wrote the most exhaustive architecture doc by far. Best sourcebook, most implementation detail.

---

### Discipline

- Gemma4: **9.5**

- Qwen3.6-27B: **8.6**

- Qwen3.6-35B: **7.7**

- Qwen3.5-27B: **6.8**

**Winner: Gemma4**

Gemma best preserved the actual product identity.

---

### Usefulness

- Qwen3.6-27B: **9.3**

- Qwen3.6-35B: **9.2**

- Gemma4: **8.9**

- Qwen3.5-27B: **8.8**

**Winner: Qwen3.6-27B**

This was the surprise. The 27B Qwen 3.6 ended up as the best **overall practical workhorse** — better balance of depth, readability, and usability than the others.

## Final ranking

1. **Qwen3.6-27B**: best all-around balance

2. **Gemma4**: best editor / strategist

3. **Qwen3.6-35B**: best exhaustive drafter

4. **Qwen3.5-27B**: solid, but clearly behind the others for this task

1) Best overall balance

**Qwen3.6-27B.** This is the interesting new winner.

It doesn’t beat Gemma4 on clarity or discipline.
It doesn’t beat Qwen3.6-35B on completeness.

But it wins the thing that matters most for a real working master plan: balance. It’s the best compromise between depth, readability, and usability.

2) Best editor / best strategist

**Gemma4.** If the goal is clarity and discipline (cleanest structure, best pacing, strongest restraint), Gemma still wins.

3) Best exhaustive architecture quarry

**Qwen3.6-35B.** If the goal is completeness (the most exhaustive architecture doc), Qwen3.6-35B is still the beast.

4) Fourth place

**Qwen3.5-27B.** Not bad. Not embarrassing.
But now clearly behind both Qwen3.6 variants and Gemma for this kind of long-form architecture/planning task.

## Actual takeaway

This ended up being a really clean split:

- **Gemma4 = best editor**

- **Qwen3.6-35B = best expander**

- **Qwen3.6-27B = best practical default**

- **Qwen3.5-27B = respectable, but not the winner**

So if I were setting a default local writing worker for long-form architecture/master-plan work today, I’d probably choose:

**Qwen3.6-27B**

It’s the best compromise between:

- readability

- completeness

- structure

- practical usefulness
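The “balance” argument can be made concrete with the scores from the four tables above. One illustrative rule (not how Manny actually scored) is worst-case ranking: order models by their weakest criterion. With these numbers that gives Qwen3.6-27B (8.6), Gemma4 (7.9), Qwen3.6-35B (7.7), Qwen3.5-27B (6.8), the same order as the final ranking, while a plain mean would put Gemma4 narrowly first (8.925 vs 8.85). The `rank_by` helper and the min/mean choice are hypothetical.

```python
from statistics import mean
from typing import Callable, Iterable

# Scores as reported in the four tables above (Clarity, Completeness, Discipline, Usefulness).
SCORES: dict[str, dict[str, float]] = {
    "Gemma4":      {"clarity": 9.4, "completeness": 7.9, "discipline": 9.5, "usefulness": 8.9},
    "Qwen3.6-27B": {"clarity": 8.8, "completeness": 8.7, "discipline": 8.6, "usefulness": 9.3},
    "Qwen3.6-35B": {"clarity": 8.1, "completeness": 9.6, "discipline": 7.7, "usefulness": 9.2},
    "Qwen3.5-27B": {"clarity": 7.4, "completeness": 9.1, "discipline": 6.8, "usefulness": 8.8},
}


def rank_by(aggregate: Callable[[Iterable[float]], float]) -> list[tuple[str, float]]:
    """Collapse each model's four scores with `aggregate` and sort best-first."""
    rows = [(model, aggregate(scores.values())) for model, scores in SCORES.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Worst-case view of "balance": a model is only as strong as its weakest criterion.
    for model, worst in rank_by(min):
        print(f"{model:<12} weakest criterion: {worst:.1f}")
    # Plain average, for comparison.
    for model, avg in rank_by(mean):
        print(f"{model:<12} mean score: {avg:.3f}")
```

The worst-case rule rewards exactly what the post values here: no criterion is allowed to fall far behind the others.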

Personal note re Gemma 4: its final output was drastically shorter than the Qwens’.

So while I do agree that less is often more, I found the Gemma4 output lacking in both technical depth and detail. Sure, it captured the core concepts, but I would position it as more of a pitch deck or high-level concept; the technical details and concepts are sorely missing.
At the other end of the spectrum is Qwen3.6-35B, which delivered 5x the volume of Gemma’s output. That document could really serve as a technical blueprint and architecture implementation bible. Qwen3.5-27B produced even more, but this was quantity over quality.
I would honestly have rated Gemma4 less favourably than Manny did, so make of that what you will.

For first-draft-only performance, I’d rank them:

One-shot ranking

  1. Qwen3.6-27B
  2. Qwen3.6-35B
  3. Qwen3.5-27B
  4. Gemma4

Why

1) Qwen3.6-27B

Best balance right out of the gate. This was the best raw first shot.

2) Qwen3.6-35B

Very strong one-shot draft, but more sprawling.

If you want maximum raw material, this one was a beast.

3) Qwen3.5-27B

Good first-draft generator, but sloppier.

Still useful, but clearly behind both 3.6 variants.

4) Gemma4

Gemma (arguably) won the final polished-document contest, but not the first-draft contest: it needed the later revision passes to get more substance. Depending on the audience, this may be either good or bad.

Short version