Qwen3.6-27B-Q6_K - images
Posted by Usual-Carrot6352@reddit | LocalLLaMA | 39 comments
Settings:
temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
Prompts:
- Create svg image of a capybara wearing a kimono drinking matcha tea
- Create svg image of a steampunk owl repairing a pocket watch
- Create svg image of a flamingo knitting a colorful sweater
- Create svg image of a sushi roll wearing sunglasses driving a go-kart
- Create svg image of a Victorian-era robot reading a newspaper in a cafe
- Create a svg image of a time-lapse composite showing a flower blooming, wilting, and transforming into butterflies across four seasons, all in one frame with seasonal lighting
Stats:
- 3min 10s, 27.55 t/s
- 4min 35s, 27.05 t/s
- 3min 20s, 27.55 t/s
- 7min 2s, 27.27 t/s
- 7min 23s, 27.19 t/s
- 8min 24s, 27.13 t/s
balerion20@reddit
In the current situation the stats don't matter much, because you didn't give us your hardware, context size, framework… unless I missed it.
Usual-Carrot6352@reddit (OP)
24GB VRAM, I think it was 37 t/s in Open WebUI.
mission_tiefsee@reddit
Q6 quant on 24GB VRAM? Context must be close to 0. Do you run it with llama.cpp? Can you share your startup flags?
Usual-Carrot6352@reddit (OP)
Yes, I was trying to find the sweet spot using llama.cpp and ik_llama.cpp:
./build/bin/llama-server \
-m ~/.lmstudio/models/lmstudio-community/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q6_K.gguf \
-c 196608 -ngl 99 --no-mmap \
-fa on -ctk q8_0 -ctv q8_0 -np 1 \
-b 4096 -ub 1024 -t 8 -tb 16 \
--prio 2 --prio-batch 2 \
--temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 \
--presence-penalty 0.0 --repeat-penalty 1.0 \
--chat-template-kwargs '{"enable_thinking":true}' \
--host 127.0.0.1 --port 8081
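In that setup, -c 196608 asks for a 192K context, -ctk/-ctv q8_0 quantize the KV cache to 8-bit, and -fa on enables flash attention. For reference, a request like this reproduces the sampling settings from the top of the post (a sketch; llama.cpp's OpenAI-compatible endpoint accepts top_k and min_p as extensions to the standard fields):
curl http://127.0.0.1:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Create svg image of a capybara wearing a kimono drinking matcha tea"}],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0
  }'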
But now I use Hermes, and these are my current setups, as I only installed it yesterday:
Good quality:
./build/bin/llama-server \
-m ~/.lmstudio/models/lmstudio-community/Qwen3.6-27B-GGUF/Abiray-Qwen3.6-27B-NVFP4.gguf \
-ngl 99 -c 131072 -t 8 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
-fa on \
--host 0.0.0.0 --port 8080 \
-np 2
Longest context length:
./build/bin/llama-server \
-m ~/.lmstudio/models/lmstudio-community/Qwen3.6-27B-GGUF/Abiray-Qwen3.6-27B-NVFP4.gguf \
-ngl 99 -c 262144 -t 8 \
--cache-type-k q4_0 --cache-type-v q4_0 \
-fa on \
--host 0.0.0.0 --port 8080 \
-np 2
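The q4_0 cache is what buys the doubled context in the second setup. A back-of-envelope check, using llama.cpp's KV-cache block sizes (these numbers are mine, not from the thread):
q8_0: 34 bytes per 32 cached elements ≈ 8.5 bits/element
q4_0: 18 bytes per 32 cached elements ≈ 4.5 bits/element
So the q4_0 cache is roughly half the size, and the same VRAM budget holds about twice the context, hence 131072 → 262144.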
mister2d@reddit
lol
zaidorx@reddit
A snake trying to solve the Y2K problem on a computer.
Qwen3.6-27B-FP8
It took a few minutes, but I did not measure it.
draconic_tongue@reddit
this is cute
genpfault@reddit
DonkeyBonked@reddit
I'm very disappointed in this for Qwen 3.6 35B A3B.
Q8 Quant, BF16 KV, 655,360 ctx (rope scale 2.5 x 262,144) on 4x RTX 3090
Temp = 1.0, Top P = 0.95, Top K = 40, Min P = 0.01, Presence Penalty 1.5, Repeat Penalty 1.0
My prompt... 😏😉😂
ElementNumber6@reddit
Oh dear lord. Why would anyone force an LLM to even imagine such a thing?
DonkeyBonked@reddit
You should have seen some of the others, they were... disturbing, to say the least.
Ok-Importance-3529@reddit
AutoRound quant: Qwen3.6-27B-Q2_K_MIXED.gguf
https://huggingface.co/sphaela/Qwen3.6-27B-AutoRound-GGUF
Ok-Importance-3529@reddit
Second try:
This one surprised me due to small details like the bike chain.
Ok-Importance-3529@reddit
Same AutoRound model, but the Q5_0 quant.
Ok-Importance-3529@reddit
Second try, same model.
Ok-Importance-3529@reddit
Using the recommended inference settings for tool calling from Qwen (temp 0.6, etc.).
I'm not an expert, but both seem better than the provided Q6_K. I'd also like to mention that AutoRound quants are more reliable and better behaved than any other llama.cpp quants I've used. I don't know why, but from my point of view the quantization method seems to preserve the original model better. That's based on my agentic-use evaluation through real coding sessions; the models are really reliable.
uti24@reddit
I have noticed that SVG is benchmaxxed somehow. I mean, LLMs probably just learned from existing SVG images, which are text.
I think it's a more interesting test to ask models to draw something on the canvas using JS.
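A minimal harness for that kind of test could be as simple as this sketch (the file names are made up, and you would paste the model's generated JS into draw.js):
cat > canvas_test.html <<'EOF'
<!doctype html>
<canvas id="c" width="512" height="512"></canvas>
<script src="draw.js"></script>
EOF
Then prompt the model to write JS that draws on the canvas with id "c", and open canvas_test.html in a browser.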
ElementNumber6@reddit
Then, to test, we can assemble 5 or so groups of ~100 terms, and choose from each randomly for every new release.
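A toy version of that in shell (the term files are hypothetical, one term per line; shuf prints a random line):
printf 'Create svg image of a %s %s %s\n' \
  "$(shuf -n 1 subjects.txt)" \
  "$(shuf -n 1 actions.txt)" \
  "$(shuf -n 1 settings.txt)"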
ZealousidealBadger47@reddit
10.71 tok/sec
Qwen 3.5 122b-a10b
Dany0@reddit
I have a feeling this falls more into general tasks and should be tested with the 1.0 temp thinking config?
BigYoSpeck@reddit
1.5 is where it gets interesting
foldl-li@reddit
I like its eyes.
m360842@reddit
Took your prompts, but I replaced "svg image" with "animated svg".
edsonmedina@reddit
Just tried it now! Pretty good!
Fusseldieb@reddit
Things like this give me hope. If a Q6 27B model gives this level of performance, one could probably train a 70B (or larger) 1.58-bit model that fits on the very same hardware and get ACTUAL Sonnet-level performance or even better.
No, I'm not talking about Q2, BitNet, or similar quantized models, which are "kinda" bad, but about something like this.
edsonmedina@reddit
Qwen3.6 35B A3B Q8_K_XL
5m11s
edsonmedina@reddit
Qwen3.6 35B A3B Q8_K_XL
2m8s
szansky@reddit
how about turbo realistic?
lit1337@reddit
These are endearing lol, grandmas woulda emailed the hell outta these in the early 2000s.
redballooon@reddit
Looks cartoonish
davernow@reddit
They are SVGs. Literally line drawings, not rasters.
BigYoSpeck@reddit
Interestingly, if you screenshot these and post the image back to Qwen, it says:
I'm going to wager you're responding to a bot
Kappa-chino@reddit
Yeah, more deeply than that, because I think the commenter above probably doesn't understand what an SVG is. These images were drawn with code; as in, each line in the image is expressed as a mathematical function that the model wrote out. It would take you like a week to make one of these by hand.
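Since an SVG is just text, a hand-written miniature shows what the model is actually emitting (this example is mine, not from the thread):
cat > demo.svg <<'EOF'
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="tan"/>
  <path d="M30 45 Q50 25 70 45" stroke="black" fill="none"/>
</svg>
EOF
The <path> element's quadratic curve is exactly the kind of mathematical function being described; the renderer evaluates it when the file is displayed.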
lombwolf@reddit
God damn the twins got fucking smoked by this
LegacyRemaster@reddit
All I need!
nikhilprasanth@reddit
Looks neat, I'll try some of these with 35B.
nikhilprasanth@reddit
Here is Qwen3.6-35B-A3B-UD-Q4_K_XL
Significant_Fig_7581@reddit
Can't wait for it to be as good as the 27B; Qwen4 35B is gonna be awesome.
nikhilprasanth@reddit