Qwen3.6-27B-Q6_K - images
Posted by Usual-Carrot6352@reddit | LocalLLaMA | 39 comments
Settings:
temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
Prompts:
- Create svg image of a capybara wearing a kimono drinking matcha tea
- Create svg image of a steampunk owl repairing a pocket watch
- Create svg image of a flamingo knitting a colorful sweater
- Create svg image of a sushi roll wearing sunglasses driving a go-kart
- Create svg image of a Victorian-era robot reading a newspaper in a cafe
- Create a svg image of a time-lapse composite showing a flower blooming, wilting, and transforming into butterflies across four seasons, all in one frame with seasonal lighting
Stats:
- 3min 10s, 27.55 t/s
- 4min 35s, 27.05 t/s
- 3min 20s, 27.55 t/s
- 7min 2s, 27.27 t/s
- 7min 23s, 27.19 t/s
- 8min 24s, 27.13 t/s
balerion20@reddit
In the current situation the stats don't matter much, because you didn't give us your hardware, context size, framework… unless I missed it.
Usual-Carrot6352@reddit (OP)
24GB VRAM, I think it was 37 t/s in Open WebUI.
mission_tiefsee@reddit
Q6 quant on 24GB VRAM? Context must be close to 0. Do you run it with llama.cpp? Can you share your startup flags?
Usual-Carrot6352@reddit (OP)
Yes, I was trying to find the sweet spot using llama.cpp and ik_llama.cpp:
./build/bin/llama-server \
-m ~/.lmstudio/models/lmstudio-community/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q6_K.gguf \
-c 196608 -ngl 99 --no-mmap \
-fa on -ctk q8_0 -ctv q8_0 -np 1 \
-b 4096 -ub 1024 -t 8 -tb 16 \
--prio 2 --prio-batch 2 \
--temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 \
--presence-penalty 0.0 --repeat-penalty 1.0 \
--chat-template-kwargs '{"enable_thinking":true}' \
--host 127.0.0.1 --port 8081
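In that setup, -c 196608 asks for a 192K context, -ctk/-ctv q8_0 quantize the KV cache to 8-bit, and -fa on enables flash attention. For reference, a request like this reproduces the sampling settings from the top of the post (a sketch; llama.cpp's OpenAI-compatible endpoint accepts top_k and min_p as extensions to the standard fields):
curl http://127.0.0.1:8081/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Create svg image of a capybara wearing a kimono drinking matcha tea"}],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0
  }'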
But now I use Hermes, and these are my current setups, as I only installed it yesterday:
Good quality:
./build/bin/llama-server \
-m ~/.lmstudio/models/lmstudio-community/Qwen3.6-27B-GGUF/Abiray-Qwen3.6-27B-NVFP4.gguf \
-ngl 99 -c 131072 -t 8 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
-fa on \
--host 0.0.0.0 --port 8080 \
-np 2
Longest context length:
./build/bin/llama-server \
-m ~/.lmstudio/models/lmstudio-community/Qwen3.6-27B-GGUF/Abiray-Qwen3.6-27B-NVFP4.gguf \
-ngl 99 -c 262144 -t 8 \
--cache-type-k q4_0 --cache-type-v q4_0 \
-fa on \
--host 0.0.0.0 --port 8080 \
-np 2
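The q4_0 cache is what buys the doubled context in the second setup. A back-of-envelope check, using llama.cpp's KV-cache block sizes (these numbers are mine, not from the thread):
q8_0: 34 bytes per 32 cached elements ≈ 8.5 bits/element
q4_0: 18 bytes per 32 cached elements ≈ 4.5 bits/element
So the q4_0 cache is roughly half the size, and the same VRAM budget holds about twice the context, hence 131072 → 262144.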
mister2d@reddit
lol
zaidorx@reddit
A snake trying to solve the Y2K problem on a computer.
Qwen3.6-27B-FP8
It took a few minutes, but I did not measure it.
draconic_tongue@reddit
this is cute
genpfault@reddit
DonkeyBonked@reddit
I'm very disappointed in this for Qwen 3.6 35B A3B.
Q8 Quant, BF16 KV, 655,360 ctx (rope scale 2.5 x 262,144) on 4x RTX 3090
Temp = 1.0, Top P = 0.95, Top K = 40, Min P = 0.01, Presence Penalty 1.5, Repeat Penalty 1.0
My prompt... 😏😉😂
ElementNumber6@reddit
Oh dear lord. Why would anyone force an LLM to even imagine such a thing?
DonkeyBonked@reddit
You should have seen some of the others, they were... disturbing, to say the least.
Ok-Importance-3529@reddit
AutoRound quant: Qwen3.6-27B-Q2_K_MIXED.gguf
https://huggingface.co/sphaela/Qwen3.6-27B-AutoRound-GGUF
Ok-Importance-3529@reddit
Second try:
This one surprised me due to small details like the bike chain.
Ok-Importance-3529@reddit
Same AutoRound model, but the Q5_0 quant.
Ok-Importance-3529@reddit
Second try, same model.
Ok-Importance-3529@reddit
Using the recommended inference settings for tool calling from Qwen (temp 0.6, etc.).
I'm not an expert, but both seem better than the provided Q6_K. I'd also like to mention that AutoRound quants are more reliable and better behaved than any other llama.cpp quants I've used. I don't know why, but from my point of view the quantization method seems to preserve the original model better. That's based on my agentic-use evaluation through real coding sessions; the models are really reliable.
uti24@reddit
I have noticed that SVG is benchmaxxed somehow. I mean, LLMs probably just learned from existing SVG images, which are text.
I think it's a more interesting test to ask models to draw something on the canvas using JS.
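A minimal harness for that kind of test could be as simple as this sketch (the file names are made up, and you would paste the model's generated JS into draw.js):
cat > canvas_test.html <<'EOF'
<!doctype html>
<canvas id="c" width="512" height="512"></canvas>
<script src="draw.js"></script>
EOF
Then prompt the model to write JS that draws on the canvas with id "c", and open canvas_test.html in a browser.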
ElementNumber6@reddit
Then, to test, we can assemble 5 or so groups of ~100 terms, and choose from each randomly for every new release.
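A toy version of that in shell (the term files are hypothetical, one term per line; shuf prints a random line):
printf 'Create svg image of a %s %s %s\n' \
  "$(shuf -n 1 subjects.txt)" \
  "$(shuf -n 1 actions.txt)" \
  "$(shuf -n 1 settings.txt)"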
ZealousidealBadger47@reddit
10.71 tok/sec
Qwen 3.5 122b-a10b
Dany0@reddit
I have a feeling this falls more into general tasks and should be tested with the 1.0 temp thinking config?
BigYoSpeck@reddit
1.5 is where it gets interesting
foldl-li@reddit
I like its eyes.
m360842@reddit
Took your prompts, but I replaced "svg image" with "animated svg".
edsonmedina@reddit
Just tried it now! Pretty good!
Fusseldieb@reddit
Things like this give me hope. If a Q6 27B model gives this level of performance, one could probably train a 70B (or larger) 1.58-bit model that fits on the very same hardware and get ACTUAL Sonnet-level performance or even better.
No, I'm not talking about Q2, BitNet, or similar quantized models, which are "kinda" bad, but about something like this.
edsonmedina@reddit
Qwen3.6 35B A3B Q8_K_XL
5m11s
edsonmedina@reddit
Qwen3.6 35B A3B Q8_K_XL
2m8s
szansky@reddit
how about turbo realistic?
lit1337@reddit
These are endearing lol, grandmas woulda emailed the hell outta these in the early 2000s.
redballooon@reddit
Looks cartoonish
davernow@reddit
They are SVGs. Literally line drawings, not rasters.
BigYoSpeck@reddit
Interestingly, if you screenshot these and post the image back to Qwen, it says:
I'm going to wager you're responding to a bot
Kappa-chino@reddit
Yeah, more deeply than that, because I think the commenter above probably doesn't understand what an SVG is. These images were drawn with code; as in, each line in the image is expressed as a mathematical function that the model wrote out. It would take you like a week to make one of these by hand.
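Since an SVG is just text, a hand-written miniature shows what the model is actually emitting (this example is mine, not from the thread):
cat > demo.svg <<'EOF'
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="tan"/>
  <path d="M30 45 Q50 25 70 45" stroke="black" fill="none"/>
</svg>
EOF
The <path> element's quadratic curve is exactly the kind of mathematical function being described; the renderer evaluates it when the file is displayed.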
lombwolf@reddit
God damn the twins got fucking smoked by this
LegacyRemaster@reddit
All I need!
nikhilprasanth@reddit
Looks neat, I'll try some of these with 35B.
nikhilprasanth@reddit
Here is Qwen3.6-35B-A3B-UD-Q4_K_XL
Significant_Fig_7581@reddit
Can't wait for it to be as good as the 27B; Qwen4 35B is gonna be awesome.
nikhilprasanth@reddit