Try the new Z-Image-Turbo 6B (Runs on 8GB VRAM)!
Posted by KvAk_AKPlaysYT@reddit | LocalLLaMA | View on Reddit | 28 comments
Hey folks,
I wanted to try out the new Z-Image-Turbo model (the 6B one that just dropped), but I didn't want to fiddle with complex workflows or wait for specific custom nodes to mature.
So, I threw together a dedicated, clean Web UI to run it.
Has CPU offload too! :)
Check it out: https://github.com/Aaryan-Kapoor/z-image-turbo
Model: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
May your future be full of VRAM!
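If you'd rather script it than use the UI, a diffusers sketch along these lines should be close. This assumes the checkpoint loads through a diffusers-compatible pipeline (the repo wires things up its own way), so treat it as a starting point, not the app's exact code:

```python
import torch
from diffusers import DiffusionPipeline

# Assumption: the checkpoint ships diffusers-compatible (or custom) pipeline
# code; trust_remote_code lets any bundled pipeline classes load.
pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# CPU offload moves each submodule onto the GPU only while it runs, keeping
# peak VRAM low -- this is what makes 8GB (or less) workable.
pipe.enable_model_cpu_offload()

image = pipe(
    "a red fox in a snowy forest, golden hour",
    num_inference_steps=8,  # turbo-distilled models need only a few steps
    guidance_scale=1.0,     # distilled models typically run near CFG 1
).images[0]
image.save("fox.png")
```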
Traditional-Gap-3313@reddit
This is amazing. For my use cases, on a few quick tests it almost matches NanoBanana.
BTW, either the seed has almost no effect or my prompts are really specific. Maybe the latter. At first I thought your seed logic in the Colab was broken, but then I tried forcing a few different seeds and the differences are quite subtle.
https://imgur.com/a/ReKebqR
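For reference, here's a minimal sketch of forcing explicit seeds with a diffusers-style pipeline (assuming it accepts the standard `generator` argument; `pipe` is an already-loaded Z-Image-Turbo pipeline):

```python
import torch

# Render the same prompt under several explicit seeds; if the outputs barely
# differ, the model itself (not the UI's seed plumbing) is the cause.
prompt = "a lighthouse on a cliff at dusk"
for seed in (0, 42, 1234, 987654):
    gen = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, generator=gen, num_inference_steps=8).images[0]
    image.save(f"seed_{seed}.png")
```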
KvAk_AKPlaysYT@reddit (OP)
This is so cool! Could be a few factors: over-distillation, too many similar examples in the training set, or just a smaller model = less knowledge.
Thanks for sharing!
dtdisapointingresult@reddit
What does GGUF quantization do on an image model? On an LLM, it hallucinates more, loses coherency, etc. Does a heavily quantized image model draw abominations? Or just forget prompt details while still looking good?
diff2@reddit
How do you do image editing? It seems possible with the model according to the model card, but maybe your UI doesn't have an image-upload option?
KvAk_AKPlaysYT@reddit (OP)
This is for Z-Image-Turbo; the editing fine-tune is not out yet :(
ali0une@reddit
Looks nice. How about adding GGUF support so we could point the app to a z-image.gguf file instead of downloading the full model?
mpasila@reddit
It already exists: https://huggingface.co/jayn7/Z-Image-Turbo-GGUF
ali0une@reddit
You don't understand what I'm saying. I'm not asking for GGUF files.
mpasila@reddit
Oh yeah, I didn't notice they'd made a custom web UI for this. You can use it easily in ComfyUI, though.
FinBenton@reddit
There are GGUFs already. I'm using the FP8 version, which generates fairly high-resolution images on my 4090 in seconds, with LoRA support that also seems to work really well.
ali0une@reddit
No, I'm talking about GGUF support in OP's app.
mintybadgerme@reddit
Seconded.
OptiKNOT@reddit
I hope to run an image-generation model on my 4GB VRAM PC one day :/
KvAk_AKPlaysYT@reddit (OP)
It has CPU offload :) It runs really well with it too!
121507090301@reddit
I thought it would take longer for that. Thanks for the heads-up.
To anyone who wants to run it but doesn't know where to start: you'll first need the latest [ComfyUI](https://github.com/comfyanonymous/ComfyUI/) (on Ubuntu I just downloaded the latest release, extracted it, and ran it with 'python3 main.py --cpu' from the folder in a terminal). Then I installed [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF), [downloaded the models and the example workflow](https://huggingface.co/jayn7/Z-Image-Turbo-GGUF/tree/main) (the example workflow goes into user/default/workflows), and got the [clip from here](https://huggingface.co/unsloth/Qwen3-4B-GGUF/tree/main). The model goes into 'models/unet' and the clip into 'models/clip'.
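If you'd rather script the downloads, something like this works with huggingface_hub (the filenames below are placeholders; check each repo's file list for the quant you actually want):

```python
from huggingface_hub import hf_hub_download

# Fetch the GGUF weights straight into ComfyUI's model folders.
# Placeholder filenames -- browse each repo to pick your quant level.
hf_hub_download(
    repo_id="jayn7/Z-Image-Turbo-GGUF",
    filename="z-image-turbo-Q8_0.gguf",  # placeholder
    local_dir="ComfyUI/models/unet",
)
hf_hub_download(
    repo_id="unsloth/Qwen3-4B-GGUF",
    filename="Qwen3-4B-Q5_K_M.gguf",     # placeholder
    local_dir="ComfyUI/models/clip",
)
```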
GlobalLadder9461@reddit
Can we run it through Vulkan? Is there a starting guide for that?
Jan49_@reddit
I have a 4GB VRAM laptop too. SD1 runs really well, and SDXL works fine but is slow. I still have to find a way to run Z-Image-Turbo.
Jan49_@reddit
Got it running on my system now (4GB VRAM and 16GB RAM).
I'm using the ComfyUI workflow but changed the text encoder to unsloth/qwen3-4b-Q5_k_m.gguf via the ClipLoaderGGUF node. Works fine.
For the model I'm using the FP8-quantized model from t5b.
A 512x512 image with 9 steps and CFG 1.0 takes around 4 minutes.
A 720x720 image (same settings) takes around 6 minutes.
A 1024x1024 takes around 20 minutes... Not doing that again.
MatlowAI@reddit
If Nunchaku can get an SVDQuant out in the 4-bit neighborhood, you should be able to get away with 4GB without offload, if I'm thinking correctly.
twack3r@reddit
Thanks for sharing! Is there a way to select GPUs via CUDA device number?
As it stands, Z-Image shits the bed on Blackwell GPUs with catastrophic OOMs, so I need to be able to select one of my 3090s.
This is the case via ComfyUI as well as my own lightweight Gradio frontend: without GPU selection, it's non-trivial to bypass the Blackwell tech stack, which continues to be a pain in the ass compared to Ada and Ampere.
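A generic PyTorch workaround (not specific to this app) is to pin the process to one GPU before CUDA initializes; inside the process that card then shows up as cuda:0:

```python
import os

# Hide every GPU except physical device 1. This must happen before torch
# makes its first CUDA call, so set it before the torch import.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")  # "cuda" now resolves to the one visible card

# Alternative: skip the env var and address the card directly, e.g.
# pipe.to("cuda:1")
```

The same env var works when launching other frontends, e.g. CUDA_VISIBLE_DEVICES=1 python3 main.py.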
a_beautiful_rhind@reddit
Hilariously, Q8, BF16, and FP8 all perform exactly the same on a 3090. On FP16 it shoots up to over a minute per image. I don't get why FP16 doesn't work, as I've never seen a model fall over from slightly lower precision. I even looked through the inference code and tried to vibe-fix it, but nada.
fluecured@reddit
How much RAM does Z-Image-Turbo need? I tried it in ComfyUI with 12GB VRAM (3060) and 12GB RAM, but it froze my PC, including the mouse pointer and clock, and the log cut off at "Requested to load AutoencodingEngine". I had to manually restart the PC after several minutes. Do you think yours might run better, or do I have too little RAM to run it at all?
KvAk_AKPlaysYT@reddit (OP)
I'm also looking for work opportunities, so lmk if you got some open positions! I've gotten several AI projects from idea to prod :)
Shotafry@reddit
Me too, but more focused on cybersecurity.
ShreeyanxRaina@reddit
Wait, I have 8GB VRAM too; the FP16 model didn't run, but the FP8 did.
Emotional-Wall-645@reddit
Can this run on a Mac M1 Air with 8GB?
TheCTRL@reddit
I was running it at home yesterday and it’s simply incredible