I wonder how good the Qwen 3.6 4B will be, given the insane performance boost in the 27B and 36B
Posted by exaknight21@reddit | LocalLLaMA | 13 comments
I personally am a simpleton with crappy hardware. I still run Qwen 3 4B for my simple tasks and simple RAG. I personally cannot wait for the 4B Instruct model, as I believe it's my go-to "ChatGPT" replacement for dumb questions via OpenWebUI and vLLM.
I rock an old T5610: 64 GB DDR3, dual Xeon (sadly AVX-only) slow processors, a 256 GB SATA SSD, and an MI50 32 GB.
I run dockerized vLLM (nlzy's fork is archived, so I'm on the sweet mobydick branch). I run my in-home experiments with 8K context, usually cyankiwi's AWQ version, and it does wonders for me.
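For anyone curious what that setup looks like, here's a minimal sketch via vLLM's offline Python API. The AWQ checkpoint name is a placeholder, and the nlzy fork for MI50 cards may expose slightly different defaults:

```python
# Minimal sketch of the setup above via vLLM's offline Python API.
# The checkpoint name is a placeholder for cyankiwi's AWQ quant.
from vllm import LLM, SamplingParams

llm = LLM(
    model="cyankiwi/Qwen3-4B-Instruct-AWQ",  # hypothetical repo id
    quantization="awq",
    max_model_len=8192,          # the 8K context mentioned above
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Summarize retrieval-augmented generation in one paragraph."],
    params,
)
print(outputs[0].outputs[0].text)
```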
I pray the Qwen team releases this soon!
segmond@reddit
You have a 32GB GPU yet run a 4B? Why? You can clearly run the 36B model at Q6.
exaknight21@reddit (OP)
I use vLLM, only need 8K context, and have 2 beta users for my app as well. I can do a lot more, but I test with this every now and then. In testing, there isn't any sense in paying premium API prices if I can self-host the calls.
I will try the Qwen3.6:27B today at Q6 and see how far I can take the context with llama.cpp. I use Claude Code a lot, and this would be a banger.
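If it helps anyone, here's roughly what that experiment looks like through the llama-cpp-python bindings (the llama.cpp CLI works the same way; the GGUF path below is a placeholder):

```python
# Sketch of loading a Q6_K GGUF and probing a large context via the
# llama-cpp-python bindings. Model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./qwen3.6-27b-q6_k.gguf",  # hypothetical local file
    n_ctx=32768,       # push the context until VRAM runs out
    n_gpu_layers=-1,   # offload every layer to the GPU
)

out = llm("Q: What is 2+2?\nA:", max_tokens=16)
print(out["choices"][0]["text"])
```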
Blues520@reddit
I'm interested in self-hosting as well and am also looking at smaller models for lower latency. How do you go about self-hosting? Are you using a cloud instance or a bare-metal server, and if so, how do you expose it?
exaknight21@reddit (OP)
I have a VPS; on it I run dokploy plus a gateway. The VPS is connected to my home server via Tailscale.
I have API keys configured so only authorized clients can connect, such as my SaaS; an OpenWebUI instance would likewise need the endpoint (OpenAI-compatible) plus the API key. Works like a charm on my subdomain.
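Roughly, the client side is just the standard OpenAI SDK pointed at the subdomain. The domain, key, and model name below are placeholders:

```python
# Sketch of how a client (SaaS or OpenWebUI) would hit the gateway:
# the OpenAI-compatible API on a custom base_url plus the
# gateway-issued key.
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.example.com/v1",  # hypothetical subdomain
    api_key="sk-my-gateway-key",            # key issued by the gateway
)

resp = client.chat.completions.create(
    model="qwen3-4b-instruct-awq",  # whatever vLLM was launched with
    messages=[{"role": "user", "content": "Hello from behind Tailscale!"}],
)
print(resp.choices[0].message.content)
```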
Blues520@reddit
That is brilliant. Tysm! Going to give it a try
gh0stwriter1234@reddit
You still need VRAM for context, so in a practical sense you can run the 27B at Q6 with some decent context...
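A quick back-of-envelope sketch of why that works on a 32 GB card (the layer/head counts are assumptions for illustration, not the actual 27B config):

```python
# Back-of-envelope VRAM math for a 27B at Q6_K on a 32 GB card.
# Layer/head counts below are illustrative assumptions.
params_b = 27e9
q6k_bits = 6.56                               # approx bits/weight for Q6_K
weights_gb = params_b * q6k_bits / 8 / 1e9    # ~22 GB of weights

layers, kv_heads, head_dim = 48, 8, 128       # assumed architecture
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2  # K+V, fp16
ctx = 8192
kv_gb = kv_bytes_per_token * ctx / 1e9        # ~3 GB at 8K context

print(f"weights ~{weights_gb:.1f} GB, KV cache ~{kv_gb:.1f} GB at {ctx} ctx")
# ~22 GB + ~3 GB still leaves headroom on 32 GB, so 8K+ context is plausible.
```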
tamerlanOne@reddit
😁
putrasherni@reddit
There's even a 9B.
BothYou243@reddit
Where?
Insomniac1000@reddit
soon 🙏
Insomniac1000@reddit
I can't stop giggling. I'm truly excited for the 9B version.
I've been hesitant to pull the trigger on letting an AI assistant oversee my homelab. Maybe this is it.
CurrentNew1039@reddit
Yes, if it beats the 3.5 27B or 35B, I will jump through heaven.
Monad_Maya@reddit
Unlikely. The parameter count difference is way too large.