Qwen3 Omni AWQ released

kyazoglu@reddit

Can someone explain how this is 27.6 GB and AWQ? AWQ = 4-bit ~= (# of parameters / 2) GB. This should have been around 16 GB. What am I missing?

Oscylator@reddit

(# of parameters / 2) GB is a lower bound. You also have scales and biases for each tile. The elephant in the room is probably how parameter counts are reported. For multimodal models, only the "core" text-to-text transformer parameters are counted in the name; the adapters for other modalities are not included in that 30B.
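
To make the arithmetic above concrete, here is a rough size estimate. It is only a sketch: the group size of 128, the 16-bit scale and 4-bit zero point per group, and the 5B unquantized parameters in the second call are illustrative assumptions, not figures taken from this checkpoint.

```python
def awq_size_gb(n_quantized, n_unquantized=0, bits=4, group_size=128,
                scale_bits=16, zero_bits=4, unquantized_bits=16):
    """Estimate checkpoint size in GB (10**9 bytes).

    All defaults are assumptions for illustration:
    - each group of `group_size` weights stores one scale and one zero point
    - parameters left unquantized are stored at `unquantized_bits` each
    """
    # Per-weight cost: the 4-bit value plus its share of the group metadata.
    bits_per_weight = bits + (scale_bits + zero_bits) / group_size
    total_bits = n_quantized * bits_per_weight + n_unquantized * unquantized_bits
    return total_bits / 8 / 1e9


# Lower bound vs. estimate with per-group overhead for 30B quantized weights.
print(f"naive 4-bit:       {30e9 / 2 / 1e9:.2f} GB")
print(f"with group data:   {awq_size_gb(30e9):.2f} GB")

# Hypothetical: 5B extra parameters (e.g. other-modality adapters) kept at 16-bit.
print(f"plus 5B at 16-bit: {awq_size_gb(30e9, n_unquantized=5e9):.2f} GB")
```

The per-group metadata adds only a few percent; any parameters kept at 16-bit add 2 GB per billion, which is where a gap of this size would more plausibly come from.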

No_Information9314@reddit (OP)

Yeah, that is curious. Looks like the thinking model is closer to the expected size https://huggingface.co/cpatonn/Qwen3-Omni-30B-A3B-Thinking-AWQ-4bit/tree/main

ninjaeon@reddit

Thank you for this. I tried it on 16 GB of VRAM and it failed; my console log said "model weights take 19.16GiB". So I guess 24 GB of VRAM is the minimum.
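
A quick way to sanity-check the numbers above: the log reports GiB (2**30 bytes), which is slightly larger than GB (10**9 bytes). The sketch below converts the reported figure and checks it against a card size. The 2 GiB of headroom for KV cache and activations is an assumed placeholder, not a measured value.

```python
GIB = 2**30   # bytes in one gibibyte
GB = 10**9    # bytes in one gigabyte


def gib_to_gb(gib):
    """Convert GiB to GB."""
    return gib * GIB / GB


def fits_in_vram(weights_gib, vram_gib, headroom_gib=2.0):
    """True if weights plus an assumed headroom fit in the given VRAM."""
    return weights_gib + headroom_gib <= vram_gib


weights_gib = 19.16  # figure reported in the console log above
print(f"{weights_gib} GiB = {gib_to_gb(weights_gib):.2f} GB")
for vram in (16, 24):
    print(f"{vram} GiB card: fits = {fits_in_vram(weights_gib, vram)}")
```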

kapitanfind-us@reddit

Did you compile it yourself, or are you using the Docker image? (Asking because the nightly Docker image does not work here.)

ninjaeon@reddit

I compiled it myself, following the guide in the model card (vLLM).

Hot_Turnip_3309@reddit

Just tried it on vLLM, and it didn't work. Any luck?

alew3@reddit

Use a nightly Docker image, so you don't need to build the whole project (which takes a few hours).

the__storm@reddit

It's not merged, so I don't think the nightly Docker image is going to work (although please let me know if I'm wrong and you've had success). There's a prebuilt wheel (.whl), though: https://huggingface.co/cpatonn/Qwen3-Omni-30B-A3B-Instruct-AWQ-4bit/discussions/1

Mr_Moonsilver@reddit

You need to build vLLM from source. Check cpatonn's Hugging Face page for this model; there's a command there.

No_Conversation9561@reddit

Does vLLM work on Mac?

Mr_Moonsilver@reddit

No

SOCSChamp@reddit

Has anyone successfully used this for speech-to-speech streaming, in real time or near real time? I can't be alone in seeing this as my main use case for an omni model. Or is the juice not worth the squeeze until vLLM audio generation support arrives?

NoobLife360@reddit

Thank you for your hard work, I really appreciate it. Did anyone get it working? I followed the original Omni instructions and got the full model to work, but I could not get the AWQ version to work after loading.

BallsMcmuffin1@reddit

China single-handedly saving us from AI tyranny

Popular_Brief335@reddit

Rofl

ApprehensiveAd3629@reddit

How can I use AWQ models?

this-just_in@reddit

Use an inference engine that supports AWQ, most commonly vLLM or SGLang.
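
As a rough illustration of the vLLM route, the sketch below only assembles a `vllm serve` command line; it does not start anything. The model ID is the one linked earlier in this thread; the context length and memory fraction are placeholder values, and `build_vllm_serve_cmd` is a hypothetical helper. Per other comments here, this particular model may additionally need a vLLM build from source or the prebuilt wheel.

```python
import shlex


def build_vllm_serve_cmd(model, max_model_len=8192, gpu_memory_utilization=0.90):
    """Assemble an argv list for `vllm serve` (hypothetical helper).

    The default values are placeholders; tune them for your GPU.
    """
    return [
        "vllm", "serve", model,
        "--max-model-len", str(max_model_len),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


cmd = build_vllm_serve_cmd("cpatonn/Qwen3-Omni-30B-A3B-Instruct-AWQ-4bit")
print(shlex.join(cmd))  # copy into a shell, or pass `cmd` to subprocess.run
```

Once a server like this is running, it exposes an OpenAI-compatible HTTP API, so any OpenAI-style client can talk to it.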

YouDontSeemRight@reddit

Does Transformers support it? And can Transformers split a model across multiple GPUs and CPU RAM?

exaknight21@reddit

Hot damn. This is nice. Very nice.

this-just_in@reddit

Really appreciate all the work this guy puts into making these high quality quants.
