For those of you running vLLM locally for inference, what quantizations do you use?

Posted by Limp_Classroom_2645@reddit | LocalLLaMA | 4 comments

Right now I'm running llama.cpp on Ubuntu with an RTX 3090, but I would like to test Qwen3.6 35B A3B on vLLM. AFAIK, vLLM's GGUF support is not great, and there are many other quantization formats out there. So what types of quants should I use with vLLM for models like Qwen3.6 35B A3B and other MoE models?

4 Comments

JockY@reddit

kivaougu@reddit

Limp_Classroom_2645@reddit (OP)

Formal-Exam-8767@reddit