For those of you running vLLM locally for inference, what quantizations do you use?

Posted by Limp_Classroom_2645@reddit | LocalLLaMA | 4 comments

Right now I'm running llama.cpp on Ubuntu with an RTX 3090, but I would like to test Qwen3.6 35B A3B on vLLM. AFAIK, vLLM's GGUF support is not great, and there are so many other quantizations out there. What types of quants should I use with vLLM for models like Qwen3.6 35B A3B and other MoE models?