For those of you running vLLM locally for inference, what quantizations do you use?
Posted by Limp_Classroom_2645@reddit | LocalLLaMA | 4 comments
Right now I'm running llama.cpp on Ubuntu with an RTX 3090, but I would like to test Qwen3.6 35B A3B on vLLM. AFAIK, vLLM's GGUF support is not great, and there are so many other quantizations out there. What types of quants should I use with vLLM for models like Qwen3.6 35B A3B and other MoE models?
4 Comments
__JockY__@reddit
kivaougu@reddit
Limp_Classroom_2645@reddit (OP)
Formal-Exam-8767@reddit