I want to run Qwen3.5 27B Q4_K_M on CPU, and I need help.

Posted by Personal_Storage_876@reddit | LocalLLaMA | 17 comments

I am a local LLM beginner and I found this subreddit while looking for help. (Please understand that I am unfamiliar with Reddit.)

(System: i5-4440 1.8GHz / B85M-DS3H / 32GB DDR3 / 128GB SSD / Ubuntu 25.10 Questing)

I loaded Qwen3.5 27B Q4_K_M into a CPU-only build of llama.cpp with the options shown in the photo, and less than 1GB of RAM remained free.

However, when I loaded the same model into a Vulkan build of llama.cpp with -ngl 0, using an RX 570 8GB, 8GB of RAM remained free. (VRAM usage was about 1.8GB.)

When I loaded Qwen3.5 27B IQ4_XS on the CPU, 10GB of RAM remained free. I am currently using IQ4_XS and have no complaints about its quality so far, but I am curious why this happens with Q4_K_M.
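As a rough sanity check on the size gap between the two quants alone, one can estimate the weight footprint from approximate bits-per-weight figures commonly cited for llama.cpp quant types (roughly 4.85 bpw for Q4_K_M and roughly 4.25 bpw for IQ4_XS; both figures are approximations, and the 27B parameter count is taken at face value here):

```python
# Rough estimate of the RAM taken by the quantized weights of a 27B model
# under two llama.cpp quant types. The bits-per-weight values below are
# approximate community figures, not exact: Q4_K_M ~4.85 bpw, IQ4_XS ~4.25 bpw.
PARAMS = 27e9      # assumed parameter count
GIB = 1024 ** 3

def weight_gib(bits_per_weight: float, params: float = PARAMS) -> float:
    """Size in GiB of the quantized weights alone (no KV cache, no buffers)."""
    return params * bits_per_weight / 8 / GIB

q4_k_m = weight_gib(4.85)
iq4_xs = weight_gib(4.25)
print(f"Q4_K_M  ~= {q4_k_m:.1f} GiB")   # about 15.2 GiB
print(f"IQ4_XS  ~= {iq4_xs:.1f} GiB")   # about 13.4 GiB
print(f"difference ~= {q4_k_m - iq4_xs:.1f} GiB")
```

The roughly 2 GiB gap between the quants explains only part of the difference in free memory; the rest likely comes down to how each build allocates or memory-maps the model file and its compute buffers, so the numbers reported by tools like `free` can diverge further.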