GMKtec EVO-X2 70B expectation

Posted by Non-Technical@reddit | LocalLLaMA | View on Reddit | 17 comments

I would like to use a 70B model on a GMKtec EVO-X2 AI Mini PC 128GB.

Selected this one: Llama-3.3-70B-Instruct-Q4_K_M.gguf

Ubuntu 24.04 LTS, with llama.cpp server compiled for gfx1151. Set the GRUB kernel parameter ttm.pages_limit=26214400 so ~100GB of the unified memory is available to share with the GPU. All of the layers are offloaded to the GPU.
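For reference, a minimal sketch of that setup. The GRUB edit and cmake flags below assume a stock Ubuntu 24.04 + ROCm install and current llama.cpp build options; exact paths on your machine may differ:

```shell
# 1. Let the GPU map ~100GB of unified memory (26214400 x 4KB pages).
#    Add ttm.pages_limit=26214400 to GRUB_CMDLINE_LINUX_DEFAULT
#    in /etc/default/grub, then apply it:
sudo update-grub && sudo reboot

# 2. Build llama.cpp with the HIP backend for the gfx1151 (Strix Halo) GPU.
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

# 3. Serve the model with all layers offloaded to the GPU.
./build/bin/llama-server -m Llama-3.3-70B-Instruct-Q4_K_M.gguf -ngl 99
```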

I'm getting 5.25 predicted tokens per second, which is a bit slower than what I've read elsewhere. Is that normal?
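For a sanity check, decode speed on a dense model is roughly memory-bandwidth bound: each generated token reads approximately the whole weight file once. A back-of-envelope estimate, assuming (these numbers are not from the post) a ~42 GB Q4_K_M file and ~256 GB/s peak LPDDR5X bandwidth on the EVO-X2 with 60-80% of that achievable in practice:

```python
# Back-of-envelope decode-speed ceiling for a 70B Q4_K_M model on Strix Halo.
# Assumed figures: ~42 GB model file, ~256 GB/s peak unified-memory bandwidth.
MODEL_GB = 42.0
PEAK_BW_GBPS = 256.0

def tokens_per_second(efficiency: float) -> float:
    # Each decoded token streams roughly the full weight file from memory,
    # so decode speed ~= effective bandwidth / model size.
    return PEAK_BW_GBPS * efficiency / MODEL_GB

print(f"theoretical ceiling: {tokens_per_second(1.0):.1f} tok/s")
print(f"at 70% efficiency:   {tokens_per_second(0.7):.1f} tok/s")
```

Under those assumptions the ceiling is about 6 tok/s and a realistic figure is 4-5 tok/s, so 5.25 tok/s would be in the expected range rather than slow.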