RTX 3090 + 27B model performance issues (llama.cpp): what am I doing wrong?

Posted by Clean_Initial_9618@reddit | LocalLLaMA

Hey folks — looking for some advice on improving my local LLM setup (and also exploring agentic coding workflows).

Current setup:
- GPU: RTX 3090 (24 GB VRAM)
- Model: Qwen3.6-27B, Q6_K GGUF
- Backend: llama.cpp (llama-server) on Windows, 16 threads, 64K context

Issue:
Responses are really slow, and sometimes it just starts producing errors or low-quality output. Feels like something’s not tuned right or I’m pushing the hardware too far.

Current command:

llama-server.exe -m "C:/models/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q6_K.gguf" -ngl 99 -c 65536 -np 1 -fa 1 -ctk q8_0 -ctv q8_0 -b 1024 -ub 256 -t 16 --no-mmap --temp 0.6 --top-p 0.95 --top-k 20 --reasoning on --host 0.0.0.0 --port 8080 --metrics --slots --props
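
One thing I suspect but can't confirm: a Q6_K quant of a ~27B model is already around 22 GB of weights on its own, and a 65536-token KV cache adds more on top even at q8_0, so it may simply not fit in the 3090's 24 GB and something is spilling into system RAM. A lower-memory variant I've been meaning to try (same flags otherwise; the Q4_K_M filename is just my guess at what the repo provides):

llama-server.exe -m "C:/models/Qwen3.6-27B-GGUF/Qwen3.6-27B-Q4_K_M.gguf" -ngl 99 -c 16384 -np 1 -fa 1 -ctk q8_0 -ctv q8_0 -b 1024 -ub 256 -t 16 --no-mmap --temp 0.6 --top-p 0.95 --top-k 20 --reasoning on --host 0.0.0.0 --port 8080 --metrics --slots --props

If that does fit, I'd expect the load log to say something like "offloaded 99/99 layers to GPU" and nvidia-smi to show VRAM nearly full but not overflowing; is that the right way to check?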

What I’m trying to figure out:
- Why generation is this slow on a single 3090 with the command above
- Whether any of those flags (quant, context size, KV cache settings) are badly chosen for 24 GB of VRAM
- What could be causing the occasional errors and low-quality output

Also curious about:
I’m trying to get into more agentic coding workflows locally (multi-step reasoning, tool use, etc.).
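
For the agentic side, my understanding is that llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint, so tool use should mostly be a matter of sending a standard tools array (I've read that recent builds also want --jinja passed to the server so tool calls get parsed from the model's chat template, but I haven't verified that). Rough sketch of what I mean, with get_weather and the request body file just as placeholders:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d @tools.json

where tools.json is something like:

{
  "messages": [
    {"role": "user", "content": "What's the weather in Berlin right now?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ]
}

If the model decides to use the tool, the response should contain a tool_calls entry instead of plain text. Has anyone got a workflow like this running reliably with a 27B on a single 3090?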

Would really appreciate any tips, configs, or examples from people running similar hardware. Thanks in advance for all your advice and help.