Local AI - Ollama, Open WebUI, RTX 3060 12 GB

Posted by Apollyon91@reddit | LocalLLaMA

I am running Unraid (home server) with a dedicated GPU: an NVIDIA RTX 3060 with 12 GB of VRAM.

I also tried setting it up on my desktop through opencode. Both instances yield the same result.

I run the Paperless stack with some basic LLM models.

But I want to expand this and use other LLMs for other tasks as well, including some light coding.

But when running qwen3:14b, for example, which other Reddit posts suggest should be fine on 12 GB, it hammers the CPU as well: all cores max out alongside the GPU, and GPU utilisation seems low compared to how hard the CPU is being hit.
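For context, this is roughly how I've been checking whether the model actually fits in VRAM. A minimal sketch, assuming Ollama's default API on localhost:11434 and the size/size_vram fields its /api/ps endpoint reports (running `ollama ps` on the CLI shows the same split):

```python
import json
import urllib.request

# Ask Ollama which models are loaded and how much of each sits in VRAM.
# Assumes the default API endpoint at localhost:11434.
with urllib.request.urlopen("http://localhost:11434/api/ps") as resp:
    data = json.load(resp)

for model in data.get("models", []):
    size = model.get("size", 0)            # total bytes the model occupies
    size_vram = model.get("size_vram", 0)  # bytes resident on the GPU
    pct = 100 * size_vram / size if size else 0
    print(f"{model['name']}: {pct:.0f}% on GPU "
          f"({size_vram / 2**30:.1f} / {size / 2**30:.1f} GiB)")
```

My understanding is that if this reports a CPU/GPU split rather than 100% GPU, the weights plus KV cache are spilling out of the 12 GB and Ollama runs the remaining layers on the CPU, which would match what I'm seeing.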

Am I doing something wrong, did I miss some setting, or is there something I should be doing instead?