Local AI - Ollama, Open WebUI, RTX 3060 12 GB
Posted by Apollyon91@reddit | LocalLLaMA | 7 comments
I am running Unraid (home server) with a dedicated GPU: an NVIDIA RTX 3060 with 12 GB of VRAM.
I also tried setting it up on my desktop through opencode. Both instances yield the same result.
I run the paperless stack with some basic LLM models.
But I wanted to expand this and use other LLMs for other things as well, including some light coding.
But when running qwen3:14b, for example, which other Reddit posts suggest should be fine on this card, it seems to hammer the CPU as well: all cores are in use alongside the GPU, yet GPU utilisation looks low compared to how hard the CPU is being hit.
Am I doing something wrong, did I miss some setting, or is there something I should be doing instead?
reviews4weed@reddit
If you exceed GPU RAM, Ollama spills over to the CPU. Make sure your GPU drivers and configuration are good. This will happen with any model that grows beyond your GPU memory.
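One way to confirm whether the model plus its KV cache actually fits in the 12 GB is to watch VRAM while a prompt is running. A minimal sketch (the model tag matches the one from the post; the in-REPL parameter is one way Ollama exposes context size, which is a major driver of VRAM use):

```shell
# Watch VRAM and GPU utilisation once per second while generating.
# If memory.used hits the 12 GB ceiling, Ollama keeps the remaining
# layers on the CPU, which matches the "all cores busy" symptom.
nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv -l 1

# A shorter context can keep everything on the GPU. Inside an
# `ollama run qwen3:14b` session, context can be reduced with:
# /set parameter num_ctx 4096
```

Smaller quantizations of the same model are the other lever if reducing context is not enough.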
Ollama is great for simplicity and for keeping a big cloud model as a fallback. I switched to gemma4:e2b on my 12GB server and it's been good locally.
Apollyon91@reddit (OP)
It seems to happen with every model I use. Drivers are up to date and working.
reviews4weed@reddit
When you run `ollama ps`, does it show 100% CPU or a GPU percentage?
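For reference, `ollama ps` reports where each loaded model is placed in its PROCESSOR column. A sketch of what the split looks like (the ID, size, and percentages below are illustrative, not from the poster's machine):

```shell
# Show loaded models and their CPU/GPU placement.
ollama ps
# NAME         ID            SIZE    PROCESSOR          UNTIL
# qwen3:14b    bdbd181c33f2  10 GB   43%/57% CPU/GPU    4 minutes from now
```

Anything other than "100% GPU" means part of the model is being evaluated on the CPU, which would explain the core usage described above.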
suicidaleggroll@reddit
Ollama does this regularly. Switch to another inference engine, literally anything is better than Ollama.
Apollyon91@reddit (OP)
So llama.cpp would be better in this case? Or is there another good choice?
suicidaleggroll@reddit
Yes llama.cpp would be better than Ollama in this, and every other case. vLLM would also work, or LM Studio, or SGLang.
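If trying llama.cpp, its `llama-server` binary gives explicit control over GPU offload. A minimal sketch, assuming a downloaded GGUF file (the filename and port are illustrative):

```shell
# -ngl sets the number of layers offloaded to the GPU (99 ~= "all"),
# -c sets the context size; both directly affect VRAM usage.
llama-server -m ./qwen3-14b-q4_k_m.gguf -ngl 99 -c 8192 --port 8080
```

llama-server exposes an OpenAI-compatible API under `/v1`, so Open WebUI can be pointed at `http://localhost:8080/v1` instead of the Ollama endpoint.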
Apollyon91@reddit (OP)
Thanks. Will give that a try