Recommendations for a tiered local AI setup? (5090 + Mini PC + Obsidian)
Posted by luigi029@reddit | LocalLLaMA | 8 comments
Hey everyone,
I’ve finally got my local media stack migrated from my NAS over to a new Mini PC running WSL2; separately I have my main gaming rig.
I now want to delve into the world of local AI models. Looking for a sanity check on my model choices and on how I’m tying everything together, as a bit of a self-hosting beginner.
The Hardware:
Mini PC: Intel Core Ultra 9 / 32GB RAM. This runs 24/7. It’s got Open WebUI, Kokoro for TTS, and SearXNG for quick web searches. Configured this with the help of Gemini, but I think I have a reasonable understanding of how it all ties together.
Gaming Rig: RTX 5090. I’m running Ollama natively here and connecting it to the Mini PC via Tailscale when I need the heavy lifting.
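Roughly, the wiring looks like this (the tailnet IP below is a placeholder for my rig’s Tailscale address; 11434 is Ollama’s default port):

```shell
# On the gaming rig: make Ollama listen on all interfaces, not just
# localhost, so the Mini PC can reach it over Tailscale.
# (On Windows, set this as a user environment variable instead.)
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

# On the Mini PC: point Open WebUI at the rig's tailnet address.
# 100.x.y.z is a placeholder for the rig's Tailscale IP.
export OLLAMA_BASE_URL=http://100.x.y.z:11434
```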
The Workflow:
I’m using SearXNG on the Mini PC for basic stuff, but planning to set things up so the deep-research path only triggers when I’m using the 5090. Is this worthwhile?
I’m also trying to get my Obsidian vault synced across everything using Syncthing. The goal is to use the vault as a local knowledge base in Open WebUI, so the AI actually has access to my personal notes.
Where I need help (Total newbie here):
5090 Models: With 32GB VRAM, what are your recommendations? I’ve been looking at Qwen 3.5 27B for speed, but is it worth trying to squeeze a quantized 70B on there, or will it just be painfully slow for daily use?
Mini PC Models: Since this is always on, I want a small model (under 12B) that’s smart enough for basic chat but won’t cook the CPU or make the fans go crazy. Preferably with the ability to web-search via SearXNG.
Obsidian: I’m totally new to this. What’s the best way to index a live Obsidian vault in Open WebUI? Is there a way to auto-index it as I add notes, or do I have to keep re-uploading files to the "Documents" section?
Syncthing: Is Syncthing reliable enough for an Obsidian vault, or am I going to wake up to a mess of "conflict files" if I edit on my phone and PC at the same time?
If I’m doing something totally "special" with this networking or setup, let me know. Otherwise I’d really appreciate suggestions.
Cheers!
mr_Owner@reddit
Gemma 4 e2b and qwen3.5 4b on minipc is solid, same here.
luigi029@reddit (OP)
I tried Gemma but haven't had the best of luck. On first use it just spouted Mandarin at me. Then I asked it a test question, to summarise the 2025 F1 season, and it started bringing up philosophy and citing some sort of literature.
Could it be I've got it configured wrong?
mr_Owner@reddit
Sounds like a params combo issue yeah
ai_guy_nerd@reddit
Squeezing a quantized 70B onto a 5090 is absolutely the right move. For daily use, a 4-bit or 5-bit quant of Llama 3 or Qwen 2.5 will run comfortably and provides a massive jump in reasoning over the smaller models. It won't be painfully slow, just slightly slower than a 7B, but the quality trade-off is worth it.
For the Mini PC, sticking to something under 8B is smart to avoid CPU heat. Llama 3.2 3B or Phi-3.5 Mini are great for basic chat and routing. If the goal is Obsidian integration, looking into the "Smart Connections" or "Text Generator" plugins is a good bet for linking the vault to the local API.
One other option for orchestration is OpenClaw if the goal is to turn that Mini PC into a persistent agent server rather than just a chat interface.
luigi029@reddit (OP)
Thanks mate, really appreciate the response. Good to know I'm not heading down the completely wrong road.
OpenClaw sounds interesting, as I've gone from being a Gemini Pro user to Phi-3.5 Mini. As impressive as these models are in the sense that they can run locally on a CPU, they're not quite what I'm after. One thing that concerns me with OpenClaw is the security side of things. Is this a big risk?
After a fair bit of research, these are the three I settled on for my 5090 setup:
huihui_ai/qwen3.5-abliterated:35b-a3b
huihui_ai/qwen2.5-coder-abliterate:32b
huihui_ai/deepseek-r1-abliterated:32b
Is there anything I need to know about using these, e.g. temperature settings or context usage in the Open WebUI settings? Any advice massively appreciated!
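For now I've been experimenting with baking parameters into a local variant via an Ollama Modelfile, rather than tweaking them per-chat in Open WebUI. A sketch of what I mean (the base tag and numbers are just my starting guesses, not recommendations):

```shell
# Modelfile: pin sampling params into a named local model variant.
# The base model tag and the values below are illustrative only.
cat > Modelfile <<'EOF'
FROM huihui_ai/qwen2.5-coder-abliterate:32b
PARAMETER temperature 0.7
PARAMETER num_ctx 16384
EOF

# Create the tuned variant; it then shows up in Open WebUI's model list.
ollama create qwen-coder-tuned -f Modelfile
```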
Glittering-Call8746@reddit
Why huihui...
luigi029@reddit (OP)
A friend said they're much better than the standard builds, whether for writing or just general coding, as the models don't waste time questioning what is or isn't allowed.
Whether my friend was talking out his a*** I do not know. More than open to other versions if you think they're better, or even just the standard ones?
Glittering-Call8746@reddit
https://www.reddit.com/r/LocalLLaMA/s/sXWN2UJdU9