Best home hardware for an AI rig
Posted by maofan@reddit | LocalLLaMA | 22 comments
I'm currently spending £90 a month with Anthropic and am thinking of going to the next tier, which is £200; it's the same price whether I stick with Anthropic or go for Codex or similar. I can buy an RTX 3090 24GB card, and I already have an RTX 4070 12GB card. I'm currently running a desktop with 64GB RAM and an AMD Ryzen 7 9700X.
| Model | 36GB VRAM experience |
|---|---|
| Qwen 3.5 Coder (35B) | Fits 100% on GPU with a huge 32k context |
| Llama 4 (70B) | Fits ~80% on GPU; small spill to 64GB system RAM |
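A rough back-of-envelope way to sanity-check those rows (the bits-per-weight and KV-cache numbers below are assumptions for illustration, not measurements):

```python
# Rough VRAM estimate for a quantized model plus KV cache.
# All constants here are illustrative assumptions, not measurements.

def vram_needed_gb(params_b: float, bits_per_weight: float = 4.5,
                   context: int = 32_768, kv_gb_per_8k: float = 1.0) -> float:
    """Approximate VRAM in GB: quantized weights + a crude KV-cache term."""
    weights_gb = params_b * bits_per_weight / 8      # e.g. 35B @ ~4.5 bpw ≈ 20 GB
    kv_cache_gb = (context / 8_192) * kv_gb_per_8k   # very rough KV-cache scaling
    return weights_gb + kv_cache_gb

for params in (35, 70):
    need = vram_needed_gb(params)
    print(f"{params}B @ ~Q4: ~{need:.0f} GB needed vs 36 GB available "
          f"-> {'fits' if need <= 36 else 'spills to system RAM'}")
```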
I'm thinking I could stay on the 5x tier and spend 7-8 months' worth of subscription on an RTX 3090. If that goes well, I could sell my 4070 and get another RTX 3090 and a new power supply!
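As a sanity check on that maths, a quick sketch of the payback period (card price and power cost below are assumptions, not quotes):

```python
# Quick payback arithmetic: used RTX 3090 vs upgrading to the £200/month tier.
# All prices are assumptions for illustration.

current_tier = 90        # GBP/month, current Anthropic plan
next_tier = 200          # GBP/month, next tier up
used_3090 = 650          # GBP, assumed used RTX 3090 price (~7 months of the current tier)
extra_power = 15         # GBP/month, assumed extra electricity for the card

monthly_saving = (next_tier - current_tier) - extra_power   # avoided upgrade cost
print(f"Card pays for itself vs the upgrade in ~{used_3090 / monthly_saving:.1f} months")
```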
My workflow is usually "opus" for planning and "sonnet" for execution. For anyone who has done this jump: could I get close to Sonnet reasoning with 36GB, or would I need to go the whole way up to 48GB?
Is it even worth it? With models improving all the time, I'm wondering if more and more memory will be required.
CautiousStudent6919@reddit
I really like my AMD R9700 AI Pro. 32GB VRAM is great, it runs everything well, and I'd honestly probably get another.
OddDesigner9784@reddit
What's the tokens/sec on it? Seriously considering this, but if it's 2-3x slower than a 5090 it's probably not worth it.
CautiousStudent6919@reddit
Getting over 3000 t/s prompt processing and 110 t/s generation on Qwen3.6-35B-A3B at 256k context.
About 2000 t/s prompt processing and 40 t/s generation with Qwen3.6-27B.
which.. imho, is totally fine.
OddDesigner9784@reddit
I have a 9070 XT with only half the VRAM, but my results are really similar, I would guess on a much lower quant lol. I'm just hesitant because MoE models are fast but the dense ones are slow, and I want to be able to run dense fast enough. Idk if RAM is as important as speed now that the 35B and 27B options are killer, and I want 27B fast.
gh0stwriter1234@reddit
It can run the 3.6 27B at 25 t/s... probably a lot faster with some kind of n-gram speculative decoding.
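For anyone wondering what n-gram speculative decoding actually does, a toy sketch of the idea (the `model_verify` call stands in for a real batched forward pass; this isn't any particular library's API):

```python
# Toy sketch of n-gram (prompt-lookup) speculative decoding: draft a continuation
# by matching the latest n-gram against earlier context, then verify with the model.

def ngram_draft(tokens: list[int], n: int = 3, max_draft: int = 8) -> list[int]:
    """Propose draft tokens by finding an earlier occurrence of the last n-gram."""
    if len(tokens) < n:
        return []
    tail = tokens[-n:]
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i:i + n] == tail:
            return tokens[i + n:i + n + max_draft]   # reuse what followed last time
    return []

def speculative_step(tokens, model_verify):
    """One decoding step. `model_verify(tokens, draft)` is a stand-in returning the
    model's next token at each drafted position (done in ONE forward pass in practice)."""
    draft = ngram_draft(tokens)
    model_tokens = model_verify(tokens, draft)        # len(draft) + 1 predictions
    accepted = []
    for d, m in zip(draft, model_tokens):
        accepted.append(m)
        if m != d:                                    # first mismatch: keep model token, stop
            break
    else:
        accepted.append(model_tokens[len(draft)])     # all drafts matched: take bonus token
    return tokens + accepted
```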
jacek2023@reddit
"Is it even worth it? With models improving all the time, I'm wondering if more and more memory will be required."
Local models are smaller, not bigger. In 2023 you would need to run a 70B model; now you can run a 35B MoE model (faster, with less VRAM used). Additionally, I purchased 3090s and cheap 128GB RAM in 2024/2025, and today 128GB RAM is extremely expensive.
gh0stwriter1234@reddit
This is not entirely true. The larger models will STILL have more world information in them even as models become better overall; you can't just say the smaller models are better in all cases.
Due-Function-4877@reddit
FWIW, 32k context isn't "huge"; it's tiny.
itsmetherealloki@reddit
I’m personally in the middle of moving from Opus to Gemma 4 26b4a at Q5 on a 3090. Built a system prompt, skills and tools so it can do everything Claude can do for me. It’s actually working swimmingly for me right now. I have a few more tools to add, but it’s already taking 50% of the workload; when I’m done later this week it should handle 90-95%.
I’m not saying Gemma 4 is just an easy swap from Opus or that it’s as smart. But it is nearly as capable if you give it the tools it needs, such as memory, RAG, and doc creation and editing.
Local models are now capable enough with tools and context to do most of what we need the frontiers to do for us. At least at the personal level.
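For a sense of what that tool wiring looks like in practice, a minimal sketch against a local OpenAI-compatible endpoint (llama.cpp's llama-server, Ollama, etc.); the model name, port and the tool itself are illustrative, not taken from the comment above:

```python
# Minimal sketch of giving a local model a "tool" via an OpenAI-compatible server.
# Model name, port and the tool are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "save_memory",
        "description": "Persist a note for later retrieval",
        "parameters": {
            "type": "object",
            "properties": {"note": {"type": "string"}},
            "required": ["note"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gemma-4-26b",   # whatever the local server has loaded
    messages=[{"role": "user", "content": "Remember that deploys happen on Fridays."}],
    tools=tools,
)

# If the model decided to call the tool, inspect the call and execute it yourself.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```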
adeadfetus@reddit
This blows my mind when I read comments like this. I have 2x3090 and have tried lots of models, quants, and tuning parameters and in general they don’t hold a candle to the frontiers. They’ve gotten better the last few years sure, but still not even 75%. What are your use cases?
itsmetherealloki@reddit
Well, for one, I’m building my own k8s-based agentic platform where the agents use Gitea as their source of truth for docs, can write code to Gitea, and auto-deploy via CI/CD. My builders use the same Gemma 4 model but at Q3 so I can run many at one time on my RTX PRO 4000 Blackwell. Those guys write great Go, Python and IaC for me with very little trouble.
rorowhat@reddit
Strix Halo
ai_guy_nerd@reddit
The gap between a 70B model on 36GB (with some spill to RAM) and a top-tier proprietary model like Sonnet isn't just about memory. It's mostly about the training data and the RLHF. You can run a great model, but you won't "get close" to Sonnet's reasoning just by adding more VRAM.
That said, 48GB is a much safer floor for 70B models if you want to avoid the massive performance hit of system RAM offloading. If speed is a priority, the jump to 48GB or more is worth it. Otherwise, sticking with a smaller, highly optimized model like a 30B-range Coder might actually give a better experience than a struggling 70B.
Some local orchestration layers like OpenClaw or similar can help manage different models for different tasks, but the raw hardware limit is the real bottleneck here.
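To put rough numbers on that offload penalty, a sketch of how many layers of a ~70B model fit in different VRAM budgets (layer count and per-layer size are assumptions; the result maps loosely onto llama.cpp's `--n-gpu-layers` setting):

```python
# Rough sketch: how many layers of a ~70B model fit in a given VRAM budget.
# Layer count and per-layer size are illustrative assumptions.

total_layers = 80                 # typical for a ~70B dense model
weights_gb_q4 = 40.0              # assumed Q4 weight footprint for ~70B
per_layer_gb = weights_gb_q4 / total_layers

for vram_budget in (24, 36, 48):
    usable = vram_budget - 4      # leave headroom for KV cache / buffers (assumption)
    on_gpu = min(total_layers, int(usable / per_layer_gb))
    print(f"{vram_budget} GB VRAM -> ~{on_gpu}/{total_layers} layers on GPU, "
          f"{total_layers - on_gpu} offloaded to system RAM")
```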
Monad_Maya@reddit
Try out smaller LLMs by loading some credit on https://openrouter.ai/. If the models are good enough, buy the hardware.
If the smaller LLMs are not that great for your use case, then try MiniMax, GLM, etc. subscriptions.
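A minimal sketch of that kind of trial run, assuming OpenRouter's OpenAI-compatible endpoint (the model slug below is illustrative; check the site for current IDs and pricing):

```python
# Minimal sketch of trying a small model on OpenRouter before buying hardware.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="qwen/qwen3-coder",   # swap in whatever small model you want to evaluate
    messages=[{"role": "user", "content": "Refactor this function to be iterative: ..."}],
)
print(resp.choices[0].message.content)
```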
Personally, Qwen 3.6 27B is not at the level of Sonnet let alone Opus in my admittedly limited time with it.
Joozio@reddit
Before committing to the 3090, think about what you actually want local. I run a paid tier plus a Mac Mini; the Mini handles a 35B for classify-and-route while the paid tier gets the heavy lifting. Qwen 3.5 Coder 35B on your 4070 plus a 3090 should work fine. Worth knowing: Opus 4.7 burns roughly 80x the requests of 4.6 for the same task, so weekly caps blow up fast either way. Local on cheap calls, paid on the rare hard ones. Still tweaking the split for how I work, though.
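A minimal sketch of that classify-and-route split (endpoints, model names and the routing prompt are all illustrative assumptions, not the commenter's actual setup):

```python
# Sketch of classify-and-route: a local model decides whether a request is easy
# enough to handle locally or should go to a paid frontier endpoint.
from openai import OpenAI

local = OpenAI(base_url="http://localhost:8080/v1", api_key="none")  # e.g. llama-server
paid = OpenAI()  # any OpenAI-compatible frontier endpoint; swap in your provider

def route(task: str) -> str:
    verdict = local.chat.completions.create(
        model="qwen3.5-coder-35b",   # whatever the local box is serving
        messages=[{"role": "user",
                   "content": f"Answer EASY or HARD only. Is this task easy?\n\n{task}"}],
    ).choices[0].message.content.strip().upper()

    client, model = (local, "qwen3.5-coder-35b") if "EASY" in verdict else (paid, "frontier-model")
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": task}],
    ).choices[0].message.content

print(route("Rename this variable across the file: ..."))
```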
Recent-Success-1520@reddit
If your use case is coding, local models won't be as good, as fast, or have as long a context.
SexyAlienHotTubWater@reddit
How much do you earn per month? How much would you earn if you could do twice as much work? I would bet it's significantly more than $200 more per month.
The productivity gains from the $200/month plan are massive. If you're projecting based on current pricing, local can't really compete. This may change in the future (I think it will; there aren't enough GPUs), but IMO that's the only reason to go local if you aren't worried about keeping your data private.
maofan@reddit (OP)
It's a really good question. Right now I earn £0 per month, as I've lost my job and am looking to start a business rather than going straight back into employment. So without a regular income I'm trying to balance cost against raw power. On the "5x" plan I rarely run out of tokens (but I do sometimes), so I don't really need the jump up to the next tier (10x or 20x); I'm thinking of using a local model to supplement it.
Miserable-Dare5090@reddit
The new Qwen3.6 27B seems like the answer here for your 24+12 setup.
maofan@reddit (OP)
I don't actually have the setup yet, but I'm considering it and working out if it's worth the investment. Will check this model out!
EatTFM@reddit
You can probably get a workstation setup for ~$2500 by purchasing refurbished parts. That's what I paid in Q1 2025 for dual RTX 3090s, a 12-core Xeon, 192GB DDR4 RAM, 2x NVMe and 2x SATA SSDs.
korino11@reddit
New AMD 9950X3D2, a top CPU for AI work: https://www.phoronix.com/review/amd-ryzen-9950x3d2-linux/8