What is the best ai i can run locally on my rtx 5070
Posted by Interesting-Pop-7391@reddit | LocalLLaMA | View on Reddit | 8 comments
specs
9800x3d
32g ddr5
rtx 5070
themule71@reddit
Depends on the job. 12GB is a huge limitation. You can't run the hot MoE models of the month (Qwen3.6 35B A3E and Gemma4 26B A4E) without serious compromises on quantization (quality of results), context size (type of tasks) or speed (offloading to RAM kills performance; you can't interact, you basically enter a message and come back minutes later).
I find that the stricter the constraints, the more you have to experiment with models and configurations yourself; it all depends on your needs and expectations. If your workload allows for (or even encourages) long waits, you can afford to run off RAM, and in the < 5 t/s realm you can go with Qwen3.6 35B A3E or Gemma4 26B A4E. I'm experimenting on a remote PC with 32GB of otherwise unused RAM (bought when RAM was cheap); it would be sitting there doing nothing anyway. I'm running Qwen3.6 35B A3E Q5_K on it. It's relatively fast, as snails go, but I find it overthinks a bit too much, which is annoying when you're watching the output arrive at 3 or 4 t/s. Gemma tends to think less; it's a tad slower per token but gets to the point faster.
If you want speed, you have to go with smaller models. I've barely tested those, but I've heard Gemma4 E4B is good, and rumor has it they're cooking smaller Qwen3.6 models as well.
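For reference, a minimal llama-cpp-python sketch of what partial offload looks like on a 12GB card; the GGUF filename, layer count, and context size are placeholder values to tune, not a tested config:

```python
from llama_cpp import Llama

# Sketch of partial GPU offload for a large MoE GGUF on a 12GB card.
# The model file, n_gpu_layers and n_ctx are placeholder values, not a tested config.
llm = Llama(
    model_path="qwen3.6-35b-a3e-Q4_K_M.gguf",  # hypothetical local GGUF
    n_gpu_layers=24,   # keep as many layers as fit in 12GB; the rest stays in system RAM
    n_ctx=8192,        # larger context costs more VRAM for the KV cache
    flash_attn=True,   # cuts attention memory use when the build supports it
)

out = llm("Summarize MoE offloading in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```

The fewer layers you keep on the GPU, the more each token crawls through system RAM, which is exactly the "enter a message and come back later" regime described above.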
jacek2023@reddit
I'm able to run Q4 of both the Gemma and Qwen MoE on my 5070 (I use the 5070 only for quick tests, not long work).
6c5d1129@reddit
I've got the same setup with a worse CPU (7600X), and I was running Qwen 3.6 35B A3B with the Q4_M unsloth quant, getting like 40 tok/s before any optimizations. Same for the Gemma 4 MoE.
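If you want to sanity-check tok/s on your own machine, a rough timing loop with llama-cpp-python looks something like this; the repo_id and filename follow the usual unsloth GGUF naming pattern but are assumptions, not verified names (and it assumes the "Q4_M" above means the usual Q4_K_M quant):

```python
import time
from llama_cpp import Llama

# Sketch: rough tokens-per-second check. The repo_id/filename mimic the usual
# unsloth GGUF naming, but they are assumptions, not verified names.
llm = Llama.from_pretrained(
    repo_id="unsloth/Qwen3.6-35B-A3B-GGUF",  # hypothetical repo id
    filename="*Q4_K_M.gguf",                 # glob picks the Q4_K_M file
    n_gpu_layers=24,                         # partial offload; raise until you hit 12GB
)

start = time.time()
out = llm("Write a short paragraph about VRAM.", max_tokens=128)
elapsed = time.time() - start
generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s = {generated / elapsed:.1f} tok/s")
```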
Potential-Gold5298@reddit
You didn't specify the use case, so: Gemma 4 26B-A4B it at Q5/Q6, or 31B it at Q4. The 26B-A4B has ~85-90% of the intelligence of the 31B but is 6-8 times faster. If you are primarily interested in tool calling and coding, then Qwen3.6-35B-A3B Q5.
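A quick way to sanity-check whether a given quant even fits in 12GB is to ballpark the weight size as parameters × bits-per-weight / 8; the bits-per-weight figures below are rough averages for the K-quants, not exact numbers, and KV cache plus runtime overhead come on top:

```python
# Sketch: ballpark GGUF weight size from parameter count and quant width.
# Bits-per-weight values are rough K-quant averages, not exact figures,
# and KV cache / runtime overhead is not included.
def gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8  # billions of params * bytes per param

for name, params_b, bpw in [
    ("26B at Q5_K_M (~5.7 bpw)", 26, 5.7),
    ("31B at Q4_K_M (~4.8 bpw)", 31, 4.8),
    ("35B at Q5_K_M (~5.7 bpw)", 35, 5.7),
]:
    size = gguf_size_gb(params_b, bpw)
    verdict = "fits in 12GB" if size < 12 else "spills into system RAM"
    print(f"{name}: ~{size:.0f} GB of weights -> {verdict}")
```

The spill hurts less for the MoE variants, since only the few billion active parameters are touched per token, which helps explain the ~40 tok/s reported above even though the full model doesn't fit in VRAM.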
AcanthaceaeNo5503@reddit
One vote for Gemma 4 GGUF/IQ3_S unsloth
Interesting-Pop-7391@reddit (OP)
I'm kinda new to this whole local AI stuff, but I heard that Gemma 4 26B is also a good choice. How does it compare to your recommendation?
AcanthaceaeNo5503@reddit
Oh, the 26B is actually the one I meant to cite. It's big and fast enough for local use, and I found it works decently. Small models like E4B fit well on your specs, but it's always better to use a bigger model with quantization.
YourNightmar31@reddit
Probably Qwen3.6 35B at Q4, Q5 or Q6 depending on what kind of speed you desire.