[Benchmark] RK3588 NPU vs Raspberry Pi 5 - Llama 3.1 8B, Qwen 3B, DeepSeek 1.5B tested

Posted by tre7744@reddit | LocalLLaMA

Been lurking here for a while, finally have some data worth sharing.

I wanted to see if the 6 TOPS NPU on the RK3588S actually makes a difference for local inference compared to Pi 5 running CPU-only. Short answer: yes.

Hardware tested:
- Indiedroid Nova (RK3588S, 16GB RAM, 64GB eMMC)
- NPU driver v0.9.7, RKLLM runtime 1.2.1
- Debian 12

Results:

| Model | Nova (NPU) | Pi 5 16GB (CPU) | Difference |
|---|---|---|---|
| DeepSeek 1.5B | 11.5 t/s | ~6-8 t/s | 1.5-2x faster |
| Qwen 2.5 3B | 7.0 t/s | ~2-3 t/s* | 2-3x faster |
| Llama 3.1 8B | 3.72 t/s | 1.99 t/s | 1.87x faster |

Pi 5 8B number from Jeff Geerling's benchmarks. I don't have a Pi 5 16GB to test directly.

*Pi 5 3B estimate based on similar-sized models (Phi 3.5 3.8B community benchmarks)
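If you want to sanity-check the "Difference" column, it's just NPU t/s divided by CPU t/s. A quick sketch (the numbers are straight from the table above; ranges won't land exactly on the rounded values since the Pi 5 figures are estimates):

```python
# Recompute speedup ranges from the measured tokens/sec figures.
# model: (nova_npu_tps, pi5_cpu_tps_low, pi5_cpu_tps_high)
results = {
    "DeepSeek 1.5B": (11.5, 6.0, 8.0),
    "Qwen 2.5 3B": (7.0, 2.0, 3.0),
    "Llama 3.1 8B": (3.72, 1.99, 1.99),
}

for model, (npu, cpu_lo, cpu_hi) in results.items():
    hi = npu / cpu_lo  # best case: vs the slowest CPU figure
    lo = npu / cpu_hi  # worst case: vs the fastest CPU figure
    print(f"{model}: {lo:.2f}x - {hi:.2f}x")
```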

The thing that surprised me:

The Nova's advantage isn't just speed - it's that 16GB RAM + NPU headroom lets you run the 3B+ models that actually give correct answers, at speeds the Pi 5 only hits on smaller models. When I tested state capital recall, Qwen 2.5 3B got all 50 right. DeepSeek 1.5B started hallucinating around state 30.
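Scoring that recall test is trivial to automate once you've collected one answer string per state. A sketch (the answer key entries are factual; how you prompt the model and collect `replies` is up to you):

```python
# Score a model's state-capital answers against a ground-truth key.
# Sample of the 50-entry key; extend with the remaining states.
CAPITALS = {
    "Alabama": "Montgomery",
    "Alaska": "Juneau",
    "Arizona": "Phoenix",
    "California": "Sacramento",
    "New York": "Albany",
    "Texas": "Austin",
}

def score(answers: dict[str, str]) -> tuple[int, list[str]]:
    """Return (correct_count, missed_states); substring match, case-insensitive."""
    missed = [
        state for state, cap in CAPITALS.items()
        if cap.lower() not in answers.get(state, "").lower()
    ]
    return len(CAPITALS) - len(missed), missed

# Example run with one hallucinated answer:
replies = dict(CAPITALS)
replies["New York"] = "New York City"  # a typical wrong guess
correct, missed = score(replies)
print(correct, missed)  # 5 ['New York']
```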

Thermals and utilization:

NPU utilization during 8B inference: 79% average across all 3 cores, 8.5GB RAM sustained. No throttling over 2+ minute runs.
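The utilization numbers come from the kernel's NPU load readout. A minimal parse sketch, assuming the debugfs path and line format exposed by the Rockchip rknpu driver (`/sys/kernel/debug/rknpu/load`) - verify both on your kernel build:

```python
import re

def npu_core_loads(text: str) -> list[int]:
    """Parse per-core utilization percentages from the rknpu debugfs
    load file. The line format is assumed from the rknpu driver and
    may differ between driver versions."""
    return [int(p) for p in re.findall(r"Core\d+:\s*(\d+)%", text)]

# Example line in the format the driver is assumed to produce:
sample = "NPU load:  Core0: 79%, Core1: 81%, Core2: 77%,"
loads = npu_core_loads(sample)
print(sum(loads) / len(loads))  # 79.0
```

Polling this once a second alongside the inference run is enough to get the sustained averages quoted above.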

Happy to answer questions if anyone wants to reproduce this.

Setup scripts and full methodology: github.com/TrevTron/indiedroid-nova-llm


Disclosure: Hardware provided by AmeriDroid. Benchmarks are my own.