Has anyone run gemma 4 or Bonsai 8B models on Orange pi 5?
Posted by bhakt_chungus@reddit | LocalLLaMA | View on Reddit | 7 comments
I am extremely new to this and am wondering if I can run a very small model with decently fast throughput on one of these chips. If anyone has done so successfully, that would be helpful to know.
honuvo@reddit
Hi, not on an Orange Pi, but a Raspberry Pi 5 16GB. I posted a few days ago and am currently benchmarking again. I've already tested gemma 4 E4B, so here's a sneak peek:
Whether that's fast enough for you, I don't know. The E2B is of course even faster.
honuvo@reddit
I got around to compiling the llama.cpp fork for Bonsai 8B and tested that. Maybe I did something wrong, maybe the calculations aren't really optimized for ARM CPUs, I don't know. Not interested in looking into that model more, but here are the results:

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| Bonsai 8B Q1_0 | 1.07 GiB | 8.19 B | CPU | 4 | pp512 | 3.27 ± 0.00 |
| Bonsai 8B Q1_0 | 1.07 GiB | 8.19 B | CPU | 4 | tg128 | 2.77 ± 0.00 |
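For anyone wanting to reproduce numbers like these on their own board, a typical llama.cpp benchmark run looks roughly like the sketch below. Note the assumptions: the thread doesn't give the Bonsai fork's URL, so this uses mainline llama.cpp, and `model.gguf` is a placeholder path you'd replace with your downloaded model file.

```shell
# Build mainline llama.cpp on the board itself (substitute the Bonsai
# fork's repo URL here if you have it -- it isn't given in the thread)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j4

# llama-bench's default tests are pp512 (prompt processing, 512 tokens)
# and tg128 (token generation, 128 tokens); -t 4 pins it to 4 threads.
# model.gguf is a placeholder for your quantized model file.
./build/bin/llama-bench -m model.gguf -t 4
```

The `pp512` and `tg128` rows in the output correspond to the prompt-processing and generation speeds (t/s) shown in the tables above.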
H_NK@reddit
Confused by the tables, I just see model size specs and no actual performance metrics. Did you paste something wrong by accident?
honuvo@reddit
Are you on mobile? You have to swipe horizontally on the tables to see the right end. The headers are model, size, params, backend, threads, mmap, test, and t/s. The test and t/s (tokens/sec) columns are what you're after, right?
H_NK@reddit
That's it, my bad. I had no idea that was a feature.
H_NK@reddit
!RemindMe 1 day
RemindMeBot@reddit
I will be messaging you in 1 day on 2026-04-04 22:43:43 UTC to remind you of this link