How good is the Orange Pi 6 for local LLMs?
Posted by Middle_Investment_81@reddit | LocalLLaMA | View on Reddit | 28 comments
Has anyone tried the Orange Pi 6 (like this one from Amazon) for LLMs? Is it possible to run 3B or 8B LLMs on this?
AppearanceHeavy6724@reddit
Yes, but something like an N100 is probably cheaper and more practical.
Middle_Investment_81@reddit (OP)
Could you please explain a bit more why the N100 is more practical?
notheresnolight@reddit
You can run a standard Linux distribution on it, instead of some hacked-up Chinese Ubuntu image downloaded from Google Drive, with an old kernel and tons of binary blobs.
JacketHistorical2321@reddit
Can this not run a standard Linux distro?
umtausch@reddit
This! I had a Radxa Rock with such a Rockchip SoC and it was always hacky as hell.
jimfullmadcunt@reddit
I think this is using the same SoC as the Radxa Orion O6. There are some benchmarks here:
https://forum.radxa.com/t/llama-cpp-benchmarks/27813
In short, right now, it's not very performant.
No_Conversation9561@reddit
Personally, I'm waiting for a board to come out with new Arm cores that support the SME/SME2 extensions.
The SoC in the Orange Pi 6 has Cortex-A720 and Cortex-A520 cores, which don't support SME.
DUFRelic@reddit
These SoCs are all bandwidth-restricted, and no extension will change that...
Evening_Ad6637@reddit
SME would address prompt processing speed, not text generation speed. So it's not primarily about bandwidth.
Direct_Turn_1484@reddit
What’s the bus speed interfacing with the memory? Sure, it’s LPDDR5 and we’re pretty familiar with how fast that is, but what matters more in this case is how fast the board can move data in and out of those chips.
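For a back-of-envelope sense of how much that matters: decode is memory-bound, so token generation speed is roughly capped by memory bandwidth divided by the bytes read per token (about the size of the quantized weights). A minimal sketch, where the bandwidth figure is a placeholder assumption rather than a measured number for this board:

```python
# Rough, bandwidth-bound token-generation estimate. Every generated token
# streams essentially all model weights from RAM, so decode speed is capped
# by bandwidth / weight size. The bandwidth value is an assumed placeholder.

mem_bandwidth_gb_s = 30.0      # assumed effective LPDDR5 bandwidth, GB/s (placeholder)
model_params_b = 8.0           # 8B model
bytes_per_param = 0.5          # ~Q4 quantization, about 4 bits per weight

weights_gb = model_params_b * bytes_per_param      # ~4 GB of weights to stream per token
tokens_per_s = mem_bandwidth_gb_s / weights_gb     # upper bound on decode speed

print(f"~{weights_gb:.1f} GB weights -> at most ~{tokens_per_s:.1f} tok/s")
```

Swap in whatever bandwidth the board actually delivers and the ceiling follows directly.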
Full_Collection_4347@reddit
You can get a used Mac Mini base M4 for a bit more.
Built-in power supply, low energy use, Wi-Fi, Ethernet, 16GB unified memory, 256GB SSD.
DinoAmino@reddit
The most upvoted comment doesn't even address the question. Mac fanbois never disappoint.
Turbulent_Pin7635@reddit
Not even a Mac fanboy, but if your intent is to generate text, a Mac is by far the best your money can buy.
gavff64@reddit
I mean the insinuation is that the value is bad and there’s better options. Unless it absolutely has to be an SBC for some reason, like a project, it’s going to be a bad experience.
DinoAmino@reddit
True that. And yeah, it doesn't help that OP doesn't mention any particular use case or intent.
Middle_Investment_81@reddit (OP)
I was trying to assess the value in general, so the comment is not at all irrelevant.
Middle_Investment_81@reddit (OP)
A Mac Mini M4 is twice as expensive as this. I wonder if the NPU in the OPi 6 has any additional advantage that might make it not such a bad option?
gavff64@reddit
"A bit more" is a stretch, that's like double the price or more, even used. A 16GB M1 Mac Mini can be found for under $350 pre-owned; that's a little more in line.
mymainunidsme@reddit
I'm keeping an eye on this too. I'm guessing the NPU is going to require customized models like the RK3588 in the OPi 5+ does.
usernameplshere@reddit
I did read something about Granite 4 also being able to run on NPUs. But I've got to admit, I have no desktop or anything with an NPU to get any experience with that.
Middle_Investment_81@reddit (OP)
It needs a conversion of the model through ONNX to make it suitable for the RK3588 chip. Do you mean it's only possible to do that for some LLMs?
mymainunidsme@reddit
Based more on what I've read than hands-on experience, yes. My understanding is that a lot of models break in the conversion. I haven't messed with anything but vision models on my RK3588 boards in a while.
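For reference, here's a minimal sketch of the ONNX-to-RKNN conversion flow with Rockchip's rknn-toolkit2 (the usual RK3588 NPU path for vision models). The file names and calibration dataset are placeholder assumptions; the build step is where unsupported ops tend to break models:

```python
# Sketch of the ONNX -> RKNN conversion flow with Rockchip's rknn-toolkit2.
# 'model.onnx', 'dataset.txt', and 'model.rknn' are placeholder names.
from rknn.api import RKNN

rknn = RKNN(verbose=True)

# Target the RK3588 NPU
rknn.config(target_platform='rk3588')

# Load an ONNX export of the model
if rknn.load_onnx(model='model.onnx') != 0:
    raise RuntimeError('load_onnx failed')

# Quantize and build; dataset.txt lists calibration samples
if rknn.build(do_quantization=True, dataset='dataset.txt') != 0:
    raise RuntimeError('build failed')

# Export the NPU-ready model and clean up
rknn.export_rknn('model.rknn')
rknn.release()
```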
blazze@reddit
32GB of RAM for $299 would make this a pretty good value. 45 AI/NPU TOPS beats the Mac Mini M4's 38 TOPS for $599 with half the RAM. First question is, what is the NPU memory bandwidth?
GabryIta@reddit
1 token / month
Cool-Chemical-5629@reddit
Does 16GB / 32GB mean two different versions, or is it one version with two different stacks of memory? If it's the former, you could probably run small models on the first and possibly up to 32B on the latter (assuming you're okay with aggressive quantization). MoE models available in that range would probably run decently - GPT-OSS 20B, Qwen 3 30B A3B 2507 Instruct + Thinking, and Qwen 3 Coder Flash. From dense models, it would be Mistral Small 24B, Reka Flash 3 and 3.1 21B, Gemma 3 12B, Qwen 3 32B, and maybe Gemma 3 27B - unfortunately that one is very memory-demanding for its context, so you'd have to either use a heavily reduced quant, at a real cost in quality, or use a very small context window. Either way, inference would probably be very slow, but it should still be possible.
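As a rough sanity check on what fits, here's a minimal back-of-envelope calculation of weight-only memory at different quant levels. Parameter counts and bits-per-weight values are approximate assumptions, and it ignores KV cache and runtime overhead:

```python
# Rough "does it fit" check: weight memory only, ignoring KV cache and runtime
# overhead (budget a few extra GB on top). Parameter counts are approximate.
def weight_gb(params_billion, bits_per_weight):
    # 1B params at 8 bits/weight is roughly 1 GB
    return params_billion * bits_per_weight / 8

models = [("Qwen 3 30B A3B", 30), ("GPT-OSS 20B", 21),
          ("Mistral Small 24B", 24), ("Gemma 3 27B", 27), ("Qwen 3 32B", 32)]

for name, params in models:
    q4 = weight_gb(params, 4.5)   # ~Q4-class average bits/weight (assumed)
    q8 = weight_gb(params, 8.5)   # ~Q8-class average bits/weight (assumed)
    print(f"{name:>18}: ~{q4:.0f} GB at Q4, ~{q8:.0f} GB at Q8")
```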
AppearanceHeavy6724@reddit
No, not anymore. Current versions of llama.cpp support SWA, and Gemma 3 27B is in fact unusually light on KV cache memory these days.
Cool-Chemical-5629@reddit
Which llama.cpp version? The last time I ran Gemma 3 12B in LM Studio, it was pretty heavy, and that's only the 12B model. I couldn't realistically run the big 27B one, which is ridiculous since Qwen 2.5 32B fits at the same quant.
AppearanceHeavy6724@reddit
I don't use LM Studio, but SWA has been in llama.cpp since at least September. Right now I can run the 27B QAT on a 3060 + 1070 (20 GiB) with 32K context easy-peasy.
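To illustrate why SWA makes the difference, here's a rough KV-cache size comparison at 32K context. The layer/head/window numbers below are assumptions from memory, not verified specs; check the model's config.json before relying on them:

```python
# Rough KV-cache comparison for a Gemma-3-27B-like model at 32K context,
# with and without sliding-window attention (SWA). All architecture numbers
# are assumed placeholders.

ctx = 32768           # context length in tokens
n_layers = 62         # assumed total transformer layers
n_kv_heads = 16       # assumed KV heads
head_dim = 128        # assumed head dimension
window = 1024         # assumed sliding-window size for local layers
global_every = 6      # assumed pattern: 1 global layer per 6, rest local
kv_bytes = 2          # f16 cache

per_token_per_layer = 2 * n_kv_heads * head_dim * kv_bytes   # K + V

full = n_layers * ctx * per_token_per_layer                  # every layer caches full context
n_global = n_layers // global_every
n_local = n_layers - n_global
swa = (n_global * ctx + n_local * window) * per_token_per_layer  # local layers cache only the window

print(f"full KV cache: ~{full / 2**30:.1f} GiB, with SWA: ~{swa / 2**30:.1f} GiB")
```

Under these assumptions the cache drops from roughly 15 GiB to about 3 GiB, which is what makes 32K context feasible on 20 GiB of VRAM alongside the quantized weights.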