AI performance of smartphone SoCs
Posted by Balance-@reddit | LocalLLaMA | 28 comments
https://ai-benchmark.com/ranking_processors.html
A few things notable to me:
- The difference between tiers is huge. A 2022 Snapdragon 8 Gen 2 beats the 8s Gen 4, and there are huge gaps between the Dimensity 9000, 8000 and 7000 series.
- You're better off with a high-end SoC that's a few years old than with the latest mid-range one.
- In this benchmark it's mainly a Qualcomm and MediaTek competition. It seems optimized software libraries are immensely important for using the hardware effectively.
koumoua01@reddit
I wonder if 24 GB RAM, 1 TB storage, 8 Gen 3 phones could be useful? Demo devices in 99% like-new condition seem to cost less than $300.
VickWildman@reddit
On my OnePlus 13 with the Snapdragon 8 Elite and 24 GB RAM, models like Qwen3-30B-A3B run fine, but they wouldn't with just 16 GB, not while multitasking in any case, so I would say yes.
Brief_Consequence_71@reddit
The OnePlus 12 with the Snapdragon 8 Gen 3 and 24 GB RAM / 1 TB storage handles Qwen3-30B-A3B fine too, if someone has the opportunity to buy it cheaper; I have one.
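For anyone wanting to try this, here is a minimal sketch of what running such a model looks like with llama.cpp inside Termux; the model path, thread count and context size are illustrative assumptions, not the settings used by the commenters above.

```sh
# Minimal sketch: run a Q4_K_M GGUF of Qwen3-30B-A3B with llama.cpp inside Termux.
# The Q4_K_M file is roughly 18 GB, which is why 24 GB of RAM is the comfortable floor.
./llama-cli \
  -m ~/models/Qwen3-30B-A3B-Q4_K_M.gguf \
  -t 6 \
  -c 4096 \
  -p "Explain what an NPU does, in two sentences."
```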
Vaddieg@reddit
Irrelevant benchmark. Why not run something more practical, like a llama.cpp pp/tg bench?
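For context, this refers to llama.cpp's bundled llama-bench tool, which reports prompt processing (pp) and token generation (tg) throughput separately; a rough invocation sketch, with a placeholder model file, could look like this:

```sh
# llama-bench measures pp (prompt processing) and tg (token generation) rates
# separately, which maps more directly onto real LLM usage than a synthetic NPU score.
# The model path and token counts below are placeholders.
./llama-bench \
  -m ~/models/some-model-Q4_K_M.gguf \
  -p 512 \
  -n 128 \
  -t 6
```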
PhlarnogularMaqulezi@reddit
Holy crap, I actually have the top entry in something. Though it's allegedly modified in some capacity by Samsung.
Sadly the 16 GB RAM version of my S25 Ultra wasn't available through my carrier; that would have been sweet.
The phone does seem to infer quite fast with the ~8B models I've tried so far, though.
VegaKH@reddit
In real-world performance running small local LLMs on a phone, does the Snapdragon 8 Elite actually beat everything else this handily? Are there real benchmarks, or just theoretical numbers?
Klutzy-Snow8016@reddit
The Google Tensor chips are embarrassing. They literally named them after AI acceleration, and look how slow they are.
Dos-Commas@reddit
As a Pixel 9 Pro owner, I find the onboard AI pretty lacking for a phone that was heavily advertised for AI. I recently started running Phi 3.5 Mini at Q4_K_M on my Pixel and it runs at about 6 t/s. It's usable in a pinch when the cell connection isn't reliable, like when traveling.
im_not_here_@reddit
It's hard to test, obviously, but the NPU was supposedly designed alongside DeepMind specifically to run Gemini models extremely fast, not for general-purpose use.
That's the idea anyway; testing how true it is would be difficult without free access to the Nano models. But the on-board AI is very fast.
yungfishstick@reddit
There's really nothing special about Tensor at all. Samsung just cut Google a good deal for a bunch of SOCs they didn't want.
im_not_here_@reddit
Google didn't buy Samsung SoCs, however much people are obsessed with the idea.
Samsung gave Google access to their development resources, and Google used standard ARM designs to build their own chip with those resources. Because they share resources and Samsung manufacturing, the chips are closely similar to Exynos parts that also use standard ARM cores, but they aren't actually Exynos, and Google made its own design choices.
FullstackSensei@reddit
It's comparing NPUs only. How would things stack up if GPUs were involved?
VickWildman@reddit
In practice I have found that nothing has support for the NPU in my OnePlus 13 with the Snapdragon 8 Elite.
CPU and GPU speeds are always similar, because the bottleneck is memory, specifically that 85.4 GB/s of bandwidth. It's nothing compared to the VRAM of dedicated GPUs.
The NPU wouldn't be faster, I imagine, but it would consume a whole lot less power.
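To make the bandwidth point concrete, here is a rough back-of-the-envelope ceiling for token generation, assuming each generated token has to stream the full quantized weight file from memory once; the ~4.6 GB figure for an 8B model at Q4_K_M is an illustrative assumption, not a number from this thread.

```sh
# Upper bound on token generation from memory bandwidth alone:
# bandwidth (GB/s) divided by bytes read per token (~= quantized model size).
awk 'BEGIN { printf "upper bound: ~%.0f tok/s\n", 85.4 / 4.6 }'
```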
SkyFeistyLlama8@reddit
For what it's worth, the same NPU on a Snapdragon X Elite laptop isn't used for much either. It runs the Phi Silica SLM on Windows, plus the 7B and 14B DeepSeek-distilled Qwen models. I almost never use them because llama.cpp running on the Adreno GPU is faster and supports a lot more models.
I don't know about Adreno GPU support for LLMs on Android, but I've heard it isn't great.
VickWildman@reddit
With the Adreno 830 at least, Qualcomm's llama.cpp OpenCL GPU backend works great. Some massaging in Termux is required to get OpenCL and Vulkan working, and GGML_VK_FORCE_MAX_ALLOCATION_SIZE needs to be set to 2147483646.
Specifically, OpenCL in Termux requires copying /vendor/lib64/libOpenCL.so and /vendor/lib64/libOpenCL_adreno.so to the partition Termux uses, and they need to be referenced by LD_LIBRARY_PATH.
Vulkan in Termux requires xMeM's Mesa driver, which is a wrapper over Qualcomm's Android driver. You can only build this package on-device in Termux, with a small patch I should really get around to contributing.
https://github.com/termux/termux-packages/compare/master...xMeM:termux-packages:dev/wrapper
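Pieced together from the comment above, the Termux environment setup might look roughly like this; whether /vendor/lib64 is readable and which vendor libraries exist varies by device and ROM, so treat it as a sketch rather than a recipe.

```sh
# OpenCL: copy the vendor libraries somewhere Termux can load them from
# and point the dynamic linker at that directory.
mkdir -p ~/vendor-cl
cp /vendor/lib64/libOpenCL.so /vendor/lib64/libOpenCL_adreno.so ~/vendor-cl/
export LD_LIBRARY_PATH=~/vendor-cl:$LD_LIBRARY_PATH

# Vulkan (via the Mesa wrapper over Qualcomm's driver): cap single allocations
# just below 2 GiB, as mentioned above.
export GGML_VK_FORCE_MAX_ALLOCATION_SIZE=2147483646
```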
FullstackSensei@reddit
I think we agree more than it might seem from my comment.
You're right that whether it's the NPU or the GPU, both are bound by memory bandwidth. My point is that the NPU in the 8 Elite has much more compute than older chips. I wouldn't be surprised if the 8 (non-Elite) and 8s NPUs don't have enough compute (FLOPS/TOPS) to saturate the memory controller, hence the much weaker performance.
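One way to see why compute can still matter is to compare the time a token spends waiting on memory versus waiting on math. All figures below are illustrative assumptions (a dense 8B model at ~4.6 GB, ~16 GFLOP per generated token, and two hypothetical NPU throughputs), not measured specs for any of these chips.

```sh
# Per generated token: memory time = bytes / bandwidth, compute time = FLOPs / throughput.
awk 'BEGIN {
  mem_ms  = 4.6 / 85.4 * 1000;      # ~54 ms to stream ~4.6 GB of weights at 85.4 GB/s
  fast_ms = 16e9 / 5e12 * 1000;     # ~3 ms at an assumed 5 TOPS effective
  slow_ms = 16e9 / 0.3e12 * 1000;   # ~53 ms at an assumed 0.3 TOPS effective
  printf "memory: %.0f ms  strong NPU: %.1f ms  weak NPU: %.0f ms (per token)\n",
         mem_ms, fast_ms, slow_ms;
}'
```

With plenty of compute, token generation stays memory-bound; with too little, the weaker NPU itself becomes the bottleneck, which is consistent with the much lower scores on the older chips.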
VickWildman@reddit
NPUs are about power consumption, and thus temperature, anyway.
When running llama.cpp with larger models, my phone's battery sometimes goes up to 48°C. I don't have a cooler, so at that point I have to wait for it to cool down. I could improve the situation with battery bypass, which involves running the phone from a power bank, but I'd rather not.
sammcj@reddit
I really wish iPhones had more RAM
AyraWinla@reddit
I have a Pixel 8a (Google Tensor G3; why is it 10% worse than the Tensor G2?), which I thought was fast compared to the other stuff I have, for example my Samsung Tab S9 FE tablet with an Exynos 1380.
This benchmark does confirm that my Pixel runs LLMs much better (829 vs 232 AI score), but I hadn't realized that my Pixel is actually pretty mediocre in the grand scheme of things!
Eden1506@reddit
It doesn't matter how high those scores are as long as memory (amount&bandwidth) stays the main bottleneck for most AI applications.
No_Conversation9561@reddit
Where is Exynos here?
megadonkeyx@reddit
Page 2. Samsung really screwed some Galaxy S24 users over with a crap SoC, i.e. me. For my next phone I'm getting a Doogee for £99, lol.
s101c@reddit
Two days ago I ran into another problem with a Samsung phone which, frankly, is a total disaster. Not LLM related.
My friend installed an update on his Samsung A52 and it completely disabled the modem.
"No Service" ever since the update landed; no cell network reception at all. We tried everything and nothing helped. There are plenty of such cases online; it has happened to many users after random updates. Some people want to sue the manufacturer.
Weary-Emotion9255@reddit
crap chipset
phhusson@reddit
This doesn't apply to LLMs, though. First, because there is pretty much no LLM-on-NPU use case on Android (maybe Google's Edge Gallery does it?), and second, because only prompt processing speed is limited by computation. Token generation will be just as fast on the CPU as on the NPU on most smartphones. Maybe when we see huge agents on Android it'll become useful, but we're not there yet.
>You're better off with a high-end SoC that's a few years old than with the latest mid-range one.
FWIW I've had smartphones since around 2006, and this has been true across the board (not just for NPUs) since around 2010.
Agreeable_Cat602@reddit
Apple should be at the top; it's the superior brand and deserves to be praised. I own an iPhone Pro Max, where the Max means maximum superiority, and this also reflects on its buyers.
I expect lots of upvotes.
MMAgeezer@reddit
Worth noting that many of the devices tested here are using the now-deprecated Android NNAPI, which notoriously doesn't have great performance: https://developer.android.com/ndk/guides/neuralnetworks/
1overNseekness@reddit
Any comparison to alternatives on desktop CPUs? To see the advancements / track the state of mobile AI performance.