How to make PocketPal inference faster on Android?
Posted by Ok_Warning2146@reddit | LocalLLaMA | 4 comments
I have a OnePlus 12 (24GB RAM) running LineageOS 22.2 with 6.44GB of zram. I ran the PocketPal benchmark at the default settings: pp=512, tg=128, pl=1, rep=3.
|pp|tg|time|PeakMem|Model|
|:-|:-|:-|:-|:-|
|14.18t/s|6.79t/s|2m50s|81.1%|Qwen3-30B-A3B-Instruct-2507-UD\_Q5\_K\_XL|
|17.42t/s|4.00t/s|3m4s|62.0%|gemma-3-12b-it-qat-Q4\_0|
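
A quick sanity check on the table: if the benchmark simply processes 512 prompt tokens and generates 128 tokens per repetition, the reported speeds should roughly reproduce the reported times. The sketch below assumes that model of the benchmark (total time ≈ rep × (pp tokens / pp speed + tg tokens / tg speed)); the function names are made up for illustration, and model load time and other overhead are ignored.

```python
# Benchmark settings from the post (pp=512, tg=128, rep=3).
PP_TOKENS, TG_TOKENS, REPS = 512, 128, 3


def estimated_seconds(pp_tps, tg_tps, pp_tokens=PP_TOKENS,
                      tg_tokens=TG_TOKENS, reps=REPS):
    """Estimate total run time from throughput alone.

    Assumption (not confirmed by PocketPal): each repetition is one
    prompt-processing pass plus one generation pass, with no overhead.
    """
    return reps * (pp_tokens / pp_tps + tg_tokens / tg_tps)


def fmt(seconds):
    """Format seconds as 'XmYs', rounding to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m{secs}s"


# (pp t/s, tg t/s, reported time) taken from the table above.
results = {
    "Qwen3-30B-A3B-Instruct-2507-UD_Q5_K_XL": (14.18, 6.79, "2m50s"),
    "gemma-3-12b-it-qat-Q4_0": (17.42, 4.00, "3m4s"),
}

for name, (pp_tps, tg_tps, reported) in results.items():
    est = estimated_seconds(pp_tps, tg_tps)
    print(f"{name}: estimated {fmt(est)}, reported {reported}")
```

Under that assumption, the estimates land within a few seconds of the reported times, which suggests the time column is dominated by the pp and tg passes themselves.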
The Qwen model is about 21.7GB and the Gemma model is 6.9GB. PeakMem appears to be the peak memory used by the whole system rather than by the model alone, since a 6.9GB model shouldn't fill 62% of 24GB. If so, I presume part of the 21.7GB Qwen model ended up in zram, which is a compressed swap device stored in RAM. Would adjusting the zram size affect performance? Would it perform much better if I used a 16GB Qwen model?
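The memory reasoning above can be put into numbers. This is a rough sketch that assumes PeakMem is a percentage of the full 24GB (an assumption; the RAM Android actually reports is typically somewhat less than the marketed figure), and the helper names are hypothetical.

```python
# Assumed denominator for PeakMem: the marketed 24 GB of RAM.
TOTAL_RAM_GB = 24.0

# Model file sizes and PeakMem percentages from the post.
runs = {
    "Qwen3-30B-A3B-Instruct-2507-UD_Q5_K_XL": {"size_gb": 21.7, "peak_pct": 81.1},
    "gemma-3-12b-it-qat-Q4_0": {"size_gb": 6.9, "peak_pct": 62.0},
}


def peak_gb(peak_pct, total_gb=TOTAL_RAM_GB):
    """Convert a PeakMem percentage into gigabytes of the assumed total."""
    return total_gb * peak_pct / 100.0


def summarize(runs, total_gb=TOTAL_RAM_GB):
    """Return (name, model size, peak GB, peak minus model size) per run."""
    rows = []
    for name, run in runs.items():
        used = peak_gb(run["peak_pct"], total_gb)
        rows.append((name, run["size_gb"], used, used - run["size_gb"]))
    return rows


for name, size, used, delta in summarize(runs):
    print(f"{name}: model {size:.1f} GB, peak {used:.2f} GB, "
          f"peak - model {delta:+.2f} GB")
```

With these assumptions, the Gemma run peaks at about 14.9GB, roughly 8GB more than the model itself, which fits the reading that PeakMem covers the whole system. The Qwen run peaks at about 19.5GB, which is less than the 21.7GB model, so not all of it can have been sitting in uncompressed RAM at once.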
I noticed that the PocketPal benchmark doesn't offload anything to the GPU. Does that mean only the CPU is used? Is it possible to make PocketPal use the GPU?
Thanks a lot in advance.