Laptop inference speed on Llama 3.3 70B

Posted by siegevjorn | LocalLLaMA

Hi, I'd like to start a thread for sharing laptop inference speeds on Llama 3.3 70B, just for fun and to lay out some baseline numbers for 70B inference.

Mine has an AMD Ryzen 7 series CPU with 64GB of DDR5-4800 RAM, and an RTX 4070 Mobile with 8GB of VRAM.

Here are my stats from Ollama:

NAME            SIZE     PROCESSOR          UNTIL
llama3.3:70b    47 GB    84%/16% CPU/GPU    29 seconds from now
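That PROCESSOR column means only about 16% of the model fits in the 8GB of VRAM, so most layers run on the CPU. If you'd rather experiment with the split than let Ollama pick, here's a minimal sketch, assuming a recent Ollama build; the layer count is just a guess for an 8GB card, not what my run used:

# Inside an interactive session, override how many layers are offloaded to the GPU.
# Too many layers will OOM an 8GB card, so start low and work upward.
ollama run llama3.3:70b
>>> /set parameter num_gpu 10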

total duration:       8m37.784486758s
load duration:        21.44819ms
prompt eval count:    33 token(s)
prompt eval duration: 3.57s
prompt eval rate:     9.24 tokens/s
eval count:           561 token(s)
eval duration:        8m34.191s
eval rate:            1.09 tokens/s
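If you want to post numbers in the same format: these look like the per-run stats Ollama prints after each reply when run with the --verbose flag, plus the ollama ps output while the model is still loaded.

# Prints total duration, prompt eval rate, eval rate, etc. after each response:
ollama run llama3.3:70b --verbose

# In another terminal while the model is loaded, shows size and CPU/GPU split:
ollama ps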

How does your laptop perform?