Jetson Orin Nano 8GB -- model speed benchmarks

Posted by Forward_Fox1466 in r/LocalLLaMA

I’ve been building a fully local voice assistant on the Orin Nano 8GB.

These benchmarks may be of interest to others working with small language models on constrained hardware:

| Engine | Model | Mean TTFT (s) | p95 TTFT (s) | tok/s |
|---|---|---|---|---|
| llamacpp | Granite 3.3-2B | 0.09 | 0.20 | 25.4 |
| llamacpp | Granite 4.0 Micro IQ4 | 0.10 | 0.22 | 24.3 |
| llamacpp | Granite 4.0 Micro | 0.11 | 0.23 | 18.9 |
| llamacpp | Granite 4.0 H-Micro | 0.13 | 0.32 | 17.6 |
| llamacpp | Qwen3-4B | 0.17 | 0.30 | 15.1 |
| ollama | Granite 3.3-2B | 0.23 | 0.33 | 25.8 |
| llamacpp | Qwen3.5-2B | 0.32 | 0.51 | 25.1 |
| ollama | Granite 4-3B | 0.36 | 0.47 | 18.5 |
| ollama | Qwen3-4B | 0.51 | 0.65 | 15.5 |
| ollama | Llama 3.2-3B | 0.53 | 0.61 | 19.1 |
| ollama | Ministral-3 3B | 0.59 | 0.73 | 19.5 |
| ollama | Nemotron-3 Nano 4B | 1.02 | 1.56 | 15.6 |
| ollama | Qwen3.5-2B | 1.03 | 1.31 | 22.2 |

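For anyone wanting to reproduce numbers like these: TTFT is time to first token, and both llama.cpp's server and Ollama expose an OpenAI-compatible streaming endpoint, so a minimal timing loop can look like the sketch below. The URL, model name, prompt, run count, and the one-streamed-chunk-per-token approximation are my assumptions here, not the repo's actual harness.

```python
# Minimal sketch: time TTFT and tok/s against an OpenAI-compatible
# streaming endpoint. URL/MODEL/PROMPT are placeholders, not repo values.
import json
import statistics
import time

import requests

URL = "http://localhost:8080/v1/chat/completions"  # llama.cpp server default
MODEL = "granite-3.3-2b"                           # placeholder model name
PROMPT = "Name three planets."

def run_once() -> tuple[float, float]:
    """Return (TTFT in seconds, tok/s) for one streamed completion."""
    t0 = time.perf_counter()
    t_first, n_chunks = None, 0
    body = {"model": MODEL,
            "messages": [{"role": "user", "content": PROMPT}],
            "stream": True}
    with requests.post(URL, json=body, stream=True, timeout=120) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line.startswith(b"data: "):  # skip keep-alives/blanks
                continue
            payload = line[len(b"data: "):]
            if payload == b"[DONE]":
                break
            delta = json.loads(payload)["choices"][0]["delta"]
            if delta.get("content"):
                if t_first is None:
                    t_first = time.perf_counter()  # first visible token
                n_chunks += 1  # approximation: one streamed chunk ~ one token
    t_end = time.perf_counter()
    if t_first is None:  # no content came back; avoid crashing the sketch
        t_first = t_end
    tps = n_chunks / (t_end - t_first) if n_chunks > 1 else 0.0
    return t_first - t0, tps

ttfts, tpss = zip(*(run_once() for _ in range(20)))
p95 = sorted(ttfts)[int(0.95 * len(ttfts)) - 1]  # crude p95 over 20 runs
print(f"mean TTFT {statistics.mean(ttfts):.2f}s | p95 {p95:.2f}s | "
      f"{statistics.mean(tpss):.1f} tok/s")
```

Single runs on a small board like this are noisy, hence a multi-run mean and p95 rather than one shot.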
Still a work in progress, especially around barge-in during TTS playback.
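For context, barge-in means cutting off TTS playback the moment the user starts talking. One common shape for it (not necessarily what this repo does) is chunked playback plus a VAD watcher thread; everything marked as a stub below is a placeholder for real mic capture, VAD, and audio output.

```python
# Hedged sketch of barge-in: play TTS audio in short chunks and cancel
# playback as soon as a voice-activity detector (VAD) hears the user.
# The three stub functions are placeholders, not the repo's code.
import threading
import time

def read_mic_frame() -> bytes:           # stub: replace with real mic capture
    time.sleep(0.02)                     # pretend to block for a 20 ms frame
    return b""

def vad_is_speech(frame: bytes) -> bool:  # stub: e.g. a WebRTC-style VAD
    return False

def write_audio(chunk: bytes) -> None:   # stub: replace with real audio sink
    time.sleep(0.05)

def play_tts(chunks, stop: threading.Event) -> None:
    """Play chunk by chunk so playback can be cut off within one chunk."""
    for chunk in chunks:
        if stop.is_set():                # user barged in: drop remaining audio
            break
        write_audio(chunk)

def watch_for_barge_in(stop: threading.Event) -> None:
    """Require a few consecutive speech frames before triggering."""
    streak = 0
    while not stop.is_set():
        streak = streak + 1 if vad_is_speech(read_mic_frame()) else 0
        if streak >= 3:                  # ~60 ms of sustained speech
            stop.set()

stop = threading.Event()
threading.Thread(target=watch_for_barge_in, args=(stop,), daemon=True).start()
play_tts([b"chunk"] * 40, stop)          # dummy audio; real chunks come from TTS
```

The hard part in practice is that the mic also hears the assistant's own TTS, so a real implementation needs echo cancellation or speaker gating on top of the debounce above.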

Repo: https://github.com/aschweig/jetson-orin-kian

There are also some qualitative benchmarks and more detail in the PDF.