Local LLMs > Cloud Hype? Efficiency is winning

Posted by No-Draft-116@reddit | LocalLLaMA

Everyone’s obsessed with bigger models, but running them locally is where things get real.

Hot take - efficiency is the new benchmark. Not max params, not peak FLOPS - just how long your system runs without sounding like a jet engine.

I’ve been testing smaller quantised models on edge-focused chips (MediaTek mainly), and ngl, things are way more usable than I imagined :) fast responses, low power draw, and no cloud-dependency anxiety.
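For anyone curious, here's roughly the kind of setup I mean - a minimal sketch using llama-cpp-python with a small quantised GGUF. The model file, context size, and thread count below are placeholders, not a specific recommendation; swap in whatever fits your hardware:

```python
# Minimal sketch: running a small quantised GGUF model locally with llama-cpp-python.
# Model path and settings are placeholders - adjust for your own chip.
from llama_cpp import Llama

llm = Llama(
    model_path="models/your-3b-instruct-q4_k_m.gguf",  # any small quantised GGUF
    n_ctx=2048,      # modest context keeps memory use low on edge hardware
    n_threads=4,     # match your chip's performance cores
    verbose=False,
)

out = llm(
    "Explain quantisation in one sentence.",
    max_tokens=64,
    temperature=0.7,
)
print(out["choices"][0]["text"].strip())
```

Swapping quant levels (q4_k_m vs q5_k_m etc.) is the easiest knob for trading quality against speed and power draw.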

I think we're basically entering the "good enough locally > perfect in the cloud" phase.

Also, I weirdly don't see many people talking about MediaTek for edge AI / vision workloads. Am I missing something, or is it just underrated right now?

Anyone want to share what setups you're all running for local LLMs right now?