When are we getting consumer inference chips?

Posted by SnooStories2864@reddit | LocalLLaMA | 133 comments

Dumb question but I genuinely don't get it. Billions of $ poured into AI startups the last few years and nobody has shipped a consumer chip with a model built in? Like a $200 stick that runs Llama 3 at reading speed, 30W, plug into your desktop, done.
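For scale, here's a back-of-envelope sketch of what "reading speed" actually demands of the hardware. The numbers are my assumptions, not from the post: Llama-3-8B at 8-bit weights, ~10 tokens/s, batch size 1.

```python
# Back-of-envelope: memory bandwidth a "Llama in a box" would need.
# Assumptions (not from the post): Llama-3-8B, int8 weights,
# ~10 tokens/s "reading speed", batch size 1.

params = 8e9            # 8B parameters
bytes_per_param = 1     # int8 quantization
tokens_per_sec = 10     # comfortable reading speed

weights_gb = params * bytes_per_param / 1e9
# At batch size 1, decoding each token touches every weight once,
# so required bandwidth ~= model size x tokens/sec.
bandwidth_gb_s = weights_gb * tokens_per_sec

print(f"weights: {weights_gb:.0f} GB, bandwidth needed: {bandwidth_gb_s:.0f} GB/s")
# -> weights: 8 GB, bandwidth needed: 80 GB/s
```

80 GB/s is way past any USB link (USB 3.2 tops out around 2.5 GB/s), so the weights would have to sit in fast memory on the stick itself, and that memory is a big chunk of the cost.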

Taalas is kinda doing this but only aimed at datacenters. Why tho? Today's open-source models are already good enough for 90% of what most people actually need, and will stay that way for years. The "model will be obsolete before the chip tapes out" argument feels weaker every month.

Starting to wonder if the whole industry is just trying to milk consumers through API subscriptions forever instead of selling the chip once. Feels like it would be trivially profitable to ship a $300 "Llama in a box" and call it a day but I guess no one wants the recurring revenue to stop.

What am I missing?