Running LLMs in-browser via WebGPU, Transformers.js, and Chrome's Prompt API—no Ollama, no server

Posted by psgganesh@reddit | LocalLLaMA

Been experimenting with browser-based inference and wanted to share what I've learned packaging it into a usable Chrome extension.

Three backends working together: WebGPU, Transformers.js, and Chrome's Prompt API.
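
One way the backends named in the title could be chosen at runtime is plain feature detection. This is only a minimal sketch: the `Backend` labels, the `Capabilities` shape, and the preference order are all hypothetical and not taken from the extension's code.

```typescript
// Hypothetical names throughout; not taken from the extension's source.
type Backend = "prompt-api" | "webgpu" | "transformers-js";

interface Capabilities {
  hasPromptApi: boolean; // browser exposes a built-in model
  hasWebGpu: boolean;    // in a page this could be `"gpu" in navigator`
}

// Assumed preference order: built-in model first, then WebGPU,
// then Transformers.js as the fallback.
function pickBackend(caps: Capabilities): Backend {
  if (caps.hasPromptApi) return "prompt-api";
  if (caps.hasWebGpu) return "webgpu";
  return "transformers-js";
}
```

Keeping the decision in a pure function means the detection itself (which differs per browser and version) stays in one small place and the rest is easy to test outside a browser.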

Models are cached in the browser and chat messages are stored in IndexedDB, so everything works offline after the first download. I also added a memory monitor that warns at 80% usage and helps clear unused weights, because browser-based inference eats RAM fast.
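
For anyone wondering what a monitor like that might look like, here is a rough sketch of the 80% check plus a least-recently-used pick of idle models to unload. The `LoadedModel` shape, the function names, and the LRU policy are assumptions for illustration, not how the extension actually does it.

```typescript
// Hypothetical sketch; names and the eviction policy are assumptions.
interface LoadedModel {
  id: string;
  sizeBytes: number;
  lastUsedMs: number; // timestamp of the last inference
  inUse: boolean;     // currently backing an open chat
}

const WARN_THRESHOLD = 0.8; // the 80% figure from the post

// True once usage reaches the warning threshold.
function shouldWarn(usedBytes: number, limitBytes: number): boolean {
  if (limitBytes <= 0) return false; // no usable limit reported
  return usedBytes / limitBytes >= WARN_THRESHOLD;
}

// Pick idle models to unload, least recently used first, until the
// projected usage drops below the threshold.
function pickEvictions(
  models: LoadedModel[],
  usedBytes: number,
  limitBytes: number
): string[] {
  const target = limitBytes * WARN_THRESHOLD;
  const idle = models
    .filter((m) => !m.inUse)
    .sort((a, b) => a.lastUsedMs - b.lastUsedMs);
  const evict: string[] = [];
  let projected = usedBytes;
  for (const m of idle) {
    if (projected < target) break;
    evict.push(m.id);
    projected -= m.sizeBytes;
  }
  return evict;
}
```

In Chrome, one possible source for the two numbers is the non-standard `performance.memory` object (`usedJSHeapSize` and `jsHeapSizeLimit`), though that covers the JS heap only and not GPU buffers, so treat it as a rough signal.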

Curious what this community thinks about WebGPU as a viable inference path for everyday use. That question is why I built this project. Is anyone else building in this space?

Project: https://noaibills.app/?utm_source=reddit&utm_medium=social&utm_campaign=launch_localllama