Running SmolLM Instruct on-device in six different ways

Posted by hackerllama@reddit | LocalLLaMA

Hi all! Chief Llama Officer from HF here 🫡🦙

The team went a bit wild over the weekend and decided to release SmolLM Instruct V0.2 on Sunday: 135M, 360M, and 1.7B instruct models under the Apache 2.0 license, with open fine-tuning scripts and data so anyone can reproduce them. Of course, the models are great for running on-device. Here are six ways to try them out:

1. Instant SmolLM using MLC with real-time generation. Try it running on the web (but locally!) [here](https://huggingface.co/spaces/HuggingFaceTB/instant-smollm).
2. Run in the browser with WebGPU (if you have a supported browser) using transformers.js [here](https://huggingface.co/spaces/HuggingFaceTB/SmolLM-360M-Instruct-WebGPU).
3. If you don't have WebGPU, you can use Wllama, which uses GGUF and WebAssembly to run in the browser. Try it [here](https://huggingface.co/spaces/ngxson/wllama).
4. You can also try out the base model through the [SmolPilot demo](https://huggingface.co/spaces/cfahlgren1/SmolPilot).
5. If you'd rather chat interactively from the terminal, you can try this two-line setup: `pip install trl`, then `trl chat --model_name_or_path HuggingFaceTB/smollm-360M-instruct --device cpu`
6. The good ol' reliable llama.cpp.

All models + MLC/GGUF/ONNX formats can be found at [https://huggingface.co/collections/HuggingFaceTB/local-smollms-66c0f3b2a15b4eed7fb198d0](https://huggingface.co/collections/HuggingFaceTB/local-smollms-66c0f3b2a15b4eed7fb198d0)

Let's go! 🚀
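To give a rough feel for why these sizes suit on-device use, here is a back-of-the-envelope sketch of weight memory for the three model sizes named above. It is only an illustration under stated assumptions: the parameter counts are the nominal 135M / 360M / 1.7B figures, the precisions and bytes-per-parameter values are generic choices of mine (not the exact formats in the collection), and only weights are counted, so KV cache, activations, and runtime overhead would come on top. Real quantized files (e.g. GGUF) also store scales and metadata, so they will be somewhat larger than the idealized "int4" row.

```python
# Back-of-the-envelope weight-memory estimate for the SmolLM sizes in the post.
# Assumptions (not from the post): nominal parameter counts, idealized
# bytes-per-parameter values, weights only (no KV cache or runtime overhead).

PARAM_COUNTS = {"135M": 135e6, "360M": 360e6, "1.7B": 1.7e9}

# Hypothetical, illustrative precisions; real quantized formats add overhead.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_mb(n_params: float, bytes_per_param: float) -> float:
    """Return the approximate weight size in megabytes (1 MB = 1e6 bytes)."""
    return n_params * bytes_per_param / 1e6


def estimate_table() -> dict:
    """Map each model size to its estimated weight memory per precision."""
    return {
        name: {
            prec: weight_memory_mb(n_params, nbytes)
            for prec, nbytes in BYTES_PER_PARAM.items()
        }
        for name, n_params in PARAM_COUNTS.items()
    }


if __name__ == "__main__":
    for name, row in estimate_table().items():
        cells = ", ".join(f"{prec}: {mb:,.0f} MB" for prec, mb in row.items())
        print(f"SmolLM {name:>5} -> {cells}")
```

Treat the numbers as a lower bound when deciding which size to try in the browser or on a phone.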