Running SmolLM Instruct on-device in six different ways
Posted by hackerllama@reddit | LocalLLaMA | 4 comments
Hi all!
Chief Llama Officer from HF here 🫡🦙
The team went a bit wild over the weekend and decided to release SmolLM Instruct v0.2 on Sunday: 135M, 360M, and 1.7B instruct models under the Apache 2.0 license, with open fine-tuning scripts and data so anyone can reproduce them.
Of course, the models are great for running on-device. Here are six ways to try them out:
1. Instant SmolLM using MLC with real-time generation. Try it running on the web (but locally!) [here](https://huggingface.co/spaces/HuggingFaceTB/instant-smollm).
2. Run in the browser with WebGPU (if you have a supported browser) with transformers.js [here](https://huggingface.co/spaces/HuggingFaceTB/SmolLM-360M-Instruct-WebGPU).
3. If you don't have WebGPU, you can use Wllama, which uses GGUF and WebAssembly to run in the browser. Try it [here](https://huggingface.co/spaces/ngxson/wllama).
4. You can also try out the base model through the [SmolPilot demo](https://huggingface.co/spaces/cfahlgren1/SmolPilot)
5. If you'd rather chat with the model interactively, you can try this two-line setup:
`pip install trl`
`trl chat --model_name_or_path HuggingFaceTB/smollm-360M-instruct --device cpu`
6. The good ol' reliable llama.cpp
All models + MLC/GGUF/ONNX formats can be found at [https://huggingface.co/collections/HuggingFaceTB/local-smollms-66c0f3b2a15b4eed7fb198d0](https://huggingface.co/collections/HuggingFaceTB/local-smollms-66c0f3b2a15b4eed7fb198d0)
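For the llama.cpp route, here is a rough sketch of what a local chat session could look like once you have downloaded a GGUF file from the collection above. The post does not give a command, so everything here is an assumption: the `smollm_llamacpp` helper and the `MODEL_PATH`, `MAX_TOKENS`, and `DRY_RUN` variables are made up for illustration, the default filename is a placeholder (use whatever the GGUF you downloaded is actually called), and it assumes a recent llama.cpp build where the CLI binary is named `llama-cli`.

```shell
# Hypothetical helper: start an interactive llama.cpp chat with a local SmolLM GGUF file.
# MODEL_PATH  path to the GGUF file (the default below is a placeholder name)
# MAX_TOKENS  cap on tokens generated per reply
# DRY_RUN=1   print the command instead of running it
smollm_llamacpp() {
  model_path="${MODEL_PATH:-./smollm-360m-instruct.gguf}"
  max_tokens="${MAX_TOKENS:-256}"

  # -m: model file, -cnv: conversation (chat) mode, -n: max tokens to predict
  set -- llama-cli -m "$model_path" -cnv -n "$max_tokens"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

To check the command before running it, call `DRY_RUN=1 MODEL_PATH=./your-file.gguf smollm_llamacpp`; drop `DRY_RUN=1` to start chatting.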
Let's go! 🚀