SmolLM3 has day-0 support in MistralRS!
Posted by EricBuehler@reddit | LocalLLaMA | View on Reddit | 5 comments
It's a SoTA 3B model with hybrid reasoning and 128k context.
Hits ⚡105 T/s with AFQ4 on an M3 Max.
Link: https://github.com/EricLBuehler/mistral.rs
Using MistralRS means that you get:
- Built-in MCP client
- OpenAI-compatible HTTP server
- Python & Rust APIs
- Full multimodal inference engine (in: image, audio, text; out: image, audio, text)
Super easy to run:
./mistralrs_server -i run -m HuggingFaceTB/SmolLM3-3B
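If you'd rather hit it over HTTP, the server also exposes an OpenAI-compatible API. A minimal sketch, assuming the server was started in HTTP mode on port 1234 (check the repo README for the exact flags, which may differ between releases):

```python
# Query mistral.rs's OpenAI-compatible server with the standard `openai` client.
# Assumes the server is listening on localhost:1234; adjust to your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="HuggingFaceTB/SmolLM3-3B",  # served model name; may differ in your config
    messages=[{"role": "user", "content": "Summarize SmolLM3 in two sentences."}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```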
What's next for MistralRS? Full Gemma 3n support, multi-device backend, and more. Stay tuned!
uhuge@reddit
Is https://pypi.org/project/mistralrs/ the easiest way to test this on Linux (Ubuntu)?
EricBuehler@reddit (OP)
Not yet, the release isn't out! The Python package should be installed to match whatever GPU or CPU acceleration you have available: mistralrs-cuda, mistralrs-mkl, mistralrs-metal, etc.
The release will land in a few days, alongside Gemma 3n. Check back then, or you can install from source in the meantime!
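Once the right package is installed, usage looks roughly like this. A minimal sketch based on the mistralrs Python API (Runner, Which.Plain, and ChatCompletionRequest are names from that package; exact signatures may vary by version):

```python
# Rough sketch of loading SmolLM3 through the mistralrs Python bindings.
# Treat the exact class names/arguments as illustrative; check the package docs.
from mistralrs import Runner, Which, ChatCompletionRequest

# Load the model straight from the Hugging Face Hub.
runner = Runner(which=Which.Plain(model_id="HuggingFaceTB/SmolLM3-3B"))

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="smollm3",  # label only; the loaded model is what answers
        messages=[{"role": "user", "content": "Hello! What can you do?"}],
        max_tokens=128,
        temperature=0.7,
    )
)
print(res.choices[0].message.content)
```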
TonsillarRat6@reddit
I'm curious if this has been updated already??
Green-Ad-3964@reddit
Is this good for RAG operations?
EricBuehler@reddit (OP)
Absolutely! Long context + tool calling + reasoning are all great factors for RAG.
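As a minimal RAG-style sketch over the OpenAI-compatible endpoint (the retrieve() helper, port, and model name below are placeholders, not part of mistral.rs):

```python
# Minimal RAG sketch: retrieve chunks, stuff them into the prompt, and ask
# SmolLM3 via mistral.rs's OpenAI-compatible server. retrieve() is a
# hypothetical placeholder for your own vector-store lookup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

def retrieve(query: str) -> list[str]:
    # Placeholder: swap in FAISS, Qdrant, or whatever store you use.
    return ["SmolLM3 is a 3B model with hybrid reasoning and 128k context."]

question = "How long is SmolLM3's context window?"
context = "\n\n".join(retrieve(question))

resp = client.chat.completions.create(
    model="HuggingFaceTB/SmolLM3-3B",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```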