I'm the author of LocalAI (the local OpenAI-compatible API). We just released v3.7.0 with full Agentic Support (tool use!), Qwen 3 VL, and the latest llama.cpp

Posted by mudler_it@reddit | LocalLLaMA | View on Reddit | 17 comments

Hey r/LocalLLaMA,

I'm the creator of LocalAI, and I'm stoked to share our v3.7.0 release.

Many of you already use LocalAI as a self-hosted, OpenAI-compatible API frontend for your GGUF models (via llama.cpp), as well as other backends like vLLM, MLX, etc. It's 100% FOSS, runs on consumer hardware, and doesn't require a GPU.

This new release is quite cool and I'm happy to share it out personally, so I hope you will like it. We've moved beyond just serving model inference and built a full-fledged platform for running local AI agents that can interact with external tools.

Some of you might already know that as part of the LocalAI family, LocalAGI ( https://github.com/mudler/LocalAGI ) provides a "wrapper" around LocalAI that enhances it for agentic workflows. Lately, I've been factoring out code out of it and created a specific framework based on it (https://github.com/mudler/cogito) that now is part of LocalAI as well.

What's New in 3.7.0

1. Full Agentic MCP Support (Build Tool-Using Agents) This is the big one. You can now build agents that can reason, plan, and use external tools... all 100% locally.

Want your chatbot to search the web, execute a local script, or call an external API? Now it can.

How it works: It's built on our agentic framework. You just define "MCP servers" (e.g., a simple Docker container for DuckDuckGo) in your model's YAML config. No Python or extra coding is required.
API & UI: You can use the new OpenAI-compatible /mcp/v1/chat/completions endpoint, or just toggle on "Agent MCP Mode" right in the chat WebUI.
Reliability: We also fixed a ton of bugs and panics related to JSON schema and tool handling. Function-calling is now much more robust.
You can find more about this feature here: https://localai.io/docs/features/mcp/

2. Backend & Model Updates (Qwen 3 VL, llama.cpp)

llama.cpp Updated: We've updated our llama.cpp backend to the latest version.
Qwen 3 VL Support: This brings full support for the new Qwen 3 VL multimodal models.
whisper.cpp CPU Variants: If you've ever had LocalAI crash on older hardware (like a NAS or NUC) with an illegal instruction error, this is for you. We now ship specific whisper.cpp builds for avx, avx2, avx512, and a fallback to prevent these crashes.

3. Major WebUI Overhaul This is a huge QoL win for power users.

The UI is much faster (moved from HTMX to Alpine.js/vanilla JS).
You can now view and edit the entire model YAML config directly in the WebUI. No more SSHing to tweak your context size, n_gpu_layers, mmap, or agent tool definitions. It's all right there.
Fuzzy Search: You can finally find gemma in the model gallery even if you type gema.

4. Other Cool Additions

New neutts TTS Backend: For anyone building local voice assistants, this is a new, high-quality, low-latency TTS engine.
Text-to-Video Endpoint: We've added an experimental OpenAI-compatible /v1/videos endpoint for text-to-video generation.
Realtime example: we have added an example on how to build a voice-assistant based on LocalAI here: https://github.com/mudler/LocalAI-examples/tree/main/realtime it also supports Agentic mode, to show how you can control e.g. your home with your voice!

As always, the project is 100% FOSS (MIT licensed), community-driven, and designed to run on your hardware.

We have Docker images, single-binaries, and more.

You can check out the full release notes here.

I'll be hanging out in the comments to answer any questions!

GitHub Repo: https://github.com/mudler/LocalAI

Thanks for all the support!

[-]

Kitchen_Fix1464@reddit

I am not sure how I missed this for so long, but I just found your project and have manged to get it running on my Arc770. That has been an unreasonably painful task with most other tools. Thank you for supporting XPU out of the box!

One newb question, I have installed the cpu-neutts, but I am unsure how to use it. The API endpoints for TTS do not support the parameters for reference audio that it needs. I am sure I am missing something simple. If anyone can point me in the right direction, it would be much appreciated.

[-]

mudler_it@reddit (OP)

Happy to hear! About neutts, that's a good point, we actually missed having a model in the gallery for it, and there is no documentation still. You can see an example attached in the PR: https://github.com/mudler/LocalAI/pull/6404 ( you need to specify a voice reference file and a text transcription of it )

[-]

Kitchen_Fix1464@reddit

Thanks for the reply and the info. I did see the backend is available. Ill checkout the PR for details.

[-]

ridablellama@reddit

thanks for sharing as MIT i will have a look. I have been trying to smash together librechat with the qwen agent framework and this seems like it could be an option.

[-]

mudler_it@reddit (OP)

yup, you can definetly link any chat UI to be agentic now!

[-]

Ok-Adhesiveness-4141@reddit

Hello Mr Mudler,

What kind of llms are you running without GPU. The reason why I am interested in this is because I am working for an NGO that requires non-gpu llms and RAG.

[-]

drc1728@reddit

This update looks awesome! Full agentic MCP support locally is a big deal, especially for building multi-step reasoning agents without relying on external APIs.

With CoAgent (coa.dev), we tackle similar challenges by layering observability, evaluation, and tracing on top of agentic workflows. That way, you can see not just what the agent outputs, but why it chose a particular tool or response, and detect drift or errors across complex chains.

Excited to see the community experimenting with LocalAI + Cogito!

[-]

smarkman19@reddit

Agentic MCP support shines when you add tracing and guardrails. If you’re running LocalAI agents, split API and workers, queue tool runs in Redis and enforce per-tool timeouts and cancellation.

For data access, avoid raw SQL; I’ve used CoAgent and Langfuse, and DreamFactory to auto-generate RBAC REST endpoints over Postgres/SQL Server so agents only hit allow-listed routes; Supabase RPCs or PostgREST work too. Keep state in Postgres with pgvector, dedupe before embedding, rate-limit connectors, and use exponential backoff. OP’s WebUI editor is handy-commit YAML to git and run a CI smoke test that spins a container and validates schemas. Agentic MCP support shines most when paired with tracing and guardrails.

[-]

teddybear082@reddit

always thought your work was great from watching from afar but candidly I’ve never gotten a good grip on how to use it in windows. Probably in large part because I’ve never really gotten docker desktop to work easily. There’s not like a windows quick start guide anywhere is there?

[-]

mudler_it@reddit (OP)

Sadly not a windows user here, so can't really help and validate. I know that from the community there are windows users having no issues with WSL.

someone actually was contributing WSL scripts to set-up automatically LocalAI, but as I can't verify these were not picked up: https://github.com/mudler/LocalAI/pull/6377

[-]

teddybear082@reddit

Thank you I will check those out. That reminds me one time I think I did get this partially set up like a year ago but I could not figure out how to get WSL or Docker (whichever one it was) to use cuda. Anyway thanks for your work.

[-]

thereturn932@reddit

Docker works like shit on Windows unfortunately.

[-]

richardbaxter@reddit

This looks interesting. I'm desperately seeking a good Claude desktop like ui - I use it to automate content management with various mcps. Project knowledge is awesome (as are projects) because I can store prompts and guidelines.

I've got a local llm but so far I haven't really found the workflow that removes me from Claude

[-]

mudler_it@reddit (OP)

At this stage is probably not an equivalent replacement in term of UI to Claude desktop, but we will get there. The technical aspects are already working: it connects to your MCPs, does actions, etc. But the UI is still rough and doesn't display internal reasoning process (yet).

Probably github.com/mudler/LocalAGI (which is a LocalAI's related project) is better use here - you can plug your MCP agent directly to other apps, for instance, Telegram, and use that as interface.

[-]

Ok-Adhesiveness-4141@reddit

Sounds amazing, can't wait to try it out. Thank you for your amazing work.

[-]

mudler_it@reddit (OP)

Thanks! really appreciated!

[-]

PersonalCitron2328@reddit

Looks amazing, and I was very stoked to get this up and running on a Mac, however the fact that the only way for the app to work is via a workaround and/or building from source is quite a turn-off. It's much easier to "just run" one of the other solutions out there...