I'm the author of LocalAI (the local OpenAI-compatible API). We just released v3.7.0 with full Agentic Support (tool use!), Qwen 3 VL, and the latest llama.cpp

Posted by mudler_it@reddit | LocalLLaMA | View on Reddit | 17 comments

Hey r/LocalLLaMA,

I'm the creator of LocalAI, and I'm stoked to share our v3.7.0 release.

Many of you already use LocalAI as a self-hosted, OpenAI-compatible API frontend for your GGUF models (via llama.cpp), as well as other backends like vLLM, MLX, etc. It's 100% FOSS, runs on consumer hardware, and doesn't require a GPU.

This new release is quite cool and I'm happy to share it out personally, so I hope you will like it. We've moved beyond just serving model inference and built a full-fledged platform for running local AI agents that can interact with external tools.

Some of you might already know that as part of the LocalAI family, LocalAGI ( https://github.com/mudler/LocalAGI ) provides a "wrapper" around LocalAI that enhances it for agentic workflows. Lately, I've been factoring out code out of it and created a specific framework based on it (https://github.com/mudler/cogito) that now is part of LocalAI as well.

What's New in 3.7.0

1. Full Agentic MCP Support (Build Tool-Using Agents) This is the big one. You can now build agents that can reason, plan, and use external tools... all 100% locally.

Want your chatbot to search the web, execute a local script, or call an external API? Now it can.

2. Backend & Model Updates (Qwen 3 VL, llama.cpp)

3. Major WebUI Overhaul This is a huge QoL win for power users.

4. Other Cool Additions

As always, the project is 100% FOSS (MIT licensed), community-driven, and designed to run on your hardware.

We have Docker images, single-binaries, and more.

You can check out the full release notes here.

I'll be hanging out in the comments to answer any questions!

GitHub Repo: https://github.com/mudler/LocalAI

Thanks for all the support!