I'm running an agentic system with kobold.cpp as my backend. Am I losing performance?

Posted by AlphaSyntauri@reddit | LocalLLaMA | View on Reddit | 5 comments

Currently, I'm running a Hermes agent with an OpenAI v1 compatible endpoint provided by Kobold. My setup is a a 24GB 3090Ti + 512GB DDR4 running Qwen3.6-35B-A3B.

I plan to move to a larger MoE model once I'm satisfied with how everything is working, but I'm just wondering if I'm sacrificing performance by not using llama.cpp standalone and relying on a program that's more focused on ease of use.

To my knowledge it's just a simple wrapper, but I'm curious if anyone has any experience swapping between Kobold and other local endpoints. Thanks!