Need Help Choosing a Harness for Qwen 3.6 27B

Posted by GrungeWerX@reddit | LocalLLaMA | View on Reddit | 6 comments

I've burned a week trying to customize my agent manually - building my own front end - but I've gotten to the point where I'm just exhausted and willing to try a harness, but need the right one. I read posts all the time, but I have a specific use case, so I'm reaching out to the best of the best for suggestions.

Here is my stack:

Where I am right now:

I'm using LM Studio because it just works. I tried llama.cpp w/openwebui and rage quit, was just slower and not same features I'm used to. Cass - my agent - works fine at Q5, but fills up context fast because o/mcp. (I know, I know) To help out, I switch to Q4 @ Q4 KV to get up to 200K and it works surprisingly well, but I figured if I spawn sub-agents I can pass that mcp context to them and just respawn for new tasks.

I had Cass write an agent spawner and it works fine. The trick works - the mcp context hits the subs and I can chat w/Cass longer - but I can't see what the sub-agent is doing/thinking/etc. I had cass build a dashboard for sub-agents that sorta worked, but there were just...issues. Cass couldn't see the agent's stream until it was finished and sometimes thought it timed out when the sub was still working. I searched and figured I'd have the sub stream its output to cass, but to properly see all this, I figured I'd need a custom front end.

Additionally, I want to run a process in parallel via cpu - a meta analysis agent - and I need a way to monitor its outputs as well. So, we're talking at minimum 2 agent outputs (main, meta) and then a third during spawn.

I watched some vidz last night about pi agent. I'm not sure this is what I need - I want to use mcp tools. But I'm good using other tools as long as I can still read/write to redis and postgres.

Also, I want to add a small agent that intercepts incoming chats and injects memories/context/etc (I'll set this manually) prior to the main agent getting the message. A sort of prefill context packet.

What I need is a harness that enables the following:

What's the simplest open source harness that will allow this? I'm not interested in any cloud models, only local and what can fit in my gpu. I'm happy w/my current agent, but I need some minor automation and management tools that I really don't have time to build myself.

Thanks in advance for any suggestions.