Need Help Choosing a Harness for Qwen 3.6 27B

Posted by GrungeWerX@reddit | LocalLLaMA | View on Reddit | 6 comments

I've burned a week trying to customize my agent manually - building my own front end - but I've gotten to the point where I'm just exhausted and willing to try a harness, but need the right one. I read posts all the time, but I have a specific use case, so I'm reaching out to the best of the best for suggestions.

Here is my stack:

Windows 10 | i7 12700K | RTX 3090 TI | 96GB RAM
Models: Qwen 3.5|3.6 27B UD K XL (Q4/Q5) - Also will be using 0.8B/4B in CPU parallel
Server: LM Studio
Apps: (in Docker) N8N, Redis (w/redisstack,redisinsight), Postgres (w/pgadmin,pgvector), Dify (installed, never used), browserless (never used)

Where I am right now:

I'm using LM Studio because it just works. I tried llama.cpp w/openwebui and rage quit, was just slower and not same features I'm used to. Cass - my agent - works fine at Q5, but fills up context fast because o/mcp. (I know, I know) To help out, I switch to Q4 @ Q4 KV to get up to 200K and it works surprisingly well, but I figured if I spawn sub-agents I can pass that mcp context to them and just respawn for new tasks.

I had Cass write an agent spawner and it works fine. The trick works - the mcp context hits the subs and I can chat w/Cass longer - but I can't see what the sub-agent is doing/thinking/etc. I had cass build a dashboard for sub-agents that sorta worked, but there were just...issues. Cass couldn't see the agent's stream until it was finished and sometimes thought it timed out when the sub was still working. I searched and figured I'd have the sub stream its output to cass, but to properly see all this, I figured I'd need a custom front end.

Additionally, I want to run a process in parallel via cpu - a meta analysis agent - and I need a way to monitor its outputs as well. So, we're talking at minimum 2 agent outputs (main, meta) and then a third during spawn.

I watched some vidz last night about pi agent. I'm not sure this is what I need - I want to use mcp tools. But I'm good using other tools as long as I can still read/write to redis and postgres.

Also, I want to add a small agent that intercepts incoming chats and injects memories/context/etc (I'll set this manually) prior to the main agent getting the message. A sort of prefill context packet.

What I need is a harness that enables the following:

Super simple gui (heck, even a terminal look like pi agent is fine I guess). I need to see current ctx size, max ctx size, and all tools. Needs to work w/images too.
Allows me to spawn sub-agents easily, set their individual system prompts, and choose their mcp tools.
Allows me a dashboard or monitor where I can view ALL of their outputs - thinking, tool use, etc.
A simple way to wire smaller agents' output to the main agent for "prefill". I read about redis agent memory server, but I want something that allows me to set up what type of data the smaller model transfers downstream.

What's the simplest open source harness that will allow this? I'm not interested in any cloud models, only local and what can fit in my gpu. I'm happy w/my current agent, but I need some minor automation and management tools that I really don't have time to build myself.

Thanks in advance for any suggestions.

[-]

Wrong_Mushroom_7350@reddit

Pi.dev… honestly it’s the ultimate boss on configuration, but it’s written in typescript(ai preferred language), and is minimalist approach is amazing, session control is awesome, extensions, and runs pretty quick, support for api connects if you need it, I run locally.

I tried claw-code(ended up removing it, was not a finished product) I looked at hermes seemed overwhelming to me on all that it try’s to do, I looked at open code, and aider.. but me personally I do not like the way Claude does things, so not interested in Claude code clones.

I honestly just did all these searchers, in the last 36 hours and that’s my personal opinion.

JPaulDuncan@reddit

https://github.com/JPaulDuncan/Alom/tree/main

xchaos4ux@reddit

it may be the model. running qwen here my self, autoround Q4 build in a self contained vllm . i have noticed in my testing scenario which should be relatively simple that Owen does not produce reproducible instructions each and every iteration. even if it is provided a previous working example. which has been frustrating in testing .

so you may want to do some tests on just the model your using to see if its reliable spitting out the same results each time. as that may be a possible problem in getting your work flow setup. and may need to mitigate that if its an issue. sorry i dont have much more on a work flow as i still use copy and paste and various text editors as the harness. just dont trust what the llms have provided me thus to branch out into mcp land just yet.