Are people testing ensembles of small reasoning LLM agents (assuming different models), and do they perform well on the same shared task?

Posted by Mental-At-ThirtyFive@reddit | LocalLLaMA | 1 comment

I am assuming this is a reasonable next step in the world of multi-agent systems, orchestration, and harnesses. Are there any references to this type of work being done?

FoxiPanda@reddit

So I'm ... kind of? doing this. I have several local models that I like currently, but I use them for sub-agents and not multiple main agents.

I'm building out a mechanism that allows me to write one prompt to an orchestrator agent:

The orchestrator spawns subagents across multiple models with the same prompt.
As responses come back from the subagents, the orchestrator gathers them, deduplicates them, and keeps the best of all responses.
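
For anyone who wants to try the fan-out / aggregate flow, here is a rough sketch of the idea, not my actual harness. Everything in it is a stand-in: `make_stub` fakes a model call, `SubAgent` is a hypothetical type for "anything that takes a prompt and returns text", and `pick_best` uses a simple agreement count where a real setup would more likely ask the orchestrator model to judge.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical type: a sub-agent is any async function mapping a prompt to text.
SubAgent = Callable[[str], Awaitable[str]]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical answers compare equal."""
    return " ".join(text.lower().split())


async def fan_out(prompt: str, agents: dict[str, SubAgent]) -> dict[str, str]:
    """Send the same prompt to every sub-agent concurrently; drop agents that fail."""
    names = list(agents)
    results = await asyncio.gather(
        *(agents[name](prompt) for name in names), return_exceptions=True
    )
    return {name: r for name, r in zip(names, results) if isinstance(r, str)}


def dedup(responses: dict[str, str]) -> list[tuple[str, list[str]]]:
    """Merge duplicate answers, keeping the first-seen wording and who gave it."""
    groups: dict[str, tuple[str, list[str]]] = {}
    for model, answer in responses.items():
        key = normalize(answer)
        if key in groups:
            groups[key][1].append(model)
        else:
            groups[key] = (answer, [model])
    return list(groups.values())


def pick_best(candidates: list[tuple[str, list[str]]]) -> tuple[str, list[str]]:
    """Placeholder judge (an assumption): prefer the answer most models agree on."""
    return max(candidates, key=lambda c: len(c[1]))


async def orchestrate(prompt: str, agents: dict[str, SubAgent]) -> tuple[str, list[str]]:
    """One prompt in, one aggregated answer out."""
    responses = await fan_out(prompt, agents)
    candidates = dedup(responses)
    if not candidates:
        raise RuntimeError("no sub-agent returned a response")
    return pick_best(candidates)


def make_stub(reply: str, fail: bool = False) -> SubAgent:
    """Stand-in for a real model call; swap in your own client here."""
    async def agent(prompt: str) -> str:
        await asyncio.sleep(0)  # simulate yielding to the event loop during I/O
        if fail:
            raise RuntimeError("model unavailable")
        return reply
    return agent


if __name__ == "__main__":
    agents = {
        "qwen": make_stub("Paris"),
        "gemma": make_stub("paris "),
        "mistral": make_stub("Lyon"),
        "nemotron": make_stub("", fail=True),
    }
    answer, voters = asyncio.run(orchestrate("Capital of France?", agents))
    print(answer, voters)
```

Agreement counting only works for short factual answers. For longer outputs you would replace `pick_best` with a call to the orchestrator model that reads the deduplicated candidates and merges them.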

This gets you the best of all the local models (qwen/gemma/nemotron/mistral/etc.) while still using a frontier/cloud orchestrator model or a large local orchestrator (like GLM5.1).

Obviously this is best suited for non-agentic tasks: you don't want 3 subagents all trying to execute the same task at the same time and stepping all over each other. I'm still figuring out how to make that part better for local models -- I think that comes down to harness capability more than the underlying model, though.