nvidia/Orchestrator-8B · Hugging Face
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 18 comments
Orchestrator-8B is a state-of-the-art 8B parameter orchestration model designed to solve complex, multi-turn agentic tasks by coordinating a diverse set of expert models and tools.
On the Humanity's Last Exam (HLE) benchmark, Orchestrator-8B achieves a score of 37.1%, outperforming GPT-5 (35.1%) while being approximately 2.5x more efficient.
https://huggingface.co/bartowski/nvidia_Orchestrator-8B-GGUF
RandumbRedditor1000@reddit
Anyone else getting tired of these and just want a good base model for people to finetune?
jacek2023@reddit (OP)
what are your problems with existing models?
RandumbRedditor1000@reddit
Just that they're old and lack a level of "awareness" that newer models have.
But newer models are all trained on synthetic data that makes them score high on specific benchmarks but useless for any kind of rp or chat. Nemotron-12B is the best model we have for fine-tuning and it's old at this point.
CYTR_@reddit
The world doesn't care about your models for cyber-GF or role-playing games. R&D should focus on models that have useful applications.
PorchettaM@reddit
For better or worse, a glance at the most used apps on OR will quickly disprove that.
my_name_isnt_clever@reddit
Companies don't use OR apps, they use their own infrastructure. I guarantee if you added up all the LLM inference happening in the world, the RP and gooning would be a tiny fraction compared to the business uses. Not to mention the obvious that nobody wants to touch monetizing NSFW content right now.
MitsotakiShogun@reddit
So tasks that can only do damage to a company's reputation, and with 0 practical application for them, and probably only useful to ~~horny Redditors~~ cultured connoisseurs? Yeah, sounds like a great idea.
pokemonplayer2001@reddit
"~~horny Redditors~~ cultured connoisseurs?"
🤣🤣🤣
StardockEngineer@reddit
What are you bitching about? The base model is linked right in this repo. So is the training data. And the framework used is open source. You could grab all of it and make what you want out of it.
pokemonplayer2001@reddit
"Grrr, I'm so mad something is happening that is optional to use that is not perfectly aligned with my goals! Reeeeee"
🙄
RandumbRedditor1000@reddit
I'm just tired of benchmaxxed models trained on a ton of synthetic data to be good at very niche, very specific benchmarks rather than being actually intelligent chatbots
__JockY__@reddit
There are people who are not you. Some of these “not you” entities need different things than you. This is ok. This is progress. This is useful.
The fact that state-of-the-art efficient agentic LLMs have no relevance to your life is more a reflection on you than it is on the model.
pokemonplayer2001@reddit
But smaller, focused models are the future.
RandumbRedditor1000@reddit
I want smaller models, preferably around 20-30B, that don't have the awful chatgpt style of writing.
I don't have much use for an AI model that can only do very niche problems that only mathematicians understand.
Coding is useful, but small models aren't very good at general coding. They're good at the benchmarks and basically nothing more.
Double_Cause4609@reddit
Actually, isn't this exactly the kind of model you want for your purposes?
We're hitting a limit on the ability of traditional LLMs to advance in roleplay and chat domains, specifically because they can't necessarily model things that don't happen in discrete tokens. For example, an LLM can't model a character's internal thoughts.
But a model that can orchestrate agents could dynamically allocate time for an LLM agent to perform a soliloquy, verbalizing a character's internal thoughts or motivations, and that could inform the final generation that's presented to the user.
In fact, under such a setup, you could use this model to glue together old, creative models (to narrate directly with rich prose), while also getting the benefit of modern, but "dry" (in your opinion) models that have strong reasoning performance.
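To make the idea concrete, here's a minimal sketch of that two-stage pattern. Everything in it is a hypothetical stub (the function names and "models" are placeholders, not this model's actual API): a hidden soliloquy step runs first, and only the final prose reply reaches the user.

```python
# Sketch of the hidden-soliloquy orchestration pattern described above.
# Both "models" below are stand-in stubs, NOT real model/API calls.

def reasoning_model(prompt: str) -> str:
    """Stub for a strong but 'dry' modern reasoning model (hypothetical)."""
    return f"[private character thoughts about: {prompt}]"

def prose_model(user_message: str, hidden_thoughts: str) -> str:
    """Stub for an older, creative prose model (hypothetical)."""
    return f"In-character reply to '{user_message}', informed by {hidden_thoughts}"

def orchestrate_turn(user_message: str) -> str:
    # 1. Allocate a hidden soliloquy step: the character "thinks" privately.
    thoughts = reasoning_model(user_message)
    # 2. Feed those hidden thoughts into the creative model's final reply.
    #    Only this reply is shown to the user; the soliloquy stays internal.
    return prose_model(user_message, hidden_thoughts=thoughts)

reply = orchestrate_turn("Greet the stranger at the inn")
```

The orchestrator's job in this setup is just deciding when to spend a hidden step and which model gets the final word.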
This model is *literally* exactly what you're asking for.
And even if you feel otherwise, you can just instruct-tune a base LLM for roleplay on a style of prose and diction that you enjoy. Almost all of what you're talking about can be imparted during an instruction-tuning phase.
SanDiegoDude@reddit
For an agentic stack, this would be your 'task coordinator' model that organizes the task list then makes the tool calls and downstream calls to operator/executor agents. This isn't a "chat with my buddy" LLM, this LLM's purpose is to make sure the subtasks are getting processed properly. It's small and lightweight and fast so it can coordinate the larger, slower models actually getting the work done asynchronously.
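That coordinator/executor split can be sketched with plain `asyncio`. This is an illustration of the pattern only; the agent functions are hypothetical stubs, not this model's interface.

```python
import asyncio

async def executor_agent(subtask: str) -> str:
    """Stub for a larger, slower downstream operator/executor model (hypothetical)."""
    await asyncio.sleep(0.01)  # simulate a slow model call
    return f"done: {subtask}"

async def coordinate(task: str) -> list[str]:
    # A small, fast coordinator model would break the task into subtasks,
    # then fan them out to executor agents concurrently and gather results.
    subtasks = [f"{task} / step {i}" for i in range(3)]
    return await asyncio.gather(*(executor_agent(s) for s in subtasks))

results = asyncio.run(coordinate("summarize repo"))
```

The point of keeping the coordinator small is that it sits in the loop on every turn, while the heavyweight executors only run when dispatched.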
TheLexoPlexx@reddit
This is really neat, I should look into how to use this.
jacek2023@reddit (OP)