I’ve noticed something about how people run models.
Posted by Savantskie1@reddit | LocalLLaMA | 19 comments
Almost everyone who says a model is crap seems to evaluate it by just giving it a few prompts. I never see anyone passing a system prompt that could actually help them. And I don't mean the typical example of telling it that it's a whatever type of expert. I mean something that explains the environment and the tools it can use, or anything like that.
I’ve learned that the more information you pass in a system prompt before you say anything to a model, the better the model seems to respond. Before I ask a model to do anything, I usually give it an overview of what tools it has, and how it could use them. But I also give it permission to experiment with tools. Because one tool might not work, but another may accomplish the task at hand.
I give the model the constraints of how it can do the job and what is expected. Then in my first message I lay out what I want it to do, and with all of that information, almost invariably, most models generally do what I want.
So why does everyone expect these models to just automatically understand what you want them to do, or to completely understand the tools that are available, when they don't have all of the information or the intent? Not even a human can get the job done if they don't have all of the variables.
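A minimal sketch of the kind of environment-describing system prompt the OP is talking about, built as messages for an OpenAI-compatible chat API. The tool names, paths, and constraints here are hypothetical examples, not anything the OP actually runs:

```python
# Hypothetical environment-describing system prompt, in the spirit of the OP's
# approach: tools, how they can be used, permission to experiment, constraints.
SYSTEM_PROMPT = """You are working inside a local development environment.

Available tools (you may experiment; if one tool fails, try another):
- read_file(path): returns the contents of a file under /workspace
- run_shell(cmd): runs a shell command, 30s timeout, no network access

Constraints:
- Never write outside /workspace.
- Ask before running anything destructive.
"""

def build_messages(user_request):
    """Prepend the environment description before the first user turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_request},
    ]

messages = build_messages("Summarize the README in /workspace.")
```

The point is that all of this context lands before the first user message, so the model never has to guess what environment it is in.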
sxales@reddit
That just sounds like working in IT: most users don't know what they are doing, and then they complain when it doesn't work.
last_llm_standing@reddit
It's not always the same. For a model I'm testing now, I truncated the system prompt (used while training) and compared it against the full system prompt. Surprisingly, the results improved. I had a lot of "do not"s in my original system prompt; getting rid of them seems to improve the overall performance.
audioen@reddit
In my opinion, system prompts shouldn't be needed for a baseline response. And if you have tools, the model's chat template provides the tool descriptions, so you don't have to add them to the system prompt yourself.
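For context on this comment: most OpenAI-compatible local servers accept tool schemas in a `tools` field of the request, and the model's chat template renders them into the prompt. A sketch of such a request payload, with a hypothetical `read_file` tool and placeholder model name:

```python
# Sketch: tool schemas go in the request's `tools` field; the chat template
# injects them into the prompt, so they need not be repeated in the system
# prompt. Model name and tool are placeholders.
payload = {
    "model": "qwen3.5",
    "messages": [{"role": "user", "content": "What's in notes.txt?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "read_file",
                "description": "Read a UTF-8 text file from the workspace.",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }
    ],
}
```

The OP's point and this one aren't mutually exclusive: the schema above tells the model a tool's signature, while a system prompt can still add the context about when and why to use it.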
Final_Ad_7431@reddit
A lot of the "help, my qwen3.5 is overthinking!" posts on this sub are people running the model with probably-wrong params directly in LM Studio or some other raw chat interface, for sure.
Woof9000@reddit
"Waste" is a strong word. Even with some adjustments, 9B Qwen still "spends" a few thousand tokens on thinking, on average, but if that helps it sound like a model 2-3 times its actual size, are those really wasted tokens? I don't think so.
Savantskie1@reddit (OP)
Yeah, when running locally, the only thing tokens cost is power. But I'll say this: I used to game literally every day, and now that I'm working on AI instead of gaming (it's been my new obsession over the last year), my bill hasn't gotten any more expensive than when I was playing games nearly 24/7. And I'm not running inference nearly as frequently as I was gaming. Yes, I keep the model loaded, but inference isn't 24/7 like gaming was.
Savantskie1@reddit (OP)
Qwen3.5 likes to have a system prompt and parameters for the conversation or for how to act. I've found that when I provide it a decently large system prompt, it does not overthink. Same goes for most other models too.
ustas007@reddit
Most people aren’t really testing the model—they’re testing their own prompt and calling it a benchmark. If you don’t define context, tools, and constraints, you’re basically asking the model to guess the rules of the game. Funny part is, we’d never expect a human to perform like that, but we expect AI to read our minds on the first try.
Savantskie1@reddit (OP)
This! Right here! I was guilty of this a little in the first week, but then I thought about it and decided to explain my MCP tools to the model, and its tool calls got nearly 99 percent better.
RoggeOhta@reddit
The bigger issue is that this skews every benchmark comparison people do. Someone tests Llama 3.3 70B vs Qwen 35B with a bare prompt, gets mid results from both, and concludes "local models suck." Same task with a proper system prompt and the gap between local and API models shrinks a lot. Smaller models especially benefit from system prompts because they have less implicit instruction following baked in. A 7B model with a good system prompt can outperform a 70B with none on structured tasks, I've seen it happen with tool calling specifically.
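The comparison described above is easy to run as a controlled A/B test: same tasks, same model, with and without a system prompt. A sketch, where `query_model` is a stub standing in for a real call to your local server:

```python
# Sketch of the bare-prompt vs system-prompt comparison described above.
# `query_model` is a stub; swap in a real chat-completions call to compare.
def query_model(messages):
    # Stub response; a real implementation would POST to a local server.
    return "stub response"

def run_condition(tasks, system_prompt=None):
    """Run every task, optionally prepending a system prompt."""
    outputs = []
    for task in tasks:
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": task})
        outputs.append(query_model(messages))
    return outputs

tasks = ["List the files in the workspace.", "Rename a.txt to b.txt."]
bare = run_condition(tasks)
primed = run_condition(tasks, "You have tools: read_file, run_shell. ...")
```

Scoring the two output sets on the same rubric makes it obvious how much of a "this model sucks" verdict is really a missing-context problem.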
Savantskie1@reddit (OP)
See, this is why I say explain the tools, so the LLM doesn't have to guess based on the tool handler's limited description. That way the model makes better decisions.
Witty_Mycologist_995@reddit
exactly bro
Upset_Letterhead@reddit
I think part of the problem is in the name (AI). I've been trying to push at work to ensure everyone uses the term LLM instead. This helps people understand this isn't actual artificial intelligence, it's a language model system. It can be great, but it's not this all knowing entity that can understand and more importantly - identify when it has context gaps.
I'm hoping one of the improvements we see in models is that they continue to question themselves (and the user) more. They've made huge strides here, but it still feels like there's a long runway before they get near human-level cognition in understanding situations and personal context.
Big_River_@reddit
wouldn't it make sense to have a layer that does that consistently every time ?
Savantskie1@reddit (OP)
You mean the system prompt? Depending on the inference platform, that can be resent every so often; same with some frontends. Why have another layer do this?
ttkciar@reddit
You're right. People are using these models poorly, but my assumption is that it's because they're inexperienced. Better practices should come with experience.
Savantskie1@reddit (OP)
I’ve only been using LLMs for like a year and I’ve known this from day one.
pfn0@reddit
"What tools it has" is handled by the harness; duplicating that is a waste of context. But the other things you mention can steer it to use those tools better.
Savantskie1@reddit (OP)
I've noticed that an explanation of the tools at hand tends to stop the model from calling tools that don't exist, or that only existed in its training environment. It keeps the model on task, and it always has a reference to look back on.