[MIT] RLCR: Teaching AI models to say "I'm not sure"
Posted by Zyj@reddit | LocalLLaMA | 14 comments
Confidence is persuasive. In AI systems, it is often misleading.
Today's most capable reasoning models share a trait with the loudest voice in the room: They deliver every answer with the same unshakable certainty, whether they're right or guessing. Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have now traced that overconfidence to a specific flaw in how these models are trained, and developed a method that fixes it without giving up any accuracy.
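The article doesn't spell out the training objective, but RLCR-style methods are commonly described as pairing a correctness reward with a calibration penalty on the model's own stated confidence (a Brier-style score). The following is a minimal sketch under that assumption; the exact reward in the MIT paper may differ, and `rlcr_reward` is an illustrative name, not the paper's API:

```python
def rlcr_reward(correct: bool, confidence: float) -> float:
    """Correctness reward minus a Brier-score calibration penalty.

    Sketch only: assumes the reward is 1[correct] - (q - 1[correct])^2,
    where q is the confidence the model states for its own answer.
    """
    y = 1.0 if correct else 0.0
    brier = (confidence - y) ** 2  # calibration penalty in [0, 1]
    return y - brier

# Being right and saying so beats being right but hedging:
assert rlcr_reward(True, 0.9) > rlcr_reward(True, 0.5)
# Confidently wrong is punished harder than cautiously wrong:
assert rlcr_reward(False, 0.9) < rlcr_reward(False, 0.2)
```

Under a reward like this, the model can't farm points by always sounding certain: the calibration term only pays out when stated confidence tracks actual accuracy, which is why accuracy need not drop.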
datbackup@reddit
Intuitively, this seems like a fool’s errand.
Imagine the following interaction:
User: “what is the capital of France?”
Assistant: “I’m not sure but it may be Paris.”
I’d rather the model be confidently wrong than full of this sort of “hedge slop”.
The real issue is that the model can be neither certain nor uncertain, since it has no subjective perspective of its own. Teaching it to say "I'm not sure" just shifts its output toward the parts of the training data that talked with uncertainty.
Imaginary-Unit-3267@reddit
This is false. Logit entropy, which is basically uncertainty, is objectively measurable, and is likely also predictable by the model itself from token patterns.
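For anyone unfamiliar with the term: the "logit entropy" being argued about here is just the Shannon entropy of the softmax distribution over next-token logits. A minimal self-contained illustration (not from the thread or the paper):

```python
import math

def logit_entropy(logits: list[float]) -> float:
    """Shannon entropy (nats) of the softmax over next-token logits.

    High entropy: probability spread over many tokens (uncertain).
    Near-zero entropy: one token dominates (confident).
    """
    m = max(logits)                               # subtract max for
    exps = [math.exp(x - m) for x in logits]      # numerical stability
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

# One dominant logit -> near-zero entropy:
assert logit_entropy([10.0, 0.0, 0.0]) < 0.01
# Uniform logits -> maximum entropy, log(n):
assert abs(logit_entropy([1.0, 1.0, 1.0]) - math.log(3)) < 1e-9
```

Note this measures uncertainty over the *next token*, not over whole answers, which is part of what the disagreement below is about.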
datbackup@reddit
“basically” is doing some very heavy lifting here, and “likely” isn’t much better, and making an absolute statement like “this is false” seems incongruous with your use of these qualifiers
if it were phrased as "there may be a method of measuring how far in/out of distribution the current sequence is" I could be persuaded to see some plausibility and value in that, but equating this to "how certain the model is"… is just not scientific imo
Imaginary-Unit-3267@reddit
And yet you figured out what I meant anyway.
Dabalam@reddit
Why?
I would prefer that information with a lot of corroborating evidence be expressed with certainty, and claims with little evidence or source backing be expressed with uncertainty (which is better than most humans when talking about their beliefs). I don't see how confidently incorrect is useful; we just enjoy seeing confident answers more across the board.
foldl-li@reddit
Be sure to say "I am not sure".
_wsgeorge@reddit
This paper came out last year. Have any major models (open, proprietary, frontier, etc.) tried this technique?
TheRealMasonMac@reddit
Dunno if they use this specific technique, but Gemma 4 31B is pretty good about it.
oxygen_addiction@reddit
Anthropic obviously did something like this for Opus 4.5+. That was the first model that didn't do the "You are absolutely right" thing.
Silver-Champion-4846@reddit
Still does it in my Arena.ai tests
Eyelbee@reddit
Isn't this basically what is used today?
Quagmirable@reddit
Reminds me of the scoring model on some multiple-choice standardized tests: dock 1 point if you leave it blank, dock 1.5 points if you answer it wrong.
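The analogy works because a wrong-answer penalty makes abstaining rational below some confidence threshold. A quick expected-value check using the comment's numbers (blank −1, wrong −1.5, and assuming a correct answer loses nothing) shows the break-even point:

```python
def answering_beats_blank(p_correct: float,
                          blank_penalty: float = -1.0,
                          wrong_penalty: float = -1.5) -> bool:
    """Answer only if the expected score of guessing beats leaving blank.

    Uses the comment's scheme: blank costs 1, wrong costs 1.5, and
    (assumed here) a correct answer costs nothing.
    """
    expected_if_answer = (1 - p_correct) * wrong_penalty
    return expected_if_answer > blank_penalty

# Break-even confidence is p = 1/3: guess above it, abstain below it.
assert answering_beats_blank(0.5)
assert not answering_beats_blank(0.2)
```

That's the same incentive RLCR-style rewards give a model: saying "I'm not sure" becomes the score-maximizing move exactly when its confidence is low.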
PeachOk54@reddit
That's cool
Gold-Drag9242@reddit
I tried to tell the model to use language that reflects the certainty of the facts it states.
Not sure it worked