[MIT] RLCR: Teaching AI models to say "I'm not sure"
Posted by Zyj@reddit | LocalLLaMA | 14 comments
Confidence is persuasive. In AI systems, it is often misleading.
Today's most capable reasoning models share a trait with the loudest voice in the room: They deliver every answer with the same unshakable certainty, whether they're right or guessing. Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have now traced that overconfidence to a specific flaw in how these models are trained, and developed a method that fixes it without giving up any accuracy.
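The article doesn't spell out the training objective, but RLCR-style methods are commonly described as pairing a correctness reward with a calibration penalty on the model's own stated confidence (a Brier-style score). The following is a minimal sketch under that assumption; the exact reward in the MIT paper may differ, and `rlcr_reward` is an illustrative name, not the paper's API:

```python
def rlcr_reward(correct: bool, confidence: float) -> float:
    """Correctness reward minus a Brier-score calibration penalty.

    Sketch only: assumes the reward is 1[correct] - (q - 1[correct])^2,
    where q is the confidence the model states for its own answer.
    """
    y = 1.0 if correct else 0.0
    brier = (confidence - y) ** 2  # calibration penalty in [0, 1]
    return y - brier

# Being right and saying so beats being right but hedging:
assert rlcr_reward(True, 0.9) > rlcr_reward(True, 0.5)
# Confidently wrong is punished harder than cautiously wrong:
assert rlcr_reward(False, 0.9) < rlcr_reward(False, 0.2)
```

Under a reward like this, the model can't farm points by always sounding certain: the calibration term only pays out when stated confidence tracks actual accuracy, which is why accuracy need not drop.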
datbackup@reddit
Intuitively, this seems like a fool’s errand.
Imagine the following interaction:
User: “what is the capital of France?”
Assistant: “I’m not sure but it may be Paris.”
I’d rather the model be confidently wrong than full of this sort of “hedge slop”.
The real issue is that the model can be neither certain nor uncertain, since it has no subjective perspective of its own. Teaching it to say "I'm not sure" just shifts its output toward the parts of the training data that talked with uncertainty.
Imaginary-Unit-3267@reddit
This is false. Logit entropy, which is basically uncertainty, is objectively measurable, and is likely also predictable by the model itself from token patterns.
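For anyone unfamiliar with the term: the "logit entropy" being argued about here is just the Shannon entropy of the softmax distribution over next-token logits. A minimal self-contained illustration (not from the thread or the paper):

```python
import math

def logit_entropy(logits: list[float]) -> float:
    """Shannon entropy (nats) of the softmax over next-token logits.

    High entropy: probability spread over many tokens (uncertain).
    Near-zero entropy: one token dominates (confident).
    """
    m = max(logits)                               # subtract max for
    exps = [math.exp(x - m) for x in logits]      # numerical stability
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

# One dominant logit -> near-zero entropy:
assert logit_entropy([10.0, 0.0, 0.0]) < 0.01
# Uniform logits -> maximum entropy, log(n):
assert abs(logit_entropy([1.0, 1.0, 1.0]) - math.log(3)) < 1e-9
```

Note this measures uncertainty over the *next token*, not over whole answers, which is part of what the disagreement below is about.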
datbackup@reddit
“basically” is doing some very heavy lifting here, and “likely” isn’t much better, and making an absolute statement like “this is false” seems incongruous with your use of these qualifiers
if it were phrased as "there may be a method of measuring how far in/out of distribution the current sequence is" I could be persuaded to see some plausibility and value in that, but equating this to "how certain the model is"… is just not scientific imo
Imaginary-Unit-3267@reddit
And yet you figured out what I meant anyway.
Dabalam@reddit
Why?
I would prefer that information with a lot of corroborating evidence be expressed with certainty, and claims with little evidence or source backing be expressed with uncertainty (which is better than most humans when talking about their beliefs). I don't see how confidently incorrect is useful; we just enjoy seeing confident answers more across the board.
foldl-li@reddit
Be sure to say "I am not sure".
_wsgeorge@reddit
This paper came out last year. Have any major models (open, proprietary, frontier, etc.) tried this technique?
TheRealMasonMac@reddit
Dunno if they use this specific technique, but Gemma 4 31B is pretty good about it.
oxygen_addiction@reddit
Anthropic obviously did something like this for Opus 4.5+. That was the first model that didn't do the "You are absolutely right" thing.
Silver-Champion-4846@reddit
Still does it in my Arena.ai tests
Eyelbee@reddit
Isn't this basically what is used today?
Quagmirable@reddit
Reminds me of the scoring model on some multiple-choice standardized tests: dock 1 point if you leave it blank, dock 1.5 points if you answer it wrong.
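The analogy works because a wrong-answer penalty makes abstaining rational below some confidence threshold. A quick expected-value check using the comment's numbers (blank −1, wrong −1.5, and assuming a correct answer loses nothing) shows the break-even point:

```python
def answering_beats_blank(p_correct: float,
                          blank_penalty: float = -1.0,
                          wrong_penalty: float = -1.5) -> bool:
    """Answer only if the expected score of guessing beats leaving blank.

    Uses the comment's scheme: blank costs 1, wrong costs 1.5, and
    (assumed here) a correct answer costs nothing.
    """
    expected_if_answer = (1 - p_correct) * wrong_penalty
    return expected_if_answer > blank_penalty

# Break-even confidence is p = 1/3: guess above it, abstain below it.
assert answering_beats_blank(0.5)
assert not answering_beats_blank(0.2)
```

That's the same incentive RLCR-style rewards give a model: saying "I'm not sure" becomes the score-maximizing move exactly when its confidence is low.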
PeachOk54@reddit
That's cool
Gold-Drag9242@reddit
I tried to tell the model to use language that reflects the certainty of the facts it states.
Not sure it worked