Another sampling strategy drops: 75% accuracy at T=3.0

Posted by tomorrowdawn@reddit | LocalLLaMA

TL;DR:

# Keep only tokens whose logit is within n standard deviations of the max logit
threshold = logits.max(dim=-1, keepdim=True).values - n * logits.std(dim=-1, keepdim=True)
logits[logits < threshold] = float('-inf')

It's called top-nsigma, and it filters out tokens directly in logit space, using the statistics of the logits themselves.
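For context, here's a minimal sketch of how the filter would slot into an ordinary sampling step (assuming PyTorch; the function name nsigma_sample, the n=1.5 default, and the toy vocab size are mine, not from the repo):

import torch

def nsigma_sample(logits: torch.Tensor, n: float = 1.5, temperature: float = 1.0) -> torch.Tensor:
    # logits: (batch, vocab) raw next-token logits from the model
    # Mask out everything more than n standard deviations below the max logit
    threshold = logits.max(dim=-1, keepdim=True).values - n * logits.std(dim=-1, keepdim=True)
    logits = logits.masked_fill(logits < threshold, float('-inf'))
    # Temperature and softmax are applied to the surviving tokens only
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)

next_token = nsigma_sample(torch.randn(1, 32000), temperature=3.0)

Whether you divide by temperature before or after the filter shouldn't change which tokens survive: both the max and the std scale by 1/T, so the threshold moves with them.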

Imo the most interesting finding is that the logits naturally split into two regions: a Gaussian noise region and an informative region. When the model isn't confident enough, or the temperature is high, the gap between the "meaningful" tokens and the "noise" tokens shrinks, and noise tokens start sneaking into your sampling pool, degrading the quality.
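To make that concrete, here's a toy illustration (the numbers are made up, just to show the effect): a few clearly informative logits sitting above a big blob of roughly Gaussian noise. At T=3.0, plain softmax leaks a lot of probability mass onto the noise tokens, while the nsigma threshold removes them before the temperature flattens anything:

import torch

torch.manual_seed(0)
informative = torch.tensor([10.0, 9.0, 8.5])   # the "informative region"
noise = torch.randn(1000)                      # the "Gaussian noise region"
logits = torch.cat([informative, noise])

probs = torch.softmax(logits / 3.0, dim=-1)
print(probs[:3].sum())          # at T=3.0, much of the mass has leaked onto noise tokens

threshold = logits.max() - 1.5 * logits.std()
masked = logits.masked_fill(logits < threshold, float('-inf'))
probs = torch.softmax(masked / 3.0, dim=-1)
print(probs[:3].sum())          # ~1.0: the noise region is gone before sampling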

And this performance drop is serious - imagine spending millions on training a massive model just to have poor sampling mess up its outputs.

Check out the original GitHub repo; top-nsigma has also been merged into aphrodite-engine. (Honestly, it's so simple you could probably whip it up yourself in a few minutes.) Feel free to try it out and let us know what you think!