Is Min P sampling really the preferred modern alternative to Top K/Top P?

Posted by bgravato@reddit | LocalLLaMA | View on Reddit | 16 comments

According to what I've been reading (and also according to all models I've asked about this), the consensus seems to be that Min P is the better/more modern approach to sampling and that it should be preferred over Top P/Top K, which should be used only if Min P isn't available or for legacy reasons...

Yet, looking at recently published LLMs on Hugging Face and elsewhere, the recommended sampling parameters are still largely Top K and/or Top P. Is this only for legacy reasons, or is there some other reason?
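For context, the Min P rule being discussed keeps every token whose probability is at least `min_p` times the most likely token's probability, then renormalizes — so the cutoff adapts to how confident the model is. A minimal NumPy sketch (the function name `min_p_filter` is just for illustration):

```python
import numpy as np

def min_p_filter(probs, min_p=0.05):
    """Keep tokens with probability >= min_p * max(probs), renormalize.

    When the model is confident (one dominant token), the threshold is
    high and few tokens survive; when the distribution is flat, the
    threshold drops and many tokens stay in play.
    """
    probs = np.asarray(probs, dtype=float)
    threshold = min_p * probs.max()
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()

# Confident distribution: threshold = 0.1 * 0.90 = 0.09, only the top token survives.
confident = min_p_filter([0.90, 0.05, 0.03, 0.02], min_p=0.1)

# Flat distribution: threshold = 0.1 * 0.30 = 0.03, all four tokens survive.
flat = min_p_filter([0.30, 0.28, 0.22, 0.20], min_p=0.1)
```

This adaptivity is the usual argument for Min P over a fixed Top K (which always keeps exactly K tokens) or Top P (whose nucleus can still include long low-probability tails when the distribution is flat).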