Batched reward model inference and Best-of-N sampling

Posted by retrolione@reddit | LocalLLaMA

Quick blog post on reward model inference with dynamic batching (for LLM-as-a-judge, Best-of-N sampling, preference tuning, and other RL use cases).
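The core Best-of-N idea is simple: generate N candidate responses, score them all with a reward model (batching the scoring calls for throughput), and keep the highest-scoring one. A minimal sketch in Python, where `toy_reward` is a stand-in for a real reward model and the names are illustrative, not from the post:

```python
from typing import Callable, List, Tuple

def best_of_n(
    candidates: List[str],
    score_batch: Callable[[List[str]], List[float]],
    batch_size: int = 8,
) -> Tuple[str, float]:
    """Score candidates in fixed-size batches, return the top one with its score."""
    scores: List[float] = []
    for i in range(0, len(candidates), batch_size):
        # Each call here would be one forward pass of the reward model.
        scores.extend(score_batch(candidates[i:i + batch_size]))
    best_idx = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_idx], scores[best_idx]

# Toy "reward model": scores by length (placeholder for a real RM forward pass).
def toy_reward(batch: List[str]) -> List[float]:
    return [float(len(s)) for s in batch]

best, score = best_of_n(["a", "abc", "ab"], toy_reward, batch_size=2)
```

Dynamic batching (as in the post) goes a step further: instead of fixed-size batches from a single request, the server coalesces scoring requests arriving from many concurrent clients into one batch to keep the GPU saturated.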