Using logit steering / KV Cache Dynamic Assembly to guide outputs from Small Language Models using ONNX Runtime

Posted by shamanicalchemist@reddit | LocalLLaMA

I've been using the browser-based ONNX Runtime to experiment with logit steering, and I've been seeing shocking improvements over baseline generation. The model here is Qwen 2.5 0.5B. I really like the live view of token probabilities as the stream is generated; I got tired of not being able to see them.
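For anyone unfamiliar with the idea, here is a minimal sketch of generic logit steering plus a top-k probability readout. It is not the author's implementation. It assumes the raw logit vector for the next token has already been obtained (in the browser setup this would presumably come from the ONNX Runtime session output, which is not shown). The function names, the `bias` dictionary, and the toy 4-token vocabulary are hypothetical.

```python
import math


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def steer_logits(logits, bias):
    # Hypothetical steering rule: add a per-token delta (token_id -> delta)
    # to a copy of the raw logits, leaving the input list untouched.
    steered = list(logits)
    for token_id, delta in bias.items():
        steered[token_id] += delta
    return steered


def top_k_probs(logits, k):
    # Return the k most probable (token_id, probability) pairs, highest first.
    # This is the kind of data a live probability view would display per step.
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    # Toy next-token logits over a 4-token vocabulary (made-up values).
    logits = [2.0, 1.0, 0.5, 0.0]
    print("baseline:", top_k_probs(logits, 3))
    # Push token 2 up by 3.0 logits; greedy decoding would now pick it.
    steered = steer_logits(logits, {2: 3.0})
    print("steered: ", top_k_probs(steered, 3))
```

Adding the bias before the softmax, instead of editing probabilities afterwards, keeps the distribution normalized automatically. A large negative delta effectively bans a token.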