Using logit steering / KV cache dynamic assembly to guide outputs from small language models with ONNX Runtime
Posted by shamanicalchemist@reddit | LocalLLaMA | 2 comments
I've been using ONNX Runtime in the browser to experiment with logit steering, and I've been seeing surprisingly large improvements over baseline generation. The model is Qwen 2.5 0.5B. I really like the live token-stream probability view: I got tired of not being able to see the token probabilities while the model generates.
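For readers unfamiliar with the idea, here is a minimal sketch of what logit steering plus a probability readout can look like. It is not OP's code. It assumes you have already pulled the last position's logits out of the model's output as a `Float32Array`. The helper names `steerLogits`, `softmax`, and `topK` are hypothetical and are not part of the onnxruntime-web API. Steering here means adding a per-token bias to the raw logits before the softmax. The top-k list is the kind of data a live probability view would display.

```typescript
// Hypothetical helpers (not onnxruntime-web API). Assumes `logits` is the
// vocab-sized slice for the last generated position, already extracted.

// Logit steering: copy the logits and add a bias to selected token ids.
function steerLogits(logits: Float32Array, bias: Map<number, number>): Float32Array {
  const steered = new Float32Array(logits); // copy so the input is untouched
  bias.forEach((delta, tokenId) => {
    if (tokenId >= 0 && tokenId < steered.length) {
      steered[tokenId] += delta; // positive delta boosts, negative suppresses
    }
  });
  return steered;
}

// Numerically stable softmax with optional temperature (must be > 0).
function softmax(logits: Float32Array, temperature = 1.0): Float32Array {
  let max = -Infinity;
  for (let i = 0; i < logits.length; i++) {
    if (logits[i] > max) max = logits[i];
  }
  const probs = new Float32Array(logits.length);
  let sum = 0;
  for (let i = 0; i < logits.length; i++) {
    probs[i] = Math.exp((logits[i] - max) / temperature);
    sum += probs[i];
  }
  for (let i = 0; i < probs.length; i++) probs[i] /= sum;
  return probs;
}

// The k most probable token ids, e.g. for a live probability display.
function topK(probs: Float32Array, k: number): { id: number; p: number }[] {
  return Array.from(probs, (p, id) => ({ id, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, k);
}

// Toy 4-token vocabulary: before steering, token 0 is most likely.
const logits = Float32Array.from([2.0, 1.0, 0.5, -1.0]);
// Boost token 2 by +3.0 so it overtakes token 0 (0.5 + 3.0 = 3.5 > 2.0).
const steered = steerLogits(logits, new Map([[2, 3.0]]));
const probs = softmax(steered);
console.log(topK(probs, 2)); // token 2 ranks first, token 0 second
```

In a real generation loop you would apply the bias at every step before sampling, then feed the chosen token back into the model. How OP combines this with the KV cache is not described in the post.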


Silver-Champion-4846@reddit
Is this slop?
shamanicalchemist@reddit (OP)
Of course it is... This model is microscopic. What do you expect? For something under 800 MB, I think it's doing a terrific job.