Inference-time drift reduces repetition collapse in frozen Llama-3.1-8B (repo + reproducible script)
Posted by chazc2@reddit | LocalLLaMA | View on Reddit | 2 comments
I stumbled onto an odd behavior while experimenting with inference-only modifications:
By adding a small Gaussian drift term to an untrained fast-weight memory module and feeding it into a frozen Llama-3.1-8B model, long-form repetition collapse was significantly delayed.
No training, no LoRA, no fine-tuning, no KV cache edits. Model weights stay frozen.
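The repo's actual implementation isn't reproduced here, but the core idea (a fast-weight memory whose state random-walks via a small Gaussian drift at each step, with its read-out mixed back into the frozen model's hidden state) might look roughly like this minimal sketch. All names, shapes, and the Hebbian-style update rule are assumptions for illustration, not the repo's API:

```python
import numpy as np

class DriftingFastWeightMemory:
    """Hypothetical sketch of an untrained fast-weight memory with Gaussian drift.

    The memory is never trained; its state changes only through the drift term
    and a simple outer-product (Hebbian-style) update from the hidden state.
    """

    def __init__(self, d_model: int, drift_std: float = 1e-3, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.state = np.zeros((d_model, d_model))
        self.drift_std = drift_std

    def step(self, h: np.ndarray) -> np.ndarray:
        # Gaussian drift: the memory random-walks even with no learning signal.
        self.state += self.drift_std * self.rng.standard_normal(self.state.shape)
        # Untrained fast-weight update from the current hidden state.
        self.state += np.outer(h, h) / h.size
        # Read-out added back into the (frozen) model's hidden state.
        return h + self.state @ h
```

In a real setup this would be attached as a forward hook on one of the frozen model's layers; the base weights are never touched.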
This repo includes:

- A minimal reproducible experiment (single .py)
- A simple wrapper for inference-only usage
- A replication thread for logs/results
Not claiming a breakthrough — just sharing something interesting that didn't behave the way theory predicts.
Repo:
https://github.com/chazciii/rd-net
If you try it on other model families (Qwen, Mistral, phi, GPTQ, GGUF, etc.), please share your results.
Chromix_@reddit
This is related to a previous post that referenced a ViXra paper. At least the repo is no longer private, as it was in that post.
HasGreatVocabulary@reddit
Perhaps compare it to ALiBi; it sounds a bit similar, but with a random accumulative bias instead of a linearly increasing one.
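The contrast the commenter is drawing can be sketched in a few lines: ALiBi adds a deterministic attention bias that grows linearly with key-query distance, while the drift scheme accumulates random steps over positions. Parameter names here are illustrative, not from either implementation:

```python
import numpy as np

def alibi_bias(seq_len: int, slope: float = 0.5) -> np.ndarray:
    # ALiBi: deterministic penalty, linear in distance from the query.
    distances = np.arange(seq_len)
    return -slope * distances

def random_accumulative_bias(seq_len: int, std: float = 0.1, seed: int = 0) -> np.ndarray:
    # Random accumulative bias: a Gaussian random walk over positions,
    # so the magnitude grows only stochastically (O(sqrt(n)) on average).
    rng = np.random.default_rng(seed)
    return np.cumsum(std * rng.standard_normal(seq_len))
```

The key difference: ALiBi's bias is identical on every run and monotone in distance, whereas the accumulative version wanders and differs per seed.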