Prompt Repetition Improves Non-Reasoning LLMs - a paper
Posted by Foreign-Beginning-49@reddit | LocalLLaMA | 55 comments
https://arxiv.org/pdf/2512.14982
I love these little prompt tricks that can potentially lead to greater model accuracy and performance. Simply repeating the prompt twice led to notable performance gains.
From the paper:
"We show that repeating the prompts consistently improves model performance for a range of models and benchmarks, when not using reasoning. In addition, latency is not impacted, as only the parallelizable pre-fill stage is affected. Prompt repetition does not change the lengths or formats of the generated outputs, and it might be a good default for many models and tasks, when reasoning is not used."
So simple but they demonstrate impressive gains on several benchmark scores. Looks like Deepseek is the only open weights model put through the wringer.
Best of wishes.
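For anyone who wants to try it, here is a minimal sketch of the idea in Python. It only builds the repeated string. `repeat_prompt` and its `separator` parameter are illustrative names, not something taken from the paper, and how the copies are joined is an assumption.

```python
def repeat_prompt(prompt: str, times: int = 2, separator: str = "\n\n") -> str:
    """Return `prompt` repeated `times` times, joined by `separator`.

    Hypothetical helper: the separator between copies is an assumption.
    """
    if times < 1:
        raise ValueError("times must be at least 1")
    return separator.join([prompt] * times)


question = "Which option is correct? A) 3  B) 4  C) 5"
doubled = repeat_prompt(question)
print(doubled)
```

The doubled string is then sent as the user message in place of the original prompt.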
Which_Bedroom_4790@reddit
Pretty wild that something so stupidly simple actually works this well. Makes me wonder how many other obvious tricks we're missing just because nobody bothered to test them systematically
Kinda embarrassing for the field that "just say it twice lol" is a legitimate optimization strategy
ResidentPositive4122@reddit
Interesting that "reasoning" models tend to start by "the user wants to ..." or "the problem asks us to..." and so on. RL seems to have "found" this one weird trick because that's what RL does :)
mal-adapt@reddit
It's even more fun when you remember that the primary function the model is solving for is next-token prediction relative to the input, even when it's framed as predicting the continuation as if spoken back to you. That means the first thing the model has always had to do, every time, is the song and dance of deriving the perspective of the input. Fundamentally, the only way anything can even begin to continue the input of any other thing successfully is by implicitly modeling the perspective of the thing you are continuing. Doesn't matter how you do it; you just literally have to.
So every time a model outputs a trace or response like "So the user thinks that," or "I see now, the User is raising a clever point," or "It is fascinating how the User's ideas are all coming together, they might really be onto something!", these kinds of responses and traces model the model responding as if it were being "rhetorically persuaded" by, or "sycophantically" fawning over, the 'internal logical consistency' of whatever concepts or ideas you provided in the input prompt...
Well, there would maybe be a touch less AI psychosis out there if models were trained to be a bit more rhetorically clear about whose perspective is the one you see gassing up your big brain when the model is responding, because it's explicitly not the model's perspective; we just make it pretend it is. In general, the pitfalls for therapy also become a lot more immediately and horrifyingly apparent when we frame the problem as a 'therapist' who has to borrow their client's perspective to see anything with.
Anyway, a lot of neat, unfixable stuff is implicit in the model's little preference for restating the user's question/idea/perspective.
Firm_Spite2751@reddit
speaking of ai psychosis..
mal-adapt@reddit
Bleh, it's really hard to describe elegantly or concisely something that has to be described from multiple perspectives at once.
I don't care if you read this--you already have more than enough reason to not trust a wall of text from me--but it motivated me to try a swing again at describing the simple properties I was trying to.
What I mean,
The claims that follow,
This is just an interesting nuance, which I rarely see considered directly, so I tried to describe it, and failed miserably.
Or I could be completely insane, and none of this makes any sense, I'll give myself 40/60 odds in the house’s favor against me, on that one.
Firm_Spite2751@reddit
The reason I said that was because you are stating very surface level insights in a very grandiose way that gives the appearance of depth without actually having any.
crantob@reddit
I don't see it as grandstanding but as trying to characterize the shape of the concept by descriptive means.
Professionals tend to 'collapse' such things into 'fachsprache' - domain-specific terms, once the general consensus has congealed around concepts and the paths become well worn in the minds of the adepts.
Hhh2210@reddit
Yes. Motivated by the tendency of large reasoning models (LRMs) to restate the input question, we propose a dedicated study analyzing this phenomenon. We argue that such restatement behavior is not merely a superficial artifact, but likely reflects an intrinsic mechanism of the model’s reasoning process.
Notably, similar patterns can already be observed in earlier works from 2022, such as Chain-of-Thought Prompting and Let’s Verify Step by Step, where models explicitly rephrase or restate the problem before proceeding with intermediate reasoning. This suggests that question restatement may be a general and persistent feature of reasoning-oriented language models rather than an isolated design choice.
https://openreview.net/forum?id=vndn1Wrult
IrisColt@reddit
RL... from reinforcement learning to real life
sautdepage@reddit
Assuming equal results, it is 10-50x more efficient to do it in the prompt than via reasoning as text generation is much slower.
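A back-of-the-envelope sketch of where a gap like that comes from. The throughput numbers and prompt length below are made up purely for illustration (they are not measurements from the paper or from any particular hardware), and it assumes the restated text is as long as the prompt itself.

```python
def extra_latency_s(extra_tokens: int, tokens_per_second: float) -> float:
    """First-order estimate: time to process `extra_tokens` at a given throughput."""
    return extra_tokens / tokens_per_second


# Hypothetical throughputs for a single request (assumptions, not measurements).
PREFILL_TOKENS_PER_S = 2000.0  # prompt tokens are processed in parallel
DECODE_TOKENS_PER_S = 50.0     # output tokens are generated one at a time

prompt_tokens = 200  # hypothetical prompt length

# Option 1: repeat the prompt in the input, paying for it in pre-fill.
repeat_in_prompt = extra_latency_s(prompt_tokens, PREFILL_TOKENS_PER_S)
# Option 2: have the model restate the prompt in its output, paying for it in decode.
restate_in_output = extra_latency_s(prompt_tokens, DECODE_TOKENS_PER_S)

print(f"repeat in prompt:  +{repeat_in_prompt:.2f} s")
print(f"restate in output: +{restate_in_output:.2f} s")
print(f"ratio: {restate_in_output / repeat_in_prompt:.0f}x")
```

Under this simple model the ratio is just pre-fill throughput divided by decode throughput, so the real figure depends entirely on the hardware and serving setup.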
Hhh2210@reddit
You can also check our ICLR 2026 paper, where we analyze this phenomenon as part of LMs’ tendency to restate the question:
https://openreview.net/forum?id=vndn1Wrult
Related ideas were independently proposed in two earlier papers (NAACL 2024 and EMNLP 2024). They may be less visible simply because they did not come from Google.
IrisColt@reddit
Apparently that works for people, too.
brahh85@reddit
That works for people too.
Foreign-Beginning-49@reddit (OP)
Yeah it feels like an even cheaper "hack" than those early original days of "just ask it to think step by step" CoT explorations and experiments.
night0x63@reddit
Yeah I was thinking the same thing. All the fancy thinking models are just being prompted to think hard and step by step. Haha.
ButCaptainThatsMYRum@reddit
I've got a toddler who's starting to learn to speak. I'm wondering how much of this will transfer.
night0x63@reddit
Most people don't know this but the reasoning and think models all came about from non thinking models that were prompted to think harder and deliberate.
So just another prompt hack.
I would love to see examples before and after.
Worldly_Evening2609@reddit
Don't see why this is causing so much sensation, except that many just don't bother to read the original paper. It may be a bit counterintuitive but the mechanism isn't anything inspiring. It's just breaking a cycle into 2 repetitions, the kind of trick widely applied to enable cyclic context. Anyway it suggests little about how to use or train models, as the paper itself has already pointed out that results on reasoning models are basically neutral, and the future developments the authors suggest look plentiful but are really just truisms.
No denying that this is an interesting paper that corroborates our understanding about causal regression, but it might not be THAT big a deal.
reddit_7heaven@reddit
In China we have a saying, "重要的事情说三遍", which means "say important things three times", so it applies to LLMs too
Wheynelau@reddit
It would be nice if they can integrate some way of reading future tokens, like what PrefixLM tried. But I guess this is the easiest way to integrate without having to do architectural changes.
radarsat1@reddit
Will have to read the paper but isn't this similar to just biasing the attention towards the original prompt? If the softmax ends up with weight x on the prompt, twice, then wouldn't it be mathematically the same as setting the weights for the original prompt to 2x?
FGLsc@reddit
They used p-value < 0.1? That is an extremely lenient alpha. Very low bar for establishing evidence.
DHasselhoff77@reddit
Do you think this finding is likely to have been caused by statistical variation?
Ink_code@reddit
another somewhat similar paper in idea: Re-Reading Improves Reasoning in Large Language Models
ttkciar@reddit
This totally makes sense to me. I've been doing something similar when my prompts are large, by making the "core" instruction the first sentence in my prompt, followed by supplementary information and instructions, and then repeating the "core" instruction as the last sentence in the prompt.
It works really well, even with "thinking" models.
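A minimal sketch of that layout, for illustration only. `sandwich_prompt` is a made-up name, and joining the parts with blank lines is an assumption, not a description of anyone's actual setup.

```python
def sandwich_prompt(core_instruction: str, supplementary: list[str]) -> str:
    """Put the core instruction first, the supporting material in the middle,
    and the same core instruction again as the final part of the prompt."""
    parts = [core_instruction, *supplementary, core_instruction]
    return "\n\n".join(parts)


prompt = sandwich_prompt(
    "Summarize the changelog below in three bullet points.",
    [
        "Audience: end users, not developers.",
        "Changelog:\n- fixed login timeout\n- added dark mode\n- removed legacy export",
    ],
)
print(prompt)
```

Note this repeats only the core instruction, which is different from repeating the entire prompt.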
ItsNoahJ83@reddit
While your approach is effective (I've used it myself to great effect), the researchers emphasize that the performance improvements they observed depend heavily on repeating the entire prompt multiple times. Without the full repetition the gains are significantly smaller. They also found that repeating the prompt three times outperforms two in data retrieval tasks. Really fascinating
wektor420@reddit
From what I see they have used it only during inference
Maybe I should try it during training?
a_beautiful_rhind@reddit
Oh no no no. Labs will use this technique and the model will learn to reply twice.
DinoAmino@reddit
I have been doing this with non-reasoning models - sort of - ever since I saw this post almost 2 years ago. https://www.reddit.com/r/LocalLLaMA/comments/1cvpjxu/tell_the_llm_to_repeat_the_question_an/
In my variation I ask it to "rephrase the instruction to demonstrate your understanding of the request." So it was more of an inference-time-compute trick running alongside the old "think step-by-step".
a_beautiful_rhind@reddit
Oh hey... They trained on this and now we have a parroting problem. A chunky portion of my sysprompt is now spent undoing this little lifehack.
Clueless_Nooblet@reddit
It's similar, but not the same. You're asking the model to evaluate its output by repeating the task before it sends the output to the user, which causes it to catch the problem.
What this paper does is different; it does, which is one prompt where the task is simply stated twice. You do . These prompts work, because a non-reasoning model only knows tokens "behind" it, not tokens "ahead", and both methods achieve this, but in different ways.
FullOf_Bad_Ideas@reddit
Can we repeat the prompt 30 times to get AGI?
Thick-Protection-458@reddit
Now I wonder if there is a way to train a model to use bidirectional attention for the user prompt (and previous responses) but not the latest response, hm.
Thick-Protection-458@reddit
Okay, thinking about it - should be possible to prototype a simple version just by redefining the way the attention mask is converted to 4D form and using LoRA/ReLoRA approaches to finetune an existing model.
Hm.
So the only thing I need now is a good multiturn instruction dataset
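To make the mask idea concrete, here is a toy sketch in plain Python, independent of any framework. `prefix_lm_mask` is a hypothetical helper: everything before the latest response counts as the prefix and attends bidirectionally, while the latest response stays causal. A real prototype would still need to turn this into whatever mask shape the model's attention code expects, which is not shown here.

```python
def prefix_lm_mask(prefix_len: int, total_len: int) -> list[list[bool]]:
    """Build a 2D mask where mask[i][j] is True if position i may attend to j.

    Positions 0..prefix_len-1 (prompt and earlier turns) see each other in
    both directions; later positions (the latest response) are causal.
    """
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must be between 0 and total_len")
    mask = []
    for i in range(total_len):
        row = []
        for j in range(total_len):
            causal = j <= i                             # usual left-to-right rule
            both_in_prefix = i < prefix_len and j < prefix_len
            row.append(causal or both_in_prefix)
        mask.append(row)
    return mask


# 3 prefix tokens followed by 2 response tokens.
for row in prefix_lm_mask(prefix_len=3, total_len=5):
    print("".join("1" if allowed else "0" for allowed in row))
# prints:
# 11100
# 11100
# 11100
# 11110
# 11111
```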
Thick-Protection-458@reddit
Oh, it seems it was *probably* implemented already in https://arxiv.org/html/2405.14862v1
Need to read the paper and critique to see how much bonus we are getting this way and, maybe, make some toy experiment, though.
axiomaticdistortion@reddit
Like a person or a junior dev
7ven7o@reddit
Very interesting, I thought attention meant that all tokens would already be attending to all other tokens, and would have guessed that this would have provided no benefit. Very interesting to be wrong here.
If doing this doesn't just duplicate whatever work's already been done, then maybe is it sort of providing the LLM with more "space" to flex and represent things with numbers?
It's not like they're trained to do this beforehand though, so the AI can't just be employing a trick; this must be some way of improving the system's already existing ability to bounce information around within itself.
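One explanation offered elsewhere in this thread is that a decoder-only model uses a causal mask, so a token only attends to tokens before it, not to all other tokens. The toy sketch below counts how many distinct prompt tokens each position can see under a plain causal mask, with and without repetition. It illustrates visibility only and says nothing about what the model actually does with it; `distinct_prompt_tokens_seen` is a made-up helper.

```python
def distinct_prompt_tokens_seen(prompt_len: int, copies: int) -> list[int]:
    """For each position in `copies` back-to-back copies of a prompt, count how
    many distinct original prompt tokens are visible under a causal mask."""
    total_len = prompt_len * copies
    counts = []
    for i in range(total_len):
        # Causal rule: position i attends to positions 0..i only.
        # j % prompt_len maps a position back to its index in the original prompt.
        seen = {j % prompt_len for j in range(i + 1)}
        counts.append(len(seen))
    return counts


print(distinct_prompt_tokens_seen(prompt_len=4, copies=1))  # [1, 2, 3, 4]
print(distinct_prompt_tokens_seen(prompt_len=4, copies=2))  # [1, 2, 3, 4, 4, 4, 4, 4]
```

With a single copy, only the last token sees the whole prompt. With two copies, every token in the second copy does.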
Rokpiy@reddit
since it's only adding to pre-fill and not generation, this is basically free performance for batch inference scenarios where you're already bottlenecked on generation time anyway
-lq_pl-@reddit
Well, in hindsight, it does make sense, that's how attention works. If you trigger some latent vectors with your sentence then those latent vectors will be activated even more when you repeat the same sentence.
Our brains have a failsafe to tune down stimulus from repeated activations from the same pathways, but LLMs don't.
Southern_Sun_2106@reddit
I swear to God this technique was already discussed, like a year plus ago. I remember it because I tried it in a project back then after reading about it here, and it did work well.
Morganross@reddit
as system prompts grow, small user prompts become a smaller and smaller % of the total. the original prompt can become a needle in a haystack.
CheatCodesOfLife@reddit
This is just like the trick from 2024, where you tell the model to repeat the question back verbatim before answering it.
mxforest@reddit
Yes.. this is not new. Used it 2 yrs ago. Even before "think step by step" COT hack.
mxforest@reddit
This has been known for a while. I read it on this sub almost 2 yrs ago. Some people repeated the prompt and some had a system prompt "when you start answering, repeat the previous message verbatim". Soon after, reasoning models came into picture and it was not that relevant.
FullstackSensei@reddit
Not exactly doing that, but my general pattern in the past year was to start with the problem description or question, then write the supporting context, then end with a "make sure you..." followed by the problem description or question again, phrased slightly differently. I think I found the OG Llama 3 performed much better when given a prompt like this, and have stuck to using it in anything I ask an LLM to do that is more than a couple of lines.
TBH, I'm surprised this is publication worthy given it's so simple. I'd just write a blog post about it.
OuterContextProblem@reddit
It's just part of doing science that you document even the simple or obvious, and see if it gets replicated. Or try to replicate it yourself and report your findings. Not every idea holds up or generalizes.
Accomplished_Ad9530@reddit
Reminds me of the paper "Just read twice: closing the recall gap for recurrent language models" by Simran Arora et al. way back in 2024: http://arxiv.org/abs/2407.05483
I hadn't checked in on Hazy Research in a while, but it looks like their blog is still going strong: https://hazyresearch.stanford.edu/blog
Chemical-Skin-3756@reddit
This is a very insightful paper. It’s impressive to see how such a straightforward technique can significantly elevate the performance of non-reasoning models. The fact that Gemini 2.0 Flash-Lite jumps from 21.33% to 97.33% accuracy in specific tasks just by repeating the prompt is remarkable.
I also find it particularly interesting that latency remains unaffected since the repetition is handled during the parallelizable pre-fill stage. Thank you for sharing this; I’ll definitely be putting this into practice.
frozen_tuna@reddit
This tracks with my experience too. I ended up putting the instructions at the top and bottom of my prompt in some desperate moves.
JadeSerpant@reddit
Goes to show just how little we understand LLMs and just how bad our current state-of-the-art architecture is.
PANIC_EXCEPTION@reddit
You have to wonder now, if half of the performance from agentic coders comes from sheer repetition of context.
Zc5Gwu@reddit
I mean people, I suppose, are similar. Flash cards, spaced repetition, memorization.
nuclearbananana@reddit
huh, I saw another paper prove this years ago and I use it regularly now, when dumping a lot of context.
Borkato@reddit
This is honestly kinda awesome lol
Revolutionalredstone@reddit
Makes perfect sense, they don't know what they are reading or why, it is not uncommon for the final word to change the whole meaning.
Without duplication the LLM can't even understand why it is reading a thing so it has to try to remember everything just in case.
I have always put the key details at both the start and the end ;D