why is paraphrasing still such a hard problem for both humans and models?
Posted by Character_Ball6746@reddit | learnprogramming | View on Reddit | 7 comments
while learning programming lately I've been noticing something interesting when reading documentation and tutorials
even if I fully understand a concept, the moment I try to explain it again in my own words, my explanation still ends up following a lot of the same structure as the source material
sometimes I change the wording completely but the flow of the explanation still feels almost identical
what made this more interesting to me is that language models seem to struggle with the exact same thing
they either stay too close to the original phrasing or change things so aggressively that the actual meaning starts drifting
from a learning perspective it makes sense, because a lot of technical explanations already follow similar logical patterns, especially in programming docs where clarity matters more than style
but from an ML perspective it also feels like current training objectives probably don't capture “true originality” very well beyond surface-level variation
curious how other people here think about this
is this mostly a data and training issue, or is paraphrasing fundamentally harder than it first appears once meaning preservation becomes important?
Todo_Toadfoot@reddit
Compression is lossy.
Different-Duck4997@reddit
the compression analogy is spot on - trying to repack the same information always leaves some fingerprints from the original structure
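The lossy-compression analogy above can be sketched in a few lines of Python. This is a toy illustration (the `compress` function and example sentences are invented, not anything from the thread): squeezing a sentence down to a bag of words discards order, so two different "originals" collapse into the same compressed form and can't be told apart afterwards.

```python
# Toy illustration of "compression is lossy": reduce a sentence to a
# sorted bag of unique words (the "compressed" form). Word order and
# repetition are discarded, so reconstruction is ambiguous.
def compress(sentence):
    return sorted(set(sentence.lower().split()))

original = "the cat sat on the mat"
packed = compress(original)
print(packed)  # ['cat', 'mat', 'on', 'sat', 'the']

# A structurally different sentence compresses to the same bag, so the
# original structure cannot be recovered from the compressed form alone.
assert compress("the mat sat on the cat") == packed
```

The same thing arguably happens when you "compress" a concept into your own understanding: the surface wording is gone, but whatever structure survived the compression is what leaks back out when you try to re-explain it.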
mxldevs@reddit
There are millions of ways to express the same idea, but if you were given a constraint that you had to be as concise as possible, the number of options quickly disappears.
Oftentimes you need to explain A before talking about B, which is built on top of A, so you can't just randomly reorder the way the information is presented.
It's like asking how many ways there are to express 1 + 1 = 2. There are certainly an infinite number of equivalent expressions, but how many survive if you have to minimize the number of tokens?
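The 1 + 1 = 2 point can be made concrete with a small brute-force sketch (the token set and lengths here are invented for illustration): under a tight token budget there is essentially one way to say "2", and the options only multiply as the budget loosens.

```python
from itertools import product

# Tiny made-up token vocabulary: digits and two operators.
TOKENS = ["0", "1", "2", "3", "+", "-"]

def expressions_equal_to(target, length):
    """Enumerate token sequences of a given length that evaluate to target."""
    hits = []
    for combo in product(TOKENS, repeat=length):
        expr = " ".join(combo)
        try:
            if eval(expr) == target:
                hits.append(expr)
        except Exception:
            continue  # skip ill-formed sequences like "+ -" or "1 1"
    return hits

# With a 1-token budget there is exactly one way to express 2 ...
print(expressions_equal_to(2, 1))
# ... but with 3 tokens the equivalent phrasings multiply:
# "1 + 1", "0 + 2", "3 - 1", and so on.
print(expressions_equal_to(2, 3))
```

The analogy to paraphrasing: meaning preservation plus a conciseness constraint prunes the space of valid re-expressions drastically, which is one reason paraphrases keep converging on similar shapes.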
roger_ducky@reddit
Paraphrasing is a multi-step process: understand the idea, figure out what your audience needs, then re-express it for that audience.
Most people skip step 2. LLMs, when told to paraphrase without knowing the audience, tend to be conservative and retain all of the original meaning.
captainAwesomePants@reddit
It's the same reason that you can hear an explanation of an algorithm but can't necessarily implement it. Paraphrasing, like coding, requires comprehension. If you know the words of the explanation but haven't actually grokked the underlying idea, then you can't explain it in a different way.
nightonfir3@reddit
LLMs do not understand anything. They do not have a knowledge store or reasoning system. Giving correct information is an emergent property of guessing the most likely next word in a response. It surprised researchers when models started giving right answers, because that wasn't something they were explicitly trained for.
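The "guessing the next most likely word" mechanic described above can be sketched with a toy next-token loop. Everything here is invented for illustration (the vocabulary, the hand-wired score table standing in for a trained network, greedy decoding as the sampling rule); a real LLM differs in scale, not in this basic loop.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical "model": a lookup table of scores over a 5-word vocabulary,
# standing in for a trained network's output logits.
VOCAB = ["the", "cat", "sat", "mat", "."]
SCORES = {
    (): [2.0, 0.1, 0.0, 0.0, 0.0],
    ("the",): [0.0, 3.0, 0.1, 1.0, 0.0],
    ("the", "cat"): [0.0, 0.0, 3.0, 0.0, 0.1],
    ("the", "cat", "sat"): [0.0, 0.0, 0.0, 0.0, 3.0],
}

def generate(max_tokens=5):
    """Generate text by repeatedly picking the most likely next token."""
    out = []
    for _ in range(max_tokens):
        logits = SCORES.get(tuple(out), [0.0, 0.0, 0.0, 0.0, 5.0])
        probs = softmax(logits)
        # Greedy decoding: always take the single highest-probability token.
        out.append(VOCAB[max(range(len(VOCAB)), key=probs.__getitem__)])
        if out[-1] == ".":
            break
    return out

print(generate())  # ['the', 'cat', 'sat', '.']
```

Nothing in this loop stores facts or checks truth; any "correct" output is just the highest-probability continuation, which is the emergent-property point the comment is making.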
BeginningOne8195@reddit
I think paraphrasing is way harder than people assume because you’re trying to preserve the exact meaning while also changing the structure, and those two goals naturally fight each other a bit.