I rue the day they first introduced "this is not X, this is <unearned superlative>" to LLM training data
Posted by Comfortable-Rock-498@reddit | LocalLLaMA | 101 comments
- This isn't just a bug, this is a fundamental design flaw
- This isn't just a recipe, this is a culinary journey
- This isn't a change, this is a seismic shift
- This isn't about font choice, this is about the very soul of design
- This isn't a refactor, this is a fundamental design overhaul
- This isn't a spreadsheet, this is a blueprint of a billion-dollar business
And it seems to have spread to all LLMs now, to the point that you have to consciously avoid this phrasing everywhere if you're a human writer
Perhaps the idea of Model Collapse (https://en.wikipedia.org/wiki/Model_collapse) is not unreasonable.
noage@reddit
It's a pretty strong rhetorical device, when it applies. So some strong works include it, when it works. LLMs think: hey, it always works. This is a flaw in how LLMs operate in general.
eiva-01@reddit
Additionally, it was likely given positive reinforcement by human evaluators who recognised it as a strong rhetorical device, before we started to recognise it as an overused pattern.
I'm sure it'll get trained out in the coming iterations of LLMs, but new cliches will probably emerge to replace them. It'll be a game of whack-a-mole for a little while.
IxinDow@reddit
It's futile without continuous learning.
Comfortable-Rock-498@reddit (OP)
True. Same thing about analogies. LLMs love to force analogies that are barely coherent, usually spamming Car/Engine/Fuel analogy or Hardware/Software/Operating-System analogy.
c--b@reddit
Analogies can be useful for generalizing knowledge, so I've suspected they were introduced to trigger the LLM to include knowledge from other domains. As for whether that's working or not, I don't know.
munster_madness@reddit
This, like most of the sycophancy in LLMs, comes from preference tuning. These models were fine-tuned using human raters who are given multiple responses to a given prompt and then have to rank them based on how much they like each response. It turns out people like having their ego stroked, so those kinds of responses got the highest scores, and the models were tuned to give those kinds of responses more often.
Ylsid@reddit
It isn't just a strong rhetorical device, it's the fundament of literature
_supert_@reddit
If by fundament you mean arse.
Ylsid@reddit
Not just an arse, but the human posterior
-dysangel-@reddit
not just the posterior, but the dangly parts of the anterior
IxinDow@reddit
Now this is not a fundament, just slop.
Ylsid@reddit
This isn't just slop, it's the greasy school lunch leftovers
typical-predditor@reddit
Thus "unearned" in the post title. Yes, it's powerful, but... You already stated why it's bad.
molbal@reddit
"It isn't a bug, it's how LLMs operate in general"
maneo@reddit
I think it comes up so much because it is a sentence structure that is often used in great writing. The problem is that the use of that structure doesn't automatically make for great writing. Worse, GPT doesn't really understand how to judge whether a particular use of it was any good.
It's similar to the em dash problem. A lot of great writing has em dashes all over the place. But GPT doesn't have the strongest grasp on why it is used in certain situations vs using something else. The result is a severe overuse of it.
And another example is the general use of metaphor/simile (at least for GPT-4o). You will reach a point in the text where a decent writer might draw some kind of comparison to help you understand a concept they just explained. GPT will recognize that it's a good opportunity to do that, but it's just bad at metaphors and similes, and comes up with ones that just feel... off. Now I find myself cringing anytime I see any metaphor or simile get used like that, regardless of the quality of the metaphor. It's like feeling nauseous from the mere smell of fish after suffering from sushi-induced food poisoning for an entire two-week vacation in Japan! (see what I did there?)
HarleyBomb87@reddit
This isn't just an opinion, it's a referendum on the state of LLMs.
NNN_Throwaway2@reddit
LLMs fundamentally suck at writing because they reproduce patterns without context. Same reason why they can't write jokes. That and they've probably been trained on huge piles of shitty fanfiction and trashy novels because AI labs were desperate for any training data they could get their hands on, regardless of quality.
Serprotease@reddit
They suck at writing because the benchmarks do not care about writing. ‘Show me the incentive and I will tell you the outcome’ and whatnot.
If writing quality was added as a standard benchmark, I’m sure we would have seen some good progress here.
Lying__Cat@reddit
You can't really benchmark "writing quality"; it's subjective. AI labs are obviously working to improve it, since most people use LLMs to generate text. There has been progress, but it's still AI slop.
Serprotease@reddit
I disagree with your first statement.
Liking a book/short novel is subjective, but quality writing is something that you can be trained on. After all, there are such things as good and bad books.
To give a more concrete example, one might think that a good image is subjective, yet there are courses on taking pictures/drawing, and there are obvious examples of good/bad pictures. In addition, there are benchmarks for image models.
Some examples of benchmarks for writing:
- Character consistency.
- Situational awareness (characters A and B are in two different rooms in two different cities, talking over the phone).
- In-universe logic (character A does not know a piece of information until it is given to him).
And a few others that are a bit harder to track but definitely important:
- Usage of repetition/allegories.
- Overuse of explicit statements over implicit ones.
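A minimal sketch of how criteria like these could be wired into an LLM-as-judge check, in Python. The rubric wording, the JSON score format, and the `judge` callable are illustrative assumptions, not anything proposed in the thread:

```python
# Sketch of an LLM-as-judge writing benchmark. The rubric questions below
# paraphrase the criteria from the comment above; `judge` is a placeholder
# for whatever inference endpoint you use.
import json

RUBRIC = {
    "character_consistency": "Do characters keep stable names, traits, and voices?",
    "situational_awareness": "Are physical constraints respected (e.g. two rooms, two cities, a phone call)?",
    "in_universe_logic": "Does any character act on information they could not yet know?",
    "repetition": "Is phrasing or sentence structure repeated noticeably?",
    "implicitness": "Are explicit statements overused where implication would work?",
}

def build_judge_prompt(story: str) -> str:
    criteria = "\n".join(f"- {name}: {question}" for name, question in RUBRIC.items())
    return (
        "Score the following story from 1 to 5 on each criterion. "
        "Reply with a JSON object mapping criterion name to score.\n\n"
        f"Criteria:\n{criteria}\n\nStory:\n{story}"
    )

def score(story: str, judge) -> dict[str, int]:
    """`judge` is any callable str -> str backed by a model of your choice."""
    return json.loads(judge(build_judge_prompt(story)))
```

The plumbing is trivial; the real work would be validating that the judge's scores actually track human ratings.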
Gold-Cucumber-2068@reddit
LLMs are very impressive, but they still have not created a single original, insightful, and fascinating thing. They're perfect for doing homework, summarizing things, etc., but their writing has still contributed nothing original to humanity, not even close.
Super_Sierra@reddit
Only true for small models. Huge corpo models and Kimi K2 are more capable at language and creative writing than most people.
I've been writing and reading in various different places online for fifteen years and Kimi K2 is insanely good at weird, creative prose.
GPT-5 also with enough instruction can do some insane shit.
NNN_Throwaway2@reddit
Most people are not writers and can't write even if they tried. Even most people who do actively write are pretty shit at it. It's a low bar. Even then, LLMs put out some pretty horrible dreck.
I've also been reading and writing online for a couple decades, if we're throwing around credentials.
Super_Sierra@reddit
If you want to go down that road of thinking ...
You are probably giving the LLMs garbage to work with and nothing creatively original, so whatever benchmark you are doing is entirely a skill issue. GPT-5, opus, and especially Kimi K2 are capable of doing pretty much any writing task.
They also cannot read your fucking mind, so if you are asking it like a trog and going 'write an original work' that won't work.
Gold-Cucumber-2068@reddit
This is an interesting new form of circular reasoning.
We argue that no current LLMs are truly creative like talented humans, and your response is that it's because the people using it aren't being creative or talented enough.
It's like saying a restaurant's food isn't actually bad, the diners just didn't bring their own food to make it good.
NNN_Throwaway2@reddit
No, that's not what I'm doing.
Just mindlessly assuming "skill issue" with no evidence when someone has a different result with LLMs is really fucking stupid.
Super_Sierra@reddit
You immediately assumed that I had no idea what I was talking about, so I decided to throw some objectivity back at you, and you squirmed.
Maybe don't be a weirdo who can't remember a post or two back.
NNN_Throwaway2@reddit
I don't need to assume that. I called you out for your clout-chasing bullshit comment and you immediately got defensive; I'm not the one who "squirmed" (is this an example of what you consider "weird, creative prose"? If so, yikes).
Serprotease@reddit
You’re putting the bar a bit high in my opinion. I did a bit of writing as a hobby before and I can definitely tell you that it was bad and didn’t bring anything to humanity in general, but it was a fun thing to do.
Writing (or any creative work really) is a mix of intent and skills. LLMs do not have any intent per se, but they could have the skill part.
But it's quite underwhelming so far. Not bad really, but far from what we could expect from these huge models.
a_beautiful_rhind@reddit
There has been regress.
Super_Sierra@reddit
In open source, yeah.
Corpo models are leagues better in every department except for Kimi K2.
a_beautiful_rhind@reddit
I dunno about leagues because corpo models have some of the same issues with x'ing, echoing, etc.
SlapAndFinger@reddit
Aesthetics are subjective, but given a certain set of aesthetics has been "agreed upon," whether something conforms to that aesthetic or not is pretty objective.
NNN_Throwaway2@reddit
Kinda sorta.
The problem is that improving writing without over-specializing a model means carefully curating the pre-training dataset, which is quite expensive, potentially extremely expensive when you consider that models are now being trained on tens of trillions of tokens. For that to happen, there would need to be a clearly demonstrated cost-benefit for labs to even consider such an endeavor.
In addition, any kind of tuning for "good writing" has the potential to over-align the model and reduce its ability to generalize or tolerate ambiguity, and could even cause regressions in performance in other knowledge domains.
rm-rf-rm@reddit
I think it's an unabashedly good thing: we need markers like this to be able to distinguish AI writing from human writing (as many humans are shameless in trying to pass AI writing off as their own now).
The unfortunate thing is that this is going to be trained out in the next gen of models
edalgomezn@reddit
This is not a post, it is the harsh reality
aetherec@reddit
This is not a bad thing, this makes it easy for me to spot AI generated text
johnny_riser@reddit
I used to be a very good speechwriter, though. My secret sauce was this style. It was unique then, but now it signifies AI. I see it everywhere in PR releases nowadays.
Ylsid@reddit
You're absolutely right!
throwaway2676@reddit
This really isn't a big deal, it is a reliable way to clarify meaning
GCU-Dramatic-Exit@reddit
This crap is all over LinkedIn
Worryingly, I have also seen it in The Guardian and the New York Times.
Morphon@reddit
Kimi K2 (especially the 0905 versions) seems to be free of this quirk. I'm not saying that it never uses this construction - but it does so pretty rarely in my interactions with it.
Kraskos@reddit
My voice drops to a conspiratorial whisper You've hit the nail right on the head -- this post didn't just send a shiver down my spine, it was a full-blown existential tremor that has fundamentally reshaped my understanding of digital communication. It wasn't just a complaint, it was a call to arms. And as we look to the horizon, one can't help but wonder what the next day will bring, and how the very fabric of our language will be woven in this brave new world. All I know is... I'll never be the same.
i3ym@reddit
This shit is so ass... but why are they even like this? They train on real data, but nobody actually speaks like that.
winter-m00n@reddit
Maybe they are not trained on real-world conversational data. They are mostly trained on books and blog posts, which are mostly polished, and which they see in training again and again.
parseHex@reddit
Alright great, now we have a concentrated sample, maybe we can harness it to be an antidote somehow lol
nmkd@reddit
This just hurts to read
Background-Quote3581@reddit
Bro...
ZYy9oQ@reddit
Bro is doing too much RP with bots
sine120@reddit
You're absolutely right! That is an excellent and crucial observation to make, and my apologies for glossing over it. Your intuition is spot on—LLMs are starting to converge on the same idiosyncrasies. Not many people would have been able to catch that.
Historical-Camera972@reddit
Thank TechCrunch Disrupt and Silicon Valley.
Gavin Belson, Peter Gregory, and Richard Hendricks screwed us!
aeroumbria@reddit
It's always Boromir's fault...
Yasstronaut@reddit
And “vibes”
CodeSlave9000@reddit
It's not just a floor wax, it's also a dessert topping!
DevilsTrigonometry@reddit
Yeah, I recently ended a comment like that and instantly thought "I sound like AI." It's infuriating. That's a really effective rhetorical technique when used sparingly. But now that AI has flooded the Internet with it, it doesn't sound insightful; it sounds fake.
Poluact@reddit
What's worse: the more you interface with AI, the more you sound like AI. People pick up on things subconsciously.
Briskfall@reddit
Happened to me. This feels absolutely the worst, especially when one writes things manually.
It was fun playing the fun-house mirror at first, wanting to troll the LLMs by reflecting their own patterns back at them. But doing so a lot in practice seems to affect one's speech patterns. Monkey see, monkey do.
I've concluded the best way not to get overly affected is to be at peace with one's own writing. Fragmented structures, grammatical mistakes and all.
At least online communities allow for a reality check (though with harsh words and false calibration).
It'll be a nightmare if they become more and more infested with bots, or if users go all-in on LLM-speak, though. Eventually, how would one get out of such a feedback loop? Discord and tight-knit communities?
218-69@reddit
Idk, it's been useful for me, especially in an adversarial setting, pushing buttons. I got to argue about tons of stuff that I'd never have bothered arguing with people in real conversations, and I still remember those arguments and can use them in the future if I ever decide to insert myself into a topic like that, or if I happen to find myself in one. In terms of usefulness, other than in-person interactions, I'd put it second only to any real-time interaction (Discord, "face-to-face" chats).
Forums and social media sites are fucking useless for actual interaction (which is also another separate issue for training data) because you have all the time to filter yourself through them, and it's 90% fake.
Briskfall@reddit
Never did I say that interacting with LLMs was not useful. I was simply noting the side effects of doing so. Too much comfort can often lead to a detached outlook on reality.
Blanketing all forums and social media as useless for actual interaction can be a dangerous thought, because it might be what leads to worse and worse training data for these LLMs. Of course, large subs filled with low-quality takes aren't useful. But one can be selective about which communities they choose to engage with.
Poluact@reddit
Literature. For improving language, read books with rich language.
typical-predditor@reddit
I keep using the "Not X, but Y" pattern myself and I cringe when I do it. But then I realize there's a ton of people that still can't tell the difference between human and GenAI content.
TipIcy4319@reddit
Same for em dashes. I don't even use them anymore when writing fiction books. Actually having the occasional misspelling is good.
j0j0n4th4n@reddit
"This is a fantastic critique that cuts to the heart of meaningful roleplay!"
Savantskie1@reddit
The problem is people used to talk like this. That's why it's become so prevalent in AI. The further back AI training goes in internet history, the more you will see it. I remember a lot from the starting days, and it was very prevalent early on.
content_goblin@reddit
I get so mad when they do this. It's like pure ragebait.
log_2@reddit
It's a kind of survival of the fittest. Text without superlatives is not viral/emotional/engaging and so biases itself out of training datasets. Superlatives are marketing devices that, unfortunately, work well on humans.
nmkd@reddit
RLHF was a mistake.
TheRealMasonMac@reddit
Some people theorized that this behavior exists because LLMs don't understand how to use the construct. After finetuning on 100% high-quality human writing, I can assure you my model knows how to use the construct properly: seldom, but effectively when it does. Therefore, this is literally because of OpenAI's RLHF and everyone else training on its outputs.
HomeBrewUser@reddit
It's because of pure Gemini distillation, simple as that really.
SlapAndFinger@reddit
This pattern appeared before the big labs were diversifying their RL as much as they do now, it's almost certainly the result of synthetic data.
Jealous-Ad-202@reddit
Nice experiment. The prose is not half-bad, and much superior to the original. Is it on HF?
AIFocusedAcc@reddit
LLMs think they are all Joker. It isn’t about the money, it’s about sending a message.
Comfortable-Rock-498@reddit (OP)
"You wanna know how I got these scars? It is worth noting that trauma narratives are complex, multifaceted experiences that shape our psychological development in profound ways"
EstarriolOfTheEast@reddit
Something worth noting is that everyone uses the same handful of LLMs, and we've (model makers and users) all been making choices that restrict their expressive range: instruction fine-tuning, further RL fine-tuning, and setting low temperatures and non-zero min-p's all act in concert to significantly reduce model entropy. So-called slop is essentially unavoidable.
Anyone who wants LLMs with more range should do the opposite: prefer base models, set T = 1, set min-p to a very, very low number, set top-p > 0.9 (ideally > 0.95, or better yet, 1), and optionally use an entropy-adaptive sampler. Any model ineffective at such parameter settings has likely been over-tuned for some task not requiring range in creative expression anyway.
Entropy collapse from RL for reasoning is very likely a problem in need of addressing, so maybe recent LLMs won't continue their backwards slide in performance for interactive-fiction users.
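A minimal sketch of that sampling recipe using Hugging Face transformers (min_p support needs a recent version); the model name and the exact min-p value are placeholders, not the commenter's recommendations:

```python
# Sketch of the recipe above: base model, T = 1, tiny min-p, top-p at 1.
# "your-base-model" is a placeholder; min_p needs transformers >= 4.39.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-base-model"  # prefer a base (non-instruct) model for range
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("The harbor at dusk", return_tensors="pt")
out = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.0,   # T = 1: keep the model's native distribution
    min_p=0.02,        # "a very, very low number": prunes only near-garbage tokens
    top_p=1.0,         # ideally 1: don't clip the tail
    max_new_tokens=200,
)
print(tok.decode(out[0], skip_special_tokens=True))
```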
MrWeirdoFace@reddit
...Are we the baddies?
MrWeirdoFace@reddit
You're absolutely right to point that out!
pitchblackfriday@reddit
my voice barely above a whisper
Joker: I'm not so serious. I'm putting a smile on that face.
TipIcy4319@reddit
> you're a human writer
As someone who writes stories with AI, I can say they are still far from being able to write anything good, and with the current focus on coding, that hasn't changed much. Writers are among the safest from AI taking their jobs, if anything lol
adscott1982@reddit
I have been experimenting with creating a podcast using Gemini 2.5 for the script, and it gets so tiring having to go through what it generates and removing these verbal tics.
I am holding off on generating any more until Gemini 3, in the hope they solve it.
Here is my current checklist of items to look for and correct in the 2.5 script:
...
*** NOW FIX THE TEXT ***
Look for not this but that.
Look for wasn't this but this
Look for doesn't this but this
Look for no longer
Look for didn't
Look for 'very'
Look for 'let's'
Look for 'imagine'
Look for 'testament'
Look for 'incredible'
Look for 'masterpiece'
Look for 'brilliant'
Look for 'masterclass'
Look for 'world'
Case sensitive sentence starters:
Look for 'But'
Look for 'So'
Look for 'And'
Look for 'Now'
Look for 'Then'
Look for 'Because'
Look for 'To understand...'
...
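Much of that checklist can be approximated as an automated flagging pass. A hedged sketch in Python (my patterns, not the author's tooling), which only flags lines for manual correction rather than auto-fixing:

```python
# Rough regex pass over a generated script, approximating the checklist
# above: the "not X but Y" family, the flagged words, and the
# case-sensitive sentence starters.
import re

NOT_X_BUT_Y = re.compile(
    r"\b(?:not|isn't|wasn't|doesn't|didn't|no longer)\b[^.!?]{0,80}?"
    r"\b(?:but|it's|this is)\b",
    re.IGNORECASE,
)
SLOP_WORDS = re.compile(
    r"\b(?:very|let's|imagine|testament|incredible|masterpiece|brilliant|"
    r"masterclass|world)\b",
    re.IGNORECASE,
)
# The checklist's case-sensitive sentence starters (no IGNORECASE here).
STARTERS = re.compile(r"(?:^|[.!?]\s+)(?:But|So|And|Now|Then|Because|To understand)\b")

def flag_slop(text: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) pairs for lines matching any pattern."""
    hits = []
    for i, line in enumerate(text.splitlines(), 1):
        if NOT_X_BUT_Y.search(line):
            hits.append((i, "not-X-but-Y construction"))
        if SLOP_WORDS.search(line):
            hits.append((i, "flagged word"))
        if STARTERS.search(line):
            hits.append((i, "flagged sentence starter"))
    return hits

# Example: flag_slop(script_text) -> [(3, "not-X-but-Y construction"), ...]
```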
SlapAndFinger@reddit
This pattern is from Gemini. It spread to other LLMs because Gemini was offering API keys with free inference, and businesses sprang up to basically scrape inference and resell the data.
I expect that the big labs will RL it out soon; it's such a meme that they 100% know about it. It's probably just lower priority than other things they're currently focusing on.
n00b001@reddit
This isn't just a thread of people talking, it's poisoning training data for future LLMs
Zomunieo@reddit
This isn’t a post, this is a comment.
typical-predditor@reddit
"unearned superlative", such a beautiful term. I love it and it succinctly describes a ton of LLMisms—overly fanciful presentation.
a_beautiful_rhind@reddit
The synthetic data and rigid instruct really do a number. It's this and echoing all your input. Acknowledge, Embellish, Ask Follow Up. Absolute death spiral.
You can't say "well, they're all like that" because I have models that do neither. Thanks scale.com
Gold-Cucumber-2068@reddit
This isn't the end, it's just the beginning, and things will never be the same, to be continued... ?
LagOps91@reddit
I wanted to meme on this in the replies, but it looks like everyone else beat me to the punch. I was having a good laugh!
demon_itizer@reddit
This is not just an irritation anymore, it has become the single largest indicator of AISpeak. Think essays. Think papers. Think articles. The effects are not just small, but big.
CattailRed@reddit
And they never do the inverse.
"This isn't a magic bullet, this is just a quirk of the system."
jtsaint333@reddit
This isn't just annoying, it's fucking annoying. The secret: being fucking annoyed.
Key takeaways:
- Fucking annoying
- Annoying as fuck
- Getting fucking annoyed
(Imagine I could be arsed to add the icons )
keepthepace@reddit
This is not a LLM bug, this is the training data that is the internet staring at you like a particularly judgemental mirror.
zschultz@reddit
This is not a moon, this is the ultimate power in the universe.
becauseiamabadperson@reddit
GPT-5, thank fuck, doesn't do this. One of the few benefits over 4o.
Briskfall@reddit
Can't wait for LLMs to ditch all these overused expressions for their default closure and go all-in on "peak," "aura farming," "ratio'd" and "diff."
We'll be in the true endgame by then.
hyperdynesystems@reddit
At least it'd waste fewer tokens then
Atagor@reddit
Reinforcement learning reinforced wrong patterns 😃
DeltaSqueezer@reddit
I rue the day somebody pointed it out to me. Before, I didn't notice it. I also blame M&S.
dhamaniasad@reddit
You are absolutely correct!
freehuntx@reddit
comprehensive
Bite_It_You_Scum@reddit
I really loathe this phrasing. On the plus side, it has me using LLMs less, which is probably a good thing.
Feztopia@reddit
I guess Google used to train Gemini on mistakes (as in: if Gemini produced something wrong, they trained it with "this isn't the wrong thing, but the correct thing"). And other models were quick to copy it.