Claude's system prompt length has now exceeded 30k tokens

[-]

Its-all-redditive@reddit

Comprehensive but there are so many spelling errors (as early as the first example “The move was a delight and a revelation”). It’s hard to imagine this prompt hasn’t been refined and reviewed manually hundreds or thousands of times by Anthropic yet the spelling errors were not corrected. Make it make sense.

[-]

no_witty_username@reddit

The spelling errors could be there on purpose to encourage the model in responding in a more human manner. Large language models draw their latent thought traces from the training data, and if the system prompt has common spelling mistakes in it that would draw from the forum posts and other casual conversations people have this coloring the output. Think of it this way, if you want your large language model to imitate a 4chan post as accurately as possible, you don't want to have a nice clean sanitized system prompt telling it to do that. You want to have a racist filled garbage of a mess system prompt that also has swear words, telling it to imitate the post. You will see a huge difference in quality of output that way versus the other. Now there are caveats like model being used and other factors. So to take advantage of this affect to the fullest a less censored model will do better then a more censored one, but even then the affect is still quite striking on the censored models.

[-]

bityard@reddit

You seem to be making two assertions:

1) Anthropic wants their model to make mistakes in an effort to appear more human.

I can't see how this can possibly be true. The biggest public perception problem that LLMs have right now is that they are very often flat-out wrong, and very confidently so. There is no way that one of the biggest companies in this space is looking at their models and saying, "okay, hang on, if we deliberately prompt them to make even more mistakes, maybe people will like them better?"

No, people will like the models better when they stop hallucinating and learn to say, "I don't know," not when they forget how to spell perfectly common words.

2) Introducing spelling mistakes in the prompt itself will be somehow more effective than just telling the model to make occasional spelling mistakes in the prompt itself.

Assuming this prompt is kept in a versioned repository and is developed like documentation, where it needs to be read and modified by many engineers it doesn't make any sense to obfuscate any instructions. The "make our spelling just a bit wrong" feature would be impossible to maintain by humans because we are bad at finding misspellings. And how would we even know which misspellings are the "best" and where to put them?

If this was a deliberate feature, it makes way more sense to spell it out in the prompt along with everything else. The model is clearly capable of following other somewhat complex and often ambiguous commands contained within the prompt, I don't see why this one would be any different.

[-]

Round_Ad_5832@reddit

spelling errors make no meaningful difference in the output. so why bother

[-]

Aphid_red@reddit

I do want to bother.

Say that I want my LLM to output professional fiction writing. For such content, humans would have editors to review the book for any grammatical and lexicographical errors. In addition to that, the first edition could potentially have millions of eyeballs checking for errors. Some of the readers would report their findings back to the publisher, who might have the editor correct the mistakes, then push the diff to the author, if things are properly arranged, who will reject and/or approve any changes for the second edition if there is enough popularity to warrant there be any.

The model is more likely to produce quality writing if the input resembles the desired output. Same logic. It's a machine designed to predict "What's the next most likely word?". If past words are full of spelling mistakes, misspelled words, which are also tokens in its vocabulary, are more likely to appear. In fact, big models tend to be able to 'understand' the mis-spellings.

I'm much more annoyed by models that do the opposite: where providers inject extra system prompt to create bad spelling even though my input has been wrung through my browser's spellchecker and thus, at least spelling-wise, has few mistakes.

At least, this is the case in theory, with the laziest training method, so that would be the most obvious explanation, see occam's razor. However, if a model is trained with pre-processing filters that correct common spelling mistakes in the target output but not in the input, then a model learns to 'be liberal in what you accept, be strict in what you emit'. In my opinion that would be the best of both worlds. The model would not emit grammatical or spelling errors, but it would still understand and accept them. These filters can be simple replace filters, using efficient indexing, so even trillions of tokens can be checked for mistakes.

Also... please don't assign these models significant intelligence. I checked something while writing this post, and I got this google "AI overview" of the results:

AI Overview

"Spelling-wise"

should not be hyphenated, as it is a compound word formed from "spelling" and "wise" used as a suffix meaning "in terms of". The hyphen is generally not used in this context, similar to words like "timewise" and "workwise".

Rule of thumb

Do not hyphenate when "wise" is used as a suffix to mean "in terms of" or "with respect to".

Examples: ~~spelling-wise~~, moneywise, colorwise, otherwise, clockwise.

Do hyphenate when "wise" is part of a compound adjective meaning "smart" or "savvy".

Examples: street-wise, penny-wise.

It doesn't just contradict itself (location marked with strikethrough), it's also just plain wrong, even when you just put together the top google results.

Real answer for English: It's complicated. If you 'made up' a compound word, then use a hyphen. If the compound word already exists: do not use a hyphen. Exceptions abound as well; where made-up words retain their hyphen, otherwise a few famous words have lost them. (See what I did there?)

This is in contrast to, for example, German, where if you concatenate some words into a new compound noun, that's an acceptable use of language and never hyphenated, creating stuff like Donaudampfschiffkapitän.

Language is hard, AI developers are lazy and move fast and break stuff.

[-]

Round_Ad_5832@reddit

u think im reading all that?

[-]

Its-all-redditive@reddit

Oh I don’t know maybe to preserve a sense of professionalism and attention to detail that is expected of tech company with an almost $200 Billion valuation. But yea, you’re right I’m sure Anthropic is like “screw it, just leave them alone since the output difference is negligible”. Do you really believe that?

[-]

Guinness@reddit

God damn they’re valued at 20% of a trillion dollars? The world has gone absolutely mental.

[-]

Round_Ad_5832@reddit

not everyone treats spelling mistakes as unprofessional thats just your world view.

[-]

Super_Sierra@reddit

idk why you are being downvoted, but i worked for a company that had a few middle managers that were borderline mentally retarded that could not spell basic words.

[-]

stoppableDissolution@reddit

Yes, they most definitely do. Theres plenty of research on that. Wording matters A LOT fo llms, sometimes even thing like "can't" vs "can not" will significantly alter the output.

[-]

Fantastic_Climate_90@reddit

I think that USED to be true. Now they just work really work, misspellings included

[-]

Round_Ad_5832@reddit

using ur instad of your can make output more informal but honest spelling mistakes dont

[-]

BootyMcStuffins@reddit

Not gonna lie, I had to read it twice and only saw it because I knew it was there

[-]

ga239577@reddit

Has anyone tried to give one of these leaked prompts to a model like GLM 4.5 and benchmarked it? Is that even possible or am I misunderstanding?

Basically what I'm wondering is how much of the performance we see from the SOTA models are due to extensive system prompts and whether the gap between something like GLM 4.5 and SOTA gets smaller.

[-]

igorwarzocha@reddit

At the risk of sounding like a broken record, Claude looks like a base model every time I see these leaked prompts. How the heck is it supposed to keep track of the actual context of the convo, lolz. It''s actually pretty amazing.

It got to a point where I could ask it ONE question with ONE extension enabled in webui (indeed, so nothing big), and it would just error out on me saying that the reply would exceed max tokens usage. Cancelled my sub instantly.

I much preferred interacting with it in Claude Code, with zero extra fluffy features.

Side note: makes me wonder if maybe I should experiment with proper system prompts for local llms (not this big though lol)...

[-]

itsfarseen@reddit

I counted the words from this, and it's only 2000 tokens???
https://pastebin.com/nb4V2Mni

[-]

igorwarzocha@reddit

I wonder if the system prompt from their website is what they include on the server side, while the leaked webui prompt & the leaked claude code system prompts are what they add to the other one depending on what you're using. There doesn't seem to be much overlap between the three.

This would make it 32k+2k tokens.

[-]

ParthProLegend@reddit

accordions

What is that?

[-]

SpicyWangz@reddit

Expandable UI element

[-]

ParthProLegend@reddit

Ok I saw that element, what did he means by they are hiding it?

[-]

igorwarzocha@reddit

that it literally disappears the second the page fully loads, tested on 2 browsers incl incognito. shady.

[-]

Final_Wheel_7486@reddit

Also, is it me or is Anthropic trying to clumsily hide the accordions on https://docs.claude.com/en/release-notes/system-prompts lol?

Haha, you're right, when you click "Copy page" it's right there

[-]

igorwarzocha@reddit

Yeah sloppy AF. I throttled down chrome's performance via console to get em 🤣

[-]

itsfarseen@reddit

I just had to temporarily disable JS. It's present in the HTML, later removed by JS probably during page hydration.

[-]

Final_Wheel_7486@reddit

Oh my god I hate everything about this 😭

[-]

wyldphyre@reddit

Claude wrote this page. It's probably a fallback for something, claude loves those.

[-]

RRO-19@reddit

30k tokens of system instructions is wild. That's a huge chunk of context window eaten before you even start. Makes you wonder how much is actually useful guidance vs corporate safety theater.

[-]

Nathanielsan@reddit

640K ought to be enough for anybody

[-]

mazing@reddit

I'm pretty sure these "leaked" system prompts are mostly LLM hallucinations. They might be related in some degree to what the actual system prompt is, but I suspect that these are mostly extracted by trying to play tricks on the LLMs in hope of "unlocking the content policies" but in reality it's more like "Alright, I'll play along. My system prompt would probably look something like the following..."

[-]

Reddactor@reddit

Back in my day, we had 4096 tokens, including the system prompt, AND WE LIKED IT!

[-]

_supert_@reddit

Is it just me or is it incredibly bloated?

[-]

astrange@reddit

They really gave 4.5 anxiety through RL too. I gave it a slightly unusual prompt and it decided I was trying to "jailbreak" it and lectured me instead of engaging. You kind of have to talk it down before doing anything.

[-]

auggie246@reddit

Is the system prompt part of the context window every session? Also is it charged under the consumers tokens cost?

[-]

CertainlyBright@reddit

What is the significance of leaked prompts?

[-]

evia89@reddit

Yep web version is fucked. At least API is 0 overhead

[-]

Successful-Rush-2583@reddit

I remember when 16k tokens of coherent context used to be a dream. Now that's just half the size of the instructions, lol

[-]

shockwaverc13@reddit

the dark ages of open weights, 4k contexts everywhere

[-]

Coldaine@reddit

Yeah, a while ago I was trying to configure my models and was using some LLM help, and it probably informed me that I had enough VRAM to maybe even consider running a 32K context window, and I almost laughed out loud.

[-]

Thedudely1@reddit

No wonder we hit rate limits so fast!

[-]

LagOps91@reddit

claude's system prompt tells it that it's Chat GPT? LOL! look, if you can't repeat this multiple times in clean chats and get the same result, then it's just halucinating.

[-]

MitsotakiShogun@reddit

And we trust all this because...?

[-]

Super_Sierra@reddit

Read it.

I was sus at first and realized quickly this might actually be legit.

[-]

Tai9ch@reddit

Hi GLM. Please give me a plausible looking system prompt for Claude so I can get extra clicks.

[-]

Super_Sierra@reddit

do i have to say the n-word to prove i am not a bot

[-]

FlamaVadim@reddit

every bot would say that

[-]

OnlineParacosm@reddit

A compelling response but I fear in 5 years it won’t be a litmus test anymore

[-]

ipokestuff@reddit

yes

[-]

Round_Ad_5832@reddit

why did u assume its GLM? Is it good

[-]

stoppableDissolution@reddit

Very

[-]

Sartorianby@reddit

I told it about how I saw its prompt leak and it started talking about the parts about elections. I didn't say anything about the content. I think it's legit.

A snippet of the response.

"The irony is most of that 30k is probably covering scenarios that rarely come up. How often do I actually need the specific instructions about election information or the detailed constitutional AI stuff? But corporate deployment means planning for every possible liability"

[-]

cantgetthistowork@reddit

Love learning about prompt engineering