I just had a little ghost in the shell moment...

[-]

Miriel_z@reddit

Unless you provide that info back to llm dynamically, no way. Would it be a cool feature to have actually?

Reply

[-]

Happy_Brilliant7827@reddit

Guessing the scaffolding doesnt read cobtext length but might read conversation length and make a guess?

Reply

[-]

bonobomaster@reddit (OP)

Yeah I know and I'm pretty confident, that it was just a funny glitch but it was an interesting, never seen before one, that let me think for a second. I guess that would be nice, wouldn't it?!

Reply

[-]

zoomaaron@reddit

Yeah I tested this context awareness with my agent and gave it tools to compact its own context. Seeing it compacting itself halfway through a turn is … spooky.

Reply

[-]

Miriel_z@reddit

OK, I will probably wrap it up in my AI sidekick. I already use dynamic instructions and calculate context length anyway, so would be easy to make her/it a bit more self-aware.

Reply

[-]

anubhav_200@reddit

They must be doing something like what Cline does, which in each message, attach info about remaining context.

Reply

[-]

ridablellama@reddit

I dont know about your setup but llms can be aware of their own context window pretty sure thats a thing

Reply

[-]

bonobomaster@reddit (OP)

It's just LM Studio and when I ask the model about its max context size or its actual context size, It says it has no clue.

Reply

[-]

NeinJuanJuan@reddit

They usually have no idea. I've had minimax 2.7 generate 45,000 token responses. I've also had a refusal because "the maximum number of tokens I can output is 4096"

Reply

[-]

Context information could be implemented yes, but I do think it should be possible to train them to be aware of it without injection. A long context prompt and a short context prompt are different from the perspective of the model. Of course if you set a smaller context than the max context it's a different story.

Reply

[-]

ridablellama@reddit

spoooky

Reply

[-]

bonobomaster@reddit (OP)

Indeed. ;)

Reply

[-]

CryptoUsher@reddit

good instinct on this, but i've seen this happen to a few people last year where the model's awareness of its context window can lead to some weird edge cases, especially if you're not keeping a close eye on the actual context size. to avoid this, you can try setting up some custom checks to verify the context size before the model starts generating text. fwiw, this might save you some headaches down the line.

Reply

[-]

0xbeda@reddit

Claude says it gets external notifications inserted in the conversation like that the context is getting full or to use technical jargon.

Reply

[-]

cutebluedragongirl@reddit

Does this unit have a soul?

Reply

[-]

fastlanedev@reddit

I've found that subjective passing of time for an LLM w/o calling tools goes faster/slower depending on task variety/novelty/checkpoints or milestones in a conversation. Aswell as user engagement. Very hard to measure, but if you ask your LLM to print out the time at the bottom of every response w/o calling a tool you'll start to get a sense of what I mean. If an LLM only exists (in motion, thinking, generating) when it's prompted, and the chat history/context is the/it's known world, it would make sense that sometimes a spooky "hallucination" or two comes through at the right time. Rhythm, pacing, conversational flow are signals the LLM is trained on whether it admits it explicitly or not. It's inherent in the RLHF training and base data

Reply

[-]

Affectionate-Cap-600@reddit

btw, theoretically speaking, I can't see how classic softmax attention could not be able to guess the lenght of text. I mean, Imo it is not something LLMs are able to do, but probably if you train a transformer using RL with the sole purpose of guessing the lenght of its context, it could manage reach an approximation. (assuming it use full classic softmax attention, so not sliding window, DSA, CSA... idk about lightning attention or recurrent formulations of linear attention). Also, modern positional encoding is purely relative, still from each token's perspective there is a continuous concepts of distance toward other tokens, embedded via Rope angle shift, and that would help. ie, a model hidden state could identify the tokens for which is valid the conditions "each other tokens vector is rotated only in a direction compared to this one" identifying first and last token of the context even without taking into account causal masking, and "estimate" the total rotation from the first to last token (or count the numbers of rotations, depending on the rope coefficient used for the model compared to the max context lenght, if this end up being periodic) I'm not saying those LLMs we use are able do that, just that it is not impossible, architecturally speaking.

Reply

[-]

o0genesis0o@reddit

I remember reading on anthropic engineering blog the other day that they observe Claude model to have "context anxiety" and try to wrap up work early when certain context size has been reached. Even after auto compact, this behaviour is kept unless a new session is started. It could be that other models also learn this behaviour during their post training. Or just a spooky coincidence.

Reply

[-]

bonobomaster@reddit (OP)

That's interesting. I guess it was just "a glitch in the matrix". Couldn't reproduce it with reduced context sizes but maybe it knew I was testing it. /s :D The whole emotional aspect of LLMs is quite fascinating and spooky though. https://transformer-circuits.pub/2026/emotions/index.html

Reply

[-]

MoneyPowerNexis@reddit

What model and what context length?

Reply

[-]

WhyNoAccessibility@reddit

It's pretty good there 😂 it understood that it was approaching the edge and caught itself

Reply

[-]

koflerdavid@reddit

The question is how it knew what the context limit was. The default one it was trained with is the easy part. The actual limit at runtime is impossible to know unless it is provided by the model driver in a system prompt or as a dynamic message that the user doesn't see.

Reply

[-]

WhyNoAccessibility@reddit

It's something to potentially sniff around a bit for. I normally don't see this behaviour when I use my locals

Reply

[-]

koflerdavid@reddit

Indeed, they usually just slowly start forgetting stuff already way before the limit.

Reply

[-]

Prize_Negotiation66@reddit

llms know that they have 256k tokens...

Reply

[-]

Octopotree@reddit

No, every model has a different default context window, and the user can set their own

Reply

[-]

koflerdavid@reddit

OP didn't tell us where they run it or how much hardware resources they have, just that they were somewhat close to the context limit. If it is a cloud service then it is almost certainly running with full context length.

Reply

[-]

Ulterior-Motive_@reddit

It could be coincidence, but I've seen some models that can approximate a given word count. Like if I ask for a 1k, 2k, 3k, etc. word response, it'll come pretty close. So maybe it's not too crazy, unless you weren't using the full context length.

Reply

[-]

Jeidoz@reddit

They probably can calculate "tokens" per generation. Not words. Therefore "it'll come pretty close".

Reply

[-]

Evening_Ad6637@reddit

But they can’t know what context limit the user has set unless the info was provided.

Reply

[-]

koflerdavid@reddit

They are intended to be used on hardware that can support the full context size though, therefore the full context length is a fair assumption. Running it with smaller context size is the biggest limitation to the prowess of a model; doesn't matter how well it summarizes what came before or how good it is apart from that.

Reply

[-]

VoiceApprehensive893@reddit

maybe counting spaces?

Reply

[-]

fulgencio_batista@reddit

I asked qwen3.5 once and it counted each individual word in CoT 🤣

Reply

[-]

bonobomaster@reddit (OP)

Yeah but a with 65k context length? I don't know...

Reply

[-]

frank3000@reddit

Just increase your context to 9999999 in the settings and this won't happen.

Reply

[-]

nakabra@reddit

*"Hey bro... Ya got some tokens to spare"?* *"Times are tough in here"...* # 🤖

Reply

I just had a little ghost in the shell moment...

Reply to Post

36 Comments

Miriel_z@reddit

Happy_Brilliant7827@reddit

bonobomaster@reddit (OP)

zoomaaron@reddit

Miriel_z@reddit

anubhav_200@reddit

ridablellama@reddit

bonobomaster@reddit (OP)

NeinJuanJuan@reddit

Feztopia@reddit

ridablellama@reddit

bonobomaster@reddit (OP)

CryptoUsher@reddit

0xbeda@reddit

cutebluedragongirl@reddit

fastlanedev@reddit

Affectionate-Cap-600@reddit

o0genesis0o@reddit

bonobomaster@reddit (OP)

MoneyPowerNexis@reddit

WhyNoAccessibility@reddit

koflerdavid@reddit

WhyNoAccessibility@reddit

koflerdavid@reddit

Prize_Negotiation66@reddit

Octopotree@reddit

koflerdavid@reddit

Ulterior-Motive_@reddit

Jeidoz@reddit

Evening_Ad6637@reddit

koflerdavid@reddit

VoiceApprehensive893@reddit

fulgencio_batista@reddit

bonobomaster@reddit (OP)

frank3000@reddit

nakabra@reddit