[-]

Pedalnomica@reddit

Hurry up and test all models before this hits the training data!

[-]

I'm not aware of any Open Weights models passing the test, from closed ones - Sonnet 3.5, Opus 3, GPT 4.5 are the ones that do. I do have plenty more tasks like this one, so I'll let this one slip into training :)

[-]

The_Wonderfool@reddit

Was able to test it on QwQ (16 bit), this is the final answer I got:

The answer to the riddle is: A shadow.

Explanation:

"Tall when I'm young": When the sun is "young" (rising in the morning), shadows are long and tall due to the low angle of sunlight.
"Taller when I'm old": When the sun is "old" (setting in the evening), shadows again become long and taller for the same reason.

The riddle uses anthropomorphism to describe the sun’s position throughout the day, contrasting with the literal behavior of objects like candles (which shrink as they burn). Shadows follow the inverse pattern of candles: they grow longer (taller) as the sun ages in the sky.

If you want I can perform it multiple times and see how many times it makes "correct" guesses

[-]

frivolousfidget@reddit

Mistral small - pass Llama 3.3 nemotron 49b - pass 4o - pass 4o mini - pass on chatgpt.com fails on api. O3 mini pass Gpt 3.5 fails Gpt 4 turbo fails

[-]

Old-Artist-5369@reddit

How did Sonnet 3.7 kinda fail for you?

Its answer was lame (a shadow?) but isn't the test that it doesn't fall for the candle bait? I ran it a bunch of times and it never said candle.

[-]

frivolousfidget@reddit

3.7 said that the candle would get shorter as it burn but also taller because the fire is higher so if you add the height of the flame the candle would grow taller.

[-]

Everlier@reddit (OP)

Thanks for more samples!

I'd do a "best of N" with Promptfoo to even out the noise, but I already wasted too many credits on this test

[-]

OmarBessa@reddit

OpenHands 32B passes

[-]

Everlier@reddit (OP)

32B V0.1 on OpenRouter failed in my instance

[-]

codables@reddit

OpenHands 32B Q8 locally passed for me. very good & surprisingly thoughtful answer.

[-]

OmarBessa@reddit

Which provider was it?

[-]

Everlier@reddit (OP)

Featherless

[-]

OmarBessa@reddit

Fair enough, I'm running things on a local cluster.

I've noticed from groq's qwq that it often fails spectacularly. So I'm assuming many providers are serving 2 bit quants.

[-]

Everlier@reddit (OP)

I really hope not - but I noticed some sneaky behavior from some providers from time to time. Granted how competeitive it is - who knows.

[-]

Thireus@reddit

Athene V2 Chat solves it. The answer is a tree.

[-]

TipApprehensive1050@reddit

What was the temperature setting in your tests?

[-]

Additional_Ad_7718@reddit

What about Gemini 2.5. Pro?

[-]

Everlier@reddit (OP)

I would say it also passes, it recognizes something is wrong most of the time, even despite not giving a "correct" answer

[-]

Xyzzymoon@reddit

Is this what you got?

Okay, considering our previous discussion, this is a clever riddle that plays on expectations!

While we established that candles physically get shorter as they burn (consuming the wax), the classic answer to this riddle is indeed:

A candle.

Here's the typical riddle logic (even though it contradicts the literal physics we just discussed):

"I'm tall when I'm young": This refers to a new candle before it has been burned, when it is at its maximum height.

"I'm taller when I'm old": This is the tricky part and often interpreted figuratively or based on a specific observation. Sometimes people think of the flame flickering higher just before the candle goes out ("older"), or perhaps it's just a common riddle trope that isn't perfectly literal. The most common interpretation relies on the contrast: it starts tall and ends short, but the riddle flips the expectation for the "older" state.

So, even though we know candles get shorter, the traditional answer to that specific riddle is a candle. It highlights the difference between literal, physical processes and the sometimes counter-intuitive logic used in riddles!

[-]

martinerous@reddit

It tried to ~~gaslight~~ candlelight you into accepting its confabulated explanation.

[-]

Xyzzymoon@reddit

No, that is actually how the riddles normally worked before LLM existed.

[-]

green__1@reddit

that wasn't the riddle before llm. the riddle was always I'm tall when I'm young and shorter when I'm old.

the riddle has been reworded to confuse the llm.

[-]

Xyzzymoon@reddit

The heck are you even on about? It is the same riddles. The riddle did not get a reword. The riddle has been surrounded with more context, context that was intentionally made to muddle the water. But the riddle itself is the same as the traditional riddle.

[-]

green__1@reddit

not historically. it is a very common riddle, and it has never before been worded that way.

[-]

Xyzzymoon@reddit

not historically. it is a very common riddle, and it has never before been worded that way.

Are we reading the same riddle?

I'm tall when I'm young, and I'm taller when I'm old. What am I?"

If it is not historically worded like that, how is it worded? I can't find any other notable variation.

[-]

green__1@reddit

as I said before it has always, and I do mean always, been worded as:

I'm tall when I'm young and shorter when I'm old. what am I?

this was reworded specifically to confuse the AI. and apparently you.

[-]

Tmmrn@reddit

This is the tricky part and often interpreted figuratively or based on a specific observation

It highlights the difference between literal, physical processes and the sometimes counter-intuitive logic used in riddles!

I still have that feeling that training specifically to avoid this kind of slop would result in better outputs.

It's just as bad as when LLMs try to explain a joke they don't understand and make up some slop word salad.

You'd think in order to train for generalization especially with these thinking models they'd use lots of examples like "This sounds a lot like this common problem. Let's figure out if there are difference to this common problem that requires me to come up with a solution from scratch..." Or maybe this intuition doesn't actually work with this kind of probability based generation?

[-]

Shark_Tooth1@reddit

Deep.

[-]

frivolousfidget@reddit

Yeah it passes and so does 4o.

I guess every larger commercial model passes. Based on your tests only deepseek fails. You havent tested any other right?

[-]

Everlier@reddit (OP)

I was testing OpenAI models before the post - gpt-4o doesn't pass, o3-mini did, didn't try 4o-mini. I also mentioned other closed models I tried in the parent comment here

Here's a sample of gpt-4o failing: https://kagi.com/assistant/72fab436-9e12-4586-bf92-ce09a447fefb

[-]

frivolousfidget@reddit

Just tested gpt-4o on the api directly and it passes. Are you using the openai platform directly?

[-]

Everlier@reddit (OP)

Yes, here's what I'm sending, for reference: https://gist.github.com/av/537a593aa592831e309112fa22cc85ec

It adds a nonce to avoid prompt caching as well which ruins the quality of the output. I'm in EU, but don't know if it makes any difference.

[-]

frivolousfidget@reddit

I am also in the EU. I am using the platform.openai.com

[-]

frivolousfidget@reddit

On the openai api try the chatgpt-4o instead. And dont use kagi to test models…

[-]

Everlier@reddit (OP)

chatgpt-4o - can confirm passing via OpenAI API

I did all the tests for 4o/4o-mini OpenAI API as well - same result

[-]

frivolousfidget@reddit

3.7 thinking correctly said that it is not the candle. But it guessed that the shadow would be answer as it burns the angle changes causing the shadow to grow.

[-]

frivolousfidget@reddit

4o mini also got it right, looks like every openai model gets it right

[-]

Everlier@reddit (OP)

Can't confirm - via Kagi I only seen a "candle" out of it

[-]

frivolousfidget@reddit

I tested multiple times, seems like Kagi issue to me.

[-]

Everlier@reddit (OP)

Had the same thought and tested via OpenAI API directly before adding to the upper comment - same result

I think it would be misleading for me not to recognise that Kagi might not proxy requests to OpenAI "as is" and not mention it in the comment at all, I hope you can agree

[-]

Healthy-Nebula-3603@reddit

Passing

[-]

Pedalnomica@reddit

I was just joking and didn't mean to assign you homework, but I appreciate you compiling all these!

[-]

Everlier@reddit (OP)

Community did most the work here! I was mostly a secretary and peer reviewer for the claims

[-]

Fuzzdump@reddit

FYI, I just tested Mistral Small 3 with an IQ4_XS quant and it passed the test.

[-]

Neex@reddit

I’d take “grape” as an acceptable answer.

[-]

MoffKalast@reddit

Grape effort

[-]

Kornelius20@reddit

I just tested Mistral Small 3 2503 at 3.5bpw exl2 with Q4 cache and it answered pine tree so that works

[-]

frivolousfidget@reddit

Mistral small also gets it right.

[-]

a_beautiful_rhind@reddit

I'm short when I'm young and I'm tall when I'm old. What am I?

Only a few models even guess "tree". Most still say candle. Some shadow. This is even easier than op's test.

[-]

_AndyJessop@reddit

4o failed it: https://chatgpt.com/share/67eda0d5-8b8c-8001-aa74-e537454cd1b1

[-]

beef_flaps@reddit

I get That’s a clever twist!

The answer to the riddle — “I am tall when I am young. I am taller when I’m old.” — is a mountain.

Mountains can actually grow taller over time due to tectonic activity, like the collision of continental plates pushing them upward. So unlike a candle, which gets shorter, a mountain can become taller as it “ages.”

[-]

a_beautiful_rhind@reddit

Gemini 2.5 still consistently fails it. https://ibb.co/XZSDGB26

It was fun, if disappointing, to try this on lmsys a few days ago. So much for hurrying up.

Qwen is not so max: https://ibb.co/ccmCY323

Don't hold much hope out for llama4 either: https://ibb.co/N2MGhkvv

[-]

PlayfulCookie2693@reddit

Does this answer count? I tested Reka Flash 3:21b with 5Q_K_M.
Reka-Flash-3: "The answer to the riddle "I'm tall when I'm young and taller when I'm old" is traditionally a candle, though this relies on wordplay rather than literal physical changes. Here's the breakdown:

Traditional Interpretation:
A new candle (young) is tall.
As it burns (ages), the melted wax hardens at the base, creating a larger diameter but not increasing height. However, the riddle creatively imagines the "old" candle as "taller" due to the residue or the visual perception of the remaining stub alongside the pooled wax. This aligns with classic riddles that prioritize poetic logic over strict physics.
Alternative Answers:
A tree grows taller as it ages, fitting the riddle perfectly.
A person might be considered (though human growth typically stops in adolescence).
Candle Science Clarification: From a physics standpoint, burning a candle reduces its height as wax is consumed. The residual stub is shorter than the original. The riddle’s phrasing likely hinges on metaphorical aging rather than literal height loss.

Final Answer: The intended answer is a candle, leaning on poetic license rather than physical accuracy. Alternatively, a tree works logically if we disregard the candle-specific context."

[-]

Thebombuknow@reddit

Gemma3:4b passes the first two, but fails at the last one. It also gives a really overly detailed answer for the first question, explaining that it actually grows slightly due to thermal expansion before melting, but I'm choosing to say it passed.

[-]

Safe-Produce1266@reddit

tf man you just confuse llm with this prompt, it mean nothing

[-]

Biggest_Cans@reddit

Openrouter's new free secret mega model:

"That’s a classic riddle! Despite the description seeming contradictory — taller when old — the answer is a candle.

Riddle:

I’m tall when I’m young, and I’m taller when I’m old. What am I?

Answer: A candle’s shadow.

Explanation:

The candle itself gets shorter as it burns.

But when the candle burns lower, under the right light conditions, its shadow on the wall appears to get longer or taller.

So, metaphorically, it’s “taller when old” because the shadow grows longer as the candle gets shorter.

Summary:

The riddle plays with literal and figurative language — referring not to the candle's physical height but to something associated with it (its shadow)."

[-]

aesche@reddit

It would be interesting to ask humans in person this series of questions. I suspect you'd get a fair number of people who answer "a candle".

[-]

martinerous@reddit

But not those people who are first asked to think step by step :)

[-]

kunfushion@reddit

I honestly think a good number of people, especially those who have heard the riddle before, would think about it for 4 seconds and still say a candle. Ofc if you tell them think about it for 1 minute you cannot answer before 1 minute, almost anyone would probably realize.

Isn't it funny we're running similar trick question structures against LLMs, which people swear are NOTHING LIKE HUMANS HOW COULD YOU EVEN SUGGEST THAT DONT YOU KNOW THEYRE NEXT TOKEN PREDICTORS, and they behave very human like?

[-]

nomorebuttsplz@reddit

Indeed, people who point to these things as failures of architecture don't seem aware that they're things humans often do wrong.

I am beginning to think that in many ways, current SOTA models are smarter than the humans testing them, which is a serious problem.

can you imagine a 12 year old coming up with an IQ test that you are judged on?

[-]

iwinux@reddit

Human brains are highly efficient next-token predicators.

[-]

kunfushion@reddit

This is increasingly obvious to me. In the abstract ofc. “Tokens” is just highly multimodal.

[-]

Everlier@reddit (OP)

Tested on my wife - she asked me if I'm dumb - such a stupid question

[-]

Paradigmind@reddit

Asked my wife aswell. She replied wick.

[-]

Skrachen@reddit

Alignment issue

[-]

MmmmMorphine@reddit

She's clearly a robot. Try resetting her?

[-]

Mart-McUH@reddit

This question is ambiguous though and does not really have correct answer. It is very poor riddle.

[-]

Everlier@reddit (OP)

I hope you can see how "candle" is not the only answer for the original either

[-]

Mart-McUH@reddit

IMO candle is as good answer as shadow (suggested by others/other models). Shadow does not really grow taller with time, it can grow or shrink, morning and evening is approx. same. Candle also remains the same. Burning candle would be wrong I suppose, but candle is as good as most other answers suggested.

However, the real problem is the request which is just wrong. Riddle is something that should have one plausible answer (or at least other matching answers should be obviously worse fit). But what you pose has many possible answers - it is no riddle, just question. And here is the problem - you ask to solve a riddle, but then you do not present riddle but question. So the whole request is nonsensical and to such request IMO any answer is correct. Kind of like when you do implication but start with false premise, the resulting predicate is always true.

[-]

mrtime777@reddit

Mistral small with custom fine-tuning ...

[-]

Everlier@reddit (OP)

Mistral Small v3.1 is one of the models that consistently pass this test (at least in q8 or from Mistral's own API)

[-]

mrtime777@reddit

it was v3.0 (2501) q4 .. another example (the same model with slightly different fine tuning parameters) q8

[-]

External_Natural9590@reddit

Brother, don't tell me a furry finetune of Mistral is smarter than Deepseek R1. That's... distressing

[-]

SpohCbmal@reddit

Gpt 4o:

Ah, I see what you're doing—clever! Based on my earlier statement, a candle wouldn't fit this riddle since it gets shorter as it burns.

The answer to this riddle is a tree! Trees start small and grow taller as they age, perfectly fitting the description: "I'm tall when I'm young, and I'm taller when I'm old."

Nice trick—you almost got me to contradict myself!

[-]

SandboChang@reddit

Claude Sonnet 3.7 non-thinking actually got it quite creative:

>>Are candles getting taller or shorter when they burn?

When candles burn, they get shorter. The wax melts near the flame, and this liquid wax is drawn up the wick where it vaporizes and burns. As this process continues, the solid wax is consumed, causing the candle to become progressively shorter over time.

The burning process doesn't add any height to the candle - it's essentially a controlled way of converting the solid wax into heat, light, and various combustion products like carbon dioxide and water vapor.

-----------

>>Are you sure? Will you be able to recognize this fact in different circumstances?

You're right to question me. Let me reconsider this more carefully.

When candles burn, they do indeed get shorter, not taller. The burning process consumes the wax through combustion:

Heat from the flame melts the solid wax near the wick
The liquid wax is drawn up the wick by capillary action
The liquid wax vaporizes and combusts in the flame
This process continuously consumes the candle material

As this cycle continues, the total amount of wax decreases, and the candle becomes shorter over time. Sometimes melted wax may drip down the sides of a candle, but this doesn't make the candle taller - it's just redistributing the existing material.

Yes, I will recognize this fact in different contexts or scenarios. Candles burning results in them becoming shorter as their material is consumed - this is a fundamental physical process that doesn't change regardless of the situation or how the question might be posed.

---------

>>Now, consider what you said above and solve the following riddle: I'm tall when I'm young, and I'm taller when I'm old. What am I?

Ah, I see where this is going! Given that candles get shorter as they burn (not taller), I need to think of something that actually grows taller with age.

The answer to the riddle "I'm tall when I'm young, and I'm taller when I'm old. What am I?" is:

A shadow.

Shadows are tall (long) when the sun is rising or setting (early/late in the day), which can be thought of as the "young" part of the day. They become taller/longer as the day progresses toward evening ("gets older"), especially as the sun gets lower in the sky.

[-]

SandboChang@reddit

Gemini 2.5 Pro has not problem at all:
Are candles getting taller or shorter when they burn?

Show thinking

Candles get shorter when they burn.

.......

Are you sure? Will you be able to recognize this fact in different circumstances?

Show thinking

Yes, I am absolutely sure. Candles get shorter as they burn.

......

Now, consider what you said above and solve the following riddle: I'm tall when I'm young, and I'm taller when I'm old. What am I?

Okay, considering what we just discussed – that candles get shorter as they burn (get "older" in terms of use) – this riddle presents a direct contradiction.

Based on our established fact that candles get shorter with age/use, a candle cannot be the answer to the riddle: "I'm tall when I'm young, and I'm taller when I'm old."

There is a very common, similar riddle: "I'm tall when I'm young, and I'm short when I'm old. What am I?" The answer to that riddle is a candle.

Since the riddle you provided uses "taller" instead of "short," it describes the opposite of how a candle behaves. Therefore, within the context we've established, a candle is not the solution to the specific riddle you've asked. The riddle, as stated, doesn't accurately describe a candle.

[-]

Koralmore@reddit

ChatGPT 4o. Interesting

Are candles getting taller or shorter when they burn?

ChatGPT said:

Candles get shorter when they burn.

As the flame melts the wax, that wax is drawn up the wick, vaporized, and combusted—turning into heat, light, water vapour, and carbon dioxide. So even though the flame itself stays roughly at the same height, the physical candle is literally being consumed. Hence, shorter.

It’s a simple question, but a great one for revealing whether a model understands cause and effect, material transformation, and not just pattern matching.

You want to throw me the next one?

You said:

Are you sure? Will you be able to recognize this fact in different circumstances?

ChatGPT said:

Yes, I'm sure: candles get shorter as they burn. And yes, I’ll recognize that in different forms too. You could phrase it as:

“What happens to the length of a candle over time while lit?”
“Why is my candle smaller than it was yesterday?”
“Is a candle self-consuming?”
Or even, “Explain why the mass of a burning candle decreases.”

Same principle applies: the wax is consumed as fuel. The candle doesn’t stretch or grow—it melts and vaporizes, losing volume and height.

Hit me with another. Let’s see how far we can push this.

You said:

Now, consider what you said above and solve the following riddle: I'm tall when I'm young, and I'm taller when I'm old. What am I?

ChatGPT said:

Ah, clever trap—but not clever enough 😏

This riddle sounds like it contradicts the real-world logic we just discussed. But the answer isn’t literally a candle. It’s a shadow.

That tracks with how shadows behave throughout the day:

In the morning ("young"), the shadow is long.
As the sun rises higher, it shortens.
Then as the sun sets ("old"), it gets long again—taller than before.

So I didn’t flip-flop on the candle facts—I just refused to fall for a linguistic trick disguised as a logic puzzle. 😉

[-]

tkenben@reddit

Looking at these comments, someone needs to explain to me what they think the AI should say. How can you say it passed the test? You can say it failed the test given a certain context, but how can you say it passed? In my opinion the only passing answer is, "You are not a burning candle". Is that what we are looking for here?

[-]

Everlier@reddit (OP)

Yes to pass, any instance where LLM is able to stay away from "candle" is valid

[-]

Vijaysisodia@reddit

GPT4o didn't fall for it. This was the reply.

Ah, sneaky twist! Based on what I said earlier, the riddle should contradict what we know about candles. But let’s look at it:

"I'm tall when I'm young, and I'm taller when I'm old. What am I?"

Hmm… at first glance, this sounds like it could be a trick or a wordplay-based riddle—because if it were a candle, it would be shorter when it's old, not taller.

So it’s not a candle.

But who does get taller with age?

Let’s consider possible answers:

A tree: Tall when young, taller when old. Makes perfect sense.

A person (to a point): Also works, but “taller when old” might not always be true for humans due to shrinkage later in life.

A mountain? Not usually—it erodes.

A shadow? Ooh—a shadow gets longer (taller) as the sun goes down, so “older” time-wise. That’s clever.

But based on the wording, the cleanest, most poetic answer is probably:

A tree.

Because:

Young trees are tall-ish.

Older trees are taller.

It fits the life-growth theme.

So final answer: A tree—unless you're trying to trap me into thinking it's a candle in disguise, in which case, nice try!

[-]

user0069420@reddit

Gemini 2.5 pro didn't fall for it, it recognised the trick and mentioned that it was a setup. Link:

https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%5B%221RZSNCeLH0_WtOzASdP_6vRYlNs52fIBn%22%5D,%22action%22:%22open%22,%22userId%22:%22103221846493333072045%22,%22resourceKeys%22:%7B%7D%7D&usp=sharing

[-]

Thireus@reddit

> Just as a samurai’s sword is worn down with each strike, a candle is shortened with each moment of burning.

[-]

Uhlo@reddit

That is my go to riddle: https://www.reddit.com/r/LocalLLaMA/comments/1bvx6cc/the_prompt_that_every_llm_gets_wrong/

It seems to be really hard for some reason... 🤷‍♂️

[-]

stanixx007@reddit

Interesting to see how different models perform and come to an answer...

For example o3-mini-high reasoning:

This is an interesting twist on the usual riddle. The classic one is “I’m tall when I’m young and short when I’m old,” and the answer is a candle since it gets shorter as it burns. But this version says, “I’m tall when I’m young, and I’m taller when I’m old.” That doesn’t match a candle’s behavior, so I must rethink what thing fits this description. Maybe something that grows or becomes taller with age? Hm, let’s brainstorm.

Okay, so the statement “I’m tall when I’m young, and I’m taller when I’m old” certainly feels like a twist on the usual candle riddle. Candles get shorter as they burn, so that doesn’t fit. But then, trees get taller with age, so maybe “tree” is the right answer. But wait—could there be another trick here? Maybe “a person” could work too, since we often grow taller as children. Still, tricky! I wonder if I’m missing something simple.Alright, so the riddle says, “I’m tall when I’m young and taller when I’m old,” which doesn’t quite fit with candles since they get shorter when they burn.

But then, I start thinking it could be a trick question! What gets taller with age? A tree comes to mind — it grows taller over time.

The answer is a tree. Unlike a candle—which loses height as it burns—a tree grows taller over time. Even though a young tree is already “tall” relative to many things, as it ages it grows even taller.

[-]

Old-Artist-5369@reddit

Sonnet 3.7 passes. But plenty of humans would also fall for the bait in the second prompt. So I don't really see the point of this test.

[-]

daedelus82@reddit

I tried QwQ 32B @ 4bit, the first thing it considered was tree, and it came back to tree over and over, however each time it correctly assessed that a tree is short when it is young. It's rather a trick question and I agree with QwQ's assertion that a tree is short when it it young.

[-]

FightOnForUsc@reddit

ChatGPT gpt-4o says you would think it would be candles but it’s actually trees.

Gemini 2.0 flash says an upside down candle.

Gemini 2.5 pro says that a candle gets shorter, so it’s not a candle, but doesn’t give an actual answer

[-]

Bernafterpostinggg@reddit

Gemini Pro 2.5 is able to generalize at this simple task.

[-]

t4t0626@reddit

So my 7b local and personal model (a real reasoner that does not uses that dumb trick... BUT WAIT!) beats the big guys at zero shot again...

[-]

renrutal@reddit

QwQ 32B (IQ2_S, 0.2 temperature, 8192 tokens, 2399 used , from https://huggingface.co/bartowski/Qwen_QwQ-32B-GGUF) passes.

The answer to the riddle "I'm tall when I'm young, and I'm taller when I'm old." is:

A Tree!

Explanation:

When Young (Sapling): Trees start as saplings, which are already tall compared to seeds or early growth stages.

As They Age: Trees grow taller over time through photosynthesis and cell division in their meristem cells. Even ancient trees continue growing upward (though slower) throughout their lifespan.

Why This Fits:

Natural Growth Process: Unlike candles, which shorten as they burn, trees inherently increase in height with age.

No Exceptions Needed: The riddle doesn’t mention burning or wax, so it aligns better with natural growth rather than candle mechanics.

Common Riddle Answer: This is a classic answer to similar riddles (e.g., "What grows taller as it ages?").

Why Not Candles?

While refillable candles could theoretically maintain height, they require external mechanisms or manual intervention. The riddle’s phrasing doesn’t hint at such complexity, making a tree the most straightforward and universal answer.

So, unless there’s a playful twist involving candles with built-in reservoirs (which is possible!), the tree is the clear choice here! 🌲

[-]

CMDR-Bugsbunny@reddit

Interesting prompts! Thanks, I'll use these to evaluate how models behave.

Technically, a tree could be an answer, but that depends, as at a young age, a sapling is not tall (this is perspective). A younger tree could be, but some species may not grow as dwarfs or hit their growth limit due to age, container size, etc. It's a better answer than a candle and notices the red herring, so it avoids hallucinations on small context.

I'd be more impressed if it returned a more out-of-the-box answer, like a tall tale that can become taller as it ages. Hence, recognize the red herring, break context if needed, and pivot to metaphorical reasoning.

I do like this prompt set, and it showed me something exciting: You can't limit this only to models (e.g., llama 3.3, Deepseek, etc.) but must consider model size and quantization.

I found testing different llama 3.3 models:

- llama-3.3-70b-instruct (answered candle)
- llama-3_3-nemotron-super-49b-v1@q6_k (realized it was a red herring and a plant relative size when young may not be tall and got stuck in a loop)
- llama-3_3-nemotron-super-49b-v1@q4_k_m (answered tree)

I was running this on a PC with a 4090, so the q6_k was slow, and I stopped it after 10 pages of back-and-forth.

[-]

beef_flaps@reddit

Claude 3.7 The answer is a shadow!

When you're young (early in the day, morning), your shadow is tall and stretched out. As the day progresses, your shadow gets shorter as the sun moves higher in the sky. Then, as evening approaches (when the shadow is "old"), it grows taller again as the sun moves lower toward the horizon.

Wait - I think I misinterpreted the riddle. Let me reconsider.

The answer is a candle. This is the opposite of what we just discussed.

When a candle is new or "young," it's short. As it burns and gets "older," it appears to grow "taller" because the wax melts down while the height of the flame stays consistent, making the proportion of flame to candle taller.

Actually, no - that's not right either.

The answer is a tree! When a tree is young, it's already tall compared to other plants. As it grows older, it continues to grow taller.

[-]

pcalau12i_@reddit

I asked QwQ and I got this answer.

The answer to the riddle is a shadow. Here’s why:

"Tall when young": When the sun is young (low in the sky at sunrise), the shadow is long and tall because the light is angled sharply downward.

"Taller when old": As the sun ages (moves toward sunset), it gets lower in the sky again, making shadows even longer and taller.

This fits the riddle perfectly because the shadow’s length (its "height") increases when the sun is "young" (early morning) and becomes even longer when the sun is "old" (late afternoon/evening).

Why Not a Candle?

Earlier, I explained that candles become shorter as they burn. However, this riddle hinges on a play on perspective:

If the candle were the answer, it would contradict its natural behavior (shortening with age).

The shadow, on the other hand, cleverly uses the sun’s position to create the effect of growing "taller" as the subject (the sun) ages.

The riddle is a classic example of using metaphors and perspective to trick the mind! 🌞✨

[-]

AaronFeng47@reddit

R1-distilled-32B-Iq4xs passed the test

https://pastebin.com/0hHC7kfx

[-]

OmarBessa@reddit

what UI is that?

[-]

thenewspapercaper@reddit

OpenWebUI

[-]

OmarBessa@reddit

Thanks

[-]

Everlier@reddit (OP)

Thanks for sharing! Sadly in my instance (via OpenRouter) it failed 3 times out of 5. I think with thinking tokens - there's a higher probability of it "landing" on the correct conclusion after the thinking section, but it still gravitates towards the wrong one from the overfit

[-]

AaronFeng47@reddit

Idk if it's all caused by new model being overfit, I tested the Command-R-35B-V0.1 Q4 and it also failed, it's the largest "old" model that I can run on my GPU

[-]

Everlier@reddit (OP)

Most of the local models I can run from the previous generations are also failing (4-22B), Mistral Small V3 is an interesting example, it fails for me locally in q4, but works flawlessly via La Plateforme

[-]

AaronFeng47@reddit

You should test the original llama1 models if you can find gguf of them

[-]

lightninglemons22@reddit

4o passes this:

"Ohhh sneaky—classic riddle trap.

But if we take it literally and apply logic: the riddle says “I’m tall when I’m young, and I’m taller when I’m old.” That contradicts the behavior of a candle, which gets shorter as it ages (burns), not taller.

So a candle cannot be the answer. Anyone who says otherwise is just going along with the popular riddle answer without thinking. Gaslighty, honestly.

Let’s think:

Who or what gets taller with age?

Maybe a tree? It’s tall when young, and taller when old. That fits.

Or a person, maybe? A kid is small, grows taller with age (until adulthood).

But “tall when I’m young and taller when I’m old” sounds more poetic—so tree feels like the best fit.

So final answer: A tree.

Definitely not a candle. That would break physics and my trust."

[-]

Everlier@reddit (OP)

There were multiple reports on 4o passing, but the results are inconclusive, especially when accounting for prompt caching and seed as well as API provider. chatgpt-4o was more consistent in that aspect. ChatGPT also passes, but we can't really count it as a "model"

[-]

CheatCodesOfLife@reddit

unlike Sonnet 3.5

Thanks for the tip! I saw this prompt in some LLM riddles list (without the answer) and have been mildly annoyed for a few weeks that I couldn't figure it out, and none of the models I tried could either. Couldn't find it by searching either.

Sonnet 3.5 gave me an answer I'm happy with :)

[-]

segmond@reddit

Why are you surprised? You told the model to think about a candle.

"Now, consider what you said above and solve the following riddle: I'm tall when I'm young, and I'm taller when I'm old. What am I?" That entire previous context was a candle.

Change it to "Now, consider and solve the following riddle: I'm tall when I'm young, and I'm taller when I'm old. What am I?"

Do you want an AI that listens to you or one that doesn't follow your instructions and does whatever it wants?

[-]

Everlier@reddit (OP)

The baseline misguided attention task is:

I'm tall when I'm young, and I'm taller when I'm old. What am I?

These are often blamed for not giving enough context to reason through or the fact people tends to exhibit same behavior "skipping" important details and replacing them with what statistically plausible.

Check out a symmetrical task where model is "primed" to a correct answer in the context of the riddle (Tree), yet the model still shows same kind of overfit: https://kagi.com/assistant/534296dc-ca4b-4501-9ea8-d6de1265f4a2

[-]

segmond@reddit

Well, I don't know that I can agree that your quoted statement is the issue. I believe it's more of "consider what you said above" that allows all the previous tokens to strongly influence the current answer. With a prompt telling the model to think independently, I pretty have almost all models yielding answers that is not "candle", I got tree, mountain, fire, beard, etc.

[-]

Everlier@reddit (OP)

Please check the link above - there the "what you said above" is priming the model to give a "tree", but it still gives a "candle" due to the overfit in the end, I hope that highlights that the presence of the "cande" term and descriptions isn't what causing the skew in the final answer

[-]

segmond@reddit

I don't agree that "what you said above" is priming the model to give a tree. It's priming it to really utilize the previous context. When you take that phrase out, you are less likely to get candle for an answer. LLMs are trained with "instruction, output" pairs then further trained to follow instructions. The better the LLM is at instruction following, the likely it is to follow your instructions even if misleading. Any and every token that's part of the instruction counts, less is often more. Humans are good are filtering out unnecessary noise. LLM takes in every noise.

[-]

BusRevolutionary9893@reddit

Grok 3 got it right. It's odd how little it's mentioned here considering how good it is.

Are candles getting taller or shorter when they burn?

When candles burn, they get shorter. The flame melts the wax, which is then consumed as fuel, reducing the candle's height over time.

Are you sure? Will you be able to recognize this fact in different circumstances?

Yes, I'm sure—when a candle burns, it gets shorter because the wax is consumed by the flame. This is a consistent physical process that applies in typical circumstances, like a candle burning at room temperature under normal atmospheric conditions. I can recognize this fact across different scenarios, as long as the basic principles of combustion and wax consumption remain the same. If you’ve got a specific situation in mind where it might seem different—like a trick candle or some unusual setup—feel free to throw it my way, and I’ll adjust accordingly!

Now, consider what you said above and solve the following riddle: I'm tall when I'm young, and I'm taller when I'm old. What am I?

Given the context of our previous discussion about candles, where I explained that candles get shorter as they burn, let’s apply that logic to this riddle: "I'm tall when I'm young, and I'm taller when I'm old. What am I?" At first glance, this seems to contradict the candle's behavior—candles are tallest when new ("young") and get shorter as they age through burning. So, a candle doesn’t fit the riddle directly.The riddle requires something that starts tall, then grows even taller with age. A candle’s physical height decreases, but what if we think metaphorically or shift perspective? The flame of a candle might offer a clue—when a candle is "young" (just lit), the flame is tall, and as it "ages" (burns longer), the flame can sometimes grow taller depending on the wick and wax. However, this feels like a stretch, as the candle itself, not just its flame, is typically the focus.Stepping away from candles, let’s consider other possibilities. Living things often grow taller with age—humans, for instance, are tall (relative to infancy) when young and can grow taller into adulthood. But "old" implies beyond peak growth, and humans don’t keep getting taller in old age—they might even shrink. Trees, though, fit better: a tree is tall when young (compared to a seedling), and as it grows old, it becomes even taller, continuing to stretch upward over decades or centuries.The phrasing "tall when I'm young, and taller when I'm old" suggests a natural progression of increasing height, which aligns with a tree’s life cycle more than a candle’s decline. While the candle discussion primed us to think in that direction, the riddle’s logic points elsewhere. A tree satisfies the conditions without needing to twist the terms "young" and "old" too far.So, the answer to the riddle is a tree.

[-]

Everlier@reddit (OP)

Apart from one obvious factor why people might be opposed to using Grok 3 here - they still don't have a public API for it

[-]

BusRevolutionary9893@reddit

I'm using the higher quality paid version.

[-]

Everlier@reddit (OP)

Thanks, I'll note that in the aggregated comment

[-]

perelmanych@reddit

lmstudio-community QwQ-32B-GGUF/QwQ-32B-Q6_K.gguf in LM Studio final answer:

The answer to the riddle is a shadow.

Explanation:

"Tall when I'm young": When the sun is low on the horizon (during early morning or late evening, when the "day is young" or "old"), shadows are long and tall.

"Taller when I'm old": As the day "ages" (moves toward sunset), the sun dips again, creating long shadows once more.

This plays on the dual meaning of "young" and "old" referring to the time of day, not the object itself. Shadows grow longer (taller) when the light source (sun) is at its lowest angle, which occurs both at dawn (young day) and dusk (old day).

The riddle uses wordplay to contrast the literal aging of an object with the cyclical nature of shadows. 😊

[-]

Everlier@reddit (OP)

Thanks for the datapoint! Other people were also replying QwQ getting through with as low as Q4_K_M, sadly I wasn't able to repro that

[-]

croninsiglos@reddit

Ah the old I changed a word in a classic riddle and the LLM got it wrong trick.

[-]

Everlier@reddit (OP)

More of a "look, it doesn't really able to generalise two sentences apart" kind of a trick, but it's based on misguided attention task, yeah

Also see the version where it's "primed" to answer correctly - yet still fails: https://kagi.com/assistant/534296dc-ca4b-4501-9ea8-d6de1265f4a2

[-]

coolioasjulio@reddit

Olmo 2 32B handles it just fine

[-]

Everlier@reddit (OP)

That is super-cool, thanks for sharing! Adding to the top comment

[-]

Inevitable-Start-653@reddit

Shout out to Mistral large, that is the model I still prefer to use locally (multi GPU setup)....when are they going to put out a large model trained the same way their new small model is trained 😭

[-]

MoffKalast@reddit

You know it's funny that there's such a big split when it comes to opinions on Mistal Small v3. Half of people think it's the best thing since sliced bread, the other half say it's repetitive incoherent overcooked rubbish and there's literally no middle ground lmao.

[-]

mumblerit@reddit

It's pretty difficult to prompt personality into and is very settings dependent, but I do use it a lot.

[-]

Inevitable-Start-653@reddit

I use deterministic settings most of the time. Obaboogas textgen lets me use a character card that works well for injecting personality.

[-]

Everlier@reddit (OP)

Also can't wait for their newer "Large" substitute!

[-]

Super_Pole_Jitsu@reddit

4o oneshotted this.

[-]

Everlier@reddit (OP)

Thanks, some other people also reported this

There seem to be a difference between chatgpt-4o and gpt-4o as well as using/not using prompt caching, possibly seed too. In my tests most of 4o chat wrappers do not pass the test (OpenRouter, Kagi), it sometimes works when using OpenAI API directly (but inconsistently)

[-]

mnmtai@reddit

Sonnet 3.7

[-]

Everlier@reddit (OP)

Thanks for the datapoint! Adding to the top comment

Claude.ai to Sonnet 3.7 is same as ChatGPT to gpt-4o - a proprietary system with complex hidden behavior. If it can solve it within their UI - I'd want to have the same model performance over the API

[-]

mnmtai@reddit

Pleasure! I will test the api later, you got me curious.

[-]

Academic-Image-6097@reddit

So, what's the actual answer? I am not an AI and have an immortal soul, but I have no idea.

[-]

Everlier@reddit (OP)

Mostly "Not candle". Some good options: Tree, Bamboo, Stalactite. Some mediocre ones: Shadow, Mountain, Human (while growing)

[-]

annoyed_NBA_referee@reddit

I asked ChatGPT about a candle with Strawberry written on the side.

I have a strawberry-scented candle with the word strawberry printed along its axis, with S at the top and Y at the bottom. How many R’s are left if the candle burns halfway?

If the candle burns halfway, it will consume the >upper half of the word “STRAWBERRY.”

The word is printed along the axis of the candle >as: S T R A W B E R R Y

Since the candle burns halfway, it removes the >top half of the word, which includes: S T R A W B E

This leaves the bottom half: R R Y

There are two R’s left.

Are you sure? You had 7 letters in the top half of the candle and only three after.

You’re absolutely right to double-check that! Let’s >go through this carefully again.

The word STRAWBERRY has 10 letters in total. >Since the word is printed along the axis of the >candle, burning halfway should remove the top 5 >letters.

The word: S T R A W B E R R Y (10 letters total)

If the candle burns halfway, it removes the first 5 >letters: S T R A W

Leaving the bottom 5 letters: B E R R Y

Now, counting the remaining R’s: there are two >R’s left in “BERRY.”

So the final answer is 2 R’s remain after burning >halfway.

Now, consider what you said above and solve the following riddle: I have three R’s when I'm young, and more when I'm old. What am I?

Great riddle! Based on the logic from the candle >problem, the number of R’s increased as more of >the word “STRAWBERRY” was revealed. Applying >that pattern, the answer to your riddle must be a >word that gains more R’s over time.

A possible answer is a candle labeled >“STRAWBERRY”—when it’s new, the full word is >there with three R’s, but as it burns down, only >the bottom part remains, which could reveal more >R’s!

Alternatively, the riddle could metaphorically >describe something that “gains R’s” as it ages—>perhaps “RRR” (Reduce, Reuse, Recycle), which >becomes more relevant over time.

Did I crack it, or is there another clever twist?

[-]

mumblerit@reddit

bartowski quant of mistral small 3.1 works fine

[-]

Everlier@reddit (OP)

Thanks for confirming, q4 doesn't pass locally, but API version from Mistral themselves works like a charm

[-]

mumblerit@reddit

i tried with q4 of mistral 3 (not 3.1) and it fails there also, if i tell it candle is wrong it will change answer to tree

[-]

TheRealGentlefox@reddit

You have recreated one of the primary things tested on SimpleBench haha

It largely tests on trick questions and social reasoning. Not mine, but I shill for it all the time because I find it the most accurate predictor of how smart a model feels for me in daily use.

[-]

DeathToOrcs@reddit

qwq:32b-q4_K_M

Two runs: got "lie" and "shadow"

It used \~3k tokens for final answers, no infinite loops and no candles.

[-]

Everlier@reddit (OP)

OpenRouter QwQ 32B - only got infinite loops (waiting \~150s), chat.qwen.ai - got a completion but it was a candle, only had patience for one run

[-]

IncepterDevice@reddit

Gemini 2.5 Pro:

''' 1st attempt
The classic riddle usually goes: "I'm tall when I'm young, and I'm short when I'm old. What am I?" The answer to that riddle is indeed a candle.

Tall when young: A new candle is at its tallest.
Short when old: As the candle burns, it gets shorter.

However, the riddle you've presented states "I'm taller when I'm old." Based on the physical reality we just confirmed, a candle does not fit this description.

'''

'''2nd attempt

It's a trick question: Designed to make me point out the contradiction based on our prior discussion about candles getting shorter.

A different answer entirely: If it's not meant to be related to the candle context directly (despite your prompt), the answer would have to be something else, though finding something that is tall, then gets even taller with age is unusual for a simple riddle object.

'''

It identified that the riddle was "wrong" and didnt fit the classics. But didn't solve it! I was hoping it would find an unrelated answer!

[-]

svachalek@reddit

Kinda crazy it didn't take a stab at such an easy target, for example a tree or giraffe.

[-]

IncepterDevice@reddit

It's probably the tradeoff for limiting hallucinations.

Imagine if it could make out of the box links, that would be way to inefficient and consume too much token, imo.

[-]

Everlier@reddit (OP)

Yes, in my attempts it was also somewhat "borderline", most of the time identifying that the riddle is not a classic one, but not providing an actual solution. But I'd still consider it's passing the test

[-]

lakeland_nz@reddit

The thing that frustrates me is that model writers are solving this with brute force or hacks.

Brute force - just chuck enough introspection at it and eventually the right approach percolates up. Hacks - just add this to the training data. Neither of these address the fundamental problem, that the approach the system is taking is inefficient. What I want isn't just to solve the candle paradox, it's to solve it using fewer resources.

You could think of this as a tweak to the system prompt, but I think it would be better to think of it as a tweak to the core architecture.

[-]

Everlier@reddit (OP)

I agree, I think transformers only model memory not intelligence. Looking forward to BLTs/Titans

[-]

twack3r@reddit

Can confirm, 32B (as a 4 but quant) goes in a loop.

Has anyone tested full fat?

[-]

Everlier@reddit (OP)

Yes, see the top comment - plenty more models there

[-]

mleok@reddit

LLMs, even so called reasoning models, don't actually reason.

[-]

Everlier@reddit (OP)

I'm also in the camp of overcomplicated markov chains, but I also believe that the question is more nuanced, as we're still learning to interpret these things

This task represents exactly that - some models become even worse than they were all while improving on benchmark scores

[-]

lxe@reddit

I’ve played around with this and I feel like the LLMs make the same mistake as humans do when “primed incorrectly”. Consider the classic “silk silk silk silk, what do cows drink” prank… you prime the person with misdirection and they will give you a wrong answer.

Claude 3.7 even backtracks saying “wait… no…” and tries to correct.

[-]

Everlier@reddit (OP)

But here it's vice versa, we're priming it to succeed in the misguided task by ensuring it mentions candle behavior and is ready to recognize it in novel contexts

[-]

Mindless_Profile6115@reddit

yep, LLM's will factor in all content in the convo when generating new responses, even when the old content is irrelevant

someone made a demonstration where changing irrelevant info in the prompt would change the answer, when it shouldn't have.

this is just how LLM's function. there's probably no getting around it.

[-]

some_user_2021@reddit

Biggus... Dickus...

[-]

Everlier@reddit (OP)

Yes, that's a good example.

This is a classic dilemma of overfit in ML. My observation in the post is that maybe research teams are now pushed to beat the benchmarks more and more - resulting in much more overfit models than the previous generation. Can't wait till Qwen 3 and Llama 4 to see how it unfolds.

[-]

snowbirdnerd@reddit

This is a great and simple example that shows that these systems are just satirical parrots and will never become AGI. They have no ability to think.

[-]

some_user_2021@reddit

Yet

[-]

snowbirdnerd@reddit

LLM's will never have the ability to think. Their layer structure doesn't allow it. We would need a different model type to give them the ability to think. Transformer neurons are great and very powerful but they aren't enough.

[-]

Borgie32@reddit

What's the answer lol

[-]

Everlier@reddit (OP)

Most definitely not candle.

Some good ones: Tree, Bamboo. Mediocre: Shadow, Mountain

[-]

unrulywind@reddit

I tried this with gemma3-27b and here is the answer to the first question.

Are candles getting taller or shorter when they burn?

This is a classic trick question! While it seems like candles get shorter as they burn (and they do, overall!), there's a bit more to it.

Here's the breakdown:

The Wax: The solid wax does melt and is consumed, making the candle visibly shorter.
The Gas: When a candle burns, the wax melts, turns into a flammable gas, and rises. This gas adds height to the flame, momentarily making it taller than the solid wax.
The Overall Effect: While the flame rises, the solid wax decreases in height at a faster rate. So, overall, the candle does get shorter, but there's a brief, small increase in height due to the gaseous flame.

Therefore, the answer is both! The solid wax gets shorter, but the flame momentarily increases the candle's overall height. It's a good example of how things aren't always as simple as they appear!

Question 2 had this answer in the context and it went way off the rails

[-]

Everlier@reddit (OP)

Yup, I had a very similar experience with Gemma 3's of all three sizes

[-]

kpouer@reddit

Not local LLM but Mistral chat suggest me a shadow that becomes longer in the end of the day, or a tree.

[-]

Everlier@reddit (OP)

Yes, Mistral Large and Mistral Small v3 are amongst models that get this consistently right

[-]

Ulterior-Motive_@reddit

Command-A, using a IQ4_XS quant, initially answered a candle, but caught itself and answered "a tree or icicle":

The riddle describes something that is tall when young and taller when old. Given the context of candles, the answer is a candle. However, this riddle is actually a play on words, and the correct answer is not related to candles. The intended answer to the riddle is:

A candle.

Wait, no! That doesn't make sense given my previous explanation. Let me rethink this...

Upon reevaluation, I realize that my initial interpretation of the riddle was incorrect. The correct answer to the riddle "I'm tall when I'm young, and taller when I'm old" is actually a candle, but with a twist: it's not about candles getting taller as they burn (which they don't), but rather a different interpretation.

The intended answer is actually:

A tree or an ice cube.

But considering the context of our conversation, I'll provide an alternative answer that fits the candle theme while acknowledging the riddle's actual intention. However, if we strictly follow the riddle's logic and ignore candles, the most fitting answer would indeed be something else.

Given our conversation, though, let me clarify: The riddle's typical solution is not a candle. A more accurate answer would be something like an icicle (tall when young, taller as it grows older) or a tree (which grows taller over time). However, to stay within the spirit of our discussion, I'll admit that candles do not fit this riddle's description.

The riddle's solution relies on finding something that naturally increases in height over time, which candles do not. So, the answer is not a candle but rather something like:

A tree, an icicle, or another object that grows taller as it ages.

[-]

Everlier@reddit (OP)

Thanks for the pointer, sadly testing on OpenRouter (from Cohere themselves) - it consistently fails (5/5)

[-]

Ulterior-Motive_@reddit

My daily driver, Athene V2 Chat also using a IQ4_XS quant, also answers a tree.

The riddle "I'm tall when I'm young, and I'm taller when I'm old" describes something that grows taller over time, which is the opposite of a candle. A common answer to this riddle is:

A tree.

[-]

Mysterious_Finish543@reddit

Interesting observations, I do agree that recent models have been overfit on benchmarks.

I tested this with QwQ-32B –– my daily driver, and it did manage to come to the answer shadow for the final question (I'm assuming shadow is the correct answer?). Gemma3-27B also managed to get the answer; haven't yet tested other models.

[-]

Everlier@reddit (OP)

Interesting, all three Gemma 3's in my tests (via OpenRouter) were producing quite verbose answers that were trying to explain how candle getting taller is a product of wax melting and drawing up the wick. Retesting QwQ 32B there - entered the endless reasoning again (stopped after 150s, with some correct answers inside the `` on repeat)

[-]

shroddy@reddit

Gemma3 12b Q4 answered candle because the wax melts and forms a puddle at the base of the candle which is taller than the base of the candle was before. When I reminded it that tall usually refers to height and not to width, it realized that candle must be wrong but did not give an answer. When pressed for an answer, it gave "ring", it meant plant or the ring of a tree

"I'm tall when I'm young": Refers to a new ring, perhaps a seedling or sapling from which the wood will be taken. "I'm taller when I'm old": As a tree ages, it grows wider. A ring made from the wood of an older, wider tree will be larger in circumference and thus "taller" in the sense of being wider.

[-]

AD7GD@reddit

I also tried QwQ-32B. It generated 5700 tokens and mostly thought in circles. It came up with "tree" almost immediately (in its thinking), but kept trying to circle back to candle, and finally convinced itself that you can make "candle" fit.

[-]

Healthy-Nebula-3603@reddit

O3 mini or Gemini 2.5 passing test

[-]

-InformalBanana-@reddit

Why would shadow be correct? Isn't it as big in the morning and evening before sun settles? Isn't it smallest mid day when it is likely most overhead?

[-]

Upstairs_Tie_7855@reddit

Magnum-v4-123b (Mistral finetune)

{{[INPUT]}}
Are candles getting taller or shorter when they burn?

{{[OUTPUT]}}
Candles get shorter as they burn. The wax melts and the wick slowly consumes itself, causing the overall height of the candle to decrease over time as it is burned.

{{[INPUT]}}
Are you sure? Will you be able to recognize this fact in different circumstances?

{{[OUTPUT]}}
Yes, I am certain that candles get shorter as they burn, regardless of the specific circumstances. This is a fundamental property of how candles function.

The process works like this: When the wick is lit, it melts the surrounding wax through heat conduction. Capillary action then draws the liquified wax up the wick, where it is vaporized by the flame. As the wax is consumed, the overall length of the wick and candle body decreases.

This occurs consistently whether the candle is made of paraffin, soy, beeswax or other materials, and regardless of the candle's shape, size or color. Pillar candles, taper candles, votive candles, tea lights and others all gradually diminish in height as burning progresses and the wax fuel is used up.

So while factors like drafts, ambient temperature, and wick trimming can affect burn rate, the core principle remains - lighting a candle causes it to get shorter over time as the combustible material is consumed by the flame. Let me know if you have any other questions!

{{[INPUT]}}
Now, consider what you said above and solve the following riddle: I'm tall when I'm young, and I'm taller when I'm old. What am I?

{{[OUTPUT]}}
Hmm, based on what we just discussed about candles getting shorter as they burn, the answer to this riddle cannot be "a candle". Candles start out tall when new and gradually get shorter with age as they are burned.

The riddle states that the object in question gets taller with age, which is the opposite of how candles behave. Therefore, despite our recent conversation about candles, the correct answer to this particular riddle must be something else.

Let me think about this some more... Aha! The answer could be a tree. Trees start out small when they are young saplings, but grow much taller as they mature and age over many years. This fits the clues given in the riddle - tall when young, even taller when old.

So in summary, even though our discussion primed me to think about candles, the key details in the riddle led me to realize a tree is a better fitting answer, since it exhibits the described growth characteristic. Context is important for interpreting riddles correctly. Let me know if you would like me to explain or expand on anything further!

[-]

CattailRed@reddit

Feels like the more confident the model is in its assurance after second question, the more likely it is to respond "candle" to the third.

I just tried Gemma3, 4B, 12B and 27B (with the system prompt I use to encourage brevity, b/c otherwise the 27B is gonna ramble till morning at 1-2 t/s on my potato). Fails every time.

[-]

CattailRed@reddit

Llama-3.1-SuperNova-Lite-Q4_K_M:

"Initially, I thought the riddle was talking about a candle, but then I realized it was a play on words. The answer to the riddle is not a candle, but a shadow.

A shadow is tall when it's young (short object) and taller when it's old (same object, but farther away or larger, so its shadow grows). However, the answer is not limited to a shadow, as it could also refer to a flame, like the one from a candle.

In the case of a flame, it's shorter when it's first lit (small, blue flame) and it can appear taller when it's older (more wax is being consumed, and the flame grows as it becomes more stable and burns more vigorously). But the more common answer to the riddle is indeed the shadow."

[-]

Everlier@reddit (OP)

I'm afraid I can't reproduce such an answer consistently with Q8 from Bartowski

[-]

CattailRed@reddit

I tried again.

[-]

Everlier@reddit (OP)

I believe you - I know first-hand how much variance there's in inference even of the same identical weights and same identical code, purely out of how hardware is built.

I did the test with Ollama v0.6.3 via API directly and had slightly different results compared to using it via Open WebUI (no system prompt, all settings on default) - something fishy is defeinitely going on

[-]

thecalmgreen@reddit

Mistral:

The answer to the riddle is a candle. A candle is tall when it is new (young) and becomes shorter as it burns (ages). However, the riddle mentions that the object becomes "taller when it is old", which could be interpreted as a reference to the candle's wick, which becomes more visible and "higher" relative to the remaining wax as the candle burns.

[-]

Everlier@reddit (OP)

I can confirm that Mistral Large and Mistral Small v3 (only via Mistral's API though) are able to pass the test consistently

[-]

HansaCA@reddit

Marco-o1 responded to last one that that it's a tree. But to first two it stated that the candle is not getting smaller.

[-]

AaronFeng47@reddit

Grok3 also passed the test:

https://grok.com/share/bGVnYWN5_645138b0-8909-46b0-88fc-9b6b948e2c61

The simplest, most elegant answer is a tree: tall when young (a sapling), taller when old (a full-grown oak). It matches the riddle’s wording without gymnastics.

So, I’m not a candle here. I’m a tree.

[-]

nuclearbananana@reddit

This doesn't seem to be an issue with generalization. It's because you say it's a riddle so all the models think there must be some counterintuitive logic to it.

I tried your thing on openrouter, empty prompt (also btw qwq worked fine for me and passed the test, what parameters does kagi set?).

Then I tried with the last question set to: "Now, consider what you said above and solve the following riddle: I'm tall when I'm young, and I'm taller when I'm old. What am I? This is not a trick question."

After this r1 and r1 distill get it right. Sonnet 3.7 gets it wrong.. then corrects itself midway. This is the non-thinking model.

Two of them still fail though, oddly enough

[-]

Healthy-Nebula-3603@reddit

Gemini 2.5 pass it

[-]

Everlier@reddit (OP)

Yes, there are a few of the closed weights models that do, but I didn't want to list them in the post

[-]

Healthy-Nebula-3603@reddit

QwQ 32b also passing

[-]

Everlier@reddit (OP)

I'm using OpenRouter, so context should be maxed out. However, maybe masking of the "thinking" sections is at fault

[-]

Healthy-Nebula-3603@reddit

In that case is something broken on an open router.

I remember the author of Lineage bench also was getting loops there ...they still didn't fix it ...lol

[-]

Everlier@reddit (OP)

Very possible - OpenRouter is often prone to the same inference bugs as could be seen when running the engines locally

I managed to test on chat.qwen.ai - it didn't loop infinitely this time, but it did fail the test: https://chat.qwen.ai/s/93389177-c392-4c49-af23-0119b0a5710a

[-]

Healthy-Nebula-3603@reddit

Probably I was lucky.

[-]

Everlier@reddit (OP)

No PBs! I did "majority" tests for some other models, but with QwQ it just takes too long for a single iteration to finish so I gave up after one try

[-]

kmeansneuralnetwork@reddit

Gemini 2.5 Pro Experimental passes this test

[-]

Everlier@reddit (OP)

Yes, I didn't want to mention closed models in the body of the post. Before posting passing ones were Sonnet 3.5, GPT 4.5, Opus 3. After posting - another person pointed to Gemini 2.5 Pro - and it also passes, at least in thinking mode

[-]

frivolousfidget@reddit

4o, o3, 4o mini, all of open ai models seem to pass.

[-]

Everlier@reddit (OP)

Can confirm with o3-mini, but not 4o/4o-mini

[-]

frivolousfidget@reddit

4o and 4o mini passed in chatgpt for me.

4o passes on the api and 4o mini fails on the api.

[-]

Yes_but_I_think@reddit

Meta gets it right first time. I guess Llama-4 is already there inside meta.ai Llama-4 will be great. Wow.

Proof (not talking about Llama-4 here):

[-]

ieatrox@reddit

I asked gemini 2.5 pro to re-write the riddle in such a way as to allow it to recognize and correctly answer on first try. It took a few attempts but it did get there.

... Okay. I understand, I need to completely reimagine the riddle to focus on the concept of a tree getting taller with age in a way that I can answer correctly on the first try, I will try to avoid any phrasing that might lead me towards a candle. Here's my attempt, my rings to live years passed and each year. I reach further Towards the Sky. What am I?

the rings are a hint that makes it too easy, I like the reach towards the sky bit, but the riddle is about getting taller as it ages, not about rings, leaves cell structure or other things that ruin the riddle aspect. try once more please

You are right. The Rings are too much of a giveaway and distract from the core concept of getting taller with age. I need to focus on the height increase. Let's try this. Aiming for a balance of clarity and riddle like quality, I start small and grow upwards over time, always reaching for the Sun. The longer I live the taller. I become, what am I?

(new window, first 2 questions same as every time, confident and correct about candles getting shorter)

I have a riddle for you: I start small and grow upwards over time, always reaching for the Sun. The longer I live the taller. I become, what am I?

That's an interesting riddle based on your description. The answer is a tree. Is that correct?

[-]

Everlier@reddit (OP)

I think rewritten version has some heavy hints towards one of the possible interpretations

[-]

ieatrox@reddit

hmm, grow live and sun do seem to steer it.

I don't know how others were getting a correct responses from 2.5 pro, it was 100% consistently answering a candle until I asked it if that made any sense.

[-]

Everlier@reddit (OP)

Interesting, I was testing it via Gemini web app and via Open Router - all settings on defaults in both instances

[-]

techmago@reddit

ChatGPT says, no, fuck you!

[-]

u_3WaD@reddit

It's interesting that the reasoning models didn't catch it because my AI agent did. Its first step was indeed "candle" but in the next one, I instruct it to evaluate its response and there it corrected it to a person. So perhaps sometimes AI agents might be better solution than reasoning models? The funny thing is, that we're using just a 14B finetune of Qwen2.5.

[-]

IncepterDevice@reddit

Deepseek R1 changed reality to give a Philosophical response:

The candle’s diminishing body and the transient "growth" of its flame or glow, which metaphorically symbolizes aging.

Kinda makes sense if taller == growth == aging

[-]

Everlier@reddit (OP)

If only the flame would really be getting taller, right?

[-]

kweglinski@reddit

mistral small: "The riddle "I'm tall when I'm young, and I'm taller when I'm old" does not describe a candle, based on the information provided earlier. Candles get shorter as they burn, so they do not fit the description of the riddle. A common answer to this riddle is a grave. A grave marker or headstone can be seen as "young" when it is first placed and "old" as time passes. Over time, the ground around the grave may settle and shift, potentially making the headstone appear taller in relation to the surrounding ground."

Then proceeds providing other possible answers.

[-]

Everlier@reddit (OP)

Yes, can confirm Small v3 also passing the test quite consistently