qwen3.6:35b always fails on this unless the image resolution is very high

Posted by qfghclvx@reddit | LocalLLaMA | 22 comments

This is an exercise that every child (I guess) can solve correctly. qwen says solution B or D is right.

What would you say? Try it.

This was the thinking process around point Q. I can’t follow how it is so wrong:

But much much later (many thousand thinking tokens later):

And again later:

I don’t understand how it can misread the slope so badly and then correct itself again.

Gemma4:26b gets this right most of the time, but sometimes says solution A is correct. Gemini 3.1 flash lite is always wrong and says solution A. But Gemini 3.1 pro preview is always correct.

And very interestingly: Opus 4.7 and Opus 4.6 always say solution A (mostly) or D is correct. Oh my god.

Although this looks like an easy exercise, it seems to be a very difficult visual input. A good benchmark.

All other “difficult” visual physics exercises were solved correctly by qwen3.6:35b, where even Opus 4.7 failed and gemma failed at 26b but got it right at 31b. Do you want to see them?

The worst thing about gemma:26b was that it produced so many hallucinated words in longer solutions and therefore also made markdown/LaTeX errors. gemma:31b didn’t have that problem, and qwen3.6 never has.