Why does AI fail to generate simple ASCII images?

Posted by ConcernedIndInvestor@reddit | LocalLLaMA | 16 comments

I saw a post earlier about MineBench. I was impressed to see that the latest models can produce such realistic outputs. Their ability to understand the prompt and make spatial modifications was impressive.
But when I asked the models to generate simple ASCII images, they failed spectacularly.

Prompt: Draw simple ascii image of a person touching his eyes.

gemma-4-31b-it

O /
  /|/
  / \

(To me, this looks like someone who hanged themselves.)

grok-4.1-thinking

    (=⌵=)
   ( x x )
    ( ─ )
     ||||
     ||||
    /    \    (=⌵=)
   ( x x )
    ( ─ )
     ||||
     ||||
    /    \

deepseek-v3.2-exp-thinking

( ͡° ͜ʖ ͡°)( ͡° ͜ʖ ͡°)

I also tried Qwen 3.6 Plus, gemini-3-flash-preview, and the free version of ChatGPT. All of them failed and produced absurd outputs. Do the latest local models produce any better results? I don't understand how AI can solve advanced math and yet fail at such a trivial task!