An LLM-Proof Approach to Reinventing Captcha Systems

Posted by adrianben10lam@reddit | LocalLLaMA | 46 comments

After Claude's computer-use came out, it got me thinking: what happens to CAPTCHAs when LLMs can be prompted to act like humans too?

While studying how Claude processes visual info, I noticed something interesting - AI sees things frame-by-frame, but humans naturally experience motion blur. So I built a CAPTCHA that uses this human quirk to stay AI-resistant. I thought this was a fun experiment so I wanted to share this with y'all!

https://www.linkedin.com/posts/adrianlhlam_i-am-thrilled-to-announce-that-benedict-neo-activity-7259610466417586176-l7dU?utm_source=share&utm_medium=member_desktop

[-]

Disastrous_Ad8959@reddit

The goal is to solve captchas not create new ones

[-]

No-Refrigerator-1672@reddit

I think this solution is a no-go. I can easily imagine people with epilepsy getting sick from this test, and I suspect people with slow reactions (the elderly, patients with brain damage, etc.) will fail it miserably. I highly doubt this concept is suitable for a general audience. Meanwhile, defeating it is as easy as capturing multiple frames and combining them into one.

[-]

Unlikely_Track_5154@reddit

I haven't even taken the test and I am not epileptic and I am tired of it.

[-]

RikuDesu@reddit

Yeah, a lot of the MapleStory Korea anti-cheat systems and MMO bot detectors would really give LLMs a lot of trouble.

[-]

GhostOfaBotInPants@reddit

Games for captchas are a great idea. It's so much better than guessing whether the swirly character is an o, an O, a 0, or a big fat 6 with a small head.

[-]

Dead_Internet_Theory@reddit

or worse, those that have an l / I.

[-]

DeltaSqueezer@reddit

Just ask how many 'r's in the following word...

[-]

Dead_Internet_Theory@reddit

o1-preview: "My chain of thought contains 496 letters 'r', and it cost $3.29 for the complete analysis"

gemini-pro: "I'm sorry, but the R word, and 'hard R' are both dehumanizing language, thus we had to terminate your Gmail account for repeated violations of our terms of service"

Claude Sonnet: Artifact: "Analysis of the letter R" (click to view more).

[-]

shroddy@reddit

Claude might not be that great; it often misses things that GPT or Gemini, or often even Molmo or InternVL2, see.

[-]

Dead_Internet_Theory@reddit

It's not just about seeing, but about high-framerate video capture and motion analysis. Still too difficult for LLMs.

[-]

KTibow@reddit

Cool, but...

This is both impossible for blind users and easily circumvented if it's implemented on the client side.

[-]

water_bottle_goggles@reddit

skill issue

[-]

Dead_Internet_Theory@reddit

If you can't see... like, just open your eyes.

[-]

CodeMurmurer@reddit

Well, don't we already have an ID system for blind users where they won't be prompted for a CAPTCHA?

[-]

Sudden-Lingonberry-8@reddit

disabled people are not allowed on the Internet didn't you get the memo /s

[-]

CodeMurmurer@reddit

Can't you read?

[-]

noneabove1182@reddit

If nothing else this would definitely buy some time! Vision models would have to be incredibly advanced in order to solve this, basically real time, great idea!

[-]

Qual_@reddit

[-]

Dead_Internet_Theory@reddit

I assume... you click on "E"? Is that it?

[-]

gofiend@reddit

I guess if you were to build some sort of custom image capture to average out frames you'd defeat this pretty quickly, but it raises the bar quite a bit on attackers. Good stuff!
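A minimal sketch of the frame-combining idea raised here and elsewhere in the thread, assuming the challenge is a bright dot moving over a dark background. The frames are synthetic and `render_frames` and `combine_frames` are hypothetical helpers, not anything from the OP's implementation. It uses a pixel-wise maximum rather than a literal average, because a mean over many frames would dim the trail.

```python
import numpy as np

def render_frames(path, size=32):
    """Render one frame per position: a black image with a single bright pixel."""
    frames = []
    for x, y in path:
        frame = np.zeros((size, size), dtype=np.uint8)
        frame[y, x] = 255
        frames.append(frame)
    return frames

def combine_frames(frames):
    """Pixel-wise maximum over time, so every position the dot visited stays lit."""
    return np.max(np.stack(frames), axis=0)

# Hypothetical path: a dot moving along a diagonal, one pixel per frame.
path = [(i, i) for i in range(5, 15)]
frames = render_frames(path)
trail = combine_frames(frames)

lit_per_frame = int((frames[0] > 0).sum())
lit_in_trail = int((trail > 0).sum())
print(lit_per_frame, lit_in_trail)
```

A single frame contains only the dot, while the combined image contains the whole path, which is roughly what motion blur gives a human viewer.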

[-]

adrianben10lam@reddit (OP)

Yeah that's definitely a consideration we had. I can imagine potentially sporadic frames might be interesting, but we're gonna need much more complex solutions for this growing problem.

[-]

remixer_dec@reddit

I hate it. It feels like a machine playing a cat-vs-laser game on humans. Imagine solving this multiple times a day.

[-]

kulchacop@reddit

Something tells me this is going to be defeated soon

https://www.reddit.com/r/LocalLLaMA/comments/1gg2gbk/pdf_autoscroll_video_retrieval/

[-]

Former-Ad-5757@reddit

Basically every captcha method was solved years ago.
Just start a p*rn site and show the captcha you want solved to the user.
Unlimited horny teenagers have been the defeaters of any captcha for years.

[-]

NoIntention4050@reddit

Super easy as a user too, not annoying like others

[-]

Derefringence@reddit

select ALL the traffic lights

[-]

Acceptable_Username9@reddit

select the WEIRD CUBE

[-]

Derefringence@reddit

(they're all WEIRD)

[-]

No-Marionberry-772@reddit

Ngl, it's getting to the point where I'm looking at some captchas and I start to question if I'm even human

[-]

adrianben10lam@reddit (OP)

Right?! I feel like this rising skill curve in captcha solving is hitting diminishing returns

[-]

Beautiful_Help_3853@reddit

You can use optical illusions, like moving shapes that don't actually move, or lines that appear to be different sizes.

[-]

Mundane_Ad8936@reddit

Captcha is more than just a visual puzzle, there is behavior tracking & other metadata that makes it hard for bots to circumvent it.

[-]

passinglunatic@reddit

Couldn’t you just give it a few frames?

[-]

matteogeniaccio@reddit

Very cool! Given what you wrote in the post, another approach could be quickly alternating between two colors. For example red and green. A human would see yellow.
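The arithmetic behind the red/green suggestion, as a rough sketch: if a viewer integrates the two colors over equal time, the result is approximately their mean. This ignores display gamma and how flicker rate affects perception, and `temporal_mean` is a hypothetical helper.

```python
def temporal_mean(colors):
    """Average a sequence of RGB triples, one per displayed frame."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))

red = (255, 0, 0)
green = (0, 255, 0)

# 60 frames alternating red and green, with equal time on each color.
blend = temporal_mean([red, green] * 30)
print(blend)  # (127.5, 127.5, 0.0)
```

Equal red and green with no blue is a dim yellow. The same averaging is available to a script that captures the frames.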

[-]

fuckAIbruhIhateCorps@reddit

We can simplify it by prompting the user to just type the letter they see being drawn by the fast-moving cursor. Let's say the circle draws B or N at a very fast pace, without a trail; the LLM still can't comprehend what happened. Instead of making the user wait through a tiresome spiral-finding hunt, we can use letters.

[-]

ThiccStorms@reddit

[-]

justicecurcian@reddit

While this should be LLM-proof, a simple OpenCV script should be able to hack it

[-]

Jumper775-2@reddit

Good for now, but know this is a losing game.

[-]

Calcidiol@reddit

Congratulations on the net negative value to humanity. It's especially ironic that it's posted in the localllama forum. One would have thought people aware of LLMs could actually understand the benefits of personal agents over the dystopia that is the modern web which is increasingly designed literally to waste people's lives / time on nonsense.

[-]

Economy_Hippo_8107@reddit

I could see this being bypassed with some JavaScript and math calculations to auto-solve the captcha... no AI even needed

[-]

AnticitizenPrime@reddit

This is very clever!

[-]

horse1066@reddit

This might be harder if it used multiple balls of different colours, but only one of them doing a spiral? Humans would still notice
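For a sense of how much this variant would raise the bar, here is a rough sketch of picking out the spiralling ball once each ball's positions are known. It assumes the tracks have already been extracted and that the spiral is centred on a known point (the origin here). All function names, track shapes, and thresholds are hypothetical.

```python
import numpy as np

def spiral_track(n=200, turns=3.0, growth=2.0):
    """Archimedean spiral about the origin: radius grows linearly with angle."""
    theta = np.linspace(0.0, 2 * np.pi * turns, n)
    r = growth * theta
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))

def circle_track(n=200, turns=3.0, radius=20.0):
    """Decoy: orbits the origin at a constant radius."""
    theta = np.linspace(0.0, 2 * np.pi * turns, n)
    return np.column_stack((radius * np.cos(theta), radius * np.sin(theta)))

def line_track(n=200):
    """Decoy: moves in a straight horizontal line above the origin."""
    x = np.linspace(-30.0, 30.0, n)
    return np.column_stack((x, np.full(n, 10.0)))

def spiral_score(track):
    """Correlation between swept angle and radius; 0.0 if the track never
    completes a full turn or keeps a constant radius."""
    r = np.hypot(track[:, 0], track[:, 1])
    theta = np.unwrap(np.arctan2(track[:, 1], track[:, 0]))
    if abs(theta[-1] - theta[0]) < 2 * np.pi or r.std() < 1e-6:
        return 0.0
    return float(abs(np.corrcoef(theta, r)[0, 1]))

tracks = {"spiral": spiral_track(), "circle": circle_track(), "line": line_track()}
scores = {name: spiral_score(track) for name, track in tracks.items()}
best = max(scores, key=scores.get)
print(best)
```

With clean tracks the spiral stands out immediately, so the extra difficulty would mostly come from separating and tracking the balls, not from recognising the spiral.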

[-]

adrianben10lam@reddit (OP)

This is a good idea!

[-]