An LLM-Proof Approach to Reinventing Captcha Systems
Posted by adrianben10lam@reddit | LocalLLaMA | 44 comments
After Claude's computer-use came out, it got me thinking: what happens to CAPTCHAs when LLMs can be prompted to act like humans too?
While studying how Claude processes visual info, I noticed something interesting: AI sees things frame by frame, but humans naturally experience motion blur. So I built a CAPTCHA that uses this human quirk to stay AI-resistant. I thought this was a fun experiment, so I wanted to share it with y'all!
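To make the idea concrete, here is a minimal sketch. It assumes the CAPTCHA animates a single dot along a hidden spiral path; the function names, frame count, and spiral parameters are hypothetical and not taken from the post. Each rendered frame contains only one dot, so a single screenshot carries no shape information. The spiral only emerges when positions are integrated over time, which is roughly what motion blur and persistence of vision do for a human viewer.

```python
import math

def spiral_positions(n_frames, turns=3.0, max_radius=40.0, cx=50.0, cy=50.0):
    """Return one (x, y) dot position per frame along an Archimedean spiral.

    Hypothetical parameters: the post does not say how its path is generated.
    Requires n_frames >= 2.
    """
    positions = []
    for i in range(n_frames):
        t = i / (n_frames - 1)            # progress along the path, 0.0 .. 1.0
        angle = 2 * math.pi * turns * t   # total rotation grows with t
        radius = max_radius * t           # radius grows linearly: a spiral
        positions.append((cx + radius * math.cos(angle),
                          cy + radius * math.sin(angle)))
    return positions

def render_frame(position, size=100):
    """Render one frame as the set of lit pixels: just the single dot."""
    x, y = position
    clamp = lambda v: min(size - 1, max(0, round(v)))
    return {(clamp(x), clamp(y))}

frames = [render_frame(p) for p in spiral_positions(60)]
print(len(frames[10]))                # 1: a single frame holds one lit pixel
print(len(set().union(*frames)) > 1)  # True: the trail only exists across frames
```

A bot that looks at one screenshot at a time sees a lone dot; anything that accumulates frames sees the shape, which is the trade-off several replies below point out.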
RikuDesu@reddit
Yeah, a lot of the MapleStory Korea anti-cheat systems and MMO bot detectors would give LLMs a lot of trouble.
GhostOfaBotInPants@reddit
Games as a CAPTCHA is a great idea. It's so much better than guessing whether the swirly character is an o, O, or 0, or a big fat 6 with a small head.
Dead_Internet_Theory@reddit
or worse, those that have an l / I.
DeltaSqueezer@reddit
Just ask how many 'r's are in the following word...
Dead_Internet_Theory@reddit
o1-preview: "My chain of thought contains 496 letters 'r', and it cost $3.29 for the complete analysis"
gemini-pro: "I'm sorry, but the R word, and 'hard R' are both dehumanizing language, thus we had to terminate your Gmail account for repeated violations of our terms of service"
Claude Sonnet: Artifact: "Analysis of the letter R" (click to view more).
shroddy@reddit
Claude might not be that great here; it often misses things that GPT or Gemini, or often even Molmo or InternVL2, can see.
Dead_Internet_Theory@reddit
It's not just about seeing, but about high-framerate video capture and motion analysis. Still too difficult for LLMs.
KTibow@reddit
Cool, but...
This is both impossible for blind users and easily circumvented if it's implemented on the client side.
water_bottle_goggles@reddit
skill issue
Dead_Internet_Theory@reddit
If you can't see... like, just open your eyes.
CodeMurmurer@reddit
Well, don't we already have an ID system for blind users where they won't be prompted for a CAPTCHA?
Sudden-Lingonberry-8@reddit
Disabled people are not allowed on the Internet, didn't you get the memo? /s
CodeMurmurer@reddit
Can't you read?
noneabove1182@reddit
If nothing else, this would definitely buy some time! Vision models would have to be incredibly advanced to solve this in basically real time. Great idea!
Qual_@reddit
Dead_Internet_Theory@reddit
I assume... you click on "E"? Is that it?
gofiend@reddit
I guess if you were to build some sort of custom image capture to average out frames, you'd defeat this pretty quickly, but it raises the bar quite a bit for attackers. Good stuff!
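The frame-combining attack described here is easy to sketch. Assuming frames arrive as small grayscale grids (lists of rows with values 0 to 255, a hypothetical format), a per-pixel maximum over a window of frames reconstructs the trail that no single frame shows. This is a plain long-exposure composite, not any particular tool's method; a per-pixel mean, as the comment suggests, works similarly but dims the trail.

```python
def max_composite(frames):
    """Per-pixel maximum over a list of equally sized grayscale frames.

    Equivalent to a long-exposure photo: any pixel lit in any frame stays lit.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [[max(frame[y][x] for frame in frames) for x in range(width)]
            for y in range(height)]

def dot_frame(x, y, size=5):
    """Hypothetical test frame: a single bright dot on a black background."""
    frame = [[0] * size for _ in range(size)]
    frame[y][x] = 255
    return frame

# A dot moving along the diagonal, one position per frame.
frames = [dot_frame(i, i) for i in range(5)]
composite = max_composite(frames)
lit = [(x, y) for y in range(5) for x in range(5) if composite[y][x] == 255]
print(lit)  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
```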
adrianben10lam@reddit (OP)
Yeah, that's definitely something we considered. I can imagine sporadic frames might be interesting, but we're gonna need much more complex solutions for this growing problem.
remixer_dec@reddit
I hate it. It feels like a machine playing a cat-vs-laser game on humans. Imagine solving this multiple times a day.
kulchacop@reddit
Something tells me this is going to be defeated soon
https://www.reddit.com/r/LocalLLaMA/comments/1gg2gbk/pdf_autoscroll_video_retrieval/
Former-Ad-5757@reddit
Basically every CAPTCHA method was solved years ago.
Just start a p*rn site and show your users the CAPTCHA you want solved.
Unlimited horny teenagers have been defeating every CAPTCHA for years.
NoIntention4050@reddit
Super easy as a user too, not annoying like others
Derefringence@reddit
select ALL the traffic lights
Acceptable_Username9@reddit
select the WEIRD CUBE
Derefringence@reddit
(they're all WEIRD)
No-Marionberry-772@reddit
Ngl, it's getting to the point where I'm looking at some captchas and I start to question if I'm even human
adrianben10lam@reddit (OP)
Right?! I feel like this rising skill curve in CAPTCHA solving is hitting diminishing returns.
Beautiful_Help_3853@reddit
You could use optical illusions, like shapes that appear to move but don't, or lines of equal length that appear to be different sizes.
Mundane_Ad8936@reddit
A CAPTCHA is more than just a visual puzzle; there's behavior tracking and other metadata that make it hard for bots to get around.
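As a rough illustration of what behavior tracking can mean, here is a hypothetical heuristic; it is not how any real CAPTCHA vendor scores users. Naive scripted cursors often move at perfectly constant speed, while human paths speed up and slow down. The sketch scores a cursor trace of `(t, x, y)` samples by the coefficient of variation of its step speeds; the feature and the threshold are both assumptions.

```python
import statistics

def speed_variation(trace):
    """Coefficient of variation of step speeds for (t, x, y) samples.

    Hypothetical feature: near 0 means constant-speed (robotic) movement.
    Needs at least two samples with increasing timestamps.
    """
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(trace, trace[1:]):
        dt = t1 - t0
        if dt > 0:
            speeds.append(((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / dt)
    mean = statistics.mean(speeds)
    return statistics.pstdev(speeds) / mean if mean else 0.0

def looks_scripted(trace, threshold=0.05):
    """Flag traces whose speed barely varies. The threshold is arbitrary."""
    return speed_variation(trace) < threshold

robot = [(i * 10, i * 5, i * 5) for i in range(20)]  # constant velocity
human = [(0, 0, 0), (16, 3, 1), (30, 9, 4), (55, 12, 9), (70, 20, 11), (98, 22, 15)]
print(looks_scripted(robot), looks_scripted(human))  # True False
```

Real systems combine many such signals, and a bot can add jitter to defeat any single one, so this only shows the flavor of the idea.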
No-Refrigerator-1672@reddit
I think this solution is a no-go. I can very easily imagine epileptic people getting sick from this test, and I suspect people with slow reactions (the elderly, brain-damage patients, etc.) will fail it miserably. I highly doubt this concept is suitable for a general audience, while defeating it is as easy as capturing multiple frames and combining them into one.
passinglunatic@reddit
Couldn’t you just give it a few frames?
matteogeniaccio@reddit
Very cool! Given what you wrote in the post, another approach could be quickly alternating between two colors, for example red and green. A human would see yellow.
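This suggestion relies on temporal color fusion: flicker red and green fast enough and the eye blends them. A simple sketch, assuming a naive per-channel average over frames as a stand-in for that blending (real perception is not a linear RGB average): no individual frame ever contains yellow, yet the time average has equal red and green components. Note that the same averaging is exactly what a frame-combining attacker would compute.

```python
RED = (255, 0, 0)
GREEN = (0, 255, 0)

def alternating_frames(n_frames):
    """Colour of one pixel over time, switching between red and green every frame."""
    return [RED if i % 2 == 0 else GREEN for i in range(n_frames)]

def temporal_average(frames):
    """Naive per-channel mean over frames; a crude model of what the eye blends."""
    n = len(frames)
    return tuple(sum(frame[c] for frame in frames) / n for c in range(3))

frames = alternating_frames(60)
print(set(frames) == {RED, GREEN})  # True: no single frame is yellow
print(temporal_average(frames))     # (127.5, 127.5, 0.0): red + green, a dark yellow
```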
fuckAIbruhIhateCorps@reddit
We could simplify it by asking the user to type the letter being drawn by the fast-moving cursor. Say the circle traces a B or an N at a very fast pace, without a trail; an LLM still couldn't make sense of what happened. Instead of making the user sit through a tiresome spiral-finding hunt, we could use letters.
justicecurcian@reddit
While this should be LLM-proof, a simple OpenCV script should be able to crack it.
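The point here is that once frames are captured, classic computer vision is enough and no model is required. Below is a sketch of one way to do it, in plain Python rather than OpenCV to keep it self-contained. It locates the dot in each frame (the brightest pixel, a simplifying assumption), then checks whether the dot's distance from the path's centre varies a lot (a spiral) or stays roughly constant (a circle). The frame format, tolerance, and function names are all hypothetical.

```python
import math

def find_dot(frame):
    """Return (x, y) of the brightest pixel in a grayscale frame (list of rows)."""
    _, x, y = max((value, x, y) for y, row in enumerate(frame)
                  for x, value in enumerate(row))
    return x, y

def classify_path(points, tolerance=0.15):
    """Label a tracked path 'circle' or 'spiral' from how much its radius varies.

    Uses the mean of the points as the centre; the tolerance is an arbitrary choice.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    spread = (max(radii) - min(radii)) / max(radii)
    return "spiral" if spread > tolerance else "circle"

def render(points, size=100):
    """Hypothetical captured frames: one bright pixel per frame on black."""
    frames = []
    for x, y in points:
        frame = [[0] * size for _ in range(size)]
        frame[round(y)][round(x)] = 255
        frames.append(frame)
    return frames

n = 72
circle = [(50 + 30 * math.cos(2 * math.pi * i / n),
           50 + 30 * math.sin(2 * math.pi * i / n)) for i in range(n)]
spiral = [(50 + 40 * i / n * math.cos(6 * math.pi * i / n),
           50 + 40 * i / n * math.sin(6 * math.pi * i / n)) for i in range(n)]
for name, path in (("circle", circle), ("spiral", spiral)):
    tracked = [find_dot(f) for f in render(path)]
    print(name, "->", classify_path(tracked))
```

With OpenCV the tracking step would typically be replaced by thresholding and contour or blob detection, but the classification logic stays the same.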
Jumper775-2@reddit
Good for now, but know this is a losing game.
Calcidiol@reddit
Congratulations on the net negative value to humanity. It's especially ironic that it's posted in the LocalLLaMA forum. One would have thought people aware of LLMs could actually understand the benefits of personal agents over the dystopia that is the modern web, which is increasingly designed, literally, to waste people's lives and time on nonsense.
Economy_Hippo_8107@reddit
I could see this being bypassed with some JavaScript and math to auto-solve the CAPTCHA... no AI even needed.
AnticitizenPrime@reddit
This is very clever!
horse1066@reddit
This might be harder if it used multiple balls of different colours, with only one of them doing a spiral? Humans would still notice.
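This variant could look like the sketch below. It assumes several coloured dots animate at once, with one randomly chosen colour following a spiral while the decoys circle at fixed radii; all names and parameters are hypothetical. Each single frame still shows only one dot per colour. It mostly multiplies the work for a frame-combining attacker rather than blocking one, since each colour can be tracked separately.

```python
import math
import random

COLOURS = ["red", "green", "blue", "orange"]

def make_challenge(n_frames=90, rng=None):
    """Return (answer_colour, {colour: [(x, y), ...]}) for a multi-dot challenge.

    One randomly chosen colour follows a spiral; the others are decoy circles.
    """
    rng = rng or random.Random()
    answer = rng.choice(COLOURS)
    paths = {}
    for colour in COLOURS:
        cx, cy = rng.uniform(30, 70), rng.uniform(30, 70)
        phase = rng.uniform(0, 2 * math.pi)
        decoy_radius = rng.uniform(10, 25)
        path = []
        for i in range(n_frames):
            t = i / n_frames
            angle = phase + 4 * math.pi * t  # two full turns over the animation
            # Only the answer's radius grows over time; decoys keep a fixed radius.
            radius = 25 * t if colour == answer else decoy_radius
            path.append((cx + radius * math.cos(angle), cy + radius * math.sin(angle)))
        paths[colour] = path
    return answer, paths

def frame(paths, i):
    """What a single screenshot contains: one position per colour, no trails."""
    return {colour: path[i] for colour, path in paths.items()}

answer, paths = make_challenge(rng=random.Random(0))
print(len(frame(paths, 10)))  # 4: one dot per colour in any single frame
```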
adrianben10lam@reddit (OP)
This is a good idea!
TubasAreFun@reddit
Many current models can take multi-image input, and I'm willing to bet they could solve this in the near future, if not already. It would also require Claude's computer use or similar tools to pass in a window of screen captures, not just the most recent frame.
gofiend@reddit
This is both clever and very cleanly executed. Kudos!
benthecoderX@reddit
this is super cool!