How I patched Playwright to bypass fingerprinting anti-bot
Posted by Timely-Leave-8342@reddit | Python | 1 comment
I spent the last year on this because every AI web agent i built kept dying the same way: recaptcha walls, fingerprintjs flagging tampering=true, or the agent hits a captcha after 2-3 actions and the run dies. tried every JS-level stealth lib i could find, and the ceiling was ~0.4 on recaptcha v3 no matter what.
Took me ages to figure out what was going on: every js-level patch is itself a fingerprint. you spoof the right value, but Function.prototype.toString returns the wrong shape, or the descriptor flags are off, or the prototype chain has a wrong link. creepjs catches exactly this. the js layer is what's being inspected, so anything you patch there shows.
so i went one floor down into gecko. canvas readback noise on the c++ side. webgl readPixels handled the same way. AnalyserNode noise for audio. fonts measured at gfxFont rather than through a measureText shim. webrtc was its own thing: srflx swap post-STUN, plus a synthetic fallback through nICEr for sites that get 0 candidates back. 15 patches against mozilla-central FF150 by the end. canvas was the single biggest jump: fp pro flipped from tampering=true to tampering=false the day the per-pixel noise landed.
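To make the per-pixel noise idea concrete, here is a minimal Python sketch of seeded readback noise. The actual patches live in gecko's C++ and aren't shown in the post, so everything here is an assumption for illustration: the function name `add_readback_noise`, the ±1 amplitude, leaving alpha untouched, and driving the noise from a session seed. The only point is that noise applied to the pixel buffer before it reaches JS leaves no patched getter or wrapped function for a detector to inspect.

```python
import random


def add_readback_noise(rgba: bytes, seed: int) -> bytes:
    """Return a copy of an RGBA buffer with small seeded noise on R, G and B.

    Hypothetical sketch: the +/-1 amplitude and the untouched alpha channel
    are assumptions, not details taken from the post.
    """
    rng = random.Random(seed)  # same seed -> same noise for the same buffer size
    out = bytearray(rgba)
    for i in range(len(out)):
        if i % 4 == 3:
            continue  # skip the alpha byte of each RGBA pixel
        delta = rng.choice((-1, 0, 1))
        out[i] = min(255, max(0, out[i] + delta))  # clamp to the valid byte range
    return bytes(out)
```

Calling `add_readback_noise(pixels, seed)` twice with the same seed returns the same bytes, so repeated reads of the same canvas agree with each other, while a different seed will normally produce a different canvas hash.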
then i had to patch playwright itself on the python side, because even with a clean browser the playwright runtime leaves its own tells. a wrapper intercepts page.mouse.move/click and expands them into bezier trajectories with gaussian jitter on the intermediate y points, ~10ms between waypoints. no more teleporting the cursor straight to the target, which is one of the easiest bot signals to catch.
there's also a bayesian sampler on the python side that generates a coherent fingerprint per session (gpu, audio, fonts, ~400 fields total) seeded from a single int. runs are reproducible if you log the seed. without a fixed seed, each launch gets its own coherent fingerprint, no two identical.
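Both Python-side pieces described above can be sketched with the standard library alone. This is not the repo's code, just one plausible shape under stated assumptions: `sample_fingerprint`, `bezier_path`, `human_move` and the three small tables are hypothetical names and toy data; "bayesian" is taken to mean that fields are sampled conditionally on earlier ones (gpu and fonts depend on the os); and the path is a cubic bezier with randomly offset control points. `human_move` is a plain helper rather than an interception wrapper, and it assumes Playwright's sync API, where `page.mouse.move(x, y)` and `page.wait_for_timeout(ms)` exist.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

# Toy conditional tables (assumption); the post mentions ~400 fields in total.
OS_WEIGHTS = {"Windows": 0.7, "macOS": 0.2, "Linux": 0.1}
GPU_GIVEN_OS = {
    "Windows": {"NVIDIA": 0.5, "Intel": 0.35, "AMD": 0.15},
    "macOS": {"Apple": 0.8, "Intel": 0.2},
    "Linux": {"Intel": 0.5, "NVIDIA": 0.3, "AMD": 0.2},
}
FONTS_GIVEN_OS = {
    "Windows": ["Segoe UI", "Calibri"],
    "macOS": ["Helvetica Neue", "Menlo"],
    "Linux": ["DejaVu Sans", "Liberation Sans"],
}


def _pick(rng: random.Random, weights: Dict[str, float]) -> str:
    """Draw one key with probability proportional to its weight."""
    return rng.choices(list(weights), weights=list(weights.values()), k=1)[0]


def sample_fingerprint(seed: Optional[int] = None) -> Dict[str, object]:
    """Sample dependent fields in order so the result is self-consistent."""
    if seed is None:
        seed = random.SystemRandom().randrange(2**32)  # fresh seed per launch
    rng = random.Random(seed)
    os_name = _pick(rng, OS_WEIGHTS)
    return {
        "seed": seed,  # log this value to reproduce the run
        "os": os_name,
        "gpu_vendor": _pick(rng, GPU_GIVEN_OS[os_name]),  # conditioned on os
        "fonts": list(FONTS_GIVEN_OS[os_name]),  # conditioned on os
    }


def bezier_path(start: Point, end: Point, rng: random.Random,
                steps: int = 25, jitter_px: float = 1.5) -> List[Point]:
    """Expand a straight move into a cubic bezier path with gaussian y-jitter."""
    (x0, y0), (x3, y3) = start, end
    bow = 0.2 * math.hypot(x3 - x0, y3 - y0)  # max control-point offset
    # Control points near 1/3 and 2/3 of the way, pushed off the straight line.
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-bow, bow)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-bow, bow)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-bow, bow)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-bow, bow)
    path: List[Point] = []
    for i in range(steps + 1):
        t = i / steps
        u = 1.0 - t
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        if 0 < i < steps:
            y += rng.gauss(0.0, jitter_px)  # jitter intermediate waypoints only
        path.append((x, y))
    return path


def human_move(page, start: Point, end: Point, rng: random.Random,
               delay_ms: float = 10.0) -> None:
    """Replay the path on a Playwright sync-API page, pausing between waypoints."""
    for x, y in bezier_path(start, end, rng):
        page.mouse.move(x, y)
        page.wait_for_timeout(delay_ms)
```

In this sketch, `sample_fingerprint(42)` returns the same dict every time, which is the "log the seed" property, and `sample_fingerprint()` with no argument draws a fresh seed. Feeding `random.Random(fp["seed"])` into `human_move` would make the cursor paths reproducible too, though the post does not say whether the real wrapper seeds its trajectories that way.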
repo: invisible_playwright
where it is now: recaptcha v3 0.9, fp pro not_detected with vpn=false and tampering=false, creepjs 0 lies, sannysoft trivially green. each detector catches a different inconsistency, so fixing one tends to regress another. took forever to get them all coherent at once.
curious if anyone's done similar work on gecko or webkit. chromium has way
more public material on this than firefox does.