How I patched Playwright to bypass fingerprinting anti-bot

Posted by Timely-Leave-8342@reddit | Python | 1 comment

I spent the last year on this because every AI web agent I built kept dying the same way: reCAPTCHA walls, FingerprintJS flagging tampering=true, the agent hits a captcha after 2-3 actions and the run dies. I tried every JS-level stealth lib I could find, and the ceiling was ~0.4 on reCAPTCHA v3 no matter what.

It took me ages to figure out what was going on: every JS-level patch is itself a fingerprint. You spoof the right value, but Function.prototype.toString returns the wrong shape, or the descriptor flags are off, or the prototype chain has a wrong link. CreepJS catches exactly this. The JS layer is what's being inspected, so anything you patch there shows.

So I went one floor down, into Gecko. Canvas readback noise is applied on the C++ side, and WebGL readPixels is handled the same way. AnalyserNode gets noise for audio. Fonts are measured at gfxFont rather than through a measureText shim. WebRTC was its own thing: an srflx swap post-STUN, plus a synthetic fallback through nICEr for sites that get 0 candidates back. By the end it was 15 patches against mozilla-central FF150. Canvas was the single biggest jump: FP Pro flipped from tampering=true to tampering=false the day the per-pixel noise landed.
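To make the per-pixel canvas noise idea concrete, here's a rough Python sketch of the concept. It is not the Gecko patch (that lives in C++ and isn't shown in this post); the function name `add_pixel_noise`, the `amplitude` parameter, and the choice to derive the noise from a seed are all assumptions for illustration. The point is only that the readback buffer gets a tiny perturbation, and that the same seed produces the same perturbation every time.

```python
import random


def add_pixel_noise(rgba: bytes, seed: int, amplitude: int = 1) -> bytes:
    """Return a copy of an RGBA buffer with small seeded noise on R, G and B.

    Hypothetical helper: illustrates the idea only, not the actual browser patch.
    """
    rng = random.Random(seed)           # same seed -> same noise pattern
    out = bytearray(rgba)
    for i in range(0, len(out) - 3, 4):  # one iteration per RGBA pixel
        for c in range(3):               # touch R, G, B; leave alpha alone
            delta = rng.randint(-amplitude, amplitude)
            out[i + c] = min(255, max(0, out[i + c] + delta))  # clamp to a byte
    return bytes(out)


if __name__ == "__main__":
    flat_gray = bytes([128, 128, 128, 255] * 4)  # four identical gray pixels
    noisy = add_pixel_noise(flat_gray, seed=1234)
    print(list(noisy))
    # Reading back twice with the same seed gives the same bytes.
    print(noisy == add_pixel_noise(flat_gray, seed=1234))
```

Keeping the noise deterministic per seed means two readbacks of the same canvas in one session agree with each other, which a purely random perturbation would not guarantee.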

Then I had to patch Playwright itself on the Python side, because even with a clean browser the Playwright runtime leaves its own tells. A wrapper intercepts page.mouse.move/click and expands them into Bezier trajectories with Gaussian jitter on the intermediate y points, ~10ms between waypoints. No more teleporting the cursor straight to the target, which is one of the easiest bot signals to catch.

There's also a Bayesian sampler on the Python side that generates a coherent fingerprint per session (GPU, audio, fonts, ~400 fields total) seeded from a single int. Runs are reproducible if you log the seed. Without a fixed seed, each launch gets its own coherent fingerprint, no two identical.
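For the mouse part, a minimal sketch of what expanding one move into a jittered Bezier path could look like. This is not the wrapper from the repo: `bezier_path`, `move_along`, and the `spread` / `jitter_sd` defaults are made-up names and values. The only Playwright calls used are `page.mouse.move(x, y)` and `page.wait_for_timeout(ms)` from the sync API, and the path generator itself has no Playwright dependency.

```python
import random


def bezier_path(start, end, steps=20, spread=40.0, jitter_sd=1.5, seed=None):
    """Waypoints along a cubic Bezier from start to end (hypothetical helper)."""
    rng = random.Random(seed)
    (x0, y0), (x3, y3) = start, end
    # Control points sit 1/3 and 2/3 along the straight line, each offset
    # by a random amount so the curve bends differently on every call.
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-spread, spread)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-spread, spread)

    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        x = u**3 * x0 + 3 * u * u * t * x1 + 3 * u * t * t * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u * u * t * y1 + 3 * u * t * t * y2 + t**3 * y3
        if 0 < i < steps:
            y += rng.gauss(0, jitter_sd)  # jitter intermediate y only
        points.append((x, y))
    return points


def move_along(page, points, delay_ms=10):
    """Replay waypoints on a Playwright sync Page, pausing between each one."""
    for x, y in points:
        page.mouse.move(x, y)
        page.wait_for_timeout(delay_ms)


if __name__ == "__main__":
    for x, y in bezier_path((100, 100), (400, 300), steps=5, seed=1):
        print(round(x, 1), round(y, 1))
```

The first and last waypoints are left unjittered so the cursor still starts and ends exactly where the caller asked. A real wrapper would also need to remember the last cursor position to use as `start`, which this sketch leaves to the caller.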

repo: invisible_playwright

Where it is now: reCAPTCHA v3 at 0.9, FP Pro not_detected with vpn=false and tampering=false, CreepJS 0 lies, sannysoft trivially green. Each detector catches a different inconsistency, so fixing one tends to regress another. It took forever to get them all coherent at once.
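Since coherence is the thing the detectors probe, here's a toy sketch of how a single-int seed can drive a fingerprint whose fields agree with each other. It assumes "Bayesian sampler" means something like sampling through conditional tables (pick the OS first, then a GPU given that OS, and so on), which is my reading and not confirmed by the repo. Every table, probability, and name below (`OS_PRIOR`, `GPU_GIVEN_OS`, `sample_fingerprint`, the GPU and font strings) is a placeholder, and it covers 3 fields, not ~400.

```python
import random

# Placeholder tables: values and weights are invented for illustration.
OS_PRIOR = {"windows": 0.7, "macos": 0.2, "linux": 0.1}
GPU_GIVEN_OS = {
    "windows": {"NVIDIA GeForce RTX 3060": 0.5, "Intel UHD Graphics 630": 0.5},
    "macos": {"Apple M1": 0.7, "Apple M2": 0.3},
    "linux": {"Mesa Intel UHD Graphics 630": 0.6, "AMD Radeon RX 580": 0.4},
}
FONTS_GIVEN_OS = {
    "windows": ["Segoe UI", "Calibri", "Arial"],
    "macos": ["Helvetica Neue", "Menlo", "Arial"],
    "linux": ["DejaVu Sans", "Liberation Sans", "Noto Sans"],
}


def _pick(rng, table):
    """Draw one key from a {value: weight} table."""
    names = list(table)
    return rng.choices(names, weights=[table[n] for n in names], k=1)[0]


def sample_fingerprint(seed=None):
    """Sample a small, internally consistent fingerprint from one int seed."""
    if seed is None:
        # No seed given: draw a fresh one, but return it so the run can be logged.
        seed = random.SystemRandom().randrange(2**32)
    rng = random.Random(seed)
    os_name = _pick(rng, OS_PRIOR)
    gpu = _pick(rng, GPU_GIVEN_OS[os_name])   # GPU is conditioned on the OS
    fonts = list(FONTS_GIVEN_OS[os_name])     # fonts follow the OS too
    return {"seed": seed, "os": os_name, "gpu": gpu, "fonts": fonts}


if __name__ == "__main__":
    print(sample_fingerprint(42))
    print(sample_fingerprint(42) == sample_fingerprint(42))
```

Because every later field is drawn conditional on the earlier ones, you never get a macOS font list paired with a Windows-only GPU, and logging the seed is enough to replay the whole fingerprint.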

Curious if anyone's done similar work on Gecko or WebKit. Chromium has way more public material on this than Firefox does.