Jupyter notebooks touching production data are application code from a security standpoint

Posted by UnhappyPay2752@reddit | Python

Started auditing how our data team works and the security picture was worse than expected. Notebooks querying production databases directly, credentials hardcoded in cells because environment variable setup felt like friction, code that's been copied between notebooks so many times the original author is impossible to trace.
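For the hardcoded-credentials part, here is a minimal sketch of what the lower-friction alternative could look like: one helper that builds the connection string from environment variables and fails loudly when something is missing. The `ANALYTICS_DB_*` variable names, the `database_url_from_env` function, and the PostgreSQL-style URL are all assumptions made for illustration, not anything our team actually uses.

```python
import os
from urllib.parse import quote_plus


def database_url_from_env(prefix="ANALYTICS_DB", environ=None):
    """Build a PostgreSQL-style URL from environment variables.

    The variable names (<prefix>_USER, <prefix>_PASSWORD, ...) are
    hypothetical; adapt them to whatever your secret store injects.
    """
    env = os.environ if environ is None else environ
    required = ("USER", "PASSWORD", "HOST", "NAME")
    missing = [f"{prefix}_{key}" for key in required if not env.get(f"{prefix}_{key}")]
    if missing:
        # Fail early instead of tempting someone to paste a password into a cell.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    # Percent-encode user and password so characters like '@' do not break the URL.
    user = quote_plus(env[f"{prefix}_USER"])
    password = quote_plus(env[f"{prefix}_PASSWORD"])
    host = env[f"{prefix}_HOST"]
    port = env.get(f"{prefix}_PORT", "5432")  # assumed default port
    name = env[f"{prefix}_NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```

In a notebook cell this reduces to a single `database_url_from_env()` call, so the secret never appears in the cell source or in the saved `.ipynb` file.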

None of it goes through the review process that the engineering team's code does. No SAST, no security-minded PR review, no scanning of any kind. The assumption seems to be that notebooks are exploratory and therefore informal, but at some point exploratory code started running against production data with production access, and that distinction stopped meaning anything.
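Even a crude scan would be a step up from nothing. Below is a rough sketch of one, assuming notebooks are stored as standard `.ipynb` JSON (a top-level `cells` list where each cell has a `cell_type` and a `source`). The regex patterns and the function names are illustrative heuristics of my own. They will miss things and produce false positives, so treat this as a stopgap and not a replacement for a real secret scanner or SAST tool.

```python
import json
import re

# Heuristic patterns only (assumptions for illustration, not an exhaustive rule set).
SECRET_PATTERNS = [
    # Assignments such as: db_password = "hunter2"
    re.compile(r"""(?i)(password|passwd|pwd|secret|token|api_key)\w*\s*=\s*['"][^'"]+['"]"""),
    # URLs with an embedded user:password, skipping f-string placeholders like {user}
    re.compile(r"\w+://[^\s:/@{}]+:[^\s@{}]+@"),
]


def find_hardcoded_secrets(notebook):
    """Return (cell_index, line_number, line) for suspicious lines in code cells."""
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are ignored in this sketch
        source = cell.get("source", "")
        # The notebook format allows source to be a string or a list of strings.
        text = "".join(source) if isinstance(source, list) else source
        for line_no, line in enumerate(text.splitlines(), start=1):
            if any(pattern.search(line) for pattern in SECRET_PATTERNS):
                findings.append((index, line_no, line.strip()))
    return findings


def scan_notebook_file(path):
    """Load a .ipynb file from disk and scan its code cells."""
    with open(path, encoding="utf-8") as handle:
        return find_hardcoded_secrets(json.load(handle))
```

Something like this could run in a pre-commit hook or a CI job over every `.ipynb` in the repo, which at least puts notebooks on the same path as the rest of the code.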

These notebooks often have broader data access than the application code because the people writing them needed to move fast and used their own credentials. That access never got revisited.