CI pipeline, overkill or a stable foundation?
Posted by MuditaPilot@reddit | Python | 15 comments
I'm using Claude to vibecode a website. I have deep experience in infrastructure management, but was never a developer, other than writing tools for configuration management or cloud deployment.
I do interact with a lot of opinionated developer leadership.
I think I have pretty reasonable guidelines for the coding agents, and I have expanded considerably on Karpathy's claude.md. An issue I ran into made me check the type checking, and I found the agent was severely lacking in discipline. I have resolved all of those issues in the code base and implemented strict linting and type checking. This is what my CI pipeline looks like now:
| Slot | Tool of record |
|---|---|
| Type checker (primary) | pyright |
| Type checker (cross-check) | pyrefly + mypy |
| Linter | ruff check |
| Formatter | ruff format |
| Dependency vulnerability scan | pip-audit |
| Test runner | pytest |
| SAST | Semgrep (CI) |
| Secret scan | Gitleaks + Trivy (CI) |
Overkill for what will become a production website in a month, or a stable foundation? General thoughts are welcome.
jwpbe@reddit
just pay someone to do it correctly now, it will be cheaper than when you pay someone to unfuck it later
Motor-Ad2119@reddit
not overkill at all, especially if you're running AI generated code. The whole point is that you can't fully trust what the agent produces so the pipeline becomes your safety net
I'd question pyright + pyrefly + mypy together. That's three type checkers which is probably redundant. Pyright alone is solid, drop the others unless you have a specific reason. Everything else is reasonable for prod
PinkSlugger@reddit
This is what happens when infra engineers apply their mindset to code quality — and that's actually a good thing. Multiple layers of validation beats zero layers of validation. I've seen enough agent-generated Python hit production with subtle type bugs that passed review to agree with the multi-checker reasoning.
Two practical suggestions: 1) pyrefly + mypy is redundant overlap; pick one to cross-check pyright. 2) pip-audit + Gitleaks + Trivy on every PR is where the pipeline gets slow — move those to a merge-to-main or nightly job. Blocking PRs on security scans means either you merge everything anyway or you slow iteration to a crawl, neither of which is what you want from a solo vibe-coded project.
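For anyone who wants to see the split from suggestion 2 spelled out, here is a minimal sketch. The trigger names (`pull_request`, `push_main`, `nightly`) and the `checks_for` helper are hypothetical; the tool names come from the OP's table, and how the trigger name reaches the script depends on your CI system. Semgrep is left out only because the comment doesn't say where it should run.

```python
# Hypothetical mapping from CI trigger to the checks that run on it.
# Tool names are taken from the pipeline table in the post; the trigger
# names and the grouping are illustrative assumptions.
FAST_CHECKS = ["ruff check", "ruff format", "pyright", "pytest"]
SECURITY_CHECKS = ["pip-audit", "Gitleaks", "Trivy"]


def checks_for(trigger: str) -> list[str]:
    """Return the checks to run for a given trigger name."""
    if trigger == "pull_request":
        # Keep the PR loop quick: only lint, format, types and tests.
        return list(FAST_CHECKS)
    if trigger in ("push_main", "nightly"):
        # Slower security scans run after merge or on a schedule.
        return FAST_CHECKS + SECURITY_CHECKS
    raise ValueError(f"unknown trigger: {trigger!r}")
```

Calling `checks_for("pull_request")` returns only the fast checks, so PRs never wait on the security scans, while merges and nightly runs still get the full list.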
cidy0983@reddit
The triple type-checker stack makes more sense than it looks when Claude is generating your code. LLMs pattern-match confidently without actually tracking type invariants — pyright catches most of it, but running mypy or pyrefly alongside specifically catches the cases where pyright makes plausible-but-wrong inferences about generics or overloaded callables.
Whether the overhead is worth it depends on how much of the codebase is agent-generated and how complex the type landscape is. For infra-management code with async pipelines and config objects, I'd keep it. If CI time becomes a problem, push pyrefly + mypy to a separate slow-CI job and only block merges on pyright.
The rest of the stack looks solid. Security scanning + SAST are worth keeping regardless of what else gets trimmed.
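A rough sketch of the "only block on pyright" idea, under some assumptions: the command lines are typical invocations rather than the OP's actual config, and `run_type_checks` with its injectable `runner` is a hypothetical helper. It runs everything in one process instead of a separate slow-CI job, but the gating logic is the same: the cross-checkers only report, and pyright alone decides the exit code.

```python
import subprocess
from typing import Callable, Sequence

# Typical CLI invocations; the "." target and the blocking/advisory
# split are assumptions of this sketch, not the OP's configuration.
BLOCKING: list[list[str]] = [["pyright"]]
ADVISORY: list[list[str]] = [["mypy", "."], ["pyrefly", "check"]]

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def run_type_checks(runner: Runner = run_command) -> int:
    """Return non-zero only if a blocking checker fails."""
    # Advisory checkers are reported but never change the result.
    for cmd in ADVISORY:
        if runner(cmd) != 0:
            print(f"advisory: {' '.join(cmd)} reported issues (not blocking)")

    exit_code = 0
    for cmd in BLOCKING:
        if runner(cmd) != 0:
            print(f"blocking: {' '.join(cmd)} failed")
            exit_code = 1
    return exit_code
```

In CI you would end the script with `raise SystemExit(run_type_checks())`. The `runner` parameter is there so the gating logic can be tested without the tools installed.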
student_03072003@reddit
Not overkill at all — this is what production-grade engineering looks like.
Strict typing, linting, security scans, and CI checks exposing weak AI-generated code is exactly why these tools matter. Honestly, this setup is more disciplined than many teams shipping real products today.
Zouden@reddit
em dash detected
90rk1@reddit
As an infra engineer, I suggest swapping pip and pip-audit for uv and uv audit. They are much faster, which means quicker pipelines for your team.
Also, you don't really need to run vuln and secret scans on every pipeline run. Maybe run them at staging, or only when certain files (like requirements.txt, pyproject.toml or uv.lock) change.
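A minimal sketch of that file-based gate, assuming the list of changed paths is handed in by the CI system (for example from `git diff --name-only`). The helper name `needs_dependency_scan` is hypothetical, and matching on the base name anywhere in the repo is a choice made for this example.

```python
from pathlib import PurePosixPath

# File names from the comment above; matching on the base name anywhere
# in the repo is an assumption of this sketch.
DEPENDENCY_FILES = {"requirements.txt", "pyproject.toml", "uv.lock"}


def needs_dependency_scan(changed_paths: list[str]) -> bool:
    """True if any changed path is a dependency manifest or lock file."""
    return any(PurePosixPath(p).name in DEPENDENCY_FILES for p in changed_paths)
```

With this, a change touching only `src/app.py` skips the vulnerability scan, while any change to `uv.lock` or a nested `pyproject.toml` triggers it.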
student_03072003@reddit
This honestly isn’t overkill — it’s good engineering. Most AI-generated projects lack proper discipline, but your pipeline shows you’re treating it like a real production system from day one.
BeamMeUpBiscotti@reddit
Normally the use case for running multiple type checkers is when you have a library that is used by other people, and you want to make sure it works regardless of what type checker they're using.
One thing to be careful about here is that when type checkers disagree on something it could confuse the agent.
InternationalPop8482@reddit
I have absolutely no experience building frontend with python, unless you're talking about backend?
If you're using AI, why go with Python?
If you want industry standards and something you'd actually like to understand/debug, go with TS, Vite, Tailwind, MUI, etc. It's popular for a reason. TS is just fantastic.
MuditaPilot@reddit (OP)
Backend is python
AstroPhysician@reddit
Who said anything about frontend? They’re obviously doing backend. You cannot code a frontend in Python; that statement doesn’t even make sense.
shibbypwn@reddit
Just because you’re not running python in the browser doesn’t mean you can’t write python for the front end.
Is it the best tool? Probably not, unless your project is very simple and you want to keep it in python.
But it’s certainly doable, and there are entire libraries that wrap JS frameworks (like React components) in python.
InternationalPop8482@reddit
Yes you can. I don't even need to look at libraries to know you can create a full website, front and back end, using python. Having built dozens of full stack applications, I'm confident in this.
bishopExportMine@reddit
https://kerrick.blog/articles/2025/ship-software-that-does-nothing/