I built a $1.12/page AI landing page builder that uses two models instead of one
Posted by ioanastro@reddit | LocalLLaMA | 7 comments
I'm a principal product designer. I got tired of paying $2-3 per page on AI builders, only to get broken output I couldn't edit without burning more credits.
So I built cozonac — a browser-based tool that splits the work between two AI models:
- Claude Opus reads your prompt and reference screenshots, extracts typography/spacing/color into a structured "Design DNA," and writes the build plan
- A local model on your GPU (Qwen3-Coder or Gemma 4 via Ollama) executes the plan at 170-220 tok/s. Cost: $0.00
When the local model fails — missing section, broken layout — Opus catches it automatically. Screenshots the output, compares to reference, writes a surgical fix. No re-prompting.
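The planner/executor/verifier split described above can be sketched as a small orchestration loop. This is a minimal illustration, not the actual cozonac code: the four callables stand in for Claude Opus (planning and verification), the local Ollama model (execution), and the automatic fix step, so the control flow can be shown without any API calls.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class BuildResult:
    html: str
    issues: list[str]  # unresolved issues, empty if the build verified clean

def build_page(
    plan_model: Callable[[str], str],      # expensive model: prompt -> build plan
    exec_model: Callable[[str], str],      # cheap local model: plan -> HTML
    verify: Callable[[str], list[str]],    # expensive model: HTML -> list of issues
    fix_model: Callable[[str, str], str],  # (HTML, issue) -> surgically patched HTML
    prompt: str,
    max_fix_rounds: int = 3,
) -> BuildResult:
    """Two-model split: the expensive model plans and verifies,
    the cheap local model executes, looping until the output is clean."""
    plan = plan_model(prompt)
    html = exec_model(plan)
    for _ in range(max_fix_rounds):
        issues = verify(html)
        if not issues:
            return BuildResult(html, [])
        for issue in issues:
            html = fix_model(html, issue)
    return BuildResult(html, verify(html))
```

The point of the structure is that the token-hungry execution step never touches the paid API; only the short plan/verify/fix messages do.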
After the build, you edit visually. Click text to change it. Property panel for CSS. One-click WCAG accessibility checker. Version history with cherry-pick. Export to clean HTML + ZIP.
Total cost for a 13-section landing page with working calculator, email capture, video backgrounds: $1.12.
Would love feedback on the architecture — especially from anyone running local models.
Yes-Scale-9723@reddit
Cool, but the local GPU cost is not $0.00. You have to divide the cost of the hardware by the total time you actually use it. Many local LLM servers sit idle most of the time, so each minute of use can effectively cost you something like $5 or more. Add configuration and maintenance time and it can easily reach $10 per minute.
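The amortization argument works out roughly like this (all numbers here are illustrative assumptions, not the commenter's actual figures):

```python
# Amortized cost per minute of *active* use for idle-heavy local hardware.
# Every number below is an assumption for illustration only.
hardware_cost = 3000.0          # assumed high-end GPU build, USD
lifetime_months = 24            # assumed useful life of the hardware
active_minutes_per_month = 30   # assumed usage: the server is idle most of the time

cost_per_active_minute = hardware_cost / (lifetime_months * active_minutes_per_month)
print(round(cost_per_active_minute, 2))  # ≈ 4.17 USD per active minute
```

The fewer minutes the box actually works, the higher the effective per-minute cost, which is the crux of the disagreement below.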
I mean, you can achieve the same result by using Deepseek as the second model.
ioanastro@reddit (OP)
Many local models aren't as good, true. But I'm not sure how you get $5 per minute, let alone $10; electricity doesn't cost anywhere near that much.
gitsad@reddit
I'm sorry but you mentioned:
"local model on your GPU (Qwen3-Coder or Gemma 4 via Ollama) executes the plan at 170-220 tok/s."
What GPU do you have? From that statement I'd guess it's far better than what the average person owns. To get that tok/s locally I'd need at least an RTX 4090, and I'm still not sure it's achievable with e.g. Gemma 4 (I'd guess we need a bigger model rather than a smaller one for any complex layout understanding, even for landing pages).
That's why the cost of the GPU makes this optimization less compelling than it looks.
ioanastro@reddit (OP)
Amazing comment. Yes, I have a 5090 to run at that speed; Gemma 4 is slower but does run on a 5090. Here's the kicker: a 5090 can be rented for about $1/hour, 2x5090 for around $1.30/hour, and a Blackwell RTX 96GB monster for about $2/hour for coding. I also tried coding on 2x5090 (my local machine) using Llama, Gemma, and Qwen3-Coder:80b, and those kill.
Opus handles the complex stuff and doesn't cost that much (~$0.30); the local model just needs to not hallucinate.
ShengrenR@reddit
The 80b is qwen3-coder-next. Qwen3-coder is 480b
ioanastro@reddit (OP)
Also, Qwen3-coder:480B-cloud is free to use with Ollama (slow, but free), and there's the 30B as a local model that can run on a 24GB VRAM device.
ioanastro@reddit (OP)
100% correct, it is Qwen3-Coder-Next. It's been a long day.