I tested structured output from 288 LLM calls and logged every way JSON breaks. Here's what I found

Posted by kexxty@reddit | Python

I've been building Python services that consume LLM output for the past few years, and I kept accumulating the same pile of regex fixups for broken JSON in every project. Markdown fences, trailing commas, Python booleans inside JSON, truncated objects, unescaped quotes, the usual.
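To make the failure modes concrete, here's a minimal sketch of the kind of fixup pile I mean. The function name and heuristics are mine, not the article's or the library's actual implementation, and the regex substitutions are deliberately naive (e.g. they'd also rewrite a literal "True" inside a string value):

```python
import json
import re


def loose_json_loads(text: str):
    """Best-effort parse of LLM output that is almost-but-not-quite JSON.

    Illustrative only: handles markdown fences, Python literals,
    and trailing commas. It does NOT handle truncated objects or
    unescaped quotes, which need real repair logic.
    """
    # Strip a surrounding markdown fence like ```json ... ```
    fence = re.match(r"```(?:json)?\s*(.*?)\s*```", text.strip(), re.DOTALL)
    if fence:
        text = fence.group(1)

    # Replace Python literals that aren't valid JSON.
    # Naive: will also hit these words inside string values.
    text = re.sub(r"\bTrue\b", "true", text)
    text = re.sub(r"\bFalse\b", "false", text)
    text = re.sub(r"\bNone\b", "null", text)

    # Drop trailing commas before a closing brace/bracket
    text = re.sub(r",\s*([}\]])", r"\1", text)

    return json.loads(text)
```

Every project I've worked on ends up with some variant of this, which is exactly the junk drawer the post is about.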

Instead of keeping a private junk drawer of string manipulations, I decided to actually study the problem. Ran structured output prompts through 288 model calls across every major provider and catalogued what breaks, how often, and whether the failure modes are consistent across model families. (Spoiler: they are. Weirdly consistent.)

Wrote it up here: What Breaks When You Ask an LLM for JSON

The article covers:

- The failure modes themselves (markdown fences, trailing commas, Python True/False/None, truncated objects, unescaped quotes)
- How often each one showed up across the 288 calls
- Whether the failures are consistent across model families and providers

The findings eventually turned into a library (outputguard), but the article stands on its own if you just want to understand the failure modes. Curious if other people are seeing the same patterns.