Built a complaint intelligence pipeline using category-specific BERT models — here is why one model is not enough
Posted by Serious_Damage5274@reddit | programming | View on Reddit | 2 comments
Hey r/programming,
Sharing an interesting engineering decision I made while building an AI complaint analysis system.
**The naive approach:**
Train one BERT model on all 51,000 reviews. Simple. Clean. One model to rule them all.
**The problem:**
A complaint about an Appliance:
"Motor making grinding noise after 3 months. Compressor seems faulty."
A complaint about Fashion:
"Colour faded after first wash. Stitching came apart."
A complaint about Electronics:
"Screen flickering randomly. Battery draining in 2 hours."
These are completely different vocabularies, failure modes, and complaint patterns. A single model trained on all of them learns a blurry average. Category-specific models learn the actual language of each domain.
**What I built:**
7 separate BERT models — one per product category.
```python
# Simplified training loop per category
for category in categories:
df_cat = df[df['category'] == category]
df_balanced = balance_classes(df_cat)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
train(model, df_balanced)
model.save_pretrained(f'models/{category}_bert')
```
**Results:**
- Electronics — 100% accuracy (5,870 rows)
- Appliances — 99% accuracy (27,314 rows)
- Home — 100% accuracy (2,720 rows)
- Fashion — 96% accuracy (2,244 rows)
**Interesting engineering challenges:**
-
Class imbalance — most reviews are positive. Fixed with equal sampling per class before training.
-
Memory on Mac CPU — 27K row Appliances model took 45 mins. Batch size of 16 was the sweet spot.
-
UNEXPECTED keys warning — normal when loading bert-base-uncased for classification. The pre-training MLM head gets replaced by a classification head. Safe to ignore.
-
Saving mid-epoch — save the best model per epoch using validation accuracy, not just the last epoch.
**Next step:**
CrewAI multi-agent layer on top — Classifier Agent, Pattern Agent, Priority Agent, Resolution Agent.
**GitHub:** github.com/niteshnankani-svg
Happy to discuss the category-specific vs single model tradeoff or the BERT fine-tuning pipeline.
programming-ModTeam@reddit
r/programming is not a place to post your project, get feedback, ask for help, or promote your startup.
Technical write-ups on what makes a project technically challenging, interesting, or educational are allowed and encouraged, but just a link to a GitHub page or a list of features is not allowed.
The technical write-up must be the focus of the post, not just a tickbox-checking exercise to get us to allow it. This is a technical subreddit.
We don't care what you built, we care how you build it.
mistahspecs@reddit
Wow you're so bad at this that you couldn't even get you LLM to generate this post correctly