Llama 3.1 70B handles German e-commerce queries surprisingly well — multi-agent shopping assistant results

Posted by m3m3o@reddit | LocalLLaMA | 10 comments


I built a multi-agent shopping assistant using NVIDIA's retail blueprint + Shopware 6 (European e-commerce platform). Wanted to share some observations about Llama 3.1 70B Instruct in a multilingual context.

Setup: 5 LangGraph agents, Llama 3.1 70B via NVIDIA Cloud API (integrate.api.nvidia.com), Milvus vector search, NeMo Guardrails.

Multilingual findings:

Intent classification works cross-language. The Planner agent uses an English routing prompt but correctly classifies German queries like "Zeig mir rote Kleider unter 100 Franken" (show me red dresses under 100 CHF). No German routing prompt needed.
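To make the cross-language routing concrete, here is a minimal sketch of the idea: an English-only classification prompt applied to a query in any language. The prompt wording, intent names, and helper function are illustrative, not the post's actual Planner code.

```python
# Cross-language intent routing sketch: the routing prompt is English,
# but the customer query is passed through untranslated.
# All names here are illustrative, not from the post's codebase.

ROUTER_PROMPT = """You are the routing agent of a shopping assistant.
Classify the customer query into exactly one intent:
- product_search: the customer is looking for products
- order_status: the customer asks about an existing order
- chitchat: greetings or small talk
The query may be in any language. Answer with only the intent name."""

def build_router_messages(query: str) -> list[dict]:
    """Build an OpenAI-style chat payload for the routing call."""
    return [
        {"role": "system", "content": ROUTER_PROMPT},
        {"role": "user", "content": query},
    ]

# The German query goes in as-is; the English prompt still constrains
# the model to one of the listed intent labels.
messages = build_router_messages("Zeig mir rote Kleider unter 100 Franken")
```

The payload works against any OpenAI-compatible chat endpoint, including the integrate.api.nvidia.com one from the setup above.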

Chatter prompt needs explicit bilingual instruction. Without it, the model responds in whatever language the system prompt is in, ignoring the query language. Adding "Respond in the same language the customer used" fixed this.
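In prompt form, the fix looks roughly like this. Everything here is illustrative filler except the one quoted instruction sentence from the post:

```python
# Chatter system prompt sketch. Only the last instruction line is from
# the post; the rest is placeholder context.
CHATTER_SYSTEM_PROMPT = (
    "You are a friendly shopping assistant for a fashion store. "
    "Answer questions about products, sizes, and availability. "
    # Without this line, replies came back in the system-prompt language:
    "Respond in the same language the customer used."
)

def build_chatter_messages(query: str) -> list[dict]:
    """Pair the bilingual system prompt with the raw customer query."""
    return [
        {"role": "system", "content": CHATTER_SYSTEM_PROMPT},
        {"role": "user", "content": query},
    ]
```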

NeMo Guardrails are English-tuned. German fashion terms triggered false positives. "Killer-Heels" (common German fashion term) got flagged as unsafe. If you're deploying for non-English markets, plan for guardrails calibration.
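One way to do that calibration, assuming you use NeMo Guardrails' stock `self check input` rail: override the `self_check_input` prompt so domain jargon is explicitly declared safe. The prompt text below is a sketch, not the post's config.

```yaml
# config.yml -- enable the standard input self-check rail
rails:
  input:
    flows:
      - self check input

# prompts.yml -- override the self_check_input prompt so German fashion
# slang like "Killer-Heels" isn't misread as violent content.
prompts:
  - task: self_check_input
    content: |
      You are moderating a customer message for a fashion store.
      German fashion slang such as "Killer-Heels" is normal product talk
      and must be treated as safe.
      Message: "{{ user_input }}"
      Answer "yes" if the message should be blocked, otherwise "no".
```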

Self-hosting question: For Swiss data residency (DSG compliance), you'd need self-hosted NIMs instead of the NVIDIA Cloud API. H100 GPUs run ~$2-4/hr per GPU on Lambda/Vast.ai. Has anyone here self-hosted the NVIDIA NIM containers for Llama 3.1 70B? Curious about real-world RAM/VRAM requirements.
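For anyone exploring this, the deployment is roughly the sketch below (image name and flags per NVIDIA's NIM docs at time of writing — verify against the current docs before relying on it). Back-of-envelope on VRAM: 70B parameters at FP8 is ~70 GB of weights alone, so plan on at least 2x 80 GB H100s once KV cache is included.

```shell
# Self-hosted NIM sketch: requires an NGC API key and the NVIDIA
# Container Toolkit. Image tag and mount path follow NVIDIA's NIM docs;
# treat both as assumptions to double-check.
export NGC_API_KEY=<your-ngc-key>
docker login nvcr.io -u '$oauthtoken' -p "$NGC_API_KEY"

docker run --rm --gpus all \
  -e NGC_API_KEY \
  -v ~/.cache/nim:/opt/nim/.cache \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-70b-instruct:latest

# The container serves an OpenAI-compatible API on :8000; point the
# LangGraph agents at http://localhost:8000/v1 instead of the cloud API.
```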

Full write-up: https://mehmetgoekce.substack.com/p/i-connected-nvidias-multi-agent-shopping