Reliable Open Source LLM as a Service
Posted by pravictor@reddit | LocalLLaMA | 9 comments
Has anyone figured out a provider whose open source models (Kimi, Qwen, GLM, etc.) can be used reliably in production?
I have tested some well-known providers and they all suffer from high latency and poor uptime, rendering them mostly useless for production use.
I am using them for an agentic workflow in production, so reliability and low latency are very important for me.
Is there no provider that compares to Gemini / Claude in reliability but with open source models?
So far I've tested Together.ai and Fireworks, and Groq looks like it is dying.
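For concreteness, this is the kind of client-side fallback I end up writing around the latency/uptime problem; a rough sketch where every base URL, model name, key, and timeout is a placeholder, not a recommendation:

```python
# Rough sketch: per-request fallback across OpenAI-compatible providers.
# All base_url / model / key / timeout values below are placeholders --
# swap in whichever providers you actually use.
from openai import OpenAI

PROVIDERS = [
    {"base_url": "https://api.fireworks.ai/inference/v1",
     "model": "accounts/fireworks/models/qwen2p5-72b-instruct"},
    {"base_url": "https://api.together.xyz/v1",
     "model": "Qwen/Qwen2.5-72B-Instruct-Turbo"},
]

def chat(messages, timeout=20):
    last_err = None
    for p in PROVIDERS:
        client = OpenAI(base_url=p["base_url"], api_key="YOUR_KEY", timeout=timeout)
        try:
            resp = client.chat.completions.create(model=p["model"], messages=messages)
            return resp.choices[0].message.content
        except Exception as err:  # timeouts, 5xx, rate limits, ...
            last_err = err        # try the next provider in the list
    raise RuntimeError(f"all providers failed: {last_err}")

print(chat([{"role": "user", "content": "ok?"}]))
```

It works, but it's extra machinery I'd rather not maintain, hence the question.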
tecneeq@reddit
You stepped into the wrong neighborhood, cloud-kid. Around here it's all about local llamas.
Ill_Fun5415@reddit
One useful evaluation is to separate model quality from workflow quality. The model may be strong, but the surrounding context, review loop, and failure recovery usually decide whether it is usable day to day.
Formal-Exam-8767@reddit
On-prem, depending on your budget.
RandumbRedditor1000@reddit
Your pc
jikilan_@reddit
Can also be my pc
FriskyFennecFox@reddit
Novita is often praised for its zero data retention policy and is practically a veteran; it appeared roughly at the same time Together did. If you're looking for something backed by the big guys, then there's Cloudflare.
I recommend just opening OpenRouter, finding a popular model, and researching every available provider individually. Each has its pros and cons.
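If you do route through OpenRouter, a minimal sketch of pinning which upstream providers may serve a request; the model slug and provider names are placeholders, and the `provider` routing block is an OpenRouter-specific extension, so verify the exact fields against their docs:

```python
# Minimal sketch: query one open-weight model via OpenRouter while
# restricting which upstream providers can serve it. The "provider"
# routing object is an OpenRouter extension; field names and provider
# labels here are assumptions to check against their documentation.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "qwen/qwen-2.5-72b-instruct",   # placeholder model slug
        "messages": [{"role": "user", "content": "Say hello in one word."}],
        "provider": {
            "order": ["Fireworks", "Together"],  # try these providers first
            "allow_fallbacks": True,             # let others serve it if both fail
        },
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```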
ttkciar@reddit
If you want reliability, you really should host on-premises.
All commercial inference providers change models (or their quantization), token caps, and price tiers without forewarning, which makes them intrinsically unreliable.
Hosting on-premises is more expensive, but in addition to reliability it provides advantages like privacy, transparency, control, and future-proofing, so it's a trade-off.
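A minimal sketch of what that looks like in practice, assuming a vLLM instance on your own hardware exposing an OpenAI-compatible endpoint; the model name, host, and port are illustrative:

```python
# Minimal sketch, assuming an on-prem vLLM server started with something like:
#   vllm serve Qwen/Qwen2.5-72B-Instruct
# which exposes an OpenAI-compatible API at http://localhost:8000/v1.
# Model name, host, and port are illustrative; adjust to your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local server, no third-party dependency
    api_key="not-needed",                 # vLLM doesn't require a real key by default
)

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=[{"role": "user", "content": "Ping?"}],
)
print(resp.choices[0].message.content)
```

Your agent code stays the same; only the base URL changes, and nobody swaps the model or quantization out from under you.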
1beb@reddit
Opencode (Go, Zen) seems pretty stable so far, and they serve most of the frontier open source models.
FoxiPanda@reddit
/r/LocalLLaMA recommends consulting your local GPU server for reliable hosting of LLMs.