Current state of open-source?
Posted by DarkMatter007@reddit | LocalLLaMA | 10 comments
I’m trying to understand the current open-source LLM landscape beyond surface-level hype.
We all got used to the nerfed products from Claude/Gemini, so I really believe in open source as a solution.
I keep seeing models like GLM, Kimi, MiniMax, DeepSeek, Qwen, Mistral, etc., but it’s honestly hard to tell how they actually compare in practice.
A few things I’m confused about:
- Where does DeepSeek stand right now? It used to be everywhere, now feels less dominant
- GLM / Kimi / MiniMax: are these actually top-tier, or do they just benchmark well on very specific jobs?
- Are there any real benchmarks people trust (not cherry-picked blog posts)?
What do you guys actually use in production or serious projects?
DepartmentOk9720@reddit
DeepSeek is pretty good; it's cheap and can handle large volumes.
Enough_Big4191@reddit
honestly the landscape looks crowded, but in practice most teams converge on a small set of models based on their workload. deepseek had a moment because of cost/perf, but consistency and integration matter more over time, so people mix it with qwen or mistral depending on the task. a lot of the others look strong on benchmarks but feel narrow or less predictable in real flows. i'd trust your own evals over public benchmarks: run your actual tasks, long context, tool use, edge cases, and see where it breaks. most "top tier" models look similar until you hit those.
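if it helps, "run your own evals" can be as small as this sketch: a loop over your models and your actual tasks with a pass/fail grader per task. `call_model` here is a placeholder stub with canned answers purely for illustration; in practice you'd swap it for a call to your real inference endpoint.

```python
# Minimal eval-harness sketch: score each model on your own tasks
# instead of trusting public benchmarks.

def call_model(model: str, prompt: str) -> str:
    # Placeholder stub -- replace with an actual API/client call.
    canned = {"deepseek": "4", "qwen": "4", "mistral": "five"}
    return canned.get(model, "")

def run_evals(models, tasks):
    """tasks: list of (prompt, grader) pairs; grader returns True/False."""
    scores = {}
    for m in models:
        passed = sum(
            bool(grader(call_model(m, prompt))) for prompt, grader in tasks
        )
        scores[m] = passed / len(tasks)
    return scores

tasks = [
    ("What is 2 + 2? Answer with a single digit.",
     lambda out: out.strip() == "4"),
]
print(run_evals(["deepseek", "qwen", "mistral"], tasks))
# -> {'deepseek': 1.0, 'qwen': 1.0, 'mistral': 0.0}
```

the useful part is that the graders encode *your* edge cases (long context, tool use, formatting), so the scores reflect your workload, not a leaderboard's.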
fustercluck6000@reddit
I’ve been very impressed by Qwen3.5-27b, especially the Opus 4.6 distillations, which have worked extremely well in production. Open-weight models are advancing a WHOLE lot faster than the black-box ones, especially when you consider the difference in inference costs.
Few_Painter_5588@reddit
The open-weight models are about a year behind the current frontier models, so there's no open-weight model that can compete with Claude Opus 4.7 or even Claude Sonnet 4.6. Most open-weight models land between GPT 5.4 Mini and Claude Sonnet 4.6.
GLM, Kimi, and MiniMax are great models, but they're not frontier models. GLM 5.1 is probably the best open-weight model.
DeepSeek is behind by quite a bit, but apparently V4 is coming soon™. They updated the model that their API serves, though, and have been updating their GitHub repo, so a launch could be imminent.
It depends on the task: a lot of benchmarks have become saturated and everyone is benchmaxxing now. For coding, SWE-bench Pro is a good indicator, and for creative writing, EQ-Bench is a good one too.
BidWestern1056@reddit
before kimi-k2.5 i felt there were no serious viable alternatives to models from anthropic/gemini/openai, but now i almost exclusively use kimi, glm-5.1, and minimax-2.7 through ollama cloud with npcsh and incognide
https://github.com/npc-worldwide/npcsh
https://github.com/npc-worldwide/incognide
i've always designed my tools to work with small open-source models too, so even small qwen models (4b-10b) can do a decent portion of useful shell tasks. this capability at the lower threshold will continue to improve too. the future is ours, open and local!
Medium_Chemist_4032@reddit
I've trialled Qwen3 122B Q4 one day as a work-issued Claude Code replacement, mostly for comprehension of existing legacy code. It served me very well, and I can foresee a future where companies advise using local models as a Haiku (research agent) or Sonnet replacement.
Mediocre_Doctor4712@reddit
How is it with tool calling? Sometimes these open-source models just don't get that right.
Medium_Chemist_4032@reddit
No issues. vLLM uses the same chat templates the model's author provides, and Jinja templates are native to Python. I've had a few templating-related issues in llama.cpp (with MiniMax 2.7), but those typically do get sorted out eventually. Ollama also reimplements the Jinja templating engine, and small details (like a missing newline) tend to slip things up. Once those get sorted out, tool calling works across the whole context size.
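those template bugs usually show up as tool-call output that no longer parses. a quick sanity check is just trying to parse it yourself — here's a sketch assuming the common OpenAI-style shape (a JSON object with `name` and `arguments`, where `arguments` is often itself a JSON-encoded string); the `get_weather` tool is made up for illustration:

```python
import json

def parse_tool_call(raw: str):
    """Parse a model's tool-call output; returns (name, args) or None.

    Template bugs (stray newlines, truncated output, unclosed tags)
    usually surface here as JSON that fails to parse.
    """
    try:
        call = json.loads(raw)
        args = call["arguments"]
        if isinstance(args, str):          # arguments as a JSON-encoded string
            args = json.loads(args)
        return call["name"], args
    except (json.JSONDecodeError, KeyError, TypeError):
        return None

# well-formed call
ok = parse_tool_call('{"name": "get_weather", "arguments": "{\\"city\\": \\"Berlin\\"}"}')
print(ok)   # -> ('get_weather', {'city': 'Berlin'})

# truncated output from a broken template fails cleanly
bad = parse_tool_call('{"name": "get_weather", "arguments": "{\\"city\\": ')
print(bad)  # -> None
```

running a handful of these against each backend (vLLM, llama.cpp, Ollama) with the same model is a cheap way to tell whether a tool-calling failure is the model or the template.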
jacek2023@reddit
Do you ask about:
- models usable locally (on local setup)?
- models usable in cloud (same way as Claude/Gemini but cheaper)?
- hype (benchmarks and clickbaits)?
Because the answers will be different.
MengerianMango@reddit
deepseek is due for a release soon. currently they're a gen or more behind minimax/glm/kimi. i listed those in increasing order of size and ability. they're all pretty good. glm/kimi are very usable for swe work. minimax feels a bit amateurish to me, but it can sorta do stuff.