Meanwhileee
Posted by Comfortable_Eye_7736@reddit | LocalLLaMA | View on Reddit | 13 comments
Meanwhile, people are debating frontier models like Claude and such.
And here I am just using MiniMax without any issues whatsoever. With seven years of hands-on programming and coding experience, I don't mind hand-holding the agent from time to time if it can't solve a problem.
I'm at the point where I know that no model, not even Opus, can basically one-shot everything you ask of it; that's just marketing.
You just need to accept that you really need coding expertise, or at least to know how to make your projects work.
As for this debate over which model is better: bro, if you have a model that can one-shot anything but takes an hour to do a simple task, it's not worth it. A fast, efficient model that performs well, not perfectly but well, is the better option.
Bottom line, it's simple: Kimi K2.6, GLM 5.1, MiniMax M2.7, Qwen. If it's good enough to perform agentic coding, then it's a good model; no comparison needed. You just need to guide it, because if you can't, then it's not a model issue, it's a skill issue.
rudidit09@reddit
what IDE / CLI app are you using with them?
Comfortable_Eye_7736@reddit (OP)
Hermes. I'm using Hermes Agent, and I think it's using Pi. But yeah, it's efficient.
o0genesis0o@reddit
I pay for a MiniMax subscription as a way to show appreciation for the work they've done, because the model is decent, and because Qwen doesn't even sell a subscription plan. But MiniMax has been getting on my nerves in the last week or so. They regularly throw overloaded errors and ask me to upgrade to their Plus plan, but I'm already on Plus. Hopefully they just have a compute problem rather than enshittification.
Other than that, yeah, this model gets things done and is quite okay to work with. I usually let it come up with a plan for me to review and then work from there. But most of the time I have a very clear idea of where and how to implement something, and the model can just take it from there.
GLM 5.1 is also good. But Z AI shipped a random regression on the 4.7 I was using for my 24/7 agent a few weeks ago that completely broke it, so I'm a bit skeptical of them.
In an ideal world, I would get a pair of RTX 6000s and solar panels and run M2.7 locally with full context. That's literally all the AI capability I need right now.
Enthu-Cutlet-1337@reddit
context packing is the actual variable nobody benchmarks. 32k well-managed beats 128k used sloppily. minimax m2.7 and qwen3-30b both Q4_K_M at 24GB hold coherence across long agentic runs. that's the real differentiator, not the leaderboard position.
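The "32k well-managed beats 128k used sloppily" point can be sketched concretely. This is a hypothetical illustration of context packing, not any particular harness's implementation: keep the agent's message history under a fixed token budget by dropping the oldest turns while always preserving the system prompt. Token counts are approximated by word count here; a real agent would use the model's actual tokenizer.

```python
def pack_context(messages, budget_tokens):
    """Return the most recent messages that fit within budget_tokens.

    messages: list of {"role": str, "content": str}; messages[0] is
    assumed to be the system prompt and is always kept.
    """
    def approx_tokens(msg):
        # Crude stand-in for a real tokenizer count.
        return len(msg["content"].split())

    system, rest = messages[0], messages[1:]
    remaining = budget_tokens - approx_tokens(system)

    kept = []
    for msg in reversed(rest):  # walk newest -> oldest
        cost = approx_tokens(msg)
        if cost > remaining:
            break  # oldest turns fall off first
        kept.append(msg)
        remaining -= cost

    return [system] + list(reversed(kept))
```

The design choice this sketches: a well-managed small window keeps the system prompt and the freshest turns coherent, rather than letting a huge window silt up with stale tool output.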
blackhawk00001@reddit
I’ve been enjoying Qwen 3.6 27B and 35B A3B as the backend for the Claude Code CLI. They’re a far cry from Opus, but I can work around any shortcomings.
mr_zerolith@reddit
The more you know the subject, the less impressive AI is :)
Kahvana@reddit
Yeah, Qwen3.5 has genuinely been amazing, and Qwen3.6 blows my mind. I’ve only been using LLMs since March 2025. Qwen3 really wasn’t at this level; I didn’t dare dream we would get this quality this soon.
I wonder how many people had a real software engineering degree (or similar) and public/professional experience under their belt before using LLMs.
Even Qwen3.5-35B-A3B can handle complex tasks, as long as you only give it bite-sized pieces (okay, rewrite this loop to use ParallelFor) one step at a time.
In the end, you are the “plan” function: it can help break down the problem, but you must still vet and steer it. Knowing when something has gone wrong only comes from prior experience without the tool.
I’ve found the cloud models to be a much bigger pain than local ones. GPT Codex is obsessed with smoke tests, Claude does the least work possible to its own detriment, and neither wants to admit when it can’t do something. Qwen3.5 admits it, which helps me take a step back and think about how to approach the issue differently instead of staying stuck.
Long_comment_san@reddit
I'm curious what the fuck these datacenters are being built for.
ambient_temp_xeno@reddit
I think that, apart from the 'testing' aspect of it, the whole 'one-shot' thing was because 4k (or 8k if you were lucky) was not a lot of context to work with.
megadonkeyx@reddit
They all have their strengths; I've seen MiniMax outsmart GPT 5.4 / Codex with its "I'm going one level deeper."
I've also seen MiniMax go off the rails. Yes, it's all about guidance.
Rule #1: logs, logs, and more logs. Share the logs with the LLM.
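The "share the logs" workflow above can be sketched in a few lines. This is an illustrative helper, not part of any specific agent harness: run the project's command, capture everything it prints, and keep the tail of that output ready to paste into the agent's next prompt. The function name and the demo command are assumptions.

```python
import subprocess
import sys

def run_and_capture(cmd, tail_lines=50):
    """Run cmd, return (exit code, last tail_lines lines of combined output)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    combined = (result.stdout + result.stderr).splitlines()
    return result.returncode, "\n".join(combined[-tail_lines:])

# Demo: a failing command whose output we want to show the LLM.
code, log_tail = run_and_capture(
    [sys.executable, "-c", "print('hello'); raise SystemExit(3)"]
)
# log_tail now holds what you'd paste into the agent's next turn,
# along with the exit code, instead of just saying "it doesn't work".
```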
Comfortable_Eye_7736@reddit (OP)
Know your tech and the rest will follow. If you know what the agent is doing, then when it fails to complete a task due to unforeseen circumstances or a problematic approach, you can easily guide it and your problem is solved. Don't let the agent do everything for you; sometimes your approach is better than the agent's. Let it do the thinking, but if it doesn't think better than you, do the thinking yourself.
SourceCodeplz@reddit
Everything changed when they started training for tool calls. Now even small 4B-9B models can do agentic work.
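What "trained for tool calls" buys you is roughly this: instead of free-form prose, the model emits a structured call like `{"name": "read_file", "arguments": {...}}` that the harness can dispatch mechanically. The tool registry and JSON shape below are a hedged sketch; real APIs (OpenAI-style tool calling, llama.cpp, etc.) differ in detail.

```python
import json

# Hypothetical tool registry; a real agent would expose file reads,
# shell commands, search, and so on.
TOOLS = {
    "add": lambda args: args["a"] + args["b"],
    "upper": lambda args: args["text"].upper(),
}

def dispatch(tool_call_json):
    """Execute one model-emitted tool call and return its result."""
    call = json.loads(tool_call_json)
    tool = TOOLS[call["name"]]
    return tool(call["arguments"])

# A model trained on tool-call formats emits something like this:
print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # prints 5
```

Because the format is learned during training, even small models emit it reliably enough that the loop of call, result, next call can run unattended.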
LoveMind_AI@reddit
I'm absolutely with you. Honestly, I'm entirely grateful for Claude's stumble. Right now, Kimi K2.6 is absolutely proving itself. The port to Kimi Code was painless, and while there's some stuff I miss, what I *don't* miss was a paternalistic frontier LLM making decisions on my behalf and acting lazier and lazier every day. K2.6 is a killer model and while I'm clearly not running it locally, the knowledge that I could if I wanted to make the investment leads to a lot of peace of mind. GLM-5.1 has been better across different harnesses than Kimi, which seems to really want to be nestled into specific harnesses. The vibe of Kimi to me is basically perfect. Personable but not sycophantic, able to read between the lines but cautious, and won't just go on a wild riff and do a bunch of stuff I didn't ask for while also NOT doing the stuff I did ask for.
I think we're finally at the level where there's essentially parity between the frontier and open-source models, and you're right that we're just debating the nuances of the trade-offs. Even if you're not hosting locally, the financial trade-off between Kimi and Claude is... gigantic, in addition to all the very real areas where I prefer Kimi. And honestly, Qwen 3.6 27B is just incredible, as people have been going on and on about. When a local model is *that* strong, it really should put the fear of Gandalf into the big labs.
Everything that's been happening with Claude, from the DoW blacklist through the enshittification, really makes the point clear: open-source AI is a logistical and moral necessity.