I just realised how good GLM 5 is
Posted by CrimsonShikabane@reddit | LocalLLaMA | View on Reddit | 148 comments
This is crazy. As a heavy Claude code user, who has used over 12 billion tokens in the last few months, and never tried local coding, I finally decided to try OpenCode with the Zen plan and GLM 5.
Initially tried Kimi K2.5 but it was not good at all.
Did a test to see how far 1-2 prompts could get me with GLM 5 versus the same prompt in Claude Code.
First task, a simple dashboard inventory tracker. About equal although Claude code with opus 4.6 came out ahead.
Then I ran a harder task. Real time chat application with web socket.
Much to my surprise, GLM comes out ahead. Claude code first shot doesn’t even have working streaming. Requires a page refresh to see messages.
GLM scores way higher on my criteria.
Wrote detailed feedback to Claude and GLM on what to fix.
GLM still comes out better after the changes.
Am I tripping here or what? GLM better than Claude code on any task is crazy.
Does anyone here have some difficult coding tasks that can showcase the real gap between these two models or is GLM 5 just that good.
JohnSnowHenry@reddit
For me, GLM is almost useless for Unreal Engine, but even Claude Sonnet handles everything I need nicely :)
maladiusdev@reddit
Are you using it for C++ or trying to get it to control the editor? I'm using Codex and it's great for C++ but it burns through quota too fast so I'm looking for a backup option.
Powerful_Froyo1727@reddit
I've built an adapter for Claude Code that uses the leaked code and integrates glm5 as the core brain — pretty creative, if I do say so myself! 😂
NewtMurky@reddit
If only there were a good GLM-5 provider with a coding plan…
brabdi@reddit
ollama cloud pro has been pretty good in my experience
AcidicAttorney@reddit
Alibaba Cloud's pretty good.
look@reddit
I’d not be shocked if Alibaba is running 2-bit quants on their coding plan models. Personally, I found it a complete waste of my $5.
r3mp3y3k@reddit
I'm currently subscribed to the Alibaba Coding Plan. I don't know if they're somehow tuning down the models, but I found GLM-5 is still better than any other model they provide in their coding plan.
look@reddit
Yeah, GLM-5 held up the best in my tests on the Alibaba plan. It didn’t go full psychotic like the other models did, but some head-to-head comparisons I did against other providers’ GLM-5s still showed significant degradation in its thinking output. It was like theirs was drunk or stoned, repeatedly losing its train of thought and eventually coming back to it.
estimated1@reddit
Just to give another option: we (Neuralwatt) just started offering our hosted inference. We've been focused more on an "energy pricing" model but feel pretty confident about the throughput of the models we're hosting. Our base subscription is $20 and we don't really have rate limits, just focused on energy consumption. I'd be happy to give some free credits in exchange for some feedback if there is interest. Please DM me! (https://portal.neuralwatt.com).
Also, we serve GLM-5 with solid throughput (IMO)
We also have a virtual endpoint (GLM-5-Fast) that turns off reasoning for fast agentic scenarios.
NewtMurky@reddit
Do I understand correctly that you don't support caching? At least, I don't see it mentioned on the pricing page.
estimated1@reddit
We do support caching; we need to make that more clear. With a cache hit the energy cost is 0, so you will just see much reduced energy cost of those requests. We expose the caching data in the result, but we don't put it in the UI on the dashboard -- I'll file a bug to do this and provide more info about our caching support.
GreenGreasyGreasels@reddit
This is interesting. How do you make money if users are paying only for electricity? What about hosting charges and the servers? How do you turn a profit and stay sustainable long term?
Is the lack of a discount on cached input etc. how you make your money?
The playground tests seem fast enough. Do you serve quantized or full-fat models (16-bit for GLM-5, 4-bit for K2.5, and so on)?
estimated1@reddit
We bake infra costs into pricing. The difference is: inference gets cheaper at scale (batching, higher GPU utilization → lower energy/request).
Instead of keeping that as margin, we pass it through. So over time you get more tokens per kWh.
That’s the core idea behind energy pricing. This is all built upon our core tech which provides increased energy efficiency for GPUs/inference. We will license that to other hyperscalers/neoclouds as well to make inference more energy efficient.
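The claim above (batching drives down energy per request) can be sketched with a toy model. The wattage figures below are illustrative placeholders, not Neuralwatt's actual numbers:

```python
# Hypothetical sketch: fixed (idle) GPU power is shared across a batch,
# so per-request energy falls as utilization rises.

def energy_per_request(batch_size,
                       idle_watts=100.0,
                       per_stream_watts=25.0,
                       seconds_per_request=2.0):
    """Joules consumed per request when batch_size requests share a GPU."""
    total_watts = idle_watts + per_stream_watts * batch_size
    return total_watts * seconds_per_request / batch_size

solo = energy_per_request(batch_size=1)   # 250 J per request
busy = energy_per_request(batch_size=8)   # 75 J per request
```

Under this (assumed) cost structure, an 8-way batch cuts per-request energy by more than 3x, which is the headroom an energy-priced plan can pass through as "more tokens per kWh."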
Superb_Onion8227@reddit
Why not do energy trading at the same time? You could save ppl's money by buying cheap energy
estimated1@reddit
oh sorry, for the other questions:
For GLM-5 it's FP8, and for K2.5 it's INT4. We don't do any of our own quantizations (yet).
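For readers unfamiliar with what 4-bit (INT4) quantization trades away, here is a toy illustration: weights get mapped onto 16 levels and reconstructed, sacrificing precision for memory. Real schemes (per-group scales, etc.) are more sophisticated; this is only the core idea:

```python
# Toy 4-bit quantization: snap floats to 16 evenly spaced levels.

def quantize_int4(weights):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0          # 16 levels: 0..15
    levels = [round((w - lo) / scale) for w in weights]
    return [lo + q * scale for q in levels]

w = [0.01, -0.73, 0.42, 0.99, -0.15]
w4 = quantize_int4(w)
# Each reconstructed weight is within half a quantization step
# of the original.
max_err = max(abs(a - b) for a, b in zip(w, w4))
```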
TheMisterPirate@reddit
this is interesting to me. so it's similar to openrouter but billing is by energy usage rather than tokens? it's hard for me to understand how much usage I'd actually get for $20/mo or even if I pay per kwh. I think that would be a good thing to add to your website. Like, if I use this model for X hours, how much would it actually cost, both in kwh and $, since some models are more efficient right?
I'd be interested in trying it out if the rates are good.
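The calculator being asked for could look something like this. All numbers (throughput, wattage, electricity price) are made-up placeholders for illustration:

```python
# Hypothetical session-cost estimator: given a model's throughput and
# energy draw, estimate kWh and dollars for a coding session.

def session_cost(hours, tokens_per_second, watts, usd_per_kwh=0.30):
    seconds = hours * 3600
    kwh = watts * seconds / 3.6e6          # joules -> kWh
    return {
        "tokens": int(tokens_per_second * seconds),
        "kwh": round(kwh, 3),
        "usd": round(kwh * usd_per_kwh, 2),
    }

# A 2-hour session on a model serving 80 tok/s from a 700 W GPU slice:
est = session_cost(hours=2, tokens_per_second=80, watts=700)
```

This also makes the commenter's point concrete: a more efficient model draws fewer watts per token, so the same session costs less under energy pricing.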
estimated1@reddit
Thanks for the feedback u/TheMisterPirate . I agree having some sort of calculator would be a good thing to help people understand. I think our method *does* enable much more inference per $ than other methods but we have work to do to present this more clearly. I'd be happy to grant some credits if you created an account in exchange for more feedback (what we're really eager for at this stage).
TheMisterPirate@reddit
sure, I'll DM you.
rudkws@reddit
I have an Ollama subscription. I am using the cloud version - it's good enough. Quite often, it finds a better solution than Claude, which is interesting.
davernow@reddit
Can’t tell if sarcastic. Z.ai coder is the best $27 I spend a month. Can easily put a billion tokens through it
17hoehbr@reddit
I bought a year of the lite plan during the black Friday sale but ever since GLM 5 came out it feels like they really dumbed down GLM 4.7, and of course GLM 5 is paywalled behind the pro plan.
imonlysmarterthanyou@reddit
Their GLM pro isn’t that much better oddly, but much slower.
AldoEliacim@reddit
Try OpenCode Go, it's providing a decent amount of GLM-5 usage, I've been looping it around to write tests for my Opus code
harrro@reddit
I keep hearing Opencode Zen/Go's GLM is heavily quantized
AldoEliacim@reddit
Not really. I haven't tested another provider to compare it, but it does its job.
Maybe it is quantized, but their plan is really cheap at $10 and they have an offer right now for only $5
hawseepoo@reddit
I just use it on Fireworks AI, pretty cheap per 1M tokens
vramkickedin@reddit
GLM5 is stupid underrated. It hallucinates less than the top dogs (chatgpt, claude, etc). Very good at following small details. In some cases, it will tell you "no" and it will explain why (unless you have a logical use case or explanation).
Its very good at paraphrasing content, spotting SEO hiccups and of course, very tight with coding.
ProfessionalSpend589@reddit
I've had similar feelings for smaller models like MiniMax M2.5 in Q6 (unsloth) and Qwen 3 235b in similar quant. People prized MiniMax, but Qwen just worked for me (and was better for lyrics and songs).
lookwatchlistenplay@reddit
That's great, ProfessionalSpend.
twack3r@reddit
Please leave this sub.
lookwatchlistenplay@reddit
And your problem with the words, "That's great", is...
Emotional-Baker-490@reddit
Thats great, lookwatchlistenplay.
lebed2045@reddit
while it's cool if true, i think people should stop measuring the quality of ai coders by asking them to build template-like projects from scratch. it's much more realistic to ask them to fix some bugs in existing big code bases.
EffectiveCeilingFan@reddit
How in the world do you use 12B tokens?? In an entire year, I doubt I will reach 1B, and I use vibe coding daily.
In order to use 12B tokens in six months of work, you’d need to be using 771 tokens per second every single second of the day, including at night. There’s no way.
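The arithmetic above checks out as back-of-the-envelope math (assuming 30-day months):

```python
# Sanity check: 12B tokens over ~6 months of around-the-clock usage.
months = 6
seconds = months * 30 * 24 * 3600     # ~15.55M seconds
tokens = 12_000_000_000
rate = tokens / seconds               # tokens per second, nonstop
# ~771 tok/s every second including nights, which is why replies below
# point at cached input tokens and multi-agent setups as the explanation.
```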
temperature_5@reddit
This is why all the coding plans end up getting capped and rate limited. People just abuse the hell out of them, running multiple instances with multiple sub-agents simultaneously, or setting them up to just constantly poll github for issues, or even backdooring them for production inference.
BannedGoNext@reddit
If a coding plan offers X usage and you use X usage how is that abuse? The problem is that the plans are trying to utilize the phone system/modem internet provider type model where they expect most people to use very little and the heavy users to be subsidized, but the few people they can get to subscribe at all are voracious.
nakedspirax@reddit
And then the project gets sidelined haha
emprahsFury@reddit
Abuse? What? Inference time compute/scaling/buzzword was sold for years as the solution. These companies sold "tokens are cheap" for actual years. Anthropic & OpenAI only have themselves to blame. We only have them to blame, cause we're using their best advice.
Simple_Split5074@reddit
Most of that will be cached input tokens which can get to a million in a minute or two with tool calls and half filled context without even trying hard.
rosstafarien@reddit
Cached input tokens shouldn't be counted in your usage.
Simple_Split5074@reddit
According to whom? I think *all* providers do count them.
EffectiveCeilingFan@reddit
Eh. That’s still RAM that you’re taking up. I think the 1/10th cost that most providers do for cached input is fairly reasonable. What’s ridiculous are the providers that don’t offer any discount and just cache transparently, taking all the cost savings for themselves, or the ones that make you pay extra to use the cache (i.e. Anthropic).
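The pricing difference being debated can be made concrete with a small sketch. The $3/M input price and 1/10th cache discount are illustrative, not any specific provider's rates:

```python
# How a cached-input discount changes an agentic coding bill.

def bill_usd(input_tokens, cached_tokens,
             usd_per_m=3.0, cache_discount=0.1):
    fresh = input_tokens - cached_tokens
    return (fresh * usd_per_m
            + cached_tokens * usd_per_m * cache_discount) / 1e6

# 100M input tokens where 90% are cache hits (typical for long
# agentic sessions that resend the same context every turn):
with_discount = bill_usd(100_000_000, 90_000_000)   # $57
no_discount = bill_usd(100_000_000, 0)              # $300
```

Under these assumed rates, the cache discount is the difference between $57 and $300 for the same session, which is why transparent caching with no discount is so lucrative for a provider.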
lemondrops9@reddit
I've only done local so that seems crazy to me too. But many have said that they do.
ConSemaforos@reddit
I'm a hobbyist at home and can burn through 5 million easily within a day. I imagine someone doing it full time could blow through 12B.
EffectiveCeilingFan@reddit
Let’s say they use GLM-5 full-time from the moment it’s released right up until now. That’s 35 days. So, roughly 340M tokens per day. That’s about 70x what you’re burning through, every single day, with no breaks at all. OP has probably vibe-coded seven SaaS startups by now.
IrisColt@reddit
heh
ConSemaforos@reddit
Heck yeah! I love it. It's opened up so much. I've built 6 websites for local businesses and am working on two apps for local nonprofits. I used to spend an hour just reading docs trying to get Firebase set up. Now I can literally speak a command, and it's done for me.
mr_Owner@reddit
Ease with subagents and multi sessions
vinigrae@reddit
Agentic systems
florinandrei@reddit
Human-powered Ralph loop.
arman-d0e@reddit
I alone am responsible for 2 Billion tokens of usage on hunter alpha lol
klawisnotwashed@reddit
About a year ago i was hitting 2b tokens in CC coding my ass off 15+ hours a day with multiple REPLs open, about 99% of my tokens were input cached… no idea how bro is doing 12b
Accomplished-Bird829@reddit
i have used GLM5 from day 1. i used it to code and do all sorts of stuff, and it's excellent. thanks to z.ai i have more time to enjoy my small tools
qubridInc@reddit
GLM-5 is genuinely strong, especially for structured coding + execution tasks. It can sometimes outperform Claude on specific implementations.
But on complex systems, edge cases, and long-term reasoning, Claude still tends to be more consistent
Exciting_Garden2535@reddit
I have a feeling that Opus 4.6 has become stupider than it was initially. Or maybe not exactly stupider, but more lazy. It skips requirements, does more careless work, and even argues: when I asked it to fix its own error, it spent time proving that the error was made in a previous session, not during this feature implementation.
wouldacouldashoulda@reddit
I wonder if it might be related to the larger context window? Like it cant deal with it.
m0j0m0j@reddit
This happens to me when the context is short
salomo926@reddit
Like a real programmer
anon377362@reddit
There was a Reddit post saying that the Opus 4.6 system prompt was updated to tell it to not think things through as much or something like that (keep answers brief).
So I don’t think it was a change to the model itself, just the prompt.
Personally I think if the system prompt is updated then it should count as a new minor/patch version bump as it can really affect the performance.
So Opus:
4.6.0 -> great
4.6.1 -> not as great.
Easy-Unit2087@reddit
I thought it was just me. Maybe (just guessing here) due to capacity problems they nerfed Opus 4.6 a bit. Codex 5.4 is doing better rn.
Fast-Satisfaction482@reddit
Since yesterday, 5.4 started annoying me with not fulfilling the requirements and then "next I could do this or that to actually make it work. Shall I?". Super annoying stuff.
mrtie007@reddit
i find i have to interrupt its thinking more lately because it's inventing/going on silly sidequests
Goldkoron@reddit
I wouldn't put it past Anthropic to intentionally nerf opus during periods where they think they are being distill attacked.
NoahFect@reddit
They are definitely having capacity issues at the moment. I'm getting a lot of "Overloaded" and unspecified "Internal server errors" in Claude CLI right now.
It also seems less... relentless than usual, somehow. I've had to prompt it to go back and finish subtasks it gave up on.
-dysangel-@reddit
Yeah that was my experience when I tried it again the other day. It kept asking "want me to do this?" instead of just doing things like GLM does. It makes sense that if they're having capacity issues that they'd train it to stop more often
aeroumbria@reddit
I'm not familiar with large model deployments, but I wonder if you can do something like increasing KV cache match tolerance to reduce server load, so not-so-matched prefixes will end up using the same cache? I can see how such a saving measure could lead to strange, unpredictable behaviours.
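The speculation above can be sketched. Production servers (vLLM and the like) match cached prefixes by exact block hash; the snippet below shows what a hypothetical "tolerant" prefix match would do, and why reusing state from a slightly different prompt could produce the strange behaviour the thread describes:

```python
# Speculative sketch: a prefix cache that accepts a "close enough"
# prefix instead of an exact match. Not how real servers work today.

def find_cache(prompt, cache, tolerance=0):
    """Return (prefix, state) for the longest cached prefix whose
    character mismatches against the prompt are within tolerance."""
    best = None
    for prefix, state in cache.items():
        if len(prefix) > len(prompt):
            continue
        mismatches = sum(a != b for a, b in zip(prompt, prefix))
        if mismatches <= tolerance:
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, state)
    return best

cache = {"You are a helpful assistant.": "kv-state-1"}
# Exact prefix: hits even with zero tolerance.
exact = find_cache("You are a helpful assistant. Hi!", cache, tolerance=0)
# One-character difference: only hits if tolerance is loosened,
# at which point the model silently continues from the wrong state.
loose = find_cache("You are a helpXul assistant. Hi!", cache, tolerance=2)
```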
Easy-Unit2087@reddit
My flow has audits from Codex 5.4, Opus 4.6 and Qwen 397b (local) on each other's work after each phase and Codex is finding more things to fix in Opus than usual. Also noticed a few API unreachable errors with Claude today.
ansmo@reddit
I knew it couldn't just be me! On medium and low effort especially, it takes those instructions literally. On max effort it still seems to be getting the job done, just at thrice the price. If GLM 5 was hosted at a usable speed, I'd definitely consider switching. Though now that I'm getting used to the 1M context window, and spending less than a fifth of the previous time compacting and summarizing, it would be pretty hard to go back. My only hope is that the degradation in Opus 4.6 signals the imminent release of new models.
nekmatu@reddit
I’ve felt this too. It was really good and then got … I like your word lazier. It was super sharp at first.
Disposable110@reddit
Same experience, it oneshot an entire roguelike game before and now it can barely implement two features without me handholding it.
Infamous-Crew1710@reddit
Lol at the arguing
suddenlypandabear@reddit
I wonder if they have multiple variants of the opus models at different sizes, to scale down load at different times rather than just rate limit.
Ell2509@reddit
I noticed that too.
Dazzling_Focus_6993@reddit
That means Anthropic will release Opus 4.7 soon (rebranded Opus 4.5)
oodelay@reddit
Token greedy?
Vlyn@reddit
Sorry, but what does this have to do with LocalLLaMA?
You didn't run anything locally, you just switched to a different provider/model.
Spectrum1523@reddit
The model they switched to can be run locally so
Vlyn@reddit
Can you run a 744B model locally? (:
Spectrum1523@reddit
Sure - although it'll be quanted to 1.8 bits or very, very slow
Vlyn@reddit
So 180+ GB of memory for a lobotomized model that runs as fast as a snail, gz.
These simply aren't "local" models, except you're a company with an expensive server rack.
DedsPhil@reddit
If it's open source, it's fair game. Even Qwen3.5 27B or 35B can't run well on 24GB VRAM if you need to code anything.
Electroboots@reddit
I mean, it's not technically Llama either. And there are plenty of people who can't run the larger Llamas, so by this logic this reddit should only be about Llama 3.2 1B and its various finetunes to be really 100% authentic to the name.
But that would make for a lame sub.
Vlyn@reddit
Even with just 16 GB VRAM at the moment I'm running a 24B Q4_K_M model fully on my GPU.
So limiting it down to 1B is a bit too much (:
OmarBessa@reddit
It's really good. I'm generating around 1B tokens per month and it really feels very close to opus 4.5.
The current opus is a bit nerfed these days.
RevolutionaryLime758@reddit
Lmfao you think a cloud model is local
Spectrum1523@reddit
you can run it locally, ez
RevolutionaryLime758@reddit
Btw Reddit mod I can see all your posts even if you hide them and I’m gonna make fun of you in a minute
RevolutionaryLime758@reddit
You think OpenCode Zen is local? Are you an idiot?
divide0verfl0w@reddit
GLM 5 is very good but now try Minimax 2.5 and have your mind explode.
Same bug. Same prompt. Claude Code w Opus 4.6 took 32 minutes. OpenCode w Minimax 2.5 took 8 mins.
I realized I had accidentally let Minimax 2.5 plan before execute and Claude was not in plan mode. Felt like apples ≠ oranges. So created another worktree, started Claude Code w Opus 4.6 in plan mode. Unfortunately, Claude went down a path for over 30 mins and never solved the issue.
I compared the code quality of the solutions produced. Minimax 2.5 used the correct React Router API to fix the issue. Claude Code switched to setting window.location. Something I would do back when I was junior and too stubborn to learn the right paradigm for the framework.
evia89@reddit
Do you use skills like systematic debug and provide sample logs?
divide0verfl0w@reddit
I provide the npm command to run end-to-end playwright tests, which spits out logs.
I don’t use any skills. And it doesn’t matter because Minimax didn’t have skills either.
adrazzer@reddit
yeah I am really impressed with GLM-5 myself, have been running it on Ollama cloud
novalounge@reddit
I've been running one of the Unsloth quants (UD-Q3_K_XL) at home with 128k, and it's been a great general purpose home AI model.
MR_Weiner@reddit
Am I missing something, or does that quant need like 350gb vram? And you’re running it locally on what hardware?
twack3r@reddit
Not who you are asking but I have the following system and can run the same quant. Does it fly? No. Can I work with it, very much so.
TR 7955WX, 256GiB DDR5-6400 (8 channels), 1x RTX 6000, 1x 5090, and 3 pairs of 3090s (NVLinked)
MR_Weiner@reddit
Ha, man I’m such a noob in here that I forget the crazy setups some of y’all have!
relmny@reddit
UD-Q2_K_XL can be run on a single 32gb vram GPU + 128gb ram, if you don't mind less than 1.6t/s...
sshwifty@reddit
Oof, that is some "It's compiling" speed
novalounge@reddit
Sorry - M3 Ultra 512. It’s fast, still have 100gb free.
getpodapp@reddit
I’ve been using Kimi K2.5 because it’s a vision model and I like to just send screenshots to my ai tools. If GLM5 is that much better then I’ll have to take a look 🤔
IrisColt@reddit
u-use c-case? genuinely intrigued...
Dany0@reddit
12 bil tokens? What have you shipped?
lookwatchlistenplay@reddit
Billionaire status.
Dany0@reddit
Another plaque from openai
Risen_from_ash@reddit
We use 10x tokens! 10x developers use 10x tokens. …ship? Yes, we shipped 10x tokens.
LoaderD@reddit
$200 to Anthropic
lookwatchlistenplay@reddit
Every 30 days or so. On repeat. Relentlessly.
segmond@reddit
GLM-5 is good. I had a coding task that Kimi K2.5, Qwen3.5-397B-Q6, Qwen3CoderNext-Q8 and DeepSeekv3.2-Q6 all failed at. As in, they generated code that was heading towards the right idea but was all bugged, and none could run correctly. GLM5 at Q4 is the only model that generated code that works. Not perfect, but it works and is a good foundation to build on. I'm running locally and did multiple passes. So impressed by it that I'm now downloading Q5 and hope to upgrade my system soon to be able to run Q6.
relmny@reddit
Have you tried deepseek-v3.1-terminus by any chance? I'm still trying to figure out whether v3.2 is actually better or not (I only get about 1t/s, so I can't test them much...), and I have my doubts
ihaag@reddit
What hardware are you using ?
Blackvz@reddit
You could also check out minimax m2.5
Also a good open source model.
I would love to hear your opinion in comparison to glm5
R_Duncan@reddit
This highlights something we already know or suspect: under the hood, every model served can be quantized/changed without users being notified. Is there a new version? A distilled model? A 3-bit quantized version?
Users don't know, and the worst part is that it can happen from one day to the next, so you start a project with one model, midway it becomes dumber, and your project goes....
Conclusion: you can't trust an online service until this gets addressed and a checksum of the model being served is published, together with quantization and other parameters.
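The transparency being asked for could be as simple as a published, pinnable manifest. The field names below are hypothetical; no provider exposes this today:

```python
# Sketch: provider publishes a manifest (weights identity, quantization,
# key serving parameters); clients pin its checksum and notice swaps.
import hashlib
import json

def manifest_checksum(manifest):
    # Canonical JSON so the hash is stable regardless of key order.
    canonical = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

served = {"model": "GLM-5", "quant": "FP8", "kv_cache_dtype": "fp16"}
pinned = manifest_checksum(served)

# If the provider silently swaps in a 3-bit quant mid-project,
# the checksum no longer matches what the client pinned:
swapped = dict(served, quant="Q3")
```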
Vozer_bros@reddit
I spent several B last year, and I would say GLM-5 is incredible, but sometimes the quality just drops significantly due to their lack of hardware, which I do understand.
Try the GLM5-turbo one bro, that one is solid.
ihaag@reddit
I find GLM really good, but Claude does make prettier CSS than GLM. Still, GLM is my go-to.
4xi0m4@reddit
I think the most useful takeaway here is that this sounds like a workload fit issue more than a clean global ranking.
If the task is concrete, tool heavy, and the feedback loop is short, GLM 5 can absolutely overperform expectations. Claude still feels stronger to me when the task gets messy, under-specified, or needs better judgment during refactors.
So your result does not sound crazy. It sounds like your benchmark is rewarding a type of work that GLM handles unusually well.
Emergency-Pick5679@reddit
How do you guys run this models ? OpenCode ? any way to access all the latest sota models ?
Briskfall@reddit
It actually surprised me as well. Thought that it was going to be a dud due to how much I've heard that it's "distilled."
I have a private set of questions for historical facts with "misleading" formats that usual open-source models fail in but SOTA ones don't.
Smart models would actually not get swayed by the template, while dumb ones wouldn't even bother to do the search and would capitulate.
GLM-5 actually was one of the rare few that passed it during a test with LMArena. (and of course, Opus 4.6 Thinking and Gemini 3.1 Pro did too)
(but some older SOTA models like Gemini 2.5 didn't though... nor did the latest versions of Grok or Mistral.)
JimJamieJames@reddit
How did Qwen3.5 do?
randomlyme@reddit
This isn’t how you perform spec-driven development testing
po_stulate@reddit
Wasn't GLM 5 focused on general chatting instead of coding?
Fun_Nebula_9682@reddit
GLM 5 is genuinely underrated. I've been running GLM-OCR locally on Mac Studio M2 Ultra for document processing — tables, math equations, mixed CJK text — and it handles everything at ~260 tokens/sec with just 2GB VRAM.
What surprised me most is how well it handles code-related content. I use it as part of a local pipeline where OCR output feeds into Claude Code for analysis. The combination of a fast local model for extraction + a frontier model for reasoning is way more cost-effective than sending everything to the cloud.
Have you tried it for any specific use cases beyond chat?
Own-Relationship-362@reddit
GLM 5 is surprisingly good at structured tasks too — I've been testing it for matching natural language task descriptions to structured skill files (SKILL.md format). The instruction following is solid enough that it picks up domain-specific terminology better than some of the bigger models. Not great for creative writing but for tool-use and structured reasoning it punches above its weight.
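For readers unfamiliar with the task shape described above, here is a rough sketch of skill matching by keyword overlap. In the commenter's setup an LLM does the matching; the file names and SKILL.md contents below are invented for illustration:

```python
# Toy skill matcher: score each SKILL.md body by word overlap with the
# natural-language task description, pick the best.

def match_skill(task, skills):
    task_words = set(task.lower().split())
    def score(body):
        return len(task_words & set(body.lower().split()))
    return max(skills, key=lambda name: score(skills[name]))

skills = {
    "db-migrations/SKILL.md": "apply schema migrations to the database",
    "pdf-reports/SKILL.md": "generate pdf reports with charts",
}
best = match_skill("generate a quarterly pdf report", skills)
```

A model that is good at instruction following and domain terminology effectively does a much fuzzier version of this, which is the capability being praised.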
jeffwadsworth@reddit
The web version is nothing compared to the 4bit version run locally. Night and day.
agentcubed@reddit
As others say, I don't recommend using one-shots as a benchmark.
In the end, it depends on your workflow. If you are a 100% vibe coder (pls no), then maybe judging by one-shots is fine
cantgetthistowork@reddit
Writing fresh code is something every model does well these days. It's working with existing codebases where you see all the problems
slypheed@reddit
Let me guess; you write JS ?
fugogugo@reddit
12 billion token .. how much you spent already?
BP041@reddit
Real-time chat with websockets is actually a decent stress test because it requires getting async state management right on the first attempt. That's a different skill from code generation — it's more about the model's internal architecture of how state flows.
For harder tests that separate them: try multi-file refactoring where the context spans more than one codebase, or debugging something where the bug is in a dependency interaction rather than obvious logic. Those tend to reveal where each model's "implicit understanding" of the codebase breaks down. Claude tends to track cross-file state better in my experience, but GLM might surprise you on certain patterns.
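The async-state point above can be made concrete. A working real-time chat needs server-side fan-out so every connected client sees new messages pushed to it; the sketch below shows that core pattern with asyncio queues standing in for websocket connections (a simplified stand-in, not OP's actual app):

```python
# Minimal fan-out: a chat room pushes each message to every subscriber,
# so clients see new messages without a page refresh (the failure mode
# OP observed in the Claude Code version).
import asyncio

class ChatRoom:
    def __init__(self):
        self.subscribers = []

    def join(self):
        # In a real app, one queue per websocket connection,
        # drained by that connection's send loop.
        q = asyncio.Queue()
        self.subscribers.append(q)
        return q

    async def post(self, message):
        # Fan the message out to every live connection.
        for q in self.subscribers:
            await q.put(message)

async def demo():
    room = ChatRoom()
    alice, bob = room.join(), room.join()
    await room.post("hello")
    return [alice.get_nowait(), bob.get_nowait()]

received = asyncio.run(demo())
```

A model that only writes the request/response half (store message, render on next page load) produces exactly the refresh-to-see-messages behaviour described in the post.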
metigue@reddit
A lot of that has to do with the agentic harness. Claude code despite being so popular is just not good. You should compare opus 4.6 and GLM in the same harness - I recommend Droid or forge code.
unltdhuevo@reddit
When it comes to following instructions GLM 5 is too good
robberviet@reddit
I would love some task on existing repo too. Also what gpu/hardware are you using at what speed?
-dysangel-@reddit
No you're not tripping. I've been using GLM Coding Plan for a while. The brief time I tried Claude again, I felt like I was babysitting vs working with a competent colleague.
Though GLM-5's coherence has been getting lower and lower. I suspect they're heavily quantising the KV cache. A few days ago it would lose it at 80k tokens, but earlier today I was getting issues even at 40k tokens. I've switched to GLM 4.7 until they work out the bugs, or unless I really need better quality planning for something
twack3r@reddit
Which is exactly why there is no logical substitute for owning your own metal and running your own local models.
Spurnout@reddit
I've been using it lately, especially while building a piece of software similar to openclaw, but I actually got better results from Kimi K2.5, which I was a bit surprised about. I've been thinking of updating the scoring though...
LargelyInnocuous@reddit
Isn't that like $50k in tokens? do you mean 12M? Or are you creating datasets for a large model and have business paying for it?
okyaygokay@reddit
Sorry but creating a websocket chat app is not a hard task
SvenVargHimmel@reddit
You're right about that. The test is a bit arbitrary. I find GLM fails in existing codebases. It's not very good with anything that's not React, and it gets worse when your language is not TypeScript.
I find planning with opus and building with Kimi and reviewing with Gemini works well
Happythen@reddit
oi, yes it is. at least a production one.
SpicyWangz@reddit
Really depends on the level of features, but yeah. Just the bare bones is pretty underwhelming.
asria@reddit
What hardware do you have? How many t/s did you achieve?
Orlandocollins@reddit
They said opencode zen so they aren't running locally
Effective-Drawer9152@reddit
It is very very slow
johnerp@reddit
What spec machine did you run it on, what quant, etc.?
lookwatchlistenplay@reddit
Too many questions. AI already answered those. Ask your AI what specs OP has, haha.
FullOf_Bad_Ideas@reddit
I don't run GLM 5 (too big) but I do use local GLM 4.7 355B in OpenCode and Claude Opus in CC. I think the difference is really big there. Way more bugs in the code with GLM. Maybe in your testing GLM 5 looked so good because of the front-end aspect. I don't do front end. I think Zhipu focused on web dev so it should shine there. GLM 5 is pretty high up on the DesignArena.