GLM 4.7 is out on HF!

Posted by KvAk_AKPlaysYT@reddit | LocalLLaMA | View on Reddit | 131 comments

Any-Conference1005@reddit

Awesome, can we prune to 90+ % of its size so it can fit my 4090? Plzzzzzzzzzzzzz :p
View on Reddit #74110502

LagOps91@reddit

Get 128GB ram and you can actually run it at 4 tokens per second at q2. Not great, but I'm happy to be able to run it at all.
View on Reddit #74133280

Any-Conference1005@reddit

in DDR4?
View on Reddit #74580664

LagOps91@reddit

Will be less with DDR4, but not sure by how much.
View on Reddit #74582452
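A rough way to reason about the "how much slower on DDR4" question is a bandwidth-bound estimate: during decoding, each active weight has to be read from memory about once per token. The sketch below is only a back-of-the-envelope ceiling, not a benchmark. The parameter counts (355B total, 32B active), the 2.7 bits per weight for a "q2" quant, and the DDR4-3200 / DDR5-5600 dual-channel configurations are all assumptions for illustration and are not stated in the thread.

```python
def quantized_size_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes); ignores KV cache and runtime overhead."""
    return total_params * bits_per_weight / 8 / 1e9


def peak_bandwidth_gbs(mega_transfers_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: transfers/s x bytes per transfer x channels."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


def decode_tokens_per_s_upper_bound(active_params: float, bits_per_weight: float,
                                    bandwidth_gbs: float) -> float:
    """Bandwidth-bound ceiling, assuming every active weight is read once per generated token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_per_token


if __name__ == "__main__":
    # All numbers below are hypothetical inputs, not measurements.
    TOTAL_PARAMS = 355e9      # assumed total parameter count
    ACTIVE_PARAMS = 32e9      # assumed active parameters per token (MoE)
    BITS_PER_WEIGHT = 2.7     # assumed effective bits per weight for a "q2" quant

    print(f"weights: ~{quantized_size_gb(TOTAL_PARAMS, BITS_PER_WEIGHT):.0f} GB")
    for name, mts in [("DDR4-3200", 3200), ("DDR5-5600", 5600)]:
        bw = peak_bandwidth_gbs(mts)
        ceiling = decode_tokens_per_s_upper_bound(ACTIVE_PARAMS, BITS_PER_WEIGHT, bw)
        print(f"{name}: {bw:.1f} GB/s peak, <= {ceiling:.1f} tokens/s")
```

Under these assumptions the ceiling scales linearly with memory bandwidth, so the DDR4 figure is simply the DDR5 figure times the bandwidth ratio. Real throughput will be lower than either ceiling, and any layers offloaded to a GPU change the picture.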

Emotional-Baker-490@reddit

4.6 air wen?
View on Reddit #74085033

SilentLennie@reddit

Maybe when people ban together and chip in to do a distilled model.
View on Reddit #74101626

TomLucidor@reddit

\*band

Also yes, if only there is a way to easily distill weights... Or just factorize the matrices!
View on Reddit #74112691

SilentLennie@reddit

> if only there is a way to easily distill weights

It's not an unsolved problem; we know how to do it in general, who has experience with it, etc. It's just a matter of getting enough compute together.
View on Reddit #74128789
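For readers wondering what "distilling" means mechanically, one well-known form is logit distillation: the student is trained to match the teacher's temperature-softened output distribution. The snippet below is a minimal, dependency-free sketch of that loss for a single token position. The function names and the temperature value are illustrative assumptions, and nothing here describes how a GLM distillation would actually be run.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]   # hypothetical teacher logits for one token
    student = [1.5, 1.2, 0.3]   # hypothetical student logits for the same token
    print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution and grows as they diverge. The expensive part in practice is running the large teacher over enough data, which is the compute cost the comment refers to.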

TomLucidor@reddit

You managed to utter the underlying problem: can we have a way of not needing to rain dance to get a distilled model from someone else?
View on Reddit #74131298

SilentLennie@reddit

I mean, just get together with people and chip in 5 bucks. Maybe a kickstarter (we will hire X and Y who are known in the community to know what they are doing and use Z for infrastructure to rent and X thinks it will take W amount of compute and time) ?
View on Reddit #74207562

TomLucidor@reddit

It's the other half of the problem even if it is KoFi rather than KickStarter/GoFundMe: nailing the ballpark amounts down, given that (a) compute prices fluctuate like hell, (b) distillations can fail sometimes, and (c) taxes and red tape get insane when foreigners want to pitch in cash.
View on Reddit #74260498

SilentLennie@reddit

All true, but after the first time it should get easier to do.
View on Reddit #74278777

TomLucidor@reddit

Patreon => actually needing to be a trusted celeb with a large fanbase

Proper donation with an org => red tape + infighting + embezzlement
View on Reddit #74299714

SilentLennie@reddit

Seems like you are just trying to look for reasons why it wouldn't work. I'm saying it just needs some people to put in some effort to figure out what would work.
View on Reddit #74300506

TomLucidor@reddit

Lived long enough to see these things recur so consistently that I want to plant my face on the table when no dream team steps up. It's how waiting for a savior is complementary to running into a burning building. Also, for some damn reason DashCon 1 comes to mind. Hmmm
View on Reddit #74300871

abnormal_human@reddit

What do you think 4.6V was?
View on Reddit #74086381

Karyo_Ten@reddit

A better 4.5V but they state in the readme that they know it has flaws for text and they didn't release text benchmarks. Not saying it's bad, but for me it implies they don't think it's a superset of GLM-4.5-Air
View on Reddit #74124861

bbjurn@reddit

Not 4.6 Air... In my testing it isn't necessarily better than 4.5 Air, but that's just my use case. Let's hope there'll be a 4.7 Air.
View on Reddit #74089244

RickyRickC137@reddit

In two weeks
View on Reddit #74089476

Tall-Ad-7742@reddit

no no no... it's now 4.7 air wen?
View on Reddit #74088723

ttkciar@reddit

I'm happy to continue using 4.5-Air until a worthy successor comes along.
View on Reddit #74086343

jacek2023@reddit

No Air - no fun
View on Reddit #74083462

JustinPooDough@reddit

You realize their coding plan is incredibly cheap and you can use the api for anything - not just Claude code
View on Reddit #74083925

jacek2023@reddit

But I use AI locally
View on Reddit #74084131

_VirtualCosmos_@reddit

Crazy, right? What was this sub about again?
View on Reddit #74084452

fanhed@reddit

Buy pro 6000 x3, so you can run glm-4.7-awq locally.
View on Reddit #74085996

Zyj@reddit

I just ordered my second Strix Halo!
View on Reddit #74087671

_VirtualCosmos_@reddit

Mine hasn't even arrived yet, and I bought it on Kickstarter months ago... Which one do you have/will you have?
View on Reddit #74099201

Zyj@reddit

2x Bosgame M5
View on Reddit #74147260

_VirtualCosmos_@reddit

Now I know what to ask Santa Claus.
View on Reddit #74086898

TheRealMasonMac@reddit

Santa Claus is busy gooning to his AI GF
View on Reddit #74087480

_VirtualCosmos_@reddit

Dang. Understandable tho.
View on Reddit #74088021

Emotional-Baker-490@reddit

No way, someone who uses ai on their own computer in Local Llama!?
View on Reddit #74085093

pilibitti@reddit

sir this is r/localllama
View on Reddit #74098238

kimodosr@reddit

GLM says a new model is coming soon. Nano or Air, I don't know.
View on Reddit #74101539

Recoil42@reddit

Everything's amazing and nobody's happy.
View on Reddit #74084174

duboispourlhiver@reddit

I'm happy
View on Reddit #74088483

thrownawaymane@reddit

I’m not happy, Bob.
View on Reddit #74095960

duboispourlhiver@reddit

I give free hugs
View on Reddit #74096051

thrownawaymane@reddit

What about the shareholders? Who’s hugging them?
View on Reddit #74096259

duboispourlhiver@reddit

Money I guess?
View on Reddit #74099603

pmttyji@reddit

Right after 4.6 Air release
View on Reddit #74084817

Long_comment_san@reddit

Just curious - how would people rate something like Q2 of a model like that?
View on Reddit #74123601

LagOps91@reddit

Q2 works great for me. Much better than qwen 235b at Q4 at least. Leagues ahead of air.
View on Reddit #74133334

Long_comment_san@reddit

Yay. Thanks. I'm looking to hop off 4.5 air to something newer. Seems like it's decided.
View on Reddit #74133429

DingyAtoll@reddit

Wow this really is SOTA https://preview.redd.it/sj2t2pcl3t8g1.png?width=4426&format=png&auto=webp&s=e9dd2990317e5d290df8331f3c7ecfef96c399b2
View on Reddit #74090869

martinsky3k@reddit

wow! Sota benchmarks. Sota metrics Sota Sota. Wow look at benchmarks!!! They mean model good!! Why would charts say otherwise?
View on Reddit #74129362

DingyAtoll@reddit

Fair point tbh
View on Reddit #74130591

usernameplshere@reddit

Do we know if this is with thinking enabled?
View on Reddit #74108837

AnticitizenPrime@reddit

Diagrams in the reasoning/planning stage, cool. That's a first.

https://media.discordapp.net/attachments/1451755268789768192/1452707589744889997/image.png?ex=694acadf&is=6949795f&hm=f1c5a42ea847a6f85e7cd7ba49639ae383dcbedb5765d8323acc471c524deac5&=&format=webp&quality=lossless

Result: https://chat.z.ai/space/v08umaevwcn0-art

Prompt: Create a user friendly, attractive web radio app that will play free SomaFM streams. Make it fully featured. Use your web search tool functionality to identify the correct station endpoints, 'album art', etc.
View on Reddit #74088610

Square_Quarter516@reddit

https://preview.redd.it/z2sau9rjbx8g1.png?width=2750&format=png&auto=webp&s=6af6761ff52652334255e7b055e6e5afc6b9e99a gemini 3 pro, not bad
View on Reddit #74128784

Arindam_200@reddit

Oh nice!
View on Reddit #74119283

GTHell@reddit

So how long does it take to complete this? Just curious.
View on Reddit #74117942

AnticitizenPrime@reddit

Couple of minutes.
View on Reddit #74118742

No_Conversation9561@reddit

See how it’s done, Minimax?
View on Reddit #74083902

coder543@reddit

What is Minimax doing instead?
View on Reddit #74085191

zmarty@reddit

Not yet releasing Minimax 2.1 weights.
View on Reddit #74085384

ForsookComparison@reddit

I'm not going to even evaluate it with their API if I can't eventually transition to on-prem or to a provider that better suits my needs. For that to even be on the table they'd need to crush Sonnet or something.
View on Reddit #74090005

power97992@reddit

By the time they crush sonnet 4.5, there will be sonnet 4.7 or 5
View on Reddit #74127471

usernameplshere@reddit

I didn't even know there was 2.1, lol.
View on Reddit #74108782

dan_goosewin@reddit

I know for a fact they will release the weights on Hugging Face
View on Reddit #74092771

zmarty@reddit

Great. Looking forward to it, I use Minimax M2 locally.
View on Reddit #74093242

power97992@reddit

Yeah, GLM 4.7 is not better than MiniMax 2.1 for certain coding tasks, perhaps even worse, but someone should test them both more thoroughly to be sure.
View on Reddit #74088774

thatsnot_kawaii_bro@reddit

And then 2 comments later you'll see another one with the names flipped (minus the last one) And then again
View on Reddit #74111239

Repulsive_Educator61@reddit

chill minimax
View on Reddit #74102709

Dany0@reddit

Oh, Santa Claus is comin' to town this year, boys and gals
View on Reddit #74083331

LegacyRemaster@reddit

I've been testing 4.7 for the last hour, and it's incredible. Python and HTML: all tasks solved. About 2,000 lines of code in Python and 1,200 in HTML+CSS, etc. Maximum 2 runs and everything was fine.
View on Reddit #74084906

TheRealMasonMac@reddit

I haven't tried 4.7 with CLI agentic coding tools yet. GLM-4.6 had an issue with not really understanding how to optimally use tools for performing a task, especially in comparison to M2. Is that addressed?
View on Reddit #74087430

Karyo_Ten@reddit

One of the main changes in GLM-4.7 is that z-ai changed the tool calling format, so I assume this was their focus.
View on Reddit #74124799

SuperChewbacca@reddit

GLM-4.6 was actually worse at tool calling than GLM-4.5-Air for me. It's still a good model though, I just had to prompt it more to encourage tool calling.
View on Reddit #74095068

Dany0@reddit

Python and web development is not real programming. Give the models a 2-shot minesweeper clone with a twist in pure C.
View on Reddit #74087554

thatsnot_kawaii_bro@reddit

"real programming"

*Asks it to two shot a greenfield project of a small game*

What do you think is more common in industry? Backend/frontend? Or small games in a greenfield codebase?
View on Reddit #74111314

RickDripps@reddit

Just because they're interpreted languages doesn't diminish the incredible and amazing things you can do with them. (Thinking specifically about Python...) Don't be "that guy" and let people be excited. Also, I bet it's a hell of a lot better at C, Kotlin/Java, Swift, and probably any language than I am and I'm getting paid lots of money to do it. More power in the hands of people who don't need to go through all the shit I went through is great. Can't wait until it completely outclasses any engineer (instead of just 90% of us). Then we can focus on the actual complex issues instead of just the code to get us to the resolution.
View on Reddit #74093711

Dany0@reddit

Vibe coders are excited about models just to vibe code a... language that's supposed to be easier for humans. Sure, okay. Failure of imagination. If you have an all-powerful AI that can do the coding part for you, surely it can do what you can't. But no, vibe coders want a pansy AI that's just like them.
View on Reddit #74095120

jazir555@reddit

Gatekeeping programming and shitting on languages you don't like, quintessential haughty redditor
View on Reddit #74103106

RickDripps@reddit

If you're not "vibe coding" all of the simple shit we do as part of our job you are wasting insane amounts of time. Great coders don't make great engineers. Great problem-solvers do. So yeah, keep your head in the sand. Label anyone who uses AI as a "vibe coder" and keep your gatekeeping up. The rest of us are running circles around our peers and getting more done in much easier ways than ever. Look down your nose at people who will soon be outperforming you all you want. One day you'll look around and realize the entire industry has changed and you're stuck clutching your pearls.
View on Reddit #74101289

AlwaysLateToThaParty@reddit

[pytorch](https://pytorch.org/#) is "not real programming" apparently.
View on Reddit #74096846

Purple-Programmer-7@reddit

💀
View on Reddit #74092702

Professional_Price89@reddit

Sonnet and Opus are bad models for me; they can't solve algorithm, math, or cryptography-related problems.
View on Reddit #74085792

MrMrsPotts@reddit

Which do you find better?
View on Reddit #74087254

Professional_Price89@reddit

Gemini 3 Pro, or DeepSeek 3.2 Speciale. I tried breaking a game's security and Claude only threw out "I see", "I found the problem...", then started writing a lot of .md files and code that had nothing to do with the real problem.
View on Reddit #74088883

buppermint@reddit

This is what I've found too, I use LLMs for research and have found that Claude models are REALLY bad at deeply thinking through deep, abstract concepts. Opus 4.5 is dumber than even old reasoning models like R1 and o1 in this regard. They are very good at creating boilerplate SWE-type code though.
View on Reddit #74097504

Fuzzy_Independent241@reddit

You must admit then that Claude is TOP OF THE POOPS for writing irrelevant MD files! All they need now is the right benchmark.
View on Reddit #74089321

Dany0@reddit

I honestly cannot relate. Maybe it's because I told it to write everything in mermaid graphs and data flows and stick to data-oriented programming, or maybe it's because I told it to break down everything into tasks and also criticise itself, or maybe it's because I gave it an .MD file I wrote by hand which was up to my standards and told it to read that if it needs style guidance. But the .md files it produces for me are short and to the point. Usually I get it to plan around the end goal, then tell it to translate its plan to an .md and then tick off one task after another
View on Reddit #74091127

wittlewayne@reddit

I am almost annoyed by how good Sonnet is... and I'm mostly annoyed because it's only cloud based... I want that shit local
View on Reddit #74096294

Mkengine@reddit

Not that I am not happy about all the Chinese releases, but if you look at uncontaminated benchmarks like [swe-rebench](https://swe-rebench.com/) you see a big gap between GLM 4.6 and the GPT 5.x models, instead of the 2% on SWE-bench Verified. Don't trust benchmarks that companies can run themselves.
View on Reddit #74088974

Dany0@reddit

That's still a very respectable showing for GLM 4.6 and represents probably where I'd put it given my experience with it. I'd wager GLM 4.7 will be significantly higher than DeepSeek 3.2 when they test it
View on Reddit #74090920

decentralize999@reddit

Do they have an Android app for testing it? Seems like the best open-weight LLM this month after Xiaomi Mimo V2 Flash.
View on Reddit #74122903

GTHell@reddit

Good open source model, but bad business practice. Their paid model got nerfed to infinity, though GLM 4.6 was actually a good model if you can pay for it through other providers.
View on Reddit #74117858

RandomThoughtsAt3AM@reddit

https://preview.redd.it/q9czibbqgt8g1.png?width=1162&format=png&auto=webp&s=118b4808a3e8fa8dc176a95cb7085d5cd392c2dc Loved the transparency of the model. I always go for the more extreme or philosophical on personal life questions, and the model gave me the best response possible, no filters on what was being recommended. No other model has ever suggested anything like this.
View on Reddit #74095913

TomLucidor@reddit

Turn this into an EQ-Bench like benchmark already!
View on Reddit #74112758

Mochila-Mochila@reddit

Getting away from the abuse should be the top priority bro, best of luck.
View on Reddit #74099702

seppe0815@reddit

very low vram needed big love ..........
View on Reddit #74085127

TomLucidor@reddit

Pray for GLM Air then!
View on Reddit #74112609

mivog49274@reddit

benchmaxx it until the last drop of 2025
View on Reddit #74110225

Shir_man@reddit

Q1 imat when
View on Reddit #74108043

KvAk_AKPlaysYT@reddit (OP)

On it lol, was working on the big boi quants so far :)
View on Reddit #74109498

Shir_man@reddit

What is the cheapest way to run this model in cloud?
View on Reddit #74108077

KvAk_AKPlaysYT@reddit (OP)

Runpod most probably, or GColab if you are on Pro. On Runpod you'd need multiple GPUs though, something like 4x 6000 Pro Blackwells for respectable context windows and sick speeds.
View on Reddit #74109299

KvAk_AKPlaysYT@reddit (OP)

GGUF: [https://huggingface.co/AaryanK/GLM-4.7-GGUF](https://huggingface.co/AaryanK/GLM-4.7-GGUF)
View on Reddit #74098151

ParadigmComplex@reddit

Thank you!
View on Reddit #74108919

KvAk_AKPlaysYT@reddit (OP)

Thou shall receive, uploading the final batch of quants rn :)
View on Reddit #74109120

Kompicek@reddit

Honestly VERY impressed so far. I expected only a marginal improvement. Better than Kimi so far?
View on Reddit #74104041

unbrained_01@reddit

tbh, using it with dcp in opencode just blew me away! [https://github.com/Opencode-DCP/opencode-dynamic-context-pruning](https://github.com/Opencode-DCP/opencode-dynamic-context-pruning)
View on Reddit #74099881

SilentLennie@reddit

I think GitHub is having some issues:

503 Service Unavailable
No server is available to handle this request.
View on Reddit #74101796

kimodosr@reddit

and new model coming soon. nano or air
View on Reddit #74101417

Goldandsilverape99@reddit

https://preview.redd.it/71336m2qys8g1.jpeg?width=1408&format=pjpg&auto=webp&s=3fc8857c9ade10a5a6884689614e4d4892a94bd1
View on Reddit #74089143

KvAk_AKPlaysYT@reddit (OP)

Did it :) [https://huggingface.co/AaryanK/GLM-4.7-GGUF](https://huggingface.co/AaryanK/GLM-4.7-GGUF)
View on Reddit #74098170

KvAk_AKPlaysYT@reddit (OP)

On it 🫡
View on Reddit #74089364

Goldandsilverape99@reddit

Tried the model on [https://chat.z.ai/](https://chat.z.ai/). It failed a particular test puzzle question (one that appears in the computer game Indiana Jones and the Great Circle) that 4.6 could answer. A bit depressing. Does it mean the improvements are a bit more on the benchmaxxing side?
View on Reddit #74098026

WithoutReason1729@reddit

Your post is getting popular and we just featured it on our Discord! [Come check it out!](https://discord.gg/PgFhZ8cnWW) You've also been given a special flair for your contribution. We appreciate your post! *I am a bot and this action was performed automatically.*
View on Reddit #74093627

serige@reddit

I swear I just downloaded 4.6 gguf like 3 days ago
View on Reddit #74084358

ResidentPositive4122@reddit

Flashbacks to that time where you'd download something from kazaa over dial-up, and after a few hours of waiting you'd get ... not the movie you wanted :D
View on Reddit #74087274

AlbeHxT9@reddit

You just had to put down the popcorn cylindrical container, and take another cylinder
View on Reddit #74093397

abnormal_human@reddit

I like how they compare to OpenAI's flagship but Anthropic's one-step-down model. Come on guys, real people using Claude today are using Opus, not Sonnet. Don't be misleading in your evals.
View on Reddit #74086446

mantafloppy@reddit

How dare you. Here on LOCALllama, we praise every model, even those 99% of us can't run. We are here for the benchmarks published by the model maker; they are Gospel. Get with the program dude, this is the bot and marketing time, not the /r/LocalLLaMA of old.
View on Reddit #74092969

SlaveZelda@reddit

Opus is also 20 times the price and probably 3 times the size.
View on Reddit #74087921

Nicoolodion@reddit

Yep. They compare it to models in their price range
View on Reddit #74092315

DHasselhoff77@reddit

I agree. Not using top-of-the-line model of your competitors in a chart like that is very misleading.
View on Reddit #74089334

dan_goosewin@reddit

damn, GLM-4.7 scored 42% on HLE o.O
View on Reddit #74092822

waste2treasure-org@reddit

...and still no Gemma 4
View on Reddit #74085392

ReallyFineJelly@reddit

Wow, chill. We just got Gemini 3, 3 Flash and Nano Banana Pro. Gemma is always the last model to come.
View on Reddit #74085979

coder543@reddit

Gemini and Gemma are separate teams that do their own things.

| Release date (YYYY-MM-DD) | Gemini releases | Gemma releases |
|---:|---|---|
| 2023-12-06 | Gemini 1.0 Pro; Gemini 1.0 Nano | - |
| 2024-02-08 | Gemini 1.0 Ultra | - |
| 2024-02-15 | Gemini 1.5 Pro | - |
| 2024-02-21 | - | Gemma 2B; Gemma 7B |
| 2024-04-04 | - | Gemma 1.1 2B; Gemma 1.1 7B |
| 2024-05-14 | Gemini 1.5 Flash | - |
| 2024-06-27 | - | Gemma 2 9B; Gemma 2 27B |
| 2024-07-31 | - | Gemma 2 2B |
| 2024-12-11 | Gemini 2.0 Flash (experimental) | - |
| 2025-02-05 | Gemini 2.0 Pro (experimental); Gemini 2.0 Flash-Lite (preview) | - |
| 2025-03-10 | - | Gemma 3 1B; Gemma 3 4B; Gemma 3 12B; Gemma 3 27B |
| 2025-03-25 | Gemini 2.5 Pro (experimental) | - |
| 2025-04-17 | Gemini 2.5 Flash (preview) | - |
| 2025-06-17 | Gemini 2.5 Pro (GA); Gemini 2.5 Flash (GA); Gemini 2.5 Flash-Lite (preview) | - |
| 2025-08-14 | - | Gemma 3 270M |
| 2025-11-18 | Gemini 3 Pro (preview); Gemini 3 Deep Think | - |
| 2025-12-17 | Gemini 3 Flash | - |

No real pattern.
View on Reddit #74086948

pmttyji@reddit

It's been 9 months (Mar 2025) since the Gemma 3 1B/4B/12B/27B models. Hopefully Gemma 4 arrives in 3 months (Mar 2026).
View on Reddit #74091333

Zyj@reddit

Who cares about closed weights models here?
View on Reddit #74087542

Different_Fix_2217@reddit

I'd say it's nearly as good as Gemini 3 Flash. Which is impressive since Flash is apparently 1.2T
View on Reddit #74088316

Minute-Act-4943@reddit

GLM 4.7 is generally available; I just tried it with my super cheap subscription plan. For anyone looking to subscribe, they are currently offering stacked discounts of 50%+(20-30%)+10% for Black Friday deals. Use link [https://z.ai/subscribe?ic=OUCO7ISEDB](https://z.ai/subscribe?ic=OUCO7ISEDB)
View on Reddit #74087238

JLeonsarmiento@reddit

https://preview.redd.it/g22il1oeps8g1.jpeg?width=1000&format=pjpg&auto=webp&s=2c6c3677d806d1f193562348530ce173acbcc9ba
View on Reddit #74085451

jacek2023@reddit

[https://huggingface.co/zai-org/GLM-4.7/discussions/1](https://huggingface.co/zai-org/GLM-4.7/discussions/1)
View on Reddit #74085260

doradus_novae@reddit

gguf wen
View on Reddit #74083595