GLM 4.7 is out on HF!

Any-Conference1005@reddit

Awesome, can we prune 90+% of its size so it can fit on my 4090? Plzzzzzzzzzzzzz :p

LagOps91@reddit

Get 128GB of RAM and you can actually run it at 4 tokens per second at Q2. Not great, but I'm happy to be able to run it at all.
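
For a rough sense of why 128GB is about the floor for Q2, here is a back-of-the-envelope sketch. The total parameter count (355B, roughly the figure quoted for the GLM-4.5/4.6 family) and the effective bits per weight for a Q2-class quant (about 2.7) are assumptions, not numbers from this thread, and the estimate ignores KV cache and runtime overhead.

```python
def quantized_size_gb(total_params_billions, bits_per_weight):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    total_bytes = total_params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumptions, not figures from this thread:
ASSUMED_PARAMS_B = 355.0  # total parameters, in billions
ASSUMED_Q2_BPW = 2.7      # effective bits per weight for a Q2-class quant

size_gb = quantized_size_gb(ASSUMED_PARAMS_B, ASSUMED_Q2_BPW)
print(f"~{size_gb:.0f} GB of weights vs 128 GB of RAM")
```

Under those assumptions the weights alone take most of the 128GB, so context and the OS have to fit in what is left (or on a GPU).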

LagOps91@reddit

It will be slower with DDR4, but I'm not sure by how much.
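
One way to guess at the DDR4 penalty: CPU decoding tends to be bound by memory bandwidth, so an upper bound on tokens per second is bandwidth divided by the bytes read per token. The sketch below assumes dual-channel DDR4-3200 vs dual-channel DDR5-5600, about 32B active parameters per token (MoE), and roughly 2.7 bits per weight. All of these are assumptions, and real speeds will be lower than the bound.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, channels, bytes_per_transfer=8):
    """Theoretical peak bandwidth of a DDR setup in GB/s (64-bit channels)."""
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


def max_tokens_per_second(bandwidth_gb_s, active_params_billions, bits_per_weight):
    """Upper bound on decode speed if every active weight is read once per token."""
    gb_read_per_token = active_params_billions * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Assumptions, not figures from this thread:
ACTIVE_PARAMS_B = 32.0  # active parameters per token, in billions
BITS_PER_WEIGHT = 2.7   # effective bits per weight for a Q2-class quant

ddr4 = max_tokens_per_second(peak_bandwidth_gb_s(3200, 2), ACTIVE_PARAMS_B, BITS_PER_WEIGHT)
ddr5 = max_tokens_per_second(peak_bandwidth_gb_s(5600, 2), ACTIVE_PARAMS_B, BITS_PER_WEIGHT)
print(f"DDR4 bound: {ddr4:.1f} tok/s, DDR5 bound: {ddr5:.1f} tok/s, ratio: {ddr4 / ddr5:.2f}")
```

The useful part is the ratio rather than the absolute numbers: if bandwidth is the bottleneck, speed should scale roughly with it.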

SilentLennie@reddit

Maybe when people ban together and chip in to do a distilled model.

TomLucidor@reddit

\*band

Also yes, if only there is a way to easily distill weights... Or just factorize the matrices!
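
On "just factorize the matrices": replacing a dense rows x cols weight matrix with two rank-r factors only saves parameters when r is small enough. A quick sketch of that arithmetic, using a hypothetical 4096 x 4096 layer (not GLM's actual shapes):

```python
def dense_params(rows, cols):
    """Parameter count of a dense rows x cols weight matrix W."""
    return rows * cols


def lowrank_params(rows, cols, rank):
    """Parameter count when W is approximated by A (rows x rank) @ B (rank x cols)."""
    return rank * (rows + cols)


def break_even_rank(rows, cols):
    """Largest rank for which the two factors are strictly smaller than W."""
    return (rows * cols - 1) // (rows + cols)


ROWS, COLS = 4096, 4096  # hypothetical layer shape
for rank in (256, 1024, break_even_rank(ROWS, COLS)):
    ratio = lowrank_params(ROWS, COLS, rank) / dense_params(ROWS, COLS)
    print(f"rank {rank}: {ratio:.1%} of the dense parameter count")
```

Whether quality survives at a rank low enough to matter is a separate question that this arithmetic doesn't answer.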

> if only there is a way to easily distill weights

It's not an unsolved problem: we know how to do it in general, who has experience with it, etc. It's just a matter of getting enough compute together.
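
For reference, the core of classic soft-label distillation is small. This is a minimal pure-Python sketch of the temperature-scaled KL loss between teacher and student logits, assuming the standard formulation; it is not a claim about how anyone would actually distill GLM, and the logits in the example are made up.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


# Made-up logits over a 3-token vocabulary
teacher = [4.0, 1.0, 0.5]
student = [2.0, 2.0, 0.5]
print(distillation_loss(teacher, student))
print(distillation_loss(teacher, teacher))  # identical logits give zero loss
```

The loss itself is the easy part; running the teacher over enough data to train a student is where the compute goes.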

TomLucidor@reddit

You managed to utter the underlying problem: can we have a way of not needing to rain dance to get a distilled model from someone else?

SilentLennie@reddit

I mean, just get together with people and chip in 5 bucks. Maybe a Kickstarter (we will hire X and Y, who are known in the community to know what they are doing, and use Z for infrastructure to rent, and X thinks it will take W amount of compute and time)?

TomLucidor@reddit

It's the other half of the problem, even if it is Ko-fi rather than Kickstarter/GoFundMe: nailing down the ballpark amounts, given that (a) compute prices fluctuate like hell, (b) distillations can fail sometimes, and (c) taxes and red tape get insane when foreigners want to pitch in cash.

SilentLennie@reddit

All true, but after the first time, it should get easier to do so.

TomLucidor@reddit

Patreon => actually needing to be a trusted celeb with a large fanbase

Proper donation with an org => red tape + infighting + embezzlement

SilentLennie@reddit

Seems like you are just trying to look for reasons why it wouldn't work. I'm saying it just needs some people to put in some effort to figure out what would work.

TomLucidor@reddit

I've lived long enough to see these things recur consistently, to the point that I want to plant my face on the table when no dream team steps up. It's how waiting for a savior is complementary to running into a burning building. Also, for some damn reason DashCon 1 comes to mind. Hmmm

abnormal_human@reddit

What do you think 4.6V was?

Karyo_Ten@reddit

A better 4.5V but they state in the readme that they know it has flaws for text and they didn't release text benchmarks. Not saying it's bad, but for me it implies they don't think it's a superset of GLM-4.5-Air

bbjurn@reddit

Not 4.6 Air... In my testing it isn't necessarily better than 4.5 Air, but that's just my use case. Let's hope there'll be a 4.7 Air.

RickyRickC137@reddit

In two weeks

Tall-Ad-7742@reddit

no no no... its now 4.7 air wen?

ttkciar@reddit

I'm happy to continue using 4.5-Air until a worthy successor comes along.

jacek2023@reddit

No Air - no fun

JustinPooDough@reddit

You realize their coding plan is incredibly cheap and you can use the API for anything, not just Claude Code?

jacek2023@reddit

But I use AI locally

_VirtualCosmos_@reddit

Crazy, right? What was this sub about again?

fanhed@reddit

Buy pro 6000 x3, so you can run glm-4.7-awq locally.

Zyj@reddit

I just ordered my second Strix Halo!

_VirtualCosmos_@reddit

Mine hasn't even arrived yet and I bought it on Kickstarter months ago... Which one do you have/will you have?

Zyj@reddit

2x Bosgame M5

_VirtualCosmos_@reddit

Now I know what to ask Santa Claus.

TheRealMasonMac@reddit

Santa Claus is busy gooning to his AI GF

_VirtualCosmos_@reddit

Dang. Understandable tho.

Emotional-Baker-490@reddit

No way, someone who uses ai on their own computer in Local Llama!?

pilibitti@reddit

sir this is r/localllama

kimodosr@reddit

GLM says a new model is coming soon. Nano or Air, I don't know.

Recoil42@reddit

Everything's amazing and nobody's happy.

duboispourlhiver@reddit

I'm happy

thrownawaymane@reddit

I’m not happy, Bob.

duboispourlhiver@reddit

I give free hugs

thrownawaymane@reddit

What about the shareholders? Who’s hugging them?

duboispourlhiver@reddit

Money I guess?

pmttyji@reddit

Right after 4.6 Air release

Long_comment_san@reddit

Just curious - how would people rate something like Q2 of a model like that?

LagOps91@reddit

Q2 works great for me. Much better than Qwen 235B at Q4 at least. Leagues ahead of Air.

Long_comment_san@reddit

Yay. Thanks. I'm looking to hop off 4.5 air to something newer. Seems like it's decided.

DingyAtoll@reddit

Wow this really is SOTA https://preview.redd.it/sj2t2pcl3t8g1.png?width=4426&format=png&auto=webp&s=e9dd2990317e5d290df8331f3c7ecfef96c399b2

martinsky3k@reddit

wow! Sota benchmarks. Sota metrics Sota Sota. Wow look at benchmarks!!! They mean model good!! Why would charts say otherwise?

DingyAtoll@reddit

Fair point tbh

usernameplshere@reddit

Do we know if this is with thinking enabled?

AnticitizenPrime@reddit

Diagrams in the reasoning/planning stage, cool. That's a first.

https://media.discordapp.net/attachments/1451755268789768192/1452707589744889997/image.png?ex=694acadf&is=6949795f&hm=f1c5a42ea847a6f85e7cd7ba49639ae383dcbedb5765d8323acc471c524deac5&=&format=webp&quality=lossless

Result: https://chat.z.ai/space/v08umaevwcn0-art

Prompt: Create a user friendly, attractive web radio app that will play free SomaFM streams. Make it fully featured. Use your web search tool functionality to identify the correct station endpoints, 'album art', etc.

Square_Quarter516@reddit

https://preview.redd.it/z2sau9rjbx8g1.png?width=2750&format=png&auto=webp&s=6af6761ff52652334255e7b055e6e5afc6b9e99a gemini 3 pro, not bad

Arindam_200@reddit

Oh nice!

GTHell@reddit

So how long does it take to complete this? Just curious.

AnticitizenPrime@reddit

Couple of minutes.

No_Conversation9561@reddit

See how it’s done, Minimax?

coder543@reddit

What is Minimax doing instead?

zmarty@reddit

Not yet releasing Minimax 2.1 weights.

ForsookComparison@reddit

I'm not going to even evaluate it with their API if I can't eventually transition to on-prem or to a provider that better suits my needs. For that to even be on the table they'd need to crush Sonnet or something.

power97992@reddit

By the time they crush sonnet 4.5, there will be sonnet 4.7 or 5

usernameplshere@reddit

I didn't even know there was 2.1, lol.

dan_goosewin@reddit

I know for a fact they will release the weights on Hugging Face

zmarty@reddit

Great. Looking forward to it, I use Minimax M2 locally.

power97992@reddit

Yeah, GLM 4.7 is not better than Minimax 2.1 for certain coding tasks, perhaps even worse, but someone should test them both more to assess them further.

thatsnot_kawaii_bro@reddit

And then 2 comments later you'll see another one with the names flipped (minus the last one) And then again

Repulsive_Educator61@reddit

chill minimax

Dany0@reddit

Oh, Santa Claus is comin' to town this year, boys and gals

LegacyRemaster@reddit

I've been testing 4.7 for the last hour, and it's incredible. Python and HTML: all tasks solved. About 2,000 lines of code in Python and 1,200 in HTML+CSS, etc. Maximum 2 runs and everything was fine.

TheRealMasonMac@reddit

I haven't tried 4.7 with CLI agentic coding tools yet. GLM-4.6 had an issue with not really understanding how to optimally use tools for performing a task, especially in comparison to M2. Is that addressed?

Karyo_Ten@reddit

One of the main changes in GLM-4.7, IMO, is that z-ai changed the tool calling format, so I assume this was their focus.

SuperChewbacca@reddit

GLM-4.6 was actually worse at tool calling than GLM-4.5-Air for me. It's still a good model though, I just had to prompt it more to encourage tool calling.

Dany0@reddit

Python and web development are not real programming. Give the models a 2-shot minesweeper clone with a twist in pure C.

thatsnot_kawaii_bro@reddit

"real programming" *Asks it to two shot a greenfield project of a small game* What do you think is more common in industry? Backend/frontend? Or small games in a greenfield codebase?

RickDripps@reddit

Just because they're interpreted languages doesn't diminish the incredible and amazing things you can do with them. (Thinking specifically about Python...)

Don't be "that guy"; let people be excited.

Also, I bet it's a hell of a lot better at C, Kotlin/Java, Swift, and probably any language than I am, and I'm getting paid lots of money to do it. More power in the hands of people who don't need to go through all the shit I went through is great.

Can't wait until it completely outclasses any engineer (instead of just 90% of us). Then we can focus on the actual complex issues instead of just the code to get us to the resolution.

Dany0@reddit

Vibe coders are excited about models just to vibe code a... language that's supposed to be easier for humans. Sure, okay. Failure of imagination. If you have an all-powerful AI that can do the coding part for you, surely it can do what you can't. But no, vibe coders want a pansy AI that's just like them.

jazir555@reddit

Gatekeeping programming and shitting on languages you don't like: quintessential haughty redditor.

RickDripps@reddit

If you're not "vibe coding" all of the simple shit we do as part of our job you are wasting insane amounts of time. Great coders don't make great engineers. Great problem-solvers do.

So yeah, keep your head in the sand. Label anyone who uses AI as a "vibe coder" and keep your gatekeeping up. The rest of us are running circles around our peers and getting more done in much easier ways than ever.

Look down your nose at people who will soon be outperforming you all you want. One day you'll look around and realize the entire industry has changed and you're stuck clutching your pearls.

AlwaysLateToThaParty@reddit

[pytorch](https://pytorch.org/#) is "not real programming" apparently.

Purple-Programmer-7@reddit

💀

Professional_Price89@reddit

Sonnet and Opus are bad models for me; they can't solve algorithm, math, or cryptography-related problems.

MrMrsPotts@reddit

Which do you find better?

Professional_Price89@reddit

Gemini 3 Pro, or Deepseek 3.2 Speciale. I tried breaking a game's security and Claude only threw out "I see", "I found the problem...", then started writing a lot of .md files and code that had nothing to do with the real problem.

buppermint@reddit

This is what I've found too, I use LLMs for research and have found that Claude models are REALLY bad at deeply thinking through deep, abstract concepts. Opus 4.5 is dumber than even old reasoning models like R1 and o1 in this regard. They are very good at creating boilerplate SWE-type code though.

Fuzzy_Independent241@reddit

You must admit then that Claude is TOP OF THE POOPS for writing irrelevant MD files! All they need now is the right benchmark.

Dany0@reddit

I honestly cannot relate. Maybe it's because I told it to write everything in mermaid graphs and data flows and stick to data-oriented programming, or maybe it's because I told it to break down everything into tasks and also criticise itself, or maybe it's because I gave it an .MD file I wrote by hand which was up to my standards and told it to read that if it needs style guidance. But the .md files it produces for me are short and to the point. Usually I get it to plan around the end goal, then tell it to translate its plan to an .md and then tick off one task after another

wittlewayne@reddit

I am almost annoyed by how good Sonnet is... and I'm mostly annoyed because it's cloud-only. I want that shit local.

Mkengine@reddit

Not that I am not happy about all the Chinese releases, but if you look at uncontaminated benchmarks like [swe-rebench](https://swe-rebench.com/) you see a big gap between GLM 4.6 and the GPT 5.x models instead of the 2% on swe-bench verified. Don't trust benchmarks that companies can run themselves.

Dany0@reddit

That's still a very respectable showing for GLM 4.6 and represents probably where I'd put it given my experience with it. I'd wager GLM 4.7 will be significantly higher than DeepSeek 3.2 when they test it

decentralize999@reddit

Do they have an Android app for testing it? Seems like the best open-weight LLM this month after Xiaomi Mimo V2 Flash.

GTHell@reddit

Good open source model, but bad business practice. Their paid model got nerfed to infinity, though GLM 4.6 was actually a good model if you could pay for it through other providers.

RandomThoughtsAt3AM@reddit

https://preview.redd.it/q9czibbqgt8g1.png?width=1162&format=png&auto=webp&s=118b4808a3e8fa8dc176a95cb7085d5cd392c2dc Loved the transparency of the model. I always go for the more extreme or philosophical on personal life questions, and the model gave me the best response possible, no filters on what was being recommended. No other model has ever suggested anything like this.

TomLucidor@reddit

Turn this into an EQ-Bench like benchmark already!

Mochila-Mochila@reddit

Getting away from the abuse should be the top priority bro, best of luck.

seppe0815@reddit

very low vram needed big love ..........

TomLucidor@reddit

Pray for GLM Air then!

mivog49274@reddit

benchmaxx it until the last drop of 2025

Shir_man@reddit

Q1 imat when

KvAk_AKPlaysYT@reddit (OP)

On it lol, was working on the big boi quants so far :)

Shir_man@reddit

What is the cheapest way to run this model in cloud?

KvAk_AKPlaysYT@reddit (OP)

Runpod most probably, or Google Colab if you are on Pro. On Runpod you'd need multiple GPUs though, something like 4x RTX PRO 6000 Blackwells for respectable context windows and sick speeds.

KvAk_AKPlaysYT@reddit (OP)

GGUF: [https://huggingface.co/AaryanK/GLM-4.7-GGUF](https://huggingface.co/AaryanK/GLM-4.7-GGUF)
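
If you only want one quant level rather than the whole repo, something along these lines should work with the `huggingface_hub` package. The repo id comes from the link above; the `Q2_K` tag in the file names and the local directory are assumptions, so check the repo's file list first.

```python
def quant_patterns(quant):
    """Glob patterns selecting only the GGUF files for one quant level.

    Assumes the quant tag (e.g. "Q2_K") appears in the file names.
    """
    return [f"*{quant}*.gguf"]


def download_quant(repo_id, quant, local_dir):
    """Download only the matching files and return the local folder path."""
    # Third-party dependency: pip install huggingface_hub
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=quant_patterns(quant),
        local_dir=local_dir,
    )


# Example call (downloads a lot of data; the quant tag and path are assumptions):
# download_quant("AaryanK/GLM-4.7-GGUF", "Q2_K", "./glm-4.7-gguf")
```

Filtering by pattern avoids pulling every quant in the repo when you only have room for one.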

ParadigmComplex@reddit

Thank you!

KvAk_AKPlaysYT@reddit (OP)

Thou shall receive, uploading the final batch of quants rn :)

Kompicek@reddit

Honestly VERY impressed so far. I expected only a marginal improvement. Better than Kimi so far?

unbrained_01@reddit

tbh, using it with dcp in opencode just blew me away! [https://github.com/Opencode-DCP/opencode-dynamic-context-pruning](https://github.com/Opencode-DCP/opencode-dynamic-context-pruning)

SilentLennie@reddit

I think GitHub is having some issues:

> 503 Service Unavailable
> No server is available to handle this request.

kimodosr@reddit

And a new model is coming soon. Nano or Air.

Goldandsilverape99@reddit

https://preview.redd.it/71336m2qys8g1.jpeg?width=1408&format=pjpg&auto=webp&s=3fc8857c9ade10a5a6884689614e4d4892a94bd1

KvAk_AKPlaysYT@reddit (OP)

Did it :) [https://huggingface.co/AaryanK/GLM-4.7-GGUF](https://huggingface.co/AaryanK/GLM-4.7-GGUF)

KvAk_AKPlaysYT@reddit (OP)

On it 🫡

Goldandsilverape99@reddit

Tried the model on [https://chat.z.ai/](https://chat.z.ai/). For a particular test puzzle question (it appears in the computer game Indiana Jones and the Great Circle), it failed where 4.6 could answer. A bit depressing. Does that mean the improvements are a bit more on the benchmaxxing side?

WithoutReason1729@reddit

Your post is getting popular and we just featured it on our Discord! [Come check it out!](https://discord.gg/PgFhZ8cnWW) You've also been given a special flair for your contribution. We appreciate your post! *I am a bot and this action was performed automatically.*

serige@reddit

I swear I just downloaded 4.6 gguf like 3 days ago

ResidentPositive4122@reddit

Flashbacks to that time where you'd download something from kazaa over dial-up, and after a few hours of waiting you'd get ... not the movie you wanted :D

AlbeHxT9@reddit

You just had to put down the popcorn cylindrical container, and take another cylinder

abnormal_human@reddit

I like how they compare to OpenAI's flagship but Anthropic's one-step-down model. Come on guys, real people using Claude today are using Opus, not Sonnet. Don't be misleading in your evals.

mantafloppy@reddit

How dare you. Here on LOCALllama, we praise every model, even those 99% of us can't run. We are here for the benchmarks published by the model makers; they are Gospel. Get with the program, dude: this is the bot and marketing time, not the /r/LocalLLaMA of old.

SlaveZelda@reddit

Opus is also 20 times the price and probably 3 times the size.

Nicoolodion@reddit

Yep. They compare it to models in their price range

DHasselhoff77@reddit

I agree. Not using top-of-the-line model of your competitors in a chart like that is very misleading.

dan_goosewin@reddit

damn, GLM-4.7 scored 42% on HLE o.O

waste2treasure-org@reddit

...and still no Gemma 4

ReallyFineJelly@reddit

Wow, chill. We just got Gemini 3, 3 Flash and Nano Banana Pro. Gemma is always the last model to come.

coder543@reddit

Gemini and Gemma are separate teams that do their own things.

| Release date (YYYY-MM-DD) | Gemini releases | Gemma releases |
|---:|---|---|
| 2023-12-06 | Gemini 1.0 Pro; Gemini 1.0 Nano | — |
| 2024-02-08 | Gemini 1.0 Ultra | — |
| 2024-02-15 | Gemini 1.5 Pro | — |
| 2024-02-21 | — | Gemma 2B; Gemma 7B |
| 2024-04-04 | — | Gemma 1.1 2B; Gemma 1.1 7B |
| 2024-05-14 | Gemini 1.5 Flash | — |
| 2024-06-27 | — | Gemma 2 9B; Gemma 2 27B |
| 2024-07-31 | — | Gemma 2 2B |
| 2024-12-11 | Gemini 2.0 Flash (experimental) | — |
| 2025-02-05 | Gemini 2.0 Pro (experimental); Gemini 2.0 Flash-Lite (preview) | — |
| 2025-03-10 | — | Gemma 3 1B; Gemma 3 4B; Gemma 3 12B; Gemma 3 27B |
| 2025-03-25 | Gemini 2.5 Pro (experimental) | — |
| 2025-04-17 | Gemini 2.5 Flash (preview) | — |
| 2025-06-17 | Gemini 2.5 Pro (GA); Gemini 2.5 Flash (GA); Gemini 2.5 Flash-Lite (preview) | — |
| 2025-08-14 | — | Gemma 3 270M |
| 2025-11-18 | Gemini 3 Pro (preview); Gemini 3 Deep Think | — |
| 2025-12-17 | Gemini 3 Flash | — |

No real pattern.

pmttyji@reddit

It's been 9 months (Mar 2025) since the Gemma 3 1B/4B/12B/27B models. Hopefully Gemma 4 arrives in 3 months (Mar 2026).

Zyj@reddit

Who cares about closed weights models here?

Different_Fix_2217@reddit

I'd say it's nearly as good as Gemini 3 Flash, which is impressive since Flash is apparently 1.2T.

Minute-Act-4943@reddit

GLM 4.7 is generally available, I just tried it with my super cheap subscription plan. For anyone looking to subscribe, they are currently offering stacked discounts 50%+(20-30%)+10% for black Friday deals. Use link [https://z.ai/subscribe?ic=OUCO7ISEDB](https://z.ai/subscribe?ic=OUCO7ISEDB)
