GLM 5.1 Benchmarks
Posted by Fantastic-Emu-3819@reddit | LocalLLaMA | View on Reddit | 25 comments
GLM 5.1
pmttyji@reddit
I think GLM-5.1 set the bar high for DeepSeek V4.
Yes-Scale-9723@reddit
I'm also waiting for the new version of DeepSeek. Currently it offers outstanding value for money.
power97992@reddit
As long as V4 is just as good as 5.1 but 8x cheaper, it will be great!
SourceCodeplz@reddit
I've used GLM-5 and it is fantastic for my use case (OOP PHP). For me it's really on par with Sonnet 4.5.
Radiant_Hair_2739@reddit
Wow, now I have a local GPT-5.4 on my home server: an EPYC PC with 512 GB of DDR4 RAM. GLM-5 gives me 110 t/s prompt processing (pp) and 5.5 t/s token generation (tg). Thanks!
Yes-Scale-9723@reddit
That's great, but for coding agents 5.5 t/s is really slow: a typical 50k-token task works out to 50,000 / 5.5 ≈ 9,000 seconds, so about 2.5 hours of generation alone.
Radiant_Hair_2739@reddit
It doesn't matter. If I ran Opus 4.6 over the API for the same agentic task, I'd pay almost $20 per difficult task, which I can complete almost for free using local GLM.
Yes-Scale-9723@reddit
Well, in that case it's a great solution. I forgot how expensive those models are.
Caffdy@reddit
8 or 12 channels?
pmttyji@reddit
Nice. Hope you're using an optimized llama.cpp command; also look at ik_llama.cpp. Something like the sketch below.
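Rough sketch only, not a verified command for this model: the GGUF filename, thread count, and override pattern are guesses for your setup.

```bash
# -ngl 99 offloads whatever fits to a GPU (if you have one), while -ot
# pins the large MoE expert tensors to CPU RAM; --numa distribute helps
# on dual-socket EPYC boards. ik_llama.cpp accepts mostly the same flags.
llama-server \
    -m GLM-5-Q4_K_M.gguf \
    -c 32768 \
    --threads 48 \
    --numa distribute \
    -ngl 99 \
    -ot ".ffn_.*_exps.=CPU"
```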
Specter_Origin@reddit
I hope it has faster inference than the last one…
NandaVegg@reddit
It's somehow much faster than 5 on all inference providers, despite the same-ish architecture (FP8).
Remote_Rutabaga3963@reddit
Meh. Faster than 5 was at launch, sure.
atape_1@reddit
Coding benchmarks are absolutely wild.
-dysangel-@reddit
I've been using it for coding the last few weeks. It's good!
pigeon57434@reddit
The most important thing for me is whether this model is more CoT-efficient, because GLM models always seem to think for like 97 years for me, and I'm using it on Zhipu's official website, so it's not even a local-hosting skill issue.
Edzomatic@reddit
From my very limited testing it does indeed think less, and the final output also has less AI fluff
Xisrr1@reddit
It's been like that since GLM 5 already; now it's even more efficient.
EndlessZone123@reddit
Still no vision :(
kaggleqrdl@reddit
AHAHA, GLM 5.1 announces SOTA and Anthropic comes back with... a model you can't use. LOL. PANIC
LittleYouth4954@reddit
I've been using GLM 5.1, 5-Turbo, and 5V for a week now and they are amazing. I'm also impressed by Qwen 3.6.
LegacyRemaster@reddit
Unfortunately, to make it run at 20+ tokens/sec on 192 GB of VRAM I would have to limit myself to IQ1... so the few percentage points it has over MiniMax or Qwen are almost certainly lost to quantization.
Makers7886@reddit
Agreed. IMO the best model right now for 192 GB of VRAM is Qwen 3.5 122B FP8 via vLLM (roughly the setup sketched below): a solid 80+ t/s, and 220-240 t/s with 6+ concurrent requests at 200k context. Every time I "stretch" for a larger model I lose the speed, concurrency, and context in exchange for "checking out the big dog", which is simply not usable for real work, or at least feels unusable because of all the cons.
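Roughly like this, not my exact command: the model path is a guess, and the GPU count is an assumption about how the ~192 GB of VRAM is split.

```bash
# --tensor-parallel-size 4 assumes four GPUs adding up to ~192 GB;
# --max-model-len matches the ~200k context mentioned above.
vllm serve Qwen/Qwen3.5-122B-Instruct-FP8 \
    --tensor-parallel-size 4 \
    --max-model-len 200000 \
    --gpu-memory-utilization 0.90
```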
ambient_temp_xeno@reddit
It's 1.9% better than Gemma 4 31B on GPQA-Diamond.
I guess I'll use all that RAM for Gemma SWA checkpoints, because I'm guessing I'd lose that 1.9% advantage running GLM 5.1 at IQ1.
Ok-Measurement-1575@reddit
So... MiniMax is basically the best pound-for-pound LLM right now?
Where dem weights at? :D