az226

Why send traffic through more devices, with extra hops and extra congestion, when fewer devices work just as well (or better)? Spines are needed for massive synchronous GPU cluster training, not for inference.

Zai replaced the network architecture running GLM-5.1 inference and the gains are pretty wild

Posted by Scared-Biscotti2287@reddit | LocalLLaMA | View on Reddit | 71 comments

az226@reddit

I’ve built my own GPU cluster and this seems obvious to me. Spines really are for training clusters; for inference they actively hurt. You only need leaves.
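A rough sketch of the hop-count argument, assuming a standard two-tier leaf-spine fabric and assuming each inference replica sits entirely under one leaf (both are assumptions for illustration, not details from the post). The function name `switch_hops` is hypothetical.

```python
def switch_hops(src_leaf: int, dst_leaf: int) -> int:
    """Count the switches a packet crosses between two hosts
    in a two-tier leaf-spine fabric (assumed topology)."""
    if src_leaf == dst_leaf:
        # Both hosts hang off the same leaf switch.
        return 1
    # Different leaves: leaf -> spine -> leaf.
    return 3


if __name__ == "__main__":
    # Replica kept under a single leaf (the "only leaves" case).
    print("same leaf:", switch_hops(0, 0))
    # Replica spread across two leaves, so traffic must cross a spine.
    print("cross leaf:", switch_hops(0, 1))
```

If the traffic that matters for inference stays under one leaf, every packet crosses one switch instead of three, which is the "fewer devices, fewer hops" point in a nutshell.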

The Greek Nonverbal “No”: an upward head nod, paired with a brief eyebrow raise and upward eye roll, accompanied by a dental click (“tsou”). Does this gesture exist in your country?

Posted by freddo_expresso@reddit | AskBalkans | View on Reddit | 157 comments

az226@reddit

Huuooooouuupff.

The Greek Nonverbal “No”: an upward head nod, paired with a brief eyebrow raise and upward eye roll, accompanied by a dental click (“tsou”). Does this gesture exist in your country?

Posted by freddo_expresso@reddit | AskBalkans | View on Reddit | 157 comments

az226@reddit

In northern Sweden there is the opposite (meaning yes): you shape your mouth as if to whistle but suck air in instead, like through a straw, and it makes an audible whoop sound.

Deepseek V4's 1M context window: the breaking point

Posted by TangeloOk9486@reddit | LocalLLaMA | View on Reddit | 41 comments

az226@reddit

Opus becomes a headless chicken after 300k

China modded GPU (eg. 4090 48gb) --> I'm gonna figure it out. IS THERE NO ONE ELSE CURIOUS??

Posted by LeatherRub7248@reddit | LocalLLaMA | View on Reddit | 140 comments

az226@reddit

https://www.techpowerup.com/vgabios/278392/278392 https://www.techpowerup.com/vgabios/274124/274124 Hmm.

China modded GPU (eg. 4090 48gb) --> I'm gonna figure it out. IS THERE NO ONE ELSE CURIOUS??

Posted by LeatherRub7248@reddit | LocalLLaMA | View on Reddit | 140 comments

az226@reddit

Do you do vram swaps?

China modded GPU (eg. 4090 48gb) --> I'm gonna figure it out. IS THERE NO ONE ELSE CURIOUS??

Posted by LeatherRub7248@reddit | LocalLLaMA | View on Reddit | 140 comments

az226@reddit

H100 32gb?

China modded GPU (eg. 4090 48gb) --> I'm gonna figure it out. IS THERE NO ONE ELSE CURIOUS??

Posted by LeatherRub7248@reddit | LocalLLaMA | View on Reddit | 140 comments

az226@reddit

Was a pain in the ass to try to get a refund; had to do a chargeback.

China modded GPU (eg. 4090 48gb) --> I'm gonna figure it out. IS THERE NO ONE ELSE CURIOUS??

Posted by LeatherRub7248@reddit | LocalLLaMA | View on Reddit | 140 comments

az226@reddit

Do you swap the vbios or keep whatever the card had?

China modded GPU (eg. 4090 48gb) --> I'm gonna figure it out. IS THERE NO ONE ELSE CURIOUS??

Posted by LeatherRub7248@reddit | LocalLLaMA | View on Reddit | 140 comments

az226@reddit

False

China modded GPU (eg. 4090 48gb) --> I'm gonna figure it out. IS THERE NO ONE ELSE CURIOUS??

Posted by LeatherRub7248@reddit | LocalLLaMA | View on Reddit | 140 comments

az226@reddit

Would you be surprised to know the vbios is exactly the same as the 4090?

What do you guys think of Greek history according to modern day Hollywood?

Posted by Starfalloss@reddit | AskBalkans | View on Reddit | 158 comments

az226@reddit

We’re tired boss

What do you guys think of Greek history according to modern day Hollywood?

Posted by Starfalloss@reddit | AskBalkans | View on Reddit | 158 comments

az226@reddit

Zeushuan. Cleopatrisha. Helen of Detroit. Tyroneilles.

Software FP8 for GPUs without hardware support - 3x speedup on memory-bound operations

Posted by Venom1806@reddit | LocalLLaMA | View on Reddit | 61 comments

az226@reddit

r/ConfidentlyIncorrect. 1) I didn’t say “only added”, I said added. And I said basically. I was simplifying the key changes, the ones that mattered. 2) Vera is a CPU, Rubin is a GPU. 3) There is no record of an RTX 60 series; it’s speculative at this point. But thanks for trying to correct something that wasn’t incorrect.

Is Lepa Brena the “Michael Jackson” of the Balkans/Eastern Europe in terms of fame?

Posted by tipoftheiceberg1234@reddit | AskBalkans | View on Reddit | 308 comments

az226@reddit

She is very well known among Yugoslavians, more like 80s/90s popularity, not so much any more. Also not MJ level fame/intensity. More like a Jennifer Lopez.

Why isn't ebay doing anything to stop those scams?

Posted by KillerMiller13@reddit | LocalLLaMA | View on Reddit | 142 comments

az226@reddit

And at a price well below market. It’s kind of insane. Execs are so out of touch.

what’s actually stopping an insider from leaking model weights?

Posted by itsArmanJr@reddit | LocalLLaMA | View on Reddit | 127 comments

az226@reddit

That’s assuming the OS isn’t locked down on file transfers to external drives. Big if.

what’s actually stopping an insider from leaking model weights?

Posted by itsArmanJr@reddit | LocalLLaMA | View on Reddit | 127 comments

az226@reddit

And something like Mythos is 6TB.

what’s actually stopping an insider from leaking model weights?

Posted by itsArmanJr@reddit | LocalLLaMA | View on Reddit | 127 comments

az226@reddit

Mythos checkpoint would be a gift to humanity.

Did I just destroy a brand new motherboard?

Posted by life_coaches@reddit | LocalLLaMA | View on Reddit | 62 comments

az226@reddit

The worst that can happen is that, if you hit your NVMe with massive loads for extended periods, its firmware might throttle it or make it briefly unresponsive so it can cool off, and that might wear it down faster. But the scratches make a negligible difference to cooling. For most use you will be fine and will never notice this.

DFlash: Block Diffusion for Flash Speculative Decoding.

Posted by Total-Resort-3120@reddit | LocalLLaMA | View on Reddit | 127 comments

az226@reddit

“We will also open-source the training recipe soon, so you can train your own DFlash draft model to accelerate any LLM.” Hope they actually do it.

OpenAI, Anthropic, Google Unite to Combat Model Copying in China

Posted by External_Mood4719@reddit | LocalLLaMA | View on Reddit | 150 comments

az226@reddit

Agreed. Claude is also the one that has the highest ratio of visible reasoning tokens to hidden reasoning tokens.

Qwen3.6-Plus

Posted by Nunki08@reddit | LocalLLaMA | View on Reddit | 226 comments

az226@reddit

So charts look better

Why exactly can't we use the techniques in TurboQuant on the model's quantizations themselves?

Posted by ea_nasir_official_@reddit | LocalLLaMA | View on Reddit | 33 comments

az226@reddit

Exactly. So why not release the code?

Why exactly can't we use the techniques in TurboQuant on the model's quantizations themselves?

Posted by ea_nasir_official_@reddit | LocalLLaMA | View on Reddit | 33 comments

az226@reddit

The whole point of a method is to be able to apply it to any model.

Why exactly can't we use the techniques in TurboQuant on the model's quantizations themselves?

Posted by ea_nasir_official_@reddit | LocalLLaMA | View on Reddit | 33 comments

az226@reddit

No code

Why exactly can't we use the techniques in TurboQuant on the model's quantizations themselves?

Posted by ea_nasir_official_@reddit | LocalLLaMA | View on Reddit | 33 comments

az226@reddit

You can. Someone already did it.

LocalLLaMA 2026

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 133 comments

az226@reddit

Or, Sahil did it! Pepperidge Farms remembers. (Reflection 70B lol).

OVH raises prices. My new offer is 55.1% higher starting April.

Posted by linkoid01@reddit | sysadmin | View on Reddit | 159 comments

az226@reddit

I’m month to month and they jacked it up 70%.

OVH raises prices. My new offer is 55.1% higher starting April.

Posted by linkoid01@reddit | sysadmin | View on Reddit | 159 comments

az226@reddit

And that’s assuming full replenishment I imagine.

OVH raises prices. My new offer is 55.1% higher starting April.

Posted by linkoid01@reddit | sysadmin | View on Reddit | 159 comments

az226@reddit

It’s opportunistic. Customers who are not on OVH compare the cost of buying servers against renting from them, as do existing customers. It’s how rents go up when mortgage rates go up. My OVH bill is going up 70% on April Fools. Looks like I am the fool.

NVIDIA 2026 Conference LIVE. New Base model coming!

Posted by last_llm_standing@reddit | LocalLLaMA | View on Reddit | 64 comments

az226@reddit

Nvidia is king of apples to oranges charts.

NVIDIA 2026 Conference LIVE. New Base model coming!

Posted by last_llm_standing@reddit | LocalLLaMA | View on Reddit | 64 comments

az226@reddit

Looks good when you compare to last generation of models.

DGX Station is available (via OEM distributors)

Posted by Temporary-Size7310@reddit | LocalLLaMA | View on Reddit | 130 comments

az226@reddit

Dell has always been overpriced.

55 → 282 tok/s: How I got Qwen3.5-397B running at speed on 4x RTX PRO 6000 Blackwell

Posted by lawdawgattorney@reddit | LocalLLaMA | View on Reddit | 104 comments

az226@reddit

Nvidia: this was intentional, we just don’t want you to know about it.

President Trump orders ALL Federal agencies in the US Government to immediately stop using Anthropic's technology.

Posted by External_Mood4719@reddit | LocalLLaMA | View on Reddit | 276 comments

az226@reddit

I think domestic mass surveillance is what they wanted more.

President Trump orders ALL Federal agencies in the US Government to immediately stop using Anthropic's technology.

Posted by External_Mood4719@reddit | LocalLLaMA | View on Reddit | 276 comments

az226@reddit

They were so instrumental the government tried to force their hand, yet they’re so bad they must be stopped from being used. The contradiction is over 9000.