waruby

See the `-ctk` and `-ctv` command-line options. If you compile the Rotorquant fork of llama.cpp, you can use `-ctk planar3 -ctv turbo3`, which gives 10.3x compression of the KV cache for negligible loss in quality.
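
To get a feel for what a 10.3x ratio means in practice, here is a rough back-of-the-envelope sketch. The model shape below (32 layers, 8 KV heads, head dimension 128, 32k context) is a made-up example, not any specific model, and the 10.3x figure is simply taken from the comment above, assuming it is measured against an f16 cache.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_element):
    """Approximate size in bytes of the K and V caches combined."""
    # Factor 2 = one K tensor and one V tensor per layer.
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elements * bits_per_element / 8


# Hypothetical model shape, chosen only for illustration.
f16_bytes = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                           n_ctx=32768, bits_per_element=16)

# Ratio quoted in the comment; assumed here to be relative to f16.
COMPRESSION_RATIO = 10.3
compressed_bytes = f16_bytes / COMPRESSION_RATIO

print(f"f16: {f16_bytes / 2**30:.2f} GiB, "
      f"compressed: {compressed_bytes / 2**30:.2f} GiB")
# prints: f16: 4.00 GiB, compressed: 0.39 GiB
```

With those assumed dimensions, the cache drops from about 4 GiB to under 0.4 GiB, which is mostly what buys you longer contexts on the same card.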

Built LazyMoE — run 120B LLMs on 8GB RAM with no GPU using lazy expert loading + TurboQuant

Posted by ReasonableRefuse4996@reddit | LocalLLaMA | View on Reddit | 47 comments

waruby@reddit

It's going to be in seconds per token.
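
A rough sketch of why, assuming the bottleneck is reading expert weights from disk on every token. All numbers below (active parameter count, bits per weight, cache miss rate, disk bandwidth) are hypothetical placeholders, not measurements of LazyMoE.

```python
def seconds_per_token(active_params, bits_per_weight, miss_rate, disk_bytes_per_s):
    """Estimate per-token latency when missing experts are streamed from disk.

    miss_rate is the fraction of the active weights that are not already
    in RAM and must be read for this token (hypothetical parameter).
    """
    bytes_to_read = active_params * bits_per_weight / 8 * miss_rate
    return bytes_to_read / disk_bytes_per_s


# Hypothetical: 5B active params at 4 bits, half of them missing from RAM,
# read from a disk sustaining 1 GB/s.
latency = seconds_per_token(active_params=5e9, bits_per_weight=4,
                            miss_rate=0.5, disk_bytes_per_s=1e9)
print(f"{latency:.2f} s/token")
# prints: 1.25 s/token
```

This ignores compute time entirely, so it is a lower bound under those assumptions.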

Do LLMs generate meaning, or do they merely produce the form of meaning?

Posted by ParadoxeParade@reddit | LocalLLaMA | View on Reddit | 8 comments

waruby@reddit

https://preview.redd.it/n8ync2nstwsg1.jpeg?width=488&format=pjpg&auto=webp&s=190ec2ad7ff75cc043b3438d69e0155baa885576

Gemma 4 has been released

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 702 comments

waruby@reddit

I got one too and I feel you, but it's worth considering that the massive VRAM lets you give these models several context windows at once, one per agent, so several agents can run in parallel, increasing your tokens/second/agent. I'll try it with claw-code.

Is 1-bit and TurboQuant the future of OSS? A simulation for Qwen3.5 models.

Posted by GizmoR13@reddit | LocalLLaMA | View on Reddit | 84 comments

waruby@reddit

The latest paper from DeepSeek kind of does that, and it is orthogonal to MoE, so it further reduces the number of active parameters required for the same quality of answers from the model.

Any real alternative to Claude code?

Posted by FriendlyStory7@reddit | LocalLLaMA | View on Reddit | 70 comments

waruby@reddit

TurboQuantization does not make information disappear. Even at 1 bit per weight, GLM5 needs more than 128GB of VRAM. Good luck, consumers.
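
The arithmetic behind that kind of claim is simple enough to sketch. The parameter count used below is a hypothetical placeholder, since the comment does not state GLM5's size, and this counts weights only (no KV cache, activations, or quantization metadata).

```python
def weight_bytes(n_params, bits_per_weight):
    """Memory needed to hold the weights alone, in bytes."""
    return n_params * bits_per_weight / 8


def params_that_fill(budget_bytes, bits_per_weight):
    """Number of parameters whose weights exactly fill the given budget."""
    return budget_bytes * 8 / bits_per_weight


BUDGET = 128e9  # 128 GB, decimal gigabytes

# Parameter count above which even 1-bit weights overflow 128 GB.
threshold = params_that_fill(BUDGET, bits_per_weight=1)
print(f"{threshold / 1e12:.3f}T params fill 128 GB at 1 bit/weight")
# prints: 1.024T params fill 128 GB at 1 bit/weight

# Hypothetical 1.2T-parameter model at 1 bit per weight.
print(f"{weight_bytes(1.2e12, 1) / 1e9:.0f} GB")
# prints: 150 GB
```

So at 1 bit per weight, the weights alone exceed 128 GB only for models above roughly a trillion parameters; plug in the real parameter count to check any particular model.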

Why Most “Agentic AI” Architectures Are Failing, and Why the Missing Layer Is Middleware, Not Bigger Models...

Posted by nice2Bnice2@reddit | LocalLLaMA | View on Reddit | 7 comments

waruby@reddit

I, too, do Bayesian posterior collapse weighting when I analyze your mom's fat ass sitting down on a chair. Bro, stop with all those nonsensical sciency-looking expressions, or explain every single one of them, at least in a glossary.

What do you do, if you invent AGI? (seriously)

Posted by teachersecret@reddit | LocalLLaMA | View on Reddit | 186 comments

waruby@reddit

This post will get you spied on by every single secret service in the world.

Google C2S-Scale 27B (based on Gemma) built with Yale generated a novel hypothesis about cancer cellular behavior - Model + resources are now on Hugging Face and GitHub

Posted by Nunki08@reddit | LocalLLaMA | View on Reddit | 38 comments

waruby@reddit

The hypothesis: "Cancerous cells are harmful to the rest of the organism."

Looking for open-source tool to blur entire bodies by gender in videos/images

Posted by DayOk2@reddit | LocalLLaMA | View on Reddit | 17 comments

waruby@reddit

Did yOu waNT yoUR ToOl to just ASSumE pOEpLe's gEndEr ?!?