waruby

Man trains local model to detect and kill mosquitos with a laser

Posted by No_Information9314@reddit | LocalLLaMA | View on Reddit | 47 comments

I fine-tuned Cohere Transcribe to support diarization and timestamps

Posted by iamMess@reddit | LocalLLaMA | View on Reddit | 25 comments

[NEW] Supra-50M Released!

Posted by Dangerous_Try3619@reddit | LocalLLaMA | View on Reddit | 60 comments

Heretic has been served a legal notice by Meta, Inc.

Posted by -p-e-w-@reddit | LocalLLaMA | View on Reddit | 348 comments

mistralai/Mistral-Medium-3.5-128B · Hugging Face

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 322 comments

Deepseek V4 Flash and Non-Flash Out on HuggingFace

Posted by MichaelXie4645@reddit | LocalLLaMA | View on Reddit | 317 comments

Llama.cpp's auto fit works much better than I expected

Posted by a9udn9u@reddit | LocalLLaMA | View on Reddit | 75 comments

waruby@reddit

See the `-ctk` and `-ctv` command-line options. If you compile the Rotorquant fork of llama.cpp, you can use `-ctk planar3 -ctv turbo3`, which gives 10.3x compression of the KV cache for a negligible loss in quality.
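
For a rough feel of what a 10.3x KV cache compression buys you, here is a back-of-the-envelope sketch. The model shape below is a made-up placeholder, not taken from any specific model, and the 10.3x ratio is just the figure quoted above; this code does not measure it.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2.0):
    """Approximate KV cache size in bytes.

    One K vector and one V vector (hence the factor of 2) are stored
    per layer, per KV head, per token in the context.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Hypothetical model shape (assumption, for illustration only).
baseline = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=131072)

# Compression ratio as quoted in the comment above, not measured here.
compressed = baseline / 10.3

print(f"f16 KV cache:   {baseline / 2**30:.1f} GiB")
print(f"at 10.3x ratio: {compressed / 2**30:.1f} GiB")
```

Plug in the layer count, KV head count, and head dimension of whatever model you actually run to see how much context fits in your VRAM.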

Built LazyMoE — run 120B LLMs on 8GB RAM with no GPU using lazy expert loading + TurboQuant

Posted by ReasonableRefuse4996@reddit | LocalLLaMA | View on Reddit | 47 comments

Do LLM generate meaning, or do they merely produce the form of meaning?

Posted by ParadoxeParade@reddit | LocalLLaMA | View on Reddit | 8 comments

Gemma 4 has been released

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 702 comments

waruby@reddit

I got one too and I feel you, but it is worth considering that the massive VRAM lets you give these models several context windows at once, for several agents running in parallel, increasing your tokens/second/agent. I'll try it with claw-code.

Is 1-bit and TurboQuant the future of OSS? A simulation for Qwen3.5 models.

Posted by GizmoR13@reddit | LocalLLaMA | View on Reddit | 84 comments

waruby@reddit

The latest paper from DeepSeek kind of does that, and it is orthogonal to MoE, so it further reduces the number of active parameters required for the same quality of answers from the model.

Any real alternative to Claude code?

Posted by FriendlyStory7@reddit | LocalLLaMA | View on Reddit | 70 comments

waruby@reddit

TurboQuantization does not make information disappear. Even at 1 bit per weight, GLM5 needs more than 128GB of VRAM. Good luck, consumers.
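
The arithmetic here is easy to sketch. This counts raw weight storage only (no KV cache, activations, or quantization metadata), takes GB as 10^9 bytes, and does not assume any particular parameter count for GLM5; the function names are just illustrative.

```python
def weight_gb(n_params, bits_per_weight):
    """Raw weight storage in GB (10^9 bytes), ignoring all overhead."""
    return n_params * bits_per_weight / 8 / 1e9


def max_params(budget_gb, bits_per_weight):
    """Largest parameter count whose raw weights fit in budget_gb."""
    return budget_gb * 1e9 * 8 / bits_per_weight


for bits in (1, 2, 4):
    fit = max_params(128, bits)
    print(f"{bits} bit/weight: about {fit / 1e9:.0f}B parameters fit in 128 GB")
```

Whatever is left over after the weights still has to hold the KV cache and runtime buffers, so the usable ceiling is lower than these numbers.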

Why Most “Agentic AI” Architectures Are Failing, and Why the Missing Layer Is Middleware, Not Bigger Models...

Posted by nice2Bnice2@reddit | LocalLLaMA | View on Reddit | 7 comments

waruby@reddit

I, too, do Bayesian posterior collapse weighting when I analyze your mom's fat ass sitting down on a chair. Bro, stop with all those nonsensical sciency-looking expressions, or explain every single one of them, at least in a glossary.

What do you do, if you invent AGI? (seriously)

Posted by teachersecret@reddit | LocalLLaMA | View on Reddit | 186 comments

Google C2S-Scale 27B (based on Gemma) built with Yale generated a novel hypothesis about cancer cellular behavior - Model + resources are now on Hugging Face and GitHub

Posted by Nunki08@reddit | LocalLLaMA | View on Reddit | 38 comments

Looking for open-source tool to blur entire bodies by gender in videos/images

Posted by DayOk2@reddit | LocalLLaMA | View on Reddit | 17 comments