yeah-ok
Me visiting this sub
Posted by Scutoidzz@reddit | LocalLLaMA | View on Reddit | 139 comments
More Gemma 4 models incoming
Posted by Deep-Vermicelli-4591@reddit | LocalLLaMA | View on Reddit | 165 comments
yeah-ok@reddit
llama: use f16 mask for FA to save VRAM by am17an · Pull Request #23764 · ggml-org/llama.cpp
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 78 comments
yeah-ok@reddit
I implemented Laguna (XS.2) as a model in Llama.cpp
Posted by linuxid10t@reddit | LocalLLaMA | View on Reddit | 10 comments
yeah-ok@reddit
llama.cpp server has built-in native tools (exec_shell, edit_file, etc.)
Posted by srigi@reddit | LocalLLaMA | View on Reddit | 49 comments
yeah-ok@reddit
magic incantation to get llama-bench to work with MTP ?
Posted by jdchmiel@reddit | LocalLLaMA | View on Reddit | 8 comments
yeah-ok@reddit
Quick note on sudden performance loss when running GGUFs
Posted by yeah-ok@reddit | LocalLLaMA | View on Reddit | 5 comments
yeah-ok@reddit (OP)
110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp
Posted by janvitos@reddit | LocalLLaMA | View on Reddit | 109 comments
yeah-ok@reddit
Qwen is cooking hard
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 225 comments
yeah-ok@reddit
Qwen 3.7 dropped on Qwen Chat
Posted by Foxiya@reddit | LocalLLaMA | View on Reddit | 221 comments
yeah-ok@reddit
RDNA3 Flash Attention fix just dropped by llama.cpp b9158
Posted by Bulky-Priority6824@reddit | LocalLLaMA | View on Reddit | 10 comments
yeah-ok@reddit
Is there a big gap between Q4 and Q6 on Qwen3.6?
Posted by vick2djax@reddit | LocalLLaMA | View on Reddit | 78 comments
yeah-ok@reddit
we really all are going to make it, aren't we? 2x3090 setup.
Posted by RedShiftedTime@reddit | LocalLLaMA | View on Reddit | 164 comments
yeah-ok@reddit
VS Code's new "Agents window" lets you use local AI models. Still requires an Internet connection and a Github Copilot plan (because we can't have nice things)
Posted by _wsgeorge@reddit | LocalLLaMA | View on Reddit | 73 comments
yeah-ok@reddit
MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant)
Posted by ai-infos@reddit | LocalLLaMA | View on Reddit | 80 comments
yeah-ok@reddit
Decoupled Attention from Weights - Gemma 4 26B
Posted by yeah-ok@reddit | LocalLLaMA | View on Reddit | 26 comments
yeah-ok@reddit (OP)
Will there be any more Qwen3.6 series models?
Posted by cafedude@reddit | LocalLLaMA | View on Reddit | 102 comments
yeah-ok@reddit
Qwen3.6 35b-a3b 🤯
Posted by EffectiveMedium2683@reddit | LocalLLaMA | View on Reddit | 118 comments
yeah-ok@reddit
Decoupled Attention from Weights - Gemma 4 26B
Posted by yeah-ok@reddit | LocalLLaMA | View on Reddit | 26 comments
yeah-ok@reddit (OP)
How long for llama.cpp official support of MTP?
Posted by Manaberryio@reddit | LocalLLaMA | View on Reddit | 50 comments
yeah-ok@reddit
vLLM ROCm has been added to Lemonade as an experimental backend
Posted by jfowers_amd@reddit | LocalLLaMA | View on Reddit | 93 comments
yeah-ok@reddit
Decoupled Attention from Weights - Gemma 4 26B
Posted by yeah-ok@reddit | LocalLLaMA | View on Reddit | 26 comments
yeah-ok@reddit (OP)
Heretic 1.3 released: Reproducible models, integrated benchmarking system, reduced peak VRAM usage, broader model support, and more
Posted by -p-e-w-@reddit | LocalLLaMA | View on Reddit | 80 comments
yeah-ok@reddit
PS5s can now be hacked to run Linux - perhaps some potential for local inference?
Posted by Thrumpwart@reddit | LocalLLaMA | View on Reddit | 76 comments
yeah-ok@reddit
Introducing the IBM Granite 4.1 family of models (3B/8B/30B)
Posted by abkibaarnsit@reddit | LocalLLaMA | View on Reddit | 40 comments
yeah-ok@reddit
I'm done with using local LLMs for coding
Posted by dtdisapointingresult@reddit | LocalLLaMA | View on Reddit | 810 comments
yeah-ok@reddit
Forgive my ignorance but how is a 27B model better than 397B?
Posted by No_Conversation9561@reddit | LocalLLaMA | View on Reddit | 286 comments
yeah-ok@reddit
The Karpathy Loop - can we please get this running on llama-server (pointed at its own code base)?!
Posted by yeah-ok@reddit | LocalLLaMA | View on Reddit | 2 comments
yeah-ok@reddit (OP)
Web OS result from Qwen3.6 35B is by far the best I tested in my laptop
Posted by Idontknow3728@reddit | LocalLLaMA | View on Reddit | 24 comments
yeah-ok@reddit
Asking the pertinent local LLM questions: "Is he alive or dead, has he thoughts within his head?"
Posted by yeah-ok@reddit | LocalLLaMA | View on Reddit | 1 comment
yeah-ok@reddit (OP)
Speculative Decoding works great for Gemma 4 31B with E2B draft (+29% avg, +50% on code)
Posted by PerceptionGrouchy187@reddit | LocalLLaMA | View on Reddit | 117 comments
yeah-ok@reddit
GLM 5.1 crushes every other model except Opus in agentic benchmark at about 1/3 of the Opus cost
Posted by zylskysniper@reddit | LocalLLaMA | View on Reddit | 151 comments
yeah-ok@reddit
Marco-Mini (17.3B, 0.86B active) and Marco-Nano (8B, 0.6B active) by Alibaba
Posted by AnticitizenPrime@reddit | LocalLLaMA | View on Reddit | 51 comments
yeah-ok@reddit
Final voting results for Qwen 3.6
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 285 comments
yeah-ok@reddit
Marco-Mini (17.3B, 0.86B active) and Marco-Nano (8B, 0.6B active) by Alibaba
Posted by AnticitizenPrime@reddit | LocalLLaMA | View on Reddit | 51 comments
yeah-ok@reddit
Benchmarks of Radeon 780M iGPU with shared 128GB DDR5 RAM running various MoE models under Llama.cpp
Posted by AzerbaijanNyan@reddit | LocalLLaMA | View on Reddit | 26 comments
yeah-ok@reddit
Tips: remember to use -np 1 with llama-server as a single user
Posted by ea_man@reddit | LocalLLaMA | View on Reddit | 44 comments
yeah-ok@reddit
Dual DGX Sparks vs Mac Studio M3 Ultra 512GB: Running Qwen3.5 397B locally on both. Here's what I found.
Posted by trevorbg@reddit | LocalLLaMA | View on Reddit | 243 comments
yeah-ok@reddit
Tips: remember to use -np 1 with llama-server as a single user
Posted by ea_man@reddit | LocalLLaMA | View on Reddit | 44 comments
yeah-ok@reddit
I haven't experienced Qwen3.5 (35B and 27B) over thinking. Posting my settings/prompt
Posted by wadeAlexC@reddit | LocalLLaMA | View on Reddit | 76 comments