pmttyji

llama: limit max outputs of `llama_context` by am17an · Pull Request #23861 · ggml-org/llama.cpp

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 44 comments
Mellum & Granite Embedding models are ready on llama.cpp

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 1 comment
Open Models - May 2026

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 27 comments
Model: Support Step3.7-Flash by forforever73 · Pull Request #23845 · ggml-org/llama.cpp

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 6 comments
nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 51 comments
OpenMOSS-Team/MOSS-TTS-v1.5 · Hugging Face

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 19 comments
We need some polls on many topics - 2026

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 13 comments
model : add support for talkie-1930-13b by niklassheth · Pull Request #22596 · ggml-org/llama.cpp

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 7 comments
OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 8 comments
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 27 comments
Is Granite-4.1-30b Overshadowed by Qwen3.6 & Gemma4 models?

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 69 comments
Next year we're getting 0.5T model from Grok

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 200 comments
CUDA: add fast walsh-hadamard transform by am17an · Pull Request #23615 · ggml-org/llama.cpp

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 16 comments
Jackrong/Qwopus3.5-9B-Coder-GGUF · Hugging Face

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 72 comments
Add MiniCPM5 tokenizer support by zhangtao2-1 · Pull Request #23384 · ggml-org/llama.cpp

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 0 comments
Compilation of recent findings which could save some memory or increase performance

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 14 comments
meituan-longcat/LongCat-Video-Avatar-1.5 · Hugging Face

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 15 comments
TurboQuant isn’t just for KV: Qwen3.5-27B at near-Q4_0 quality, about 10% smaller, and finally fitting on my 16GB 5060 Ti

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 161 comments
numind/NuExtract3 · Hugging Face

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 6 comments
ByteDance-Seed/Cola-DLM · Hugging Face

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 10 comments
Open Models - April 2026 - One of the best months of all time for Local LLMs?

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 153 comments
LlamaStation v0.9 — llama.cpp GUI for Windows with multi-backend support, TurboQuant, MTP and more

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 21 comments
Sarvam-30b-quantized - Need 1-bit version GGUF

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 26 comments
Finalizing my New Desktop Rig 96GB VRAM + 128GB RAM - 3rd GPU possible in this setup?

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 47 comments
LlamaStation v0.9 — llama.cpp GUI for Windows with multi-backend support, TurboQuant, MTP and more

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 0 comments
CohereLabs/command-a-plus-05-2026-bf16 · Hugging Face

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 6 comments
Finalizing my New Desktop Rig 144 GB VRAM + 128GB RAM - 3rd GPU possible in this setup?

Posted by pmttyji@reddit | buildapc | View on Reddit | 9 comments
Thinking of getting two NVIDIA RTX Pro 4000 Blackwell (2x24 = 48GB), Any cons?

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 92 comments
How many of you tried BeeLlama.cpp? How's it? Agentic coding possible with 8GB VRAM?

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 65 comments
Why do some GitHub projects only support wrappers instead of llama.cpp?

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 34 comments
internlm/Intern-S2-Preview · Hugging Face

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 11 comments
MagenticLite is here: A full-stack agentic experience powered by Small Models - Fara-1.5 4B, 9B & 27B

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 6 comments
sensenova/SenseNova-U1-A3B-MoT · Hugging Face

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 5 comments
New models possibly from Baidu (ERNIE) this month?

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 14 comments
AIDC-AI/Ovis2.6-80B-A3B · Hugging Face

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 28 comments
A Dark-Money Campaign Is Paying Influencers to Frame Chinese AI as a Threat

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 171 comments
llama.cpp - Custom Optimized Builds?

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 20 comments
Poor GPU Club : Tried Bonsai-8B on CPU & CUDA

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 36 comments
inclusionAI/Ling-2.6-1T · Hugging Face

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 16 comments
Peanut - Text to Image Model (Open Weights coming soon)

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 27 comments
Experts-Volunteers needed for Vulkan on ik_llama.cpp

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 13 comments
GitHub - warpdotdev/warp: Warp is an agentic development environment, born out of the terminal.

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 2 comments
Something from Mistral (Vibe) tomorrow

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 92 comments
Remote agents in Vibe. Powered by Mistral Medium 3.5.

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 0 comments
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 2 comments
Recent Open models from last 6 Months - Nov 2025 - Apr 2026

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 45 comments
Ternary Bonsai: Top intelligence at 1.58 bits

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 89 comments
ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) by pl752 · Pull Request #21636 · ggml-org/llama.cpp

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 3 comments
Excerpts from *The Technological Republic*

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 0 comments
Mixture-of-Depths Attention - arXiv

Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 1 comment