pmttyji
-
llama: limit max outputs of `llama_context` by am17an · Pull Request #23861 · ggml-org/llama.cpp
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 44 comments
-
Mellum & Granite Embedding models are ready on llama.cpp
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 1 comment
-
Open Models - May 2026
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 27 comments
-
Model: Support Step3.7-Flash by forforever73 · Pull Request #23845 · ggml-org/llama.cpp
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 6 comments
-
nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 51 comments
-
OpenMOSS-Team/MOSS-TTS-v1.5 · Hugging Face
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 19 comments
-
We need some polls on many topics - 2026
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 13 comments
-
model : add support for talkie-1930-13b by niklassheth · Pull Request #22596 · ggml-org/llama.cpp
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 7 comments
-
OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 8 comments
-
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 27 comments
-
Is Granite-4.1-30b Overshadowed by Qwen3.6 & Gemma4 models?
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 69 comments
-
Next year we're getting 0.5T model from Grok
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 200 comments
-
CUDA: add fast walsh-hadamard transform by am17an · Pull Request #23615 · ggml-org/llama.cpp
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 16 comments
-
Jackrong/Qwopus3.5-9B-Coder-GGUF · Hugging Face
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 72 comments
-
Add MiniCPM5 tokenizer support by zhangtao2-1 · Pull Request #23384 · ggml-org/llama.cpp
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 0 comments
-
Compilation of recent findings which could save some memory or increase performance
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 14 comments
-
meituan-longcat/LongCat-Video-Avatar-1.5 · Hugging Face
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 15 comments
-
TurboQuant isn’t just for KV: Qwen3.5-27B at near-Q4_0 quality, about 10% smaller, and finally fitting on my 16GB 5060 Ti
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 161 comments
-
numind/NuExtract3 · Hugging Face
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 6 comments
-
ByteDance-Seed/Cola-DLM · Hugging Face
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 10 comments
-
Open Models - April 2026 - One of the best months of all time for Local LLMs?
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 153 comments
-
LlamaStation v0.9 — llama.cpp GUI for Windows with multi-backend support, TurboQuant, MTP and more
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 21 comments
-
Sarvam-30b-quantized - Need 1-bit version GGUF
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 26 comments
-
Finalizing my New Desktop Rig 96GB VRAM + 128GB RAM - 3rd GPU possible in this setup?
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 47 comments
-
LlamaStation v0.9 — llama.cpp GUI for Windows with multi-backend support, TurboQuant, MTP and more
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 0 comments
-
CohereLabs/command-a-plus-05-2026-bf16 · Hugging Face
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 6 comments
-
Finalizing my New Desktop Rig 144 GB VRAM + 128GB RAM - 3rd GPU possible in this setup?
Posted by pmttyji@reddit | buildapc | View on Reddit | 9 comments
-
Thinking of getting two NVIDIA RTX Pro 4000 Blackwell (2x24 = 48GB), Any cons?
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 92 comments
-
How many of you tried BeeLlama.cpp? How's it? Agentic coding possible with 8GB VRAM?
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 65 comments
-
Why do some GitHub projects only support wrappers instead of llama.cpp?
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 34 comments
-
internlm/Intern-S2-Preview · Hugging Face
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 11 comments
-
MagenticLite is here: A full-stack agentic experience powered by Small Models - Fara-1.5 4B, 9B & 27B
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 6 comments
-
sensenova/SenseNova-U1-A3B-MoT · Hugging Face
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 5 comments
-
New models possibly from Baidu (ERNIE) this month?
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 14 comments
-
AIDC-AI/Ovis2.6-80B-A3B · Hugging Face
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 28 comments
-
A Dark-Money Campaign Is Paying Influencers to Frame Chinese AI as a Threat
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 171 comments
-
llama.cpp - Custom Optimized Builds?
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 20 comments
-
Poor GPU Club : Tried Bonsai-8B on CPU & CUDA
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 36 comments
-
inclusionAI/Ling-2.6-1T · Hugging Face
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 16 comments
-
Peanut - Text to Image Model (Open Weights coming soon)
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 27 comments
-
Experts-Volunteers needed for Vulkan on ik_llama.cpp
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 13 comments
-
GitHub - warpdotdev/warp: Warp is an agentic development environment, born out of the terminal.
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Something from Mistral (Vibe) tomorrow
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 92 comments
-
Remote agents in Vibe. Powered by Mistral Medium 3.5.
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 0 comments
-
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Recent Open models from last 6 Months - Nov 2025 - Apr 2026
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 45 comments
-
Ternary Bonsai: Top intelligence at 1.58 bits
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 89 comments
-
ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) by pl752 · Pull Request #21636 · ggml-org/llama.cpp
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 3 comments
-
Excerpts from *The Technological Republic*
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 0 comments
-
Mixture-of-Depths Attention - arXiv
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 1 comment