-
Struggling with Qwen3.6 27B / 35B locally (3090): slow responses, breaking code; looking for better setup + auto model switching
Posted by Clean_Initial_9618@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Prompt injection benchmark: delimiter + strict prompt took Gemma 4 from 21% to 100% defense rate (15 models, 6100+ tests)
Posted by User_Deprecated@reddit | LocalLLaMA | View on Reddit | 4 comments
-
I made a voice controlled Tic-Tac-Toe game as a learning project
Posted by dabiggmoe2@reddit | LocalLLaMA | View on Reddit | 1 comment
-
Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB
Posted by __JockY__@reddit | LocalLLaMA | View on Reddit | 62 comments
-
Interested in agents but clueless noob. Please help
Posted by Silver-Champion-4846@reddit | LocalLLaMA | View on Reddit | 37 comments
-
AMD Strix Halo refresh with 192GB!
Posted by mindwip@reddit | LocalLLaMA | View on Reddit | 142 comments
-
How much will it cost to host something like Qwen3.6 35B A3B in the cloud?
Posted by Euphoric_North_745@reddit | LocalLLaMA | View on Reddit | 140 comments
-
Follow-up: Qwen3.6-27B on 1× RTX 3090 — pushing to ~218K context + ~50–66 TPS, tool calls now stable (PN12 fix)
Posted by AmazingDrivers4u@reddit | LocalLLaMA | View on Reddit | 65 comments
-
Running Qwen-3.6-35B-A3B locally is very slow
Posted by Sad-Duck2812@reddit | LocalLLaMA | View on Reddit | 3 comments
-
Open source models are going to be the future on Cursor, OpenCode etc.
Posted by _maverick98@reddit | LocalLLaMA | View on Reddit | 144 comments
-
Questions about revisiting local LLM roleplay.
Posted by newbuildertfb@reddit | LocalLLaMA | View on Reddit | 12 comments
-
Local Rag SDK
Posted by DetectiveMindless652@reddit | LocalLLaMA | View on Reddit | 10 comments
-
We are finally there: Qwen3.6-27B + agentic search; 95.7% SimpleQA on a single 3090, fully local
Posted by ComplexIt@reddit | LocalLLaMA | View on Reddit | 84 comments
-
White House Considers Vetting A.I. Models Before They Are Released
Posted by fallingdowndizzyvr@reddit | LocalLLaMA | View on Reddit | 365 comments
-
Benching local Qwen as a Codex validator, co-agent, and challenger
Posted by robert896r1@reddit | LocalLLaMA | View on Reddit | 12 comments
-
Llama.cpp MTP support now in beta!
Posted by ilintar@reddit | LocalLLaMA | View on Reddit | 231 comments
-
DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~17× cheaper
Posted by Disastrous_Theme5906@reddit | LocalLLaMA | View on Reddit | 15 comments
-
About Kimi K2.6
Posted by Exact_Law_6489@reddit | LocalLLaMA | View on Reddit | 42 comments
-
AMD and Nvidia cards on the same rig
Posted by deathcom65@reddit | LocalLLaMA | View on Reddit | 4 comments
-
Qwen3.6-27B vs 35B, I prefer 35B but more people here post about 27B...
Posted by Snoo_27681@reddit | LocalLLaMA | View on Reddit | 162 comments
-
Which model for 32GB M2 Max?
Posted by segdy@reddit | LocalLLaMA | View on Reddit | 14 comments
-
Dual 3090 setup - performance optimization
Posted by PaMRxR@reddit | LocalLLaMA | View on Reddit | 41 comments
-
Trying to train tiny LLMs on a length-constrained Reddit post summarization task using GRPO on 3x Mac Minis - updates!
Posted by East-Muffin-6472@reddit | LocalLLaMA | View on Reddit | 2 comments
-
Peanut - Text to Image Model (Open Weights coming soon)
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 12 comments
-
Secondary PC options
Posted by UniqueIdentifier00@reddit | LocalLLaMA | View on Reddit | 8 comments
-
Advice needed on eGPU and Mini PC
Posted by Kulidc@reddit | LocalLLaMA | View on Reddit | 19 comments
-
APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier
Posted by mudler_it@reddit | LocalLLaMA | View on Reddit | 35 comments
-
A Dark-Money Campaign Is Paying Influencers to Frame Chinese AI as a Threat
Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 162 comments
-
1080 Ti in 2026 - 11GB is still (barely) enough to stay relevant
Posted by srodland01@reddit | LocalLLaMA | View on Reddit | 9 comments
-
Strix Halo, Debian 13@6.16.12&6.17.8, Qwen3Coder-Q8 CTX<=131k, llama.cpp@Vulkan&ROCm, Power & Efficiency
Posted by Educational_Sun_8813@reddit | LocalLLaMA | View on Reddit | 21 comments
-
Built a Voice Agents from Scratch GitHub tutorial: mic > Whisper > local LLM (GGUF) > Kokoro > speaker, fully local, no API keys
Posted by purellmagents@reddit | LocalLLaMA | View on Reddit | 11 comments
-
MTPLX | 2.24x faster TPS | The native MTP inference engine for Apple Silicon
Posted by YoussofAl@reddit | LocalLLaMA | View on Reddit | 30 comments
-
The more I use it, the more I'm impressed
Posted by ComfyUser48@reddit | LocalLLaMA | View on Reddit | 88 comments
-
PFlash: 10x prefill speedup over llama.cpp at 128K on a RTX 3090
Posted by sandropuppo@reddit | LocalLLaMA | View on Reddit | 90 comments
-
Why is no open weight model inference provider hosting Mimo-v2.5 or Mimo-v2.5-pro?
Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 30 comments
-
vibevoice.cpp: Microsoft VibeVoice (TTS + long-form ASR with diarization) ported to ggml/C++, runs on CPU/CUDA/Metal/Vulkan, no Python at inference
Posted by mudler_it@reddit | LocalLLaMA | View on Reddit | 4 comments
-
Llama.cpp quantization is broken
Posted by Ok-Importance-3529@reddit | LocalLLaMA | View on Reddit | 53 comments
-
Qwen 3.6 27B looping problem
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 12 comments
-
As MTP prepares to land in llama.cpp: models that support MTP
Posted by segmond@reddit | LocalLLaMA | View on Reddit | 24 comments
-
vLLM Just Merged TurboQuant Fix for Qwen 3.5+
Posted by havenoammo@reddit | LocalLLaMA | View on Reddit | 19 comments
-
A simple "hack" to speed up prompt processing for Qwen 3.5/3.6 in LM Studio
Posted by GrungeWerX@reddit | LocalLLaMA | View on Reddit | 14 comments
-
Best Open Source Voice Cloning if you have lots of reference audio?
Posted by SlaveToBuy@reddit | LocalLLaMA | View on Reddit | 20 comments
-
Anyone else struggling with multi-GPU stability when running larger local models?
Posted by Lyceum_Tech@reddit | LocalLLaMA | View on Reddit | 21 comments
-
Do cheap 32GB V100s still make sense for homelab AI?
Posted by SKX007J1@reddit | LocalLLaMA | View on Reddit | 50 comments
-
ROG Flow Z13 best laptop for local LLMs?
Posted by Bombarding_@reddit | LocalLLaMA | View on Reddit | 36 comments
-
Qwen 3.6 27B BF16 vs Q4_K_M vs Q8_0 GGUF evaluation
Posted by gvij@reddit | LocalLLaMA | View on Reddit | 152 comments
-
I made Soprano-80M: Stream ultra-realistic TTS in <15ms, up to 2000x realtime, and <1 GB VRAM, released under Apache 2.0!
Posted by eugenekwek@reddit | LocalLLaMA | View on Reddit | 107 comments
-
FastDMS: 6.4X KV-cache compression running faster than vLLM BF16/FP8
Posted by randomfoo2@reddit | LocalLLaMA | View on Reddit | 14 comments
-
Building an LLM Quants Testing Site/Resource - Sharing a few insights from the first month, so you can share your thoughts and wishes for the future.
Posted by norms_are_practical@reddit | LocalLLaMA | View on Reddit | 3 comments
-
First-time GPU buyer. Got an RTX 5000 Pro. Was it a bad decision compared to two 3090s?
Posted by Valuable-Run2129@reddit | LocalLLaMA | View on Reddit | 96 comments