TheaterFire

Currently browsing tags:

  • LocalLLaMA
  • Struggling with Qwen3.6 27B / 35B locally (3090) slow responses, breaking code looking for better setup + auto model switching

    Posted by Clean_Initial_9618@reddit | LocalLLaMA | View on Reddit | 2 comments

  • Prompt injection benchmark: delimiter + strict prompt took Gemma 4 from 21% to 100% defense rate (15 models, 6100+ tests)

    Posted by User_Deprecated@reddit | LocalLLaMA | View on Reddit | 4 comments

  • I made a voice controlled Tic-Tac-Toe game as a learning project

    Posted by dabiggmoe2@reddit | LocalLLaMA | View on Reddit | 1 comment

  • Qwen3.6 27B FP8 runs with 200k tokens of BF16 KV cache at 80 TPS on a single RTX 5000 PRO 48GB

    Posted by __JockY__@reddit | LocalLLaMA | View on Reddit | 62 comments

  • Interested in agents but clueless noob. Please help

    Posted by Silver-Champion-4846@reddit | LocalLLaMA | View on Reddit | 37 comments

  • AMD Strix Halo refresh with 192gb!

    Posted by mindwip@reddit | LocalLLaMA | View on Reddit | 142 comments

  • How much will it cost to host something like qwen3.6 35b a3b in a cloud?

    Posted by Euphoric_North_745@reddit | LocalLLaMA | View on Reddit | 140 comments

  • Follow-up: Qwen3.6-27B on 1× RTX 3090 — pushing to ~218K context + ~50–66 TPS, tool calls now stable (PN12 fix)

    Posted by AmazingDrivers4u@reddit | LocalLLaMA | View on Reddit | 65 comments

  • Running Qwen-3.6-35B-A3B locally is very slow

    Posted by Sad-Duck2812@reddit | LocalLLaMA | View on Reddit | 3 comments

  • Open source models are going to be the future on Cursor, OpenCode etc.

    Posted by _maverick98@reddit | LocalLLaMA | View on Reddit | 144 comments

  • Questions about revisiting local LLM roleplay.

    Posted by newbuildertfb@reddit | LocalLLaMA | View on Reddit | 12 comments

  • Local Rag SDK

    Posted by DetectiveMindless652@reddit | LocalLLaMA | View on Reddit | 10 comments

  • We are finally there: Qwen3.6-27B + agentic search; 95.7% SimpleQA on a single 3090, fully local

    Posted by ComplexIt@reddit | LocalLLaMA | View on Reddit | 84 comments

  • White House Considers Vetting A.I. Models Before They Are Released

    Posted by fallingdowndizzyvr@reddit | LocalLLaMA | View on Reddit | 365 comments

  • Benching local Qwen as a Codex validator, co-agent, and challenger

    Posted by robert896r1@reddit | LocalLLaMA | View on Reddit | 12 comments

  • Llama.cpp MTP support now in beta!

    Posted by ilintar@reddit | LocalLLaMA | View on Reddit | 231 comments

  • DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~17× cheaper

    Posted by Disastrous_Theme5906@reddit | LocalLLaMA | View on Reddit | 15 comments

  • About Kimi K2.6

    Posted by Exact_Law_6489@reddit | LocalLLaMA | View on Reddit | 42 comments

  • Amd and Nvidia cards on same rig

    Posted by deathcom65@reddit | LocalLLaMA | View on Reddit | 4 comments

  • Qwen3.6-27B vs 35B, I prefer 35B but more people here post about 27B...

    Posted by Snoo_27681@reddit | LocalLLaMA | View on Reddit | 162 comments

  • Which model for 32GB M2 Max?

    Posted by segdy@reddit | LocalLLaMA | View on Reddit | 14 comments

  • Dual 3090 setup - performance optimization

    Posted by PaMRxR@reddit | LocalLLaMA | View on Reddit | 41 comments

  • Trying to train tiny LLMs on length constrained reddit posts summarization task using GRPO on 3xMac Minis - updates!

    Posted by East-Muffin-6472@reddit | LocalLLaMA | View on Reddit | 2 comments

  • Peanut - Text to Image Model (Open Weights coming soon)

    Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 12 comments

  • Secondary PC options

    Posted by UniqueIdentifier00@reddit | LocalLLaMA | View on Reddit | 8 comments

  • Advice needed on eGPU and Mini PC

    Posted by Kulidc@reddit | LocalLLaMA | View on Reddit | 19 comments

  • APEX MoE quants update: 25+ new models since the Qwen 3.5 post + new I-Nano tier

    Posted by mudler_it@reddit | LocalLLaMA | View on Reddit | 35 comments

  • A Dark-Money Campaign Is Paying Influencers to Frame Chinese AI as a Threat

    Posted by pmttyji@reddit | LocalLLaMA | View on Reddit | 162 comments

  • 1080 Ti in 2026 - 11GB is still (barely) enough to stay relevant

    Posted by srodland01@reddit | LocalLLaMA | View on Reddit | 9 comments

  • Strix Halo, Debian 13@6.16.12&6.17.8, Qwen3Coder-Q8 CTX<=131k, llama.cpp@Vulkan&ROCm, Power & Efficiency

    Posted by Educational_Sun_8813@reddit | LocalLLaMA | View on Reddit | 21 comments

  • Built a Voice Agents from Scratch GitHub tutorial: mic > Whisper > local LLM (GGUF) > Kokoro > speaker, fully local, no API keys

    Posted by purellmagents@reddit | LocalLLaMA | View on Reddit | 11 comments

  • MTPLX | 2.24x faster TPS | The native MTP inference engine for Apple Silicon

    Posted by YoussofAl@reddit | LocalLLaMA | View on Reddit | 30 comments

  • The more I use it, the more I'm impressed

    Posted by ComfyUser48@reddit | LocalLLaMA | View on Reddit | 88 comments

  • PFlash: 10x prefill speedup over llama.cpp at 128K on a RTX 3090

    Posted by sandropuppo@reddit | LocalLLaMA | View on Reddit | 90 comments

  • Why is no open weight model inference provider hosting Mimo-v2.5 or Mimo-v2.5-pro?

    Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 30 comments

  • vibevoice.cpp: Microsoft VibeVoice (TTS + long-form ASR with diarization) ported to ggml/C++, runs on CPU/CUDA/Metal/Vulkan, no Python at inference

    Posted by mudler_it@reddit | LocalLLaMA | View on Reddit | 4 comments

  • Llama.cpp quantization is broken

    Posted by Ok-Importance-3529@reddit | LocalLLaMA | View on Reddit | 53 comments

  • qwen 3.6 27B looping problem

    Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 12 comments

  • As MTP prepares to land in llama.cpp, Models that support MTP

    Posted by segmond@reddit | LocalLLaMA | View on Reddit | 24 comments

  • vLLM Just Merged TurboQuant Fix for Qwen 3.5+

    Posted by havenoammo@reddit | LocalLLaMA | View on Reddit | 19 comments

  • A simple "hack" to speed up prompt processing for Qwen 3.5/3.6 in LM Studio

    Posted by GrungeWerX@reddit | LocalLLaMA | View on Reddit | 14 comments

  • Best Open Source Voice Cloning if you have lots of reference audio?

    Posted by SlaveToBuy@reddit | LocalLLaMA | View on Reddit | 20 comments

  • Anyone else struggling with multi-GPU stability when running larger local models?

    Posted by Lyceum_Tech@reddit | LocalLLaMA | View on Reddit | 21 comments

  • Do cheap 32GB V100s still make sense for homelab AI?

    Posted by SKX007J1@reddit | LocalLLaMA | View on Reddit | 50 comments

  • ROG Flow Z13 best laptop for local LLMs?

    Posted by Bombarding_@reddit | LocalLLaMA | View on Reddit | 36 comments

  • Qwen 3.6 27B BF16 vs Q4_K_M vs Q8_0 GGUF evaluation

    Posted by gvij@reddit | LocalLLaMA | View on Reddit | 152 comments

  • I made Soprano-80M: Stream ultra-realistic TTS in <15ms, up to 2000x realtime, and <1 GB VRAM, released under Apache 2.0!

    Posted by eugenekwek@reddit | LocalLLaMA | View on Reddit | 107 comments

  • FastDMS: 6.4X KV-cache compression running faster than vLLM BF16/FP8

    Posted by randomfoo2@reddit | LocalLLaMA | View on Reddit | 14 comments

  • Building an LLM Quants Testing Site/Resource - Sharing a few insights from the first month, so you can share your thoughts and wishes for the future.

    Posted by norms_are_practical@reddit | LocalLLaMA | View on Reddit | 3 comments

  • First time GPU buyer. Got a RTX 5000 Pro. Was it a bad decision compared to two 3090s?

    Posted by Valuable-Run2129@reddit | LocalLLaMA | View on Reddit | 96 comments