-_Apollo-_

Qwen3.6-27B KLDs - INTs and NVFPs

Posted by Phaelon74@reddit | LocalLLaMA | View on Reddit | 29 comments

-_Apollo-_@reddit

here's another interesting one if you have time/resources: [https://huggingface.co/QuantTrio/Qwen3.6-27B-AWQ-6Bit/tree/main](https://huggingface.co/QuantTrio/Qwen3.6-27B-AWQ-6Bit/tree/main)

Qwen3.6-27B KLDs - INTs and NVFPs

Posted by Phaelon74@reddit | LocalLLaMA | View on Reddit | 29 comments

This isn’t X this is Y needs to die

Posted by twnznz@reddit | LocalLLaMA | View on Reddit | 178 comments

Qwen 3.6 27B is a BEAST

Posted by AverageFormal9076@reddit | LocalLLaMA | View on Reddit | 343 comments

-_Apollo-_@reddit

Yeah I think all of us that have run 27b on 5090s know something is odd in your setup. But glad whatever it is works great and you’re happy with it, that’s all that really matters.

Qwen 3.6 27B is a BEAST

Posted by AverageFormal9076@reddit | LocalLLaMA | View on Reddit | 343 comments

Qwen 3.6 27B is a BEAST

Posted by AverageFormal9076@reddit | LocalLLaMA | View on Reddit | 343 comments

What is the current status with Turbo Quant?

Posted by kickerua@reddit | LocalLLaMA | View on Reddit | 78 comments

My first impression after testing Gemma 4 against Qwen 3.5

Posted by ConfidentDinner6648@reddit | LocalLLaMA | View on Reddit | 77 comments

I was able to build Claude Code from source and I'm attaching the instructions.

Posted by awfulalexey@reddit | LocalLLaMA | View on Reddit | 97 comments

-_Apollo-_@reddit

kinda curious to know if claude code's harness produces any meaningful differences in output when compared to qwen3.5 via vscode github copilot chat harness.

Nemotron 3 Super - large quality difference between llama.cpp and vLLM?

Posted by BigStupidJellyfish_@reddit | LocalLLaMA | View on Reddit | 24 comments

Do not use mixed KV cache quantization

Posted by L3tum@reddit | LocalLLaMA | View on Reddit | 22 comments

Do 2B models have practical use cases, or are they just toys for now?

Posted by Civic_Hactivist_86@reddit | LocalLLaMA | View on Reddit | 85 comments

Google TurboQuant running Qwen Locally on MacAir

Posted by gladkos@reddit | LocalLLaMA | View on Reddit | 201 comments

Dual DGX Sparks vs Mac Studio M3 Ultra 512GB: Running Qwen3.5 397B locally on both. Here's what I found.

Posted by trevorbg@reddit | LocalLLaMA | View on Reddit | 243 comments

-_Apollo-_@reddit

Out of curiosity and no pressure: I don’t use Claude so idk, but ballpark how long would it take to break even on the purchases by dropping the cloud expense? I know this doesn’t take into consideration the other benefits/expenses of going local. Just wondering.

Best way to sell a RTX6000 Pro Blackwell?

Posted by BF3magic@reddit | LocalLLaMA | View on Reddit | 53 comments

Kimi K2.5 knows to wait for apps to load by taking screenshots continuously

Posted by No-Compote-6794@reddit | LocalLLaMA | View on Reddit | 17 comments

Let's GO ! Qwen3.5-Claude-4.6-Opus-Reasoning-Distilled-v2

Posted by Familiar_Wish1132@reddit | LocalLLaMA | View on Reddit | 77 comments

Gwen3.5-27b 8 bit vs 16 bit, 10 runs

Posted by Baldur-Norddahl@reddit | LocalLLaMA | View on Reddit | 68 comments

Best Private and Local Only Coding Agent?

Posted by scarlettwidow2024@reddit | LocalLLaMA | View on Reddit | 45 comments

-_Apollo-_@reddit

This is the sweet spot if you’re not completely vram starved. Only beat out by vscode with the qwen extension if you’re using a qwen3.5 model. It was trained on its own tool names so works a little better.

Microsoft DebugMCP - VS Code extension we developed that empowers AI Agents with real debugging capabilities

Posted by RealRace7@reddit | LocalLLaMA | View on Reddit | 13 comments

Microsoft DebugMCP - VS Code extension we developed that empowers AI Agents with real debugging capabilities

Posted by RealRace7@reddit | LocalLLaMA | View on Reddit | 13 comments

Omnicoder-9b SLAPS in Opencode

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 75 comments

-_Apollo-_@reddit

I tried the full size omnicoder just to see, same issues in roocode and copilot chat. Was using the recommended temperature etc too. Dunno.

My most useful OpenClaw workflow so far

Posted by mescalan@reddit | LocalLLaMA | View on Reddit | 82 comments

OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories

Posted by DarkArtsMastery@reddit | LocalLLaMA | View on Reddit | 146 comments

OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories

Posted by DarkArtsMastery@reddit | LocalLLaMA | View on Reddit | 146 comments

CodeGraphContext - An MCP server that converts your codebase into a graph database, enabling AI assistants and humans to retrieve precise, structured context

Posted by Desperate-Ad-9679@reddit | LocalLLaMA | View on Reddit | 40 comments

update your llama.cpp - great tg speedup on Qwen3.5 / Qwen-Next

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 104 comments

Final Qwen3.5 Unsloth GGUF Update!

Posted by danielhanchen@reddit | LocalLLaMA | View on Reddit | 281 comments

PSA: Humans are scary stupid

Posted by rm-rf-rm@reddit | LocalLLaMA | View on Reddit | 204 comments

Qwen3.5-27B Q4 Quantization Comparison

Posted by TitwitMuffbiscuit@reddit | LocalLLaMA | View on Reddit | 116 comments

Qwen3.5-27B Q4 Quantization Comparison

Posted by TitwitMuffbiscuit@reddit | LocalLLaMA | View on Reddit | 116 comments

Follow-up: Qwen3.5-35B-A3B — 7 community-requested experiments on RTX 5080 16GB

Posted by gaztrab@reddit | LocalLLaMA | View on Reddit | 192 comments

-_Apollo-_@reddit

both bartowski and unsloth just updated their available 27b models. Qwen3.5 small models dropped today. Looking forward to future updates if you are so inclined. Thank you!

Breaking : The small qwen3.5 models have been dropped

Posted by Illustrious-Swim9663@reddit | LocalLLaMA | View on Reddit | 334 comments

Breaking : The small qwen3.5 models have been dropped

Posted by Illustrious-Swim9663@reddit | LocalLLaMA | View on Reddit | 334 comments

are you ready for small Qwens?

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 174 comments

-_Apollo-_@reddit

Very cool, it might be the smaller size of the 4_0 or maybe an issue on my end. The qwen3.5 27b 4_k_l on my system is 18.41 gigs. I run in a windows 11 environment on LM studio. No k or v cache quant enabled. I can’t max out the context without it spilling into vram. Either way, pretty happy with it.

President Trump orders ALL Federal agencies in the US Government to immediately stop using Anthropic's technology.

Posted by External_Mood4719@reddit | LocalLLaMA | View on Reddit | 276 comments

President Trump orders ALL Federal agencies in the US Government to immediately stop using Anthropic's technology.

Posted by External_Mood4719@reddit | LocalLLaMA | View on Reddit | 276 comments

are you ready for small Qwens?

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 174 comments

-_Apollo-_@reddit

Can confirm. I can squeeze out a 170k context length with the 27b model and have it all sit in 32gb of vram. I think I’m on the bartowski q4_k_L or M. Either way, works great in a coding harness. Actually having trouble deciding what worked better for my setup irl usage, this or the coder next 80b moe with expert layers offloaded to system ram.

You can use Qwen3.5 without thinking

Posted by guiopen@reddit | LocalLLaMA | View on Reddit | 86 comments

-_Apollo-_@reddit

I'm guessing this doesn't apply to api use through lm studio? What about setting the thinking toggle to off in the inference settings in LM studio?

Qwen3.5 thinking for too long

Posted by SquirrelEStuff@reddit | LocalLLaMA | View on Reddit | 35 comments

-_Apollo-_@reddit

yeah; same issues. And its not just the hello prompts. it has had trouble completing real world amateur coding tasks as well. Also via LM studio. Not sure we've missing some kind of patch or what cuz this does not seem to be the universal opinion.

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090

Posted by jaigouk@reddit | LocalLLaMA | View on Reddit | 72 comments

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090

Posted by jaigouk@reddit | LocalLLaMA | View on Reddit | 72 comments

-_Apollo-_@reddit

Honestly, I don’t know how you get 27b q4 to q6 to do anything right. It’s a struggle to get it doing anything on LM studio currently. Makes me wonder if the uploads broken or if LM studio is.

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to.

Posted by hauhau901@reddit | LocalLLaMA | View on Reddit | 234 comments

Qwen3.5 27B is Match Made in Heaven for Size and Performance

Posted by Lopsided_Dot_4557@reddit | LocalLLaMA | View on Reddit | 114 comments

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to.

Posted by hauhau901@reddit | LocalLLaMA | View on Reddit | 234 comments

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to.

Posted by hauhau901@reddit | LocalLLaMA | View on Reddit | 234 comments

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to.

Posted by hauhau901@reddit | LocalLLaMA | View on Reddit | 234 comments

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to.

Posted by hauhau901@reddit | LocalLLaMA | View on Reddit | 234 comments

-_Apollo-_@reddit

Thank you for the effort put into this and for sharing your data. Surprised that in your suite, qwen3 coder outperforms Qwen3 Coder Next [Q4_K_XL]. I’m curious, if you have resources/time later could you test unsloth/Qwen3 Coder Next [Q4_K_XL]-UD with their recommended tool calling settings in LM studio? https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF/blob/main/Qwen3-Coder-Next-UD-Q4_K_XL.gguf

Qwen3.5-35B-A3B is a gamechanger for agentic coding.

Posted by jslominski@reddit | LocalLLaMA | View on Reddit | 410 comments

Qwen Code - a powerful open-source coding agent + NO TELEMETRY FORK

Posted by Undici77@reddit | LocalLLaMA | View on Reddit | 48 comments