-_Apollo-_

Qwen3.6-27B KLDs - INTs and NVFPs

Posted by Phaelon74@reddit | LocalLLaMA | View on Reddit | 29 comments

-_Apollo-_@reddit

here's another interesting one if you have time/resources: [https://huggingface.co/QuantTrio/Qwen3.6-27B-AWQ-6Bit/tree/main](https://huggingface.co/QuantTrio/Qwen3.6-27B-AWQ-6Bit/tree/main)

This isn’t X this is Y needs to die

Posted by twnznz@reddit | LocalLLaMA | View on Reddit | 178 comments

-_Apollo-_@reddit

This is the source of truth. No Gotchas here.

Qwen 3.6 27B is a BEAST

Posted by AverageFormal9076@reddit | LocalLLaMA | View on Reddit | 343 comments

-_Apollo-_@reddit

Yeah I think all of us that have run 27b on 5090s know something is odd in your setup. But glad whatever it is works great and you’re happy with it, that’s all that really matters.

What is the current status with Turbo Quant?

Posted by kickerua@reddit | LocalLLaMA | View on Reddit | 78 comments

-_Apollo-_@reddit

lol we still don’t have mtp for qwen3.5 in llama.cpp. Some things move slow.

My first impression after testing Gemma 4 against Qwen 3.5

Posted by ConfidentDinner6648@reddit | LocalLLaMA | View on Reddit | 77 comments

-_Apollo-_@reddit

Yeah this is my experience as well. Sometimes doesn’t think enough when it’s in a tool harness like vscode/copilot chat

I was able to build Claude Code from source and I'm attaching the instructions.

Posted by awfulalexey@reddit | LocalLLaMA | View on Reddit | 97 comments

-_Apollo-_@reddit

kinda curious to know if claude code's harness produces any meaningful differences in output when compared to qwen3.5 via vscode github copilot chat harness.

Nemotron 3 Super - large quality difference between llama.cpp and vLLM?

Posted by BigStupidJellyfish_@reddit | LocalLLaMA | View on Reddit | 24 comments

-_Apollo-_@reddit

Did you also experience this with qwen 3.5 27b?

Do not use mixed KV cache quantization

Posted by L3tum@reddit | LocalLLaMA | View on Reddit | 22 comments

-_Apollo-_@reddit

Similar findings. Most models need you to use the same settings for both the K and V cache.
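
A minimal sketch of that rule, assuming llama.cpp's `llama-server` and its `--cache-type-k` / `--cache-type-v` flags. The helper name `build_server_args`, the model path, and the context size are hypothetical, and the set of cache types is only a subset of what a given build may accept. It builds an argument list and refuses mixed K/V cache types.

```python
# Subset of cache types accepted by llama.cpp's --cache-type-k / --cache-type-v
# flags; check `llama-server --help` for the full list in your build.
KNOWN_CACHE_TYPES = {"f16", "q8_0", "q4_0"}


def build_server_args(model_path, ctx_size, cache_type_k="f16", cache_type_v=None):
    """Return a llama-server argument list with matched K and V cache types.

    Hypothetical helper for illustration; it only builds the argument list
    and does not launch anything.
    """
    if cache_type_v is None:
        # Default the V cache to whatever the K cache uses.
        cache_type_v = cache_type_k
    for cache_type in (cache_type_k, cache_type_v):
        if cache_type not in KNOWN_CACHE_TYPES:
            raise ValueError(f"unknown cache type: {cache_type}")
    if cache_type_k != cache_type_v:
        raise ValueError(
            f"mixed KV cache quantization: K={cache_type_k}, V={cache_type_v}"
        )
    return [
        "llama-server",
        "-m", model_path,
        "-c", str(ctx_size),
        "--cache-type-k", cache_type_k,
        "--cache-type-v", cache_type_v,
    ]


if __name__ == "__main__":
    # Placeholder model path and context size.
    print(" ".join(build_server_args("model.gguf", 32768, "q8_0")))
```

Passing only `cache_type_k` keeps both caches on the same type, so the mismatch can only happen if someone sets the two explicitly.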

Do 2B models have practical use cases, or are they just toys for now?

Posted by Civic_Hactivist_86@reddit | LocalLLaMA | View on Reddit | 85 comments

-_Apollo-_@reddit

Whoa, what do you mean text standardization? Like transcriptions of their convos?

Google TurboQuant running Qwen Locally on MacAir

Posted by gladkos@reddit | LocalLLaMA | View on Reddit | 201 comments

-_Apollo-_@reddit

How many tokens could you fit without kvcache quant before? What about at q8 kvcache?
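
For a rough feel of that question, here is a back-of-the-envelope sketch. The layer count, KV head count, head size, and memory budget are placeholder values, not the real architecture of any Qwen model. The bytes-per-element figures follow ggml's block layouts (q8_0 stores 32 values in 34 bytes, q4_0 stores 32 values in 18 bytes).

```python
# Approximate storage cost per cached element for each cache type.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values plus one fp16 scale per block
    "q4_0": 18 / 32,  # 32 4-bit values plus one fp16 scale per block
}


def kv_bytes_per_token(n_attn_layers, n_kv_heads, head_dim, cache_type="f16"):
    """Bytes of KV cache needed per token (K and V together)."""
    elements = 2 * n_attn_layers * n_kv_heads * head_dim  # 2x: one K, one V
    return elements * BYTES_PER_ELEMENT[cache_type]


def max_context_tokens(budget_bytes, n_attn_layers, n_kv_heads, head_dim,
                       cache_type="f16"):
    """How many tokens fit in a given KV cache memory budget."""
    per_token = kv_bytes_per_token(n_attn_layers, n_kv_heads, head_dim, cache_type)
    return int(budget_bytes // per_token)


if __name__ == "__main__":
    # Hypothetical model shape and an 8 GiB budget left over after weights.
    shape = dict(n_attn_layers=16, n_kv_heads=4, head_dim=256)
    budget = 8 * 1024**3
    for cache_type in ("f16", "q8_0", "q4_0"):
        print(cache_type, max_context_tokens(budget, cache_type=cache_type, **shape))
```

With the same budget, q8_0 fits a bit under twice as many tokens as f16, since it costs 34/32 bytes per element instead of 2.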

Dual DGX Sparks vs Mac Studio M3 Ultra 512GB: Running Qwen3.5 397B locally on both. Here's what I found.

Posted by trevorbg@reddit | LocalLLaMA | View on Reddit | 243 comments

Out of curiosity and no pressure: I don’t use Claude so idk, but ballpark how long would it take to break even on the purchases by dropping the cloud expense? I know this doesn’t take into consideration the other benefits/expenses of going local. Just wondering.
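
The ballpark itself is simple arithmetic. A minimal sketch, where every number is a placeholder and not a figure from the thread, and `monthly_local_cost` is a hypothetical stand-in for things like power:

```python
def break_even_months(hardware_cost, monthly_cloud_cost, monthly_local_cost=0.0):
    """Months until the hardware pays for itself, or None if it never does."""
    monthly_savings = monthly_cloud_cost - monthly_local_cost
    if monthly_savings <= 0:
        # Running locally costs as much as (or more than) the cloud bill.
        return None
    return hardware_cost / monthly_savings


if __name__ == "__main__":
    # Placeholder figures only: a 10,000 purchase replacing a 200/month bill,
    # with 50/month of local running costs.
    print(break_even_months(10_000, 200, 50))
```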

Best way to sell a RTX6000 Pro Blackwell?

Posted by BF3magic@reddit | LocalLLaMA | View on Reddit | 53 comments

-_Apollo-_@reddit

https://reddit.com/r/LocalLLaMA/comments/1s3v8ni/beware_of_scams_scammed_by_reddit_user/ A cautionary tale about buying from Reddit users

Kimi K2.5 knows to wait for apps to load by taking screenshots continuously

Posted by No-Compote-6794@reddit | LocalLLaMA | View on Reddit | 17 comments

-_Apollo-_@reddit

Any local models decent enough for computer use?

Let's GO ! Qwen3.5-Claude-4.6-Opus-Reasoning-Distilled-v2

Posted by Familiar_Wish1132@reddit | LocalLLaMA | View on Reddit | 77 comments

-_Apollo-_@reddit

Looks like 27b is there too.

Gwen3.5-27b 8 bit vs 16 bit, 10 runs

Posted by Baldur-Norddahl@reddit | LocalLLaMA | View on Reddit | 68 comments

-_Apollo-_@reddit

lol @ the faq! People really are sumtin huh? Thanks for the info!

Best Private and Local Only Coding Agent?

Posted by scarlettwidow2024@reddit | LocalLLaMA | View on Reddit | 45 comments

-_Apollo-_@reddit

This is the sweet spot if you’re not completely VRAM-starved. Only beaten by vscode with the qwen extension if you’re using a qwen3.5 model. The model was trained on its own tool names, so it works a little better there.

Microsoft DebugMCP - VS Code extension we developed that empowers AI Agents with real debugging capabilities

Posted by RealRace7@reddit | LocalLLaMA | View on Reddit | 13 comments

-_Apollo-_@reddit

Very cool

Microsoft DebugMCP - VS Code extension we developed that empowers AI Agents with real debugging capabilities

Posted by RealRace7@reddit | LocalLLaMA | View on Reddit | 13 comments

-_Apollo-_@reddit

I haven’t downloaded it yet but if instructions are packaged with the Mcp it could bloat context. Might pair nicely with a skill file.

Omnicoder-9b SLAPS in Opencode

Posted by True_Requirement_891@reddit | LocalLLaMA | View on Reddit | 75 comments

-_Apollo-_@reddit

I tried the full size omnicoder just to see, same issues in roocode and copilot chat. Was using the recommended temperature etc too. Dunno.

My most useful OpenClaw workflow so far

Posted by mescalan@reddit | LocalLLaMA | View on Reddit | 82 comments

-_Apollo-_@reddit

Wow! Can it generate parametric models so I can build attachments and accessories around them?

OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories

Posted by DarkArtsMastery@reddit | LocalLLaMA | View on Reddit | 146 comments

-_Apollo-_@reddit

Welcome :)

OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories

Posted by DarkArtsMastery@reddit | LocalLLaMA | View on Reddit | 146 comments

-_Apollo-_@reddit

Copilot chat on vscode supports lmstudio through the oai extension so it should support your solution too no?

CodeGraphContext - An MCP server that converts your codebase into a graph database, enabling AI assistants and humans to retrieve precise, structured context

Posted by Desperate-Ad-9679@reddit | LocalLLaMA | View on Reddit | 40 comments

-_Apollo-_@reddit

I feel like I need this just for me, never mind the AI.

update your llama.cpp - great tg speedup on Qwen3.5 / Qwen-Next

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 104 comments

-_Apollo-_@reddit

I gotta remember to see if this can be done in conjunction with lm studio

Final Qwen3.5 Unsloth GGUF Update!

Posted by danielhanchen@reddit | LocalLLaMA | View on Reddit | 281 comments

-_Apollo-_@reddit

Very cool, hoping the 27b variants make it there too after the upcoming weekend update.

PSA: Humans are scary stupid

Posted by rm-rf-rm@reddit | LocalLLaMA | View on Reddit | 204 comments

-_Apollo-_@reddit

Where does it end. Maybe this is the fake post about a real post to catch the stupid humans. How deep does this go!? Jk

Qwen3.5-27B Q4 Quantization Comparison

Posted by TitwitMuffbiscuit@reddit | LocalLLaMA | View on Reddit | 116 comments

-_Apollo-_@reddit

Can you check some of the opus 4.6 distills too?

Qwen3.5-27B Q4 Quantization Comparison

Posted by TitwitMuffbiscuit@reddit | LocalLLaMA | View on Reddit | 116 comments

-_Apollo-_@reddit

Wow, thank you

Follow-up: Qwen3.5-35B-A3B — 7 community-requested experiments on RTX 5080 16GB

Posted by gaztrab@reddit | LocalLLaMA | View on Reddit | 192 comments

-_Apollo-_@reddit

both bartowski and unsloth just updated their available 27b models. Qwen3.5 small models dropped today. Looking forward to future updates if you are so inclined. Thank you!

Breaking : The small qwen3.5 models have been dropped

Posted by Illustrious-Swim9663@reddit | LocalLLaMA | View on Reddit | 334 comments

-_Apollo-_@reddit

Doesn’t show up as an option in lm studio yet for me.

Breaking : The small qwen3.5 models have been dropped

Posted by Illustrious-Swim9663@reddit | LocalLLaMA | View on Reddit | 334 comments

-_Apollo-_@reddit

Still testing it, but I'm also curious about others' experiences. If you make a new topic for it, pls link back here as well.

are you ready for small Qwens?

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 174 comments

-_Apollo-_@reddit

Very cool, it might be the smaller size of the 4_0 or maybe an issue on my end. The qwen3.5 27b 4_k_l on my system is 18.41 gigs. I run in a Windows 11 environment on LM Studio. No K or V cache quant enabled. I can’t max out the context without it spilling out of VRAM. Either way, pretty happy with it.

President Trump orders ALL Federal agencies in the US Government to immediately stop using Anthropic's technology.

Posted by External_Mood4719@reddit | LocalLLaMA | View on Reddit | 276 comments

-_Apollo-_@reddit

Current US admin sucks

President Trump orders ALL Federal agencies in the US Government to immediately stop using Anthropic's technology.

Posted by External_Mood4719@reddit | LocalLLaMA | View on Reddit | 276 comments

-_Apollo-_@reddit

Good call, makes sense.

are you ready for small Qwens?

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 174 comments

-_Apollo-_@reddit

Can confirm. I can squeeze out a 170k context length with the 27b model and have it all sit in 32gb of vram. I think I’m on the bartowski q4_k_L or M. Either way, works great in a coding harness. Actually having trouble deciding what worked better for my setup irl usage, this or the coder next 80b moe with expert layers offloaded to system ram.

You can use Qwen3.5 without thinking

Posted by guiopen@reddit | LocalLLaMA | View on Reddit | 86 comments

-_Apollo-_@reddit

I'm guessing this doesn't apply to api use through lm studio? What about setting the thinking toggle to off in the inference settings in LM studio?

Qwen3.5 thinking for too long

Posted by SquirrelEStuff@reddit | LocalLLaMA | View on Reddit | 35 comments

-_Apollo-_@reddit

Yeah, same issues. And it's not just the hello prompts: it has had trouble completing real-world amateur coding tasks as well, also via LM Studio. Not sure if we're missing some kind of patch or what, cuz this does not seem to be the universal opinion.

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090

Posted by jaigouk@reddit | LocalLLaMA | View on Reddit | 72 comments

-_Apollo-_@reddit

Yup, very cool!

Qwen3.5 Model Comparison: 27B vs 35B on RTX 4090

Posted by jaigouk@reddit | LocalLLaMA | View on Reddit | 72 comments

-_Apollo-_@reddit

Honestly, I don’t know how you get 27b q4 to q6 to do anything right. It’s a struggle to get it doing anything on LM Studio currently. Makes me wonder if the upload’s broken or if LM Studio is.

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to.

Posted by hauhau901@reddit | LocalLLaMA | View on Reddit | 234 comments

-_Apollo-_@reddit

Oh, thank you!

Qwen3.5 27B is Match Made in Heaven for Size and Performance

Posted by Lopsided_Dot_4557@reddit | LocalLLaMA | View on Reddit | 114 comments

-_Apollo-_@reddit

for me, it thought of 23 possible replies to just, "Hi" and then errored out.

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to.

Posted by hauhau901@reddit | LocalLLaMA | View on Reddit | 234 comments

-_Apollo-_@reddit

Ty, will look into it more then

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to.

Posted by hauhau901@reddit | LocalLLaMA | View on Reddit | 234 comments

-_Apollo-_@reddit

Is their coding app CLI-only? As an amateur used to an IDE, I’m wondering how challenging the switch was.

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to.

Posted by hauhau901@reddit | LocalLLaMA | View on Reddit | 234 comments

-_Apollo-_@reddit

Thank you!

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to.

Posted by hauhau901@reddit | LocalLLaMA | View on Reddit | 234 comments

-_Apollo-_@reddit

Thank you for the effort put into this and for sharing your data. Surprised that in your suite, qwen3 coder outperforms Qwen3 Coder Next [Q4_K_XL]. I’m curious, if you have resources/time later could you test unsloth/Qwen3 Coder Next [Q4_K_XL]-UD with their recommended tool calling settings in LM studio? https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF/blob/main/Qwen3-Coder-Next-UD-Q4_K_XL.gguf

Qwen3.5-35B-A3B is a gamechanger for agentic coding.

Posted by jslominski@reddit | LocalLLaMA | View on Reddit | 410 comments

-_Apollo-_@reddit

Any opinions on coding intelligence/ performance compared to coder NEXT at q4_k_xl-UD?

Qwen Code - a powerful open-source coding agent + NO TELEMETRY FORK

Posted by Undici77@reddit | LocalLLaMA | View on Reddit | 48 comments

-_Apollo-_@reddit

Yes. Use the latest you can for the context you need. Everything is trade-offs.