Interesting-Sock3940

Replaced Claude with local Qwen3.6-27B in my multi-agent orchestrator for 2 weeks

Posted by Interesting-Sock3940@reddit | LocalLLaMA | View on Reddit | 149 comments

Interesting-Sock3940@reddit (OP)

ran it all through my own orchestrator (https://openyabby.com). the system prompt plus the JSON schemas for the tool definitions ate about 2.5k tokens per turn. It's a heavy tax on a 32k context window, but strictly necessary if you want to gate the execution layer.
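For a sense of what "gating the execution layer" can look like, here is a minimal sketch of validating a model-emitted tool call before anything runs. The tool names, argument specs, and the `{"name": ..., "arguments": ...}` shape are assumptions made for illustration, not the orchestrator's actual format. (For scale: 2.5k tokens is roughly 8% of a 32k window.)

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
# These tools are made up for the example.
TOOL_SPECS = {
    "read_file": {"path": str},
    "run_tests": {"target": str, "timeout_s": int},
}


def gate_tool_call(raw: str):
    """Return (ok, call_or_reason) for a raw tool-call string from the model."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(call, dict):
        return False, "tool call must be a JSON object"

    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or name not in TOOL_SPECS:
        return False, f"unknown tool: {name!r}"
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object"

    spec = TOOL_SPECS[name]
    missing = [key for key in spec if key not in args]
    if missing:
        return False, f"missing arguments: {missing}"
    extra = [key for key in args if key not in spec]
    if extra:
        return False, f"unexpected arguments: {extra}"

    for key, expected in spec.items():
        value = args[key]
        # bool is a subclass of int in Python, so reject booleans explicitly;
        # none of the example tools take boolean arguments.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False, f"argument {key!r} must be {expected.__name__}"

    return True, call


ok, result = gate_tool_call('{"name": "read_file", "arguments": {"path": "a.py"}}')
print(ok, result)
```

Only calls that pass the gate get dispatched; a rejected call can be fed back to the model as an error message instead of being executed.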

Entire world: We need more GPUs. Meanwhile, Jensen Huang:

Posted by Nunki08@reddit | LocalLLaMA | View on Reddit | 264 comments

Is Qwen3.6 current king for local agentic use?

Posted by HornyGooner4402@reddit | LocalLLaMA | View on Reddit | 150 comments

Ran K2.6 through a third-party coding benchmark: heres how the figures stand up

Posted by lucasbennett_1@reddit | LocalLLaMA | View on Reddit | 5 comments

2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints

Posted by ex-arman68@reddit | LocalLLaMA | View on Reddit | 380 comments

Interesting-Sock3940@reddit

This is the kind of local inference progress that actually matters, because it turns Qwen 3.6 27B into something closer to a practical coding agent on consumer hardware. The key test is whether MTP only makes it faster or whether it also makes small coding mistakes more likely, because local agents usually fail when one wrong assumption gets carried through a whole repo task.
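One way to run that test, sketched under assumptions: run the same task set twice (MTP off and on), record pass/fail per task, and look at which tasks flipped rather than only at the aggregate. The task IDs and results below are invented placeholders, not measurements.

```python
def compare_runs(baseline: dict[str, bool], mtp: dict[str, bool]) -> dict:
    """Paired comparison of per-task pass/fail results from two runs."""
    # Only compare tasks that appear in both runs.
    shared = sorted(baseline.keys() & mtp.keys())
    n = len(shared)
    regressed = [t for t in shared if baseline[t] and not mtp[t]]
    fixed = [t for t in shared if not baseline[t] and mtp[t]]
    return {
        "tasks": n,
        "baseline_pass_rate": sum(baseline[t] for t in shared) / n if n else 0.0,
        "mtp_pass_rate": sum(mtp[t] for t in shared) / n if n else 0.0,
        "regressed": regressed,  # passed without MTP, failed with it
        "fixed": fixed,          # failed without MTP, passed with it
    }


# Placeholder data for illustration only.
baseline_results = {"task-1": True, "task-2": True, "task-3": False, "task-4": True}
mtp_results = {"task-1": True, "task-2": False, "task-3": False, "task-4": True}

report = compare_runs(baseline_results, mtp_results)
print(report)
```

The `regressed` list is the interesting part: a matching pass rate can hide tasks that flipped in both directions, and those flips are where a carried-through wrong assumption would show up.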

New "major breakthrough?" architecture SubQ

Posted by Daemontatox@reddit | LocalLLaMA | View on Reddit | 37 comments

DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~17× cheaper

Posted by Disastrous_Theme5906@reddit | LocalLLaMA | View on Reddit | 94 comments

The Ultimate LLM Fine-Tuning Guide

Posted by PromptInjection_@reddit | LocalLLaMA | View on Reddit | 8 comments

Interesting-Sock3940@reddit

solid intro. one gap that's easy to miss when you're learning SFT: a fine-tuned model can ace held-out eval and still lose tool-call format compliance the moment you load it into a real serving harness. the guide ends right where the actual production headache starts. worth a small chapter on validating against the deployment loop, not just GGUF conversion.
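A rough sketch of that kind of check: collect raw completions from the actual serving stack and measure how many contain a well-formed tool call. The `<tool_call>...</tool_call>` wrapper and the `name`/`arguments` keys are assumptions here; swap in whatever format your chat template and harness really use.

```python
import json
import re

# Assumed wrapper format; adjust to match the deployed chat template.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def is_compliant(output: str) -> bool:
    """True if the output contains a parseable tool call with the expected keys."""
    match = TOOL_CALL_RE.search(output)
    if match is None:
        return False
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return False
    return (
        isinstance(call, dict)
        and isinstance(call.get("name"), str)
        and isinstance(call.get("arguments"), dict)
    )


def compliance_rate(outputs: list[str]) -> float:
    """Fraction of outputs that pass is_compliant (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return sum(is_compliant(o) for o in outputs) / len(outputs)


# Hand-written sample outputs for illustration only.
samples = [
    '<tool_call>{"name": "search", "arguments": {"q": "x"}}</tool_call>',
    '<tool_call>{"name": "search", "arguments": {"q": "x"}</tool_call>',  # broken JSON
    'Sure, I will call search with q=x.',                                  # no call at all
    '<tool_call>\n{"name": "open", "arguments": {}}\n</tool_call>',
]
print(compliance_rate(samples))
```

Running this on the base model and the fine-tuned model through the same serving path gives a before/after number that held-out eval loss won't show.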

"Second Thoughts" Been playing with adding a small transformer that reads output near the end of generation, and feeds it back near the top as a refinement loop. A quick test of 1.7B model showed drastic improvement in focused tasks (like coding)

Posted by bigattichouse@reddit | LocalLLaMA | View on Reddit | 39 comments

Interesting-Sock3940@reddit

"out of scope for third thoughts" is the spot where it gets interesting tbh. once a model reviews its own output more than twice, it stops fixing real bugs and starts inventing new ones to justify another pass. how often does your gate fire on inputs that were already fine?
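To put a number on that last question, a small sketch: log, per input, whether the first-pass output was already correct and whether the gate triggered a refinement pass, then compute the fire rate on the already-fine subset. The `Trial` record and its fields are hypothetical, not part of the posted project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    already_correct: bool  # first-pass output passed the check on its own
    gate_fired: bool       # the gate triggered another refinement pass


def false_fire_rate(trials: list[Trial]) -> Optional[float]:
    """Share of already-correct inputs on which the gate still fired."""
    fine = [t for t in trials if t.already_correct]
    if not fine:
        return None  # undefined when no input was already correct
    return sum(t.gate_fired for t in fine) / len(fine)


# Placeholder log for illustration only.
log = [
    Trial(already_correct=True, gate_fired=False),
    Trial(already_correct=True, gate_fired=True),
    Trial(already_correct=False, gate_fired=True),
    Trial(already_correct=True, gate_fired=False),
    Trial(already_correct=True, gate_fired=False),
]
print(false_fire_rate(log))
```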