cleverusernametry

What do you mean "LLMs"? That's a meaningless statement to make; mention which LLMs you've used. Sounds like you've just used GPT, as those are the sycophantic ones. I've had no problems getting open-weight models to talk in any fashion I wish: verbose/brief, straightforward/sugar-coated, etc.

bytedance released an open source model that attempts to do just about anything with only 3b parameters

Posted by uxl@reddit | LocalLLaMA | View on Reddit | 86 comments

cleverusernametry@reddit

Maybe because they don't actually work well

vs code , Copilot style developing with llmama.cpp ?

Posted by opUserZero@reddit | LocalLLaMA | View on Reddit | 13 comments

cleverusernametry@reddit

Hmm, why would someone archive a repo if it got attention?

vs code , Copilot style developing with llmama.cpp ?

Posted by opUserZero@reddit | LocalLLaMA | View on Reddit | 13 comments

cleverusernametry@reddit

It was archived today?

None of this will ever get stolen

Posted by martin_xs6@reddit | LocalLLaMA | View on Reddit | 300 comments

cleverusernametry@reddit

F the LinkedIn speak generated by ChatGPT

MiMo-V2.5-Pro - the actual best open-weights model

Posted by cjami@reddit | LocalLLaMA | View on Reddit | 63 comments

cleverusernametry@reddit

I despise your title and the implication that the best model at playing your game on your setup makes it a universal best somehow

PI agent integrated with Cline-Kanban repo: All using PI and Qwen 3.6 35B MOE UD 4K_XL

Posted by dreamai87@reddit | LocalLLaMA | View on Reddit | 10 comments

cleverusernametry@reddit

Don't bother with Cline. It's a dying project. Roo was just smart to kill it before a slow death

Deepseek flash seems like a very good replacement for Haiku at the very least

Posted by cant-find-user-name@reddit | LocalLLaMA | View on Reddit | 15 comments

cleverusernametry@reddit

You saying you have an eval is not the same as it being shared, auditable, and runnable by us

Deepseek flash seems like a very good replacement for Haiku at the very least

Posted by cant-find-user-name@reddit | LocalLLaMA | View on Reddit | 15 comments

cleverusernametry@reddit

I don't know what's more useless, benchmaxxed scores or this constant stream of vibe evals

Been using PI Coding Agent with local Qwen3.6 35b for a while now and its actually insane

Posted by SoAp9035@reddit | LocalLLaMA | View on Reddit | 202 comments

cleverusernametry@reddit

Wdym? All systems are literally just that. They just add obfuscation layers that pretend to be some new capability/abstraction, like skills, modes, plugins, etc.

Qwen 3.6 27B is a BEAST

Posted by AverageFormal9076@reddit | LocalLLaMA | View on Reddit | 343 comments

cleverusernametry@reddit

You gotta say what quant you're using

Qwen 3.6 27B is a BEAST

Posted by AverageFormal9076@reddit | LocalLLaMA | View on Reddit | 343 comments

cleverusernametry@reddit

On my M3 Ultra it's 20 tps at Q8

Is harness a new buzzword?

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 122 comments

cleverusernametry@reddit

Harness engineering <- you are here
Context engineering
Prompt engineering

LTX-2.3 based audio model outputs

Posted by manmaynakhashi@reddit | LocalLLaMA | View on Reddit | 32 comments

cleverusernametry@reddit

Wow, this is excellent. It's better than any TTS model that I know of.

Best Local LLMs - Apr 2026

Posted by rm-rf-rm@reddit | LocalLLaMA | View on Reddit | 364 comments

cleverusernametry@reddit

36B?

its all about the harness

Posted by Emotional-Breath-838@reddit | LocalLLaMA | View on Reddit | 34 comments

cleverusernametry@reddit

Another idiotic neologism: "harness" and "harness engineering". The LLM was always a component to be used as part of a software system. Think of it like a wheel. So far we've been seeing unicycles, and we're now just starting to see primitive cars.

Fastest QWEN Coder 80B Next

Posted by StacksHosting@reddit | LocalLLaMA | View on Reddit | 39 comments

cleverusernametry@reddit

"Insanely fast" Shares no numbers at all

Comparing Qwen3.5 vs Gemma4 for Local Agentic Coding

Posted by garg-aayush@reddit | LocalLLaMA | View on Reddit | 97 comments

cleverusernametry@reddit

No. Boris Cherny himself says "agentic search", i.e. simply grep and glob, outperforms RAG for coding. That's all that Claude Code uses. Unless you're a poor engineer or vibe coder, your codebase will follow good/standard folder structures for your language and have good docs. That's all that the model needs to get the right context
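
To make the "grep and glob" idea concrete, here is a minimal sketch of what such search tools could look like if you exposed them to a model yourself. The function names `glob_files` and `grep` and the default `**/*.py` pattern are hypothetical, chosen for illustration; this is not Claude Code's actual implementation, just plain standard-library Python.

```python
import re
from pathlib import Path


def glob_files(root, pattern="**/*.py"):
    """Return files under root matching a glob pattern, sorted for stable output."""
    return sorted(p for p in Path(root).glob(pattern) if p.is_file())


def grep(root, regex, pattern="**/*.py"):
    """Yield (path, line_number, stripped_line) for each line matching regex."""
    compiled = re.compile(regex)
    for path in glob_files(root, pattern):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        for lineno, line in enumerate(text.splitlines(), start=1):
            if compiled.search(line):
                yield path, lineno, line.strip()
```

Something like `list(grep(".", r"def load_config"))` would return every matching definition with its file and line number. There is no index or embedding store to maintain, which is why a predictable folder layout matters so much for this approach.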

Gemma 4 has been released

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 702 comments

cleverusernametry@reddit

Isn't the Elo from LMArena? If so, then definitely don't trust it, as they are sus AF taking a pile of VC money

OpenCode concerns (not truely local)

Posted by Ueberlord@reddit | LocalLLaMA | View on Reddit | 185 comments

cleverusernametry@reddit

u/Reggienator3 here's the enshittification

You guys gotta try OpenCode + OSS LLM

Posted by No-Compote-6794@reddit | LocalLLaMA | View on Reddit | 186 comments

cleverusernametry@reddit

And in which of those cases has the successor been anywhere close to the adoption and support of the predecessor?

You guys gotta try OpenCode + OSS LLM

Posted by No-Compote-6794@reddit | LocalLLaMA | View on Reddit | 186 comments

cleverusernametry@reddit

Has that strategy ever worked for any of the long list of open source software projects that have been enshittified?

You guys gotta try OpenCode + OSS LLM

Posted by No-Compote-6794@reddit | LocalLLaMA | View on Reddit | 186 comments

cleverusernametry@reddit

Counterpoint: no, you shouldn't. Just use cc with whatever OSS model you please. Why? Because opencode is open like Cline, Kilo, etc. They're VC-backed, and a CEO with techbro energy will almost guarantee enshittification sooner or later. They already introduced subscriptions and constantly have some promotional partnership with some cloud inference provider. Guess which they're going to prioritize/optimize for? Cloud or local?

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead.

Posted by MorroHsu@reddit | LocalLLaMA | View on Reddit | 417 comments

cleverusernametry@reddit

Cool. No matter how thin your framework is, it's yet another thing you don't need. LLMs can use bash fine by themselves, and it's all you need

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead.

Posted by MorroHsu@reddit | LocalLLaMA | View on Reddit | 417 comments

cleverusernametry@reddit

In any and all cases, agents have to be sandboxed. That obviates your concern

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead.

Posted by MorroHsu@reddit | LocalLLaMA | View on Reddit | 417 comments

cleverusernametry@reddit

Always was

Nvidia Is Planning to Launch an Open-Source AI Agent Platform

Posted by ImaginationKind9220@reddit | LocalLLaMA | View on Reddit | 32 comments

cleverusernametry@reddit

Simon says yes https://simonwillison.net/2026/Feb/21/claws/ 💩

Qwen-3.5-27B-Derestricted

Posted by My_Unbiased_Opinion@reddit | LocalLLaMA | View on Reddit | 87 comments

cleverusernametry@reddit

Which version are you referring to?

Genuinely curious what doors the M5 Ultra will open

Posted by Blanketsniffer@reddit | LocalLLaMA | View on Reddit | 142 comments

cleverusernametry@reddit

Do you mean you asked Claude to make it?

Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test

Posted by GrungeWerX@reddit | LocalLLaMA | View on Reddit | 213 comments

cleverusernametry@reddit

It's stupid that people use these single-prompt tests and call it the "real deal". The real-world use case is within an existing project, or for a multi-turn, multi-file, multi-functional codebase, and used within a SOTA harness like Claude Code or opencode

I classified 3.5M US patents with Nemotron 9B on a single RTX 5090 — then built a free search engine on top

Posted by Impressive_Tower_550@reddit | LocalLLaMA | View on Reddit | 129 comments

cleverusernametry@reddit

People falling for the story: see it from the angle of it being a sales funnel for him and it makes much more sense

I classified 3.5M US patents with Nemotron 9B on a single RTX 5090 — then built a free search engine on top

Posted by Impressive_Tower_550@reddit | LocalLLaMA | View on Reddit | 129 comments

cleverusernametry@reddit

Can you please open source the code or at least the data?

Qwen3-Coder-Next: What am I doing wrong?

Posted by Septerium@reddit | LocalLLaMA | View on Reddit | 24 comments

cleverusernametry@reddit

Why Kilo over Roo?

Qwen3.5B VS the SOTA same size models from 2 years ago.

Posted by Uncle___Marty@reddit | LocalLLaMA | View on Reddit | 59 comments

cleverusernametry@reddit

Data was generated by Gemini?? Did you even bother to fact-check it??

ScrapChat – a self-hosted UI for llama.cpp with web search, vision, and real-time slot monitoring

Posted by ols255@reddit | LocalLLaMA | View on Reddit | 5 comments

cleverusernametry@reddit

"UI" Not even a screenshot.

Qwen3 vs Qwen3.5 performance

Posted by Balance-@reddit | LocalLLaMA | View on Reddit | 134 comments

cleverusernametry@reddit

Good, you're discovering how nonsensical these benchmarks are

Meet SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents!

Posted by Fabulous_Pollution10@reddit | LocalLLaMA | View on Reddit | 11 comments

cleverusernametry@reddit

Naming both the same thing seems to suggest that model makers can now train on the benchmark test set.

Meet SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents!

Posted by Fabulous_Pollution10@reddit | LocalLLaMA | View on Reddit | 11 comments

cleverusernametry@reddit

I'm confused. Wasn't this supposed to be a benchmark?

People are getting it wrong; Anthropic doesn't care about the distillation, they just want to counter the narrative about Chinese open-source models catching up with closed-source frontier models

Posted by obvithrowaway34434@reddit | LocalLLaMA | View on Reddit | 136 comments

cleverusernametry@reddit

Go look at published papers and citations first

GGML.AI has got acquired by Huggingface

Posted by Time_Reaper@reddit | LocalLLaMA | View on Reddit | 106 comments

cleverusernametry@reddit

Oh fuck

GGML.AI has got acquired by Huggingface

Posted by Time_Reaper@reddit | LocalLLaMA | View on Reddit | 106 comments

cleverusernametry@reddit

People in here defending HF need to learn the phrase "show me an incentive and I'll show you an outcome". They are a for-profit company. End of story. The only saving grace is that they are French and culturally may not be as mindlessly, extremely capitalist as a Silicon Valley company. (Case in point: Ollama.)

GLM-5 is officially on NVIDIA NIM, and you can now use it to power Claude Code for FREE 🚀

Posted by PreparationAny8816@reddit | LocalLLaMA | View on Reddit | 40 comments

cleverusernametry@reddit

Would you mind adding a security section? Is there any telemetry/data that you capture?

Did anyone compare this model to the full Qwen coder? it claims to give almost identical performance at 60B

Posted by Significant_Fig_7581@reddit | LocalLLaMA | View on Reddit | 21 comments

cleverusernametry@reddit

Yes. For your sanity's sake, the rule of thumb is to avoid shortcuts to the maximum possible extent. Use the biggest, least quantized, non-abliterated/non-pruned model that can run at the slowest acceptable pace on your hardware. Prefer quality tokens over more tokens faster

MiniMax-M2.5 REAP models available on HF

Posted by Look_0ver_There@reddit | LocalLLaMA | View on Reddit | 18 comments

cleverusernametry@reddit

And especially not from some random account

GPT-OSS 120b Uncensored Aggressive Release (MXFP4 GGUF)

Posted by hauhau901@reddit | LocalLLaMA | View on Reddit | 30 comments

cleverusernametry@reddit

I believe you totally