cleverusernametry

I ported NVIDIA Parakeet (speech-to-text) to ggml: same output as NeMo, faster, GGUF-quantized, no Python

Posted by mudler_it@reddit | LocalLLaMA | View on Reddit | 40 comments

Stepfun 3.7 Flash is very good

Posted by -dysangel-@reddit | LocalLLaMA | View on Reddit | 89 comments

Breaking the music supply constraint

Posted by entsnack@reddit | LocalLLaMA | View on Reddit | 317 comments

Why do LLMs code better than they talk?

Posted by iMakeSense@reddit | LocalLLaMA | View on Reddit | 81 comments

cleverusernametry@reddit

What do you mean by "LLMs"? That's a meaningless statement to make; mention which LLMs you've used. It sounds like you've only used GPT models, as those are the sycophantic ones. I've had no problems getting open-weight models to talk in any fashion I wish: verbose or brief, straightforward or sugar-coated, etc.

bytedance released an open source model that attempts to do just about anything with only 3b parameters

Posted by uxl@reddit | LocalLLaMA | View on Reddit | 86 comments

VS Code, Copilot-style developing with llama.cpp?

Posted by opUserZero@reddit | LocalLLaMA | View on Reddit | 13 comments

None of this will ever get stolen

Posted by martin_xs6@reddit | LocalLLaMA | View on Reddit | 300 comments

MiMo-V2.5-Pro - the actual best open-weights model

Posted by cjami@reddit | LocalLLaMA | View on Reddit | 63 comments

PI agent integrated with Cline-Kanban repo: All using PI and Qwen 3.6 35B MOE UD 4K_XL

Posted by dreamai87@reddit | LocalLLaMA | View on Reddit | 10 comments

Deepseek flash seems like a very good replacement for Haiku at the very least

Posted by cant-find-user-name@reddit | LocalLLaMA | View on Reddit | 15 comments

Been using PI Coding Agent with local Qwen3.6 35b for a while now and its actually insane

Posted by SoAp9035@reddit | LocalLLaMA | View on Reddit | 202 comments

Qwen 3.6 27B is a BEAST

Posted by AverageFormal9076@reddit | LocalLLaMA | View on Reddit | 343 comments

Is harness a new buzzword?

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 122 comments

LTX-2.3 based audio model outputs

Posted by manmaynakhashi@reddit | LocalLLaMA | View on Reddit | 32 comments

Best Local LLMs - Apr 2026

Posted by rm-rf-rm@reddit | LocalLLaMA | View on Reddit | 364 comments

its all about the harness

Posted by Emotional-Breath-838@reddit | LocalLLaMA | View on Reddit | 34 comments

cleverusernametry@reddit

Another idiotic neologism: "harness" and "harness engineering". The LLM was always a component to be used as part of a software system. Think of it like a wheel. So far we've been seeing unicycles, and we're only now starting to see primitive cars.

Fastest QWEN Coder 80B Next

Posted by StacksHosting@reddit | LocalLLaMA | View on Reddit | 39 comments

Comparing Qwen3.5 vs Gemma4 for Local Agentic Coding

Posted by garg-aayush@reddit | LocalLLaMA | View on Reddit | 97 comments

cleverusernametry@reddit

No. Boris Cherny himself says that "agentic search" (simply grep and glob) outperforms RAG for coding. That's all that Claude Code uses. Unless you're a poor engineer or a vibe coder, your codebase will follow good, standard folder structures for your language and have good docs. That's all the model needs to get the right context.
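
To make the grep-and-glob idea concrete, here is a rough sketch of what such a search step could look like. This is not Claude Code's actual implementation; the helper names (`glob_files`, `grep_files`, `demo`) and the toy project layout are made up for illustration.

```python
import re
import tempfile
from pathlib import Path


def glob_files(root: Path, pattern: str) -> list[Path]:
    """Return files under root matching a glob pattern, sorted for stable output."""
    return sorted(p for p in root.glob(pattern) if p.is_file())


def grep_files(files: list[Path], regex: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, stripped line) for every line matching regex."""
    compiled = re.compile(regex)
    hits = []
    for path in files:
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if compiled.search(line):
                hits.append((path.name, lineno, line.strip()))
    return hits


def demo() -> list[tuple[str, int, str]]:
    """Build a tiny throwaway project (hypothetical layout) and search it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "src" / "auth.py").write_text("def login(user):\n    return check(user)\n")
        (root / "src" / "util.py").write_text("def check(user):\n    return bool(user)\n")
        (root / "README.md").write_text("login docs\n")
        # Step 1: glob narrows the candidates using the folder convention.
        candidates = glob_files(root, "src/**/*.py")
        # Step 2: grep finds the exact lines worth pulling into context.
        return grep_files(candidates, r"def login")


if __name__ == "__main__":
    for name, lineno, text in demo():
        print(f"{name}:{lineno}: {text}")
```

The point is that a predictable folder structure makes the glob pattern easy to guess, and the grep hit gives the model an exact file and line to read next, with no embedding index to build or keep in sync.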

Gemma 4 has been released

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 702 comments

OpenCode concerns (not truely local)

Posted by Ueberlord@reddit | LocalLLaMA | View on Reddit | 185 comments

You guys gotta try OpenCode + OSS LLM

Posted by No-Compote-6794@reddit | LocalLLaMA | View on Reddit | 186 comments

cleverusernametry@reddit

Counterpoint: no, you shouldn't. Just use cc with whatever OSS model you please. Why? Because OpenCode is "open" in the same way Cline, Kilo, etc. are. They're VC-backed, and a CEO with techbro energy all but guarantees enshittification sooner or later. They have already introduced subscriptions and constantly have some promotional partnership with a cloud inference provider. Guess which they're going to prioritize and optimize for: cloud or local?

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead.

Posted by MorroHsu@reddit | LocalLLaMA | View on Reddit | 417 comments

Nvidia Is Planning to Launch an Open-Source AI Agent Platform

Posted by ImaginationKind9220@reddit | LocalLLaMA | View on Reddit | 32 comments

Qwen-3.5-27B-Derestricted

Posted by My_Unbiased_Opinion@reddit | LocalLLaMA | View on Reddit | 87 comments

Genuinely curious what doors the M5 Ultra will open

Posted by Blanketsniffer@reddit | LocalLLaMA | View on Reddit | 142 comments

Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test

Posted by GrungeWerX@reddit | LocalLLaMA | View on Reddit | 213 comments

cleverusernametry@reddit

It's stupid that people use these single-prompt tests and call a model the "real deal". The real-world use case is work within an existing project, or multi-turn work on a multi-file, multi-function codebase, done within a SOTA harness like Claude Code or OpenCode.

I classified 3.5M US patents with Nemotron 9B on a single RTX 5090 — then built a free search engine on top

Posted by Impressive_Tower_550@reddit | LocalLLaMA | View on Reddit | 129 comments

Qwen3-Coder-Next: What am I doing wrong?

Posted by Septerium@reddit | LocalLLaMA | View on Reddit | 24 comments

Qwen3.5B VS the SOTA same size models from 2 years ago.

Posted by Uncle___Marty@reddit | LocalLLaMA | View on Reddit | 59 comments

ScrapChat – a self-hosted UI for llama.cpp with web search, vision, and real-time slot monitoring

Posted by ols255@reddit | LocalLLaMA | View on Reddit | 5 comments

Qwen3 vs Qwen3.5 performance

Posted by Balance-@reddit | LocalLLaMA | View on Reddit | 134 comments

Meet SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents!

Posted by Fabulous_Pollution10@reddit | LocalLLaMA | View on Reddit | 11 comments

People are getting it wrong; Anthropic doesn't care about the distillation, they just want to counter the narrative about Chinese open-source models catching up with closed-source frontier models

Posted by obvithrowaway34434@reddit | LocalLLaMA | View on Reddit | 136 comments

GGML.AI has got acquired by Huggingface

Posted by Time_Reaper@reddit | LocalLLaMA | View on Reddit | 106 comments

cleverusernametry@reddit

People in here defending HF need to learn the phrase "show me an incentive and I'll show you an outcome". They are a for-profit company. End of story. The only saving grace is that they are French and culturally may not be as mindlessly, extremely capitalist as a Silicon Valley company (case in point: Ollama).

GLM-5 is officially on NVIDIA NIM, and you can now use it to power Claude Code for FREE 🚀

Posted by PreparationAny8816@reddit | LocalLLaMA | View on Reddit | 40 comments

Did anyone compare this model to the full Qwen coder? it claims to give almost identical performance at 60B

Posted by Significant_Fig_7581@reddit | LocalLLaMA | View on Reddit | 21 comments

cleverusernametry@reddit

Yes. For your sanity's sake, the rule of thumb is to avoid shortcuts to the maximum possible extent. Use the biggest, least quantized, non-abliterated, non-pruned model that can run at the slowest pace you find acceptable on your hardware. Prefer quality tokens over more tokens faster.

MiniMax-M2.5 REAP models available on HF

Posted by Look_0ver_There@reddit | LocalLLaMA | View on Reddit | 18 comments

GPT-OSS 120b Uncensored Aggressive Release (MXFP4 GGUF)

Posted by hauhau901@reddit | LocalLLaMA | View on Reddit | 30 comments