This is where we are right now, LocalLLaMA
Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 411 comments
the future is now
ArtdesignImagination@reddit
ok and?
Low-Opening25@reddit
this is false. Running an LLM on a laptop, the battery would be done in 30 mins.
Pleasant-Shallot-707@reddit
They have plugs on airplanes now
Low-Opening25@reddit
on some
Pleasant-Shallot-707@reddit
And there’s no reason to think he doesn’t have a plug available to him
Low-Opening25@reddit
ok stickler
Pleasant-Shallot-707@reddit
lol just accept your comment provided zero value and learn from this
N1AK@reddit
Plenty of flights have power these days, and decent internet; I've run Qwen 3.6 33B A3B on a MacBook Pro for over 90 mins on battery. I used to do work flights overnight and try to sleep; now I tend to do daytime flights, as I've got decent internet and power so can work pretty much as normal.
Dry_Yam_4597@reddit
Cool cool.
But this type of dramatic writing.
Is super annoying.
It's as if the writer wants to share something dramatic.
They can just calm their tits down.
yaosio@reddit
This morning I fired up my laptop and started my LLM.
All I did was tell it to make a world changing app.
I didn't make a harness.
I didn't tell it how to do it.
I didn't tell it how to change the world.
It made an app of a purple monkey that sits on your desktop.
Friendly, fast, purple.
If you're not already doing this you're left behind.
nomickti@reddit
Purple. Monkey. Dishwasher.
Tommy3Tanks@reddit
Sounds like a strong password.
iamapizza@reddit
Interior. Crocodile Alligator?
No, I drive a Chevrolet.
Movie theater.
nomickti@reddit
I myself Mr. Gerbik. half-shark, half-man, skin like alligator.
Carrying a dead walrus. check it.
BadgerOfDoom99@reddit
Dangling in space.
A thousand lives yet unlived.
Hairy Donkey balls.
marastinoc@reddit
Bears.
Beets.
Battlestar Galactica.
Orson_Welles@reddit
It’s not just a dishwasher. It’s a dishwisher.
chris415@reddit
no, dishwursher
rasmadrak@reddit
iWash... It just wash.
ovrlrd1377@reddit
uWish
vabello@reddit
Man
Woman
Person
Camera
TV
Covfefe
octoo01@reddit
Banzi Buddy?
KindaSortaGood@reddit
That shit was straight malware
randylush@reddit
I call this. The second revolution of AI.
Bonzupii@reddit
GitHub repo to the purple monkey app?
You had me at "Friendly, fast, purple."
I know this is probably 100% a joke.
But I'm holding out hope...
That the purple monkey app is real.
Please don't let me down.
yaosio@reddit
You won't believe this. My LLM? It isn't just intelligence.
It's time travel.
It went back in
time
and made the purple monkey
https://en.wikipedia.org/wiki/BonziBuddy
Bonzupii@reddit
Brb making a BonzupiiBuddy
Larimus89@reddit
What model you find best?
Poromenos@reddit
This isn't humans. This isn't LLMs. This is Claude writing.
Ok_Scientist_8803@reddit
Reminds me of:
This is not just food.
This is M&S food
gnnr25@reddit
But what if this is
is the next
e = mc^(2) + AI
The possibilities are memefull
qzrz@reddit
When the formula was invented there was no AI, so for them AI = 0. That's why it works!
beryugyo619@reddit
it's still funny how it just implies one or more of A and I equals 0
the_ai_wizard@reddit
you should post this shit to LinkedIn and watch the marketers run with this
FujiKeynote@reddit
(most people miss this)
taurusApart@reddit
It's LinkedIn Speak.
Which is essentially 4chan greentext.
But for insufferable corporate assholes.
seanmacproductions@reddit
Best take I’ve seen all year
SkyFeistyLlama8@reddit
LinkedIn being Corporate 4chan is.
TheSEOVicc@reddit
Yeah it’s better to connect sentences more and not go full gooroo spam mode on LI
Downtown-Key9504@reddit
Real. Got me wanting to say “ With a soul-stirring sigh of finality, I surrendered a titan of earthly burden to the porcelain abyss, severing the heavy chains of internal discord. A crystalline clarity now permeates my being, as if the very stars have realigned to honor this profound and hallowed evacuation of the spirit. “
Every morning
KarmaBitesDogma@reddit
This is the most amusing post I’ve read on any sub in the last year, and, of equal importance, I’m hoping to hell that an actual Homo sapiens has crafted it.
BlobbyMcBlobber@reddit
It's just a LinkedIn hype post. Not only is the styling annoying, but the conclusion is also way too exaggerated. This is not the second AI revolution or whatever. 99.999% of people are not even interested in running local AI or buying the hardware for it. I love local AI but it's a tiny niche and will probably stay that way for a long time.
txgsync@reddit
I hate that you are right. Even on an M5 Max, waiting minutes on prefill is nowhere near where it needs to be.
I am noodling on a Pi approach that focuses intensely on re-using KV cache prefixes to try to keep prefill times reasonable on large contexts. But it’s like… sure, I could do that. Or throw $0.30 at Opus and it will finish that job in seconds instead of minutes.
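The core of that idea is just measuring how much of the already-cached prompt a new request shares, so only the tail needs prefill. A minimal sketch of the bookkeeping (illustrative names only; a real backend would have to map this onto its own KV-cache slots):

```python
# Minimal sketch of KV-cache prefix reuse: count how many leading tokens a
# new prompt shares with the previously cached prompt, so only the
# non-matching tail has to be prefilled again.
def shared_prefix_len(cached_tokens: list[int], new_tokens: list[int]) -> int:
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

if __name__ == "__main__":
    cached = [101, 7, 7, 42, 9, 13]    # tokens prefilled on the last turn
    new = [101, 7, 7, 42, 5, 6, 8]     # next turn: same prefix, new tail
    keep = shared_prefix_len(cached, new)
    print(f"reuse {keep} cached positions, prefill only {len(new) - keep} tokens")
```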
conscientious_obj@reddit
Also, once you think about his argument it kind of falls flat. Allow me to doubt my own headstart: I built a Frankenstein setup after weeks of experimenting with llama.cpp, vllm, and mlx, finally deciding to go with llama.cpp plus some bleeding-edge turboquant PR that works reasonably well when it doesn't crash with my Pi terminal app.
Any person joining open models when they are more mature will be able to nullify my advantage in 1 day without wasting the time and money I am currently pouring into this hobby.
WhichWall3719@reddit
It's infecting newspaper writing too, absolutely insufferable trying to read about some event that happened locally when it's being
spread out.
On five lines.
With half of a detail or location.
On each line.
mutexsprinkles@reddit
Tabloids have been a sentence per paragraph forever, that's not a new thing. It's because their audience is not very good at reading or thinking.
LinkedIn dramaspam is different because their audience is...oh.
Ok_Scientist_8803@reddit
This feels like:
A linkedin/instagram post.
I opened a LLM on my MacBook.
It coded me a whole fizzbuzz program in python.
It would have taken my employees £500.
This cost me £0.
Embrace local LLMs or be left behind.
This is the era of AI
"Comment AI and I'll send you an instruction pack to start a 7 figure business"
Shawnj2@reddit
All of these people are probably just Mac minis running openclaw with the Claude api in a tech bro’s basement
Dry_Yam_4597@reddit
"so much compute", as karpathy would say, used for no good other than cringy spam.
Jungle_Llama@reddit
Yup, Monkey Tennis
Foreign_Risk_2031@reddit
His entire life depends on what he says becoming true
Dry_Yam_4597@reddit
How else will he get promoted.
How else can he sacrifice his family for long meetings.
How else can he make those presentations that no one will care about in two days from now.
How else can he prove he's worth it.
If not by pleasing others like him.
Craving for attention.
or something :P gosh these people...
Foreign_Risk_2031@reddit
He’s the CTO of hugging face
Dry_Yam_4597@reddit
He may be.
But there is always a bigger title to get.
A better job to chase.
here_n_dere@reddit
True fact: his profile says open to work 😅
McSendo@reddit
I remember one of their "free" courses would ask you to connect to HF's API (it has a free quota and will ask you to put credit in), when you could've just loaded the 100MB model on your LOCAL machine.
They need to make their money i guess.
Torodaddy@reddit
100mb model? Is that the intelligence of a labradoodle?
last_llm_standing@reddit
I know the dude personally, he's a fine lad and actually engaging
Dry_Yam_4597@reddit
I apologize then, nothing personal.
But man that writing style.
Frank_Lamingo@reddit
let's be honest here. he didn't write this
saposmak@reddit
Be that as it may:
If you care about your career, don't stop reading.
I'm imparting wisdom right now.
I've discovered a higher truth, and you need to listen.
This is farts running on a MacBook Pro, full blast.
The future is leaving you behind.
I'm CTO, I know what I'm talking about.
victorsmonster@reddit
it's giving r/LinkedInLunatics
Hot_Growth_9643@reddit
lol that’s the sole reason he used his ai on the plane to write this bollocks!
arekkushisu@reddit
The James Clear way of writing.. lol
Whyme-__-@reddit
I don’t know if you remember, but it’s better than how people on Quora used to write.
Question: “what’s your favorite color” Answer: “So I was in elementary school but now I’m 45 years old and I found this color palette on the floor bla bla bla….”
kbderrr@reddit
Yeah, the whole post feels like it was vibed and I think bro thought he was posting on LinkedIn.
LocoLanguageModel@reddit
People often ask me if it's possible to run something powerful locally.
I always tell them the same thing.
How dare you speak to me.
Andrew_hl2@reddit
I fucking hate Threads on AI stuff... every other post is like this.
Endflux@reddit
I do love me some Haikus
unculturedperl@reddit
They forgot to bold random sentences to truly convey the impact.
Ok_Study3236@reddit
Twitter was already bad, but then they started paying everyone for tweets; it's officially a radioactive toilet now
Zeeplankton@reddit
This is how everyone on Xshitter writes it's so annoying
cms2307@reddit
Yeah this seems to be some type of mental disorder that affects Twitter users, especially right wingers
Evening_Ad6637@reddit
No it’s not especially right wingers. It feels like almost everyone on twitter has this disorder.
The final and most deadly stage of this disease is when you start every fucking post with:
"🚨 BREAKING … "
Geez..
Icy_Distribution_361@reddit
Totally. But it gets followers probably. Don’t ask which class though
ttkciar@reddit
Setting people's expectations too high is going to cause backlash, when first-time users fire up Qwen3.6-27B and it falls far short of Sonnet, let alone Opus.
Qwen3.6-27B is really good for its size, and certainly good enough for agentic code-gen for most people/use-cases, but Chaumond is overstating its abilities by rather a lot.
Turtlesaur@reddit
On the plus side, because of this post I learned Pi coding agent doesn't have anything to do with running something in a raspberry Pi
WhoTookPlasticJesus@reddit
That's what I assumed until I read your post.
Why on Earth would you name anything computing-related "pi"?
Regular_Working6492@reddit
It was originally called shitty-coding-agent and the author Mario Zechner only grudgingly renamed it to something more serious.
4onen@reddit
If I remember correctly, the author was going for an un-googleable name on purpose, so he wouldn't have to deal with as many issues if people did actually pick it up, because people couldn't find the repo.
gnurcl@reddit
Same
watergoesdownhill@reddit
Yeah, at least some value came out of that post.
CraftedCalm@reddit
Wait, it’s not?
amejin@reddit
Thank you.
Additional-Acadia954@reddit
I was annoyed when I understood that too
DavethegraveHunter@reddit
That’s what I assumed it was, too…
BloodyShirt@reddit
These people must be getting paid by Apple.. I tried a few Qwen models on my 128GB M5 Max hoping for something sort of reminiscent of Sonnet (I didn't even dare think of Opus) and it spent 20 minutes delivering nothing useful at all, just acting as a small space heater and firing off new tools. Maybe I'm the outlier using this stuff for coding and large mature repos, but I don't think there's an option out there, even at a few hundred grand, that gets me locally AI-enabled the way I've become accustomed to with cloud-based infra.
dtdisapointingresult@reddit
There's no "a few Qwen models", it's just 27B that's great, MAYBE 122B/397B (I didn't try those).
If you are using 35B A3B, I don't care what the benchmarks say, it's nowhere close to 27B. People like it because it's fast, but that speed doesn't take into account the fact that it relies a lot on trial and error due to not having the intelligence to approach the problem properly. 27B may be slower but it's going to make smarter decisions, and the session length might end up the same.
Even with 27B, you gotta maximize your odds of success by meeting it half-way, it's a small local model after all.
I'm also currently looking into doing more extreme things, like a search-and-replace proxy to patch out bloat from Claude Code's system prompt. I mean, look at this shit: https://raw.githubusercontent.com/asgeirtj/system_prompts_leaks/refs/heads/main/Anthropic/claude-code.md I bet I can shave off 5k toks of pointless guardrails nonsense and tools I know I'll never use, like GitHub. This should slightly improve accuracy on everything due to better attention.
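A rough sketch of the kind of proxy I mean, assuming an OpenAI-compatible chat endpoint; the upstream URL, port, and the strings to strip are placeholders, and it ignores streaming:

```python
# Hypothetical search-and-replace proxy: strip known boilerplate from the
# system prompt, then forward the request unchanged to the real endpoint.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://localhost:8080/v1/chat/completions"  # placeholder backend
BLOAT = [
    "# Committing changes with git",   # placeholder substrings to remove
    "# Creating pull requests",
]

class TrimProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        for msg in body.get("messages", []):
            if msg.get("role") == "system" and isinstance(msg.get("content"), str):
                for chunk in BLOAT:
                    msg["content"] = msg["content"].replace(chunk, "")
        data = json.dumps(body).encode()
        req = urllib.request.Request(UPSTREAM, data=data,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:  # error handling omitted
            payload = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 9000), TrimProxy).serve_forever()
```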
ClintonKilldepstein@reddit
I did try 122B and 397B. 397B is worth the download, 122B was not. Qwen3.6-35B is excellent for some modes like Orchestrator, Architect & Ask tasks. I save 27B strictly for coding and debug.
my_name_isnt_clever@reddit
Have you actually tried the 3.6 35b? I've replaced 3.5 122b with it, it's that good.
dtdisapointingresult@reddit
I did for a couple of nights. It could do the basics well but failed to impress me. I think my issue is that I use this tool like a pair programmer, interactively, where I'm looking at what it's doing in real time. I don't just leave it running overnight and come back to something that works. So I notice when it's completely off-base.
To give you an example, I had a launcher script that was giving a JSON error. It's a bash script that calls docker, which runs 'bash -c "command -arg1 -arg2 ..."'. It was failing on the JSON I pass as arg2, with a 'json.loads' error from the app inside Docker.
Well, 35B didn't even consider quote escaping. It just kept throwing shit at the wall, making random fixes. It even started reading the source code of the app inside the Docker container.
This level of intelligence is a dealbreaker for me. I don't care about results if they're produced this way, even if they end up being correct after exhausting every other option. It cannot possibly lead to maintainable code.
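For what it's worth, the fix it never considered is a one-liner: shell-quote the JSON before it goes through bash -c. A hypothetical illustration (command and image names made up):

```python
# Hypothetical illustration of the quote-escaping fix: shell-quote the JSON
# so it survives the `bash -c` layer intact and reaches the app as one argv.
import json
import shlex

payload = json.dumps({"mode": "demo", "items": ["a", "b"]})         # the arg2 JSON
inner = f"some-command -arg1 foo -arg2 {shlex.quote(payload)}"      # made-up command
cmd = ["docker", "run", "--rm", "some-image", "bash", "-c", inner]  # made-up image
print(cmd)  # inspect first; run with subprocess.run(cmd) once it looks right
```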
my_name_isnt_clever@reddit
That's very different from my experience, what do you use as the coding scaffolding?
I set it up at the same time as hermes-agent, and I haven't had to use any other models so far. I have it do tasks for research and maintaining a LLM wiki, managing a Minecraft server through tmux, coding in python and TS for LLM tooling, and coding in Nix for my NixOS setup. It handles all these tasks cleanly with minimal issues. I've had it get stuck in a loop maybe three times since it dropped.
dtdisapointingresult@reddit
I mostly use Qwen Code, but I try to use Claude Code about 20% of the time to have something to compare to (same model on both). I'm reading up on Pi right now, it might end up being my primary.
Currently most of my local tasks are some form of LLM tooling. For example doing tests, getting a model to run with certain parameters, trying to get cool apps that don't work on ARM (my DGX Spark) to build.
I think any language task like maintaining a wiki should be considered easy for any model. Managing a Minecraft server would depend on the sort of work involved: starting/stopping services and following easy docs should pose no issues.
But if you say it's doing good at coding in Python and TS, then that surprises me. Maybe Ubuntu + bash + docker is a harder task than I give it credit for.
my_name_isnt_clever@reddit
This is what I'll say, compared to Qwen 3.5 122b it's just as capable at agentic tasks, but it's not as intuitive with the unexpected. It usually does a great job but sometimes needs a nudge in the right direction more than larger models. It's worth it for the speed IMO, but we will see how I feel about 3.6 122b.
I'm experimenting with having the local agent delegate planning to a cloud frontier model with deeper thinking, then the local agent implements from there. That pattern seems like a great middle ground so far.
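In rough code terms, the split I'm playing with looks something like this; both endpoints are OpenAI-compatible, and every URL, key, and model name here is a placeholder:

```python
# Sketch of the "cloud planner, local implementer" pattern: a frontier model
# writes the plan, the local model does the implementation work.
from openai import OpenAI

planner = OpenAI(base_url="https://api.example.com/v1", api_key="sk-...")  # cloud endpoint (placeholder)
worker = OpenAI(base_url="http://localhost:8080/v1", api_key="local")      # local server (placeholder)

task = "Add a /healthz endpoint to the Flask app in app.py"

plan = planner.chat.completions.create(
    model="frontier-model",  # placeholder name
    messages=[{"role": "user",
               "content": f"Write a short numbered implementation plan for: {task}"}],
).choices[0].message.content

result = worker.chat.completions.create(
    model="local-model",     # placeholder name
    messages=[
        {"role": "system", "content": "Implement the plan exactly; output a unified diff."},
        {"role": "user", "content": f"Task: {task}\n\nPlan:\n{plan}"},
    ],
).choices[0].message.content

print(result)
```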
BloodyShirt@reddit
Thanks for the advice! I haven’t really devoted much time to it tbh just started exploring but again, haven’t had a ton of time to tinker.
chodtoo@reddit
I can’t run anything above a 10B model on my Mac Mini M4 Pro 48GB. I can only reliably run gemma4:e4b or qwen3.5:9b, Qwen being way better at reasoning.
tmvr@reddit
What do you mean? You have 36GB of default VRAM allocation, of course you can run much larger models. The dense models like Qwen3.6 27B or Gemma 4 31B will be slow of course even at Q4 sizes due to the 276GB/s max bandwidth, but the Qwen3.6 35B A3B or Gemma 4 26B A4B will fit even at Q6 or Q8 and large context while giving you very fast decode speeds.
chodtoo@reddit
If anyone is successful running qwen3.5:26b on a Mac Mini M4 48GB I would like to know your ollama config.
bnightstars@reddit
If Qwen3.5-9B was that good, Qwen3.6-9B will be a great alternative to 35B for Mac users.
LewdKantian@reddit
I run the same hardware and looping 35B A3B in Ralphify produces meaningful code for a lot of my projects. Just "one-shotted" a Lightrag pipeline with local LLM and Obsidian integration and MacOS functionality through Karabiner and Hammerspoon. Pretty decent for a small model like this. Looping, bite-sized tasks to iterate over and clear success criteria help a lot.
BloodyShirt@reddit
I’m sure I’ve got plenty of room for improvement but it dropped the ball pretty hard just trying to consume my mature repos and memories. Next 10 hour flight without WiFi maybe I’ll have time to play with it again but for now.. Claude’s got me hooked unfortunately
Crafty_Peanut_2653@reddit
what kind of macbook lol. dont u need like 32 gigs ram
FullstackSensei@reddit
I think it really depends on how you prompt it. I've been using both 3.6 models since they came out and haven't felt the need to fire up Minimax or 3.5 397B since. I did a couple of comparisons with Minimax 2.7 Q8_K_XL and 27B was on par.
I give the model a pretty detailed description of what I want, at least a full page's worth of what to do, where, and how to do it. The prompt also points the model to where it can find documentation in the project, and I encourage it to use it (surprisingly, this "encouragement" really works). The system prompt also sets a bunch of guidelines, such as (again) encouraging the model to split tasks into multiple bite-sized tasks and to write a markdown file in a temp directory within the project documenting what each task did. Again, this works quite effectively. It leaves very little for the model to guess about and lets it focus on the coding part.
Within this scope and controlled way of doing things, both Qwen 3.6 models are very capable.
CatConfuser2022@reddit
Please share your instructions if possible :)
FullstackSensei@reddit
I treat the LLM like a junior dev fresh out of uni, who's on their first day on the job. So, I point it to the documentation directory, which (thanks to LLMs) has a "directory" markdown file summarizing the contents of each documentation file. I instruct it to read the requirements, specifications and architecture documents. I instruct it to follow existing conventions it sees in the source. I instruct it to create a markdown file in a temporary directory inside the project detailing what it did and how. I instruct it to break the task into bite-sized sub-tasks and to create those sub-tasks, and instruct it to pass all the above instructions as part of the prompt to the agents of those sub-tasks.
The rest is specific to what I want to do. I give very detailed instructions on what needs to be done, where it needs to be done, and how I want it done. Before submitting the prompt to the agent, I paste it in a chat and ask the LLM to point out any ambiguities or contradictions in the language of the prompt and ask me about them. If there are any, the LLM will point them out. If there aren't, the LLM will come back with a bunch of silly questions, and if so, it's ready to get cooking.
Imaginary-Unit-3267@reddit
Would it be accurate to say that getting it to understand exactly what you want it to do is a large portion of the entire problem?
FullstackSensei@reddit
Isn't that always the problem, not only with LLMs, but also in real life?
If you can express your thinking clearly, you can communicate with anyone and anything effectively. It's not that easy, but not that hard either if you try to be conscious about what implicit assumptions you're making that the other might not be aware of or know. That's why I use the junior dev on their first day on the job analogy, and then rubber duck it in a chat with the LLM if I'm still not sure.
It goes very much against the trend of vibe coding things fast, but my objective is to delegate work and still have it done the way I want it, so I can maintain it. It's 10x slower than one shotting a few lines, but still 10x faster than writing the code by hand.
philmarcracken@reddit
If you can make a flow chart, the LLM can build it for you.
It doesn't have to be about what lawyers do, which is straighten English out and avoid loopholes. That's why legalese is so verbose: it's written the way a computer might try to understand it, completely literally. And it's also why they get paid the big bucks, because English is so full of holes.
FullstackSensei@reddit
I agree that prose is quite.... verbose. It's something I have been thinking about for a while. But what's the alternative?
Flow charts can also take time to create, and I don't know how to pass them to the LLM in an effective manner. Mermaid is nice, but LLMs seem to frequently make mistakes spitting it out that I don't trust it as an input format. Thought about UML, but same problem, plus it takes more time.
I've stuck with prose also because it's what LLMs have been trained on, issue -> code.
The thing about "legalese" with LLMs for coding tasks is that it significantly reduces how big and how good the model has to be to complete a certain task.
Imaginary-Unit-3267@reddit
I agree. For me, the reason I don't just vibe code things is precisely because I'm not a dev, I'm not a genius programmer, and I know that if I don't make sure I understand everything every step of the way, whatever the AI produces will be unmaintainable for me. I am finding myself very ironically being forced to learn software engineering just to make a helper for my (independent, non-academic) philosophy research, which is what I'm actually interested in!
VertigoOne1@reddit
I always tell the devs to think like this. You know things, the LLM knows things, and among the things it knows is how to translate languages really, really well; not just English to German, but English to Java and TypeScript, and TypeScript to C#. It needs to know what you are saying really well to translate really well, so the more effort you put in, the better it does. This is not vibecoding, this is systems design and engineering. Every time I've been let down by an LLM it was ultimately my own fault. Be honest, are you leaning on Opus as a crutch to make up for your laziness? It will let you down too, just like a genius senior dev will if you give him crap.
miversen33@reddit
Humans have this problem with humans too :)
Pyros-SD-Models@reddit
I also think people are speaking from belief rather than actual experience, because they haven’t really tried Qwen3.6-27B. For coding agent tasks, Qwen3.6-27B inside Pi mops the floor with Sonnet inside Claude Code.
Or they’re judging adjacent tasks, but yeah, obviously Qwen3.6-27B will not meticulously search half the internet and write the most perfect plan ever. It can do it, but it doesn’t extract the learnings as well as something like Opus or GPT-Pro would. But nobody is talking about that, since OP is clearly referring to coding tasks, not planning tasks.
Double_Cause4609@reddit
Is it possible that both he and you are correct?
Is it possible that he has a strong prior in software engineering, and in his field of expertise he's able to manage the agent in ways that are limited in scope such as to also limit the difference between different models?
For his use case, the models may actually genuinely be quite close.
But to an average vibe coder who is not directing the model to do the right thing, who is unclear about their requirements, or who expects too much out of a single step of the pipeline, it's possible that there may be a much larger difference in a less constrained environment.
Poromenos@reddit
No, I suspect it's the other way around: To an experienced, professional developer, these models are very far apart. To an average vibe coder who YOLOs a bunch of tickets to the model, maybe they can't tell them apart, sure.
mrjackspade@reddit
As an experienced programmer, even Claude Opus is infuriatingly stupid a lot of the time.
Just as a non-programming example (for reference): I'm having an issue connecting to a server, where ~70% of the connections fail.
Claude runs a test, one IPV4 and one IPV6 connection. IPV4 fails and IPV6 succeeds.
Claude then confidently states that my issue is caused by IPV4 connections failing.
Claude does things like this and I wonder how the fuck anyone even succeeds to vibe code anything without existing software developer experience.
aw2xcd@reddit
One more example: I had an Opus 4.6-generated Mac app failing to start because the splash screen image was missing, and its solution was to do all kinds of tests to check if the bundled logo exists and fail silently, without suggesting that the image was missing. Luckily I picked this up in the PR, but imagine all the things that get through because I don't have the mental capacity to read the thousands of lines this thing spits out every minute.
mrjackspade@reddit
I have wasted so many fucking hours debugging because Claude defaults to failing silently for everything, even mission critical functions.
I was having it work on a Reddit client, and its first draft caught and swallowed errors on ANY call, returning null or empty collections whenever an error occurred.
Poromenos@reddit
Simple: They don't. You need to correct its plans a lot. However, after you've agreed on a good plan, I've found that the subsequent implementation has very, very few bugs.
LosingID_583@reddit
It's possible for LLMs to one-shot, especially if you ask them a boilerplate-style task like a simple 2D game or app. I imagine that this is the level of apps that most non devs ever succeed at vibe-coding.
Houdinii1984@reddit
I was bragging to my work mate that I one-shot a complicated pipeline dashboard that had a ton of moving parts. And it did. Once I left planning mode, got code generated, and saw it, I only had minor annoyances to fix. But the planning session on that dash took days because I kept coming up with edge cases that would certainly pop up.
That same day my hubby, a completely non-programmer, made a tool for work, and got it to create his tool in mostly one shot, too.
So both of us are sitting there talking about our one shot apps, and both apps weren't even on the same plane of existence.
I don't really have a point outside 'its all relative' but it's kinda neat to exist in this time period where words are randomly gaining and losing (sometimes simultaneously) meaning in real time.
optomas@reddit
Your suspicion has merit. I can offer a counter-example, however. A very specific use case: C11 OpenGL CUDA interop, scientific visualization. The preamble is given by "wc coding_practices.md": 158 lines, 1033 words, 8570 bytes.
The primary difference is context length, not code quality. Which is kind of a feature for developing programmers, no? Enforces separation of concerns in a very non-forgiving way. Once the habit of limiting translation unit length to 200 LOC is burned in ... it's difficult to not think in terms of modules and nodes.
TLDR; Not in my experience, but I come from the era of hardware limitations. I was already writing in a style that naturally fits into local LLM limitations. For monolithic programmers, I think your suspicion is spot on.
ttkciar@reddit
I can see how you might think that, if you didn't know I was a senior software engineer with 47 years of programming experience.
Or maybe I'm just too old to use these new-fangled tools correctly? /s
More seriously, my perception is that it's the other way around -- to inexperienced programmers, it seems like the less-capable models are better at codegen than they really are, because their standards for code quality are lower.
Either way, it is possible that both he and I are correct (like you said), because there are subjective and skill-relative factors impacting the perception of codegen competence.
xienze@reddit
I think there's a similar dynamic happening even with experienced developers. There's definitely a certain kind of developer that produces heaps and heaps of absolutely dogshit spaghetti code that does in fact work and meets the requirements quickly. Solve the immediate problem and move on to the next thing as quickly as possible is their MO. I can totally understand the love these kind of developers have for AI. It's probably producing code very similar to what they already crank out, perhaps even better. And when requirements shift or bugs come up, what does the AI do? Tack more shit onto the function that's already 3000 lines long and call it a day. Just like these guys do. And, it'll generate loads of unit tests to boot!
The problem is that this code, much like the code they've been writing their entire career, doesn't really handle edge cases, new requirements, and unexpected scenarios with the kind of elegance that more, shall we say, thoughtfully-written code does. And that's why I think the other class of experienced developer can't stand AI code.
RTDForges@reddit
I’m not as experienced as you. I’m in the 20+ years area, so you definitely have more knowledge and experience. Having said that I’ve been working to seriously understand AI as a tool and in the process found that a lot of my experience as a developer made me much worse at using AI. And I personally can say that after doing extensive work on my local infrastructure I get about 95% of what Claude gave me out of my local models now. A LOT of the magic isn’t the LLM itself. And Claude is a tool made for the general consumer base. Take an individual who can create that same type of tool but tailored to their specific case and it seems like way less of an outlandish claim.
Imaginary-Unit-3267@reddit
Can you expand on exactly what those "blind spots" are? What existing habits did you have that don't serve you well with AI, and what did you have to learn to replace them?
d5vour5r@reddit
I'm similar, 30+ years, and I find the local models are great at one-shot coding but struggle on reasonably sized projects even with specific direction. I still use a mix of local and frontier models. I get comments from younger co-workers and friends who try these models after seeing posts like the plane thing and are then disappointed with the results. I've even seen people buy M5 Pro/Max laptops and then, after seeing the results of local coding, regret dropping that much money on hardware.
_PunyGod@reddit
I regret that Apple wouldn’t let me pay more for 256/512GB or 1TB of memory
MastodonFarm@reddit
Or…he might be full of shit because he has an axe to grind (as telegraphed by the comment about “monopolized closed source”).
tertain@reddit
Did you know software engineers can use Opus too? Opus is much more capable.
ttkciar@reddit
Sir, this is LocalLLaMA.
olibui@reddit
Same knowledge. Frontier models with 2T params beat a local model any day. Stop trying to lie to yourself
EbbNorth7735@reddit
Not going to lie. I don't find Gemini Pro any better than Qwen3.5 122B at many tasks. Perhaps my use cases aren't hard enough or obscure enough but I'm guessing I'm going to be completely blown away by Qwen 3.6 122B and wonder what in the hell I would need anything better for. There's a point where if you are able to feed a model the required information in context and it can determine the next step to take I can't imagine we'll need another model. Sure more models will continue to be released but for the average user there will be a point when a local model can perform 99% of the tasks they throw at it. After that the next Gen will just be a slightly smaller model able to do the same work load and so on. It feels more about harness development/comparison than model comparison.
Haiku-575@reddit
Your perspective might change if you tried Opus. Gemini does feel closer to Qwen 3.6 27B than to Opus, so I'll hand you that at least.
ruuurbag@reddit
Not to take away anything from anyone’s points, but Gemini 3 Pro is pretty terrible at coding. It’s not really a fair point of comparison when talking about closed source frontier models.
Glebun@reddit
It's the same as not being able to tell the difference between a shitty and great cut of steak if you cook them well-done.
ddchbr@reddit
Good take
ChemistNo8486@reddit
I don’t know… I have a 5090 and I have been using Q4 with 130K of context on the VRAM and the results are insane.
Probably not as good as Sonnet 4.6, but it's definitely as good as 4.5; I can just leave the tasks there and bro will be working for 45-50 minutes on a single prompt without messing up its quality.
I know that my setup is not how the average user will use 3.6-27B, but I don’t think the guy is over-hyping, at least for the non-super-technical level of coding that most people need. The bottleneck is the price.
Plabbi@reddit
Have you tried different quantization of either the model or KV cache?
I have a 5090 and am using Q5_M / KV Q8 and can fit the entire 262k context in VRAM, but don't really know if I should sacrifice context size for either a better model or F16 KV.
Would be nice if there was some standard test for this.
pyrojoe@reddit
I just set up Qwen 3.6-27B on my 5090 last night in vLLM. I'm using cyankiwi/Qwen3.6-27B-AWQ-INT4 with an fp8 KV cache and 200k context. (Everything just barely fits.) I haven't had it write any code yet, but it runs pretty fast and seems intelligent.
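For reference, the same idea through vLLM's offline Python API would look roughly like this; the model name is the one above, the rest of the values are guesses, not a recommendation:

```python
# Rough sketch: AWQ INT4 weights plus an fp8 KV cache to squeeze a long
# context onto a single 32GB card. Values are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="cyankiwi/Qwen3.6-27B-AWQ-INT4",
    quantization="awq",
    kv_cache_dtype="fp8",          # quantized KV cache
    max_model_len=200_000,         # ~200k context, as described above
    gpu_memory_utilization=0.95,   # leave a little headroom
)

out = llm.generate(["Write a haiku about prefill."], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```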
How's the math work out that you can fit the full context in VRAM? I'd expect you to be slightly over in your setup.
Plabbi@reddit
I am running Unsloth q5_k_m (21.35GB file size) using llama-server in Win 11 and after filling the context with data I still have 1GB VRAM left according to HWiNFO64. The context is around 10GB.
Performance varies from 50 t/s when empty down to 30 t/s when full context.
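For anyone wondering how these context sizes pencil out, the generic per-sequence KV-cache estimate is below; the actual numbers depend on layer count, KV-head count, and head dimension, so treat it as back-of-envelope only:

```latex
% Per-sequence KV cache size, before any fancy compression.
% The factor 2 accounts for storing both K and V.
\[
\text{KV bytes} \approx 2 \times n_{\text{layers}} \times n_{\text{kv\_heads}}
  \times d_{\text{head}} \times n_{\text{ctx}} \times \text{bytes per element}
\]
% bytes per element: 2 for f16, 1 for q8_0/fp8-style caches.
```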
Zc5Gwu@reddit
Is he though? I don’t think most people realize how strong a 30b model actually is. It’s rare that the dense model would hallucinate common facts for example. It’s like Wikipedia in your pocket.
ResidentPositive4122@reddit
Yes, 1000%. The creators of dsv4, a 1.6T model, have openly said that there is still a gap to Opus.
Thing is, the small models are really cool, have become truly useful and we're lucky to have them. But exaggerating about their capabilities doesn't do any good. I'll take a local gpt5-mini / haiku level model any day of the week, and be happy about it. I think the small qwens, gemmas, even gpt-oss-20b can be used for real work, in the right setup and with a lot of elbow grease. But having used the SotA models as well, I agree with OOP 100%. Let's keep it real.
Far-Low-4705@reddit
honestly, i dont use closed models anymore, just because local models are free and i dont get rate limited after 5 messages like u do on free tiers, and local models score better than low end free closed models, so i wouldnt know
But, imo, i really do think we have local models better than haiku... haiku kinda gets destroyed on benchmarks.
And ik benchmarks arent everything, but they do mean something. and i mean ofc closed will always be better, but the real question is if local models are better than the last usable closed models.
do we have a local model better than haiku 3.5 - yes.
imho, once a local model becomes capable enough, like qwen 3.6, it doesnt really matter for the majority of use cases.
novelide@reddit
Given it's pretty easy to rack up $20/month in electricity, I think a fairer comparison is with the $20 tier on cloud models. But when you hit usage limits with the equivalent of 1 prompt/hour (approximately what I get with Opus 4.7), local models still win in many cases even though the capabilities are definitely much lower.
Far-Low-4705@reddit
not really, my pc is already gonna be on, im only running a 35b a3b MOE model, and power draw cant be anymore than 200w tops. also 90% of the time the LLM is idle waiting for a request.
I recently graduated college, and i mostly just used them to check my work/math for engineering problems, and with the free tier, you couldnt use thinking models, or if u could u got like 5 messages/day, and it sucked.
It was just nice to be able to be more liberal in the messages, and if i wanted to, i could regenerate the response 3-5 times and see if the LLM got the same answer each time to see how confident it was.
FullstackSensei@reddit
Have the Qwen people said 3.6 27B is on par with Opus in everything?
ResidentPositive4122@reddit
No, the bloke in the plane did.
FullstackSensei@reddit
Please read my comment here
2Norn@reddit
im sorry but its mumbo jumbo
you basically said "just prompt better idk use markdown instructions or something"
and then a single piece of anecdotal evidence
do that 300 times for varying tasks of varying hardness levels and if u still think that then i'll believe you
FullstackSensei@reddit
Quite frankly, I couldn't care less whether you believe me or not. If you can't understand what I said, ask your local LLM to explain it to you. ✌🏻
2Norn@reddit
that's like your problem man
you are the one who thinks 27b is like opus, not me
it sits between sonnet and haiku, a bit closer to sonnet, anyone who thinks its like opus is hard coping
FullstackSensei@reddit
I'm running half a dozen instances of it in parallel and I'm quite happy with it. If that's offending to you, that's actually your problem, not mine.
spawncampinitiated@reddit
your happiness is not a benchmark
2Norn@reddit
i dont care what you use. use gemma or bonsai or whatever and think its like opus. thats not what im interested in.
but fact of the matter is the claim is not true. unnecessary hype that fools people.
best thing u can do with models like this is use them as worker/executor only and hope that they can give you sonnet/5.4 mini/glm 5 turbo etc performances or at least come very close. but more often than not they are closer to nano or haiku. but it's getting better.
InsideYork@reddit
Oi!
oe_throwaway_1@reddit
they give cameras to ANYBODY these days
iMakeSense@reddit
What do those setups look like
HopePupal@reddit
i think that's actually the worst test of a small dense model. they're great for staying on task and keeping their shit together over a long context window. they're less great for world knowledge — i don't expect something like Qwen 27B to be able to store that much trivia, and in fact it happily made up a bunch of shit about the neighborhoods of the major city i live in (transit lines that don't exist, etc.) that larger models have more room to store.
cromagnone@reddit
You’re not wrong. Last night I was idly watching old films on Netflix while 3.6 27B was downloading over a shitty connection. Both the film and the download finished at the same time so my test run command was “write me a feminist critique of “In The Line of Fire” starring Clint Eastwood.” It was like I’d dropped acid.
tat_tvam_asshole@reddit
that's why we have tool calling. the data doesn't have to all reside in the model weights and in fact very often is better to craft a response treating the Internet like a RAG database
sellyme@reddit
So's the $20 USB I loaded a Kiwix install and db dump on to in 2009.
Wikipedia in your pocket is cool but it's not exactly revolutionary any more.
coding9@reddit
Enjoying 4 minutes of coding when your laptop burns you and uses the entire battery haha. Plugged in, a little better but sooo slow
WhyNoAccessibility@reddit
It all depends on the memory and processor. I first started with a MacBook air M1 at 8gb which kept crashing out (hot swap); but now upgraded to a MacBook pro M4 pro 24GB.
It can still be slow sometimes for very large prompts. But I'm not having the constant pink screen restarts and battery drain
Due_Duck_8472@reddit
No it doesn't, it's about architecture, and component selection.
A Macbook isn't designed to be a compute center.
It will die a horrible fiery death and the battery will be toast in a few prompts.
WhyNoAccessibility@reddit
It's not designed to be sure, but that doesn't mean that it can't be run as one.
If you're not running anything else it wouldn't toast the battery. Two months of me trying this and my battery health is still above 96% (for a 2020 MacBook it's not that bad). The battery didn't toast instantly either, I'd get at least a few hours out of it.
I just couldn't do jack else
gusfromspace@reddit
30b is incredible, even something like 14b is good, if you know how to feed it the right data.
diddlysquidler@reddit
Also Mac battery will last about 40 minutes.
AngelOfLastResort@reddit
What model would you say is pretty close to Sonnet in performance that could run locally? Is it important to have a good RAG setup?
ttkciar@reddit
According to benchmarks, GLM-5.1 ranks slightly better than Claude Sonnet and slightly worse than Claude Opus. I cannot say from experience, though, since my hardware is insufficient to host it.
It really depends on your use-case. For general Q&A I have found that Wikipedia-backed RAG is really great for improving the quality of inference and cutting down on hallucinations, but for creative writing RAG does nothing whatsoever.
RAG is a really complicated and nuanced technology. There's an entire subreddit dedicated to it: r/RAG
AngelOfLastResort@reddit
Sorry if this is a stupid question but how much vram would you need to run GLM 5.1 locally? I see it has a total of 754 billion parameters with only 40 billion active at a time. With no mention of quantisation.
Grok said that good RAG was essential to a good local LLM coding experience lol!
ttkciar@reddit
My go-to quantization is Q4_K_M; I have yet to regret using it.
At that level of quantization, GLM-5.1 weights would take about 468GB, the inference overhead (mostly context K/V caches) would be another 56GB, and if it's a multi-GPU rig there would be about 14GB of overhead per GPU beyond the first.
For a four-GPU setup, that would come to 566GB of VRAM.
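Back-of-envelope for where those numbers come from, assuming Q4_K_M averages roughly 5 bits (about 0.62 bytes) per weight and that the per-GPU overhead applies to the three cards beyond the first:

```latex
\begin{align*}
\text{weights} &\approx 754\times 10^{9} \times 0.62\ \text{bytes} \approx 468\ \text{GB}\\
\text{total (4 GPUs)} &\approx 468 + 56 + 3\times 14 = 566\ \text{GB}
\end{align*}
```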
You're not going to get that on a laptop, but could cluster about ten MI210 on old Xeons for about $60K and gang them together with llama.cpp's rpc-server, or wait for the first-generation eight-GPU hyperscaler servers to age out and appear on the second-hand market (probably some time 2030'ish), or if you're rich you could buy one now for about a quarter-million dollars.
AngelOfLastResort@reddit
Okay, so it's not a local model then? Offline sure but not local.
ttkciar@reddit
No, you can download the weights from Huggingface, and I have done exactly that. If you're willing to wait for pure-CPU inference you can host it on a fairly inexpensive $2K Xeon with a buttload of DDR4 memory, but that would be far too slow for interactive use.
AngelOfLastResort@reddit
You can't run it on a desktop or a laptop. So it's not local. It's built for multi GPU server environments. It's not a local model.
There isn't a desktop you can configure today that would be able to run it. Not even 2 x 5090s. So it's not a local model.
ttkciar@reddit
It's a model you can run on your own hardware, so it's local.
AngelOfLastResort@reddit
Homelab!=local
Weekest_links@reddit
Also people need to know their MacBook Air or Pro might not have the VRAM or GPU/CPU cores to even handle 27B
InnovativeBureaucrat@reddit
I heard about Gemma and jumped on the local LLM bandwagon and so far it’s just been embarrassing. I thought I had a powerful system76 laptop (the fan is certainly powerful, and runs constantly) but turns out you need a Mac with shared memory to do much and this 4gb GPU is a joke.
This computer was $1700 in 2022. I thought it would be able to do more.
I’m thinking I should lean into cloud hosting, but it’s been exhausting to figure out.
I installed Hermes and Gemma but it takes 5 minutes to respond to a prompt
tmvr@reddit
You don't need a MacBook, but you do need a bit more VRAM. With an 8GB GPU and DDR5 system RAM you will get very usable speeds with the 35B A3B MoE model.
IrisColt@reddit
The conceited goal is actually selling more MacBook Pros, heh
bluehands@reddit
This is interesting to me because my immediate response was, "so what?"
It is 1995 and getting online isn't easy, fast or very useful. People who just reject the internet now can feel smug & superior for another 10 or 15 years but everyone is online by 2010 or 2015.
Except with AI it isn't going to be 15 or 20 years before it is absolutely everywhere; it's going to be 3 or 5 at most.
Time-Heron-2361@reddit
Context -> that's something no one in the local LLM community mentions. What good is it that I can run a model on my 48GB laptop if the context cannot exceed 32k? It's practically useless.
ttkciar@reddit
Yeah, that's a whole can of worms, but it's a highly relevant can of worms.
Agentic codegen really needs a lot of context, which means not only do you need a high context limit (and memory to match), but also a model whose competence does not drop off too rapidly at long context.
Also, the impact of K and V cache quantization on inference competence is more pronounced for codegen than it is for other kinds of tasks, which means your options to stretch memory are even more limited -- q8_0 is the most you want for codegen, and turboquant doesn't save you.
These issues are frequently masked from the user's perspective, at least at first, because non-agentic tasks frequently do not require high context (fewer than 2K tokens, in the common case, barring RAG) so K and V caches fit in VRAM, making inference very fast. It is not until they try to use that model for "serious" work that the cache spills to system memory and performance tanks.
These measurements are relevant:
https://old.reddit.com/r/LocalLLaMA/comments/1suh3sz/gemma_4_and_qwen_36_with_q8_0_and_q4_0_kv_cache/
olibui@reddit
Hey. Whatever gets you likes right? Facts are irrelevant
JacketHistorical2321@reddit
So...?
cmdr-William-Riker@reddit
I think the expectations for sonnet and opus are too high. It falls short constantly
Time_Cat_5212@reddit
Okay, so dum dums dum dum, and it gives everyone else more time to get ahead.
The more dum dums blame people and make stupid assumptions, the better, IMO.
kiwibonga@reddit
Skill issue.
bigsmokaaaa@reddit
Here's the thing though: when you start using opus/5.5 as an orchestrator, it offloads all the heavy lifting locally, and you save 85% with very little downside. Not much help on a plane tho, but that model's pretty good without it
Ok-Measurement-1575@reddit
"Who the hell wrote this?!!"
Plabbi@reddit
What quantization did you use?
victorsmonster@reddit
lol of course he's the CTO of hugging face
This may all be true but this is the least objective source to get any information from
jacek2023@reddit (OP)
What source of information do you think is the best?
victorsmonster@reddit
https://wikifeet.com/
Clipthecliph@reddit
To be fair, here you can actually separate the noise from what really works in the comments of posts like yours. Until now, I had the impression that the model would just work, but had some doubts. Now, after reading the comments, it's clear it doesn't. So thanks anyway.
Organic-Importance9@reddit
I have phi4-mini on every PC I own because it's easier than digging through the offline manuals to figure out bash and zsh commands.
whoisyurii@reddit
isn't gemma 4 e2b / e4b a better option these days?
Organic-Importance9@reddit
Probably. I have that on one computer. Phi4 is just kind of habit
No_Count2837@reddit
Qwen3.6 27B "feels" like Opus. Not sure about that one. Maybe I haven’t tested enough.
kwinz@reddit
"MoST PeOpLe HaVEn't reAlIzeD ThIs yet."
KaiserFerdl@reddit
That’s why you got a 48GB RAM MacBook Pro.
roguefunction@reddit
What are the MacBook specs?
twistsouth@reddit
All of them.
g0pherman@reddit
How much ram the macbook needs to run that?
imetatroll@reddit
I have yet to work with a model that is even close to open ai's newest models. Maybe if I bought a special rig that cost thousands of dollars? But I suspect even then the token response speed would be terrible.
Chinmay101202@reddit
moment 2.0 soon.
sammcj@reddit
Wonder why they're running llama.cpp instead of MLX on Apple Silicon? Seems like throwing a lot of performance away (even with larger context sizes)
kelembu@reddit
how much performance are we talking about?
sammcj@reddit
20-35% roughly. Yeah I believe Ollama did or at least partially, however it's way behind things like oMLX
stinkycatuncle@reddit
https://youtube.com/@stinkyloud?si=LUVD6HN4WFuZeC9B
fxj@reddit
try opencode with it. thank me later
awsom82@reddit
What is he trying to achieve? Why is he using Pi?
shamerli@reddit
I’ve been doing this and lecturing on it for the past 3 years!
Legal-Tie-2121@reddit
M5?
Glad-Programmer-5505@reddit
Nice
OutlandishnessIll466@reddit
Yes it is super cool for the entire 15 minutes the battery lasts 😄 I never heard my MacBook's fan until I ran a model on it.
scott2449@reddit
Try Gemma with something like Forge 🤩
Chinmay101202@reddit
yep, 2.0 coming.
Express_Quail_1493@reddit
🔥🔥🔥
Clipthecliph@reddit
I have 16gb ram, is there any way it would still work? Even with swap if necessary
pppreddit@reddit
I am running 27B via omlx (Qwen3.6-27B-bf16) on my M4 Max 128gb and it takes forever to respond. omlx dashboard shows 38.8 tok/s for prompt processing and 3.7 tok/s generation
Pleasant-Shallot-707@reddit
Yeah. M4 is mid for pp
pppreddit@reddit
tbh, I am disappointed in how many mistakes it makes in the process, such as duplicating lines, then correcting itself, then going back and forth making corrections, it's such a waste of time
Melodic_Reality_646@reddit
says the dude rocking a 128GB RAM M5 Max… in GPU-poor language that’s like LinusTechTips saying a private jet is affordable.
ea_man@reddit
Well yesterday I bought a used GPU for 250 to run QWEN 27B.
root0777@reddit
Which gpu you got?
ea_man@reddit
AMD 6800
No-Refrigerator-1672@reddit
Even counting in all the crazy price hikes we have now, building a PC that can run Qwen 3.6 27B fast enough to use for agents is below $1000, if you accept buying second-hand parts. While I do agree that this is a significant amount, there are many people who will spend this much on a phone, so it's fair.
phreaqsi@reddit
Legit question.
If you had $1000 right now, what would you buy with it to run Qwen 3.6 27B?
I don't mind second hand (although it'll be hard for me to source locally).
CheatCodesOfLife@reddit
Don't do MI50s.
No-Refrigerator-1672@reddit
If I had to target $1000 exactly, and limit myself to eBay and local markets, I'd choose 2x V100 SXM2 16GB cards with PCIe adapters - those go for $270 apiece on eBay, so a pair will have enough VRAM to run a Q4-Q6 quantized version at reasonable speed. Then I'd aim for a used AM4 motherboard and Ryzen 2600G - you'd find a combo for another $150. Make sure to buy a G-series CPU, as the V100 has no graphical output, and your system will fail to boot without integrated graphics. To run this system, it'll be enough to use 16GB of DDR4 memory, as we're going to have the entire model in VRAM, so it's $100 for RAM. Then another $100 for a 750W PSU that can handle dual cards, and, say, $50 for a case - and you've got yourself an almost complete system for $940; I did not include SSDs and HDDs in the spec. You can do better buying parts from China, more on that later.
I'll address some possible criticism beforehand. First, a popular choice for running self-hosted AI is the AMD Mi50 32GB. Although it's capable of running a 27B model within a single card, it has some major limitations on the software compatibility side of things, and right now its price/performance ratio is very bad. Go for it if you can find one at $250, but it's not worth paying more. However, I'd insist on running Nvidia Volta or newer, because then you can run vLLM, which has a huge performance advantage over llama.cpp, especially for running multiple agents in parallel. People could also note that V100s have pretty bad idle power consumption, so you don't want to run this setup 24/7, but for a workstation that you use, say, 4 hours a day it'll be acceptable. Also, you can potentially buy the CMP100-210 16GB, the same chip as the V100 but in a mining package for just $180 - however, mining versions have a severely crippled PCIe bus, so its performance in a dual-card setup will be terrible; look it up at your own risk.
Alternatively, if we assume that you already have a decent PC and only need GPUs, there are more interesting options for you. Alibaba.com provides quite a lot of upgraded gaming GPUs with double the VRAM. You can get a 2080 Ti 22GB at 270 EUR, a 3080 20GB at 370 EUR, and an honourable mention to the 4080 32GB at 1300 EUR; all prices exclude import taxes. A single 2080 Ti 22GB is a fantastic replacement for both my 2x V100 idea and the Mi50: it still allows you to fit a 27B model in Q4, have some KV cache space, and decimate both options in terms of speed for a very comparable price. A pair of 2080 Ti 22GB will cost you a bit under $1000 when you factor in shipping fees, import taxes, etc., and has enough VRAM to run the same model at very long context lengths, which will improve your coding opportunities significantly. A pair of 3080 20GB will be just a bit over $1000, and a 4080 32GB will be significantly over budget, but it has the advantage of running the model on a single card at low power consumption and very good speed. If you're interested, I've detailed my experience with a dual modded 3080 setup in this post, including details on how to purchase things from Alibaba.
u/LPitkin, I'm tagging you too so I don't have to send out this response twice.
victorsmonster@reddit
lol yeah simple as that
No-Refrigerator-1672@reddit
That's actually how the entire world works: you either save money and spend your time and effort, or buy a quick and ready to go solution for significantly more money. Applicable to any field ever.
Acceptable_Pear_6802@reddit
Mac mini m4, 32gb ram, 256gb ssd
soshulmedia@reddit
How about a single used MI50 32GB in whatever rig you can build around it. I can run Qwen3.5 27B (didn't test 3.6 yet) @ UD_Q6_K_XL, 32kctx, ~15+ tok/s for short prompts.
LPitkin@reddit
I had no idea that it could be that affordable. Can you give me an example build?
ea_man@reddit
I run Qwen3.6-27B.i1-IQ3_XXS on a 6700xt, it costs like 200$.
I just bought a 6800 for 260 to run a bigger q4.
bnolsen@reddit
From what I understand you want to hit q8 with 3.6 27B if possible. I'm on a Strix Halo so I run it q8_k_xl
soshulmedia@reddit
I have qwen3.5 on a single MI50 and it works, see my other comment above.
AshuraBaron@reddit
You don't think a $5,000+ laptop is affordable? What are you, poor? /s
Time_Cat_5212@reddit
I thought my $2500 laptop was expensive until it lasted me 5+ years. $500 a year for a device I spend 40+ hours a week using is not a bad deal!
AshuraBaron@reddit
Wealthy people have the luxury of thinking long term, while poor people do not. They can't afford the upfront cost, so they are locked into cheaper options that do not last as long. It's a vicious cycle that is a feature, not a bug.
SufficientPie@reddit
In what sense is a vicious cycle a feature
Imaginary-Unit-3267@reddit
It's a feature if you're one of the elites who loves (nonconsensually) pissing on everyone else.
Time_Cat_5212@reddit
Yeah everything's just a big conspiracy by the elites to fuck everyone else over
Sheesh. I don't miss being 20 years old
SufficientPie@reddit
Oh I see
TFABAnon09@reddit
Aka the Sam Vimes Boot Theory of Economics
Time_Cat_5212@reddit
It's neither a feature nor a bug; it's just the way resources work.
Ea-Nasir, however, could not Klarna a shipment of copper bars. Today, we have options!
vulgrin@reddit
Easily fixed! Just tell Claude “make me unpoor” /s
iMakeSense@reddit
r/povertylocalllama
Toastti@reddit
On an M5 MacBook Air (32GB RAM ideally) the Qwen 3.6 32B actually runs really well, totally usable at smaller quants. Just need to make sure prompt caching is on and do expect to wait a bit for an initial response. But for sure usable
jacek2023@reddit (OP)
I’ve used Linux on the desktop since 1997, and throughout all that time I’ve seen people think that open source is about saving money. They believe we use “free software” because we don’t want to pay for things.
DominusIniquitatis@reddit
Kind of a problem with how it goes in English. In my language, we essentially have "free" for "as in freedom" and "costless" for "as in beer". No ambiguity.
jacek2023@reddit (OP)
I recommend reading https://www.gnu.org/philosophy/free-sw.html and https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar
(In Polish these words also sound different)
DominusIniquitatis@reddit
Already read the first one years ago. :)
Plabbi@reddit
Yeah those people.. I am totally not using it just because it's free.
AvidCyclist250@reddit
to not be shackled
for some, i guess that only means financially
tat_tvam_asshole@reddit
I'm just here for the free beer.
Ell2509@reddit
Wow. I had no idea but that makes sense.
Koksny@reddit
It's M5 Max, your comparison would be more apt if it were DGX Spark.
0xd34db347@reddit
That's a fairly small expense for a professional tool in a first world country. Hell my landscapers truck and the riding lawnmower it pulls both cost multiple times more.
69_________________@reddit
My M1 Max 64GB runs Qwen3.6 27B GGUF, and you can get the same laptop used for around $1,300
jasmine_tea_@reddit
Pretty much
Individual_Zombie457@reddit
What does it have to do with anything?
He's talking about the model capabilities, not his laptop performance. He didn't even mention which M chip he has or the RAM.
InterstellarReddit@reddit
⚰️⚰️⚰️
AllNamesAreTaken92@reddit
You can measure how irrelevant and unreflective your opinion is by noticing you are making a comparison to a model you acknowledge you have never used.
You should be able to figure out you shouldn't post this all by yourself.
Terrible-Reputation2@reddit
"Headstart" makes it sound like we're in a race. Can we just not? This tech is clearly going to take over most areas in our lifetimes. People talk about a wave and say to ride it so you don't get crushed, but I think we humans actually have a choice to make collectively. Does this need to be something that crushes people so a few can become ultra-rich trillionaires? Or can we try to control that human greed and let the few get very, very rich, while also raising everyone's floor so they can live fulfilling lives? I'd say the most fulfilling way to live for most of us isn't trying to get and stay ahead of some crushing wave until we finally burn out and get crushed by it too.
SamSlate@reddit
what are you doing that's non-trivial on a macbook?
Ptxcv@reddit
What's with this overhyped-style writing lately... it's annoying.
Kalcinator@reddit
Every time someone says "I'm not gonna lie [...]" I immediately assume they usually lie
electrosaurus@reddit
These sorts of posts are just, gross.
I_HAVE_THE_DOCUMENTS@reddit
Might it just be a person who is really excited about something? What makes this "self-service hype"?
electrosaurus@reddit
He co-owns Huggingface...
Pleasant-Shallot-707@reddit
So?
MartiniCommander@reddit
Running Qwen3.6 27B in OpenClaw so far has been a pain. Maybe it's using oMLX to host it. But the Gemma 4 31B is much more responsive. Qwen starts a task and just never lets me chat with it again but Gemma 4 stays live and I've been very impressed with it. Running the Gemma 4 31B RotorQuant 8bit version and it's nice and snappy. The Gemma 4 26B-A4 is flat out instant. I haven't tried the Qwen3.6 35B-A3B but that will be in the mix.
Pleasant-Shallot-707@reddit
I did notice that 3.6 likes to just stop mid conversation.
Downtown-Art2865@reddit
most of these “it’s close to opus” takes quietly assume a very tight loop and curated context
Pleasant-Shallot-707@reddit
So, well designed tools and discipline….this shit is coming for everyone, even cloud model users as providers move to token based pricing and hallucinations and failed attempts start costing real money
covertpirates@reddit
I tried Qwen3.6 27B on open code (unsloth q8), but found it wasn’t performing very well. Qwen3.5 (same quant) did a lot better. Not sure if it’s my config or maybe I just got to wait for an update.
Pleasant-Shallot-707@reddit
3.6 fixed issues in 3.5 so it’s weird you’re getting better output from it
LetterheadFresh5728@reddit
Ya it's great for making a python script to write a poem if you have 15 minutes
These chatgpt LinkedIn bots are driving me insane
NitinJadhav@reddit
One day this will happen. Man just wants to be first to comment.
DrDisintegrator@reddit
But if that was Claude Mythos, it would be asking you where you'd like to have the plane land.
:)
Status_Contest39@reddit
Thanks to Qwen!
gurilagarden@reddit
I wish this were true, but it's not, yet. It strongly depends on your language, task complexity, and error tolerance. The quality difference between local and frontier is very measurable, especially when comparing <40b to frontier. I have to this point compared locals to frontier on 7 different projects ranging from full-stack web apps, to python projects, and small rust apps, using the same harnesses and processes, and the output differences are visibly and functionally apparent. I recently stumbled into a meaningful quality gap between sonnet and opus when reasoning through a python-based desktop app. So my question for this asshole is simple. Where exactly would the headstart be? Even when we inevitably get there in the next few weeks, all it means is that my grandmother will be able to vibe-code doom, so if anything further advancement only levels the playing field.
Nsiem@reddit
"huge headstart for the second AI revolution"
We've been hearing this since the inception of LLMs, we were going to be "left behind" if we didn't prompt engineer, we were "not gonna make it" if we weren't using RAG.
All of this is funny to me because the AI models got better and left these old "methods" behind, I didn't need to build out a complex prompt engineering system to code because now we have that built into our models, I didn't need to learn RAG because agentic models come out of the box. It's always a "race" yet I never felt like I had to learn everything as it was coming out because by the time I would have learned it or integrated it into my workflow, it became obsolete. I utilize Claude code daily, I build out AI workflows in my projects easily, I leverage skills and plan mode to remove the need to prompt engineer perfectly, I work with my model to come to a solution rather than needing to handhold it all the way through completion.
I didn't participate in the race and yet I'm all caught up with those who are supposed to be "leaving me behind" 😂
MikePounce@reddit
I compiled llama.cpp with CUDA, and regardless of using the Q8 or Q4 GGUF of Qwen3.6-27B from Unsloth, I get 10 tokens per second on an RTX 5090. It is usable but rather slow. Am I doing something wrong?
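Edit: for reference, this is roughly the command I'm testing with (the model filename is a placeholder for the Unsloth GGUF). From what I've read, the usual culprit is forgetting to offload the layers, so that's what I'm double-checking:

```bash
# with CUDA compiled in, the layers still have to be offloaded explicitly;
# without --n-gpu-layers the weights sit in system RAM and you get CPU speeds
./build/bin/llama-cli \
  --model ./models/Qwen3.6-27B-Q4_K_M.gguf \
  --n-gpu-layers 99 \
  -p "quick sanity check prompt"

# confirm VRAM is actually being used while it generates
nvidia-smi
```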
NitinJadhav@reddit
Helped, thx
SnuffleBag@reddit
I have nothing against local LLMs, but what exactly is the head-start here?
I can probably count on one hand the number of hours I spend every year where I don't have Internet but wish I did because I needed it for work. So I can now spend those handful of hours using a significantly worse coding agent instead of just working on something else?
smallDeltaBigEffect@reddit
Cloud inference is extremely unprofitable at the moment and prices will just go up from now on. A 70b model from last year is now worse than a 9b one. The trend will continue, for both I guess
SnuffleBag@reddit
Sure, but right now that's not my problem.
For this future that's supposedly now, the price of that laptop pays for about 6 years of a pro/max cloud model that currently gives significantly better results.
Yes, prices will go up for sure. Quality and/or latency will become worse. But that future is not here yet. And at the end of the day, I'll still be on the hook for some $8000 laptop to participate in that future when it does arrive - in no small part thanks to cloud inference vacuuming up all the components.
I have no doubt local inference will become huge and hugely important. But the practical future is not here yet when it comes to comparing to frontier cloud models.
smallDeltaBigEffect@reddit
Curious to see. Let’s look back in a year
!remindme 1 year
RemindMeBot@reddit
I will be messaging you in 1 year on 2027-04-25 12:32:28 UTC to remind you of this link
CLICK THIS LINK to send a PM to also be reminded and to reduce spam.
Parent commenter can delete this message to hide from others.
Healthy_Bedroom5837@reddit
Sure is, check out Box on Android, it's got a lot! https://github.com/jegly/Box
akira3weet@reddit
qwen3.6-27b is very cool, best part is it's a good size to barely fit in my setup, 16GB+6GB VRAM.
nsshing@reddit
While I do believe we are gonna run an Opus 4.6 on a MacBook Pro, a Qwen 27B cannot do many useful things.
OverjoyedBanana@reddit
Sovereignty: dude is running a Chinese model that has been produced god knows how, with illegally downloaded media.
spencer_kw@reddit
every time someone claims a 27b model matches opus i ask them to run it on a codebase they actually know well. not a benchmark, not a toy project, their actual production code with all the weird conventions and edge cases
the models are genuinely impressive for their size but the overclaiming does more harm than good. sets people up to be disappointed and makes the whole local community look like it can't self-assess
GFrings@reddit
I mean you can also just point them at the actual benchmarks
arguingwithabot@reddit
Ya that should be the benchmark: can you use it for a production code base. Gemma 4 on a maxed out macbook m4/m5 actually comes pretty close for my team but we still can’t justify moving off Anthropic right now.
Maybe someday! But for finance dept it’s a capex vs opex question and capex isn’t favored these days across the board.
It does seem that local/edge AI is encroaching on frontier SaaS but it’s not always the practitioners choice.
Funkahontas@reddit
Oh definitely someday. Maybe this year lol
deepspace86@reddit
Yeah, I always ask people to pull down an open source project and have their hyped-up model identify one small piece of tech debt to fix with a TDD workflow, and it typically does not go well.
9r4n4y@reddit
So do you think 27b or 35b matches opus 4.5?
matrik@reddit
With all due respect to qwen team, comparing a 27B model with a 5T model? Dude..
sooki10@reddit
While I do love the model, and it is impressive for local coding, it is quite far from opus and he should avoid that comparison as it weakens his point.
Zeeplankton@reddit
I fucking hate twitter. This guy is maybe ok, but it's full of this exact type of weird hyperbole and lying.
Like 3.6 is good but no, it's not opus, and no, you're not getting actual work done with it on a MBP. the TPS crushes utility.
tmvr@reddit
It's not even Sonnet 4.5 imho. Well, the "old" Sonnet 4.5 from a few weeks ago, before the recent shenanigans. Whatever it was doing end of this week, it felt like a different model compared to the one I was using for months before. I stuck to 4.5 even after the release of 4.6 so I've noticed when it changed how it behaves.
tat_tvam_asshole@reddit
honestly, truly honestly, I work in the field and if the sophistication of open source agentic orchestration could approach what flagship has, you'd be surprised how much closer in real capability they are/could be. so much of the intelligence isn't even in the model itself per se
s-Kiwi@reddit
Claude Code source was leaked, we can literally 1:1 copy what flagship has
tat_tvam_asshole@reddit
lol, you think client side is all there is?
how quaint
dan-lash@reddit
Don’t even need to, you can use CC with your own models / servers
Important_Quote_1180@reddit
LoRA adapters are making this less true in my workflow in research and creative writing. Game blueprints are still a frontier but it's getting better every day it seems.
jacek2023@reddit (OP)
I understand that based on benchmarks reddit people say that "you are not allowed to compare local models to sota models". But in my home project I replaced codex/GPT 5.4 with pi/gemma 26B and it's fun to work this way. So I can do things no matter what reddit thinks.
ILikeBubblyWater@reddit
Close to Opus? Yeah right.
jrexthrilla@reddit
I thought this was linkedinlunatics
thecuriousrealbully@reddit
Can we have a small LLM that transforms text to unslop it, unhype it, and make it normal human text?
TronAres25@reddit
I still don’t understand how any of this works lol
havnar-@reddit
Not using oMLX for the world to see, what a putz
4DXP@reddit
Very cool. How is the speed? Tested it before and it was pretty annoying
sergeialmazov@reddit
Ok, that's why American airlines adopted internet on board so quickly
andy_potato@reddit
Qwen 3.6 is a good model. But putting it in the same league as Claude or Codex is just delusional.
debtofmoney@reddit
How much memory is needed to run Qwen 3.6 27B on an MBP, and which quantization model is most suitable?
jokedoem123@reddit
Am I the only one having trouble running these LLMs on my 16GB RAM (which I found to be pretty decent for any other task)?
Itchy_elbow@reddit
Are you running the MLX optimized version? Runs a lot faster on apple silicon via LM Studio than the GGUF. Glm-4.7-flash is also pretty decent, as is gemma4:26b, all on apple silicon with full GPU offload if you have more than 24GB RAM.
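If you'd rather skip LM Studio, the mlx-lm CLI does the same thing. A minimal sketch, where the repo name is a placeholder for whatever MLX conversion you actually use:

```bash
# install the MLX runtime for Apple silicon
pip install mlx-lm

# one-off generation with a hypothetical 4-bit MLX quant
mlx_lm.generate \
  --model mlx-community/Qwen3.6-27B-4bit \
  --prompt "explain mutexes in one paragraph" \
  --max-tokens 256

# or serve an OpenAI-compatible endpoint for coding agents
mlx_lm.server --model mlx-community/Qwen3.6-27B-4bit --port 8080
```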
olstrom@reddit
I'm starting to wonder if it is not part of the marketing strategy of Apple and NVIDIA to pay all these people to share their shitty analysis. How is local close to frontier models in the cloud? And why would Apple silicon be way better compared to cheaper solutions? They never talk about performance. I can also run a model on a Raspberry Pi Zero.
Winston-Turtle@reddit
offtopic: should i use llama cpp when i’m on mac? or should i use mlx-lm instead?
CheatCodesOfLife@reddit
llama.cpp, unless the model isn't supported by it, e.g. deepseek-4
Lucky_Yam_1581@reddit
Before leaving the US I had the choice to trade my 32 GB M1 Max for a 64 GB Mac mini for just 500-600 dollars. I thought I'd wait until I could save up for 128 GB. Worst decision ever! This model could run so well on even 64 GB!
Synor@reddit
Burning hot screaming laptops in plane. Yes.
I_HAVE_THE_DOCUMENTS@reddit
There's no way a laptop fan would ever be audible on a plane over the background noise.
IntroductionLive4027@reddit
Just vibe coded a mitmproxy plugin to block yt ads using qwen 5.6 on a 5080
rkh4n@reddit
does it work though?
tainted_vagina@reddit
It's interesting to watch these models improve so quickly. Buying a laptop in a year that can run the latest local model better than the current sonnet will be a very interesting time for AI.
I do wonder at what point, if any, these large companies find themselves looking at subscription numbers and realising their long term investment strategy may not work out.
AntisocialTomcat@reddit
Please stop shitting on this guy, he's both inspiring and chill. Like in the restaurant business, it's pretty rare. On another note, he's the CTO of Hugging Face, so not a clown like Musk or Altman, he's pretty knowledgeable. I agree this kind of LinkedIn cringey writing style is spreading like wildfire, though.
Icy_Distribution_361@reddit
I don't understand these people who seem to have a need to write long hype posts on x.com. Or maybe I do, because it's always a subtle form of self-aggrandizement; me so smart for doing this.
Dry_Yam_4597@reddit
Have you been in a corporate office recently? It's a madhouse. It's as if they've all gone mad, or simpler, corporate selects the worst among the gossipers and the drama queens.
Icy_Distribution_361@reddit
No I have not. I’m a therapist luckily. I don’t do corporate office
Dry_Yam_4597@reddit
You must have a lot of clients from that area :)
Icy_Distribution_361@reddit
Some. I work in a country with good health care, so we also see a lot of people who wouldn’t be able to pay for it themselves. All layers of society really.
tat_tvam_asshole@reddit
can confirm
En-tro-py@reddit
Well... You don’t get to the top without a little NPD, now, do you…?
rdsf138@reddit
The person is just quite literally praising open source. I have no idea where you read the rest, but Reddit does have the proclivity of character-assassinating people for no good reason.
darthrobe@reddit
I did this today as well. No WiFi necessary.
Pretty_Challenge_634@reddit
Sick, that jet engine gonna be working overtime powering a refrigerator.
the_ai_wizard@reddit
One thing ive not understood w/ open source: how often can we expect them to retrain the model with CURRENT_INFORMATION?
Tons of APIs change, language updates, etc. Do they work well enough with tools like search?
WithoutReason1729@reddit
Your post is getting popular and we just featured it on our Discord! Come check it out!
You've also been given a special flair for your contribution. We appreciate your post!
I am a bot and this action was performed automatically.
Local_Phenomenon@reddit
I just want to feel the same as he does. It is cool not gonna lie but I know that flagship is where it's at.
Kinky_No_Bit@reddit
Hardware is now catching up to what AI is demanding of it. As we improve the hardware, and the software is improved, each generation will get better. We are barely at gen 2 right now for hardware. Wait till we can have local hardware designed to run AI, built in, and running for a few generations of improvement. Combined with next gen RAM. its gonna be really interesting.
dealingwitholddata@reddit
Is there a guide for setting this up? Haven't done local models since OG llama, right in the llama.cpp CLI
Due_Duck_8472@reddit
But it's all lies ... LIES ... you can't run a model like that with any meaningful productivity, on a small cheap laptop .. IT'S.JUST.NOT.POSSIBLE.YOU. STUPI.....
What is really up with all these false witnesses on this board, spewing out "facts" and pure fantasies .. claiming that "Yes it's possible to outsmart a 1.5T model with a tiny quant of a 27B model".
LIES!
And for what?! The "algorithm"? For likes? For kicks and giggles?
I tried .. it works, horribly slow, and it's stupider than the village idiot.
Plabbi@reddit
He didn't say that it outsmarted Opus, he just said it was getting close to being as good.
And I am running Qwen 3.6 27B Q5_M and it's not slow, I am getting 50 t/s which works fine.
Ok-Employment6772@reddit
Its nowhere near opus, but the small local models are indeed getting very good
AvidCyclist250@reddit
why is the noob overselling?
Neex@reddit
If I wanted to read posts on X.com, I would go to X.com…
ptear@reddit
The masses will likely be using one of the frontier models, including people using Starlink on their flight. At least that's where I think the mainstream will be.
Beginning_Ad1977@reddit
What would be an on-par laptop setup, or the minimal requirements, to get performance comparable to a MacBook Pro on a Linux machine?
jacek2023@reddit (OP)
I don't use a MacBook for LLMs, but I think 3090s are a better choice.
2Norn@reddit
that is a bit sensational but we are getting closer for sure.
new generation nvidia cards will come, new macbook pro, mac studio are right around the corner, models getting better, quantization getting better. tech is moving forward from like 5-6 different routes in parallel.
we are just on the verge. i give it 6 months and then we can have actual sonnet performance on the go forever free.
jacek2023@reddit (OP)
"new generation nvidia cards will come" I was hoping for new GPUs from AMD/Intel
logic_prevails@reddit
Rip battery life though
InterstellarReddit@reddit
Don't worry, you can plug into the shitty airline outlets that give you max 45W or something like that
Acceptable_Drink_434@reddit
Did you know that power outlets and chargers can be used to send data packets? 😈
logic_prevails@reddit
My shit always maxes it out lol
phreaqsi@reddit
nah, it's in airplane mode.
/s
otterquestions@reddit
They think qwen is going to be opensource forever, like it’s a charitable exercise or something.
dwittherford69@reddit
Has bro ever used Opus? This would be closer to Sonnet 3.5
kiwibonga@reddit
Fuck Apple and Fuck Macs.
But that is cool.
jacek2023@reddit (OP)
I don't consider myself an Apple fanboy, see this picture and judge for yourself ;)
https://www.reddit.com/r/LocalLLaMA/comments/1osnnfn/how_to_build_an_ai_computer_version_20/
kiwibonga@reddit
I don't need a flowchart to know I would stab a mac with a screwdriver...
Popular-Factor3553@reddit
What's a quant?
Better-Struggle9958@reddit
One question: do you often work with your LLM on a plane? Do you have space to put your laptop in front of you on the plane? Or is it that business class is trying to prove something to us?
microdave0@reddit
MutinybyMuses@reddit
1000 per second?! That’s rookie numbers
power97992@reddit
Qwen 3.6 27b is probably worse than sonnet 4.6… he is overhyping it but u can get good results with glm 5.1 and ds v4 pro and flash.
BeaveItToLeever@reddit
No probably about it, it's certainly still quite a long way from Sonnet 4.6. If there was a local equivalent of Sonnet 4.6 that could run on a MacBook, that would be an actual revolution. Sonnet is still an incredibly impressive model
power97992@reddit
Q2-Q3 MiniMax 2.7 can run on a MacBook, but at this quant, quality may deteriorate a lot
iamapizza@reddit
This is where we are? Frankly embarrassing to be associated with future LinkedIn lunatics like this.
tken3@reddit
How much Ram would you need for a model like this?
o0genesis0o@reddit
Love that model, but this is so BS that it would cause more harm than good. That model is nowhere near Opus class, and running a dense 27B on whatever M-series chip that MacBook has is going to be crawling in agentic coding, making it unusable.
Informal_Warning_703@reddit
Okay, but saying “monopolistic closed source API” makes you sound like an unhinged ideologue, which calls into question your ability to accurately assess the quality of closed source models compared to open source models… or actually just open weights models.
_lavoisier_@reddit
Compared to Opus? Lol, of course not
Odd-Government8896@reddit
Dude's laptop battery lasted 36 seconds. Long enough to snap the pic. What a fun flight. Hope it was local.
G1fty_14@reddit
I've just been doing some coding with the same setup. I found that for simpler work, it's quite powerful. It did get stuck in some tasks and I had to help it find its way, and in one particular task, I had to do the implementation myself.
With that said, the fact that it's running locally on my laptop and still produces some good stuff is quite incredible
iMrParker@reddit
Lol the 16" MacBook pro fans are loud as hell when doing inference. I can't imagine sitting next to this guy on the plane. I guess the plane would drown out the sound
AshuraBaron@reddit
New feature idea for Airpods Pro 5. ANC specifically tuned to macbook pro fan sound during local LLM use.
Direct_Turn_1484@reddit
Yeah the plane is pretty loud too. Guy's probably sitting in first class too, so a lot of people are going to have fancy noise cancelling headphones.
OopsWrongSubTA@reddit
qwen3.6-27B, pi-dev, llama.cpp/ollama, ... OK.
But how do you 'sandbox' pi-dev ? (I'm a linux user)
ResidentPositive4122@reddit
Devcontainers for 99.99% of usecases. Quick, easy to do per project, integrated w/ vscode and comes with lots of batteries included (port fwd, etc).
micro-vms + some glue for the cases where you're afraid the model is gonna try to find priv esc in the devcontainer :)
Grouchy_Ad_4750@reddit
You could lock it inside a Docker container and only expose it to the dir with the code
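Something like this, as a minimal sketch (the image and paths are just examples): the agent only sees the bind-mounted project directory, and --network none keeps it fully offline, which is fine since inference is local anyway.

```bash
# drop the coding agent into a throwaway container that can only
# touch the current project directory and has no network access
docker run --rm -it \
  --network none \
  --volume "$PWD":/workspace \
  --workdir /workspace \
  node:22-bookworm \
  bash
```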
Fit-Produce420@reddit
This is why we didn't get an open weight 130B dense Gemma 4 that was leaked - it's too good, there's no need to pay per token and it fits on reasonable hardware.
toothpastespiders@reddit
I've been stubborn as hell about not upgrading my system since costs skyrocketed. I'd do it for a 130b dense gemma 4, no question. I'd probably do it for a 130b MoE. I'm loving the 31b. But man, I just keep thinking what the same model bumped up that much would be like.
jacek2023@reddit (OP)
based on my experience with gemma 26B this may be true, 124B was a threat to Gemini
Fit-Produce420@reddit
It was a threat to everyone.
They'll release it when they need something to show off, and it won't be a trillion parameters.
goatchild@reddit
'Most people haven't...' Fuck off
maraluke@reddit
Before it’s revealed this photo is generated by gpt-image-2
Important_Quote_1180@reddit
I'm about to do exactly this but with Qwen3.6 29B A3B. It's a REAP of the 35B MoE. My CC downloaded and one-shot the config, getting 60-80 tok/s, going down to 40 after 200k context. We live in magical times
the_koom_machine@reddit
op unironically takes his news from people who pay for the twitter blue checkmark
ForeverPrior2279@reddit
Is llama.cpp better than omlx for mac?
AXYZE8@reddit
No, oMLX is the best app/engine you can use on a Mac.
If you are wondering about this post - the guy from the post is a co-founder of Hugging Face. Hugging Face acquired GGML (so llama.cpp) 2 months back https://reddit.com/r/LocalLLaMA/comments/1r9vywq/ggmlai_has_got_acquired_by_huggingface/
DarkArtsMastery@reddit
I confirm that. I have been personally transitioning to local-first in the last few weeks and I'd say for 95% of cases local is definitely there with the quality of big proprietary models.
bobaburger@reddit
Great. But tbh, I don't think it's safe and polite to run local LLM on an airplane, mid flight 😂
Crafty-Confidence975@reddit
He’s got the right word there. It “feels” like using a coding agent with a frontier model. Because it doesn’t fail immediately and seems to be doing stuff. But it’s definitely not on the level of the frontier models.
magnus-m@reddit
power consumption and speed is a concern. also agent harnesses speed up by using sub-agents, and therefore need concurrent request support -> more vram/ram needed.
still cool. i have used oss-20b on a plane, but for searching in codex and not agentic coding.
jacek2023@reddit (OP)
"power consumption and speed is a concern" - look at all the people crying about Claude Code limits, it's just a start
ProfessionalJackals@reddit
Copilot just released GPT 5.5 for a 7.5x multiplier. GPT 5.4 was 1x... You think Claude Code people are crying, things are going even more wacky over there. Very sure that the older cheaper models are going to go out the door fast.
stoppableDissolution@reddit
Oh, for sure. It feels like stealing to have 5.4 run for 10-15 minutes at the cost of one request, and it absolutely could not last long, lol. Was nice tho.
jacek2023@reddit (OP)
I use Claude Code for work so I see CC crying. I also pay for ChatGPT Plus so I am familiar with Codex crying :)
magnus-m@reddit
yeah it is getting good. i really like the release of good small/med size model
cosmicr@reddit
/r/linkedinlunatics
rebelSun25@reddit
Sigh. Such a hyperbolic statement, and it encourages others to do the same. I just saw a post on X where someone got qwen to create a simplistic FPS camera walkabout 3D demo and called it a "Complete raycasting engine"
Let's do better