Went to the monthly AI dev meetup

Posted by nathandreamfast@reddit | LocalLLaMA | 38 comments

Usual crowd. Everyone's on Claude or Codex, nobody's really sure how any of it actually works, and that's fine, that's the vibe.

Then there's this guy. The Claude guy. You know the type even before he speaks. First thing he wants to know is what I'm running. I tell him: GLM, custom multi-agent setup, local small LLM routing traffic between GLM 5.1, Kimi K2.6, MiMo v2.5-Pro and a few OpenRouter models, all hitting a bleeding edge llama.cpp build I access over WireGuard wherever I am. He looks at me like I'm speaking another language. "So... not Opus?" Not Opus. Not Codex. Not anything with a pricing page and a friendly little UI. He doesn't know what to do with this information.

Someone throws out a challenge. Build a working browser game, go. I paste the prompt in, agents fan out and start doing their thing, and I close my laptop lid. That's the whole move. Years of refining this XFCE4 setup means they just keep working with the lid down. Autonomously. While I get a coffee.

I crack the lid once to check progress and the guy next to me is staring at the compaction logs scrolling past. "What is that." I tell him it's Qwen3.6-35B-A3B-uncensored-heretic-Q5_K_S.gguf doing over 200 tokens per second just eating through context compaction on local hardware. He goes quiet. Fair enough.

The Claude guy is not having a good time. Toggling between plan mode and build mode. Sweating a bit. The kind of focused where you can tell things aren't going well but he hasn't admitted it yet.

My Telegram pings. App's done, deployed, playable in the browser. I didn't touch anything after I closed the lid. His screen is half a game that doesn't work. He stares at it, closes the laptop, and walks straight out without a word.

One of his mates looks over at me. "You just made a big mistake today buddy." I thought about it for a second. "Don't mess with local LLM guys bro." Nobody said anything after that.


38 Comments

[-]

ExoticYesterday8282@reddit

The funniest part is that local setups always sound fake until someone watches them actually work. People think “local AI” means opening LM Studio once every two weeks. Then they see autonomous agents still running after the laptop lid closes and suddenly the vibe changes.

[-]

Nyghtbynger@reddit

Does the author assume that sleep mode is disabled when shutting the lid, or is he just using tmux on a remote server?

[-]

nathandreamfast@reddit (OP)

Well the setup does have some truth to it. I never liked sleep mode. If I close my laptop lid, it runs as normal except the screen is just black.

[-]

Nyghtbynger@reddit

Wow. Does it ruin your battery?

[-]

nathandreamfast@reddit (OP)

Could be why I always have it plugged into power lol, I don't use it much with battery.

[-]

Spare-Leadership-895@reddit

yeah, i'd keep pi/hermes on the agent side and let the gateway/router do the dumb routing. regex + length checks catch more than you'd expect, and then a tiny model can handle the ambiguous stuff. once planning, tool routing, and edits are all in one loop, it gets messy real fast.

[-]

false79@reddit

bwahahahaha this is such a good read. People just don't understand the fire they are playing with.

[-]

nathandreamfast@reddit (OP)

just a fun little sh*t post, sort of like I imagined what would happen if I did go to the local ai meeting the night before. Turns out it was nothing like the above. Still I was the only one who didn't use claude or codex, and there really was a claude guy sort of like that.

[-]

iansltx_@reddit

Local "full cycle AI software" or whatever meetup last night was a lot of folks dual-wielding CC + Codex because folks are bracing themselves for subsidies evaporating. Presentations were about figuring out value delivered mapped back to tokens spent. People were starting to get non-frontier-curious and even local-curious (though the presenter mentioned running a model on a Mac mini lol) because folks know the pricing tidal wave is gonna hit. Presenter said subsidies will be gone in a year. I bet they'll be mostly gone (at most 70% off API pricing for frontier labs) by year-end. Which will cut usage to the point that these places will have a harder time cornering the market on hardware, which makes local even more viable...even if MiniMax M3 is the last local model we get (it won't be the last local model we get).

[-]

nathandreamfast@reddit (OP)

Local is indeed the future, it's why I believe in it so much. Power to the people really.

[-]

cheechw@reddit

This is clearly satire that's going over everybody's heads lol. I thought it was funny, OP.

[-]

nathandreamfast@reddit (OP)

What else to expect from Reddit lol, was just a bit of fun really. I made it obvious ai slop because well, that's part of the bit. Ironically enough I had used sonnet to refine it, but don't tell anyone. ;) lol

[-]

AuggieKC@reddit

These local AI fanfics are getting weird.

[-]

nathandreamfast@reddit (OP)

I sort of like the niche genre, local llm ego porn sort of stuff. :)

[-]

mr_Owner@reddit

Went to similar event for work, it felt like this: https://preview.redd.it/tluco36byq3h1.jpeg?width=1080&format=pjpg&auto=webp&s=4d1d51a55ac42623e82151cc9c1aa382473d87b3

[-]

datbackup@reddit

> The Claude guy is not having a good time. Toggling between plan mode and build mode. Sweating a bit. The kind of focused where you can tell things aren't going well but he hasn't admitted it yet.

I mean I don’t want to say it since this was a pretty entertaining read but in this paragraph suddenly my slopdar started pinging hard

[-]

Top-Rub-4670@reddit

> I tell him: GLM, custom multi-agent setup, local small LLM routing traffic between GLM 5.1, Kimi K2.6, MiMo v2.5-Pro and a few OpenRouter models, all hitting a bleeding edge llama.cpp build I access over WireGuard wherever I am.

> You know the type even before he speaks.

Yeah, I know the type.

[-]

areslica@reddit

Thinking about remote connection too. Are you also doing remote wake-up, or do you just keep the server running?

[-]

More-Curious816@reddit

now make it a spy thriller Jason Bourne fanfiction

[-]

wallphaser231@reddit

Once I flexed my function-gemma setup on my S23 ultra running on 12 GB ram, and this highly unimpressed person said "so what, I run opus on my phone" and pulls out the claude app 😭. I LOST.

[-]

UnWiseSageVibe@reddit

This is interesting. Can you describe your setup a bit more in depth?

[-]

nathandreamfast@reddit (OP)

Sure. The setup is accurate enough, although not the actual events lol. The router thing I am experimenting with isn't too great. I had tried a mix of just regex detection and length stuff. A smaller LLM with a prompt to determine how to route sort of worked. Maybe some fine tuning. When it works though it's great! Complex stuff goes to cloud, simple file edits or tools will use llama.cpp. Otherwise I do have a wireguard to llama.cpp. Opencode has agents and sub agents that can work autonomously, which is a mix of cloud providers and local. It does use Qwen 3.6 35b for compaction too, fitting in the 5090 it can do 200tps steady.
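For anyone wanting to try the routing idea described above, here is a minimal sketch of the "regex + length first, small model for the ambiguous stuff" approach. It is not the OP's actual router: the patterns, the character thresholds, the `local`/`cloud` labels and the `classify` callable (standing in for a prompt to a small local LLM) are all assumptions made for illustration.

```python
import re
from typing import Callable, Optional

# Hypothetical backend labels; the thread only distinguishes cloud from local llama.cpp.
LOCAL = "local"
CLOUD = "cloud"

# Illustrative patterns only, not the OP's actual rules.
SIMPLE_PATTERNS = [
    re.compile(r"\b(rename|reformat|fix (a )?typo|bump version|add (an? )?import)\b", re.I),
]
COMPLEX_PATTERNS = [
    re.compile(r"\b(architect|design|refactor|plan|migrate|debug)\b", re.I),
]

MAX_SIMPLE_CHARS = 400     # assumed threshold for "short enough to be a simple edit"
MIN_COMPLEX_CHARS = 4000   # assumed threshold for "long enough to need the big model"


def route_by_rules(prompt: str) -> Optional[str]:
    """Return a backend when the cheap heuristics are decisive, else None."""
    if len(prompt) >= MIN_COMPLEX_CHARS:
        return CLOUD
    if any(p.search(prompt) for p in COMPLEX_PATTERNS):
        return CLOUD
    if len(prompt) <= MAX_SIMPLE_CHARS and any(p.search(prompt) for p in SIMPLE_PATTERNS):
        return LOCAL
    return None


def route(prompt: str, classify: Callable[[str], str]) -> str:
    """Try the rules first; ask the small-model classifier only when they are undecided."""
    decision = route_by_rules(prompt)
    if decision is not None:
        return decision
    # `classify` is a stand-in for a small LLM prompted to answer "local" or "cloud".
    answer = classify(prompt).strip().lower()
    # Anything other than a clear "local" goes to the cloud as the safer default.
    return LOCAL if answer == LOCAL else CLOUD
```

Defaulting to the cloud on an unclear classifier answer is a design choice in this sketch: a misrouted hard task costs more than a misrouted easy one.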

[-]

Icy-Pay7479@reddit

I’m getting 70 t/s with 27b. Have you tested it? I’m worried about the quality of 35b for coding.

[-]

nathandreamfast@reddit (OP)

As much as I love local LLM, I more rely on cloud based llms to do the planning and heavy lifting, and sometimes qwen can be good for the file edits and code updates from the plan as it's much faster. Compaction too. Regardless if local or cloud is writing code, after I'll usually have another llm sub agent code review it after the task is done, which catches stuff most of the time. It'll fix up from the feedback and then continue if the reviewer is happy.

I'd more rely on the 27b for coding. 35b is ok for other things that don't require too much thinking. With MTP I was able to get about 100 TPS with the 27b with this config.

    model = /models/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-NVFP4-Q8_0.gguf
    chat-template-file = /models/qwen3.6_merged_template.jinja
    ctx-size = 196608
    b = 8192
    ub = 2048
    parallel = 2
    spec-type = draft-mtp
    spec-draft-n-max = 2
    spec-draft-p-min = 0.75
    cache-reuse = 1024
    ctx-checkpoints = 128
    mmproj = /models/Qwen3.6-27B-mmproj-BF16.gguf
    temp = 0.6
    top-p = 0.95
    top-k = 20
    min-p = 0.05
    presence-penalty = 0.6
    repeat-penalty = 1.0
    cache-type-k = q8_0
    cache-type-v = q8_0
    sleep-idle-seconds = 300
    reasoning-budget = 16384
    chat-template-kwargs = {"preserve_thinking": true}
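The write-then-review loop from this comment (one agent writes, a second sub agent reviews, the writer fixes from the feedback until the reviewer is happy) could look roughly like the sketch below. The `write` and `review` callables, their signatures and the `max_rounds` cap are hypothetical stand-ins for whatever agents are actually wired up; nothing here is taken from Opencode's API.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures:
#   write(task, feedback)  -> code, where feedback is None on the first attempt
#   review(task, code)     -> (approved, feedback)
Writer = Callable[[str, Optional[str]], str]
Reviewer = Callable[[str, str], Tuple[bool, str]]


def write_with_review(
    task: str,
    write: Writer,
    review: Reviewer,
    max_rounds: int = 3,  # assumed cap so a stubborn reviewer cannot loop forever
) -> Tuple[str, bool]:
    """Return (code, approved) after at most `max_rounds` revisions."""
    code = write(task, None)
    for _ in range(max_rounds):
        approved, feedback = review(task, code)
        if approved:
            return code, True
        # Hand the reviewer's feedback back to the writer for another attempt.
        code = write(task, feedback)
    # The last revision still gets one final review before giving up.
    approved, _ = review(task, code)
    return code, approved
```

The writer and reviewer can be different models, for example a fast local one writing and a cloud one reviewing, since the loop only depends on the two callables.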

[-]

Icy-Pay7479@reddit

Thanks for sharing. I’m 100% aligned with your philosophy, I just haven’t found the right stack. I’m playing with pi and Hermes, but a gateway with a router might be better. It sounds like you’re taking that approach. If you haven’t already, that would be worth its own post. Lots of folks interested in this topic!

[-]

jikilan_@reddit

Nice story. When is the next episode?

[-]

mike7seven@reddit

LLM-written world of fantasy. The verbiage, the punctuation and prose, all LLM giveaways.

[-]

nathandreamfast@reddit (OP)

That is exactly what it is lol, the local setup though does have some truth to it.

[-]

thread-e-printing@reddit

Stop larping

[-]

ConsciousEar877@reddit

What hardware did u use?

[-]

nathandreamfast@reddit (OP)

I wish I could run all the big stuff locally, I have a 5090 and 4090 in one desktop. It's enough but.. still never seems enough. :(

[-]


desexmachina@reddit

I have the same sentiment as you. But TBH, I was super disappointed in the doomerism sentiment on this sub when OpenClaw came out. I didn’t get it.