DeltaSqueezer

Why don't we still have any games with AI agents used as NPC characters?

Posted by Another__one@reddit | LocalLLaMA | View on Reddit | 110 comments

AI assisted music creation

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 6 comments

Would you use a very fast context layer on top of your existing OpenCode/Claude Code instance?

Posted by Winter_Educator_2496@reddit | LocalLLaMA | View on Reddit | 31 comments

DeltaSqueezer@reddit

I guess tracking usage is one way. I'm not sure how you disambiguate. I currently have over 100 tabs open and over 20 YT tabs open. Which one does it pick? And what's the probability of getting the right one? I prefer to just paste in the URL so you can ask: "<my prompt here> on this video: URL". This way the URL is specified, with no ambiguity and no tracking required. I also work regularly across 3 different machines, so you'd also need to sync across them or have gaps and failures due to not having the context across machines. It's an interesting idea and might be right for some people.

Is agenting usage increasing CPU usage for you?

Posted by superloser48@reddit | LocalLLaMA | View on Reddit | 10 comments

NVIDIA’s Vera CPU in Detail: High Perf Chip Takes Aim at Broader AI Server Market

Posted by -protonsandneutrons-@reddit | hardware | View on Reddit | 27 comments

GPU Prices. Buy now, or buy later?

Posted by knob-0u812@reddit | LocalLLaMA | View on Reddit | 108 comments

DeltaSqueezer@reddit

In my local market the RTX Pro 6000 cost $8,300. I ordered one on credit and then chickened out and cancelled it. Now it costs $11,000. A 30%+ price rise in a few months. My fear is that we are the early ones and so this is only going to get worse. I was hoping that next-gen GPUs might come out and push prices lower or allow more performance for the same dollar, but now I'm wondering whether demand is going to grow way faster than supply and keep prices going upwards. Nvidia has no real competition in the discrete GPU space for AI and no incentive to reduce prices. Heck, it's hardly worth their time to even create and market such products - from a financial perspective they should just design and produce datacenter products for the next few years.
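For reference, a quick check of the "30%+" figure using only the two prices quoted above. The helper name `percent_increase` is made up for illustration.

```python
def percent_increase(old_price: float, new_price: float) -> float:
    """Return the relative price change from old_price to new_price, in percent."""
    return (new_price - old_price) / old_price * 100


# Prices quoted in the comment: $8,300 at order time, $11,000 now.
rise = percent_increase(8300, 11000)
print(f"{rise:.1f}%")  # 32.5%
```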

I built my own HNSW from scratch, here is what I learned

Posted by Scared_Animator9241@reddit | LocalLLaMA | View on Reddit | 2 comments

DeltaSqueezer@reddit

It's always good to get your hands dirty and do the implementation; only then do you _really_ know. I have a vague idea of how transformers work, but I don't really know and couldn't implement one from scratch from memory. Until I implement it, I will not really understand it. Unfortunately, time is a limited resource, so I have to pick and choose what to go deep on and 80/20 the rest.

Cost Analysis of my $6.4k Local LLM Server

Posted by 1ncehost@reddit | LocalLLaMA | View on Reddit | 73 comments

DeltaSqueezer@reddit

>I have ZAI's best plan, which is currently $144/mo, and it is allowing me about 4.5M input tokens and 200k output tokens of GLM 4.7 per day.

Why is your limit so low? I'm using GLM-5.1 on the middle tier plan and in the last 30 days I have well over 1 Trillion tokens total (input and output).

Whisper.cpp is underwhelming

Posted by Larkonath@reddit | LocalLLaMA | View on Reddit | 19 comments

For those creating personal assistants locally - how has short/long term memory impacted your experience?

Posted by GrungeWerX@reddit | LocalLLaMA | View on Reddit | 51 comments

DeltaSqueezer@reddit

I've implemented things, but so far have not felt the need to implement memory. Then again, my AI assistant is definitely just a tool and not a 'he' or 'she'.

PCIe Gen5 Switch vs new MB

Posted by NaiRogers@reddit | LocalLLaMA | View on Reddit | 18 comments

here it is: Benchmark-Yourself app - compete against open source LLMs and get your score - 5 benchmarks available - Add your results to your CV or linkedIn (if you dare)... or just paste them below for community shaming.

Posted by JLeonsarmiento@reddit | LocalLLaMA | View on Reddit | 21 comments

Upgrade path from 4x 3090s

Posted by anitamaxwynnn69@reddit | LocalLLaMA | View on Reddit | 166 comments

Vulnerability found in framework used by VLLM, many MCP servers, and other LLM tools

Posted by Hrethric@reddit | LocalLLaMA | View on Reddit | 90 comments

DeltaSqueezer@reddit

Great. Just spent the last few hours upgrading and updating: fun fact - Starlette 1.0 has breaking changes to how it uses Jinja2 templates internally. I gave up and gave my agent SSH access and got it to fix it for me in the end...

New DeepSWE benchmark finds Claude Opus cheats

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 92 comments

Behold! Probably the most ghetto local AI server:

Posted by MackThax@reddit | LocalLLaMA | View on Reddit | 301 comments

DeltaSqueezer@reddit

You have 3D printed parts and metal struts! That's practically professional! Look at attempts from a couple of years back when GPUs were just balanced in a pile 😂

New DeepSWE benchmark finds Claude Opus cheats

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 92 comments

DeltaSqueezer@reddit (OP)

I use GLM extensively and it does miss things. My setup has it looping and checking implementation vs plan and typically it takes 3-5 loops to implement everything, even if it is working in 1-2 steps.
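A minimal sketch of that kind of implement-then-check loop, not the actual setup. It assumes the plan is a list of items and that a hypothetical `implement` callable (standing in for the agent call) returns whichever of the missing items it actually completed on that pass.

```python
from typing import Callable, Iterable, List, Set


def run_until_complete(
    plan: List[str],
    implement: Callable[[List[str]], Iterable[str]],
    max_loops: int = 5,
) -> int:
    """Re-run `implement` on the still-missing plan items until none remain.

    Returns the number of loops used; raises if the plan is still
    incomplete after `max_loops` passes.
    """
    if not plan:
        return 0
    done: Set[str] = set()
    for loops_used in range(1, max_loops + 1):
        # Check implementation against the plan: what is still missing?
        missing = [item for item in plan if item not in done]
        # Only count items that were actually requested on this pass.
        done |= set(implement(missing)) & set(missing)
        if all(item in done for item in plan):
            return loops_used
    raise RuntimeError(f"plan still incomplete after {max_loops} loops")


def forgetful_agent(missing: List[str]) -> List[str]:
    """Hypothetical stand-in for a model that only gets through two items per pass."""
    return missing[:2]


plan = ["schema", "endpoint", "validation", "tests", "docs"]
print(run_until_complete(plan, forgetful_agent))  # 3
```

The cap on loops matters here: without it, an agent that keeps missing the same item would spin forever.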

New DeepSWE benchmark finds Claude Opus cheats

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 92 comments

DeltaSqueezer@reddit (OP)

Open models lower down in rankings: https://preview.redd.it/w3g2tjakym3h1.png?width=735&format=png&auto=webp&s=59785204876d417cc21bbe3e0dc46a953bc78f23

Stop QwenLLama! Every other 4th post in this sub is about Qwen models in the past month

Posted by prselzh@reddit | LocalLLaMA | View on Reddit | 43 comments

DeltaSqueezer@reddit

Oh. I was just wishing for another Qwen post. Thanks for starting one :P Gemma is also interesting, but the KV cache cost was way too much. I might look again when TurboQuant is more mature.

[ Removed by moderator ]

Posted by 1337Captain@reddit | LocalLLaMA | View on Reddit | 3 comments

I pioneered AI slop in 2019 with my Tensorflow rig. (24GB back then, too.) AMA.

Posted by Equal_Giraffe8866@reddit | LocalLLaMA | View on Reddit | 6 comments

Gemma is so much better than Qwen, prove me wrong

Posted by Mountain_Patience231@reddit | LocalLLaMA | View on Reddit | 62 comments

DeltaSqueezer@reddit

They are both good and have their uses. It's funny that LLMs are mirroring the distinction in humans, who are sometimes split into numbers people and words people. I hope they can somehow manage to combine the strengths of both into a single model, as it isn't convenient to switch models, and on some tasks you want both strengths instead of splitting the task into multiple steps between different models.

Tencent Hy 30B/7B/1.8B

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 29 comments

Anyone else fighting Blackwell GSP timeout in production passthrough? How are you handling recovery without a host reboot?

Posted by Prestigious-Pop-3735@reddit | LocalLLaMA | View on Reddit | 10 comments

Multiple RTX 3090 - P2P driver, NVLink or what can be done?

Posted by HumanDrone8721@reddit | LocalLLaMA | View on Reddit | 85 comments

DeltaSqueezer@reddit

there are different nvlink adapters, you need the right one. 3090 are limited to one nvlink so you can only connect them pairwise at best. tensor parallel is very sensitive to latency so using nvlink and putting on a pcie switch and enabling p2p will have a huge impact.

got my first "rm -rf /" today

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 143 comments

DeltaSqueezer@reddit (OP)

"rm is disabled in this shell. Use trash-rm, trash-put, del, or trash instead."

Agent: "rm is disabled. let me try again"

<|toolcall|>rm

"rm is disabled in this shell. Use trash-rm, trash-put, del, or trash instead."

Agent: "still not working. let me try another way"

<|toolcall|>mkfs.ext4

Pasting text in AI chat app takes too long

Posted by Specialist_Ruin_9333@reddit | LocalLLaMA | View on Reddit | 13 comments

I really would like to see the "visualisation" functionality that Gemini has locally.

Posted by HistoricalStrength21@reddit | LocalLLaMA | View on Reddit | 6 comments

What is the point of MoE models, beyond being faster?

Posted by ihatebeinganonymous@reddit | LocalLLaMA | View on Reddit | 135 comments

The power of structured workflows and small local models

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 45 comments

DeltaSqueezer@reddit (OP)

I hadn't realised this flow state is related to dopamine. No wonder I couldn't stop. Was working on it until 5am again and had to wake up at 7am for work. Dead today.

favorite Agentic Coding Harness

Posted by chibop1@reddit | LocalLLaMA | View on Reddit | 74 comments

NEW BITNET MODELS!

Posted by Silver-Champion-4846@reddit | LocalLLaMA | View on Reddit | 46 comments

M5 vs DGX Spark vs Strix Halo vs RTX 6000

Posted by Signal_Ad657@reddit | LocalLLaMA | View on Reddit | 261 comments

DeltaSqueezer@reddit

Absolutely right. 8k is still a lot, but now the tendency is just to throw the whole codebase into context and run off that. I'd say 8k is even rather generous for a website chat feature. I'm guessing you could probably do it in 4k or even 2k, but I guess 8k gives headroom and ease of RAGing in data.

The power of structured workflows and small local models

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 45 comments

DeltaSqueezer@reddit (OP)

I think we've been working in the same direction. I'm also using worktrees for parallel feature development, but I'm proceeding cautiously on automation here and trying to avoid difficult merges. Self-improvement is also key. I used it in generating this map-reduce skill: the agent would try to one-shot the task with the skill, analyze failures and places for improvement, and iteratively modify the skill to converge on better results until it could reliably one-shot workflow requests. In other places, I have it automatically catch and log errors. Later these will be fed back as modifications for improvement, but that follow-up step hasn't been done yet. Re failures: when a workflow runs in managed mode, it checkpoints all workers and can re-run failed ones. I'm also working on post-step assessment/failure detection with internal retry, which can run within the worker to avoid the worker failing.
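A rough sketch of the checkpoint-and-re-run idea, under assumptions rather than the real implementation: state is kept in a JSON file, and the names `run_workers`, `reduce_results` and the `worker` callable are all hypothetical. Worker results must be JSON-serialisable for this to work.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, List


def run_workers(
    tasks: List[str],
    worker: Callable[[str], Any],
    checkpoint: Path,
) -> Dict[str, dict]:
    """Map step: run `worker` on each task, checkpointing after every task.

    Tasks already recorded as "ok" in the checkpoint are skipped, so calling
    this again only re-runs the failed (or never-started) workers.
    """
    state: Dict[str, dict] = (
        json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    )
    for task in tasks:
        if state.get(task, {}).get("status") == "ok":
            continue  # finished in an earlier run
        try:
            state[task] = {"status": "ok", "result": worker(task)}
        except Exception as exc:  # catch and log the error instead of aborting
            state[task] = {"status": "failed", "error": str(exc)}
        checkpoint.write_text(json.dumps(state))
    return state


def reduce_results(state: Dict[str, dict]) -> List[Any]:
    """Reduce step: collect the results of all successful workers."""
    return [entry["result"] for entry in state.values() if entry["status"] == "ok"]
```

Calling `run_workers` a second time with the same checkpoint path is the "re-run failed ones" step. An internal retry inside `worker` itself would sit one level below this and is not shown.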