DeltaSqueezer

Why don't we still have any games with AI agents used as NPC characters?

Posted by Another__one@reddit | LocalLLaMA | View on Reddit | 110 comments

[-]

DeltaSqueezer@reddit

even if you solve all of the above: it doesn't really add anything to the gameplay.

AI assisted music creation

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 6 comments

[-]

DeltaSqueezer@reddit (OP)

Thanks for the comprehensive answer. I guess it shows the lack of investment in this area when ancient (by AI standards) RVC is still the best!

Would you use a very fast context layer on top of your existing OpenCode/Claude Code instance?

Posted by Winter_Educator_2496@reddit | LocalLLaMA | View on Reddit | 31 comments

[-]

I guess tracking usage is one way, but I'm not sure how you disambiguate. I currently have over 100 tabs open and over 20 YT tabs open. Which one does it pick? What's the probability of getting the right one? I prefer to just paste in the URL so you can ask: "<my prompt here> on this video: URL". This way the URL is specified, with no ambiguity and no tracking required. I also work regularly across 3 different machines, so you'd also need to sync across them or accept gaps and failures from not having the context on every machine. It's an interesting idea and might be right for some people.

Would you use a very fast context layer on top of your existing OpenCode/Claude Code instance?

Posted by Winter_Educator_2496@reddit | LocalLLaMA | View on Reddit | 31 comments

[-]

DeltaSqueezer@reddit

and how does it get the URL of the video?

Would you use a very fast context layer on top of your existing OpenCode/Claude Code instance?

Posted by Winter_Educator_2496@reddit | LocalLLaMA | View on Reddit | 31 comments

[-]

DeltaSqueezer@reddit

No. I provide accurate context for the LLM so that it gives the right answer.

Is agenting usage increasing CPU usage for you?

Posted by superloser48@reddit | LocalLLaMA | View on Reddit | 10 comments

[-]

DeltaSqueezer@reddit

My LLM box has CPU pegged at 100% during inference. Seems partly CPU bound.

NVIDIA’s Vera CPU in Detail: High Perf Chip Takes Aim at Broader AI Server Market

Posted by -protonsandneutrons-@reddit | hardware | View on Reddit | 27 comments

[-]

DeltaSqueezer@reddit

So how many organs do I need to sell to get one?

GPU Prices. Buy now, or buy later?

Posted by knob-0u812@reddit | LocalLLaMA | View on Reddit | 108 comments

[-]

DeltaSqueezer@reddit

In my local market the RTX Pro 6000 cost $8,300. I ordered one on credit and then chickened out and cancelled it. Now it costs $11,000. A 30%+ price rise in a few months. My fear is that we are the early ones and so this is only going to get worse. I was hoping that next-gen GPUs might come out and push prices lower or allow more performance for the same dollar, but now I'm wondering whether demand is going to grow way faster than supply and keep prices going upwards. Nvidia has no real competition in the discrete GPU space for AI and no incentive to reduce prices. Heck, it's hardly worth their time to even create and market such products - from a financial perspective they should just design and produce datacenter products for the next few years.

I built my own HNSW from scratch, here is what I learned

Posted by Scared_Animator9241@reddit | LocalLLaMA | View on Reddit | 2 comments

[-]

DeltaSqueezer@reddit

It's always good to get hands dirty and do the implementation, only then do you _really_ know. I have a vague idea of how transformers work, but I don't really know and couldn't implement one from scratch from memory. Until I implement it, I will not really understand it. Unfortunately, time is a limited resource and so I have to pick and choose what to go deep on and have to 80/20 the rest.

Cost Analysis of my $6.4k Local LLM Server

Posted by 1ncehost@reddit | LocalLLaMA | View on Reddit | 73 comments

[-]

DeltaSqueezer@reddit

> I have ZAI's best plan, which is currently $144/mo, and it is allowing me about 4.5M input tokens and 200k output tokens of GLM 4.7 per day.

Why is your limit so low? I'm using GLM-5.1 on the middle tier plan and in the last 30 days I have well over 1 Trillion tokens total (input and output).

Whisper.cpp is underwhelming

Posted by Larkonath@reddit | LocalLLaMA | View on Reddit | 19 comments

[-]

DeltaSqueezer@reddit

whisper has been trained with certain audio lengths in mind. you need to break down audio into chunks.
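To make the chunking advice concrete, here is a minimal sketch. Whisper models take 16 kHz audio in 30-second windows; the 2-second overlap and the name `chunk_audio` are assumptions for illustration, not part of Whisper or whisper.cpp.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz mono audio
CHUNK_SECONDS = 30     # Whisper's encoder sees 30-second windows
OVERLAP_SECONDS = 2    # assumption: small overlap so boundary words are not cut


def chunk_audio(samples: Sequence[float],
                sample_rate: int = SAMPLE_RATE,
                chunk_seconds: int = CHUNK_SECONDS,
                overlap_seconds: int = OVERLAP_SECONDS) -> List[Sequence[float]]:
    """Split audio samples into fixed-length, slightly overlapping chunks."""
    chunk_len = chunk_seconds * sample_rate
    step = (chunk_seconds - overlap_seconds) * sample_rate
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk length")
    chunks = []
    start = 0
    while start < len(samples):
        chunks.append(samples[start:start + chunk_len])
        if start + chunk_len >= len(samples):
            break  # the last chunk already reaches the end of the audio
        start += step
    return chunks
```

Each chunk would then be transcribed separately. Stitching the text back together at the overlaps is left out of this sketch.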

For those creating personal assistants locally - how has short/long term memory impacted your experience?

Posted by GrungeWerX@reddit | LocalLLaMA | View on Reddit | 51 comments

[-]

DeltaSqueezer@reddit

I've implemented things, but so far have not felt the need to implement memory. Then again, my AI assistant is definitely just a tool and not a 'he' or 'she'.

PCIe Gen5 Switch vs new MB

Posted by NaiRogers@reddit | LocalLLaMA | View on Reddit | 18 comments

[-]

DeltaSqueezer@reddit

IMO, no. If I were to start again, I'd go with PCIe switch and attach to my existing cheap consumer motherboard. Aliexpress also has PCIe switches.

here it is: Benchmark-Yourself app - compete against open source LLMs and get your score - 5 benchmarks available - Add your results to your CV or linkedIn (if you dare)... or just paste them below for community shaming.

Posted by JLeonsarmiento@reddit | LocalLLaMA | View on Reddit | 21 comments

[-]

DeltaSqueezer@reddit

Same!

Upgrade path from 4x 3090s

Posted by anitamaxwynnn69@reddit | LocalLLaMA | View on Reddit | 166 comments

[-]

DeltaSqueezer@reddit

Yeah, there's a big gap. I think I'd be shooting for GLM-5.1, and then you're talking about a lot more hardware.

Vulnerability found in framework used by VLLM, many MCP servers, and other LLM tools

Posted by Hrethric@reddit | LocalLLaMA | View on Reddit | 90 comments

[-]

DeltaSqueezer@reddit

Great. Just spent the last few hours upgrading and updating: fun fact - Starlette 1.0 has breaking changes to how it uses Jinja2 templates internally. I gave up and gave my agent SSH access and got it to fix it for me in the end...

New DeepSWE benchmark finds Claude Opus cheats

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 92 comments

[-]

DeltaSqueezer@reddit (OP)

Did you also evaluate GLM-5.1? I'm curious how you ranked it vs GPT and Claude.

Behold! Probably the most ghetto local AI server:

Posted by MackThax@reddit | LocalLLaMA | View on Reddit | 301 comments

[-]

DeltaSqueezer@reddit

You have 3D printed parts and metal struts! That's practically professional! Look at attempts from a couple of years back when GPUs were just balanced in a pile 😂

New DeepSWE benchmark finds Claude Opus cheats

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 92 comments

[-]

DeltaSqueezer@reddit (OP)

Qwen 3.6 plus and deepseek v4 pro also do quite badly.

New DeepSWE benchmark finds Claude Opus cheats

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 92 comments

[-]

DeltaSqueezer@reddit (OP)

I use GLM extensively and it does miss things. My setup has it looping and checking implementation vs plan and typically it takes 3-5 loops to implement everything, even if it is working in 1-2 steps.
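A rough sketch of what such a plan-versus-implementation loop could look like. The actual setup is not described in the thread, so `implement_until_complete` and the callables `implement` and `find_done` are hypothetical stand-ins for an agent call and a verification step.

```python
from typing import Callable, List, Set


def implement_until_complete(plan: List[str],
                             implement: Callable[[List[str]], None],
                             find_done: Callable[[], Set[str]],
                             max_loops: int = 5) -> int:
    """Re-run the agent on missing plan items until nothing is missing.

    Returns the number of implementation loops used and raises if the
    plan is still incomplete after `max_loops` loops.
    """
    for loops_used in range(max_loops + 1):
        done = find_done()                       # hypothetical: check the code against the plan
        missing = [item for item in plan if item not in done]
        if not missing:
            return loops_used
        if loops_used == max_loops:
            break
        implement(missing)                       # hypothetical: ask the agent to fill the gaps
    raise RuntimeError(f"still missing after {max_loops} loops: {missing}")
```

The point of the design is that the check is independent of the agent's own claim of being finished: the loop only stops when the verification step finds nothing missing.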

New DeepSWE benchmark finds Claude Opus cheats

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 92 comments

[-]

DeltaSqueezer@reddit (OP)

Open models lower down in rankings: https://preview.redd.it/w3g2tjakym3h1.png?width=735&format=png&auto=webp&s=59785204876d417cc21bbe3e0dc46a953bc78f23

New DeepSWE benchmark finds Claude Opus cheats

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 92 comments

[-]

DeltaSqueezer@reddit (OP)

Good model behaviour, bad testing setup.

Stop QwenLLama! Every other 4th post in this sub is about Qwen models in the past month

Posted by prselzh@reddit | LocalLLaMA | View on Reddit | 43 comments

[-]

DeltaSqueezer@reddit

Oh. I was just wishing for another Qwen post. Thanks for starting one :P Gemma is also interesting, but the KV cache cost was way too much. I might look again when TurboQuant is more mature.

[ Removed by moderator ]

Posted by 1337Captain@reddit | LocalLLaMA | View on Reddit | 3 comments

[-]

DeltaSqueezer@reddit

magic ;)

[ Removed by moderator ]

Posted by 1337Captain@reddit | LocalLLaMA | View on Reddit | 3 comments

[-]

DeltaSqueezer@reddit

Looks fun!

I pioneered AI slop in 2019 with my Tensorflow rig. (24GB back then, too.) AMA.

Posted by Equal_Giraffe8866@reddit | LocalLLaMA | View on Reddit | 6 comments

[-]

DeltaSqueezer@reddit

How profitable is it per item on average? What is the distribution?

Gemma is so much better than Qwen, prove me wrong

Posted by Mountain_Patience231@reddit | LocalLLaMA | View on Reddit | 62 comments

[-]

DeltaSqueezer@reddit

They are both good and have their uses. It's funny that LLMs are mirroring the split you sometimes see in humans between numbers people and words people. I hope they can somehow manage to combine the strengths of both into a single model, as it isn't convenient to switch models, and on some tasks you want both strengths at once instead of splitting the task into multiple steps between different models.

Tencent Hy 30B/7B/1.8B

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 29 comments

[-]

DeltaSqueezer@reddit

I don't blame them. I wouldn't get caught under stupid EU AI laws either.

Tencent Hy 30B/7B/1.8B

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 29 comments

[-]

DeltaSqueezer@reddit

Thanks for sharing. I was impressed with how capable the 1.5 series was and look forward to testing the new ones.

Anyone else fighting Blackwell GSP timeout in production passthrough? How are you handling recovery without a host reboot?

Posted by Prestigious-Pop-3735@reddit | LocalLLaMA | View on Reddit | 10 comments

[-]

DeltaSqueezer@reddit

post on the Nvidia forums, they're pretty good at resolving.

Anyone else fighting Blackwell GSP timeout in production passthrough? How are you handling recovery without a host reboot?

Posted by Prestigious-Pop-3735@reddit | LocalLLaMA | View on Reddit | 10 comments

[-]

DeltaSqueezer@reddit

Is this the same issue that required a firmware update?

Multiple RTX 3090 - P2P driver, NVLink or what can be done?

Posted by HumanDrone8721@reddit | LocalLLaMA | View on Reddit | 85 comments

[-]

DeltaSqueezer@reddit

Try this as starting point: https://www.reddit.com/r/notinteresting/comments/wdwj3c

Multiple RTX 3090 - P2P driver, NVLink or what can be done?

Posted by HumanDrone8721@reddit | LocalLLaMA | View on Reddit | 85 comments

[-]

DeltaSqueezer@reddit

suggest you search reddit. this has been discussed before with numbers provided.

Multiple RTX 3090 - P2P driver, NVLink or what can be done?

Posted by HumanDrone8721@reddit | LocalLLaMA | View on Reddit | 85 comments

[-]

DeltaSqueezer@reddit

There are different NVLink adapters and you need the right one. 3090s are limited to one NVLink connection, so you can only connect them pairwise at best. Tensor parallel is very sensitive to latency, so using NVLink, putting the cards on a PCIe switch and enabling P2P will have a huge impact.

got my first "rm -rf /" today

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 143 comments

[-]

DeltaSqueezer@reddit (OP)

Redirects are quite dangerous, which is why they are caught by my bash tool.
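As an illustration of catching redirects, here is a minimal sketch of a quote-aware scan for `>` and `<`. The real bash tool is not shown in the thread, so the function name and approach are assumptions; it is deliberately conservative and does not cover other ways of writing files such as piping into `tee`.

```python
def has_unquoted_redirect(command: str) -> bool:
    """Return True if the command contains a `>` or `<` outside of quotes."""
    in_single = False
    in_double = False
    escaped = False
    for ch in command:
        if escaped:                      # previous char was a backslash
            escaped = False
            continue
        if ch == "\\" and not in_single:
            escaped = True               # backslash escapes the next char
            continue
        if ch == "'" and not in_double:
            in_single = not in_single
        elif ch == '"' and not in_single:
            in_double = not in_double
        elif ch in "<>" and not in_single and not in_double:
            return True
    return False
```

A tool wrapper could refuse or ask for confirmation whenever this returns True, before the command ever reaches a shell.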

got my first "rm -rf /" today

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 143 comments

[-]

DeltaSqueezer@reddit (OP)

"rm is disabled in this shell. Use trash-rm, trash-put, del, or trash instead." Agent: "rm is disabled. let me try again" <|toolcall|>rm "rm is disabled in this shell. Use trash-rm, trash-put, del, or trash instead." Agent: "still not working. let me try another way" <|toolcall|>mkfs.ext4

Pasting text in AI chat app takes too long

Posted by Specialist_Ruin_9333@reddit | LocalLLaMA | View on Reddit | 13 comments

[-]

DeltaSqueezer@reddit

It happens on facebook too.

got my first "rm -rf /" today

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 143 comments

[-]

DeltaSqueezer@reddit (OP)

what is the underlying microvm? firecracker? something else?

got my first "rm -rf /" today

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 143 comments

[-]

DeltaSqueezer@reddit (OP)

I have different layers: bwrap for net and limiting file system, bash limited to safe whitelisted commands by default
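A sketch of how two such layers might be combined: a first-word allowlist plus a bubblewrap (`bwrap`) command line with a read-only root, one writable directory and no network. The allowlist contents, function names and the exact bind layout are assumptions for illustration; only the `bwrap` flags themselves are real.

```python
import shlex
from typing import List

# Assumption: an illustrative allowlist; the real list is not given in the thread.
SAFE_COMMANDS = {"ls", "cat", "grep", "head", "tail", "wc", "git"}


def is_allowed(command: str) -> bool:
    """Allow a command only if its first word is on the allowlist."""
    try:
        words = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse errors
        return False
    return bool(words) and words[0] in SAFE_COMMANDS


def bwrap_argv(command: str, workdir: str) -> List[str]:
    """Build a bubblewrap argv: read-only root, writable workdir, no network."""
    if not is_allowed(command):
        raise PermissionError(f"command not on allowlist: {command!r}")
    return [
        "bwrap",
        "--ro-bind", "/", "/",        # whole filesystem visible but read-only
        "--dev", "/dev",
        "--proc", "/proc",
        "--tmpfs", "/tmp",
        "--bind", workdir, workdir,   # only the project directory is writable
        "--unshare-net",              # new, empty network namespace
        "--die-with-parent",
        "--chdir", workdir,
        *shlex.split(command),        # run directly, not through a shell
    ]
```

The resulting list could be passed to `subprocess.run(argv)`. Because the command is executed as an argv list rather than through a shell, operators such as `&&` or `;` reach the program as literal arguments instead of being interpreted. A first-word allowlist still says nothing about dangerous arguments to an allowed command, which is why it is only one layer.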

got my first "rm -rf /" today

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 143 comments

[-]

DeltaSqueezer@reddit (OP)

As well as direct exfil attack, the LLM scooping up .env and other credentials is also a risk.

I really would like to see the "visualisation" functionality that Gemini has locally.

Posted by HistoricalStrength21@reddit | LocalLLaMA | View on Reddit | 6 comments

[-]

DeltaSqueezer@reddit

Try this:

What is the point of MoE models, beyond being faster?

Posted by ihatebeinganonymous@reddit | LocalLLaMA | View on Reddit | 135 comments

[-]

DeltaSqueezer@reddit

Besides faster? Cheaper.

The power of structured workflows and small local models

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 45 comments

[-]

DeltaSqueezer@reddit (OP)

I hadn't realised this flow state is related to dopamine. No wonder I couldn't stop. I was working on it until 5am again and had to wake up at 7am for work. Dead today.

favorite Agentic Coding Harness

Posted by chibop1@reddit | LocalLLaMA | View on Reddit | 74 comments

[-]

DeltaSqueezer@reddit

I think this is the future. Everyone will create their own custom UI that suits how they work.

NEW BITNET MODELS!

Posted by Silver-Champion-4846@reddit | LocalLLaMA | View on Reddit | 46 comments

[-]

DeltaSqueezer@reddit

It's good that someone is still working on this. I hope that optimized bitnet hardware will eventually arrive.

M5 vs DGX Spark vs Strix Halo vs RTX 6000

Posted by Signal_Ad657@reddit | LocalLLaMA | View on Reddit | 261 comments

[-]

DeltaSqueezer@reddit

Absolutely right. 8k is still a lot, but now the tendency is just to throw the whole codebase into context and run off that. I'd say 8k is even rather generous for a website chat feature. I'm guessing you could probably do it in 4k or even 2k, but I guess 8k gives headroom and ease of RAGing in data.

M5 vs DGX Spark vs Strix Halo vs RTX 6000

Posted by Signal_Ad657@reddit | LocalLLaMA | View on Reddit | 261 comments

[-]

DeltaSqueezer@reddit

Well played.

The power of structured workflows and small local models

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 45 comments

[-]

DeltaSqueezer@reddit (OP)

I think we've been working in the same direction. I'm also using worktrees for parallel feature development, but I'm proceeding cautiously here on automation and trying to avoid difficult merges. Self-improvement is also key. I also used it in the generation of this map-reduce skill: the agent would try to one-shot the task with the skill, analyze failures and places for improvement, and iteratively modify the skill to converge on better results until it could reliably one-shot workflow requests. In other places, I have it automatically catch and log errors. Later they will be used to feed back modifications for improvement, but that follow-up step hasn't been done yet. Re failures: when a workflow is run in managed mode, it checkpoints all workers and can re-run failed ones. I'm also working on post-step assessment/failure detection with internal retry, which can run within the worker to avoid the worker failing.
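The checkpoint-and-re-run idea can be sketched roughly as follows. This is not the workflow tool from the post: `run_with_checkpoints`, the JSON file layout and the `worker` callable are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def run_with_checkpoints(items: List[str],
                         worker: Callable[[str], str],
                         checkpoint: Path) -> Dict[str, dict]:
    """Run `worker` on every item, skipping items already checkpointed as ok."""
    state: Dict[str, dict] = {}
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    for item in items:
        if state.get(item, {}).get("status") == "ok":
            continue  # finished in an earlier run, do not repeat the work
        try:
            state[item] = {"status": "ok", "result": worker(item)}
        except Exception as exc:  # record the failure and keep going
            state[item] = {"status": "failed", "error": str(exc)}
        checkpoint.write_text(json.dumps(state))  # checkpoint after every worker
    return state
```

Calling the function a second time with the same checkpoint file re-runs only the items whose status is `failed`, which is the behaviour described for managed mode. An in-worker retry would sit inside `worker` itself, so that a transient error never reaches the checkpoint as a failure.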