shbong

I burned a weekend making the models "remember" me. The fix had nothing to do with trying to run bigger models locally

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 19 comments

shbong@reddit (OP)

Yes, I agree graphs are complex af BUT for certain use cases in my opinion there is not trade-off, we must find a way to make graphs work, they are such a powerful structure. For the entity deduplication and graph creation I approached differently, I harnessed the "pre-update" phase, by running a swarm of agents that read and discuss together the new data that needs to be appended considering and reading the current graph state while discussing, I found this way being better for the end graph structure

How much VRAM needed for Qwen 3.6 27B Q8 with 262K context?

Posted by My_Unbiased_Opinion@reddit | LocalLLaMA | View on Reddit | 79 comments

I burned a weekend making the models "remember" me. The fix had nothing to do with trying to run bigger models locally

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 19 comments

I burned a weekend making the models "remember" me. The fix had nothing to do with trying to run bigger models locally

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 19 comments

shbong@reddit (OP)

https://preview.redd.it/je7akklq925h1.jpeg?width=736&format=pjpg&auto=webp&s=1b0a780973e82e19b6ec2e4a255b5528192e228e

How much VRAM needed for Qwen 3.6 27B Q8 with 262K context?

Posted by My_Unbiased_Opinion@reddit | LocalLLaMA | View on Reddit | 79 comments

shbong@reddit

Like in what circumstances? Also if someone else has tried did you noticed a degradation in performances (the model being accurate and smart) about the things you are asking with such big context windows?

I burned a weekend making the models "remember" me. The fix had nothing to do with trying to run bigger models locally

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 19 comments

What memory system are you using for your agents?

Posted by Mr_Moonsilver@reddit | LocalLLaMA | View on Reddit | 55 comments

shbong@reddit

I built my own memory system not because others weren't great, but because I've been developing agents that required memory for a couple of years. Initially, I had to integrate memory directly into my projects. Later, I extracted this work into a separate "memory-like" project that eventually became BrainAPI. The advantage of having my own system is that I can continuously develop projects on top of it, allowing me to edit and extend its capabilities as needed, without the limitations of other memory systems. For instance, to enhance adaptability, I developed a plugin ecosystem that can modify the core engine's functionalities

I burned a weekend making the models "remember" me. The fix had nothing to do with trying to run bigger models locally

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 19 comments

How much VRAM needed for Qwen 3.6 27B Q8 with 262K context?

Posted by My_Unbiased_Opinion@reddit | LocalLLaMA | View on Reddit | 79 comments

shbong@reddit

Is it really worth having a context of 262K tokens? Considering the attention span of LLMs having such big context window can be counter productive as said in the "Lost in the Middle" paper

I burned a weekend making the models "remember" me. The fix had nothing to do with trying to run bigger models locally

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 19 comments

Stop asking what model to run. There are literally only two.

Posted by Wrong_Mushroom_7350@reddit | LocalLLaMA | View on Reddit | 549 comments

Anyone else experimenting with memory for LLMs?

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 41 comments

shbong@reddit (OP)

I've been developing a project that has evolved into an ecosystem now that it supports plugins. Recently, I created two plugins: a chatbot inference plugin and a chatbot memory plugin. The inference plugin exposes two API endpoints to serve a Large Language Model (LLM) with either streaming or simple API responses. The memory plugin enhances the messages sent to the LLM by adding extra context. These two plugins can function together when installed in the same BrainAPI instance. The chatbot plugin allows the LLM to directly use MCP tools to navigate a knowledge graph, effectively creating a fully knowledge-aware agent straight out of the box using just the command-line interface. The project can be found at https://github.com/Lumen-Labs/brainapi2. The setup I used, consisting of PostgreSQL, pgvector, and NetworkX, works exceptionally well

Calling it now Microsoft is buying Unsloth.

Posted by Wrong_Mushroom_7350@reddit | LocalLLaMA | View on Reddit | 289 comments

Gemma4 27b vs GPT-OSS 20b -- Has anyone compared them ?

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 19 comments

Gemma4 27b vs GPT-OSS 20b -- Has anyone compared them ?

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 19 comments

Gemma4 27b vs GPT-OSS 20b -- Has anyone compared them ?

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 19 comments

A local agent (that works with local models) that is easy to set up.

Posted by Valuable-Run2129@reddit | LocalLLaMA | View on Reddit | 3 comments

shbong@reddit

I still have to try Gemma4 but from what I've seen so far local small models tend to struggle with tool usage and structured output (for eg. often if requested to output json misses one parenthesis or quotes, breaking the whole output)

4B models on smartphone

Posted by Sudden_Vegetable6844@reddit | LocalLLaMA | View on Reddit | 9 comments

shbong@reddit

BUT just the fact that we now are in the position of discussing the quality and being able to run LLMs locally in our smartphone is a something big

If you haven't yet given Gemma 4 a go...do it today

Posted by No-Anchovies@reddit | LocalLLaMA | View on Reddit | 206 comments

shbong@reddit

Yeah I saw the benchmarks and also that on some benchmarks it's better than sonnet 4.. and it runs on your local machine..

Why retrieval breaks once documents stop being static

Posted by EnoughNinja@reddit | LocalLLaMA | View on Reddit | 1 comments

I'm shocked (Gemma 4 results)

Posted by Potential-Gold5298@reddit | LocalLLaMA | View on Reddit | 78 comments

shbong@reddit

This is going to change the "local ai" space, such powerful model that can run easily locally with quantization, beating bigger models like sonnet 4..

M1 Max vs M4 Max vs M5 Max

Posted by br_web@reddit | LocalLLaMA | View on Reddit | 4 comments

which macbook configuration to buy

Posted by Ayuzh@reddit | LocalLLaMA | View on Reddit | 11 comments

Thoughts on the almost near release Avocado?

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 6 comments

shbong@reddit (OP)

yes they had planned to release it on march but since it didn't met their expectations they moved the launch to may so they can improve it meanwhile

Built a graph-based "memory layer" for agents - Qwen > LLaMA for us, GPT-OSS 20B fast but tooling issues

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 4 comments

shbong@reddit (OP)

I would love to, I have a discord channel dedicated to memory, rag and this kind of stuff, maybe you can jump in there?

Built a graph-based "memory layer" for agents - Qwen > LLaMA for us, GPT-OSS 20B fast but tooling issues

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 4 comments

shbong@reddit (OP)

27B? Wow, your GPU must be on fire! I've thought about buying some GPUs many times but I'm still relying on my trusty MacBook

OpenAI pivot investors love

Posted by PaceImaginary8610@reddit | LocalLLaMA | View on Reddit | 124 comments

I feel personally attacked

Posted by HeadAcanthisitta7390@reddit | LocalLLaMA | View on Reddit | 219 comments

LLMs finally remembering: I’ve built the memory layer, now it’s time to explore

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 4 comments

shbong@reddit (OP)

You don't see probably the "memory layer" in the article because the article talks about a tutorial and the part about integrating memory layer is just few lines and does not require to lo learn more flaky abstraction

LLMs finally remembering: I’ve built the memory layer, now it’s time to explore

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 4 comments

shbong@reddit (OP)

I mean if we should follow this philosophy we will not have most of the technology that we have, we will not have LLMs too

Anyone else experimenting with memory for LLMs?

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 41 comments

Anyone else experimenting with memory for LLMs?

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 41 comments

Anyone else experimenting with memory for LLMs?

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 41 comments

shbong@reddit (OP)

I didn't meant to say that it's a trivial task on the contrary I want to underline that it's a really complex task, a real challenge nowdays that can bring so much more quality to AI agents or chatbots in general

Anyone else experimenting with memory for LLMs?

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 41 comments

Anyone else experimenting with memory for LLMs?

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 41 comments

Anyone else experimenting with memory for LLMs?

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 41 comments

shbong@reddit (OP)

I had a theory but still not had been able to apply it due to speed constraints, in short: \- resolve coreferences in the texts you are about to process/save \- extract triplets embedding the phrase, subject and object \- create the nodes on the graph \- while searching do a vector search on both nodes and entire phrases and retrieve 1 level depth \- let the llm explore paths on the graph with a tool

Anyone else experimenting with memory for LLMs?

Posted by shbong@reddit | LocalLLaMA | View on Reddit | 41 comments

8x RTX 3090 open rig

Posted by Armym@reddit | LocalLLaMA | View on Reddit | 391 comments