Technical question: can you mask/hide parts of the KV cache for a request?

Posted by Noxusequal@reddit | LocalLLaMA

We have the following idea: have two LLM agents interact, but each one also has an internal monologue.

It would suck to reload the whole context for every LLM request.

So there are two questions:
1. Can you incrementally update the KV cache, i.e. only add the last line of dialog that was produced by the last prompt?
2. Can you hide parts of the KV cache, so that we don't have to reload it each time the agents take turns? We could just hide the part of the KV cache that holds the other agent's internal monologue.
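To make the two questions concrete, here is a toy sketch in Python. Assumptions: a single attention layer with a single head and hand-written key/value vectors; `ToyKVCache`, `append`, `visible_mask` and `attend` are hypothetical names, not the API of llama.cpp, vLLM or any other inference engine. Question 1 corresponds to `append`, which adds only the new tokens' keys/values instead of recomputing the whole context. Question 2 corresponds to the mask in `attend`: positions owned by the other agent get a score of -inf, so they receive zero attention weight.

```python
import math


class ToyKVCache:
    """Hypothetical single-layer, single-head KV cache, for illustration only.

    Every cached token stores a key vector, a value vector and an owner tag
    ("shared" for dialog, or an agent name for that agent's monologue).
    """

    def __init__(self):
        self.keys = []    # one key vector per cached token
        self.values = []  # one value vector per cached token
        self.owners = []  # owner tag per cached token

    def append(self, keys, values, owner):
        # Incremental update: only the new tokens are added; existing
        # entries are reused and nothing earlier is recomputed.
        for k, v in zip(keys, values):
            self.keys.append(k)
            self.values.append(v)
            self.owners.append(owner)

    def visible_mask(self, agent):
        # An agent may see the shared dialog and its own monologue only.
        return [o == "shared" or o == agent for o in self.owners]

    def attend(self, query, agent):
        # Scaled dot-product attention restricted to visible positions.
        mask = self.visible_mask(agent)
        if not any(mask):
            raise ValueError("no visible positions for this agent")
        scale = math.sqrt(len(query))
        scores = [
            sum(q * x for q, x in zip(query, k)) / scale if visible else float("-inf")
            for k, visible in zip(self.keys, mask)
        ]
        # Softmax; hidden positions have score -inf, so exp() gives exactly 0.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values)) for i in range(dim)]


cache = ToyKVCache()
cache.append([[1.0, 0.0]], [[10.0, 0.0]], owner="shared")    # a line of dialog
cache.append([[0.0, 1.0]], [[0.0, 99.0]], owner="agent_a")   # A's private monologue
cache.append([[0.0, 1.0]], [[0.0, -99.0]], owner="agent_b")  # B's private monologue

# Agent B's next query attends to the shared dialog and its own thoughts only.
out_b = cache.attend([0.0, 1.0], agent="agent_b")
```

One caveat: this toy is exact only because it has one layer and hand-written keys/values. In a real multi-layer transformer, a token's keys/values at deeper layers depend on everything that token attended to when it was computed, so masking only at read time does not undo the hidden span's influence on later tokens unless the same mask was also used when those tokens were written. The hidden span also leaves a gap in the position indices, which the model may not have seen during training.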

Do any implementations like this exist? If not, would it even be technically possible?