How does the system prompt actually work? does it differ per provider and per model? Also how does it impact prompt caching?

Posted by haodocowsfly@reddit | LocalLLaMA | 12 comments

So I’m reading: https://developers.openai.com/cookbook/examples/prompt_caching_201 and https://platform.claude.com/docs/en/build-with-claude/prompt-caching, and they say the cache prefix should be ordered for stability: tools > system prompt > message content.

I’m a bit confused about the system prompt part. From what I remember of Gemma when I briefly played around with it, the format should be:

```
[message history] (stripped of system prompt)

and then in the next message:
system: [attached system prompt]
user: (new message)
```

Doesn’t that mean the most important part of the cache is “message history content” and not the tools/system prompt? Or are there other strategies for the system prompt?
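To make the ordering question concrete, here's a minimal sketch of why system-prompt-first serialization matters for prefix caching. The template format below is made up for illustration (no provider actually uses these tags): the point is that if the system prompt sits at the front, each new turn only appends to the end of the rendered prompt, so the cached prefix from the previous turn stays valid.

```python
# Hypothetical system-first chat template (tags are illustrative, not a real format).
def render(system: str, messages: list[dict]) -> str:
    # System prompt is serialized first, so it's part of the stable prefix.
    out = f"<system>{system}</system>\n"
    for m in messages:
        out += f"<{m['role']}>{m['content']}</{m['role']}>\n"
    return out

def shared_prefix_len(a: str, b: str) -> int:
    # Length of the longest common prefix -- a stand-in for "cache hit" size.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

history = [{"role": "user", "content": "hi"},
           {"role": "assistant", "content": "hello!"}]
turn1 = render("You are helpful.", history)
turn2 = render("You are helpful.", history + [{"role": "user", "content": "more"}])

# turn1 is entirely a prefix of turn2 -> the whole previous prompt can be cached.
assert turn2.startswith(turn1)
```

If instead the system prompt were re-attached just before the newest user message (as in the format above), the divergence point would move to wherever the system prompt was last injected, and everything after it would miss the cache.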

I’m trying to figure this out because I noticed this:

https://haowjy.github.io/blog/75-percent-redundant-reads (sorry for some of the AI slop, especially at the bottom; I haven’t had time to clean up my theory/experiment sections yet).

The main technique I’m trying to figure out: can we ditch most “tool results” from the message history and instead inject them into the system prompt dynamically, as a sort of exact “working memory” for the most recent tools (especially reads)? The system prompt would then always hold the most up-to-date contents, and the message history wouldn’t get polluted with constant re-reads.
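A minimal sketch of that working-memory idea (all names here are hypothetical, not from the linked post): keep only the latest read of each file in a block destined for the system prompt, so a re-read replaces the stale copy instead of appending a duplicate to the history.

```python
# Sketch of a "working memory" for tool reads (hypothetical helper, illustrative tags).
def build_working_memory(reads: dict[str, str]) -> str:
    # reads maps path -> most recent contents; older reads were overwritten.
    blocks = [f"<file path={path}>\n{body}\n</file>" for path, body in sorted(reads.items())]
    return "WORKING MEMORY (latest tool reads):\n" + "\n".join(blocks)

reads = {}
reads["a.txt"] = "v1"
reads["a.txt"] = "v2"      # re-read overwrites the stale copy instead of duplicating it
reads["b.txt"] = "hello"

memory = build_working_memory(reads)
assert "v1" not in memory  # stale contents are gone
assert "v2" in memory and "hello" in memory
```

The obvious trade-off, and what the question above is really about: since this block changes whenever working memory updates, putting it in the system prompt would invalidate the cache for everything after the tools block under a tools > system > messages ordering, whereas stale reads left in the message history cache perfectly.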