Just saw the Anthropic "emotion concepts" post. Do local model runners support arbitrary probes like that?

Posted by willrshansen@reddit | LocalLLaMA

This post: https://www.anthropic.com/research/emotion-concepts-function

The way they generate the "emotion vectors" seems like it would be entirely viable to run locally, and also applicable to arbitrary concepts like "blue", "five", or "cars".
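To make the idea concrete, here is a minimal sketch of one common way to build a concept direction: take the difference between the mean activation on examples that contain the concept and the mean on examples that don't, then score tokens by projecting onto that direction. This is a generic difference-of-means probe, not necessarily the method from the linked post. The activations below are synthetic stand-ins (random numbers with the concept planted in one dimension), and every name here (`concept_vector`, `concept_scores`, `fake_activation`) is made up for illustration. With a real model you would replace the fake data with hidden states captured from a chosen layer.

```python
import math
import random


def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def concept_vector(pos_acts, neg_acts):
    """Unit-length difference between the two class means."""
    diff = [p - q for p, q in zip(mean_vector(pos_acts), mean_vector(neg_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def concept_scores(token_acts, vector):
    """Dot product of each token's activation with the concept direction."""
    return [sum(a * v for a, v in zip(act, vector)) for act in token_acts]


# Synthetic stand-in for hidden states (assumption: no real model involved).
random.seed(0)
HIDDEN = 16        # hypothetical hidden size
CONCEPT_DIM = 3    # dimension where the fake "concept" is planted
SHIFT = 10.0       # strength of the planted signal


def fake_activation(has_concept):
    act = [random.gauss(0.0, 1.0) for _ in range(HIDDEN)]
    if has_concept:
        act[CONCEPT_DIM] += SHIFT
    return act


pos_acts = [fake_activation(True) for _ in range(100)]
neg_acts = [fake_activation(False) for _ in range(100)]
vector = concept_vector(pos_acts, neg_acts)

# Five fake "tokens"; only the one at index 2 carries the concept.
token_acts = [fake_activation(i == 2) for i in range(5)]
scores = concept_scores(token_acts, vector)
print([round(s, 2) for s in scores])
```

The token at index 2 should get by far the largest score, since it is the only one with the planted signal.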

I think it would be really neat to highlight input or output tokens based on concept activation, or to graph concept activation against small variations of the prompt.
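As a rough illustration of the highlighting idea, the sketch below marks tokens whose concept score is high relative to the rest of the text. The tokens and scores are made-up numbers, and `highlight` is a hypothetical helper, not a feature of any existing runner. A real UI would presumably use color intensity rather than brackets.

```python
def highlight(tokens, scores, threshold=0.5):
    """Wrap tokens whose min-max-normalized score is at or above threshold in [[...]]."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores are equal
    out = []
    for token, score in zip(tokens, scores):
        strength = (score - lo) / span
        out.append(f"[[{token}]]" if strength >= threshold else token)
    return " ".join(out)


tokens = ["the", "sky", "is", "blue", "today"]
scores = [0.1, 1.2, 0.0, 4.0, 0.3]  # made-up per-token scores for the concept "blue"
marked = highlight(tokens, scores)
print(marked)  # prints: the sky is [[blue]] today
```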

Are there local model runners that can already do that?