How to get realtime logging of LLM activity?

Posted by dtdisapointingresult@reddit | LocalLLaMA | 17 comments

(yes this is a long post and I used some markdown formatting, like I always did to organize my comments long before the invention of LLMs. [For example in 2021.](https://reddit.com/r/selfhosted/comments/rcvih1/you_should_know_about_using_zerotier_or_tailscale/). I'm gonna block any tidepod-eating zoomer who calls me a bot, like what happened on my last long post)

I'm a local LLM user, and I want to achieve 2 things:

1. **Realtime** monitoring of LLM's activity: I need a real-time view, whether by web UI or bash command, where I see the token stream. What I really want here is to spy on what the LLM is doing when I'm using some random AI app. Many apps hide what's going on because it's more "user friendly." That's usually fine for fast & intelligent cloud models, not so much when you're using a small, almost-regarded local model like Qwen that is completely off-base half the time. I don't want to wait 5 minutes for a bad response to finish, I'm a micro-manager and want to interrupt early when I see the LLM is on the wrong path.
2. History of all prompts + responses: if I have the realtime monitoring, might as well log the data so I can analyze it for educational purposes later on.

Various options I thought about:

1. Using an established logging engine like LiteLLM. I haven't been able to find one with realtime monitoring. They wait for the response to finish before it's saved + observable.
2. Vibecode my own realtime monitoring/logging proxy to put in front of vllm/llama-server, + web dashboard
3. Use someone else's version of #2. These are impossible to find. I know they exist because I've seen a couple linked in random comments over the years, but they're not showing up in search.

I'd appreciate any advice here.
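For reference, option 2 (a logging proxy in front of vllm/llama-server) can be sketched roughly like this with only the Python standard library. This is a minimal sketch, not a finished tool, and it rests on several assumptions: the backend exposes an OpenAI-compatible streaming endpoint (SSE lines of the form `data: {...}`), it listens on `127.0.0.1:8080`, and the app is pointed at port `8081` instead. The names `extract_delta`, `TeeProxy`, `serve`, and the log file `llm_history.jsonl` are made up for illustration. Only streamed responses are decoded; non-streaming ones are passed through and logged with an empty response.

```python
import json
import time
import http.client
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

UPSTREAM_HOST = "127.0.0.1"      # assumption: where llama-server / vLLM listens
UPSTREAM_PORT = 8080             # assumption
LISTEN_PORT = 8081               # assumption: point the AI app at this port
LOG_PATH = "llm_history.jsonl"   # hypothetical log file name


def extract_delta(line: bytes) -> str:
    """Return the text fragment carried by one SSE line, or "" if there is none."""
    line = line.strip()
    if not line.startswith(b"data:"):
        return ""
    payload = line[len(b"data:"):].strip()
    if payload == b"[DONE]":
        return ""
    try:
        choices = json.loads(payload).get("choices") or []
    except (ValueError, AttributeError):
        return ""
    if not choices:
        return ""
    choice = choices[0]
    # Chat completions stream "delta.content"; plain completions stream "text".
    return (choice.get("delta") or {}).get("content") or choice.get("text") or ""


class TeeProxy(BaseHTTPRequestHandler):
    """Forwards each request upstream and tees the streamed tokens to stdout."""

    def _forward(self, method: str) -> None:
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length) if length else None
        headers = {k: v for k, v in self.headers.items()
                   if k.lower() in ("content-type", "authorization", "accept")}
        upstream = http.client.HTTPConnection(UPSTREAM_HOST, UPSTREAM_PORT, timeout=600)
        pieces = []
        try:
            upstream.request(method, self.path, body=body, headers=headers)
            resp = upstream.getresponse()
            # The handler speaks HTTP/1.0 by default, so the client simply
            # reads until the connection closes; no chunked encoding needed.
            self.send_response(resp.status)
            self.send_header("Content-Type",
                             resp.getheader("Content-Type", "application/json"))
            self.end_headers()
            for line in resp:  # HTTPResponse yields the decoded body line by line
                self.wfile.write(line)
                self.wfile.flush()
                text = extract_delta(line)
                if text:
                    pieces.append(text)
                    print(text, end="", flush=True)  # the realtime view
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client disconnected mid-stream
        finally:
            # Closing the upstream socket; whether that cancels generation
            # depends on the backend server.
            upstream.close()
            if body:
                record = {"time": time.time(), "path": self.path,
                          "request": body.decode("utf-8", "replace"),
                          "response": "".join(pieces)}
                with open(LOG_PATH, "a", encoding="utf-8") as f:
                    f.write(json.dumps(record) + "\n")

    def do_GET(self):
        self._forward("GET")

    def do_POST(self):
        self._forward("POST")


def serve() -> None:
    """Run the proxy until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", LISTEN_PORT), TeeProxy).serve_forever()
```

Calling `serve()` starts the proxy; the terminal then shows tokens as they arrive, and each finished exchange is appended to the JSONL file as one line. This only helps if the app actually requests streaming (`"stream": true`) and lets you change its base URL. A web dashboard would have to be built on top of this.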