Looking for software that processes images in real time (or periodically).

Posted by My_Unbiased_Opinion@reddit | LocalLLaMA

Are there any projects out there that allow a multimodal LLM to process a window in real time? Basically, I'm trying to have a GUI watch a window, take a screenshot periodically, send it to Ollama to be processed with a system prompt, and spit out an output, all hands-free. I've been looking at some OSS projects but haven't seen anything (or else I'm not looking in the right places). Thanks for y'all's help.
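For anyone who wants to picture the loop being described, here is a rough sketch of it, not an existing project. It assumes Ollama is running locally on its default port, that a vision-capable model is already pulled (`llava` is just a placeholder), and that Pillow is installed for the capture. The function names are made up for illustration. It also grabs a fixed screen region rather than tracking a specific window, since looking up a window's coordinates is platform-specific.

```python
import base64
import io
import json
import time
import urllib.request

# Default local Ollama endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(png_bytes, prompt, system, model="llava"):
    """Build the JSON body for /api/generate with one base64-encoded image."""
    return {
        "model": model,  # assumption: any multimodal model you have pulled
        "system": system,
        "prompt": prompt,
        "images": [base64.b64encode(png_bytes).decode("ascii")],
        "stream": False,  # ask for one complete JSON reply instead of chunks
    }


def grab_region(bbox):
    """Capture a screen region (left, top, right, bottom) as PNG bytes."""
    from PIL import ImageGrab  # imported here so the rest works without Pillow

    buf = io.BytesIO()
    ImageGrab.grab(bbox=bbox).save(buf, format="PNG")
    return buf.getvalue()


def ask_ollama(payload, url=OLLAMA_URL, timeout=120):
    """POST the payload to Ollama and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


def watch(bbox, prompt, system, interval_s=10.0, model="llava"):
    """Screenshot the region every interval_s seconds and print the model's reply."""
    while True:
        started = time.monotonic()
        try:
            shot = grab_region(bbox)
            print(ask_ollama(build_payload(shot, prompt, system, model)))
        except OSError as exc:  # covers connection failures and capture errors
            print(f"skipped frame: {exc}")
        # Sleep only for what is left of the interval after inference time.
        time.sleep(max(0.0, interval_s - (time.monotonic() - started)))
```

Calling something like `watch((0, 0, 1280, 720), "What changed on screen?", "You are a terse screen monitor.")` would run it until interrupted. On most local hardware, inference time will probably set the real floor on the interval, so "periodically" is likely more realistic than true real time.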