Which Agent to execute Tasks + TTS, SST?

Posted by Njee_@reddit | LocalLLaMA | View on Reddit | 2 comments

Okay hey everyone,

Question: Which tool allows me to interact with an Agent (preferably opencode or similar), upload files to filesystem (Agents directory), TTS+STT, From my Phone?!

I want to talk to an Agent while riding my bicycle.

In Theory not a problem at all and i feel like im missing something?!

For example using Claudecode, when im on my PC, its easy to start an mcp, maybe put some skill somewhere to allow ClaudeCode to interact with my Kanban board, which exists within Nextcloud, search the web with SearxNG. If i wanted some more flexibility I could even put my credentials somehwere and allow it to curl. For example to uploads files into a service directly from the filesystem. Not that i am doing it right now, because im at my PC anyways and doing it myself is faster than typing. But i would like to do that from remote, interactively.

Especially in terms of Claude Code or maybe even Opencode i Imagine the interaction would be nice as i can really see myself talking, making a plan for a certian task, have it research and then have a good foundation to basically just write a small todo and notes based on some research and planning.

I had a look at OpenwebUI, with Open Terminal integration. Good: Nice WebUI, works well on Phone. Uploading a file into filesystem is possible from phone. I have never setup tts+sst but i imagine thats doable too. But: im missing the Plan/Execute feel that i get from some agents as its just my model with terminal access and not one of these CLI tools.

Existing WebUIs for Opencode usually do not come with TTS+STT and also do not allow me to upload files into the filesystem from my phone.

Up to now i have not looked into HermesAgent, OpenClaw etc... But i already suspect that this is basically what im looking for? However, im not that much into using it via telegram etc. i feel like that cant be the point of going Local!? Also i dont know about the CLI experience?!

For Hardware: im thinking about runnign TTS+STT on a 3060 12gb, LLM on a 5090, preferably via vLLM. Ive been using Qwen27b nvfp4 with a couple of parallel requests possible and i do like the interaction via OpenCode.

Thanks in advance!