Which Agent to execute Tasks + TTS, SST?
Posted by Njee_@reddit | LocalLLaMA | View on Reddit | 2 comments
Okay hey everyone,
Question: Which tool allows me to interact with an Agent (preferably opencode or similar), upload files to filesystem (Agents directory), TTS+STT, From my Phone?!
I want to talk to an Agent while riding my bicycle.
In Theory not a problem at all and i feel like im missing something?!
For example using Claudecode, when im on my PC, its easy to start an mcp, maybe put some skill somewhere to allow ClaudeCode to interact with my Kanban board, which exists within Nextcloud, search the web with SearxNG. If i wanted some more flexibility I could even put my credentials somehwere and allow it to curl. For example to uploads files into a service directly from the filesystem. Not that i am doing it right now, because im at my PC anyways and doing it myself is faster than typing. But i would like to do that from remote, interactively.
Especially in terms of Claude Code or maybe even Opencode i Imagine the interaction would be nice as i can really see myself talking, making a plan for a certian task, have it research and then have a good foundation to basically just write a small todo and notes based on some research and planning.
I had a look at OpenwebUI, with Open Terminal integration. Good: Nice WebUI, works well on Phone. Uploading a file into filesystem is possible from phone. I have never setup tts+sst but i imagine thats doable too. But: im missing the Plan/Execute feel that i get from some agents as its just my model with terminal access and not one of these CLI tools.
Existing WebUIs for Opencode usually do not come with TTS+STT and also do not allow me to upload files into the filesystem from my phone.
Up to now i have not looked into HermesAgent, OpenClaw etc... But i already suspect that this is basically what im looking for? However, im not that much into using it via telegram etc. i feel like that cant be the point of going Local!? Also i dont know about the CLI experience?!
For Hardware: im thinking about runnign TTS+STT on a 3060 12gb, LLM on a 5090, preferably via vLLM. Ive been using Qwen27b nvfp4 with a couple of parallel requests possible and i do like the interaction via OpenCode.
Thanks in advance!
grace-turner3@reddit
for the mobile +agent +tts+stt combo check out openclaw with telegram integration despite your hesitation. it's actually just using telegram as a ui layer then the agent and models still run local on your hardware. telegram handles the mobile interface + built-in voice messages (which become your sst input). alternative: openwebui +piper tts (runs on cpu, lightweight) +faster-whisper for sst. both can run on the 3060 alongside your vllm setup on the 5090. you can expose openwebui via tailscale for secure mobile access without cloud dependencies.
for the agent part: hermesagent supports file operations and has better plan/execute flow than raw terminal access. worth testing if you want that claude code feel locally.
ai_guy_nerd@reddit
Getting a proper plan/execute loop on mobile is a nightmare because most mobile agents are just wrappers for a chat window. If the goal is to talk to an agent while cycling, the key is separating the brain from the interface.
One way to handle this is running a dedicated agent server on a VPS and using a native app for the control room. OpenClaw does this with an iOS app that lets you monitor and trigger agent runs without wrestling with a browser.
Another route is the Termux + SSH approach, but that is a lot of friction for a bike ride. A dedicated native bridge is usually the only way to get TTS/STT working reliably without the phone killing the process in the background.