PersonaPlex 7B on Apple Silicon: massive memory leak in full-duplex mode. Has anyone gotten this working?

Posted by Excellent_Koala769@reddit | LocalLLaMA

I've been trying to run NVIDIA's PersonaPlex 7B (the full-duplex speech-to-speech model based on Moshi) locally on an M5 Max with 128GB unified memory. The goal is simple: a real-time voice chat demo where you talk to it like a phone call.

What I've tried:

1. speech-swift with the 8-bit MLX model (PersonaPlexDemo plus a custom WebSocket server)

2. NVIDIA's official PyTorch server
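Neither attempt is more useful to others without a number on the growth. Here is a minimal sketch for getting one. It is not code from speech-swift or NVIDIA's server. It assumes only a callable that returns the process's current allocation in bytes. The names `sample_memory`, `growth_mb_per_min`, and `read_bytes` are hypothetical helpers for illustration.

```python
import time
from typing import Callable, List, Tuple

# (elapsed seconds, allocated bytes)
Sample = Tuple[float, int]


def sample_memory(read_bytes: Callable[[], int], seconds: float,
                  interval: float = 1.0) -> List[Sample]:
    """Poll the hypothetical read_bytes() every `interval` s for `seconds` s."""
    samples: List[Sample] = []
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        samples.append((elapsed, read_bytes()))
        if elapsed >= seconds:
            break
        time.sleep(interval)
    return samples


def growth_mb_per_min(samples: List[Sample]) -> float:
    """Least-squares slope of memory vs. time, converted to MB per minute."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one instant")
    bytes_per_sec = cov / var
    return bytes_per_sec * 60 / 1e6
```

Inside the PyTorch server process, `torch.mps.current_allocated_memory` is one candidate for `read_bytes`. The sampler blocks, so it would need to run on a background thread. A steady positive slope over a long call points to a real leak rather than a one-time warm-up allocation.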

System specs: M5 Max, 128GB unified memory, macOS 26.4, Swift 6.3, latest MLX

What I'm looking for:

Happy to share the exact code and patches I tried if anyone wants to dig in.