Ngram TTS model?

Posted by Silver-Champion-4846@reddit | LocalLLaMA | 4 comments

Hey there, guys. Question: is it possible to make an LLM-based TTS model that stores language-specific patterns as n-gram lookup tables? While that might not be needed for a bulky 7B TTS model, my use case requires a model that runs with <50 ms of latency on CPU while also adequately supporting a challenging language like Arabic. Would a Gemma 4 design be possible to adapt for TTS? Maybe the PLEs (per-layer embeddings) could store language-specific data, letting it perform like a 500M model while being maybe 100M or less matmul-wise? Thanks.
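
To make the lookup-table idea concrete, here is a rough sketch of what I mean by an n-gram table: each character position hashes its trailing n-gram into a fixed-size table and fetches a stored vector, so the language-specific information costs a memory read rather than a matmul. This is only an illustration under my own assumptions, not how Gemma's per-layer embeddings actually work; the names `NGRAM_TABLE`, `ngram_bucket`, and `ngram_features`, as well as the sizes, are hypothetical, and the random rows stand in for values that would be learned.

```python
import random
import zlib

N = 3            # n-gram order (hypothetical choice)
BUCKETS = 4096   # rows in the hashed lookup table (hypothetical)
DIM = 8          # width of each stored vector (hypothetical)

rng = random.Random(0)
# In a trained model these rows would be learned; random values stand in here.
NGRAM_TABLE = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(BUCKETS)]


def ngram_bucket(ngram: str) -> int:
    """Map an n-gram to a table row using a stable hash (CRC32)."""
    return zlib.crc32(ngram.encode("utf-8")) % BUCKETS


def ngram_features(text: str) -> list[list[float]]:
    """Return one looked-up vector per character, keyed by the n-gram ending there."""
    # Left-pad so the first characters still have a full-length n-gram.
    padded = "\0" * (N - 1) + text
    return [NGRAM_TABLE[ngram_bucket(padded[i:i + N])] for i in range(len(text))]


if __name__ == "__main__":
    feats = ngram_features("كتاب")
    print(len(feats), len(feats[0]))
```

The vectors returned here would be added to (or concatenated with) whatever the small network computes at each position. Growing `BUCKETS` adds parameters and memory, but the per-character cost stays one hash and one row read; hash collisions between different n-grams are the price of a fixed-size table.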