OpenMOSS-Team/MOSS-TTS-v1.5 · Hugging Face

Posted by pmttyji@reddit | LocalLLaMA

MOSS-TTS-v1.5

MOSS-TTS-v1.5 is a continuation of MOSS-TTS 1.0. It preserves the main 1.0 capabilities, including zero-shot voice cloning, long-form speech generation, token-level duration control, Pinyin/IPA pronunciation control, multilingual synthesis, and code-switching. For the full 1.0 feature walkthrough, input schema, decoding hyperparameters, and evaluation tables, see the MOSS-TTS 1.0 README.

Compared with MOSS-TTS 1.0, v1.5 focuses on the following improvements:

Supported Languages

MOSS-TTS-v1.5 currently supports 31 languages. It keeps the 20 languages supported by MOSS-TTS 1.0 and extends multilingual continued training to 11 additional languages: Cantonese, Dutch, Finnish, Hindi, Macedonian, Malay, Romanian, Swahili, Tagalog, Thai, and Vietnamese.

They released an additional model as well:

https://huggingface.co/OpenMOSS-Team/MOSS-SoundEffect-v2.0