The best audio processing library built on Apple's MLX framework, providing fast and efficient text-to-speech (TTS), speech-to-text (STT), and speech-to-speech (STS) on Apple Silicon. Kokoro Fast, ...
Abstract: Recent advances in deep learning technology have enabled high-quality speech synthesis, and text-to-speech models are widely used in a variety of applications. However, even state-of-the-art ...
Abstract: Given the scarcity of Code-Switching (CS) datasets, most researchers synthesize CS speech using multiple monolingual datasets. However, this approach presents challenges in synthesizing CS ...
During the weeks of India AI Impact Summit 2026, Sarvam AI announced a series of launches, updates and partnerships. This article lists updates ranging from speech-to-text and text-to-speech ...
Finally, the code for the web UI client used in the Moshi demo is provided in the client/ directory. If you want to fine tune Moshi, head out to kyutai-labs/moshi ...
The landscape of generative audio is shifting toward efficiency. A new open-source contender, Kani-TTS-2, has been released by the team at nineninesix.ai. This model marks a departure from heavy, ...