Live Demo

Text to Speech

Type any Swahili sentence and hear it spoken using our text-to-speech model, optimised for natural Swahili synthesis.



Model: SAUTI TTS, an end-to-end VITS model fine-tuned for natural Swahili synthesis.

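Architecture: End-to-end VITS: text → phonemes → latent acoustic representation → waveform. Synthesis runs in a single forward pass, and because the waveform decoder is built into the model, no separate vocoder is needed.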
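The first stage of that pipeline, turning text into phoneme-like units, can be illustrated with a small sketch. This is a hypothetical front end, not the one SAUTI TTS actually uses (the real model may rely on a dedicated phonemizer or on characters directly). It assumes only that Swahili spelling is close to phonemic, so a greedy longest-match split over a few multi-letter graphemes (`ch`, `sh`, `ny`, `ng'`, and so on) already yields phoneme-sized units. The names `MULTI_LETTER` and `swahili_units` are illustrative.

```python
# Hypothetical text -> phoneme-unit front end (illustrative only, not SAUTI's).
# Swahili orthography is close to phonemic, so multi-letter graphemes that
# spell a single sound are kept together; everything else is one letter.
MULTI_LETTER = ("ng'", "ch", "dh", "gh", "kh", "ny", "sh", "th")


def swahili_units(text: str) -> list[str]:
    """Split Swahili text into phoneme-sized units, with '|' between words."""
    s = text.lower().replace("\u2019", "'")  # normalise a curly apostrophe in ng’
    units: list[str] = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch.isspace():
            # Emit at most one word-boundary marker per run of whitespace.
            if units and units[-1] != "|":
                units.append("|")
            i += 1
            continue
        # Greedy longest match: try the multi-letter graphemes first.
        match = next((g for g in MULTI_LETTER if s.startswith(g, i)), None)
        if match is not None:
            units.append(match)
            i += len(match)
        elif ch.isalpha():
            units.append(ch)
            i += 1
        else:
            i += 1  # this sketch simply drops punctuation and digits
    if units and units[-1] == "|":
        units.pop()  # no trailing word boundary
    return units


print(swahili_units("Ng'ombe ni shule"))
```

A model front end would then map each unit to an integer ID before the text encoder; keeping `ng'` distinct from plain `n` + `g` matters because the two spell different sounds.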

Hosting: The first request may take ~30 s while the model cold-starts; after that, each synthesis takes ~400 ms.
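A client calling a host with this latency profile has to tolerate the slow first response. Below is a minimal sketch, assuming a hypothetical `request(timeout)` callable that performs one synthesis call and raises `TimeoutError` when it runs out of time; no real endpoint or client library is implied. The first attempt gets a timeout above the ~30 s cold start, and retries use a shorter timeout sized for a warm model.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_cold_start_retry(
    request: Callable[[float], T],  # hypothetical: one synthesis call with a timeout
    first_timeout_s: float = 45.0,  # above the ~30 s cold start
    warm_timeout_s: float = 5.0,    # generous next to ~400 ms warm latency
    attempts: int = 3,
    backoff_s: float = 2.0,
) -> T:
    """Retry `request`, giving only the first attempt a cold-start timeout."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        timeout = first_timeout_s if attempt == 0 else warm_timeout_s
        try:
            return request(timeout)
        except TimeoutError as err:
            last_error = err
            if attempt < attempts - 1:
                time.sleep(backoff_s * (attempt + 1))  # linear backoff between tries
    raise TimeoutError(f"no response after {attempts} attempts") from last_error
```

The design choice is that a single long first timeout absorbs the cold start, after which the model should be warm, so later failures are surfaced quickly instead of waiting another 30 s each.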