Voice Cloning

Beta

Zero-shot voice cloning from a short audio sample. Upload reference audio, get a personalised voice you can drive through the TTS API.

Speaker embedding extraction: 100%

Clone synthesis: 100%

API integration: 100%

Web demo: 100%

Persistence & sharing: 40%

What it does

Upload a short clip of any voice and hear an AI speak in that voice. SAUTI Voice Cloning extracts a speaker profile from your reference audio and uses it to synthesize new speech that sounds like the original speaker.

How it works

**Upload:** Provide 6 to 30 seconds of clear reference audio.
**Clone:** The system extracts a speaker profile using a neural encoder.
**Synthesize:** Use the cloned voice to speak any text in Swahili or English.
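Because the Upload step asks for a clip of a specific length, it can help to check the duration locally before sending anything. The following is a minimal sketch using Python's standard `wave` module. It assumes the reference clip is an uncompressed WAV file and that the 6 and 30 second bounds are inclusive; neither detail is stated on this page, and the helper names are illustrative only.

```python
import io
import wave

MIN_SECONDS = 6.0   # lower bound quoted in the Upload step
MAX_SECONDS = 30.0  # upper bound quoted in the Upload step


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV clip given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(source) -> bool:
    """True when the clip length falls inside the 6 to 30 second window.

    Treating both bounds as inclusive is an assumption of this sketch.
    """
    return MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for seconds in (2, 10, 45):
        clip = make_silent_wav(seconds)
        print(seconds, "seconds ->", is_valid_reference(clip))
```

A check like this only covers length. Whether a clip counts as "clear" (low noise, a single speaker) is a separate question that this sketch does not address.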

Use cases

**Brand voices:** Create a consistent AI voice identity for your brand.
**Accessibility:** Preserve voices for individuals with speech conditions.
**Personalization:** Let users hear AI responses in their own voice or a familiar voice.
**Content creation:** Generate voiceovers in specific voices for media production.

Current status

Live in beta. The `POST /v1/voice-clone/` endpoint accepts a reference clip and returns a `voice_id` that can be passed straight to the TTS endpoint. Try it in the [Voice Clone playground](/voice-clone).
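The two-step flow described above (clone, then synthesize) might look roughly like the sketch below, which builds the HTTP requests with Python's standard library without sending them. Only the `/v1/voice-clone/` path and the `voice_id` response field come from this page. The host, the bearer-token header, the multipart field name `audio`, the TTS path `/v1/tts/`, and the JSON field `text` are all hypothetical placeholders; check the API reference for the real values.

```python
import json
import uuid
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical placeholder host
CLONE_PATH = "/v1/voice-clone/"       # path named on this page
TTS_PATH = "/v1/tts/"                 # hypothetical: the TTS path is not given here


def build_clone_request(audio_bytes: bytes, filename: str, api_key: str):
    """Build (without sending) a multipart POST carrying the reference clip.

    The form field name "audio" and the bearer auth scheme are assumptions.
    """
    boundary = uuid.uuid4().hex
    disposition = (
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
    )
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        disposition.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        BASE_URL + CLONE_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def extract_voice_id(response_body: bytes) -> str:
    """Read the voice_id field from a JSON response body."""
    return json.loads(response_body)["voice_id"]


def build_tts_request(text: str, voice_id: str, api_key: str):
    """Build (without sending) a JSON POST asking for speech in the cloned voice.

    The path and the "text" field name are assumptions of this sketch.
    """
    payload = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + TTS_PATH,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

To actually send a request, pass it to `urllib.request.urlopen(request)`, read the response body, and hand the bytes to `extract_voice_id` before building the TTS request.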