Voice Cloning
Beta
Zero-shot voice cloning from a short audio sample. Upload reference audio, get a personalised voice you can drive through the TTS API.
- Speaker embedding extraction: 100%
- Clone synthesis: 100%
- API integration: 100%
- Web demo: 100%
- Persistence & sharing: 40%
What it does
Upload a short clip of any voice and hear an AI speak in that voice. SAUTI Voice Cloning extracts a speaker profile from your reference audio and uses it to synthesize new speech that sounds like the original speaker.
How it works
- **Upload:** Provide 6-30 seconds of clear reference audio.
- **Clone:** The system extracts a speaker profile using a neural encoder.
- **Synthesize:** Use the cloned voice to speak any text in Swahili or English.
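The 6-30 second requirement in the upload step can be checked on the client before any audio is sent. The following is a minimal sketch, not part of SAUTI itself: it assumes the reference clip is an uncompressed WAV file (the accepted formats are not stated here) and that both bounds are inclusive. The function and constant names are hypothetical.

```python
import wave

# Bounds taken from the upload step above; treating them as inclusive is an assumption.
MIN_SECONDS = 6.0
MAX_SECONDS = 30.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference_length(seconds: float) -> bool:
    """True if the clip length falls inside the documented 6-30 second window."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

Calling `is_valid_reference_length(clip_duration_seconds("reference.wav"))` before uploading lets a client reject clips that are too short or too long without a round trip. This checks length only; it says nothing about whether the audio is clear.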
Use cases
- **Brand voices:** Create a consistent AI voice identity for your brand.
- **Accessibility:** Preserve voices for individuals with speech conditions.
- **Personalization:** Let users hear AI responses in their own voice or a familiar voice.
- **Content creation:** Generate voiceovers in specific voices for media production.
Current status
Live in beta. The `POST /v1/voice-clone/` endpoint accepts a reference clip and returns a `voice_id` that can be passed straight to the TTS endpoint. Try it in the [Voice Clone playground](/voice-clone).
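The two-step flow (clone, then synthesize) might look roughly like the sketch below. Only the `/v1/voice-clone/` path and the `voice_id` response field come from this page. Everything else is an assumption made for illustration: the base URL, the bearer-token header, sending the clip as a raw `audio/wav` body, the `/v1/tts/` path, and the `text` field. The sketch only builds the requests and sends nothing; check the API reference for the real request shapes.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.example.com"  # hypothetical host, not the real one


def build_clone_request(audio_bytes: bytes, api_key: str) -> Request:
    """Build (but do not send) a POST to the voice-clone endpoint.

    Assumes a raw WAV body and bearer auth; the real API may expect
    multipart form data or a different auth scheme.
    """
    return Request(
        url=BASE_URL + "/v1/voice-clone/",
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "audio/wav",
        },
    )


def build_tts_request(clone_response_body: bytes, text: str, api_key: str) -> Request:
    """Read voice_id from the clone response and build a TTS request with it.

    The /v1/tts/ path and the "text" field name are assumptions.
    """
    voice_id = json.loads(clone_response_body)["voice_id"]
    payload = {"text": text, "voice_id": voice_id}
    return Request(
        url=BASE_URL + "/v1/tts/",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Either request could then be sent with `urllib.request.urlopen(request)`. Keeping request construction separate from sending makes the `voice_id` hand-off easy to inspect and test offline.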