SAUTI ASR
v1.0 — Swahili
Swahili speech-to-text, fine-tuned for the way Swahili is actually spoken. It roughly halves the word error rate of a multilingual baseline.
What it does
SAUTI ASR transcribes spoken Swahili audio into text. Fine-tuned specifically for Swahili, it reaches a 13.5% word error rate (WER), a roughly 50% relative reduction from the 27.2% WER of the multilingual baseline.
How it works
Send a POST request containing an audio file and receive a Swahili transcript. The API accepts WAV, MP3, and other common formats. The language is auto-detected by default but can be forced to Swahili for better accuracy.
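The endpoint URL, request field names, and response shape are not documented here, so the following is only a minimal Python sketch using the standard library. It assumes a hypothetical endpoint `API_URL`, a multipart field `file` for the audio, an optional `language` field (for example `sw`) to force Swahili decoding, and a JSON response with a `text` field; check the real API reference before relying on any of these names.

```python
import json
import os
import urllib.request
import uuid
from typing import Optional

# Hypothetical endpoint: the real URL is not given in this document.
API_URL = "https://api.example.com/v1/speech-to-text"


def build_transcription_request(
    audio: bytes, filename: str, language: Optional[str] = "sw"
) -> urllib.request.Request:
    """Build a multipart/form-data POST request carrying one audio file.

    The field names "file" and "language" are assumptions for illustration.
    Pass language=None to leave language detection to the server.
    """
    boundary = uuid.uuid4().hex
    # Part 1: the raw audio bytes.
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + audio + b"\r\n"
    if language is not None:
        # Part 2 (optional): hypothetical field that forces the decoding language.
        body += (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="language"\r\n\r\n'
            f"{language}\r\n"
        ).encode()
    body += f"--{boundary}--\r\n".encode()  # closing boundary
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def transcribe(path: str, language: Optional[str] = "sw") -> str:
    """Send an audio file and return the transcript.

    Assumes (hypothetically) a JSON response body with a "text" field.
    """
    with open(path, "rb") as f:
        request = build_transcription_request(f.read(), os.path.basename(path), language)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

Under these assumptions, `transcribe("mahojiano.wav")` would force Swahili decoding, while `transcribe("mahojiano.wav", language=None)` would rely on auto-detection. Building the request separately from sending it keeps the multipart encoding testable without a network connection.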
Results
| System | Word Error Rate (lower is better) |
|--------|-----------------------------------|
| Multilingual baseline | 27.2% |
| SAUTI ASR v1 | 13.5% |
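Word error rate is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the system output, divided by the number of words in the reference. The document does not state which test set or text normalization produced the figures above, so the following minimal Python sketch only illustrates the standard metric and shows how the roughly 50% figure follows from the table. The function names are illustrative, not part of any SAUTI ASR tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words.

    Tokenizes on whitespace only; real evaluations usually normalize
    casing and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row 0 of the DP table: an empty reference needs j insertions to reach hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        # Row i: edit distance between ref[:i] and hyp[:j] for each j.
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which the error rate dropped relative to the baseline."""
    return (baseline - improved) / baseline


# One substitution (ya -> za) and one deletion (yangu) over 5 reference words.
print(word_error_rate("habari ya asubuhi rafiki yangu", "habari za asubuhi rafiki"))  # 0.4
# Figures from the table: 27.2% -> 13.5%.
print(round(relative_reduction(27.2, 13.5), 3))  # 0.504
```

Because insertions count against the system, WER has no upper bound and can exceed 100% on very noisy output.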
Capabilities
- Batch transcription via REST API (upload audio, receive text)
- Support for common audio formats (WAV, MP3, OGG, FLAC)
- Forced language decoding for Swahili
- Real-time streaming transcription via WebSocket (planned)
Availability
Model published on [HuggingFace](https://huggingface.co/Finiflowlabs/sauti-asr-v1). Try it in the [Speech to Text playground](/speech-to-text).