SAUTI ASR

v1.0 — Swahili

Live

Swahili speech-to-text, fine-tuned for the way Swahili is actually spoken. It roughly halves the word error rate of a multilingual baseline.

Fine-tuning: 100%

Evaluation (FLEURS): 100%

API integration: 100%

Streaming decode: 30%

What it does

SAUTI ASR transcribes spoken Swahili audio into text. Fine-tuned specifically for Swahili, it achieves a 13.5% word error rate, roughly a 50% relative reduction from the 27.2% of the multilingual baseline.

How it works

Send a POST request containing an audio file and receive the Swahili transcript in the response. The API accepts common audio formats, including WAV, MP3, OGG, and FLAC. The language is auto-detected by default, but it can be forced to Swahili for better accuracy.
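As a rough illustration of the request described above, the sketch below builds (but does not send) a multipart POST using only the Python standard library. The endpoint URL, the `file` and `language` field names, and the `sw` language code as the accepted value are all assumptions made for this example; this page does not document the actual request schema, so check the API reference before relying on any of them.

```python
import io
import uuid
import urllib.request

# Hypothetical endpoint: the real URL is not stated on this page.
API_URL = "https://api.example.com/v1/transcribe"


def build_transcription_request(audio_bytes, filename, language="sw", url=API_URL):
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # Form field that forces Swahili decoding (assumed field name: "language").
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="language"\r\n\r\n')
    body.write(language.encode() + b"\r\n")

    # File part carrying the raw audio bytes (assumed field name: "file").
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes + b"\r\n")

    # Closing boundary marks the end of the multipart body.
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        url,
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The shape of the response (for example, which JSON key holds the transcript) is not specified here, so it is left out of the sketch.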

Results

| System | Word Error Rate |
|--------|-----------------|
| Multilingual baseline | 27.2% |
| SAUTI ASR v1 | 13.5% |
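For readers unfamiliar with the metric, here is a minimal sketch of how word error rate is commonly computed (word-level edit distance divided by the number of reference words), along with the relative reduction implied by the two figures above. It is not the evaluation script used for these results; text normalization, which can shift WER noticeably, is not described on this page and is omitted here. The example sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction of the baseline error rate that was removed."""
    return (baseline_wer - new_wer) / baseline_wer


# One substituted word out of three reference words.
print(word_error_rate("habari ya asubuhi", "habari za asubuhi"))

# Figures from the table above: 27.2% baseline, 13.5% for SAUTI ASR v1.
print(round(relative_reduction(27.2, 13.5), 3))
```

The second call works out to (27.2 - 13.5) / 27.2, which is where the "roughly 50%" figure comes from.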

Capabilities

Batch transcription via REST API (upload audio, receive text)
Support for common audio formats (WAV, MP3, OGG, FLAC)
Forced language decoding for Swahili
Real-time streaming transcription via WebSocket (planned)

Availability

The model is published on [Hugging Face](https://huggingface.co/Finiflowlabs/sauti-asr-v1). Try it in the [Speech to Text playground](/speech-to-text).