African Language AI

Voice AI built for Africa's languages

3× improvement in Swahili TTS naturalness over multilingual baselines

FiniFlow Labs trains, evaluates, and deploys speech AI systems grounded in African linguistic data. Our first system, SAUTI, brings natural Swahili voice synthesis and recognition to production APIs.

Our Mission

African languages deserve first-class AI

The majority of AI speech systems are trained on English and a handful of European languages. African languages — spoken by over a billion people — are largely absent from mainstream model research.

FiniFlow Labs closes that gap. We source native-speaker data, build language-specific training pipelines, and publish models and benchmarks that the broader research community can build on.

Kiswahili

200M+ speakers

Live

Hausa

150M+ speakers

In Development

Yoruba

50M+ speakers

In Development

Amharic

60M+ speakers

In Development

What We Build

The SAUTI platform

Live

SAUTI TTS

Convert written Swahili into natural-sounding speech. Fine-tuned on the Google WAXAL dataset using the VITS architecture with LoRA adapters.

View API docs

Beta

SAUTI ASR

Transcribe spoken Swahili audio with a low word error rate (WER). Built on Meta's MMS-300M model, distributed via Hugging Face, with Swahili-specific fine-tuning.

Request access

In Development

Voice Agent API

A full turn-by-turn voice agent for Swahili IVR, combining SAUTI TTS and ASR with a language model backend for real telephony deployments.

Join the waitlist

Products

Platform roadmap

SAUTI TTS

v1.0 — Swahili

Live

Production-grade Swahili text-to-speech. Serves synthesized audio via a low-latency REST API backed by a fine-tuned VITS model.

Model training: 100%
API integration: 100%
Multi-speaker voices: 35%
SSML support: 20%

SAUTI ASR

v0.5 — Swahili

Beta

Swahili automatic speech recognition with competitive WER on conversational audio. Optimised for telephony-quality 8 kHz input.

Model training: 85%
API integration: 70%
Streaming decode: 45%
Noise robustness: 30%
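
For readers new to the metric: WER counts the word-level substitutions, deletions, and insertions needed to turn a system transcript into the reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that definition, not the SAUTI evaluation code. The function name, the lowercase-and-split normalisation, and the example sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalisation: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative transcripts: one substitution (ya -> za) and one deletion (yangu).
wer = word_error_rate("habari ya asubuhi rafiki yangu", "habari za asubuhi rafiki")
print(f"{wer:.2f}")  # prints 0.40
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.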

Voice Agent

Roadmap — Q3 2026

In Development

End-to-end Swahili voice agent for IVR and telephony. Combines TTS, ASR, and an LLM backbone for natural turn-by-turn conversation.

IVR integration: 40%
Turn management: 25%
LLM backend: 15%
Production hardening: 5%

Research

Grounded in African linguistic data

We build on open datasets and pretrained multilingual models, then apply targeted fine-tuning to close the performance gap between AI systems for high-resource languages and those for African languages. All model weights, training configs, and evaluation results are published openly.

Techniques

VITS fine-tuning
LoRA adapters
QLoRA (4-bit)
MMS-300M
WAXAL dataset
Data augmentation
Phoneme normalisation
Monotonic alignment
WER evaluation
MOS scoring

Datasets

Google WAXAL swa_tts (1,778 utterances)
WAXAL swa_asr (in preparation)
Internal Swahili IVR corpus

Collaborators & acknowledgements

Google WAXAL / African Next Voices

Open Swahili speech dataset

HuggingFace MMS

Massively multilingual pretrained models

Mozilla Common Voice

Community speech data pipeline

Latest Updates

From the lab

All posts

Mar 10, 2026
research, tts

SAUTI TTS v1: Training a VITS model on Swahili from 1,400 utterances

We fine-tuned a VITS TTS model on the WAXAL swa_tts dataset — just 1,387 training samples — and achieved a 3× naturalness improvement over the multilingual MMS-TTS baseline. Here is what we learned.

Read more

Feb 22, 2026
data, research

Inside the WAXAL dataset: structure, quirks, and what to watch for

The Google WAXAL Swahili TTS split is small (1,778 utterances), has extreme duration outliers, and uses 48 kHz audio. We document every gotcha we hit while building our preprocessing pipeline.
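
One possible way to handle duration outliers like those described above is an interquartile-range filter. The sketch below is illustrative only: the function name, the 1.5×IQR fences, the 0.5 s minimum, and the sample durations are assumptions, not values taken from the post or from the WAXAL dataset.

```python
from statistics import quantiles


def filter_duration_outliers(durations_s, k=1.5, min_s=0.5):
    """Return the indices of clips whose duration falls inside the IQR fences.

    durations_s: clip durations in seconds (at least two values).
    k: fence width as a multiple of the interquartile range (assumed 1.5).
    min_s: hard lower bound in seconds for very short clips (assumed 0.5).
    """
    q1, _median, q3 = quantiles(durations_s, n=4)
    iqr = q3 - q1
    low = max(min_s, q1 - k * iqr)
    high = q3 + k * iqr
    return [i for i, d in enumerate(durations_s) if low <= d <= high]


# Made-up durations: six typical clips, one very long clip, one near-empty clip.
durations = [3.1, 4.0, 2.8, 3.5, 4.2, 3.9, 61.0, 0.1]
kept = filter_duration_outliers(durations)
print(kept)  # prints [0, 1, 2, 3, 4, 5]
```

With a split this small, it is worth listening to the dropped clips before discarding them, since every utterance removed is a noticeable share of the training data.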

Read more

Jan 15, 2026
announcement, platform

Introducing FiniFlow Labs: building African language AI from the ground up

African languages are spoken by over a billion people yet remain largely absent from mainstream AI research. FiniFlow Labs is our answer — a research lab and API platform dedicated to closing that gap.

Read more