Voice Agent

Live — Demo Available

Full Swahili voice agent: speak Swahili, get an AI response in Swahili. Combines automatic speech recognition (ASR), a large language model (LLM), and text-to-speech (TTS) in a single pipeline.

ASR integration: 100%

LLM orchestration: 100%

TTS integration: 100%

Web demo: 100%

Streaming responses: 30%

What it does

The SAUTI Voice Agent is a full conversational AI system for Swahili. Speak in Swahili, and it responds in Swahili — combining automatic speech recognition, a large language model, and text-to-speech in one seamless pipeline.

Target use cases

**Customer service automation:** Handle common queries — account balances, service status, appointment scheduling — in natural Swahili.
**Health information:** Deliver health guidance and appointment reminders via voice.
**Banking assistant:** Mobile banking operations via voice for users who prefer Swahili.
**General assistant:** A helpful Swahili-speaking AI for any question.

Architecture

The Voice Agent operates as a pipeline with three stages per conversation turn:

**Listen:** Whisper ASR transcribes the user's speech to text.
**Think:** An LLM (Claude, GPT-4o, or Llama 3 via Groq) generates a contextual response in Swahili.
**Speak:** SAUTI TTS synthesises the response into natural Swahili speech.
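The three stages above can be sketched as a single per-turn function. This is a minimal illustration, not the SAUTI implementation: the `VoiceAgent` class, the stage signatures, and the stand-in `fake_asr`, `fake_llm`, and `fake_tts` functions are all hypothetical, chosen so the sketch runs without any models or network access. A real deployment would presumably plug Whisper, an LLM client, and SAUTI TTS into the same three slots.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stage signatures; the real SAUTI interfaces are not documented here.
History = List[Tuple[str, str]]          # (role, text) pairs
Transcribe = Callable[[bytes], str]      # Listen: audio -> Swahili text
Generate = Callable[[History], str]      # Think: conversation history -> reply text
Synthesise = Callable[[str], bytes]      # Speak: reply text -> audio


@dataclass
class VoiceAgent:
    transcribe: Transcribe
    generate: Generate
    synthesise: Synthesise
    history: History = field(default_factory=list)

    def turn(self, audio_in: bytes) -> bytes:
        """Run one Listen -> Think -> Speak turn and return the reply audio."""
        user_text = self.transcribe(audio_in)           # Listen
        self.history.append(("user", user_text))
        reply_text = self.generate(self.history)        # Think
        self.history.append(("assistant", reply_text))
        return self.synthesise(reply_text)              # Speak


# Stand-in stages (assumptions for illustration only): "audio" is just UTF-8 text.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(history: History) -> str:
    last_user_text = history[-1][1]
    return f"Umesema: {last_user_text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


agent = VoiceAgent(fake_asr, fake_llm, fake_tts)
reply_audio = agent.turn("Habari yako?".encode("utf-8"))
print(reply_audio.decode("utf-8"))
```

Keeping the conversation history on the agent is one way to give the Think stage the context it needs for a contextual response. Each stage is an ordinary callable, so any of the three can be replaced without touching the others.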

Demo scenarios

Three preconfigured scenarios are available:

- General assistant: answers any question in Swahili
- Banking assistant: mobile banking operations in Swahili
- Health advisor: health information in Swahili

LLM-agnostic design

The voice agent supports pluggable LLM backends:

- Claude (default): Anthropic's Claude API
- OpenAI: GPT-4o and GPT-4o-mini
- Groq: ultra-fast Llama 3 inference

Current status

The core pipeline (ASR → LLM → TTS) is live behind an authenticated API, with both text and audio request modes and a working web demo. Streaming responses, aimed at sub-second latency, are in progress (about 30% complete).