Voice Agent
Live — Demo Available
Full Swahili voice agent: speak Swahili, get an AI response in Swahili. Combines ASR + LLM + TTS in a seamless pipeline.
What it does
The SAUTI Voice Agent is a full conversational AI system for Swahili. Speak in Swahili, and it responds in Swahili — combining automatic speech recognition, a large language model, and text-to-speech in one seamless pipeline.
Target use cases
- **Customer service automation:** Handle common queries — account balances, service status, appointment scheduling — in natural Swahili.
- **Health information:** Deliver health guidance and appointment reminders via voice.
- **Banking assistant:** Mobile banking operations via voice for users who prefer Swahili.
- **General assistant:** A helpful Swahili-speaking AI for any question.
Architecture
The Voice Agent operates as a pipeline with three stages per conversation turn:
- **Listen:** Whisper ASR transcribes the user's speech to text.
- **Think:** An LLM (Claude, GPT-4o, or Llama 3 via Groq) generates a contextual response in Swahili.
- **Speak:** SAUTI TTS synthesises the response into natural Swahili speech.
Demo scenarios
Three preconfigured scenarios are available: - General assistant — answers any question in Swahili - Banking assistant — mobile banking operations in Swahili - Health advisor — health information in Swahili
LLM-agnostic design
The voice agent supports pluggable LLM backends: - Claude (default) — Anthropic's Claude API - OpenAI — GPT-4o and GPT-4o-mini - Groq — Ultra-fast Llama 3 inference
Current status
Core pipeline (ASR → LLM → TTS) is live behind an authenticated API, with both text and audio request modes and a working web demo. Streaming responses for sub-second latency are on the roadmap.