All products

Voice Agent

Live — Demo Available

Live

Full Swahili voice agent: speak Swahili, get an AI response in Swahili. Combines ASR + LLM + TTS in a seamless pipeline.

ASR integration100%
LLM orchestration100%
TTS integration100%
Web demo100%
Streaming responses30%

What it does

The SAUTI Voice Agent is a full conversational AI system for Swahili. Speak in Swahili, and it responds in Swahili — combining automatic speech recognition, a large language model, and text-to-speech in one seamless pipeline.

Target use cases

  • **Customer service automation:** Handle common queries — account balances, service status, appointment scheduling — in natural Swahili.
  • **Health information:** Deliver health guidance and appointment reminders via voice.
  • **Banking assistant:** Mobile banking operations via voice for users who prefer Swahili.
  • **General assistant:** A helpful Swahili-speaking AI for any question.

Architecture

The Voice Agent operates as a pipeline with three stages per conversation turn:

  1. **Listen:** Whisper ASR transcribes the user's speech to text.
  2. **Think:** An LLM (Claude, GPT-4o, or Llama 3 via Groq) generates a contextual response in Swahili.
  3. **Speak:** SAUTI TTS synthesises the response into natural Swahili speech.

Demo scenarios

Three preconfigured scenarios are available: - General assistant — answers any question in Swahili - Banking assistant — mobile banking operations in Swahili - Health advisor — health information in Swahili

LLM-agnostic design

The voice agent supports pluggable LLM backends: - Claude (default) — Anthropic's Claude API - OpenAI — GPT-4o and GPT-4o-mini - Groq — Ultra-fast Llama 3 inference

Current status

Core pipeline (ASR → LLM → TTS) is live behind an authenticated API, with both text and audio request modes and a working web demo. Streaming responses for sub-second latency are on the roadmap.