ADR: Multilingual Agent Strategy
Date: January 2026
Status: Approved
Decision: ONE multilingual agent per clinic with auto-detection
Context
VitaraVox v1.0 supports English and Mandarin for healthcare clinics in British Columbia (BC). The open question is whether to run a separate agent per language or a single multilingual agent.
Decision
Use ONE multilingual Vapi.ai assistant per clinic with Deepgram Nova-2 "multi" mode for automatic language detection.
Architecture
    +------------------------------------------------------------------------+
    |                                                                        |
    |  Patient Call Flow                                                     |
    |  =================                                                     |
    |                                                                        |
    |  Patient dials clinic number                                           |
    |          |                                                             |
    |          v                                                             |
    |  +---------------+                                                     |
    |  |    Telnyx     |  <-- Receives call, routes to Vapi.ai               |
    |  +-------+-------+                                                     |
    |          |                                                             |
    |          v                                                             |
    |  +------------------------------------------------------------------+  |
    |  |                                                                  |  |
    |  |  SINGLE MULTILINGUAL VAPI.AI ASSISTANT                           |  |
    |  |                                                                  |  |
    |  |  +------------------------------------------------------------+  |  |
    |  |  | Transcriber: Deepgram Nova-2 (language: "multi")           |  |  |
    |  |  |   --> Auto-detects: English, Mandarin (v1.0)               |  |  |
    |  |  |   --> Seamless mid-conversation switching                  |  |  |
    |  |  +------------------------------------------------------------+  |  |
    |  |                                                                  |  |
    |  |  +------------------------------------------------------------+  |  |
    |  |  | LLM: GPT-4o (streaming enabled)                            |  |  |
    |  |  |   --> Multilingual system prompt                           |  |  |
    |  |  |   --> Tool calls to OSCAR EMR                              |  |  |
    |  |  +------------------------------------------------------------+  |  |
    |  |                                                                  |  |
    |  |  +------------------------------------------------------------+  |  |
    |  |  | Voice: Azure TTS (multilingual-auto)                       |  |  |
    |  |  |   --> en-US-AriaNeural (English)                           |  |  |
    |  |  |   --> zh-CN-XiaoxiaoNeural (Mandarin)                      |  |  |
    |  |  +------------------------------------------------------------+  |  |
    |  |                                                                  |  |
    |  +------------------------------------------------------------------+  |
    |          |                                                             |
    |          v  Tool Calls (real-time during conversation)                 |
    |  +---------------+                                                     |
    |  |     OSCAR     |  <-- check_availability, book_appointment           |
    |  |      EMR      |                                                     |
    |  +---------------+                                                     |
    |                                                                        |
    +------------------------------------------------------------------------+
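The tool-call layer at the bottom of the diagram does not depend on the caller's language, which is part of why one assistant can serve both. The sketch below shows one way a backend might route the two tool names from the diagram to handlers. Only `check_availability` and `book_appointment` come from this ADR; the argument names, return shapes, and stub bodies are hypothetical placeholders, not the real OSCAR EMR integration.

```python
from typing import Any, Callable, Dict


# Hypothetical stand-ins for the OSCAR EMR integration. The real calls,
# argument names, and return shapes are not specified in this ADR.
def check_availability(provider: str, date: str) -> Dict[str, Any]:
    return {"provider": provider, "date": date, "slots": ["09:00", "10:30"]}


def book_appointment(provider: str, date: str, time: str, patient_id: str) -> Dict[str, Any]:
    return {
        "status": "booked",
        "provider": provider,
        "date": date,
        "time": time,
        "patient_id": patient_id,
    }


# One registry serves every language: the LLM emits the same tool name and
# structured arguments whether the patient spoke English or Mandarin.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "check_availability": check_availability,
    "book_appointment": book_appointment,
}


def dispatch_tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Route a tool call to its handler, or return an error payload."""
    handler = TOOLS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return handler(**arguments)


result = dispatch_tool_call(
    "check_availability", {"provider": "dr-example", "date": "2026-02-03"}
)
print(result["slots"])
```

Returning an error payload for unknown tool names, rather than raising, lets the assistant apologize and continue the call instead of dropping it.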
Comparison
| Factor | Single Multilingual | Multiple Language Agents |
|---|---|---|
| Agents per clinic | 1 | 5 (one per language, counting the five languages planned through v2.0) |
| Total for 5 clinics | 5 | 25 |
| Configuration burden | Low | 5x higher |
| Language switching | Seamless | Requires transfer (200-500ms) |
| Mid-conversation mix | Supported | Complex routing needed |
| IVR complexity | None (auto-detect) | "Press 1 for English..." |
| Maintenance | Single prompt | 5 prompts to sync |
Rationale
1. Patient Demographics in BC
Many BC healthcare patients are bilingual (English/Mandarin). They may:
- Start in one language, switch mid-sentence
- Use English for medical terms, Mandarin for personal details
- Have family members on the call speaking different languages
A single multilingual agent handles all three cases without a transfer or a language menu.
2. Latency
Target: <800ms end-to-end voice response
| Approach | Latency Impact |
|---|---|
| Single agent | Optimal streaming, no transfers |
| Multiple agents | +200-500ms per transfer/handoff |
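To put the transfer overhead in perspective, the arithmetic below compares it with the 800ms target. Both figures come from this section; the function name is illustrative, and the result says nothing about the baseline latency of either approach, which this ADR does not state.

```python
TARGET_MS = 800                 # end-to-end response target from this ADR
TRANSFER_MS_RANGE = (200, 500)  # per-transfer overhead from this ADR


def transfer_share_of_budget(transfer_ms: int, target_ms: int = TARGET_MS) -> float:
    """Fraction of the latency target consumed by a single transfer."""
    return transfer_ms / target_ms


low, high = (transfer_share_of_budget(ms) for ms in TRANSFER_MS_RANGE)
print(f"One transfer consumes {low:.1%} to {high:.1%} of the budget")
# prints: One transfer consumes 25.0% to 62.5% of the budget
```

A single handoff therefore uses between a quarter and well over half of the response budget before any transcription, LLM, or TTS time is counted.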
3. Configuration
v1.0 assistants are set up manually in Vapi.ai, so setup time scales with the number of assistants:
| Approach | Setup Time (5 clinics) |
|---|---|
| Single multilingual | 5 assistants = ~2 hours |
| Per-language agents | 25 assistants = ~10 hours |
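The figures in the table follow from simple multiplication, sketched below. The 24 minutes per assistant is derived from the table (~2 hours for 5 assistants) and is an approximation; the count of 25 assistants corresponds to five languages per clinic, which matches the v2.0 roadmap rather than the two languages in v1.0.

```python
CLINICS = 5
MINUTES_PER_ASSISTANT = 24  # derived from the table: ~2 hours / 5 assistants


def assistants_needed(clinics: int, languages: int, per_language: bool) -> int:
    """Number of assistants to configure under each strategy."""
    return clinics * (languages if per_language else 1)


def setup_hours(assistants: int) -> float:
    """Approximate manual setup time in hours."""
    return assistants * MINUTES_PER_ASSISTANT / 60


for label, languages, per_language in [
    ("single multilingual", 5, False),
    ("per-language, 5 languages", 5, True),
    ("per-language, 2 languages (v1.0)", 2, True),
]:
    count = assistants_needed(CLINICS, languages, per_language)
    print(f"{label}: {count} assistants, ~{setup_hours(count):.0f} hours")
# single multilingual: 5 assistants, ~2 hours
# per-language, 5 languages: 25 assistants, ~10 hours
# per-language, 2 languages (v1.0): 10 assistants, ~4 hours
```

The single-agent count stays flat as languages are added, while the per-language count grows linearly with each one.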
4. User Experience
Single Multilingual Agent:
Patient: "Hello, I'd like to book an appointment"
Agent: "Hello! I'd be happy to help you book an appointment..."
Patient: "其实我想用中文" (Actually I want to use Chinese)
Agent: "没问题!请问您想预约什么时间?" (No problem! When would you like to book?)
Multiple Language Agents:
IVR: "Press 1 for English, 2 for 中文..."
[Patient presses 1]
English Agent: "Hello! How can I help?"
Patient: "Actually, can I switch to Chinese?"
English Agent: "Let me transfer you..."
[200-500ms silence]
Chinese Agent: "您好,请问有什么可以帮您的?"
Patient: [repeats request]
Vapi.ai Configuration
Transcriber (Deepgram)

    {
      "transcriber": {
        "provider": "deepgram",
        "model": "nova-2",
        "language": "multi",
        "smartFormat": true
      }
    }
Voice (Azure TTS)

    {
      "voice": {
        "provider": "azure",
        "voiceId": "en-US-AriaNeural",
        "multilingualSettings": {
          "enabled": true,
          "fallbackVoices": {
            "zh-CN": "zh-CN-XiaoxiaoNeural"
          }
        }
      }
    }
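Because each clinic gets its own assistant with the same settings, the two fragments above can be kept as shared templates and combined per clinic. The sketch below is one way to do that and to check which locales have a voice before anything is entered in Vapi.ai. The field names inside `transcriber` and `voice` are copied from this ADR; the top-level `name` field, the helper names, and the clinic name are assumptions for illustration.

```python
import copy
import json

# Fragments copied from the configuration shown in this ADR.
TRANSCRIBER = {
    "provider": "deepgram",
    "model": "nova-2",
    "language": "multi",
    "smartFormat": True,
}
VOICE = {
    "provider": "azure",
    "voiceId": "en-US-AriaNeural",
    "multilingualSettings": {
        "enabled": True,
        "fallbackVoices": {"zh-CN": "zh-CN-XiaoxiaoNeural"},
    },
}


def build_assistant(name: str) -> dict:
    """Combine the shared fragments into one per-clinic assistant config.

    The "name" key is an assumed field, used here only to tell clinics apart.
    """
    return {
        "name": name,
        "transcriber": copy.deepcopy(TRANSCRIBER),
        "voice": copy.deepcopy(VOICE),
    }


def voiced_locales(assistant: dict) -> set:
    """Locales that have a voice: the primary voice plus any fallbacks."""
    voice = assistant["voice"]
    primary = "-".join(voice["voiceId"].split("-")[:2])  # "en-US-AriaNeural" -> "en-US"
    fallbacks = voice["multilingualSettings"]["fallbackVoices"]
    return {primary, *fallbacks}


assistant = build_assistant("Example Clinic")  # hypothetical clinic name
print(json.dumps(assistant, indent=2, ensure_ascii=False))
print(sorted(voiced_locales(assistant)))  # ['en-US', 'zh-CN']
```

Deep-copying the templates keeps a change made for one clinic from leaking into the others.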
Language Roadmap
| Version | Languages |
|---|---|
| v1.0 | English, Mandarin |
| v2.0 | + French, Hindi, Punjabi |
The single-agent architecture scales to additional languages by updating the Deepgram language list and adding Azure voice fallbacks.
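As a rough sketch of the voice half of that change, the helper below adds a fallback voice to a copy of the v1.0 voice configuration. The `multilingualSettings.fallbackVoices` structure is taken from this ADR; the helper name is hypothetical, and the French voice ID is an illustrative choice rather than a decision recorded here. Any transcriber-side change is not shown, since this ADR does not specify its shape.

```python
import copy

V1_VOICE = {
    "provider": "azure",
    "voiceId": "en-US-AriaNeural",
    "multilingualSettings": {
        "enabled": True,
        "fallbackVoices": {"zh-CN": "zh-CN-XiaoxiaoNeural"},
    },
}


def add_language(voice_config: dict, locale: str, voice_id: str) -> dict:
    """Return a copy of the voice config with one more fallback voice."""
    updated = copy.deepcopy(voice_config)
    updated["multilingualSettings"]["fallbackVoices"][locale] = voice_id
    return updated


# Illustrative only: the actual v2.0 voices have not been chosen in this ADR.
v2_voice = add_language(V1_VOICE, "fr-CA", "fr-CA-SylvieNeural")
print(sorted(v2_voice["multilingualSettings"]["fallbackVoices"]))
# ['fr-CA', 'zh-CN']
```

Returning a new dictionary instead of mutating the input keeps the v1.0 configuration available for rollback or comparison.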
Consequences
Positive:
- Simpler architecture (fewer moving parts)
- Better patient experience (no IVR, seamless switching)
- Lower operational overhead
- Faster onboarding for new clinics
Negative:
- Single point of failure per clinic (mitigated by Vapi.ai reliability)
- System prompt complexity (one prompt handles all languages)
- Testing matrix grows with each language added
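One way to see how the testing matrix grows is to count conversation cases per language set: one monolingual call per language plus one case for every ordered mid-call switch. This is an assumed model of the matrix, not the project's actual test plan, and it ignores scenario types such as booking versus cancelling.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def conversation_cases(languages: Sequence[str]) -> List[Tuple[str, ...]]:
    """Monolingual calls plus every ordered switch between two languages."""
    monolingual = [(lang,) for lang in languages]
    switches = list(permutations(languages, 2))  # (from, to) pairs
    return monolingual + switches


v1 = ["English", "Mandarin"]
v2 = v1 + ["French", "Hindi", "Punjabi"]

print(len(conversation_cases(v1)))  # 4
print(len(conversation_cases(v2)))  # 25
```

Under this model the case count is n squared for n languages, so moving from two to five languages raises it from 4 to 25 even though the number of assistants per clinic stays at one.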