Skip to content

ADR: Multilingual Agent Strategy

Date: January 2026 Status: Approved Decision: ONE multilingual agent per clinic with auto-detection


Context

VitaraVox v1.0 supports English and Mandarin for BC healthcare clinics. The question: should we use separate agents per language or a single multilingual agent?


Decision

Use ONE multilingual Vapi.ai assistant per clinic with Deepgram Nova-2 "multi" mode for automatic language detection.


Architecture

+------------------------------------------------------------------+
|                                                                  |
|   Patient Call Flow                                              |
|   ================                                               |
|                                                                  |
|   Patient dials clinic number                                    |
|           |                                                      |
|           v                                                      |
|   +---------------+                                              |
|   |    Telnyx     |   <-- Receives call, routes to Vapi.ai       |
|   +-------+-------+                                              |
|           |                                                      |
|           v                                                      |
|   +------------------------------------------------------------------+
|   |                                                                  |
|   |          SINGLE MULTILINGUAL VAPI.AI ASSISTANT                   |
|   |                                                                  |
|   |   +----------------------------------------------------------+   |
|   |   |  Transcriber: Deepgram Nova-2 (language: "multi")        |   |
|   |   |  --> Auto-detects: English, Mandarin (v1.0)              |   |
|   |   |  --> Seamless mid-conversation switching                 |   |
|   |   +----------------------------------------------------------+   |
|   |                                                                  |
|   |   +----------------------------------------------------------+   |
|   |   |  LLM: GPT-4o (streaming enabled)                         |   |
|   |   |  --> Multilingual system prompt                          |   |
|   |   |  --> Tool calls to OSCAR EMR                             |   |
|   |   +----------------------------------------------------------+   |
|   |                                                                  |
|   |   +----------------------------------------------------------+   |
|   |   |  Voice: Azure TTS (multilingual-auto)                    |   |
|   |   |  --> en-US-AriaNeural (English)                          |   |
|   |   |  --> zh-CN-XiaoxiaoNeural (Mandarin)                     |   |
|   |   +----------------------------------------------------------+   |
|   |                                                                  |
|   +------------------------------------------------------------------+
|           |                                                      |
|           v  Tool Calls (real-time during conversation)          |
|   +---------------+                                              |
|   |    OSCAR      |   <-- check_availability, book_appointment   |
|   |     EMR       |                                              |
|   +---------------+                                              |
|                                                                  |
+------------------------------------------------------------------+

Comparison

Factor Single Multilingual Multiple Language Agents
Agents per clinic 1 5 (one per language)
Total for 5 clinics 5 25
Configuration burden Low 5x higher
Language switching Seamless Requires transfer (200-500ms)
Mid-conversation mix Supported Complex routing needed
IVR complexity None (auto-detect) "Press 1 for English..."
Maintenance Single prompt 5 prompts to sync

Rationale

1. Patient Demographics in BC

Many BC healthcare patients are bilingual (English/Mandarin). They may:

  • Start in one language, switch mid-sentence
  • Use English for medical terms, Mandarin for personal details
  • Have family members on the call speaking different languages

Single multilingual agent handles this naturally.

2. Latency

Target: <800ms end-to-end voice response

Approach Latency Impact
Single agent Optimal streaming, no transfers
Multiple agents +200-500ms per transfer/handoff

3. Configuration

v1.0 uses manual Vapi.ai setup:

Approach Setup Time (5 clinics)
Single multilingual 5 assistants = ~2 hours
Per-language agents 25 assistants = ~10 hours

4. User Experience

Single Multilingual Agent:

Patient: "Hello, I'd like to book an appointment"
Agent: "Hello! I'd be happy to help you book an appointment..."
Patient: "其实我想用中文" (Actually I want to use Chinese)
Agent: "没问题!请问您想预约什么时间?" (No problem! When would you like to book?)

Multiple Language Agents:

IVR: "Press 1 for English, 2 for 中文..."
Patient presses 1
English Agent: "Hello! How can I help?"
Patient: "Actually, can I switch to Chinese?"
Agent: "Let me transfer you..."
[200-500ms silence]
Chinese Agent: "您好,请问有什么可以帮您的?"
Patient: [repeats request]


Vapi.ai Configuration

Transcriber (Deepgram)

{
  "transcriber": {
    "provider": "deepgram",
    "model": "nova-2",
    "language": "multi",
    "smartFormat": true
  }
}

Voice (Azure TTS)

{
  "voice": {
    "provider": "azure",
    "voiceId": "en-US-AriaNeural",
    "multilingualSettings": {
      "enabled": true,
      "fallbackVoices": {
        "zh-CN": "zh-CN-XiaoxiaoNeural"
      }
    }
  }
}

Language Roadmap

Version Languages
v1.0 English, Mandarin
v2.0 + French, Hindi, Punjabi

The single-agent architecture scales to additional languages by updating the Deepgram language list and adding Azure voice fallbacks.


Consequences

Positive:

  • Simpler architecture (fewer moving parts)
  • Better patient experience (no IVR, seamless switching)
  • Lower operational overhead
  • Faster onboarding for new clinics

Negative:

  • Single point of failure per clinic (mitigated by Vapi.ai reliability)
  • System prompt complexity (one prompt handles all languages)
  • Testing matrix grows with each language added