ADR: Vapi.ai Integration Architecture
HISTORICAL — Architecture evolved through v1.0 → v3.0
This document traces the evolution of VitaraVox's voice architecture. The current production system is v3.0 (9-agent dual-track). See Squad Architecture for the current state.
Date: January 2026 (v1.0), updated February 2026 (v2.0, v3.0)
Status: Updated
Decision: v3.0 adopts a 9-agent dual-track architecture with an explicit language gate and per-language STT/TTS
Context
VitaraPlatform supports these use cases:
- New patient registration
- Appointment booking
- Appointment update/cancellation
- Patient identification (v2.0)
- Clinic settings check (v2.0)
- Bilingual EN/ZH service (v3.0)
Question: Should each use case have a separate specialized agent, or should one agent handle all use cases? And how should multiple languages be handled?
Decision History
v1.0 Decision (January 2026)
Use ONE agent per clinic that handles all three use cases. Each agent is multilingual (English + Mandarin).
Rejected: Squad of 9 specialized agents (3 use cases × 3 languages).
v2.0 / v2.3.0 Decision (February 2026)
Adopt 6-agent squad architecture with specialized roles:
- Squad Leader Router - Greeting, caller phone auto-detection, clinic check, emergency/frustration detection
- Appointment Booking - Immediate slot finding (no filters), preference refinement
- Appointment Reschedule - Find existing appointments, reschedule with inline confirmation
- Appointment Cancel - Cancel with reason capture, inline confirmation
- Patient Registration - Register new patients with phonetic confirmation, inline confirmation
- Confirmation - Fallback confirmation agent (rarely used; most flows confirm inline)
Languages: English, Mandarin, Cantonese, French, Punjabi with automatic detection and mid-conversation switching.
Reason for change: Patient identification merged into Router (auto-detect via Telnyx caller ID). Booking/reschedule/cancel split into separate agents for cleaner prompts. Inline confirmations eliminate unnecessary transfers. v2.2 adds caller phone auto-detection, past-date clamping, provider name fuzzy matching, and booking-first flow (find slot immediately, refine only on request).
v3.0 Decision (February 2026)
Adopt 9-agent dual-track architecture with explicit language gate:
- Router - Language gate ONLY: detect EN/ZH from first utterance, route to correct track
- Patient-ID (EN) - Identify caller by phone, detect intent, route within EN track
- Patient-ID (ZH) - Same role in Mandarin
- Booking (EN/ZH) - Find slots, book appointments
- Modification (EN/ZH) - Consolidated reschedule + cancel + check
- Registration (EN/ZH) - Register new patients
Key changes from v2.3.0:
- Confirmation agent eliminated: log_call_metadata absorbed into every role agent
- Reschedule + Cancel consolidated into single Modification agent per track
- Router simplified to pure language gate (no patient lookup, no intent detection)
- Patient-ID separated from Router into dedicated agent per language
- Per-language STT: Deepgram nova-2 en/zh (vs universal multi)
- Per-language TTS: ElevenLabs (EN) / Azure XiaoxiaoNeural (ZH)
- ZH endpointing tuned: Longer pauses for Mandarin speech patterns (1.0s wait vs 0.6s)
- Config-as-code: Managed via Vapi GitOps instead of manual dashboard
Reason for change: Auto-detect multilingual agents struggled with LLM language confusion (GPT-4o sometimes output space-separated Chinese characters). Dedicated language tracks with per-language STT/TTS provide higher accuracy. The Router is now a lightweight language gate (<150 tokens), keeping latency minimal. Consolidating Reschedule+Cancel into Modification reduces the number of unique non-Router agent roles from 5 to 4, so the dual-track split adds only 3 agents in total (9 vs 6).
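The agent counts follow directly from the role lists in the v2.3.0 and v3.0 decisions. A minimal sketch of that arithmetic, where the variable names are illustrative and not taken from the codebase:

```javascript
// Role lists are copied from the v2.3.0 and v3.0 decision sections.
const v2Agents = ['Router', 'Booking', 'Reschedule', 'Cancel', 'Registration', 'Confirmation'];

const v3SharedAgents = ['Router']; // language gate, shared by both tracks
const v3TrackRoles = ['Patient-ID', 'Booking', 'Modification', 'Registration'];
const v3Tracks = ['EN', 'ZH'];

// Expand every per-track role into one agent per language track.
const v3Agents = [
  ...v3SharedAgents,
  ...v3Tracks.flatMap((track) => v3TrackRoles.map((role) => `${role} (${track})`)),
];

console.log(v2Agents.length);                   // 6
console.log(v3Agents.length);                   // 9
console.log(v3Agents.length - v2Agents.length); // 3
```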
Architecture Comparison
Approved: Single Agent Per Clinic
+------------------------------------------------------------------+
|                                                                  |
|   Clinic Phone Number (+1-604-555-1234)                          |
|                 |                                                |
|                 v                                                |
|  +------------------------------------------------------------+  |
|  |                                                            |  |
|  |   SINGLE MULTILINGUAL ASSISTANT                            |  |
|  |                                                            |  |
|  |   Handles:                                                 |  |
|  |   - New patient registration                               |  |
|  |   - Appointment booking                                    |  |
|  |   - Appointment update/cancellation                        |  |
|  |                                                            |  |
|  |   Languages: English, Mandarin (auto-detect)               |  |
|  |                                                            |  |
|  +------------------------------------------------------------+  |
|                                                                  |
|   Total Assistants: 1 per clinic                                 |
|   Total for 5 pilot clinics: 5 assistants                        |
|                                                                  |
+------------------------------------------------------------------+
Rejected: Squad of Specialized Agents
+------------------------------------------------------------------+
|                                                                  |
|   Clinic Phone Number (+1-604-555-1234)                          |
|                                |                                 |
|                                v                                 |
|   IVR: "Press 1 for English, 2 for Chinese, 3 for French"        |
|                                |                                 |
|                                v                                 |
|   IVR: "Press 1 to register, 2 to book, 3 to update/cancel"      |
|                                |                                 |
|    +------+------+------+------+------+------+------+------+     |
|    v      v      v      v      v      v      v      v      v     |
|  +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+  |
|  | EN | | EN | | EN | | FR | | FR | | FR | | ZH | | ZH | | ZH |  |
|  |Reg | |Book| |Upd | |Reg | |Book| |Upd | |Reg | |Book| |Upd |  |
|  +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+  |
|                                                                  |
|   Total Assistants: 9 per clinic                                 |
|   Total for 5 pilot clinics: 45 assistants                       |
|                                                                  |
+------------------------------------------------------------------+
Rationale
1. Latency (Critical)
Target: <800ms end-to-end voice latency
| Approach | Latency Impact |
|---|---|
| Single agent | Optimal - no transfers |
| Squad approach | +200-500ms per agent transfer |
If a patient says "I want to book, but also update my contact info" during booking, the squad approach requires a mid-call transfer, which adds latency and risks losing context.
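To put the transfer cost in perspective, the figures in this section can be combined into a rough budget check. This is a sketch only: it treats the transfer pause as if it counted against a single turn's 800 ms budget, which is a simplification, and the helper name is hypothetical.

```javascript
// Figures from this section: <800 ms target, +200-500 ms per agent transfer.
const TARGET_MS = 800;
const TRANSFER_COST_MS = { min: 200, max: 500 };

// Added latency for a call that crosses `transfers` agent boundaries.
function transferOverhead(transfers) {
  return {
    min: transfers * TRANSFER_COST_MS.min,
    max: transfers * TRANSFER_COST_MS.max,
  };
}

// Share of the 800 ms budget consumed by one worst-case transfer.
const oneTransfer = transferOverhead(1);
const worstCaseShare = oneTransfer.max / TARGET_MS;

console.log(transferOverhead(0)); // { min: 0, max: 0 } (single agent, no transfers)
console.log(oneTransfer);         // { min: 200, max: 500 }
console.log(worstCaseShare);      // 0.625
```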
2. Configuration Complexity
v1.0 involves manual Vapi.ai configuration:
| Approach | Assistants | Setup Time |
|---|---|---|
| Single agent | 5 (5 clinics × 1) | 4-5 hours |
| Squad approach | 45 (5 clinics × 9) | 12-15 hours |
v1.0 pilot savings: ~$800-1000 in configuration time.
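The savings estimate can be reproduced from the table. Note that the hourly rate below is not stated anywhere in this document; $100/hour is inferred from dividing $800-1000 by the 8-10 hour difference, and pairing the low ends and the high ends of each range is likewise an assumption.

```javascript
const CLINICS = 5;

// Per-clinic assistant counts and total setup-time ranges (hours) from the table.
const approaches = {
  single: { perClinic: 1, hours: [4, 5] },
  squad: { perClinic: 9, hours: [12, 15] },
};

const totalAssistants = (approach) => CLINICS * approach.perClinic;

// Hours saved by the single-agent approach: [low estimate, high estimate].
const savedHours = [
  approaches.squad.hours[0] - approaches.single.hours[0],
  approaches.squad.hours[1] - approaches.single.hours[1],
];

// ASSUMPTION: inferred rate, not given in the source.
const HOURLY_RATE_USD = 100;
const savedUsd = savedHours.map((h) => h * HOURLY_RATE_USD);

console.log(totalAssistants(approaches.single)); // 5
console.log(totalAssistants(approaches.squad));  // 45
console.log(savedHours);                         // [ 8, 10 ]
console.log(savedUsd);                           // [ 800, 1000 ]
```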
3. User Experience
Single Agent:
Agent: "Hello! I can help with registration, booking, or
updating appointments. What would you like to do?"
Patient: "I'd like to book an appointment"
Agent: "Great! Let me check availability..."
Patient: "Actually, can you also update my phone number?"
Agent: "Of course! What's your new phone number?"
Squad Approach:
IVR: "Press 1 for English..."
IVR: "Press 2 to book appointments..."
Booking Agent: "I'll help you book an appointment..."
Patient: "Can you also update my phone number?"
Agent: "Let me transfer you to our update agent..."
[200-500ms silence, context lost]
Update Agent: "Hello! What would you like to update?"
Patient: [repeats everything]
4. Technical Architecture
All 3 use cases share:
- Same OSCAR EMR API connection
- Same clinic business hours logic
- Same handoff phone number
- Same webhook handler
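// Single webhook handles all use cases
switch (functionName) {
  case 'register_patient':
    return registerPatient(clinic, args);
  case 'book_appointment':
    return bookAppointment(clinic, args);
  case 'update_appointment':
    return updateAppointment(clinic, args);
  default:
    // Unknown tool name: fail loudly instead of silently returning undefined
    throw new Error(`Unknown function: ${functionName}`);
}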
No architectural benefit to separating use cases into different agents.
5. System Prompt Complexity
Modern LLMs (GPT-4) handle multi-intent prompts easily:
| Approach | Total Prompt Lines |
|---|---|
| Single agent | ~750 lines (all workflows) |
| Squad (3 agents) | ~650 lines (200+250+200) |
Marginal difference, but squad adds transfer logic complexity.
When Squad Approach Makes Sense
A squad of specialized agents is appropriate when at least one of the following criteria applies:
| Criterion | Applies to VitaraVox v1.0? |
|---|---|
| Vastly different complexity | No - all similar |
| Different knowledge bases | No - same OSCAR data |
| Different LLM models | No - all GPT-4 |
| Compliance/legal separation | No |
| Distinct user personas | No - all patient-facing |
None of these criteria apply to v1.0.
v2.2 Squad Architecture (Historical)
+------------------------------------------------------------------+
|                                                                  |
|                   CALL START (Telnyx inbound)                    |
|                                │                                 |
|                                v                                 |
|         ┌──────────────────────────────────────────────┐         |
|         │ 1. ROUTER (vitara-router-v2)                 │         |
|         │ • Auto-detect caller via Telnyx phone number │         |
|         │ • search_patient_by_phone (server extracts   │         |
|         │   real phone from call.customer.number)      │         |
|         │ • Emergency detection, frustration detection │         |
|         │ • Routes to appropriate specialist           │         |
|         └────────────────┬─────────────────────────────┘         |
|                          │                                       |
|             ┌────────────┼───────────┬────────────┐              |
|             v            v           v            v              |
|        ┌─────────┐  ┌─────────┐  ┌────────┐  ┌──────────┐        |
|        │2.BOOKING│  │3.RESCHED│  │4.CANCEL│  │5.REGISTER│        |
|        │         │  │         │  │        │  │          │        |
|        │ Finds   │  │ Gets    │  │ Finds  │  │ Collects │        |
|        │ earliest│  │ patient │  │ appts, │  │ info,    │        |
|        │ slot    │  │ appts,  │  │ cancels│  │ registers│        |
|        │ immedi- │  │ finds   │  │ with   │  │ in OSCAR │        |
|        │ ately   │  │ new slot│  │ inline │  │ with     │        |
|        │ (no     │  │ with    │  │ confirm│  │ inline   │        |
|        │ filters)│  │ inline  │  │        │  │ confirm  │        |
|        │         │  │ confirm │  │        │  │          │        |
|        └─────────┘  └─────────┘  └────────┘  └──────────┘        |
|                                │                                 |
|                                v                                 |
|         ┌──────────────────────────────────────────────┐         |
|         │ 6. CONFIRMATION (fallback, rarely used)      │         |
|         │    Most flows handle confirmation inline     │         |
|         └──────────────────────────────────────────────┘         |
|                                                                  |
|   Silent Handoffs: NEVER mention "transferring" or agent names   |
|   Caller ID: Auto-detected from Telnyx metadata (server-side)    |
|   Date Awareness: Today's date injected into all prompts         |
|   Languages: English, Mandarin, Cantonese, French, Punjabi       |
|                                                                  |
+------------------------------------------------------------------+
v2.2 Tool Distribution
| Agent | Tools | Key Behaviors |
|---|---|---|
| Router | search_patient_by_phone, transferToHuman | Server overrides phone arg with real Telnyx caller ID |
| Booking | find_earliest_appointment, create_appointment, get_providers | Finds slot immediately with no filters; refines on request only |
| Reschedule | get_appointments, find_earliest_appointment, update_appointment | Inline confirmation (no handoff) |
| Cancel | get_appointments, cancel_appointment | Inline confirmation with reason capture |
| Registration | register_new_patient, add_to_waitlist | Inline confirmation (no handoff) |
| Confirmation | confirm_appointment | Fallback only; most flows confirm inline |
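The Router's phone override can be sketched as follows. Only `call.customer.number` is named in this document; the function name, the argument shape, and the choice to return `null` when no caller ID is present are assumptions made for illustration.

```javascript
// Hypothetical helper: replace the LLM-supplied phone with the carrier-supplied one.
// ASSUMPTION: when caller ID is missing we return null rather than trusting the LLM value.
function resolveCallerPhone(call, toolArgs) {
  const realNumber = call?.customer?.number ?? null;
  return { ...toolArgs, phone: realNumber };
}

// The LLM hallucinated a number; the server ignores it.
const call = { customer: { number: '+16045551234' } };
const llmArgs = { phone: '+15550000000' };

console.log(resolveCallerPhone(call, llmArgs)); // { phone: '+16045551234' }
console.log(resolveCallerPhone({}, llmArgs));   // { phone: null }
```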
v2.2 Key Design Decisions
| Decision | Rationale |
|---|---|
| Caller phone auto-detection | LLM hallucinates phone numbers. Server extracts real number from call.customer.number (Telnyx metadata), ignoring whatever the LLM sends. |
| Booking-first flow | Patient calls to book → immediately find earliest slot with ANY provider. Only apply filters (doctor, date, time-of-day) if patient requests changes. |
| Single-slot returns | find_earliest_appointment returns exactly 1 slot. If rejected, patient provides reason → excludeDates grows → search again. Prevents decision paralysis. |
| Inline confirmation | Reschedule, cancel, and registration handle confirmation themselves ("Is there anything else?"). Eliminates unnecessary transfer to confirmation agent. |
| Silent transfers | All prompts include "NEVER mention transferring, assistant names, or internal routing." Patient perceives one continuous conversation. |
| Past-date clamping | GPT-4o doesn't know today's date. Server clamps any startDate before today to today. All prompts also include Today's date is YYYY-MM-DD. |
| Provider name fuzzy matching | Patient says "Dr. Chen" → server strips "Dr." prefix, fuzzy-matches against provider list, resolves to provider ID. |
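Three of these guardrails are simple enough to sketch. The code below is illustrative only: the function names, slot and provider shapes, and the substring match standing in for fuzzy matching are all assumptions, since the real server logic is not shown in this document.

```javascript
// Past-date clamping: ISO dates (YYYY-MM-DD) compare correctly as plain strings.
function clampStartDate(startDate, today) {
  return !startDate || startDate < today ? today : startDate;
}

// Strip a leading "Dr." / "Doctor" and normalise case and whitespace.
function normalizeProviderName(name) {
  return name.replace(/^\s*(dr\.?|doctor)\s+/i, '').trim().toLowerCase();
}

// ASSUMPTION: substring matching is a stand-in for the real fuzzy matcher.
function resolveProviderId(spokenName, providers) {
  const needle = normalizeProviderName(spokenName);
  if (!needle) return null;
  const hit = providers.find((p) => normalizeProviderName(p.name).includes(needle));
  return hit ? hit.id : null;
}

// Single-slot return: the earliest slot on or after the clamped date,
// skipping any date the patient has already rejected.
function findEarliestSlot(slots, { startDate, today, excludeDates = [] }) {
  const from = clampStartDate(startDate, today);
  const key = (s) => `${s.date}T${s.time}`;
  const candidates = slots
    .filter((s) => s.date >= from && !excludeDates.includes(s.date))
    .sort((a, b) => (key(a) < key(b) ? -1 : key(a) > key(b) ? 1 : 0));
  return candidates[0] ?? null;
}

const providers = [
  { id: 'p1', name: 'Dr. Alice Wong' },
  { id: 'p2', name: 'Dr. Michael Chen' },
];
const slots = [
  { date: '2026-02-12', time: '14:00' },
  { date: '2026-02-11', time: '09:00' },
  { date: '2026-02-11', time: '08:30' },
];

console.log(clampStartDate('2025-01-01', '2026-02-10')); // 2026-02-10
console.log(resolveProviderId('Dr. Chen', providers));   // p2
console.log(findEarliestSlot(slots, { startDate: '2020-01-01', today: '2026-02-10' }));
// { date: '2026-02-11', time: '08:30' }
```

Rejecting a slot grows `excludeDates`, so calling `findEarliestSlot` again with `excludeDates: ['2026-02-11']` moves on to the 2026-02-12 slot.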
v3.0 Dual-Track Architecture (Deployed)
+------------------------------------------------------------------+
|                                                                  |
|                   CALL START (Telnyx inbound)                    |
|                                |                                 |
|                                v                                 |
|       +--------------------------------------------------+       |
|       | 1. ROUTER (vitara-router-v3)                     |       |
|       |    STT: AssemblyAI Universal (bilingual)         |       |
|       |    ONLY JOB: Detect language, route to track     |       |
|       +------------------------+-------------------------+       |
|                                |                                 |
|                +---------------+----------------+                |
|                v                                v                |
|          ENGLISH TRACK                    CHINESE TRACK          |
|                |                                |                |
|                v                                v                |
|     +---------------------+          +---------------------+     |
|     | PATIENT-ID-EN       |          | PATIENT-ID-ZH       |     |
|     | STT: Deepgram en    |          | STT: Deepgram zh    |     |
|     | TTS: ElevenLabs     |          | TTS: Azure Xiaoxiao |     |
|     +----------+----------+          +----------+----------+     |
|                |                                |                |
|       +--------+--------+              +--------+--------+       |
|       v        v        v              v        v        v       |
|     +----+   +----+   +----+         +----+   +----+   +----+    |
|     |BOOK|   |MOD |   |REG |         |BOOK|   |MOD |   |REG |    |
|     | EN |   | EN |   | EN |         | ZH |   | ZH |   | ZH |    |
|     +----+   +----+   +----+         +----+   +----+   +----+    |
|                                                                  |
|   Config: Vapi GitOps (slug-based, environment separation)       |
|   Squad ID: 13fdfd19-a2cd-4ca4-8e14-ad2275095e32                 |
|   Handoffs: 20 routes (see voice-agent.md for full matrix)       |
|                                                                  |
+------------------------------------------------------------------+
v3.0 Key Design Decisions
| Decision | Rationale |
|---|---|
| Explicit language gate | GPT-4o with auto-detect multilingual sometimes output space-separated Chinese. Dedicated tracks eliminate LLM language confusion. |
| Per-language STT | Deepgram nova-2 en/zh outperforms universal multi mode for each individual language. AssemblyAI Universal on Router handles the bilingual detection. |
| Azure TTS for Chinese | ElevenLabs eleven_multilingual_v2 produces adequate but not native-quality Mandarin. Azure zh-CN-XiaoxiaoNeural is purpose-built for CJK. |
| Consolidated Modification | Reschedule and Cancel share 6 of 8 tools. A single Modification agent per track is simpler than two separate agents with near-identical tool sets. |
| Confirmation eliminated | Every role agent now calls log_call_metadata directly. No need for a handoff to a separate confirmation agent. |
| Tuned ZH endpointing | Mandarin speech uses longer pauses between phrases. Standard EN timing (0.3s punct, 0.8s no-punct) caused premature cutoffs. ZH uses 0.6s/1.5s. |
| Router as pure language gate | Router temperature 0.3, max 150 tokens. Does NOT look up patients or detect intent — just language. Minimizes latency at the entry point. |
| Vapi GitOps | Config-as-code replaces manual dashboard editing. Slug-based tool references, environment separation (dev/staging/prod). |
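The per-track settings in this table can be summarised as data. The field names below are illustrative and do not mirror the Vapi configuration schema; only the providers, model, voice and timing values come from this document. The small regex helper is likewise a hypothetical check for the "space-separated Chinese" symptom, not something the source describes.

```javascript
// ASSUMPTION: field names are invented for illustration; values are from the table above.
const VOICE_TRACKS = {
  en: {
    stt: { provider: 'deepgram', model: 'nova-2', language: 'en' },
    tts: { provider: 'elevenlabs' },
    endpointing: { onPunctuationSec: 0.3, onNoPunctuationSec: 0.8 },
  },
  zh: {
    stt: { provider: 'deepgram', model: 'nova-2', language: 'zh' },
    tts: { provider: 'azure', voice: 'zh-CN-XiaoxiaoNeural' },
    endpointing: { onPunctuationSec: 0.6, onNoPunctuationSec: 1.5 },
  },
};

// Flags text where two Han characters are separated by whitespace.
const SPACED_HAN = /\p{Script=Han}\s+\p{Script=Han}/u;
function looksSpaceSeparated(text) {
  return SPACED_HAN.test(text);
}

console.log(VOICE_TRACKS.zh.tts.voice);                       // zh-CN-XiaoxiaoNeural
console.log(looksSpaceSeparated('你 好 请 问'));                // true
console.log(looksSpaceSeparated('你好，请问您需要什么帮助？'));   // false
```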
v3.0 vs v2.3.0 Trade-offs
| Factor | v2.3.0 (6 agents) | v3.0 (9 agents) |
|---|---|---|
| Agent count | 6 | 9 (+50%) |
| Language accuracy | Good (auto-detect) | Better (per-language STT) |
| Chinese voice quality | Adequate (ElevenLabs) | Native (Azure) |
| Mid-call language switch | Seamless | Stays in initial track |
| Config management | Manual dashboard | GitOps (repeatable) |
| Prompt complexity | Higher (multilingual) | Lower (monolingual per agent) |
| Handoff routes | 8 | 20 |
| LLM language confusion | Occasional | Eliminated |
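The full v3.0 handoff matrix lives in voice-agent.md and is not reproduced here. As a sanity check, the sketch below builds one plausible topology that happens to yield 20 routes: Router to each Patient-ID agent, Patient-ID to each role agent in its track, and every role agent to the other two in the same track. This topology is an assumption, not the documented matrix.

```javascript
const ROUTE_TRACKS = ['EN', 'ZH'];
const ROLE_AGENTS = ['Booking', 'Modification', 'Registration'];

// ASSUMPTION: hypothetical topology, chosen only because it matches the stated total.
function buildRoutes() {
  const routes = [];
  for (const track of ROUTE_TRACKS) {
    const patientId = `Patient-ID (${track})`;
    routes.push(['Router', patientId]); // language gate -> track entry
    for (const role of ROLE_AGENTS) {
      routes.push([patientId, `${role} (${track})`]); // intent routing
      for (const other of ROLE_AGENTS) {
        // Role agents hand off to each other, never across tracks.
        if (other !== role) routes.push([`${role} (${track})`, `${other} (${track})`]);
      }
    }
  }
  return routes;
}

const routes = buildRoutes();
console.log(routes.length); // 20
```

Per track this gives 1 + 3 + 6 = 10 routes, and no route crosses from one language track to the other, which is consistent with mid-call language switching being unsupported.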
v1.0 Architecture (Deprecated)
Single Agent Per Clinic (v1.0)
+------------------------------------------------------------------+
|                                                                  |
|   Clinic Phone Number (+1-604-555-1234)                          |
|                 |                                                |
|                 v                                                |
|  +------------------------------------------------------------+  |
|  |                                                            |  |
|  |   SINGLE MULTILINGUAL ASSISTANT                            |  |
|  |                                                            |  |
|  |   Handles:                                                 |  |
|  |   - New patient registration                               |  |
|  |   - Appointment booking                                    |  |
|  |   - Appointment update/cancellation                        |  |
|  |                                                            |  |
|  |   Languages: English, Mandarin (auto-detect)               |  |
|  |                                                            |  |
|  +------------------------------------------------------------+  |
|                                                                  |
|   Total Assistants: 1 per clinic                                 |
|   Total for 5 pilot clinics: 5 assistants                        |
|                                                                  |
+------------------------------------------------------------------+
Implementation Guidelines (v2.0)
System Prompt Structure
Each agent prompt follows this structure:
## IDENTITY
[Who the agent is]
## LANGUAGE HANDLING
**Language shift can happen at any point...**
**Languages:** English and Mandarin Chinese (普通话)
**Current Time:** {{now | date: "%B %d, %Y %I:%M %p", "America/Vancouver"}}
## STYLE
[Tone and communication guidelines]
## TASK & GOALS
[Step-by-step workflow]
## ERROR HANDLING
[Recovery flows]
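For local testing outside the templating engine, the same structure can be assembled in code. This is a rough sketch: the section names come from the structure above, while `buildPrompt`, `vancouverTime` and the placeholder bodies are hypothetical. The time helper approximates the `%B %d, %Y %I:%M %p` format used by the template filter.

```javascript
// Section names come from the prompt structure above.
const SECTIONS = ['IDENTITY', 'LANGUAGE HANDLING', 'STYLE', 'TASK & GOALS', 'ERROR HANDLING'];

// Format a Date as "Month DD, YYYY HH:MM AM/PM" in America/Vancouver.
function vancouverTime(date) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone: 'America/Vancouver',
    year: 'numeric', month: 'long', day: '2-digit',
    hour: '2-digit', minute: '2-digit', hour12: true,
  }).formatToParts(date);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('month')} ${get('day')}, ${get('year')} ${get('hour')}:${get('minute')} ${get('dayPeriod')}`;
}

// Hypothetical helper: join the sections into one prompt string,
// injecting the current time into the LANGUAGE HANDLING section.
function buildPrompt(bodies, now = new Date()) {
  return SECTIONS.map((name) => {
    const body = bodies[name] ?? '';
    const extra = name === 'LANGUAGE HANDLING' ? `\n**Current Time:** ${vancouverTime(now)}` : '';
    return `## ${name}\n${body}${extra}`;
  }).join('\n');
}

const fixedNow = new Date('2026-02-10T17:05:00Z'); // 09:05 in Vancouver (PST)
console.log(vancouverTime(fixedNow)); // February 10, 2026 09:05 AM
console.log(buildPrompt({ IDENTITY: '[Who the agent is]' }, fixedNow));
```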
Database Schema
-- Squad ID per clinic (v2.0)
CREATE TABLE clinics (
    id UUID PRIMARY KEY,
    vapi_squad_id VARCHAR(100),      -- Squad ID (not assistant)
    vapi_phone_number VARCHAR(20),
    ...
);
Consequences
v2.3.0
Positive:
- Caller auto-identification via Telnyx eliminates manual phone entry
- Booking-first flow reduces average call duration
- Specialized prompts per workflow (cleaner, easier to tune)
- Inline confirmations reduce transfers and perceived latency
- Silent handoffs make it feel like one continuous conversation
- Server-side guardrails (date clamping, phone override) compensate for LLM weaknesses
Negative:
- Six assistants to manage per clinic instead of one (6× the v1.0 count)
- More complex squad configuration on Vapi dashboard
- Context passing between agents required (Vapi handles via assistantOverrides)
- Prompt updates require API calls to all 6 assistants
v3.0 (Additional)
Positive:
- Per-language STT/TTS eliminates LLM language confusion
- Native Mandarin voice quality (Azure XiaoxiaoNeural)
- Monolingual prompts are shorter and more focused
- GitOps enables repeatable, version-controlled configuration
- Consolidated Modification agent reduces unique role count
- callMetadataCache ensures call logs always have language/outcome data
Negative:
- 9 assistants per clinic (50% more than v2.3.0)
- 20 handoff routes to configure and test
- Mid-call language switching not supported (stays in initial track)
- Two parallel prompts to maintain per role (EN + ZH)
- Dual squad support requires a vapi_squad_id_v3 DB column
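With both columns present, the server has to decide which squad to use for a clinic. A minimal sketch, assuming the v3 squad is preferred when set and the v2 squad is the fallback; the column names come from this document, but the function and the fallback order are assumptions.

```javascript
// ASSUMPTION: prefer the v3 squad, fall back to the v2 squad, else null.
function resolveSquadId(clinicRow) {
  return clinicRow.vapi_squad_id_v3 ?? clinicRow.vapi_squad_id ?? null;
}

console.log(resolveSquadId({ vapi_squad_id: 'squad-v2', vapi_squad_id_v3: 'squad-v3' })); // squad-v3
console.log(resolveSquadId({ vapi_squad_id: 'squad-v2', vapi_squad_id_v3: null }));       // squad-v2
console.log(resolveSquadId({}));                                                          // null
```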