ADR: Vapi.ai Integration Architecture

HISTORICAL — Architecture evolved through v1.0 → v3.0

This document traces the evolution of VitaraVox's voice architecture. The current production system is v3.0 (9-agent dual-track). See Squad Architecture for the current state.

Date: January 2026 (v1.0), updated February 2026 (v2.0, v3.0)
Status: Updated
Decision: v3.0 adopts 9-agent dual-track architecture with explicit language gate and per-language STT/TTS

Context

VitaraPlatform supports these use cases:

New patient registration
Appointment booking
Appointment update/cancellation
Patient identification (v2.0)
Clinic settings check (v2.0)
Bilingual EN/ZH service (v3.0)

Question: Should each use case have a separate specialized agent, or should one agent handle all use cases? And how should multiple languages be handled?

Decision History

v1.0 Decision (January 2026)

Use ONE agent per clinic that handles all three use cases. Each agent is multilingual (English + Mandarin).

Rejected: Squad of 9 specialized agents (3 use cases × 3 languages).

v2.0 / v2.3.0 Decision (February 2026)

Adopt a 6-agent squad architecture with specialized roles:

Squad Leader Router - Greeting, caller phone auto-detection, clinic check, emergency/frustration detection
Appointment Booking - Immediate slot finding (no filters), preference refinement
Appointment Reschedule - Find existing appointments, reschedule with inline confirmation
Appointment Cancel - Cancel with reason capture, inline confirmation
Patient Registration - Register new patients with phonetic confirmation, inline confirmation
Confirmation - Fallback confirmation agent (rarely used; most flows confirm inline)

Languages: English, Mandarin, Cantonese, French, Punjabi with automatic detection and mid-conversation switching.

Reason for change: Patient identification merged into the Router (auto-detected via Telnyx caller ID). Booking, reschedule, and cancel were split into separate agents for cleaner prompts, and inline confirmations eliminate unnecessary transfers. v2.2 added caller phone auto-detection, past-date clamping, provider name fuzzy matching, and a booking-first flow (find a slot immediately, refine only on request).

v3.0 Decision (February 2026)

Adopt a 9-agent dual-track architecture with an explicit language gate:

Router - Language gate ONLY: detect EN/ZH from first utterance, route to correct track
Patient-ID (EN) - Identify caller by phone, detect intent, route within EN track
Patient-ID (ZH) - Same role in Mandarin
Booking (EN/ZH) - Find slots, book appointments
Modification (EN/ZH) - Consolidated reschedule + cancel + check
Registration (EN/ZH) - Register new patients
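The Router's job reduces to a small contract: read the first utterance, pick a track, and hand off to that track's Patient-ID agent. The TypeScript sketch below illustrates only that contract. It is not how v3.0 detects language (the Router is an LLM prompt fed by AssemblyAI Universal STT); as a stand-in it uses a simple heuristic that treats any CJK ideograph in the transcript as Mandarin, and the destination slugs `patient-id-en` and `patient-id-zh` are hypothetical.

```typescript
// Illustrative stand-in for the v3.0 language gate. The real Router is an
// LLM prompt; this heuristic only shows the input/output contract.
type Track = "en" | "zh";

// Hypothetical handoff destinations; actual assistant slugs may differ.
const PATIENT_ID_BY_TRACK: Record<Track, string> = {
  en: "patient-id-en",
  zh: "patient-id-zh",
};

// Heuristic: any CJK Unified Ideograph (U+4E00..U+9FFF) selects the ZH track.
function detectTrack(firstUtterance: string): Track {
  return /[\u4e00-\u9fff]/.test(firstUtterance) ? "zh" : "en";
}

function routeFirstUtterance(firstUtterance: string): string {
  return PATIENT_ID_BY_TRACK[detectTrack(firstUtterance)];
}

console.log(routeFirstUtterance("Hi, I'd like to book an appointment")); // patient-id-en
console.log(routeFirstUtterance("你好，我想预约")); // patient-id-zh
```

Because the decision is made once, on the first utterance, the call then stays in that track, which is the "stays in initial track" trade-off listed for v3.0 below.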

Key changes from v2.3.0:

Confirmation agent eliminated — log_call_metadata absorbed into every role agent
Reschedule + Cancel consolidated into single Modification agent per track
Router simplified to pure language gate (no patient lookup, no intent detection)
Patient-ID separated from Router into dedicated agent per language
Per-language STT: Deepgram nova-2 en / zh (vs universal multi)
Per-language TTS: ElevenLabs (EN) / Azure XiaoxiaoNeural (ZH)
ZH endpointing tuned: longer pauses for Mandarin speech patterns (1.0s wait vs 0.6s for EN)
Config-as-code: Managed via Vapi GitOps instead of manual dashboard

Reason for change: Auto-detecting multilingual agents suffered from LLM language confusion (GPT-4o sometimes output space-separated Chinese characters). Dedicated language tracks with per-language STT/TTS provide higher accuracy. The Router is now a lightweight language gate (<150 tokens), keeping latency minimal at the entry point. Consolidating Reschedule and Cancel into a single Modification agent cuts the distinct role agents (excluding the Router) from 5 to 4, so the dual-track split grows the squad from 6 to 9 agents rather than to 11.
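Every agent count in this ADR follows one formula: an optional Router plus roles per track times the number of tracks. The sketch below reproduces each figure; `SquadShape` and `agentCount` are illustrative names, and the "unconsolidated" entry is a hypothetical v3.0 without the Modification merge.

```typescript
// Illustrative only: reproduces the agent counts quoted in this ADR.
interface SquadShape {
  router: boolean;       // dedicated entry-point Router agent?
  rolesPerTrack: number; // distinct role agents in each language track
  tracks: number;        // 1 = single (multilingual) track, 2 = EN/ZH, ...
}

function agentCount({ router, rolesPerTrack, tracks }: SquadShape): number {
  return (router ? 1 : 0) + rolesPerTrack * tracks;
}

const shapes: Record<string, SquadShape> = {
  "v1.0 single agent": { router: false, rolesPerTrack: 1, tracks: 1 },
  // 3 use cases x 3 languages, reached via IVR rather than a Router
  "v1.0 rejected squad": { router: false, rolesPerTrack: 3, tracks: 3 },
  // Booking, Reschedule, Cancel, Registration, Confirmation
  "v2.3.0": { router: true, rolesPerTrack: 5, tracks: 1 },
  // Patient-ID, Booking, Modification, Registration per track
  "v3.0": { router: true, rolesPerTrack: 4, tracks: 2 },
  // Hypothetical: v3.0 without merging Reschedule + Cancel
  "v3.0 unconsolidated": { router: true, rolesPerTrack: 5, tracks: 2 },
};

for (const name of Object.keys(shapes)) {
  console.log(`${name}: ${agentCount(shapes[name])}`);
}
// v1.0 single agent: 1
// v1.0 rejected squad: 9
// v2.3.0: 6
// v3.0: 9
// v3.0 unconsolidated: 11
```

Multiplying the v1.0 figures by the five pilot clinics gives the 5 and 45 assistants shown in the comparison diagrams.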

Architecture Comparison

Approved (v1.0): Single Agent Per Clinic

+------------------------------------------------------------------+
|                                                                  |
|   Clinic Phone Number (+1-604-555-1234)                          |
|           |                                                      |
|           v                                                      |
|   +----------------------------------------------------------+   |
|   |                                                          |   |
|   |              SINGLE MULTILINGUAL ASSISTANT               |   |
|   |                                                          |   |
|   |   Handles:                                               |   |
|   |     - New patient registration                           |   |
|   |     - Appointment booking                                |   |
|   |     - Appointment update/cancellation                    |   |
|   |                                                          |   |
|   |   Languages: English, Mandarin (auto-detect)             |   |
|   |                                                          |   |
|   +----------------------------------------------------------+   |
|                                                                  |
|   Total Assistants: 1 per clinic                                 |
|   Total for 5 pilot clinics: 5 assistants                        |
|                                                                  |
+------------------------------------------------------------------+

Rejected (v1.0): Squad of Specialized Agents

+------------------------------------------------------------------+
|                                                                  |
|   Clinic Phone Number (+1-604-555-1234)                          |
|           |                                                      |
|           v                                                      |
|   IVR: "Press 1 for English, 2 for Chinese, 3 for French"        |
|           |                                                      |
|           v                                                      |
|   IVR: "Press 1 to register, 2 to book, 3 to update/cancel"      |
|           |                                                      |
|    +------+------+------+------+------+------+------+------+     |
|    v      v      v      v      v      v      v      v      v     |
|  +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+  |
|  | EN | | EN | | EN | | FR | | FR | | FR | | ZH | | ZH | | ZH |  |
|  |Reg | |Book| |Upd | |Reg | |Book| |Upd | |Reg | |Book| |Upd |  |
|  +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+  |
|                                                                  |
|   Total Assistants: 9 per clinic                                 |
|   Total for 5 pilot clinics: 45 assistants                       |
|                                                                  |
+------------------------------------------------------------------+

Rationale (v1.0 Decision)

1. Latency (Critical)

Target: <800ms end-to-end voice latency

Approach	Latency Impact
Single agent	Optimal - no transfers
Squad approach	+200-500ms per agent transfer

If a patient says "I want to book, but also update my contact info" during booking, the squad approach requires a mid-call transfer, adding latency and risking context loss.
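In budget terms, using only the figures above: one transfer costs between a quarter and nearly two-thirds of the 800 ms target, leaving 300 to 600 ms for everything else in that turn.

```latex
\begin{aligned}
\text{share of budget per transfer} &= \frac{200\ \text{to}\ 500\ \text{ms}}{800\ \text{ms}} = 25\%\ \text{to}\ 62.5\% \\
\text{budget left for the rest of the turn} &= 800 - (200\ \text{to}\ 500) = 300\ \text{to}\ 600\ \text{ms}
\end{aligned}
```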

2. Configuration Complexity

v1.0 relied on manual Vapi.ai configuration, so setup time grows with the number of assistants:

Approach	Assistants	Setup Time
Single agent	5 (5 clinics × 1)	4-5 hours
Squad approach	45 (5 clinics × 9)	12-15 hours

v1.0 pilot savings: ~$800-1000 in configuration time.

3. User Experience

Single Agent:

Agent: "Hello! I can help with registration, booking, or
        updating appointments. What would you like to do?"
Patient: "I'd like to book an appointment"
Agent: "Great! Let me check availability..."
Patient: "Actually, can you also update my phone number?"
Agent: "Of course! What's your new phone number?"

Squad Approach:

IVR: "Press 1 for English..."
IVR: "Press 2 to book appointments..."
Booking Agent: "I'll help you book an appointment..."
Patient: "Can you also update my phone number?"
Agent: "Let me transfer you to our update agent..."
[200-500ms silence, context lost]
Update Agent: "Hello! What would you like to update?"
Patient: [repeats everything]

4. Technical Architecture

All 3 use cases share:

Same OSCAR EMR API connection
Same clinic business hours logic
Same handoff phone number
Same webhook handler

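// Single webhook handles all use cases (v1.0).
// handleToolCall is an illustrative wrapper name; registerPatient,
// bookAppointment and updateAppointment each receive the clinic context
// plus the tool arguments sent by the assistant.
function handleToolCall(functionName, clinic, args) {
  switch (functionName) {
    case 'register_patient':
      return registerPatient(clinic, args);
    case 'book_appointment':
      return bookAppointment(clinic, args);
    case 'update_appointment':
      return updateAppointment(clinic, args);
    default:
      // Fail loudly on unknown tools instead of silently returning undefined.
      throw new Error(`Unknown function: ${functionName}`);
  }
}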

There is no architectural benefit to splitting these use cases across separate agents.
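A fuller TypeScript sketch of the same pattern shows why one endpoint suffices: dispatch needs only the tool name, its arguments, and the clinic context. The message and response shapes (`message.type === "tool-calls"`, `toolCallList`, `{ results: [{ toolCallId, result }] }`) are an assumption modeled on Vapi's server-message format and should be checked against the Vapi API reference; `Clinic`, `handlers`, and the stubbed handler bodies are hypothetical stand-ins for the OSCAR-backed implementations.

```typescript
// Sketch of a single webhook serving every use case. Payload types are an
// assumption modeled on Vapi's "tool-calls" server message; verify them
// against the Vapi API reference before relying on this shape.
interface ToolCall {
  id: string;
  function: { name: string; arguments: Record<string, unknown> };
}
interface ToolCallsMessage {
  message: { type: "tool-calls"; toolCallList: ToolCall[] };
}
interface ToolResult {
  toolCallId: string;
  result: string;
}

// Hypothetical clinic context shared by all use cases.
interface Clinic {
  id: string;
  name: string;
}

type Handler = (clinic: Clinic, args: Record<string, unknown>) => Promise<string>;

// Hypothetical stubs standing in for the OSCAR-backed implementations.
const handlers: Record<string, Handler | undefined> = {
  register_patient: async (clinic) => `registered at ${clinic.name}`,
  book_appointment: async (clinic) => `booked at ${clinic.name}`,
  update_appointment: async (clinic) => `updated at ${clinic.name}`,
};

async function handleWebhook(
  clinic: Clinic,
  body: ToolCallsMessage,
): Promise<{ results: ToolResult[] }> {
  const results = await Promise.all(
    body.message.toolCallList.map(async (call): Promise<ToolResult> => {
      const handler = handlers[call.function.name];
      // Report unknown tools back to the assistant instead of throwing.
      const result = handler
        ? await handler(clinic, call.function.arguments)
        : `Unknown function: ${call.function.name}`;
      return { toolCallId: call.id, result };
    }),
  );
  return { results };
}
```

Returning an error string for an unknown tool lets the assistant recover in conversation, whereas throwing (as in the switch above) surfaces the bug in server logs; either policy is defensible.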

5. System Prompt Complexity

Modern LLMs such as GPT-4 handle multi-intent prompts well:

Approach	Total Prompt Lines
Single agent	~750 lines (all workflows)
Squad (3 agents)	~650 lines (200+250+200)

The difference in prompt size is marginal, and the squad adds transfer-logic complexity on top.

When the Squad Approach Makes Sense

A squad of specialized agents is appropriate when:

Criteria	VitaraVox v1.0
Vastly different complexity	No - all similar
Different knowledge bases	No - same OSCAR data
Different LLM models	No - all GPT-4
Compliance/legal separation	No
Distinct user personas	No - all patient-facing

None of these criteria apply to v1.0.

v2.2 Squad Architecture (Historical)

+------------------------------------------------------------------+
|                                                                  |
|   CALL START (Telnyx inbound)                                    |
|       │                                                          |
|       v                                                          |
|   ┌─────────────────────────────────────────────┐               |
|   │ 1. ROUTER (vitara-router-v2)                 │               |
|   │ • Auto-detect caller via Telnyx phone number │               |
|   │ • search_patient_by_phone (server extracts   │               |
|   │   real phone from call.customer.number)      │               |
|   │ • Emergency detection, frustration detection │               |
|   │ • Routes to appropriate specialist           │               |
|   └─────────────┬───────────────────────────────┘               |
|                 │                                                |
|       ┌─────────┼─────────┬──────────┐                          |
|       v         v         v          v                           |
|   ┌─────────┐ ┌─────────┐ ┌────────┐ ┌──────────┐             |
|   │2.BOOKING│ │3.RESCHED│ │4.CANCEL│ │5.REGISTER│             |
|   │         │ │         │ │        │ │          │             |
|   │ Finds   │ │ Gets    │ │ Finds  │ │ Collects │             |
|   │ earliest│ │ patient │ │ appts, │ │ info,    │             |
|   │ slot    │ │ appts,  │ │ cancels│ │ registers│             |
|   │ immedi- │ │ finds   │ │ with   │ │ in OSCAR │             |
|   │ ately   │ │ new slot│ │ inline │ │ with     │             |
|   │ (no     │ │ with    │ │ confirm│ │ inline   │             |
|   │ filters)│ │ inline  │ │        │ │ confirm  │             |
|   │         │ │ confirm │ │        │ │          │             |
|   └─────────┘ └─────────┘ └────────┘ └──────────┘             |
|       │                                                          |
|       v                                                          |
|   ┌─────────────────────────────────────────────┐               |
|   │ 6. CONFIRMATION (fallback, rarely used)      │               |
|   │    Most flows handle confirmation inline     │               |
|   └─────────────────────────────────────────────┘               |
|                                                                  |
|   Silent Handoffs: NEVER mention "transferring" or agent names  |
|   Caller ID: Auto-detected from Telnyx metadata (server-side)   |
|   Date Awareness: Today's date injected into all prompts        |
|   Languages: English, Mandarin, Cantonese, French, Punjabi      |
|                                                                  |
+------------------------------------------------------------------+

v2.2 Tool Distribution

Agent	Tools	Key Behaviors
Router	`search_patient_by_phone`, `transferToHuman`	Server overrides phone arg with real Telnyx caller ID
Booking	`find_earliest_appointment`, `create_appointment`, `get_providers`	Finds slot immediately with no filters; refines on request only
Reschedule	`get_appointments`, `find_earliest_appointment`, `update_appointment`	Inline confirmation (no handoff)
Cancel	`get_appointments`, `cancel_appointment`	Inline confirmation with reason capture
Registration	`register_new_patient`, `add_to_waitlist`	Inline confirmation (no handoff)
Confirmation	`confirm_appointment`	Fallback only — most flows confirm inline
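Written as data, the table above can double as a server-side allowlist, so a tool call is rejected if the calling agent is not configured for it. This is a hypothetical defense-in-depth check rather than a documented part of v2.2; it assumes the webhook can tell which agent issued the call, and the `AgentRole` keys are illustrative. Tool names are copied from the table.

```typescript
// v2.2 tool distribution from the table above, expressed as an allowlist.
// Role keys are hypothetical identifiers, not actual Vapi assistant names.
type AgentRole =
  | "router"
  | "booking"
  | "reschedule"
  | "cancel"
  | "registration"
  | "confirmation";

const TOOLS_BY_ROLE: Record<AgentRole, readonly string[]> = {
  router: ["search_patient_by_phone", "transferToHuman"],
  booking: ["find_earliest_appointment", "create_appointment", "get_providers"],
  reschedule: ["get_appointments", "find_earliest_appointment", "update_appointment"],
  cancel: ["get_appointments", "cancel_appointment"],
  registration: ["register_new_patient", "add_to_waitlist"],
  confirmation: ["confirm_appointment"],
};

// Server-side guard: reject a tool call the calling agent is not configured for.
function isToolAllowed(role: AgentRole, toolName: string): boolean {
  return TOOLS_BY_ROLE[role].indexOf(toolName) !== -1;
}

console.log(isToolAllowed("cancel", "cancel_appointment")); // true
console.log(isToolAllowed("booking", "cancel_appointment")); // false
```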

v2.2 Key Design Decisions

Decision	Rationale
Caller phone auto-detection	LLM hallucinates phone numbers. Server extracts real number from `call.customer.number` (Telnyx metadata), ignoring whatever the LLM sends.
Booking-first flow	Patient calls to book → immediately find earliest slot with ANY provider. Only apply filters (doctor, date, time-of-day) if patient requests changes.
Single-slot returns	`find_earliest_appointment` returns exactly 1 slot. If rejected, patient provides reason → `excludeDates` grows → search again. Prevents decision paralysis.
Inline confirmation	Reschedule, cancel, and registration handle confirmation themselves ("Is there anything else?"). Eliminates unnecessary transfer to confirmation agent.
Silent transfers	All prompts include "NEVER mention transferring, assistant names, or internal routing." Patient perceives one continuous conversation.
Past-date clamping	GPT-4o doesn't know today's date. Server clamps any `startDate` before today to today. All prompts also include `Today's date is YYYY-MM-DD`.
Provider name fuzzy matching	Patient says "Dr. Chen" → server strips "Dr." prefix, fuzzy-matches against provider list, resolves to provider ID.
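Three of these guardrails reduce to small pure functions on the server. The TypeScript sketch below is one way to write them, not the production code: the function names are hypothetical, the `America/Vancouver` clinic time zone is borrowed from the prompt template, and the provider lookup uses a case-insensitive substring match after stripping a "Dr." prefix as a stand-in for whatever fuzzy matcher the server actually uses.

```typescript
// Hypothetical helpers illustrating the v2.2 server-side guardrails.

// 1. Caller phone override: the LLM-supplied number is deliberately ignored;
//    only the caller ID from call metadata (call.customer.number) is trusted.
function resolveCallerPhone(
  _llmPhoneArg: string | undefined,
  call: { customer?: { number?: string } },
): string | undefined {
  return call.customer?.number;
}

// 2. Past-date clamping. "Today" is computed in the clinic's time zone and
//    formatted as YYYY-MM-DD from explicit date parts.
function todayIn(timeZone: string, now: Date = new Date()): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(now);
  const get = (type: string) => parts.find((p) => p.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")}`;
}

// Any startDate before today is replaced with today.
function clampStartDate(startDate: string, today: string): string {
  return startDate < today ? today : startDate;
}

// 3. Provider name matching: strip a leading "Dr." / "Dr", then match
//    case-insensitively. Substring matching stands in for real fuzzy matching.
interface Provider {
  id: string;
  name: string;
}

function matchProvider(spoken: string, providers: Provider[]): Provider | undefined {
  const query = spoken.replace(/^\s*dr\.?\s+/i, "").trim().toLowerCase();
  if (!query) return undefined;
  return providers.find((p) => p.name.toLowerCase().includes(query));
}

const today = todayIn("America/Vancouver");
console.log(clampStartDate("2000-01-01", today) === today); // true
console.log(matchProvider("Dr. Chen", [{ id: "p1", name: "Wei Chen" }])?.id); // p1
```

Comparing `YYYY-MM-DD` strings lexicographically is safe because the format is fixed-width and most-significant-first, which keeps the clamp free of `Date` arithmetic.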

v3.0 Dual-Track Architecture (Deployed)

+------------------------------------------------------------------+
|                                                                  |
|   CALL START (Telnyx inbound)                                    |
|       |                                                          |
|       v                                                          |
|   +--------------------------------------------------+          |
|   | 1. ROUTER (vitara-router-v3)                      |          |
|   |    STT: AssemblyAI Universal (bilingual)          |          |
|   |    ONLY JOB: Detect language, route to track      |          |
|   +---------------------+----------------------------+          |
|                         |                                        |
|             +-----------+-----------+                            |
|             v                       v                            |
|       ENGLISH TRACK            CHINESE TRACK                     |
|             |                       |                            |
|             v                       v                            |
|   +-------------------+   +-------------------+                  |
|   | PATIENT-ID-EN     |   | PATIENT-ID-ZH     |                  |
|   | STT: Deepgram en  |   | STT: Deepgram zh  |                  |
|   | TTS: ElevenLabs   |   | TTS: Azure Xiaoxiao|                 |
|   +--------+----------+   +--------+----------+                  |
|            |                        |                            |
|     +------+------+         +------+------+                      |
|     v      v      v         v      v      v                      |
|   +----+ +----+ +-----+  +----+ +----+ +-----+                  |
|   |BOOK| |MOD | |REG  |  |BOOK| |MOD | |REG  |                  |
|   | EN | | EN | | EN  |  | ZH | | ZH | | ZH  |                  |
|   +----+ +----+ +-----+  +----+ +----+ +-----+                  |
|                                                                  |
|   Config: Vapi GitOps (slug-based, environment separation)       |
|   Squad ID: 13fdfd19-a2cd-4ca4-8e14-ad2275095e32               |
|   Handoffs: 20 routes (see voice-agent.md for full matrix)      |
|                                                                  |
+------------------------------------------------------------------+

v3.0 Key Design Decisions

Decision	Rationale
Explicit language gate	GPT-4o with auto-detect multilingual sometimes output space-separated Chinese. Dedicated tracks eliminate LLM language confusion.
Per-language STT	Deepgram nova-2 `en`/`zh` outperforms universal `multi` mode for each individual language. AssemblyAI Universal on Router handles the bilingual detection.
Azure TTS for Chinese	ElevenLabs `eleven_multilingual_v2` produces adequate but not native-quality Mandarin. Azure `zh-CN-XiaoxiaoNeural` is purpose-built for CJK.
Consolidated Modification	Reschedule and Cancel share 6 of 8 tools. A single Modification agent per track is simpler than two separate agents with near-identical tool sets.
Confirmation eliminated	Every role agent now calls `log_call_metadata` directly. No need for a handoff to a separate confirmation agent.
Tuned ZH endpointing	Mandarin speech uses longer pauses between phrases. Standard EN timing (0.3s punct, 0.8s no-punct) caused premature cutoffs. ZH uses 0.6s/1.5s.
Router as pure language gate	Router temperature 0.3, max 150 tokens. Does NOT look up patients or detect intent — just language. Minimizes latency at the entry point.
Vapi GitOps	Config-as-code replaces manual dashboard editing. Slug-based tool references, environment separation (dev/staging/prod).
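A per-track configuration capturing these choices might look like the object below. The field names (`transcriber`, `voice`, `startSpeakingPlan.waitSeconds`, `transcriptionEndpointingPlan.onPunctuationSeconds` and `onNoPunctSeconds`) are assumed to mirror Vapi's assistant schema and should be verified against the Vapi API reference. The endpointing values are the ones in the table; treating the "1.0s wait vs 0.6s" figure from the v3.0 key-changes list as `waitSeconds` is an assumption, and the ElevenLabs voice ID is a placeholder.

```typescript
// Sketch of per-track voice settings for the v3.0 role agents.
// Field names are assumed to mirror Vapi's assistant schema; verify them
// against the Vapi API reference. Values come from this ADR except where noted.
type Track = "en" | "zh";

interface TrackVoiceConfig {
  transcriber: { provider: "deepgram"; model: "nova-2"; language: Track };
  voice: { provider: string; voiceId: string };
  startSpeakingPlan: {
    waitSeconds: number; // assumption: the "1.0s wait vs 0.6s" figure
    transcriptionEndpointingPlan: {
      onPunctuationSeconds: number;
      onNoPunctSeconds: number;
    };
  };
}

const TRACK_CONFIG: Record<Track, TrackVoiceConfig> = {
  en: {
    transcriber: { provider: "deepgram", model: "nova-2", language: "en" },
    // Placeholder: the actual ElevenLabs voice ID is not specified in this ADR.
    voice: { provider: "11labs", voiceId: "YOUR_ELEVENLABS_VOICE_ID" },
    startSpeakingPlan: {
      waitSeconds: 0.6,
      transcriptionEndpointingPlan: { onPunctuationSeconds: 0.3, onNoPunctSeconds: 0.8 },
    },
  },
  zh: {
    transcriber: { provider: "deepgram", model: "nova-2", language: "zh" },
    voice: { provider: "azure", voiceId: "zh-CN-XiaoxiaoNeural" },
    startSpeakingPlan: {
      waitSeconds: 1.0,
      // Longer pauses so Mandarin callers are not cut off mid-phrase.
      transcriptionEndpointingPlan: { onPunctuationSeconds: 0.6, onNoPunctSeconds: 1.5 },
    },
  },
};

console.log(TRACK_CONFIG.zh.startSpeakingPlan.transcriptionEndpointingPlan);
```

Keeping the per-language differences in one table like this fits the config-as-code approach: each track's role agents can share a template and differ only in this entry.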

v3.0 vs v2.3.0 Trade-offs

Factor	v2.3.0 (6 agents)	v3.0 (9 agents)
Agent count	6	9 (+50%)
Language accuracy	Good (auto-detect)	Better (per-language STT)
Chinese voice quality	Adequate (ElevenLabs)	Native (Azure)
Mid-call language switch	Seamless	Stays in initial track
Config management	Manual dashboard	GitOps (repeatable)
Prompt complexity	Higher (multilingual)	Lower (monolingual per agent)
Handoff routes	8	20
LLM language confusion	Occasional	Eliminated

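v1.0 Architecture (Deprecated)

The v1.0 single-agent-per-clinic design, diagrammed under Architecture Comparison above, is deprecated. It was superseded by the v2.x squad and then by the v3.0 dual-track architecture.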


Implementation Guidelines (v2.0)

System Prompt Structure

Each agent prompt follows this structure:

## IDENTITY
[Who the agent is]

## LANGUAGE HANDLING
**Language shift can happen at any point...**
**Languages:** English and Mandarin Chinese (普通话)
**Current Time:** {{now | date: "%B %d, %Y %I:%M %p", "America/Vancouver"}}

## STYLE
[Tone and communication guidelines]

## TASK & GOALS
[Step-by-step workflow]

## ERROR HANDLING
[Recovery flows]
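If prompts are generated rather than edited by hand, the section order can be enforced in code. The sketch below is hypothetical (`PromptSections` and `buildPrompt` are illustrative names, not from the codebase). It keeps the `{{now | date: ...}}` line from the template verbatim so that Vapi still substitutes the current time when the call starts, which the date-dependent guardrails rely on.

```typescript
// Hypothetical builder that emits the five sections in the fixed order above.
interface PromptSections {
  identity: string;
  languageHandling: string;
  style: string;
  taskAndGoals: string;
  errorHandling: string;
}

// Kept as a Liquid expression so Vapi fills in the time at call start.
const CURRENT_TIME_LINE =
  '**Current Time:** {{now | date: "%B %d, %Y %I:%M %p", "America/Vancouver"}}';

function buildPrompt(s: PromptSections): string {
  const sections: Array<[string, string]> = [
    ["IDENTITY", s.identity],
    ["LANGUAGE HANDLING", `${s.languageHandling}\n${CURRENT_TIME_LINE}`],
    ["STYLE", s.style],
    ["TASK & GOALS", s.taskAndGoals],
    ["ERROR HANDLING", s.errorHandling],
  ];
  return sections.map(([title, body]) => `## ${title}\n${body}`).join("\n\n");
}
```

Generating every agent's prompt from one builder also makes the per-assistant prompt updates listed under Consequences scriptable from a single source.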

Database Schema

-- Squad ID per clinic (v2.0)
CREATE TABLE clinics (
  id UUID PRIMARY KEY,
  vapi_squad_id VARCHAR(100),      -- Squad ID (not assistant)
  vapi_phone_number VARCHAR(20),
  ...
);

Consequences

v2.3.0

Positive:

Caller auto-identification via Telnyx eliminates manual phone entry
Booking-first flow reduces average call duration
Specialized prompts per workflow (cleaner, easier to tune)
Inline confirmations reduce transfers and perceived latency
Silent handoffs make it feel like one continuous conversation
Server-side guardrails (date clamping, phone override) compensate for LLM weaknesses

Negative:

6x more assistants to manage per clinic
More complex squad configuration on the Vapi dashboard
Context passing between agents required (Vapi handles via assistantOverrides)
Prompt updates require API calls to all 6 assistants

v3.0 (Additional)