ADR: Vapi.ai Integration Architecture

HISTORICAL — Architecture evolved through v1.0 → v3.0

This document traces the evolution of VitaraVox's voice architecture. The current production system is v3.0 (9-agent dual-track). See Squad Architecture for the current state.

Date: January 2026 (v1.0), Updated February 2026 (v2.0, v3.0)
Status: Updated
Decision: v3.0 adopts 9-agent dual-track architecture with explicit language gate and per-language STT/TTS


Context

VitaraPlatform supports these use cases:

  1. New patient registration
  2. Appointment booking
  3. Appointment update/cancellation
  4. Patient identification (v2.0)
  5. Clinic settings check (v2.0)
  6. Bilingual EN/ZH service (v3.0)

Question: Should each use case have a separate specialized agent, or should one agent handle all use cases? And how should multiple languages be handled?


Decision History

v1.0 Decision (January 2026)

Use ONE agent per clinic that handles all three use cases. Each agent is multilingual (English + Mandarin).

Rejected: Squad of 9 specialized agents (3 use cases × 3 languages).

v2.0 / v2.3.0 Decision (February 2026)

Adopt 6-agent squad architecture with specialized roles:

  1. Squad Leader Router - Greeting, caller phone auto-detection, clinic check, emergency/frustration detection
  2. Appointment Booking - Immediate slot finding (no filters), preference refinement
  3. Appointment Reschedule - Find existing appointments, reschedule with inline confirmation
  4. Appointment Cancel - Cancel with reason capture, inline confirmation
  5. Patient Registration - Register new patients with phonetic confirmation, inline confirmation
  6. Confirmation - Fallback confirmation agent (rarely used; most flows confirm inline)

Languages: English, Mandarin, Cantonese, French, Punjabi with automatic detection and mid-conversation switching.

Reason for change: Patient identification merged into Router (auto-detect via Telnyx caller ID). Booking/reschedule/cancel split into separate agents for cleaner prompts. Inline confirmations eliminate unnecessary transfers. v2.2 adds caller phone auto-detection, past-date clamping, provider name fuzzy matching, and booking-first flow (find slot immediately, refine only on request).
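The provider name fuzzy matching mentioned above could look roughly like the following. This is a minimal sketch, not the production matcher: it assumes providers are plain objects with `id` and `name` fields, and the helper names (`normalizeName`, `editDistance`, `resolveProvider`) and the `maxDistance` threshold are hypothetical.

```javascript
// Hypothetical helper: lower-case a name and strip a leading "Dr." / "Doctor".
function normalizeName(name) {
  return name
    .toLowerCase()
    .replace(/^\s*(dr\.?|doctor)\s+/, '')
    .replace(/[^a-z\s]/g, '')
    .trim();
}

// Levenshtein edit distance, computed with two rolling rows.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Resolve a spoken name to a provider ID, or null when nothing is close enough.
// maxDistance is an assumed tolerance, not a value taken from this ADR.
function resolveProvider(spokenName, providers, maxDistance = 2) {
  const target = normalizeName(spokenName);
  let best = null;
  for (const provider of providers) {
    const full = normalizeName(provider.name);
    const lastName = full.split(/\s+/).pop();
    // Callers usually say only the surname, so compare against both forms.
    const distance = Math.min(editDistance(target, full), editDistance(target, lastName));
    if (distance <= maxDistance && (best === null || distance < best.distance)) {
      best = { id: provider.id, distance };
    }
  }
  return best ? best.id : null;
}
```

Returning `null` for a weak match lets the agent ask the caller to repeat the name instead of booking with the wrong provider.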

v3.0 Decision (February 2026)

Adopt 9-agent dual-track architecture with explicit language gate:

  1. Router - Language gate ONLY: detect EN/ZH from first utterance, route to correct track
  2. Patient-ID (EN) - Identify caller by phone, detect intent, route within EN track
  3. Patient-ID (ZH) - Same role in Mandarin
  4. Booking (EN/ZH) - Find slots, book appointments
  5. Modification (EN/ZH) - Consolidated reschedule + cancel + check
  6. Registration (EN/ZH) - Register new patients

Key changes from v2.3.0:

  • Confirmation agent eliminated: log_call_metadata absorbed into every role agent
  • Reschedule + Cancel consolidated into single Modification agent per track
  • Router simplified to pure language gate (no patient lookup, no intent detection)
  • Patient-ID separated from Router into dedicated agent per language
  • Per-language STT: Deepgram nova-2 en / zh (vs universal multi)
  • Per-language TTS: ElevenLabs (EN) / Azure XiaoxiaoNeural (ZH)
  • ZH endpointing tuned: Longer pauses for Mandarin speech patterns (1.0s wait vs 0.6s)
  • Config-as-code: Managed via Vapi GitOps instead of manual dashboard

Reason for change: Auto-detect multilingual agents struggled with LLM language confusion (GPT-4o sometimes output space-separated Chinese characters). Dedicated language tracks with per-language STT/TTS provide higher accuracy. The Router is now a lightweight language gate (<150 tokens), keeping latency minimal. Consolidating Reschedule+Cancel into Modification reduces the number of unique agent roles from 5 to 4, while the dual-track split adds 3 agents in total (9 vs 6).
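The agent count follows directly from the track layout: one Router plus four roles in each of two language tracks. The sketch below spells that out; the slugs are hypothetical placeholders, since this ADR does not list the real assistant names.

```javascript
// Hypothetical slugs; the real assistant names are not given in this ADR.
const TRACKS = ['en', 'zh'];
const TRACK_ROLES = ['patient-id', 'booking', 'modification', 'registration'];

// Build the full v3.0 roster: the Router plus every role in every track.
function buildRoster() {
  const agents = [{ slug: 'router', role: 'router', track: null }];
  for (const track of TRACKS) {
    for (const role of TRACK_ROLES) {
      agents.push({ slug: `${role}-${track}`, role, track });
    }
  }
  return agents;
}

// After the language gate, a call enters its track through Patient-ID.
function firstAgentForLanguage(language) {
  if (!TRACKS.includes(language)) {
    throw new Error(`Unsupported language: ${language}`);
  }
  return `patient-id-${language}`;
}

console.log(buildRoster().length); // 9 = 1 router + 2 tracks x 4 roles
```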


Architecture Comparison

Approved: Single Agent Per Clinic

+------------------------------------------------------------------+
|                                                                  |
|   Clinic Phone Number (+1-604-555-1234)                          |
|           |                                                      |
|           v                                                      |
|   +------------------------------------------------------------------+
|   |                                                                  |
|   |              SINGLE MULTILINGUAL ASSISTANT                       |
|   |                                                                  |
|   |   Handles:                                                       |
|   |     - New patient registration                                   |
|   |     - Appointment booking                                        |
|   |     - Appointment update/cancellation                            |
|   |                                                                  |
|   |   Languages: English, Mandarin (auto-detect)                     |
|   |                                                                  |
|   +------------------------------------------------------------------+
|                                                                  |
|   Total Assistants: 1 per clinic                                 |
|   Total for 5 pilot clinics: 5 assistants                        |
|                                                                  |
+------------------------------------------------------------------+

Rejected: Squad of Specialized Agents

+------------------------------------------------------------------+
|                                                                  |
|   Clinic Phone Number (+1-604-555-1234)                          |
|           |                                                      |
|           v                                                      |
|   IVR: "Press 1 for English, 2 for Chinese, 3 for French"        |
|           |                                                      |
|           v                                                      |
|   IVR: "Press 1 to register, 2 to book, 3 to update/cancel"      |
|           |                                                      |
|     +------+------+------+------+------+------+------+------+    |
|     v      v      v      v      v      v      v      v      v    |
|   +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ |
|   | EN | | EN | | EN | | FR | | FR | | FR | | ZH | | ZH | | ZH | |
|   |Reg | |Book| |Upd | |Reg | |Book| |Upd | |Reg | |Book| |Upd | |
|   +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ |
|                                                                  |
|   Total Assistants: 9 per clinic                                 |
|   Total for 5 pilot clinics: 45 assistants                       |
|                                                                  |
+------------------------------------------------------------------+

Rationale

1. Latency (Critical)

Target: <800ms end-to-end voice latency

| Approach | Latency Impact |
| --- | --- |
| Single agent | Optimal - no transfers |
| Squad approach | +200-500ms per agent transfer |

If a patient says "I want to book, but also update my contact info" during booking, the squad approach requires mid-call transfer, adding latency and risking context loss.

2. Configuration Complexity

v1.0 involves manual Vapi.ai configuration:

| Approach | Assistants | Setup Time |
| --- | --- | --- |
| Single agent | 5 (5 clinics × 1) | 4-5 hours |
| Squad approach | 45 (5 clinics × 9) | 12-15 hours |

v1.0 pilot savings: ~$800-1000 in configuration time.

3. User Experience

Single Agent:

Agent: "Hello! I can help with registration, booking, or
        updating appointments. What would you like to do?"
Patient: "I'd like to book an appointment"
Agent: "Great! Let me check availability..."
Patient: "Actually, can you also update my phone number?"
Agent: "Of course! What's your new phone number?"

Squad Approach:

IVR: "Press 1 for English..."
IVR: "Press 2 to book appointments..."
Booking Agent: "I'll help you book an appointment..."
Patient: "Can you also update my phone number?"
Agent: "Let me transfer you to our update agent..."
[200-500ms silence, context lost]
Update Agent: "Hello! What would you like to update?"
Patient: [repeats everything]

4. Technical Architecture

All 3 use cases share:

  • Same OSCAR EMR API connection
  • Same clinic business hours logic
  • Same handoff phone number
  • Same webhook handler

// Single webhook handles all use cases
switch (functionName) {
  case 'register_patient':
    return registerPatient(clinic, args);
  case 'book_appointment':
    return bookAppointment(clinic, args);
  case 'update_appointment':
    return updateAppointment(clinic, args);
  default:
    // Unknown tool name: fail loudly rather than silently return undefined
    throw new Error(`Unknown function: ${functionName}`);
}

No architectural benefit to separating use cases into different agents.

5. System Prompt Complexity

Modern LLMs (GPT-4) handle multi-intent prompts easily:

| Approach | Total Prompt Lines |
| --- | --- |
| Single agent | ~750 lines (all workflows) |
| Squad (3 agents) | ~650 lines (200+250+200) |

Marginal difference, but squad adds transfer logic complexity.


When Squad Approach Makes Sense

Squad of specialized agents is appropriate when:

| Criteria | VitaraVox v1.0 |
| --- | --- |
| Vastly different complexity | No - all similar |
| Different knowledge bases | No - same OSCAR data |
| Different LLM models | No - all GPT-4 |
| Compliance/legal separation | No |
| Distinct user personas | No - all patient-facing |

None of these criteria apply to v1.0.


v2.2 Squad Architecture (Historical)

+------------------------------------------------------------------+
|                                                                  |
|   CALL START (Telnyx inbound)                                    |
|       │                                                          |
|       v                                                          |
|   ┌─────────────────────────────────────────────┐               |
|   │ 1. ROUTER (vitara-router-v2)                 │               |
|   │ • Auto-detect caller via Telnyx phone number │               |
|   │ • search_patient_by_phone (server extracts   │               |
|   │   real phone from call.customer.number)      │               |
|   │ • Emergency detection, frustration detection │               |
|   │ • Routes to appropriate specialist           │               |
|   └─────────────┬───────────────────────────────┘               |
|                 │                                                |
|       ┌─────────┼─────────┬──────────┐                          |
|       v         v         v          v                           |
|   ┌─────────┐ ┌─────────┐ ┌────────┐ ┌──────────┐             |
|   │2.BOOKING│ │3.RESCHED│ │4.CANCEL│ │5.REGISTER│             |
|   │         │ │         │ │        │ │          │             |
|   │ Finds   │ │ Gets    │ │ Finds  │ │ Collects │             |
|   │ earliest│ │ patient │ │ appts, │ │ info,    │             |
|   │ slot    │ │ appts,  │ │ cancels│ │ registers│             |
|   │ immedi- │ │ finds   │ │ with   │ │ in OSCAR │             |
|   │ ately   │ │ new slot│ │ inline │ │ with     │             |
|   │ (no     │ │ with    │ │ confirm│ │ inline   │             |
|   │ filters)│ │ inline  │ │        │ │ confirm  │             |
|   │         │ │ confirm │ │        │ │          │             |
|   └─────────┘ └─────────┘ └────────┘ └──────────┘             |
|       │                                                          |
|       v                                                          |
|   ┌─────────────────────────────────────────────┐               |
|   │ 6. CONFIRMATION (fallback, rarely used)      │               |
|   │    Most flows handle confirmation inline     │               |
|   └─────────────────────────────────────────────┘               |
|                                                                  |
|   Silent Handoffs: NEVER mention "transferring" or agent names  |
|   Caller ID: Auto-detected from Telnyx metadata (server-side)   |
|   Date Awareness: Today's date injected into all prompts        |
|   Languages: English, Mandarin, Cantonese, French, Punjabi      |
|                                                                  |
+------------------------------------------------------------------+

v2.2 Tool Distribution

| Agent | Tools | Key Behaviors |
| --- | --- | --- |
| Router | search_patient_by_phone, transferToHuman | Server overrides phone arg with real Telnyx caller ID |
| Booking | find_earliest_appointment, create_appointment, get_providers | Finds slot immediately with no filters; refines on request only |
| Reschedule | get_appointments, find_earliest_appointment, update_appointment | Inline confirmation (no handoff) |
| Cancel | get_appointments, cancel_appointment | Inline confirmation with reason capture |
| Registration | register_new_patient, add_to_waitlist | Inline confirmation (no handoff) |
| Confirmation | confirm_appointment | Fallback only; most flows confirm inline |

v2.2 Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Caller phone auto-detection | LLM hallucinates phone numbers. Server extracts real number from call.customer.number (Telnyx metadata), ignoring whatever the LLM sends. |
| Booking-first flow | Patient calls to book → immediately find earliest slot with ANY provider. Only apply filters (doctor, date, time-of-day) if patient requests changes. |
| Single-slot returns | find_earliest_appointment returns exactly 1 slot. If rejected, patient provides reason → excludeDates grows → search again. Prevents decision paralysis. |
| Inline confirmation | Reschedule, cancel, and registration handle confirmation themselves ("Is there anything else?"). Eliminates unnecessary transfer to confirmation agent. |
| Silent transfers | All prompts include "NEVER mention transferring, assistant names, or internal routing." Patient perceives one continuous conversation. |
| Past-date clamping | GPT-4o doesn't know today's date. Server clamps any startDate before today to today. All prompts also include "Today's date is YYYY-MM-DD". |
| Provider name fuzzy matching | Patient says "Dr. Chen" → server strips "Dr." prefix, fuzzy-matches against provider list, resolves to provider ID. |
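The server-side guardrails in this table (phone override, past-date clamping, and single-slot search with a growing excludeDates list) can be sketched as below. This is an illustrative sketch under stated assumptions: the function names, the `phone` argument name, and the slot shape (`date` as YYYY-MM-DD, `time` as HH:MM) are hypothetical. Only call.customer.number, startDate and excludeDates come from this document.

```javascript
// Replace whatever phone number the LLM sent with the real caller ID.
// Assumes the tool argument is called `phone` (hypothetical).
function withCallerPhone(args, call) {
  const realPhone = call && call.customer && call.customer.number;
  return realPhone ? { ...args, phone: realPhone } : args;
}

// Today's date as YYYY-MM-DD in the clinic's timezone.
function todayInZone(now, timeZone) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone, year: 'numeric', month: '2-digit', day: '2-digit',
  }).formatToParts(now);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('year')}-${get('month')}-${get('day')}`;
}

// Clamp a YYYY-MM-DD start date so it is never before today.
// ISO dates in this format compare correctly as plain strings.
function clampStartDate(startDate, today) {
  return !startDate || startDate < today ? today : startDate;
}

// Return exactly one slot: the earliest on or after startDate whose date
// has not already been rejected by the caller. Returns null if none is left.
function findEarliestSlot(slots, { startDate, excludeDates = [] }) {
  const excluded = new Set(excludeDates);
  const key = (slot) => `${slot.date}T${slot.time}`;
  const candidates = slots
    .filter((slot) => slot.date >= startDate && !excluded.has(slot.date))
    .sort((a, b) => (key(a) < key(b) ? -1 : key(a) > key(b) ? 1 : 0));
  return candidates.length > 0 ? candidates[0] : null;
}
```

Each rejected offer adds its date to excludeDates before the next search, so the caller is always offered a single new option instead of a list.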

v3.0 Dual-Track Architecture (Deployed)

+------------------------------------------------------------------+
|                                                                  |
|   CALL START (Telnyx inbound)                                    |
|       |                                                          |
|       v                                                          |
|   +--------------------------------------------------+          |
|   | 1. ROUTER (vitara-router-v3)                      |          |
|   |    STT: AssemblyAI Universal (bilingual)          |          |
|   |    ONLY JOB: Detect language, route to track      |          |
|   +---------------------+----------------------------+          |
|                         |                                        |
|             +-----------+-----------+                            |
|             v                       v                            |
|       ENGLISH TRACK            CHINESE TRACK                     |
|             |                       |                            |
|             v                       v                            |
|   +-------------------+   +-------------------+                  |
|   | PATIENT-ID-EN     |   | PATIENT-ID-ZH     |                  |
|   | STT: Deepgram en  |   | STT: Deepgram zh  |                  |
|   | TTS: ElevenLabs   |   | TTS: Azure Xiaoxiao|                 |
|   +--------+----------+   +--------+----------+                  |
|            |                        |                            |
|     +------+------+         +------+------+                      |
|     v      v      v         v      v      v                      |
|   +----+ +----+ +-----+  +----+ +----+ +-----+                  |
|   |BOOK| |MOD | |REG  |  |BOOK| |MOD | |REG  |                  |
|   | EN | | EN | | EN  |  | ZH | | ZH | | ZH  |                  |
|   +----+ +----+ +-----+  +----+ +----+ +-----+                  |
|                                                                  |
|   Config: Vapi GitOps (slug-based, environment separation)       |
|   Squad ID: 13fdfd19-a2cd-4ca4-8e14-ad2275095e32               |
|   Handoffs: 20 routes (see voice-agent.md for full matrix)      |
|                                                                  |
+------------------------------------------------------------------+

v3.0 Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Explicit language gate | GPT-4o with auto-detect multilingual sometimes output space-separated Chinese. Dedicated tracks eliminate LLM language confusion. |
| Per-language STT | Deepgram nova-2 en/zh outperforms universal multi mode for each individual language. AssemblyAI Universal on Router handles the bilingual detection. |
| Azure TTS for Chinese | ElevenLabs eleven_multilingual_v2 produces adequate but not native-quality Mandarin. Azure zh-CN-XiaoxiaoNeural is purpose-built for CJK. |
| Consolidated Modification | Reschedule and Cancel share 6 of 8 tools. A single Modification agent per track is simpler than two separate agents with near-identical tool sets. |
| Confirmation eliminated | Every role agent now calls log_call_metadata directly. No need for a handoff to a separate confirmation agent. |
| Tuned ZH endpointing | Mandarin speech uses longer pauses between phrases. Standard EN timing (0.3s punct, 0.8s no-punct) caused premature cutoffs. ZH uses 0.6s/1.5s. |
| Router as pure language gate | Router temperature 0.3, max 150 tokens. Does NOT look up patients or detect intent, just language. Minimizes latency at the entry point. |
| Vapi GitOps | Config-as-code replaces manual dashboard editing. Slug-based tool references, environment separation (dev/staging/prod). |
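The per-track voice settings in this table can be summarized as a small lookup, sketched below. The object shape and field names are hypothetical and are not Vapi's configuration schema. The values come from this table, with the added assumption that the EN track keeps the standard EN timing.

```javascript
// Hypothetical shape; field names are illustrative, not Vapi's schema.
const TRACK_VOICE_CONFIG = {
  en: {
    stt: { provider: 'deepgram', model: 'nova-2', language: 'en' },
    tts: { provider: 'elevenlabs' },
    // Assumed: the EN track keeps the standard EN timing from the table.
    endpointing: { onPunctuationSeconds: 0.3, onNoPunctuationSeconds: 0.8 },
  },
  zh: {
    stt: { provider: 'deepgram', model: 'nova-2', language: 'zh' },
    tts: { provider: 'azure', voice: 'zh-CN-XiaoxiaoNeural' },
    // Longer waits so Mandarin phrase pauses are not treated as end of turn.
    endpointing: { onPunctuationSeconds: 0.6, onNoPunctuationSeconds: 1.5 },
  },
};

// Look up the voice settings for a track, rejecting unknown tracks early.
function voiceConfigFor(track) {
  const config = TRACK_VOICE_CONFIG[track];
  if (!config) {
    throw new Error(`No voice config for track: ${track}`);
  }
  return config;
}
```

Keeping both tracks in one structure makes the EN/ZH differences easy to review side by side.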

v3.0 vs v2.3.0 Trade-offs

| Factor | v2.3.0 (6 agents) | v3.0 (9 agents) |
| --- | --- | --- |
| Agent count | 6 | 9 (+50%) |
| Language accuracy | Good (auto-detect) | Better (per-language STT) |
| Chinese voice quality | Adequate (ElevenLabs) | Native (Azure) |
| Mid-call language switch | Seamless | Stays in initial track |
| Config management | Manual dashboard | GitOps (repeatable) |
| Prompt complexity | Higher (multilingual) | Lower (monolingual per agent) |
| Handoff routes | 8 | 20 |
| LLM language confusion | Occasional | Eliminated |

v1.0 Architecture (Deprecated)

Single Agent Per Clinic (v1.0)

+------------------------------------------------------------------+
|                                                                  |
|   Clinic Phone Number (+1-604-555-1234)                          |
|           |                                                      |
|           v                                                      |
|   +------------------------------------------------------------------+
|   |                                                                  |
|   |              SINGLE MULTILINGUAL ASSISTANT                       |
|   |                                                                  |
|   |   Handles:                                                       |
|   |     - New patient registration                                   |
|   |     - Appointment booking                                        |
|   |     - Appointment update/cancellation                            |
|   |                                                                  |
|   |   Languages: English, Mandarin (auto-detect)                     |
|   |                                                                  |
|   +------------------------------------------------------------------+
|                                                                  |
|   Total Assistants: 1 per clinic                                 |
|   Total for 5 pilot clinics: 5 assistants                        |
|                                                                  |
+------------------------------------------------------------------+

Implementation Guidelines (v2.0)

System Prompt Structure

Each agent prompt follows this structure:

## IDENTITY
[Who the agent is]

## LANGUAGE HANDLING
**Language shift can happen at any point...**
**Languages:** English and Mandarin Chinese (普通话)
**Current Time:** {{now | date: "%B %d, %Y %I:%M %p", "America/Vancouver"}}

## STYLE
[Tone and communication guidelines]

## TASK & GOALS
[Step-by-step workflow]

## ERROR HANDLING
[Recovery flows]
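If prompts are assembled programmatically rather than written by hand, the section order above could be enforced with a small builder. This is only a sketch: `SECTION_ORDER` and `buildPrompt` are hypothetical names, and nothing here reflects how the prompts are actually stored.

```javascript
// Section order mirrors the template above.
const SECTION_ORDER = ['IDENTITY', 'LANGUAGE HANDLING', 'STYLE', 'TASK & GOALS', 'ERROR HANDLING'];

// Join the sections into one prompt, refusing to build an incomplete one.
function buildPrompt(sections) {
  const missing = SECTION_ORDER.filter((name) => !sections[name]);
  if (missing.length > 0) {
    throw new Error(`Missing prompt sections: ${missing.join(', ')}`);
  }
  return SECTION_ORDER
    .map((name) => `## ${name}\n${sections[name].trim()}`)
    .join('\n\n');
}
```

Failing on a missing section keeps an agent from being deployed without, for example, its error-handling flows.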

Database Schema

-- Squad ID per clinic (v2.0)
CREATE TABLE clinics (
  id UUID PRIMARY KEY,
  vapi_squad_id VARCHAR(100),      -- Squad ID (not assistant)
  vapi_phone_number VARCHAR(20),
  ...
);
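With both a v2 and a v3 squad stored per clinic (vapi_squad_id here, and the vapi_squad_id_v3 column noted under Consequences), the server needs a rule for choosing between them. The sketch below shows one possible rule. `resolveSquadId` and its `preferV3` flag are hypothetical, and the fallback order is an assumption, not documented behavior.

```javascript
// Pick the squad ID for a clinic row. Assumed rule: prefer the v3 squad
// when present and fall back to the v2 squad otherwise.
function resolveSquadId(clinic, preferV3 = true) {
  const v3 = clinic.vapi_squad_id_v3 || null;
  const v2 = clinic.vapi_squad_id || null;
  return preferV3 ? (v3 || v2) : (v2 || v3);
}
```

A per-clinic fallback like this would let clinics move to v3.0 one at a time, assuming the v2 squads remain deployed.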

Consequences

v2.3.0

Positive:

  • Caller auto-identification via Telnyx eliminates manual phone entry
  • Booking-first flow reduces average call duration
  • Specialized prompts per workflow (cleaner, easier to tune)
  • Inline confirmations reduce transfers and perceived latency
  • Silent handoffs make it feel like one continuous conversation
  • Server-side guardrails (date clamping, phone override) compensate for LLM weaknesses

Negative:

  • 6 assistants to manage per clinic instead of 1
  • More complex squad configuration on Vapi dashboard
  • Context passing between agents required (Vapi handles via assistantOverrides)
  • Prompt updates require API calls to all 6 assistants

v3.0 (Additional)

Positive:

  • Per-language STT/TTS eliminates LLM language confusion
  • Native Mandarin voice quality (Azure XiaoxiaoNeural)
  • Monolingual prompts are shorter and more focused
  • GitOps enables repeatable, version-controlled configuration
  • Consolidated Modification agent reduces unique role count
  • callMetadataCache ensures call logs always have language/outcome data

Negative:

  • 9 assistants per clinic (50% more than v2.3.0)
  • 20 handoff routes to configure and test
  • Mid-call language switching not supported (stays in initial track)
  • Two parallel prompts to maintain per role (EN + ZH)
  • Dual squad support requires vapi_squad_id_v3 DB column