
v3.0 Squad Architecture

9-agent dual-track bilingual squad with all P0/P1/P2 fixes applied

Last Updated: 2026-03-09 (v4.3.0 — SMS consent, firstMessage corrections from YAML source)


Overview

| Setting | Value |
|---|---|
| Architecture | 9-Agent Dual-Track (Router + 4 roles x 2 languages) |
| Squad ID | 13fdfd19-a2cd-4ca4-8e14-ad2275095e32 |
| Phone | +1 236-305-7446 |
| LLM | OpenAI GPT-4o (all 9 assistants) |
| Temperature | 0.3 (Router), 0.5 (all others) |
| Max Tokens | 400 (Router -- P0 fix), 200 (standard), 250 (Registration) |
| Handoff Mode | Silent (no announcement) |
| Config Management | Vapi GitOps (config-as-code) |
| GitOps Path | vitara-platform/vapi-gitops/ |
| Webhook | https://api-dev.vitaravox.ca/api/vapi/{tool-slug} (each tool has its own URL) |
| firstMessage | "Hi, thanks for calling VV Health! English or Mandarin?" (hardcoded in squad YAML; Vapi plays it before the LLM runs) |

Router maxTokens

The Router's maxTokens was raised from 150 to 400 as a P0 fix. GPT-4o tool-call JSON alone consumes 80-120 tokens, so once any spoken text shared the turn, the original 150-token limit was exceeded. This caused silent truncation and broken handoffs.


Squad Members

| # | Agent | Vapi Name | ID | Role | Tools |
|---|---|---|---|---|---|
| 1 | Router | vitara-router-v3 | 4f70e214-... | Language gate: detect EN/ZH keywords, route to correct track | get_clinic_info, transfer_call, log_call_metadata |
| 2 | Patient-ID (EN) | vitara-patient-id-en-v3 | 7d054785-... | Identify caller by phone, detect intent, route | search_patient_by_phone, search_patient, get_clinic_info, transfer_call |
| 3 | Patient-ID (ZH) | vitara-patient-id-zh-v3 | 7585c092-... | Same as EN but in Mandarin | Same as EN |
| 4 | Booking (EN) | vitara-booking-en-v3 | ac25775b-... | Find slots, book appointments | find_earliest_appointment, check_appointments, create_appointment, get_providers, log_call_metadata, transfer_call |
| 5 | Booking (ZH) | vitara-booking-zh-v3 | 6ef04a40-... | Same as EN but in Mandarin | Same as EN |
| 6 | Modification (EN) | vitara-modification-en-v3 | 9cd8381d-... | Reschedule, cancel, check appointments | check_appointments, find_earliest_appointment, update_appointment, cancel_appointment, create_appointment, get_providers, log_call_metadata, transfer_call |
| 7 | Modification (ZH) | vitara-modification-zh-v3 | e348cd2f-... | Same as EN but in Mandarin | Same as EN |
| 8 | Registration (EN) | vitara-registration-en-v3 | 9fcfd00d-... | Register new patients | register_new_patient, add_to_waitlist, log_call_metadata, transfer_call |
| 9 | Registration (ZH) | vitara-registration-zh-v3 | ce50df43-... | Same as EN but in Mandarin | Same as EN |
Full Vapi UUIDs
Agent Full UUID
Router 4f70e214-6111-4f53-86c9-48f8f7c265e1
Patient-ID-EN 7d054785-9074-4856-81db-9fe44da47bc5
Patient-ID-ZH 7585c092-f8b3-4bdd-95ba-d41d71a54101
Booking-EN ac25775b-c1cc-41ae-8899-810d4ae62efd
Booking-ZH 6ef04a40-6764-4d4e-b2e0-73045b288611
Modification-EN 9cd8381d-9501-4c9a-a92d-ce185f49e50d
Modification-ZH e348cd2f-f3d8-4b9a-ac35-7725c767287f
Registration-EN 9fcfd00d-1493-4041-9214-36159eba4511
Registration-ZH ce50df43-7c3a-45f7-b121-8772adaa0eff

Call Flow Diagram

CALL START (Telnyx inbound)
    |
    v
+--------------------------------------------------+
| 1. ROUTER (vitara-router-v3)                      |
|    STT: AssemblyAI Universal (bilingual)          |
|    TTS: ElevenLabs multilingual_v2                |
|    firstMessage: "Hi, thanks for calling VV Health!|
|      English or Mandarin?"                         |
|      (hardcoded in squad YAML, plays before LLM)  |
|                                                    |
|    - LLM calls get_clinic_info (mandatory 1st turn)|
|    - Greets with clinic info (customGreeting)      |
|    - Language detection: KEYWORD-BASED             |
|      Default = English. Chinese only if caller     |
|      says "Mandarin/Chinese/中文" or garbled text  |
|    - Routes to language track                      |
+---------------------+----------------------------+
                      |
          +-----------+-----------+
          v                       v
    ENGLISH TRACK            CHINESE TRACK
          |                       |
          v                       v
+-------------------+   +-------------------+
| 2. PATIENT-ID-EN  |   | 2. PATIENT-ID-ZH  |
|  STT: Deepgram    |   |  STT: Deepgram    |
|       nova-2 en   |   |       nova-2 zh   |
|  TTS: ElevenLabs  |   |  TTS: Azure       |
|                   |   |    XiaoxiaoNeural  |
| - search by phone |   | - search by phone |
|   (server subs    |   |   (server subs    |
|    real number)   |   |    real number)   |
| - detect intent   |   | - detect intent   |
| - route to role   |   | - route to role   |
+--------+----------+   +--------+----------+
         |                        |
  +------+------+         +------+------+
  v      v      v         v      v      v
+----+ +----+ +-----+  +----+ +----+ +-----+
|BOOK| |MOD | |REG  |  |BOOK| |MOD | |REG  |
| EN | | EN | | EN  |  | ZH | | ZH | | ZH  |
+----+ +----+ +-----+  +----+ +----+ +-----+
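In the squad, the Router applies its language rule through its prompt, so the LLM makes the decision. The function below is only an illustration of that keyword rule, for example for offline checks against recorded transcripts. The name detectTrack and the exact keyword list are assumptions. The "garbled text" case is approximated by checking for Chinese characters, which may not match what the prompt actually instructs.

```typescript
type Track = "en" | "zh";

// Keywords from the Router rule: "Mandarin/Chinese/中文" switches to the Chinese track.
const ZH_KEYWORDS: readonly string[] = ["mandarin", "chinese", "中文"];

// CJK Unified Ideographs (U+4E00 to U+9FFF). Finding any in the transcript is used
// here as a rough stand-in for the "garbled text" case (assumption).
const HAN_CHARACTER = /[\u4e00-\u9fff]/;

// Hypothetical helper: mirrors the prompt rule "default English, Chinese only on request".
function detectTrack(transcript: string): Track {
  const text = transcript.toLowerCase();
  if (ZH_KEYWORDS.some((kw) => text.indexOf(kw) !== -1)) return "zh";
  if (HAN_CHARACTER.test(transcript)) return "zh";
  return "en"; // Default = English
}
```

Defaulting to English keeps an ambiguous or silent opening on the track that most callers need. Only an explicit request, or Chinese speech, moves the call to the ZH agents.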

Handoff Matrix (18 Squad Routes)

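These are the handoff tool definitions from the squad YAML (vitaravox-v3.yml). Each is a Vapi handoff-type tool configured under assistantOverrides.tools:append. The handoff_to_router_v3 routes added on 2026-03-04 for Patient-ID-ZH and Registration EN/ZH (see Known Issues) are not listed in this table.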

| From | To | Handoff Tool | Trigger |
|---|---|---|---|
| Router | Patient-ID-EN | handoff_to_patient_id_en | Caller speaks English (default) |
| Router | Patient-ID-ZH | handoff_to_patient_id_zh | Caller requests Chinese/Mandarin |
| Patient-ID-EN | Booking-EN | handoff_to_booking_en | Patient found + intent = book |
| Patient-ID-EN | Modification-EN | handoff_to_modification_en | Patient found + intent = reschedule/cancel/check |
| Patient-ID-EN | Registration-EN | handoff_to_registration_en | Patient not found + new patient |
| Patient-ID-ZH | Booking-ZH | handoff_to_booking_zh | Patient found + intent = book |
| Patient-ID-ZH | Modification-ZH | handoff_to_modification_zh | Patient found + intent = reschedule/cancel/check |
| Patient-ID-ZH | Registration-ZH | handoff_to_registration_zh | Patient not found + new patient |
| Booking-EN | Modification-EN | handoff_to_modification_en | Patient says "reschedule" or "cancel" |
| Booking-EN | Router | handoff_to_router_v3 | Unrelated request |
| Booking-ZH | Modification-ZH | handoff_to_modification_zh | Patient says "reschedule" or "cancel" |
| Booking-ZH | Router | handoff_to_router_v3 | Unrelated request |
| Modification-EN | Booking-EN | handoff_to_booking_en | After cancel, patient wants to rebook |
| Modification-EN | Router | handoff_to_router_v3 | Unrelated request |
| Modification-ZH | Booking-ZH | handoff_to_booking_zh | After cancel, patient wants to rebook |
| Modification-ZH | Router | handoff_to_router_v3 | Unrelated request |
| Registration-EN | Booking-EN | handoff_to_booking_en | After registration, patient wants first appointment |
| Registration-ZH | Booking-ZH | handoff_to_booking_zh | After registration, patient wants first appointment |

Additionally, all nine agents have the transfer_call tool and can transfer the caller to clinic staff. A transfer triggers a transfer-destination-request webhook, not a squad handoff.

firstMessage Overrides (from Squad YAML)

The squad YAML can override each member's firstMessage. An override plays immediately on handoff, before the LLM generates any text:

| Agent | firstMessage Override | Source |
|---|---|---|
| Router | "Hi, thanks for calling VV Health! English or Mandarin?" | vitaravox-v3.yml:6 |
| Patient-ID-EN | "Are you a returning patient, or is this your first visit?" | vitaravox-v3.yml:38 |
| Patient-ID-ZH | "请问您是老患者还是第一次来?" | vitaravox-v3.yml:245 |
| Booking-EN | "Let me check what's available for you." | vitaravox-v3.yml:105 |
| Booking-ZH | "我帮您查一下可以的时间。" | vitaravox-v3.yml:324 |
| Modification-EN | (none; LLM generates) | |
| Modification-ZH | (none; LLM generates) | |
| Registration-EN | (none; LLM generates) | |
| Registration-ZH | (none; LLM generates) | |

Handoff Tool Names

All prompts use handoff_to_X function names (e.g., handoff_to_booking_en, handoff_to_modification_zh), not transferAssistant. This was a P0 fix: the original prompts referenced transferAssistant, which does not exist as a tool function name, so those handoffs could never fire.
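The transferAssistant bug is the kind of mismatch that a pre-deploy check can catch. The sketch below assumes prompts are available as plain strings and that the set of handoff tools defined for each agent has already been read from the squad YAML (YAML parsing is omitted). findUndefinedHandoffs is a hypothetical helper, not part of Vapi GitOps.

```typescript
// Hypothetical pre-deploy check: every handoff function named in a prompt must be
// defined as a handoff tool for that agent in the squad YAML.
function findUndefinedHandoffs(prompt: string, definedTools: ReadonlySet<string>): string[] {
  // Matches handoff_to_* names plus the legacy transferAssistant name.
  // The regex is created per call so its lastIndex state is never shared.
  const pattern = /\b(handoff_to_[a-z0-9_]+|transferAssistant)\b/g;
  const missing: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prompt)) !== null) {
    const name = match[1];
    if (!definedTools.has(name) && missing.indexOf(name) === -1) missing.push(name);
  }
  return missing;
}

// Usage idea: for Booking-EN, the matrix above defines
// new Set(["handoff_to_modification_en", "handoff_to_router_v3"]).
```

Failing CI when the returned list is non-empty would have flagged the P0 issue before deployment.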


Voice & Transcription Configuration

Router

Setting Value
STT Provider AssemblyAI
STT Mode Universal (bilingual auto-detect)
TTS Provider ElevenLabs
TTS Voice ID fQj4gJSexpu8RDE2Ii5m
TTS Model eleven_multilingual_v2

AssemblyAI Schema

The AssemblyAI transcriber in Vapi has NO endpointing, languageDetection, or model fields; these are Deepgram-specific. Sending a model property to AssemblyAI causes a 400 error.
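A small guard can keep Deepgram-only fields out of an AssemblyAI transcriber payload before it is pushed. The sketch below makes three assumptions: transcriber configs are plain objects with a provider field, the AssemblyAI provider string is "assembly-ai", and sanitizeTranscriber is a hypothetical helper rather than part of Vapi GitOps. The removed field names are the ones listed above.

```typescript
interface TranscriberConfig {
  provider: string;
  [key: string]: unknown;
}

// Fields that, per the note above, exist only on Vapi's Deepgram transcriber.
const DEEPGRAM_ONLY_FIELDS = ["endpointing", "languageDetection", "model"] as const;

// Returns a copy with Deepgram-only fields removed when the provider is AssemblyAI
// (provider string "assembly-ai" is an assumption), plus the removed keys for logging.
function sanitizeTranscriber(config: TranscriberConfig): { config: TranscriberConfig; removed: string[] } {
  if (config.provider !== "assembly-ai") return { config, removed: [] };
  const copy: TranscriberConfig = { ...config };
  const removed: string[] = [];
  for (const field of DEEPGRAM_ONLY_FIELDS) {
    if (field in copy) {
      delete copy[field];
      removed.push(field);
    }
  }
  return { config: copy, removed };
}
```

Logging the removed keys is usually better than silently dropping them, because a stray model field often means a Deepgram block was copied into the Router config.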

English Track

Setting Value
STT Provider Deepgram
STT Model nova-2
STT Language en
TTS Provider ElevenLabs
TTS Voice ID fQj4gJSexpu8RDE2Ii5m
TTS Model eleven_multilingual_v2
Stability 0.5
Similarity Boost 0.7

Endpointing (EN standard agents):

Setting Value
Wait Seconds 0.6
On Punctuation 0.3s
On No Punctuation 0.8s
On Number 0.5s
Stop on Words 2

Endpointing (EN registration -- longer pauses for spelling):

Setting Value
Wait Seconds 1.6
On Punctuation 1.0s
On No Punctuation 2.5s
On Number 1.2s
Stop on Words 2

Chinese Track

Setting Value
STT Provider Deepgram
STT Model nova-2
STT Language zh
TTS Provider Azure
TTS Voice ID zh-CN-XiaoxiaoNeural

TTS Choice

ElevenLabs eleven_turbo_v2 is English-only (the newer eleven_turbo_v2_5 is multilingual), so Mandarin requires a multilingual ElevenLabs model such as eleven_multilingual_v2, or Azure. The ZH track uses Azure XiaoxiaoNeural for native Mandarin quality.

Endpointing (ZH standard agents -- tuned for Mandarin speech patterns):

Setting Value
Wait Seconds 1.0
On Punctuation 0.6s
On No Punctuation 1.5s
On Number 0.8s
Stop on Words 3

Endpointing (ZH registration -- longer pauses for spelling):

Setting Value
Wait Seconds 1.6
On Punctuation 1.0s
On No Punctuation 2.5s
On Number 1.2s
Stop on Words 2
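The four endpointing profiles above can be expressed as data, so every assistant on a track gets identical values. In the sketch below, the mapping onto Vapi's startSpeakingPlan (waitSeconds, transcriptionEndpointingPlan) and stopSpeakingPlan (numWords) is our reading of the setting names in these tables. Check it against the GitOps YAML and the Vapi API reference before relying on it. toVapiPlans is a hypothetical helper.

```typescript
interface EndpointingProfile {
  waitSeconds: number;
  onPunctuationSeconds: number;
  onNoPunctuationSeconds: number;
  onNumberSeconds: number;
  stopOnWords: number;
}

// Values copied from the four endpointing tables above.
const ENDPOINTING: Record<string, EndpointingProfile> = {
  enStandard: { waitSeconds: 0.6, onPunctuationSeconds: 0.3, onNoPunctuationSeconds: 0.8, onNumberSeconds: 0.5, stopOnWords: 2 },
  enRegistration: { waitSeconds: 1.6, onPunctuationSeconds: 1.0, onNoPunctuationSeconds: 2.5, onNumberSeconds: 1.2, stopOnWords: 2 },
  zhStandard: { waitSeconds: 1.0, onPunctuationSeconds: 0.6, onNoPunctuationSeconds: 1.5, onNumberSeconds: 0.8, stopOnWords: 3 },
  zhRegistration: { waitSeconds: 1.6, onPunctuationSeconds: 1.0, onNoPunctuationSeconds: 2.5, onNumberSeconds: 1.2, stopOnWords: 2 },
};

// Assumed shape of Vapi's speaking plans; verify field names before use.
function toVapiPlans(p: EndpointingProfile) {
  return {
    startSpeakingPlan: {
      waitSeconds: p.waitSeconds,
      transcriptionEndpointingPlan: {
        onPunctuationSeconds: p.onPunctuationSeconds,
        onNoPunctuationSeconds: p.onNoPunctuationSeconds,
        onNumberSeconds: p.onNumberSeconds,
      },
    },
    stopSpeakingPlan: { numWords: p.stopOnWords },
  };
}
```

Laid out this way, one thing stands out: the EN and ZH registration profiles are identical. Only the standard profiles differ by track.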

Design Changes from v2.3.0

| Change | v2.3.0 | v3.0 | Rationale |
|---|---|---|---|
| Language handling | Single multilingual agent | Explicit Router language gate (keyword-based) | Eliminates LLM language confusion; dedicated STT/TTS per track |
| Patient identification | Combined in Router | Separate Patient-ID agent per track | Cleaner prompt; Router stays lightweight |
| Reschedule + Cancel | Two separate agents | Single Modification agent per track | Reduces squad complexity; both share same tools |
| Confirmation agent | Exists (rarely used) | Eliminated | log_call_metadata absorbed into Booking/Modification/Registration |
| STT (English) | Deepgram nova-2 multi | Deepgram nova-2 en | Language-specific = higher accuracy |
| STT (Chinese) | Deepgram nova-2 multi | Deepgram nova-2 zh | Language-specific = higher accuracy |
| STT (Router) | Deepgram nova-2 multi | AssemblyAI Universal | Bilingual detection before routing |
| TTS (English) | ElevenLabs multilingual_v2 | ElevenLabs multilingual_v2 | No change |
| TTS (Chinese) | ElevenLabs multilingual_v2 | Azure zh-CN-XiaoxiaoNeural | Native Mandarin voice quality |
| ZH Endpointing | Same as EN | Longer pauses (1.0s wait, 0.6s punct) | Mandarin speech patterns need longer pauses |
| Config management | Manual Vapi dashboard | Vapi GitOps (config-as-code) | Version control, reproducibility |
| Handoff routes | 8 | 18 | More granular routing between roles |

Server-Side Middleware

Clinic Resolution (Actual Implementation)

Incoming webhooks resolve the clinic via this chain. Steps 2 and 3 only yield a phone number, which findClinicByVapiPhone() then maps to a clinic:

1. metadata.clinicId           (explicit override for testing)
2. call.phoneNumber.number     (Vapi phone number)
3. resolvePhoneFromVapiId()    (Telnyx BYO fallback: 1-hour cached API lookup)
   then findClinicByVapiPhone() (SELECT id FROM clinic WHERE vapiPhone = $1) for a number from step 2 or 3
CLINIC RESOLUTION CASCADE (webhook handler)
┌───────────────────────────────────────────────────────┐
│  1. metadata.clinicId (explicit override for testing) │
│     Found? ──► YES ──► Use this clinic                │
│              └─ NO                                    │
│                 ▼                                     │
│  2. call.phoneNumber.number (Vapi phone number)       │
│     Found? ──► YES ──► findClinicByVapiPhone()        │
│              └─ NO                                    │
│                 ▼                                     │
│  3. resolvePhoneFromVapiId() (Telnyx BYO fallback)    │
│     Calls GET api.vapi.ai/phone-number (1-hr cache)   │
│     Found? ──► YES ──► findClinicByVapiPhone()        │
│              └─ NO                                    │
│                 ▼                                     │
│  4. ERROR: Cannot resolve clinic                      │
│     Log warning, return error to Vapi                 │
└───────────────────────────────────────────────────────┘
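The cascade can be sketched as one async function. This sketch assumes a payload shape that contains only the fields named above and takes the two lookups as injected functions, so it runs without Vapi or PostgreSQL. resolveClinic, ClinicLookups and the result type are hypothetical names; the real handler may structure this differently.

```typescript
interface CallPayload {
  metadata?: { clinicId?: string };
  call?: { phoneNumber?: { number?: string }; phoneNumberId?: string };
}

// Injected lookups (hypothetical signatures).
interface ClinicLookups {
  resolvePhoneFromVapiId(phoneNumberId: string): Promise<string | null>; // step 3, cached
  findClinicByVapiPhone(phone: string): Promise<string | null>;          // SELECT id FROM clinic WHERE vapiPhone = $1
}

type Resolution = { ok: true; clinicId: string; via: string } | { ok: false; reason: string };

async function resolveClinic(payload: CallPayload, lookups: ClinicLookups): Promise<Resolution> {
  // 1. Explicit override for testing: used directly, no DB lookup.
  const override = payload.metadata?.clinicId;
  if (override) return { ok: true, clinicId: override, via: "metadata.clinicId" };

  // 2. Phone number on the call (often empty for Telnyx BYO numbers).
  let phone: string | null = payload.call?.phoneNumber?.number || null;
  let via = "call.phoneNumber.number";

  // 3. Telnyx BYO fallback via phoneNumberId.
  const phoneNumberId = payload.call?.phoneNumberId;
  if (!phone && phoneNumberId) {
    phone = await lookups.resolvePhoneFromVapiId(phoneNumberId);
    via = "resolvePhoneFromVapiId";
  }

  // 4. Nothing to look up: report the error back to the webhook handler.
  if (!phone) return { ok: false, reason: "Cannot resolve clinic: no phone number" };

  const clinicId = await lookups.findClinicByVapiPhone(phone);
  return clinicId ? { ok: true, clinicId, via } : { ok: false, reason: `No clinic for ${phone}` };
}
```

Returning which step matched (via) makes it easy to log how often calls depend on the Telnyx fallback.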

Telnyx BYO Numbers

For Telnyx BYO phone numbers, call.phoneNumber.number is often empty. The server then resolves the number from call.phoneNumberId using a cached lookup against GET https://api.vapi.ai/phone-number (1-hour TTL). Without this fallback, calls on Telnyx BYO numbers cannot be mapped to a clinic.
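A minimal sketch of the 1-hour cache follows. It assumes that one listing call returns all of the account's phone numbers as objects with id and number fields. The list fetcher is injected, so the cache itself does not depend on HTTP details; in production it would call GET https://api.vapi.ai/phone-number with the Vapi API key. PhoneNumberCache is a hypothetical name.

```typescript
interface VapiPhoneNumber {
  id: string;
  number?: string;
}

// Caches the id -> number map built from one phone-number listing for ttlMs.
class PhoneNumberCache {
  private map = new Map<string, string>();
  private loadedAt = -Infinity; // forces a load on first use

  constructor(
    private readonly fetchAll: () => Promise<VapiPhoneNumber[]>,
    private readonly ttlMs: number = 60 * 60 * 1000, // 1 hour
    private readonly now: () => number = Date.now,
  ) {}

  async resolve(phoneNumberId: string): Promise<string | null> {
    if (this.now() - this.loadedAt >= this.ttlMs) {
      const numbers = await this.fetchAll();
      this.map = new Map(
        numbers.filter((n) => n.number).map((n): [string, string] => [n.id, n.number as string]),
      );
      this.loadedAt = this.now();
    }
    return this.map.get(phoneNumberId) ?? null;
  }
}
```

Caching the whole listing rather than per-id lookups keeps Vapi API traffic to about one request per hour, regardless of call volume.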

Call Metadata Cache

v3.0 introduced an in-memory callMetadataCache to bridge tool-call webhooks and end-of-call-report webhooks:

Tool call (log_call_metadata)    End-of-call-report
        |                               |
        v                               v
  setCallMetadata(callId, {       getCallMetadata(callId)
    language, outcome,              -> merges into saveCallLog
    demographicId, appointmentId
  })
  • Why: Vapi's end-of-call-report does not include tool results. The cache lets the server persist metadata that was set during the call.
  • Fallback: create_appointment and register_new_patient also cache language and demographicId as a safety net if log_call_metadata is never called.
  • TTL: 30 minutes. Entries auto-cleaned probabilistically (1% per call).
CALL METADATA CACHE -- Timeline

Time ─────────────────────────────────────────────────────►

  ┌──────────────────────────────────────────────────────┐
  │ DURING CALL                                          │
  │                                                      │
  │ Tool call: log_call_metadata                         │
  │   ┌─────────────────────────────────┐                │
  │   │ setCallMetadata(callId, {       │                │
  │   │   language: "en",               │                │
  │   │   callOutcome: "booked",        │                │
  │   │   demographicId: 123,           │                │
  │   │   appointmentId: 456            │                │
  │   │ })                              │                │
  │   └────────────┬────────────────────┘                │
  │                │ Stored in memory Map                │
  │                ▼                                     │
  │ SAFETY NET: create_appointment / register also       │
  │ caches language + demographicId as backup            │
  ├──────────────────────────────────────────────────────┤
  │ AFTER CALL ENDS                                      │
  │                                                      │
  │ Vapi sends end-of-call-report webhook                │
  │   ┌─────────────────────────────────┐                │
  │   │ getCallMetadata(callId)         │                │
  │   │   → merge into saveCallLog()   │                │
  │   │   → persist to PostgreSQL      │                │
  │   └─────────────────────────────────┘                │
  │                                                      │
  │ TTL: 30 min (1% probabilistic cleanup per call)      │
  └──────────────────────────────────────────────────────┘
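The cache described above, sketched as a small class. It mirrors setCallMetadata/getCallMetadata with a 30-minute TTL and a 1% chance of sweeping expired entries on each write. Three parts are assumptions: reading "1% per call" as per write, the merge policy where later writes win, and the injected clock and random source, which are there only for testing. CallMetadataCache is a hypothetical name.

```typescript
interface CallMetadata {
  language?: string;
  callOutcome?: string;
  demographicId?: number;
  appointmentId?: number;
}

interface Entry {
  data: CallMetadata;
  expiresAt: number;
}

const TTL_MS = 30 * 60 * 1000;  // 30 minutes
const SWEEP_PROBABILITY = 0.01; // ~1% of writes trigger a full sweep

class CallMetadataCache {
  private entries = new Map<string, Entry>();

  constructor(
    private readonly now: () => number = Date.now,
    private readonly random: () => number = Math.random,
  ) {}

  // Merges into any live entry, so the create_appointment safety-net write and a
  // log_call_metadata write combine. Each write restarts the TTL.
  set(callId: string, data: CallMetadata): void {
    if (this.random() < SWEEP_PROBABILITY) this.sweep();
    const existing = this.get(callId) ?? {};
    this.entries.set(callId, { data: { ...existing, ...data }, expiresAt: this.now() + TTL_MS });
  }

  // Called from the end-of-call-report handler; expired entries read as missing.
  get(callId: string): CallMetadata | undefined {
    const entry = this.entries.get(callId);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(callId);
      return undefined;
    }
    return entry.data;
  }

  get size(): number {
    return this.entries.size;
  }

  // Removes every expired entry (deleting during Map.forEach is safe in JS).
  private sweep(): void {
    const t = this.now();
    this.entries.forEach((entry, id) => {
      if (entry.expiresAt <= t) this.entries.delete(id);
    });
  }
}
```

The probabilistic sweep avoids a background timer, but the cache lives in one process's memory. If the end-of-call-report reaches a different PM2 instance or arrives after a restart, the metadata is lost.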

Tool Name Handling

The webhook handler accepts both snake_case (canonical) and camelCase (Vapi/legacy) names for each tool:

Canonical Name Also Accepted
search_patient searchPatient
search_patient_by_phone (snake_case only)
create_appointment bookAppointment
cancel_appointment cancelAppointment
update_appointment (snake_case only)
get_providers getProviders
get_clinic_info getClinicSettings
log_call_metadata logCallSummary
transfer_call transferToHuman
register_new_patient registerPatient
add_to_waitlist addToWaitlist
check_appointments getPatientAppointments
find_earliest_appointment findEarliestAppointment
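The alias table translates directly into a lookup map. The sketch below normalizes an incoming tool name to its canonical snake_case form and returns null for unknown names. The map contents come from the table above; canonicalToolName is a hypothetical helper name.

```typescript
// camelCase (Vapi/legacy) alias -> canonical snake_case name, from the table above.
const TOOL_ALIASES: Record<string, string> = {
  searchPatient: "search_patient",
  bookAppointment: "create_appointment",
  cancelAppointment: "cancel_appointment",
  getProviders: "get_providers",
  getClinicSettings: "get_clinic_info",
  logCallSummary: "log_call_metadata",
  transferToHuman: "transfer_call",
  registerPatient: "register_new_patient",
  addToWaitlist: "add_to_waitlist",
  getPatientAppointments: "check_appointments",
  findEarliestAppointment: "find_earliest_appointment",
};

const CANONICAL_TOOLS = new Set<string>([
  "search_patient", "search_patient_by_phone", "create_appointment", "cancel_appointment",
  "update_appointment", "get_providers", "get_clinic_info", "log_call_metadata",
  "transfer_call", "register_new_patient", "add_to_waitlist", "check_appointments",
  "find_earliest_appointment",
]);

// Returns the canonical name, or null for an unknown tool. hasOwnProperty keeps
// inherited keys such as "toString" from being treated as aliases.
function canonicalToolName(name: string): string | null {
  if (CANONICAL_TOOLS.has(name)) return name;
  return Object.prototype.hasOwnProperty.call(TOOL_ALIASES, name) ? TOOL_ALIASES[name] : null;
}
```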

P0/P1/P2 Fixes Applied (2026-02-15 / 2026-02-16)

P0 Fixes (Critical -- Applied 2026-02-15)

| Fix | Before | After | Impact |
|---|---|---|---|
| Router maxTokens | 150 | 400 | GPT-4o tool-call JSON = 80-120 tokens; 150 caused silent truncation |
| Router prompt rewrite | Rigid "Say EXACTLY 'One moment please'" | Warm acknowledgment + get_clinic_info for greeting | Clinic-agnostic, natural tone |
| Patient-ID steps merged | Steps 1+2 separate (greet then tool call) | Single first-turn tool call + intent analysis | Eliminates redundant turn |
| Defensive tool-result instruction | None | "WAIT for actual tool result before speaking about the patient" | Prevents hallucinated patient names |
| transferAssistant fix | All 8 non-Router prompts used transferAssistant | Changed to handoff_to_X (actual function names) | Handoffs actually work |
| Circuit breaker timeout | 10s | 4s | Under Vapi's 5s tool timeout (both SOAP and bridge) |
| Clinic-agnostic prompts | "Vitara" hardcoded in Patient-ID EN/ZH | All references removed | Multi-clinic ready |
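The 4s budget keeps the server's own failure inside Vapi's tool timeout, so the agent receives a real error instead of a dropped tool call. The sketch below shows only that timeout part. It does not cover the circuit breaker's failure counting or open/half-open states. withTimeout and EMR_TIMEOUT_MS are hypothetical names.

```typescript
const EMR_TIMEOUT_MS = 4000; // below the 5s Vapi tool timeout cited in the table

// Rejects with an error named "ToolTimeoutError" if `work` has not settled within `ms`.
// The timer is cleared either way so it cannot keep the process alive.
function withTimeout<T>(work: Promise<T>, ms: number = EMR_TIMEOUT_MS): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`EMR call exceeded ${ms} ms`);
      err.name = "ToolTimeoutError";
      reject(err);
    }, ms);
  });
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (error) => {
      clearTimeout(timer);
      throw error;
    },
  );
}
```

The timeout does not cancel the underlying SOAP or bridge request. It only stops the webhook from waiting on it, which is why a breaker that stops sending new requests to a slow EMR is still useful on top.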

P1/P2 Fixes (Applied 2026-02-16)

| Fix | Details |
|---|---|
| request-start messages | Added to all 14 tool YAMLs (3 audible, 11 silent) |
| Patient-ID firstMessage removed | Removed from EN/ZH squad members (was causing 16s silence) |
| Prompt alongside first-turn tool calls | Router/Patient-ID/Booking/Modification say brief phrase alongside tool call |
| FILLER PHRASE RULES deleted | Removed from Booking + Modification EN/ZH prompts (Registration agents retain theirs) |
| Tool-level request-start | Replaces LLM-generated filler phrases for Booking and Modification flows |

Filler Strategy Change

In v4.0.1, filler phrases for Booking and Modification flows are handled at the tool level via Vapi request-start messages. Registration agents still use prompt-level FILLER PHRASE RULES because the registration flow has more varied tool-call patterns.


Known Issues

GPT-4o Chinese Character Spacing

GPT-4o occasionally outputs space-separated Chinese characters (e.g., "您 好" instead of "您好"). This is a known GPT-4o behavior. Monitor in the ZH track; may need a post-launch LLM swap if it becomes frequent.
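To monitor the frequency, ZH-track assistant messages could be scanned for whitespace between two Chinese characters. In the sketch below, the regexes cover only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). collapseCjkSpaces is a hypothetical post-processing step, not something the current pipeline is documented to do.

```typescript
// Whitespace sandwiched between two CJK Unified Ideographs (U+4E00 to U+9FFF).
// The lookahead does not consume the second character, so runs like "您 好 吗" collapse fully.
const SPACED_CJK = /([\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])/g;

// Detection for monitoring; a non-global regex avoids lastIndex state between calls.
function hasSpacedCjk(text: string): boolean {
  return /[\u4e00-\u9fff]\s+[\u4e00-\u9fff]/.test(text);
}

// Removes only those spaces; spacing around Latin words and digits is left untouched.
function collapseCjkSpaces(text: string): string {
  return text.replace(SPACED_CJK, "$1");
}
```

Counting hasSpacedCjk hits per call would give a concrete number to compare against before deciding on an LLM swap.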

CONVERSATION STYLE Sections — Verified

All 4 ZH prompts are confirmed to have ## 对话风格 ("Conversation Style") sections. No action needed (2026-03-04).

transfer_call Tool Gaps — Resolved

transfer_call was added to Patient-ID EN/ZH, Booking EN/ZH, and Registration EN/ZH (2026-03-04). All role agents now have an escalation path to clinic staff.

handoff_to_router_v3 — Added

handoff_to_router_v3 was added to Patient-ID-ZH and Registration EN/ZH in the squad YAML (2026-03-04).

get_clinic_info Does Not Return clinicName

The Router prompt references [clinicName] from get_clinic_info, but the tool handler does not return a clinicName field; it returns customGreeting and customGreetingZh instead. The Router prompt must use these fields, otherwise the LLM has no clinic name to work with.

SOAP Client Warm Start — Implemented

EmrAdapterFactory now awaits warmUp for preferRest adapters on creation. On PM2 startup, an IIFE in index.ts triggers warmUp (2026-03-04).


Global Behaviors (All Agents)

  • Emergency detection: Both EN and ZH keywords trigger 911 message in caller's language
  • Silent transfers: No agent ever mentions "transferring", "assistant", or system internals
  • Date awareness: All prompts include {{now | date: ...}} template for current date/time
  • Past-date clamping: The server does not accept appointment dates in the past; a past date is clamped to today (added 2026-03-04)
  • Phone auto-detection: The LLM sends "0000000000" as a placeholder, and the server substitutes the real caller phone from call.customer.number
  • One question at a time: Never overwhelm the caller
  • Delay handling: Tool-level request-start messages handle audible feedback during tool calls (3 audible, 11 silent)
  • Never say technical terms: No "function", "tool", "API", "database"
  • SMS consent: Patient-ID agents disclose SMS confirmations; Booking/Modification pass smsConsent param and check smsSent response. See SMS Integration
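The two server-side rules above (phone substitution and past-date clamping) can be sketched as small pure functions. The sketch makes three assumptions. First, the server substitutes call.customer.number only when the LLM sent the placeholder or nothing. Second, dates arrive as ISO "YYYY-MM-DD" strings. Third, "today" is computed by the caller in the clinic's time zone. Both function names are hypothetical.

```typescript
const PHONE_PLACEHOLDER = "0000000000";

// Uses the real caller ID (call.customer.number) when the LLM sent the placeholder or nothing.
function resolveCallerPhone(llmPhone: string | undefined, customerNumber: string | undefined): string | null {
  if (!llmPhone || llmPhone === PHONE_PLACEHOLDER) return customerNumber || null;
  return llmPhone;
}

// ISO "YYYY-MM-DD" strings sort lexicographically in date order,
// so a plain string comparison is enough to clamp past dates to today.
function clampToToday(requested: string, today: string): string {
  return requested < today ? today : requested;
}
```

Passing today in explicitly, rather than reading the server clock inside the function, avoids off-by-one-day errors when the server runs in UTC and the clinic does not.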

Appointment Type Mapping

| Patient Says (EN) | Patient Says (ZH) | Code | Meaning |
|---|---|---|---|
| "checkup", "general", "physical", "not sure" | "体检", "普通检查", "不确定" | B | General visit |
| "follow-up", "results", "check results" | "复查", "看结果" | 2 | Follow-up |
| "pain", "illness", "specific complaint" | "不舒服", "疼痛", "生病" | 3 | Specific concern |
| "prescription refill", "medication" | "配药", "续药" | P | Prescription |

Server-side validation: appointmentType must be one of ['B', '2', '3', 'P'] (or DB-configured values). Invalid values default to 'B'.
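The validation rule is a whitelist check with a fallback. In the sketch below, DB-configured codes are passed in as the allowed list, and anything that is not an allowed string, including a bare number such as 3, falls back to 'B'. normalizeAppointmentType is a hypothetical name.

```typescript
const DEFAULT_APPOINTMENT_TYPES: readonly string[] = ["B", "2", "3", "P"];

// Returns the code unchanged if it is allowed, otherwise 'B' (general visit).
// `allowed` can be replaced with DB-configured codes.
function normalizeAppointmentType(code: unknown, allowed: readonly string[] = DEFAULT_APPOINTMENT_TYPES): string {
  return typeof code === "string" && allowed.indexOf(code) !== -1 ? code : "B";
}
```

Falling back to 'B' rather than rejecting keeps the booking moving when the LLM sends an unexpected value. The cost is that a misclassified visit is booked as general without any error.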