v3.0 Squad Architecture
9-agent dual-track bilingual squad with all P0/P1/P2 fixes applied
Last Updated: 2026-03-09 (v4.3.0 — SMS consent, firstMessage corrections from YAML source)
Overview
| Setting | Value |
|---|---|
| Architecture | 9-Agent Dual-Track (Router + 4 roles x 2 languages) |
| Squad ID | 13fdfd19-a2cd-4ca4-8e14-ad2275095e32 |
| Phone | +1 236-305-7446 |
| LLM | OpenAI GPT-4o (all 9 assistants) |
| Temperature | 0.3 (Router), 0.5 (all others) |
| Max Tokens | 400 (Router -- P0 fix), 200 (standard), 250 (Registration) |
| Handoff Mode | Silent (no announcement) |
| Config Management | Vapi GitOps (config-as-code) |
| GitOps Path | vitara-platform/vapi-gitops/ |
| Webhook | https://api-dev.vitaravox.ca/api/vapi/{tool-slug} (each tool has its own URL) |
| firstMessage | "Hi, thanks for calling VV Health! English or Mandarin?" (hardcoded in squad YAML, played by Vapi before LLM runs) |
Router maxTokens
The Router maxTokens was increased from 150 to 400 as a P0 fix. GPT-4o tool-call JSON alone consumes 80-120 tokens; the original 150-token limit caused silent truncation and broken handoffs.
Squad Members
| # | Agent | Vapi Name | ID | Role | Tools |
|---|---|---|---|---|---|
| 1 | Router | vitara-router-v3 | 4f70e214-... | Language gate: detect EN/ZH keywords, route to correct track | get_clinic_info, transfer_call, log_call_metadata |
| 2 | Patient-ID (EN) | vitara-patient-id-en-v3 | 7d054785-... | Identify caller by phone, detect intent, route | search_patient_by_phone, search_patient, get_clinic_info, transfer_call |
| 3 | Patient-ID (ZH) | vitara-patient-id-zh-v3 | 7585c092-... | Same as EN but in Mandarin | Same as EN |
| 4 | Booking (EN) | vitara-booking-en-v3 | ac25775b-... | Find slots, book appointments | find_earliest_appointment, check_appointments, create_appointment, get_providers, log_call_metadata, transfer_call |
| 5 | Booking (ZH) | vitara-booking-zh-v3 | 6ef04a40-... | Same as EN but in Mandarin | Same as EN |
| 6 | Modification (EN) | vitara-modification-en-v3 | 9cd8381d-... | Reschedule, cancel, check appointments | check_appointments, find_earliest_appointment, update_appointment, cancel_appointment, create_appointment, get_providers, log_call_metadata, transfer_call |
| 7 | Modification (ZH) | vitara-modification-zh-v3 | e348cd2f-... | Same as EN but in Mandarin | Same as EN |
| 8 | Registration (EN) | vitara-registration-en-v3 | 9fcfd00d-... | Register new patients | register_new_patient, add_to_waitlist, log_call_metadata, transfer_call |
| 9 | Registration (ZH) | vitara-registration-zh-v3 | ce50df43-... | Same as EN but in Mandarin | Same as EN |
Full Vapi UUIDs
| Agent | Full UUID |
|---|---|
| Router | 4f70e214-6111-4f53-86c9-48f8f7c265e1 |
| Patient-ID-EN | 7d054785-9074-4856-81db-9fe44da47bc5 |
| Patient-ID-ZH | 7585c092-f8b3-4bdd-95ba-d41d71a54101 |
| Booking-EN | ac25775b-c1cc-41ae-8899-810d4ae62efd |
| Booking-ZH | 6ef04a40-6764-4d4e-b2e0-73045b288611 |
| Modification-EN | 9cd8381d-9501-4c9a-a92d-ce185f49e50d |
| Modification-ZH | e348cd2f-f3d8-4b9a-ac35-7725c767287f |
| Registration-EN | 9fcfd00d-1493-4041-9214-36159eba4511 |
| Registration-ZH | ce50df43-7c3a-45f7-b121-8772adaa0eff |
Call Flow Diagram
CALL START (Telnyx inbound)
|
v
+--------------------------------------------------+
| 1. ROUTER (vitara-router-v3) |
| STT: AssemblyAI Universal (bilingual) |
| TTS: ElevenLabs multilingual_v2 |
| firstMessage: "Hi, thanks for calling VV Health!|
| English or Mandarin?" |
| (hardcoded in squad YAML, plays before LLM) |
| |
| - LLM calls get_clinic_info (mandatory 1st turn)|
| - Greets with clinic info (customGreeting) |
| - Language detection: KEYWORD-BASED |
| Default = English. Chinese only if caller |
| says "Mandarin/Chinese/中文" or garbled text |
| - Routes to language track |
+---------------------+----------------------------+
|
+-----------+-----------+
v v
ENGLISH TRACK CHINESE TRACK
| |
v v
+-------------------+ +-------------------+
| 2. PATIENT-ID-EN | | 2. PATIENT-ID-ZH |
| STT: Deepgram | | STT: Deepgram |
| nova-2 en | | nova-2 zh |
| TTS: ElevenLabs | | TTS: Azure |
| | | XiaoxiaoNeural |
| - search by phone | | - search by phone |
| (server subs | | (server subs |
| real number) | | real number) |
| - detect intent | | - detect intent |
| - route to role | | - route to role |
+--------+----------+ +--------+----------+
| |
+------+------+ +------+------+
v v v v v v
+----+ +----+ +-----+ +----+ +----+ +-----+
|BOOK| |MOD | |REG | |BOOK| |MOD | |REG |
| EN | | EN | | EN | | ZH | | ZH | | ZH |
+----+ +----+ +-----+ +----+ +----+ +-----+
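The keyword-based language gate in the Router box can be sketched as below. This is only an illustration: in the squad the rule is applied by the Router prompt, and `detectTrack` and `ZH_KEYWORDS` are hypothetical names. The "garbled text" case from the diagram is left out because the source does not say how it is recognised.

```typescript
type Track = "en" | "zh";

// Keywords taken from the Router box above (hypothetical constant name).
const ZH_KEYWORDS = ["mandarin", "chinese", "中文"];

// Default to English; switch to the Chinese track only on an explicit keyword.
function detectTrack(utterance: string): Track {
  const text = utterance.toLowerCase();
  return ZH_KEYWORDS.some((keyword) => text.includes(keyword)) ? "zh" : "en";
}
```

Defaulting to English means an unclear or off-topic first utterance still lands on a working track instead of stalling the Router.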
Handoff Matrix (18 Squad Routes)
These are the actual handoff tool definitions from the squad YAML (vitaravox-v3.yml). Each is a Vapi handoff tool type configured in assistantOverrides.tools:append.
| From | To | Handoff Tool | Trigger |
|---|---|---|---|
| Router | Patient-ID-EN | handoff_to_patient_id_en | Caller speaks English (default) |
| Router | Patient-ID-ZH | handoff_to_patient_id_zh | Caller requests Chinese/Mandarin |
| Patient-ID-EN | Booking-EN | handoff_to_booking_en | Patient found + intent = book |
| Patient-ID-EN | Modification-EN | handoff_to_modification_en | Patient found + intent = reschedule/cancel/check |
| Patient-ID-EN | Registration-EN | handoff_to_registration_en | Patient not found + new patient |
| Patient-ID-ZH | Booking-ZH | handoff_to_booking_zh | Patient found + intent = book |
| Patient-ID-ZH | Modification-ZH | handoff_to_modification_zh | Patient found + intent = reschedule/cancel/check |
| Patient-ID-ZH | Registration-ZH | handoff_to_registration_zh | Patient not found + new patient |
| Booking-EN | Modification-EN | handoff_to_modification_en | Patient says "reschedule" or "cancel" |
| Booking-EN | Router | handoff_to_router_v3 | Unrelated request |
| Booking-ZH | Modification-ZH | handoff_to_modification_zh | Patient says "reschedule" or "cancel" |
| Booking-ZH | Router | handoff_to_router_v3 | Unrelated request |
| Modification-EN | Booking-EN | handoff_to_booking_en | After cancel, patient wants to rebook |
| Modification-EN | Router | handoff_to_router_v3 | Unrelated request |
| Modification-ZH | Booking-ZH | handoff_to_booking_zh | After cancel, patient wants to rebook |
| Modification-ZH | Router | handoff_to_router_v3 | Unrelated request |
| Registration-EN | Booking-EN | handoff_to_booking_en | After registration, patient wants first appointment |
| Registration-ZH | Booking-ZH | handoff_to_booking_zh | After registration, patient wants first appointment |
Additionally, agents with the transfer_call tool (Router, Patient-ID EN/ZH, Booking EN/ZH, Modification EN/ZH, Registration EN/ZH) can transfer to clinic staff. This triggers a transfer-destination-request webhook, not a squad handoff.
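As a rough sketch, the matrix above can be held as data and queried per agent. The `Route` shape and the `trackRoutes` and `handoffTools` helpers are hypothetical and are not part of the squad YAML; the list mirrors only the 18 rows in the table.

```typescript
type Lang = "en" | "zh";

interface Route {
  from: string;
  to: string;
  tool: string;
}

// Builds the 9 routes that belong to one language track (1 from the Router, 8 inside the track).
function trackRoutes(lang: Lang): Route[] {
  const suffix = lang.toUpperCase();
  const patientId = `Patient-ID-${suffix}`;
  const booking = `Booking-${suffix}`;
  const modification = `Modification-${suffix}`;
  const registration = `Registration-${suffix}`;
  return [
    { from: "Router", to: patientId, tool: `handoff_to_patient_id_${lang}` },
    { from: patientId, to: booking, tool: `handoff_to_booking_${lang}` },
    { from: patientId, to: modification, tool: `handoff_to_modification_${lang}` },
    { from: patientId, to: registration, tool: `handoff_to_registration_${lang}` },
    { from: booking, to: modification, tool: `handoff_to_modification_${lang}` },
    { from: booking, to: "Router", tool: "handoff_to_router_v3" },
    { from: modification, to: booking, tool: `handoff_to_booking_${lang}` },
    { from: modification, to: "Router", tool: "handoff_to_router_v3" },
    { from: registration, to: booking, tool: `handoff_to_booking_${lang}` },
  ];
}

const ROUTES: Route[] = [...trackRoutes("en"), ...trackRoutes("zh")];

// Lists the handoff tools a given agent may call.
function handoffTools(from: string): string[] {
  return ROUTES.filter((route) => route.from === from).map((route) => route.tool);
}
```

A structure like this makes it easy to check a prompt against the matrix, for example confirming that every `handoff_to_X` name a prompt mentions appears in `handoffTools` for that agent.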
firstMessage Overrides (from Squad YAML)
Squad YAML can override each member's firstMessage. These play immediately on handoff, before the LLM generates any text:
| Agent | firstMessage Override | Source |
|---|---|---|
| Router | "Hi, thanks for calling VV Health! English or Mandarin?" | vitaravox-v3.yml:6 |
| Patient-ID-EN | "Are you a returning patient, or is this your first visit?" | vitaravox-v3.yml:38 |
| Patient-ID-ZH | "请问您是老患者还是第一次来?" | vitaravox-v3.yml:245 |
| Booking-EN | "Let me check what's available for you." | vitaravox-v3.yml:105 |
| Booking-ZH | "我帮您查一下可以的时间。" | vitaravox-v3.yml:324 |
| Modification-EN | (none -- LLM generates) | -- |
| Modification-ZH | (none -- LLM generates) | -- |
| Registration-EN | (none -- LLM generates) | -- |
| Registration-ZH | (none -- LLM generates) | -- |
Handoff Tool Names
All prompts use handoff_to_X function names (e.g., handoff_to_booking_en, handoff_to_modification_zh), not transferAssistant. This was a P0 fix -- the original prompts referenced transferAssistant which does not exist as a tool function name.
Voice & Transcription Configuration
Router
| Setting | Value |
|---|---|
| STT Provider | AssemblyAI |
| STT Mode | Universal (bilingual auto-detect) |
| TTS Provider | ElevenLabs |
| TTS Voice ID | fQj4gJSexpu8RDE2Ii5m |
| TTS Model | eleven_multilingual_v2 |
AssemblyAI Schema
AssemblyAI transcriber in Vapi has NO endpointing, languageDetection, or model fields. These are Deepgram-specific. Sending a model property to AssemblyAI causes a 400 error.
English Track
| Setting | Value |
|---|---|
| STT Provider | Deepgram |
| STT Model | nova-2 |
| STT Language | en |
| TTS Provider | ElevenLabs |
| TTS Voice ID | fQj4gJSexpu8RDE2Ii5m |
| TTS Model | eleven_multilingual_v2 |
| Stability | 0.5 |
| Similarity Boost | 0.7 |
Endpointing (EN standard agents):
| Setting | Value |
|---|---|
| Wait Seconds | 0.6 |
| On Punctuation | 0.3s |
| On No Punctuation | 0.8s |
| On Number | 0.5s |
| Stop on Words | 2 |
Endpointing (EN registration -- longer pauses for spelling):
| Setting | Value |
|---|---|
| Wait Seconds | 1.6 |
| On Punctuation | 1.0s |
| On No Punctuation | 2.5s |
| On Number | 1.2s |
| Stop on Words | 2 |
Chinese Track
| Setting | Value |
|---|---|
| STT Provider | Deepgram |
| STT Model | nova-2 |
| STT Language | zh |
| TTS Provider | Azure |
| TTS Voice ID | zh-CN-XiaoxiaoNeural |
TTS Choice
ElevenLabs eleven_turbo_v2_5 is English-only. For CJK languages, eleven_multilingual_v2 or Azure is required. The ZH track uses Azure XiaoxiaoNeural for native Mandarin quality.
Endpointing (ZH standard agents -- tuned for Mandarin speech patterns):
| Setting | Value |
|---|---|
| Wait Seconds | 1.0 |
| On Punctuation | 0.6s |
| On No Punctuation | 1.5s |
| On Number | 0.8s |
| Stop on Words | 3 |
Endpointing (ZH registration -- longer pauses for spelling):
| Setting | Value |
|---|---|
| Wait Seconds | 1.6 |
| On Punctuation | 1.0s |
| On No Punctuation | 2.5s |
| On Number | 1.2s |
| Stop on Words | 2 |
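The four endpointing tables reduce to three distinct profiles, since both Registration agents share the same values. A minimal sketch of selecting a profile by track and role follows; the field names and `endpointingFor` are hypothetical and do not claim to match Vapi's own config keys.

```typescript
type Lang = "en" | "zh";

// Hypothetical field names; values copied from the endpointing tables above (seconds, except stopOnWords).
interface EndpointingProfile {
  waitSeconds: number;
  onPunctuationSeconds: number;
  onNoPunctuationSeconds: number;
  onNumberSeconds: number;
  stopOnWords: number;
}

const STANDARD: Record<Lang, EndpointingProfile> = {
  en: { waitSeconds: 0.6, onPunctuationSeconds: 0.3, onNoPunctuationSeconds: 0.8, onNumberSeconds: 0.5, stopOnWords: 2 },
  zh: { waitSeconds: 1.0, onPunctuationSeconds: 0.6, onNoPunctuationSeconds: 1.5, onNumberSeconds: 0.8, stopOnWords: 3 },
};

// Registration uses the same longer pauses in both languages (callers spell names).
const REGISTRATION: EndpointingProfile = {
  waitSeconds: 1.6, onPunctuationSeconds: 1.0, onNoPunctuationSeconds: 2.5, onNumberSeconds: 1.2, stopOnWords: 2,
};

function endpointingFor(lang: Lang, isRegistration: boolean): EndpointingProfile {
  return isRegistration ? REGISTRATION : STANDARD[lang];
}
```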
Design Changes from v2.3.0
| Change | v2.3.0 | v3.0 | Rationale |
|---|---|---|---|
| Language handling | Single multilingual agent | Explicit Router language gate (keyword-based) | Eliminates LLM language confusion; dedicated STT/TTS per track |
| Patient identification | Combined in Router | Separate Patient-ID agent per track | Cleaner prompt; Router stays lightweight |
| Reschedule + Cancel | Two separate agents | Single Modification agent per track | Reduces squad complexity; both share same tools |
| Confirmation agent | Exists (rarely used) | Eliminated | log_call_metadata absorbed into Booking/Modification/Registration |
| STT (English) | Deepgram nova-2 multi | Deepgram nova-2 en | Language-specific = higher accuracy |
| STT (Chinese) | Deepgram nova-2 multi | Deepgram nova-2 zh | Language-specific = higher accuracy |
| STT (Router) | Deepgram nova-2 multi | AssemblyAI Universal | Bilingual detection before routing |
| TTS (English) | ElevenLabs multilingual_v2 | ElevenLabs multilingual_v2 | No change |
| TTS (Chinese) | ElevenLabs multilingual_v2 | Azure zh-CN-XiaoxiaoNeural | Native Mandarin voice quality |
| ZH Endpointing | Same as EN | Longer pauses (1.0s wait, 0.6s punct) | Mandarin speech patterns need longer pauses |
| Config management | Manual Vapi dashboard | Vapi GitOps (config-as-code) | Version control, reproducibility |
| Handoff routes | 8 | 18 | More granular routing between roles |
Server-Side Middleware
Clinic Resolution (Actual Implementation)
Incoming webhooks resolve the clinic via this chain. A phone number obtained in step 2 or step 3 is passed to the database lookup in step 4:
1. metadata.clinicId (explicit override for testing)
2. call.phoneNumber.number (Vapi phone number)
3. resolvePhoneFromVapiId() (Telnyx BYO fallback: 1-hour cached API lookup)
4. findClinicByVapiPhone() (SELECT id FROM clinic WHERE vapiPhone = $1)
CLINIC RESOLUTION CASCADE (webhook handler)
┌───────────────────────────────────────────────────────┐
│ 1. metadata.clinicId (explicit override for testing) │
│ Found? ──► YES ──► Use this clinic │
│ └─ NO │
│ ▼ │
│ 2. call.phoneNumber.number (Vapi phone number) │
│ Found? ──► YES ──► findClinicByVapiPhone() │
│ └─ NO │
│ ▼ │
│ 3. resolvePhoneFromVapiId() (Telnyx BYO fallback) │
│ Calls GET api.vapi.ai/phone-number (1-hr cache) │
│ Found? ──► YES ──► findClinicByVapiPhone() │
│ └─ NO │
│ ▼ │
│ 4. ERROR: Cannot resolve clinic │
│ Log warning, return error to Vapi │
└───────────────────────────────────────────────────────┘
Telnyx BYO Numbers
For Telnyx BYO phone numbers, call.phoneNumber.number is often empty. The server resolves via call.phoneNumberId using a cached lookup from GET https://api.vapi.ai/phone-number (1-hour TTL). This is critical for clinic resolution.
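The cascade can be sketched as follows. `resolvePhoneFromVapiId` and `findClinicByVapiPhone` are the function names given above, but their signatures here are assumptions, as are `resolveClinicId`, `ClinicLookups`, and the trimmed `VapiCall` shape. The two lookups are injected so the sketch stays self-contained.

```typescript
// Trimmed, assumed shape of the call object in a Vapi webhook payload.
interface VapiCall {
  phoneNumberId?: string;
  phoneNumber?: { number?: string };
}

// Assumed signatures for the two lookups named in the cascade.
interface ClinicLookups {
  resolvePhoneFromVapiId(phoneNumberId: string): Promise<string | undefined>;
  findClinicByVapiPhone(phone: string): Promise<string | undefined>;
}

async function resolveClinicId(
  metadata: { clinicId?: string },
  call: VapiCall,
  lookups: ClinicLookups,
): Promise<string> {
  // 1. Explicit override (testing).
  if (metadata.clinicId) return metadata.clinicId;

  // 2. Number supplied directly by Vapi; often empty for Telnyx BYO numbers.
  let phone = call.phoneNumber?.number;

  // 3. Fallback: resolve the number from the Vapi phone-number ID (cached upstream).
  if (!phone && call.phoneNumberId) {
    phone = await lookups.resolvePhoneFromVapiId(call.phoneNumberId);
  }

  // 4. Look up the clinic by phone, or fail if nothing resolved.
  const clinicId = phone ? await lookups.findClinicByVapiPhone(phone) : undefined;
  if (!clinicId) throw new Error("Cannot resolve clinic");
  return clinicId;
}
```

Whether a failed database lookup in step 2 should also trigger the step 3 fallback is not stated in the source; this sketch falls back only when the number itself is missing.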
Call Metadata Cache
v3.0 introduced an in-memory callMetadataCache to bridge tool-call webhooks and end-of-call-report webhooks:
Tool call (log_call_metadata) End-of-call-report
| |
v v
setCallMetadata(callId, { getCallMetadata(callId)
language, outcome, -> merges into saveCallLog
demographicId, appointmentId })
- Why: Vapi's end-of-call-report does not include tool results. The cache lets the server persist metadata that was set during the call.
- Fallback: create_appointment and register_new_patient also cache language and demographicId as a safety net if log_call_metadata is never called.
- TTL: 30 minutes. Entries auto-cleaned probabilistically (1% per call).
CALL METADATA CACHE -- Timeline
Time ─────────────────────────────────────────────────────►
┌──────────────────────────────────────────────────────┐
│ DURING CALL │
│ │
│ Tool call: log_call_metadata │
│ ┌─────────────────────────────────┐ │
│ │ setCallMetadata(callId, { │ │
│ │ language: "en", │ │
│ │ callOutcome: "booked", │ │
│ │ demographicId: 123, │ │
│ │ appointmentId: 456 │ │
│ │ }) │ │
│ └────────────┬────────────────────┘ │
│ │ Stored in memory Map │
│ ▼ │
│ SAFETY NET: create_appointment / register also │
│ caches language + demographicId as backup │
├──────────────────────────────────────────────────────┤
│ AFTER CALL ENDS │
│ │
│ Vapi sends end-of-call-report webhook │
│ ┌─────────────────────────────────┐ │
│ │ getCallMetadata(callId) │ │
│ │ → merge into saveCallLog() │ │
│ │ → persist to PostgreSQL │ │
│ └─────────────────────────────────┘ │
│ │
│ TTL: 30 min (1% probabilistic cleanup per call) │
└──────────────────────────────────────────────────────┘
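A minimal in-memory version of this cache might look like the sketch below. `setCallMetadata`, `getCallMetadata`, and `callMetadataCache` are the names used above; the entry shape, the merge-on-write behaviour, and the injectable `now` and `rand` parameters (added for testability) are assumptions.

```typescript
interface CallMetadata {
  language?: string;
  callOutcome?: string;
  demographicId?: number;
  appointmentId?: number;
}

const TTL_MS = 30 * 60 * 1000; // 30-minute TTL
const CLEANUP_PROBABILITY = 0.01; // sweep on roughly 1% of writes

const callMetadataCache = new Map<string, { data: CallMetadata; expiresAt: number }>();

function setCallMetadata(
  callId: string,
  patch: CallMetadata,
  now: number = Date.now(),
  rand: number = Math.random(),
): void {
  // Probabilistic cleanup: occasionally drop every expired entry.
  if (rand < CLEANUP_PROBABILITY) {
    callMetadataCache.forEach((entry, id) => {
      if (entry.expiresAt <= now) callMetadataCache.delete(id);
    });
  }
  // Assumption: later writes merge with earlier ones, so the safety-net writes
  // from create_appointment do not erase what log_call_metadata stored.
  const existing = callMetadataCache.get(callId)?.data ?? {};
  callMetadataCache.set(callId, { data: { ...existing, ...patch }, expiresAt: now + TTL_MS });
}

function getCallMetadata(callId: string, now: number = Date.now()): CallMetadata | undefined {
  const entry = callMetadataCache.get(callId);
  if (!entry || entry.expiresAt <= now) return undefined;
  return entry.data;
}
```

Sweeping on a small fraction of writes avoids a background timer while still bounding memory. The trade-off is that an expired entry can linger until the next sweep, which is why the read path checks `expiresAt` as well.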
Tool Name Handling
The webhook handler accepts the snake_case (canonical) name for every tool and a camelCase (Vapi/legacy) alias for most of them:
| Canonical Name | Also Accepted |
|---|---|
| search_patient | searchPatient |
| search_patient_by_phone | (snake_case only) |
| create_appointment | bookAppointment |
| cancel_appointment | cancelAppointment |
| update_appointment | (snake_case only) |
| get_providers | getProviders |
| get_clinic_info | getClinicSettings |
| log_call_metadata | logCallSummary |
| transfer_call | transferToHuman |
| register_new_patient | registerPatient |
| add_to_waitlist | addToWaitlist |
| check_appointments | getPatientAppointments |
| find_earliest_appointment | findEarliestAppointment |
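One way to implement this is to normalise every incoming name to its canonical form before dispatch. The sketch below assumes that approach; `canonicalToolName`, `CANONICAL_TOOLS`, and `TOOL_ALIASES` are hypothetical names, and the entries are copied from the table.

```typescript
// Canonical snake_case tool names from the table above.
const CANONICAL_TOOLS = new Set<string>([
  "search_patient",
  "search_patient_by_phone",
  "create_appointment",
  "cancel_appointment",
  "update_appointment",
  "get_providers",
  "get_clinic_info",
  "log_call_metadata",
  "transfer_call",
  "register_new_patient",
  "add_to_waitlist",
  "check_appointments",
  "find_earliest_appointment",
]);

// camelCase aliases mapped to their canonical name. A Map avoids accidental
// hits on Object.prototype keys such as "constructor".
const TOOL_ALIASES = new Map<string, string>([
  ["searchPatient", "search_patient"],
  ["bookAppointment", "create_appointment"],
  ["cancelAppointment", "cancel_appointment"],
  ["getProviders", "get_providers"],
  ["getClinicSettings", "get_clinic_info"],
  ["logCallSummary", "log_call_metadata"],
  ["transferToHuman", "transfer_call"],
  ["registerPatient", "register_new_patient"],
  ["addToWaitlist", "add_to_waitlist"],
  ["getPatientAppointments", "check_appointments"],
  ["findEarliestAppointment", "find_earliest_appointment"],
]);

// Returns the canonical name, or undefined for an unknown tool.
function canonicalToolName(name: string): string | undefined {
  if (CANONICAL_TOOLS.has(name)) return name;
  return TOOL_ALIASES.get(name);
}
```

Note that several aliases are not simple case conversions (`bookAppointment` maps to `create_appointment`, `getClinicSettings` to `get_clinic_info`), so an explicit table is needed rather than a generic camelCase-to-snake_case transform.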
P0/P1/P2 Fixes Applied (2026-02-15 / 2026-02-16)
P0 Fixes (Critical -- Applied 2026-02-15)
| Fix | Before | After | Impact |
|---|---|---|---|
| Router maxTokens | 150 | 400 | GPT-4o tool-call JSON = 80-120 tokens; 150 caused silent truncation |
| Router prompt rewrite | Rigid "Say EXACTLY 'One moment please'" | Warm acknowledgment + get_clinic_info for greeting | Clinic-agnostic, natural tone |
| Patient-ID steps merged | Steps 1+2 separate (greet then tool call) | Single first-turn tool call + intent analysis | Eliminates redundant turn |
| Defensive tool-result instruction | None | "WAIT for actual tool result before speaking about the patient" | Prevents hallucinated patient names |
| transferAssistant fix | All 8 non-Router prompts used transferAssistant | Changed to handoff_to_X (actual function names) | Handoffs actually work |
| Circuit breaker timeout | 10s | 4s | Under Vapi's 5s tool timeout (both SOAP and bridge) |
| Clinic-agnostic prompts | "Vitara" hardcoded in Patient-ID EN/ZH | All references removed | Multi-clinic ready |
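The reasoning behind the 4s value is that the server should give up on the EMR before Vapi gives up on the tool call, so the caller gets a controlled error rather than a dropped tool result. The sketch below shows only that timeout wrapper, not the open/closed state of a full circuit breaker; `withTimeout` and `EMR_TIMEOUT_MS` are hypothetical names.

```typescript
const EMR_TIMEOUT_MS = 4000; // stays under Vapi's 5s tool timeout

// Rejects if `work` has not settled within `ms`; always clears the timer.
async function withTimeout<T>(work: Promise<T>, ms: number = EMR_TIMEOUT_MS): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

The wrapper does not cancel the underlying request; it only stops waiting for it. A handler using it would catch the rejection and return a short fallback message to Vapi within the remaining second.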
P1/P2 Fixes (Applied 2026-02-16)
| Fix | Details |
|---|---|
| request-start messages | Added to all 14 tool YAMLs (3 audible, 11 silent) |
| Patient-ID firstMessage removed | Removed from EN/ZH squad members (was causing 16s silence) |
| Prompt alongside first-turn tool calls | Router/Patient-ID/Booking/Modification say brief phrase alongside tool call |
| FILLER PHRASE RULES deleted | Removed from Booking + Modification EN/ZH prompts (Registration agents retain theirs) |
| Tool-level request-start | Replaces LLM-generated filler phrases for Booking and Modification flows |
Filler Strategy Change
In v4.0.1, filler phrases for Booking and Modification flows are handled at the tool level via Vapi request-start messages. Registration agents still use prompt-level FILLER PHRASE RULES because the registration flow has more varied tool-call patterns.
Known Issues
GPT-4o Chinese Character Spacing
GPT-4o occasionally outputs space-separated Chinese characters (e.g., "您 好" instead of "您好"). This is a known GPT-4o behavior. Monitor in the ZH track; may need a post-launch LLM swap if it becomes frequent.
CONVERSATION STYLE Sections — Verified
All 4 ZH prompts confirmed to have ## 对话风格 sections. No action needed (2026-03-04).
transfer_call Tool Gaps — Resolved
transfer_call added to Patient-ID EN/ZH, Booking EN/ZH, Registration EN/ZH (2026-03-04). All role agents now have an escalation path to clinic staff.
handoff_to_router_v3 — Added
handoff_to_router_v3 added to Patient-ID-ZH and Registration EN/ZH in squad YAML (2026-03-04).
get_clinic_info Does Not Return clinicName
The Router prompt references [clinicName] from get_clinic_info, but the tool handler does not return a clinicName field. It returns customGreeting and customGreetingZh instead. The Router must use these fields or the LLM receives no clinic name.
SOAP Client Warm Start — Implemented
EmrAdapterFactory now awaits warmUp for preferRest adapters on creation. PM2 startup triggers warmUp via IIFE in index.ts (2026-03-04).
Global Behaviors (All Agents)
- Emergency detection: Both EN and ZH keywords trigger 911 message in caller's language
- Silent transfers: No agent ever mentions "transferring", "assistant", or system internals
- Date awareness: All prompts include {{now | date: ...}} template for current date/time
- Past-date clamping: Server rejects appointment dates in the past and clamps to today (added 2026-03-04)
- Phone auto-detection: Server extracts real caller phone from call.customer.number; LLM sends "0000000000" as placeholder
- One question at a time: Never overwhelm the caller
- Delay handling: Tool-level request-start messages handle audible feedback during tool calls (3 audible, 11 silent)
- Never say technical terms: No "function", "tool", "API", "database"
- SMS consent: Patient-ID agents disclose SMS confirmations; Booking/Modification pass smsConsent param and check smsSent response. See SMS Integration
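The phone auto-detection rule in the list above could be sketched as follows. `resolveCallerPhone` is a hypothetical helper, and the choice to keep a non-placeholder number supplied by the LLM is an assumption; the source only states that the placeholder is replaced with `call.customer.number`.

```typescript
const PLACEHOLDER_PHONE = "0000000000";

// Swap the LLM's placeholder for the real caller number when one is available.
function resolveCallerPhone(
  argPhone: string | undefined,
  customerNumber: string | undefined,
): string | undefined {
  const isPlaceholder = !argPhone || argPhone === PLACEHOLDER_PHONE;
  return isPlaceholder && customerNumber ? customerNumber : argPhone;
}
```

Keeping the real number out of the prompt means the LLM never has to read back or transcribe digits it did not hear.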
Appointment Type Mapping
| Patient Says (EN) | Patient Says (ZH) | Code | Meaning |
|---|---|---|---|
| "checkup", "general", "physical", "not sure" | "体检", "普通检查", "不确定" | B | General visit |
| "follow-up", "results", "check results" | "复查", "看结果" | 2 | Follow-up |
| "pain", "illness", "specific complaint" | "不舒服", "疼痛", "生病" | 3 | Specific concern |
| "prescription refill", "medication" | "配药", "续药" | P | Prescription |
Server-side validation: appointmentType must be one of ['B', '2', '3', 'P'] (or DB-configured values). Invalid values default to 'B'.
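The validation rule can be sketched as below. `normalizeAppointmentType` and `DEFAULT_APPOINTMENT_TYPES` are hypothetical names; the `allowed` parameter stands in for DB-configured values.

```typescript
const DEFAULT_APPOINTMENT_TYPES: readonly string[] = ["B", "2", "3", "P"];

// Returns the value if it is an allowed code, otherwise falls back to 'B'.
function normalizeAppointmentType(
  value: unknown,
  allowed: readonly string[] = DEFAULT_APPOINTMENT_TYPES,
): string {
  return typeof value === "string" && allowed.includes(value) ? value : "B";
}
```

Accepting `unknown` matters because tool arguments come from the LLM: a numeric `2` instead of the string `"2"` falls back to `'B'` here rather than being coerced, which matches the stated default but may be worth relaxing if it shows up in logs.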
Related Documentation
- Agent Behaviors -- Per-agent behavior details
- Tool Inventory -- Full tool reference with distribution matrix
- Conversation UX -- Conversation design principles and scripts
- Vapi Architecture ADR -- Decision history v1.0 to v3.0
- Multilingual Strategy ADR -- Why dual-track in v3.0
- API Endpoints -- All webhook tool parameters