v3.0 Architecture (Current)
Production Deployment
Squad ID: 13fdfd19-a2cd-4ca4-8e14-ad2275095e32
Phone: +1 236-305-7446
Status: DEPLOYED to Vapi
Config: Vapi GitOps at vitara-platform/vapi-gitops/
Version Comparison: v2.3.0 vs v3.0
| Dimension | v2.3.0 | v3.0 |
|---|---|---|
| Squad ID | 775db28c-21cf-4eec-a643-d078cf9bc5c1 | 13fdfd19-a2cd-4ca4-8e14-ad2275095e32 |
| Agent count | 6 (single language) | 9 (dual-track EN/ZH) |
| Languages | English only | English + Mandarin Chinese |
| Architecture | Linear: Router → Role agents | Dual-track: Router → Language gate → EN/ZH role agents |
| LLM | GPT-4o (OpenAI) | GPT-4o (OpenAI) — all 9 agents |
| Transcriber | AssemblyAI Universal (all agents) | AssemblyAI Universal (Router) + Deepgram Nova-2 per-language |
| TTS | ElevenLabs (all agents) | ElevenLabs (EN track) + Azure XiaoxiaoNeural (ZH track) |
| Reschedule/Cancel | 2 separate agents | 1 consolidated Modification agent per track |
| Confirmation | Dedicated agent | Eliminated — log_call_metadata absorbed into role agents |
| Patient ID | Part of Router | Dedicated Patient-ID agent per track |
| Router maxTokens | 150 | 400 (P0 fix — 150 caused silent truncation) |
| Circuit breaker | 10s timeout | 4s timeout (P0 fix) |
| Filler phrases | LLM-generated | request-start messages on all 14 tools |
| Config management | Manual Vapi API calls | Vapi GitOps (YAML/MD source of truth) |
Squad Topology: 9-Agent Dual-Track Layout
+1 236-305-7446
|
[ Telnyx SIP ]
|
[ Vapi SIP ]
|
"Hi there, thanks for calling!"
|
+----------------------------+
| ROUTER (4f70e214) |
| STT: AssemblyAI Universal |
| TTS: ElevenLabs |
| LLM: GPT-4o T=0.3 |
| maxTokens: 400 |
| |
| Tools: |
| - get_clinic_info |
| - transfer_call |
| - log_call_metadata |
| |
| 1. Play firstMessage |
| 2. Call get_clinic_info |
| 3. Detect language via STT |
| 4. Route to EN or ZH track |
+----------------------------+
| |
Language = EN Language = ZH
| |
+-------------+ +-------------+
| |
v v
+=============================+ +=============================+
| ENGLISH TRACK | | CHINESE TRACK |
| STT: Deepgram Nova-2 `en` | | STT: Deepgram Nova-2 `zh` |
| TTS: ElevenLabs | | TTS: Azure XiaoxiaoNeural |
+=============================+ +=============================+
| |
v v
+--------------------+ +--------------------+
| PATIENT-ID EN | | PATIENT-ID ZH |
| (7d054785) | | (7585c092) |
| T=0.5 maxTok=200 | | T=0.5 maxTok=200 |
| Tools: | | Tools: |
| search_patient_ | | search_patient_ |
| by_phone | | by_phone |
| search_patient | | search_patient |
| get_clinic_info | | get_clinic_info |
| transfer_call | | transfer_call |
+--------------------+ +--------------------+
| |
Patient identified Patient identified
+ intent detected + intent detected
| |
+----+----------+-----+ +----+----------+-----+
| | | | | | | |
v v v v v v v v
+------+------+ +------+------+ +------+------+ +------+------+
|BOOKING EN | |MODIF. EN | |BOOKING ZH | |MODIF. ZH |
|(ac25775b) | |(9cd8381d) | |(6ef04a40) | |(e348cd2f) |
|T=0.5 200tok | |T=0.5 200tok | |T=0.5 200tok | |T=0.5 200tok |
+------+------+ +------+------+ +------+------+ +------+------+
| | | | | | | |
v v v v v v v v
+------+------+ +------+------+ +------+------+ +------+------+
|REGISTR. EN | | | |REGISTR. ZH | | |
|(9fcfd00d) | | | |(ce50df43) | | |
|T=0.5 250tok | | | |T=0.5 250tok | | |
+-------------+ +-------------+ +-------------+ +-------------+
Complete Call Flow
CALLER TELCO VAPI PLATFORM OUR SERVER OSCAR EMR
| | | | |
| Dial +1 236-305-7446 | | |
|-------------------->| | | |
| | SIP INVITE | | |
| |------------------>| | |
| | 200 OK (media) | | |
| |<------------------| | |
| | | | |
| <-- "Hi there, thanks for calling!" | | |
| |<--[ElevenLabs TTS]| | |
| | | | |
| | ROUTER AGENT (4f70e214) | |
| | AssemblyAI Universal STT | |
| | | | |
| | get_clinic_info | | |
| | POST webhook--->| /api/vapi/get-clinic | |
| | | info | |
| | | Zod validate | |
| | | Prisma lookup | |
| | | 200 {name, hours, | |
| | <-- JSON result | address, phone} | |
| | | | |
| "I'd like to book | | | |
| an appointment" | | | |
|-------------------->| | | |
| | STT transcription | | |
| | (AssemblyAI | | |
| | detects: EN) | | |
| | | | |
| | GPT-4o determines:| | |
| | language=EN | | |
| | intent=booking | | |
| | | | |
| | handoff_to_patient| | |
| | _id_en | | |
| | | | |
| | PATIENT-ID EN (7d054785) | |
| | Deepgram Nova-2 `en` STT | |
| | | | |
| "My phone number | | | |
| is 604-555-1234" | | | |
|-------------------->| | | |
| | search_patient_ | | |
| | by_phone | | |
| | POST webhook--->| /api/vapi/search- | |
| | | patient-by-phone | |
| | | Zod validate | |
| | | AdapterFactory | |
| | | .getAdapter(clinic) | |
| | | | | |
| | | [Circuit Breaker | |
| | | 4s timeout] | |
| | | |--SOAP or REST-->| quickSearch |
| | | | | ?query=6045551234 |
| | | |<--patient data--| |
| | | Transform response | |
| | <-- JSON result | 200 {patients:[...]} | |
| | | | |
| "Yes, that's me, | | | |
| John Smith" | | | |
|-------------------->| | | |
| | | | |
| | GPT-4o: patient | | |
| | confirmed, | | |
| | intent=booking | | |
| | | | |
| | handoff_to_ | | |
| | booking_en | | |
| | | | |
| | BOOKING EN (ac25775b) | |
| | ... (see Booking Flow below) | |
| | | | |
Language Detection
The Router uses AssemblyAI Universal Multilingual STT, which transcribes both English and Mandarin. The GPT-4o LLM then determines the language from the transcription and routes to the appropriate track via handoff_to_patient_id_en or handoff_to_patient_id_zh.
Booking Flow (Detailed)
CALLER BOOKING EN AGENT SERVER OSCAR EMR
| (ac25775b) | |
| | | |
| Patient context | | |
| passed from | | |
| Patient-ID handoff | | |
| {demographicNo, | | |
| name, phone} | | |
| | | |
| | get_providers | |
| | POST---------------->| /api/vapi/get-providers |
| | request-start: | Zod validate |
| | "One moment..." | AdapterFactory |
| <-- "One moment | | .getAdapter() |
| while I check..." | | | |
| | | [Breaker 4s] |
| | | |---getProviders--->|
| | | | (XML fallback |
| | | | for Kai 406) |
| | | |<--provider list---|
| | <-- providers[] | Transform + filter |
| | | |
| | find_earliest_ | |
| | appointment | |
| | POST---------------->| /api/vapi/find-earliest |
| | | Zod validate |
| | | AdapterFactory |
| | | | |
| | | [Breaker 4s] |
| | | |--getSchedule----->|
| | | | Slots |
| | | |<--{data:{slots}}--|
| | | | |
| | | |--getAppointments->|
| | | | (collision check)|
| | | |<--existing appts--|
| | | Filter booked times |
| | | from available slots |
| | <-- available slots | getTrueAvailability() |
| | | |
| "Dr. Smith at 2pm | | |
| on Thursday please" | | |
|---------------------->| | |
| | | |
| | create_appointment | |
| | POST---------------->| /api/vapi/create-appt |
| | request-start: | Zod validate |
| | "Booking that..." | Past-date clamp (P1) |
| | | acquireAdvisoryLock() |
| <-- "Booking that | | (fail-safe: false |
| for you now..." | | on error, not true) |
| | | | |
| | | [Breaker 4s] |
| | | |--createAppt------>|
| | | | NewAppointmentTo1 |
| | | | startTime: "HH:mm"|
| | | | status: "t" |
| | | |<--201 + apptId----|
| | | releaseAdvisoryLock() |
| | <-- {success, id} | |
| | | |
| | GPT-4o confirms: | |
| "Your appointment | "Thursday 2pm with | |
| is booked!" | Dr. Smith confirmed" | |
|<----------------------| | |
| | | |
| | log_call_metadata | |
| | POST---------------->| /api/vapi/log-metadata |
| | | Store call summary |
| | | Prisma insert |
| | <-- {logged: true} | |
| | | |
| "Thank you, | | |
| goodbye!" | | |
|<---"Have a great | | |
| day!" | | |
| | | |
| [CALL ENDS] | | |
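The collision check in the middle of this flow (schedule slots minus existing appointments) can be sketched as follows. This is a minimal illustration, not the server's actual code: the name `getTrueAvailability` comes from the diagram, but the signature, the `Slot` and `BookedAppointment` shapes, and the `slotKey` helper are assumptions made for the example. It matches on exact start time only and does not handle appointments that overlap several slots.

```typescript
// Hypothetical shapes: the real adapter types are not shown in this document.
interface Slot {
  date: string;      // "YYYY-MM-DD"
  startTime: string; // "HH:mm"
  providerId: string;
}

interface BookedAppointment {
  date: string;
  startTime: string; // assumed to arrive as "HH:mm" or "HH:mm:ss"
  providerId: string;
}

// Builds a comparison key. Times are truncated to "HH:mm" so that
// "14:00:00" and "14:00" refer to the same slot.
function slotKey(providerId: string, date: string, time: string): string {
  return providerId + "|" + date + "|" + time.slice(0, 5);
}

// Returns the schedule slots that do not collide with an existing appointment.
function getTrueAvailability(slots: Slot[], booked: BookedAppointment[]): Slot[] {
  const taken: { [key: string]: true } = {};
  for (const appt of booked) {
    taken[slotKey(appt.providerId, appt.date, appt.startTime)] = true;
  }
  return slots.filter((s) => !taken[slotKey(s.providerId, s.date, s.startTime)]);
}
```

Keying on provider, date and time together keeps one provider's booking from hiding another provider's free slot at the same time.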
Modification Flow (Reschedule + Cancel)
CALLER MODIFICATION EN AGENT SERVER OSCAR EMR
| (9cd8381d) | |
| | | |
| Patient context | | |
| from Patient-ID | | |
| handoff | | |
| | | |
| | check_appointments | |
| | POST------------------->| /api/vapi/check-appts |
| | request-start: | Zod validate |
| | "Let me pull up | AdapterFactory |
| | your appointments..." | .getAdapter() |
| | | | |
| <-- "Let me pull | | [Breaker 4s] |
| up your appts..." | | |--getAppts------->|
| | | | demographicNo |
| | | |<--appt list------|
| | <-- appointments[] | Transform dates/times |
| | | |
| | GPT-4o presents: | |
| "You have: | "1. Mar 10, 2pm | |
| 1. Mar 10, 2pm | Dr. Smith | |
| 2. Mar 15, 10am" | 2. Mar 15, 10am | |
|<----------------------| Dr. Jones" | |
| | | |
| | | |
|===== RESCHEDULE PATH ========================== |=========================|
| | | |
| "Reschedule #1 | | |
| to next week" | | |
|---------------------->| | |
| | | |
| | find_earliest_ | |
| | appointment | |
| | POST------------------->| /api/vapi/find-earliest |
| | | getTrueAvailability() |
| | | |--getSchedule----->|
| | | |<--slots-----------|
| | | |--getAppts-------->|
| | | |<--booked----------|
| | <-- available slots | Filter + return |
| | | |
| "Tuesday 3pm works" | | |
|---------------------->| | |
| | | |
| | cancel_appointment | |
| | POST------------------->| /api/vapi/cancel-appt |
| | | [Breaker 4s] |
| | | |--updateStatus---->|
| | | | appt/{id}/ |
| | | | updateStatus |
| | | |<--200 OK----------|
| | <-- {cancelled} | |
| | | |
| | create_appointment | |
| | POST------------------->| /api/vapi/create-appt |
| | | Past-date clamp |
| | | Advisory lock |
| | | [Breaker 4s] |
| | | |--createAppt------>|
| | | |<--201 + newId-----|
| | <-- {success, newId} | Release lock |
| | | |
| "Rescheduled to | | |
| Tuesday 3pm!" | | |
|<----------------------| | |
| | | |
|===== CANCEL PATH ============================== |=========================|
| | | |
| "Cancel appointment | | |
| #2 please" | | |
|---------------------->| | |
| | | |
| | cancel_appointment | |
| | POST------------------->| /api/vapi/cancel-appt |
| | | [Breaker 4s] |
| | | |--updateStatus---->|
| | | | appt/{id}/ |
| | | | updateStatus |
| | | |<--200 OK----------|
| | <-- {cancelled: true} | |
| | | |
| "Appointment | | |
| cancelled!" | | |
|<----------------------| | |
| | | |
| | log_call_metadata | |
| | POST------------------->| Log action taken |
| | <-- {logged} | |
| | | |
| [CALL ENDS] | | |
Server Data Flow
INBOUND REQUEST
|
v
+===========================================================================+
| EXPRESS 4 SERVER (port 3002) |
| PM2 process: vitara-admin-api |
+===========================================================================+
|
v
+-------------------+
| Vapi Webhook POST | POST https://api-dev.vitaravox.ca/api/vapi/{slug}
| Content-Type: |
| application/json | Body: { message: { type, call, toolCalls,
+-------------------+ toolCallList, ... } }
|
v
+-------------------+
| Express Router | /api/vapi/:toolSlug
| vapi-webhook.ts | Maps slug to handler function
+-------------------+
|
v
+-------------------+
| Zod Validation | Validates incoming payload structure:
| | - toolCallId (string)
| | - parameters (tool-specific schema)
| | - call.customer.number (phone extraction)
| | - clinic context (from call metadata)
+-------------------+
|
| Valid Invalid
v v
+-------------------+ +-------------------+
| Pino Structured | | 400 Bad Request |
| Logging | | {error: "Zod |
| (request details) | | validation |
+-------------------+ | failed: ..."} |
| +-------------------+
v
+-------------------+
| Clinic Resolution | Prisma lookup → clinic config
| | - clinicId, slug, name
| | - emrType (oscar-soap | oscar-rest)
| | - preferRest: boolean
| | - timezone (default: America/Vancouver)
+-------------------+
|
v
+-------------------------------------------+
| EmrAdapterFactory.getAdapter(clinicId) |
| |
| Cache hit? |
| YES → return cached adapter |
| NO → create new adapter: |
| preferRest=true? |
| → OscarSoapAdapter (REST mode) |
| await warmUp() ← P1 fix |
| (cold TLS to Kai CF ~6s) |
| → OscarSoapAdapter (SOAP mode) |
| WSDL fetch + cache |
+-------------------------------------------+
|
v
+-------------------------------------------+
| Circuit Breaker (opossum) |
| |
| +-----------+ +-----------+ |
| | READ | | WRITE | |
| | Breaker | | Breaker | |
| | | | | |
| | timeout: | | timeout: | |
| | 4000ms | | 4000ms | |
| | threshold:| | threshold:| |
| | 50% | | 50% | |
| | reset: | | reset: | |
| | 30s | | 30s | |
| +-----------+ +-----------+ |
| | | |
| getAppointments createAppointment |
| getProviders cancelAppointment |
| searchPatient updateAppointment |
| getScheduleSlots |
+-------------------------------------------+
|
v
+-------------------------------------------+
| OSCAR EMR Connection |
| |
| +--- SOAP MODE (self-hosted) ----------+ |
| | node-soap client | |
| | WS-Security: UsernameToken | |
| | passwordType: PasswordText | |
| | mustUnderstand: true | |
| | hasTimeStamp: false ← REQUIRED | |
| | hasNonce: false | |
| | Endpoints: | |
| | /ws/XxxService?wsdl | |
| | Positional args: arg0, arg1, ... | |
| +--------------------------------------+ |
| |
| +--- REST MODE (Kai-hosted/CF WAF) ----+ |
| | OAuth 1.0a signed requests | |
| | Consumer: Vitaradev | |
| | key: l2vy8ibulrhuxo9d | |
| | Endpoints: | |
| | /ws/services/* (CXF OAuth REST) | |
| | Accept: application/json | |
| | fallback: application/xml | |
| | (providers → XML due to 406 bug) | |
| | Write format: NewAppointmentTo1 | |
| | startTime: "HH:mm" (.slice(0,5)) | |
| | duration: int (REQUIRED) | |
| | status: "t" (REQUIRED) | |
| +--------------------------------------+ |
+-------------------------------------------+
|
v
+-------------------+
| Response | Transform OSCAR response → JSON
| Transformation | - Normalize dates/times to clinic TZ
| | - Map OSCAR fields to Vapi tool schema
| | - Filter non-bookable schedule codes
| | (L, P, V, A, a, B, H, R, E, G, M, m, d, t)
+-------------------+
|
v
+-------------------+
| HTTP Response | 200 OK
| | Content-Type: application/json
| | { results: [ { ... } ] }
| |
| | → Vapi receives result
| | → GPT-4o generates speech
| | → TTS renders audio
| | → Caller hears response
+-------------------+
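Two details from the diagram above are easy to get wrong, so here is a small sketch of each: the non-bookable schedule-code filter on the read path, and the field constraints for NewAppointmentTo1 on the write path. The code list, the `.slice(0, 5)` truncation and the `"t"` status come from the diagram. The `ScheduleSlot` and `NewAppointmentSketch` shapes and both function names are hypothetical, and treating `duration` as minutes is an assumption.

```typescript
// Non-bookable schedule codes, copied from the diagram above (case-sensitive).
const NON_BOOKABLE_CODES = ["L", "P", "V", "A", "a", "B", "H", "R", "E", "G", "M", "m", "d", "t"];

// Hypothetical slot shape; the real OSCAR response fields are not shown here.
interface ScheduleSlot {
  startTime: string; // "HH:mm"
  code: string;      // single-character schedule template code
}

// Read path: drop slots whose template code marks them as non-bookable.
function filterBookable(slots: ScheduleSlot[]): ScheduleSlot[] {
  return slots.filter((s) => NON_BOOKABLE_CODES.indexOf(s.code) === -1);
}

// Write path: only the three constraints the diagram lists for
// NewAppointmentTo1. Other fields the EMR may require are omitted.
interface NewAppointmentSketch {
  startTime: string; // "HH:mm"
  duration: number;  // integer (assumed to be minutes)
  status: string;    // "t"
}

function toNewAppointment(startTime: string, duration: number): NewAppointmentSketch {
  if (Math.floor(duration) !== duration || duration <= 0) {
    throw new Error("duration must be a positive integer");
  }
  return {
    startTime: startTime.slice(0, 5), // "14:00:00" becomes "14:00"
    duration: duration,
    status: "t",
  };
}
```

The codes are compared case-sensitively because the list distinguishes `A` from `a` and `M` from `m`.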
STT/TTS Pipeline Per Track
CALLER SPEECH (English)
|
v
+---------------------------+
| Deepgram Nova-2 |
| Language: en |
| Model: nova-2 |
| Endpointing: default |
| (standard pause |
| detection) |
+---------------------------+
|
| Transcription text (EN)
v
+---------------------------+
| GPT-4o (OpenAI) |
| Temperature: 0.5 |
| maxTokens: 200-250 |
| (varies by agent role) |
+---------------------------+
|
| Generated response text (EN)
v
+---------------------------+
| ElevenLabs TTS |
| Model: eleven_ |
| multilingual_v2 |
| Voice ID: fQj4gJSexpu8 |
| RDE2Ii5m |
| Language: English |
+---------------------------+
|
| Audio stream
v
CALLER HEARS ENGLISH SPEECH
CALLER SPEECH (Mandarin)
|
v
+---------------------------+
| Deepgram Nova-2 |
| Language: zh |
| Model: nova-2 |
| Endpointing: extended |
| wait: 1.0s |
| punctuation: 0.6s |
| (longer pauses for |
| Mandarin cadence) |
+---------------------------+
|
| Transcription text (ZH)
v
+---------------------------+
| GPT-4o (OpenAI) |
| Temperature: 0.5 |
| maxTokens: 200-250 |
| (varies by agent role) |
| NOTE: GPT-4o may output |
| space-separated Chinese |
| characters — monitor |
+---------------------------+
|
| Generated response text (ZH)
v
+---------------------------+
| Azure Cognitive Services |
| Voice: zh-CN-Xiaoxiao |
| Neural |
| Language: zh-CN |
| Quality: Native Mandarin |
| prosody |
| (ElevenLabs NOT used — |
| Azure superior for CJK) |
+---------------------------+
|
| Audio stream
v
CALLER HEARS MANDARIN SPEECH
GPT-4o Chinese Character Spacing
GPT-4o has been observed outputting space-separated Chinese characters in some responses. This is being monitored in the ZH track and may require a post-launch LLM swap if quality degrades.
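One way to monitor this is to flag responses in which whitespace sits between two adjacent Chinese characters. The sketch below is illustrative only and is not part of the deployed pipeline; `hasSpacedCjk` and `collapseCjkSpacing` are hypothetical names, and the character range covers only the common CJK Unified Ideographs block (U+4E00 to U+9FFF).

```typescript
// Matches a CJK ideograph followed by whitespace and another ideograph.
// The second ideograph is a lookahead so that runs like "A B C" match twice.
const SPACED_CJK = /([\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])/g;

// True when the text contains at least one space-separated pair of ideographs.
function hasSpacedCjk(text: string): boolean {
  SPACED_CJK.lastIndex = 0; // global regexes keep state between test() calls
  return SPACED_CJK.test(text);
}

// Removes whitespace between adjacent ideographs and leaves other spacing
// (for example around Latin names or digits) untouched.
function collapseCjkSpacing(text: string): string {
  return text.replace(SPACED_CJK, "$1");
}
```

Logging `hasSpacedCjk` per response would give a rate to watch before deciding on an LLM swap. Whether collapsing the spaces before TTS is acceptable is a separate decision that this sketch does not settle.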
Why Not ElevenLabs for Chinese?
ElevenLabs eleven_turbo_v2_5 is English-only. The multilingual model eleven_multilingual_v2 supports CJK but Azure's XiaoxiaoNeural voice provides significantly more natural Mandarin prosody and is the preferred choice.
Agent Registry
Full UUID Reference
| Agent | Short ID | Full UUID |
|---|---|---|
| Router | 4f70e214 | 4f70e214-6111-4f53-86c9-48f8f7c265e1 |
| Patient-ID EN | 7d054785 | 7d054785-9074-4856-81db-9fe44da47bc5 |
| Patient-ID ZH | 7585c092 | 7585c092-f8b3-4bdd-95ba-d41d71a54101 |
| Booking EN | ac25775b | ac25775b-c1cc-41ae-8899-810d4ae62efd |
| Booking ZH | 6ef04a40 | 6ef04a40-6764-4d4e-b2e0-73045b288611 |
| Modification EN | 9cd8381d | 9cd8381d-9501-4c9a-a92d-ce185f49e50d |
| Modification ZH | e348cd2f | e348cd2f-f3d8-4b9a-ac35-7725c767287f |
| Registration EN | 9fcfd00d | 9fcfd00d-1493-4041-9214-36159eba4511 |
| Registration ZH | ce50df43 | ce50df43-7c3a-45f7-b121-8772adaa0eff |
Agent Configuration Matrix
| Agent | Vapi Name | LLM | Temp | MaxTokens | STT | TTS |
|---|---|---|---|---|---|---|
| Router | vitara-router-v3 | GPT-4o | 0.3 | 400 | AssemblyAI Universal | ElevenLabs |
| Patient-ID EN | vitara-patient-id-en-v3 | GPT-4o | 0.5 | 200 | Deepgram Nova-2 en | ElevenLabs |
| Patient-ID ZH | vitara-patient-id-zh-v3 | GPT-4o | 0.5 | 200 | Deepgram Nova-2 zh | Azure XiaoxiaoNeural |
| Booking EN | vitara-booking-en-v3 | GPT-4o | 0.5 | 200 | Deepgram Nova-2 en | ElevenLabs |
| Booking ZH | vitara-booking-zh-v3 | GPT-4o | 0.5 | 200 | Deepgram Nova-2 zh | Azure XiaoxiaoNeural |
| Modification EN | vitara-modification-en-v3 | GPT-4o | 0.5 | 200 | Deepgram Nova-2 en | ElevenLabs |
| Modification ZH | vitara-modification-zh-v3 | GPT-4o | 0.5 | 200 | Deepgram Nova-2 zh | Azure XiaoxiaoNeural |
| Registration EN | vitara-registration-en-v3 | GPT-4o | 0.5 | 250 | Deepgram Nova-2 en | ElevenLabs |
| Registration ZH | vitara-registration-zh-v3 | GPT-4o | 0.5 | 250 | Deepgram Nova-2 zh | Azure XiaoxiaoNeural |
Router Temperature
The Router uses a lower temperature (0.3) than the role agents (0.5) because its job is narrow and should be repeatable: detect the language and route. A higher temperature would make routing decisions less consistent.
Registration MaxTokens
Registration agents get 250 tokens (vs 200 for others) because patient registration involves collecting and confirming more data fields (name, DOB, address, phone, health card number).
Tool Distribution Matrix
| Tool | Router | Patient-ID | Booking | Modification | Registration |
|---|---|---|---|---|---|
get_clinic_info |
x | x | |||
transfer_call |
x | x | x | x | x |
log_call_metadata |
x | x | x | x | |
search_patient_by_phone |
x | ||||
search_patient |
x | ||||
find_earliest_appointment |
x | x | |||
check_appointments |
x | x | |||
create_appointment |
x | x | |||
get_providers |
x | x | |||
update_appointment |
x | ||||
cancel_appointment |
x | ||||
register_new_patient |
x | ||||
add_to_waitlist |
x |
14 Tools Total
Each EN/ZH pair shares the same tool set — tools are language-agnostic. The language-specific behavior comes from the agent prompts and STT/TTS configuration, not the tools themselves.
Handoff Topology
+----------+
| ROUTER |
| 4f70e214 |
+-----+----+
|
+--------------+--------------+
| |
handoff_to_ handoff_to_
patient_id_en patient_id_zh
| |
+------+------+ +-------+------+
| PATIENT-ID | | PATIENT-ID |
| EN | | ZH |
| 7d054785 | | 7585c092 |
+------+------+ +-------+------+
| |
+------------+------------+ +-----------+-----------+
| | | | | |
handoff_to_ handoff_to_ handoff_ handoff_ handoff_ handoff_
booking_en modif_en to_reg_ to_book_ to_modif_ to_reg_
en zh zh zh
| | | | | |
+-----+----+ +----+-----+ +----+--+ +----+----+ +----+----+ +----+---+
|BOOKING EN| |MODIF. EN | |REG. EN| |BOOK. ZH| |MOD. ZH | |REG. ZH|
|ac25775b | |9cd8381d | |9fcfd00d| |6ef04a40| |e348cd2f| |ce50df43|
+----------+ +----------+ +-------+ +--------+ +--------+ +-------+
| | | | | |
+------+-----+------+----+ +-----+-----+-----+----+
| | | |
handoff_to_ handoff_to_ handoff_to_ handoff_to_
router_v3 router_v3 router_v3 router_v3
| | | |
+------+------+ +-----+-----+
| |
+------+------+ +------+------+
| ROUTER | | ROUTER |
| (return) | | (return) |
+-------------+ +-------------+
NOTE: handoff_to_router_v3 added to Patient-ID-ZH
as P1 fix (was missing, trapping ZH callers)
NOTE: transfer_call tool available on ALL 9 agents
for escalation to human operator
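The missing-handoff bug noted above is the kind of thing a small check over the squad topology could catch before deploy. The sketch below encodes the diagram as agent-to-agent edges and lists every non-Router agent that lacks a direct handoff back to the Router. The short agent labels and the function name are hypothetical, and the edge from Patient-ID EN to the Router is an assumption (the note confirms it only for Patient-ID ZH).

```typescript
type HandoffGraph = { [agent: string]: string[] };

// Agent-to-agent handoff edges, transcribed from the topology diagram above.
// Assumption: Patient-ID EN can also return to the Router.
const HANDOFFS: HandoffGraph = {
  "router": ["patient-id-en", "patient-id-zh"],
  "patient-id-en": ["booking-en", "modification-en", "registration-en", "router"],
  "patient-id-zh": ["booking-zh", "modification-zh", "registration-zh", "router"],
  "booking-en": ["router"],
  "modification-en": ["router"],
  "registration-en": ["router"],
  "booking-zh": ["router"],
  "modification-zh": ["router"],
  "registration-zh": ["router"],
};

// Returns the non-Router agents with no direct handoff to the Router, i.e.
// agents where a caller routed with the wrong intent could not be sent back.
function agentsMissingRouterReturn(graph: HandoffGraph): string[] {
  return Object.keys(graph).filter(
    (agent) => agent !== "router" && graph[agent].indexOf("router") === -1
  );
}
```

The check looks for a direct edge rather than general reachability. Before the P1 fix, Patient-ID ZH could still reach the Router through a role agent, but only after a wrong handoff.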
Key Design Decisions
1. Explicit Language Gate at Router
Decision
The Router uses AssemblyAI Universal Multilingual for STT, which can transcribe both English and Mandarin. After transcription, GPT-4o determines the caller's language and routes to the appropriate track via handoff tools.
Why not auto-detect per-agent? Each downstream agent uses language-specific Deepgram Nova-2 (en or zh) for higher accuracy. A Chinese caller hitting an English STT would get garbage transcriptions. The Router serves as the language gate that prevents this.
2. Patient-ID Separated from Router
Problem: In v2.3.0, the Router handled both routing AND patient identification, leading to bloated prompts and inconsistent behavior.
Fix: Dedicated Patient-ID agents handle phone lookup and patient confirmation. The Router only determines language and intent.
3. Reschedule + Cancel Consolidated
Problem: v2.3.0 had separate Reschedule and Cancel agents with nearly identical tool sets and significant prompt overlap.
Fix: Single Modification agent per track handles both reschedule and cancel workflows. Reduces agent count and simplifies handoff topology.
4. Confirmation Agent Eliminated
Problem: The dedicated Confirmation agent added an unnecessary handoff hop. Its only job was calling log_call_metadata.
Fix: log_call_metadata tool absorbed into Booking, Modification, and Registration agents. Each agent logs its own call summary before ending the conversation.
5. request-start Messages Replace LLM Filler
request-start
All 14 tools have request-start messages configured in the Vapi tool YAML. 4 are audible ("One moment while I check...") and 10 are silent. This replaces unreliable LLM-generated filler phrases.
6. ZH Endpointing Tuned for Mandarin
Mandarin speech has different pause patterns than English. The ZH track uses extended endpointing:
- Wait timeout: 1.0s (vs default ~0.5s)
- Punctuation timeout: 0.6s
This prevents Deepgram from prematurely cutting off Mandarin speakers mid-sentence.
P0/P1/P2 Fixes Applied
P0 — Critical Production Fixes
| Fix | Problem | Solution |
|---|---|---|
| Router maxTokens | 150 tokens caused GPT-4o silent truncation (tool-call JSON alone = 80-120 tokens) | Increased to 400 |
| Router prompt rewrite | Rigid scripting ("Say EXACTLY 'One moment please'") caused unnatural responses | Warm acknowledgment + dynamic greeting via get_clinic_info |
| Patient-ID steps merged | Steps 1+2 were sequential, causing unnecessary back-and-forth | Consolidated into single first-turn tool call + intent analysis |
| Defensive tool-result | LLM would hallucinate patient data before tool returned | Added "WAIT for actual tool result before speaking" instruction |
| Handoff naming | Prompts referenced transferAssistant but actual tools are handoff_to_X | Fixed all 8 non-Router prompts |
| Circuit breaker timeout | 10s timeout exceeded Vapi's 5s tool timeout window | Reduced to 4s |
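The reasoning behind the circuit breaker fix is that the server must give up on the EMR before Vapi gives up on the server. A minimal sketch of that timing relationship follows. It models only the timeout, not opossum's failure-rate threshold or reset window. `withTimeout` and `callEmrWithFallback` are hypothetical names, and returning a fallback value on timeout is an assumption about what the handler does.

```typescript
// Vapi's tool timeout window and the breaker timeout, from the table above.
const VAPI_TOOL_TIMEOUT_MS = 5000;
const BREAKER_TIMEOUT_MS = 4000;

// Rejects if `work` has not settled within `timeoutMs`.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error("EMR call timed out after " + timeoutMs + "ms")),
      timeoutMs
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}

// Hypothetical handler wrapper: answers inside Vapi's window with either the
// EMR result or a fallback the agent can speak.
async function callEmrWithFallback<T>(
  emrCall: () => Promise<T>,
  fallback: T,
  timeoutMs: number = BREAKER_TIMEOUT_MS
): Promise<T> {
  try {
    return await withTimeout(emrCall(), timeoutMs);
  } catch (err) {
    return fallback;
  }
}
```

With the old 10s setting the fallback branch could never run in time, because Vapi's 5s window closed first. The 4s value leaves roughly one second for the response to travel back.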
P1 — High Priority Fixes
| Fix | Problem | Solution |
|---|---|---|
| Past-date clamp | LLM would pass past dates to createAppointment | Server-side clamp to today in bookAppointment handler |
| EMR adapter warmup | Cold TLS to Kai CF ~6s, exceeding 4s breaker on first call | await warmUp() on PM2 startup (IIFE in index.ts) |
| transfer_call tool | Missing from 6 agents — callers couldn't escalate to human | Added to Patient-ID EN/ZH, Booking EN/ZH, Registration EN/ZH |
| handoff_to_router_v3 | Missing from Patient-ID-ZH — ZH callers trapped if wrong intent | Added to squad YAML |
| Lock fail-safe | acquireAdvisoryLock returned true on error, allowing double-booking | Changed to return false on error |
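Two of these fixes are small enough to sketch directly: the past-date clamp and the lock fail-safe. Neither is the actual handler code. `clampToToday` is a hypothetical name, the `tryLock` callback stands in for whatever database call the real `acquireAdvisoryLock` makes, and `today` is passed in because it should be computed in the clinic's timezone.

```typescript
// Clamps an ISO date ("YYYY-MM-DD") to today if it lies in the past.
// Dates in this format compare correctly as plain strings.
// `today` should be the current date in the clinic's timezone.
function clampToToday(requestedDate: string, today: string): string {
  return requestedDate < today ? today : requestedDate;
}

// Hypothetical wrapper: `tryLock` stands in for the database call, whose
// real signature is not shown in this document.
async function acquireAdvisoryLock(tryLock: () => Promise<boolean>): Promise<boolean> {
  try {
    return await tryLock();
  } catch (err) {
    // Fail-safe: when the lock state is unknown, report "not acquired" so the
    // caller refuses to book instead of risking a double-booking.
    return false;
  }
}
```

Returning `false` on error trades a possible failed booking for the guarantee that two callers never both believe they hold the lock.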
P2 — Quality Improvements
| Fix | Problem | Solution |
|---|---|---|
| request-start messages | LLM filler phrases inconsistent and slow | Added to all 14 tools (4 audible, 10 silent) |
| firstMessage removed | Patient-ID EN/ZH had firstMessage causing 16s silence after handoff | Removed from squad member config |
| Filler phrase rules deleted | FILLER PHRASE RULES sections in prompts conflicted with request-start | Deleted from Booking + Modification EN/ZH prompts |
Server Architecture Summary
+================================================================+
| OCI ARM Instance (Toronto) |
| |
| +----------------------------------------------------------+ |
| | PM2: vitara-admin-api (port 3002) | |
| | | |
| | Express 4 | |
| | +-- /api/vapi/:toolSlug (Vapi webhook) | |
| | +-- /api/clinics (Admin CRUD) | |
| | +-- /api/providers (Provider mgmt) | |
| | +-- /api/appointments (Direct queries) | |
| | | |
| | Prisma ORM ──> PostgreSQL (15 models, PIPEDA compliant) | |
| | Zod ──> Request validation | |
| | Pino ──> Structured JSON logging | |
| | opossum ──> Circuit breakers (read + write, 4s timeout) | |
| | node-soap ──> OSCAR SOAP client | |
| | oauth-1.0a ──> OSCAR REST OAuth signing | |
| +----------------------------------------------------------+ |
| |
| +------------------+ +------------------+ |
| | nginx (reverse | | Let's Encrypt | |
| | proxy + TLS) | | (auto-renew) | |
| +------------------+ +------------------+ |
+================================================================+
| |
| SOAP (self-hosted) | OAuth REST (Kai-hosted)
v v
+------------------+ +------------------+
| Self-hosted | | Kai OSCAR Pro |
| OSCAR instances | | (fbh.kai-oscar |
| /ws/XxxService | | .com) |
| | | Cloudflare WAF |
| WS-Security | | /ws/services/* |
| UsernameToken | | OAuth 1.0a |
+------------------+ +------------------+
Configuration Management
Vapi GitOps
All v3.0 configuration is managed as code in vitara-platform/vapi-gitops/.
- Assistants: `.md` files (YAML frontmatter + system prompt as Markdown body)
- Tools: `.yml` files with webhook URLs and parameter schemas
- Squads: `.yml` files defining member topology and handoff rules
- State: `.vapi-state.dev.json` maps slugs to Vapi UUIDs
- Deploy: `cd vitara-platform/vapi-gitops && npm run push:dev`
- Env key: `VAPI_TOKEN` in `.env.dev` (not `VAPI_API_KEY`)
Resources reference each other by filename without extension — the GitOps engine resolves slugs to Vapi UUIDs at push time.
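The slug resolution step can be pictured roughly as follows. This is a guess at the mechanism, not the GitOps engine's code: the flat slug-to-UUID shape of the state file, the `resolveSlug` name, and the assumption that the Router's slug equals its Vapi name (`vitara-router-v3`) are all hypothetical.

```typescript
// Hypothetical state shape: a flat slug -> UUID map. The real layout of
// .vapi-state.dev.json is not shown in this document.
type VapiState = { [slug: string]: string };

// Turns a reference such as "vitara-router-v3" (or the same name with a
// .md/.yml extension) into its Vapi UUID, failing loudly on unknown slugs.
function resolveSlug(reference: string, state: VapiState): string {
  const slug = reference.replace(/\.(md|yml)$/, "");
  if (!Object.prototype.hasOwnProperty.call(state, slug)) {
    throw new Error("No Vapi UUID recorded for slug: " + slug);
  }
  return state[slug];
}

// Usage with the Router's UUID from the Agent Registry.
const exampleState: VapiState = {
  "vitara-router-v3": "4f70e214-6111-4f53-86c9-48f8f7c265e1",
};
const routerUuid = resolveSlug("vitara-router-v3", exampleState);
```

Throwing on an unknown slug keeps a typo in a squad file from turning into a reference to a nonexistent assistant at push time.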