Voice Architecture Analysis
VitaraVox Enterprise Readiness Analysis
Date: February 17, 2026
Agent: Voice Architecture & Telephony Analyst
COMPREHENSIVE ANALYSIS: VITARAVOX v3.0 VAPI GITOPS ARCHITECTURE
EXECUTIVE SUMMARY
VitaraVox v3.0 is a production-deployed, multilingual voice agent system of 9 Vapi assistants (one bilingual Router plus parallel EN and ZH tracks of four agents each). The assistants are coordinated as a squad and use 14 tools that connect to the OSCAR EMR backend. Configuration is managed with Vapi GitOps, a declarative, version-controlled system built on an official TypeScript engine. All agents and tools are deployed, and the English booking, English reschedule, and Chinese booking flows have been tested end to end (Section 13.2). The system is live on phone number +1 236-305-7446.
1. SQUAD TOPOLOGY & HANDOFF PATTERNS
1.1 Squad Architecture (9 Members)
File: /home/ubuntu/vitara-platform/vapi-gitops/resources/squads/vitaravox-v3.yml
Squad ID: 13fdfd19-a2cd-4ca4-8e14-ad2275095e32 (deployed to Vapi)
Entry Point:
├─ ROUTER (4f70e214) — Language detection + emergency handling
English Track (4 agents):
├─ Patient-ID-EN (7d054785) — Phone/name lookup + intent routing
├─ Booking-EN (ac25775b) — Find slots & create appointments
├─ Modification-EN (9cd8381d) — Reschedule/cancel/check
└─ Registration-EN (9fcfd00d) — New patient registration
Chinese Track (4 agents):
├─ Patient-ID-ZH (7585c092)
├─ Booking-ZH (6ef04a40)
├─ Modification-ZH (e348cd2f)
└─ Registration-ZH (ce50df43)
1.2 Handoff Pattern Design
Every squad member receives its silent handoff tools through assistantOverrides.tools:append in the squad YAML. This includes the Router, whose destinations are the two Patient-ID agents. Example:
type: handoff
function:
  name: handoff_to_booking_en
  description: "Route to English booking when patient wants to book"
destinations:
  - assistantName: vitara-booking-en-v3
    description: "English appointment booking"
    type: assistant
messages:
  - content: ""  # CRITICAL: Empty content = invisible handoff
    type: request-start
Key Design Decision: Handoff destinations use assistantName (the name field from YAML frontmatter), NOT assistantId. The GitOps engine resolves the name to the actual UUID at push time. This allows decoupling prompt changes from UUID management.
1.3 Handoff Flow Example (Booking Path)
Router (get_clinic_info for the greeting + keyword-based language detection)
└─ call handoff_to_patient_id_en [silent, empty message]
└─ Patient-ID-EN (search_patient_by_phone + confirm identity)
└─ call handoff_to_booking_en [silent]
└─ Booking-EN (find_earliest_appointment + create_appointment)
└─ Optional: handoff_to_modification_en [if reschedule request]
└─ Optional: handoff_to_router_v3 [if unrelated request]
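No cross-track handoffs are implemented: once a call enters the EN or ZH track, it stays in that language track. The Router is the only agent that serves both languages. Its AssemblyAI Universal STT transcribes English only in Vapi, so it relies on keyword detection (Section 3.2).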
2. PROMPT ENGINEERING QUALITY
2.1 System Prompt Structure (Example: Router)
File: /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/router-v3.md
---
name: vitara-router-v3
model:
  model: gpt-4o
  provider: openai
  temperature: 0.3
  maxTokens: 400  # CRITICAL: Increased 150→400 to prevent truncation
  toolIds:
    - get-clinic-info-aaec50cf
    - transfer-call-d95ed81e
    - log-call-metadata-4619b3cb
transcriber:
  provider: assembly-ai
voice:
  provider: 11labs
  voiceId: fQj4gJSexpu8RDE2Ii5m
  model: eleven_multilingual_v2
---
## IDENTITY
## CRITICAL: Current date/time is {{now | date: "%B %d, %Y %I:%M %p", "America/Vancouver"}}...
You are a bilingual front-desk scheduling assistant.
## EMERGENCY [hardcoded escalation keywords in EN + ZH]
## INVISIBLE HANDOFFS
When routing the caller, make it sound like a natural conversation. Say "Sure!" then call handoff tool.
## FLOW
### Step 1: Call get_clinic_info (FIRST TURN — MANDATORY)
In your very first response, call get_clinic_info. This gives clinic name for greeting.
### Step 2: Greet with clinic name + route
Once get_clinic_info returns, say ONE warm line: "Welcome to [clinicName]!"
Then call appropriate handoff tool (handoff_to_patient_id_en or handoff_to_patient_id_zh)
Quality Observations:
- Defensive tool-result instruction: "WAIT for actual tool result before speaking about X"; this explicitly tells the LLM not to report tool outcomes it has not yet received
- Single-turn tool + speech: "Call tool in your first response" — ensures filler speech covers tool latency
- Language detection logic: Keyword-based (NOT STT-based) — caller must say "Mandarin", "Chinese", "中文" to trigger ZH track
- maxTokens tuning: 400 tokens for Router (was 150) to prevent GPT-4o output truncation on complex tool calls (maxTokens caps the response, including tool-call JSON, not the prompt)
- Timezone-aware templates: Uses Liquid `{{now | date: format, timezone}}` with America/Vancouver
2.2 Patient-ID-EN Prompt Strengths
File: /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/patient-id-en.md
### Step 1: Look up the patient and analyze intent (FIRST TURN)
IMMEDIATELY call `search_patient_by_phone` with phone "0000000000" — this must be in your
very first response, no exceptions. The system uses the real caller number automatically.
Say "One moment while I look you up" alongside the tool call.
**Intent detection** — check what the caller said:
- "book", "appointment" → intent = BOOK
- "reschedule", "change my appointment" → intent = RESCHEDULE
- etc.
### Step 2: Confirm patient identity
CRITICAL: WAIT for the actual `search_patient_by_phone` result before speaking.
Read the `found` field from the ACTUAL tool response.
Strengths:
- Explicit defensive pattern: "WAIT for actual tool response"
- Server-side phone number handling: "0000000000" is a placeholder; the server extracts the real phone from Vapi metadata
- Multi-level confirmation: Confirm identity on "yes", offer search by name on "no"
- Error-aware: Fallback to manual search if phone lookup fails
2.3 Registration Prompt (EN/ZH) — Name Collection
File: /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/registration-en.md
Critical spelling rule:
1. **Full name** — "What is your full legal name?"
- If unclear, say: "Could you spell it? A as in Apple, B as in Bravo..."
- **IMPORTANT: While the caller is spelling, stay COMPLETELY SILENT.
Do NOT speak or acknowledge individual letters. Wait until the caller
clearly finishes or pauses for several seconds before responding.**
- After receiving spelling, repeat FULL name back once and ask "Is that correct?"
Reading the full name back once is a PHI-handling best practice because it catches misspellings before they reach the health record. Staying silent while the caller spells is also conversationally natural, since acknowledging each letter would talk over the caller.
2.4 Documented P0 Fixes (2026-02-15)
From memory notes:
- Router maxTokens 150→400 — Fixed GPT-4o silent truncation on tool-call JSON
- Router prompt rewritten — Replaced rigid "Say EXACTLY" scripting with warm acknowledgment
- Patient-ID EN/ZH steps merged — Consolidated first-turn tool call + intent analysis
- Defensive tool-result instruction — Added across all prompts
- transferAssistant → handoff_to_X — Fixed function names to match squad YAML
- Circuit breaker 10s→4s — SOAP phone search timeout tuned for Vapi 5s window
- All prompts clinic-agnostic — Removed "Vitara" branding (replaced with clinic_info tool result)
- All 9 prompts pushed to Vapi API — Verified 9/9 success
3. MULTILINGUAL DESIGN (EN/ZH)
3.1 STT/TTS Strategy
| Component | Router | EN Track | ZH Track |
|---|---|---|---|
| STT (Speech→Text) | AssemblyAI Universal (English-only in Vapi; see 3.2) | Deepgram nova-2 en | Deepgram nova-2 zh |
| LLM | GPT-4o | GPT-4o | GPT-4o |
| TTS (Text→Speech) | ElevenLabs eleven_multilingual_v2 | ElevenLabs eleven_multilingual_v2 | Azure zh-CN-XiaoxiaoNeural |
| Endpointing wait (startSpeakingPlan.waitSeconds) | 0.6s (aggressive) | 0.6s | 1.0s (Chinese slower) |
| Interruption threshold (stopSpeakingPlan.numWords) | 2 words | 2 words | 3 words (char-based) |
3.2 Language Detection Logic
File: router-v3.md
**Language:** Default is ENGLISH. Route to CHINESE only if the caller says:
- "Mandarin", "Chinese", "speak Chinese", "speak Mandarin", "中文"
- If caller's words don't make sense in English (garbled), ask:
"Would you like English or Mandarin? 英文还是中文?"
Why keyword-based (not STT)? AssemblyAI in Vapi is English-only, so Mandarin speech is force-transcribed as English gibberish ("Please speak, man. Darin." instead of "你好"). The Router can therefore switch to Chinese only when the caller explicitly asks for Mandarin.
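The Router applies this rule through the LLM, not through code. For regression-testing transcripts against the rule, it can be expressed deterministically. The sketch below is a hypothetical `detectTrack` helper built on the keyword list from router-v3.md; it is not part of the deployed system.

```typescript
// Hypothetical helper mirroring the Router prompt's language rule.
// In production the LLM applies this rule; this is not deployed code.
type Track = "en" | "zh";

// Keywords from router-v3.md. "speak Chinese" / "speak Mandarin" are
// covered by the substring matches on "chinese" / "mandarin".
const ZH_KEYWORDS: readonly string[] = ["mandarin", "chinese", "中文"];

function detectTrack(transcript: string): Track {
  const text = transcript.toLowerCase();
  // Default is English; switch only on an explicit request.
  return ZH_KEYWORDS.some((kw) => text.includes(kw)) ? "zh" : "en";
}

console.log(detectTrack("Can I speak Mandarin?")); // zh
console.log(detectTrack("Please speak, man. Darin.")); // en
```

The second call shows the gap that the prompt's garbled-input question covers. Force-transcribed Mandarin speech contains none of the keywords, so the keyword rule alone would route it to English.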
3.3 Timezone Handling
Both EN and ZH agents use a Liquid template with the clinic timezone:
# EN
{{now | date: "%B %d, %Y %I:%M %p", "America/Vancouver"}}
# Outputs: "February 17, 2026 09:30 AM"
# ZH
{{now | date: "%Y年%m月%d日 %H:%M", "America/Vancouver"}}
# Outputs: "2026年02月17日 09:30"
Hardcoded limitation: The clinic timezone is hardcoded as 'America/Vancouver' in admin-dashboard OscarSoapAdapter.ts, and the same zone is written literally into the Liquid templates above. Multi-clinic setups will need a clinic-aware timezone setting in both places.
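To illustrate what clinic-aware formatting would involve, here is a minimal sketch. It reproduces the EN template's output with the built-in `Intl` API and takes the timezone as a parameter. `formatClinicNow` is a hypothetical helper, not code from OscarSoapAdapter.ts or from Vapi's template engine.

```typescript
// Hypothetical sketch: reproduce the EN Liquid output ("%B %d, %Y %I:%M %p")
// with Intl, taking the timezone as a parameter instead of a constant.
function formatClinicNow(now: Date, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "long",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    hour12: true,
  }).formatToParts(now);
  // Pull individual fields so locale-specific literals ("at", spacing) are ignored.
  const part = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? "";
  return `${part("month")} ${part("day")}, ${part("year")} ` +
    `${part("hour")}:${part("minute")} ${part("dayPeriod").toUpperCase()}`;
}

// 17:30 UTC on Feb 17, 2026 is 09:30 PST in Vancouver.
console.log(formatClinicNow(new Date("2026-02-17T17:30:00Z"), "America/Vancouver"));
// February 17, 2026 09:30 AM
```

A per-clinic timezone string passed this way is the one value both the server and the prompt templates would need in a multi-clinic setup.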
3.4 Known ZH Prompt Issues
From documentation:
- GPT-4o space-separated characters: "我想预约" becomes "我 想 预 约" in some outputs — being monitored
- Chinese name formatting: No romanization/pinyin in prompts; names collected as-is
- Date format: Uses ISO YYYY-MM-DD internally; prompts format as "2月17日"
4. TOOL DEFINITIONS & SERVER FUNCTION MAPPINGS
4.1 Tool Inventory (14 Tools)
File: /home/ubuntu/vitara-platform/vapi-gitops/resources/tools/*.yml
All tools point to https://api-dev.vitaravox.ca/api/vapi/* with credential ID 02698381-2c38-494d-858e-f8c679ab803a.
| Tool | LLM Function Name | Request-Start Message | Timeout | Used By |
|---|---|---|---|---|
| search_patient_by_phone | search_patient_by_phone(phone) | "Let me pull up your file." | 20s | Patient-ID EN/ZH |
| search_patient | search_patient(name, firstName?) | "" (silent) | 20s | Patient-ID EN/ZH |
| get_clinic_info | get_clinic_info() | "" (silent) | - | Router, Patient-ID |
| get_providers | get_providers(specialty?) | "" (silent) | - | Booking, Modification |
| find_earliest_appointment | find_earliest_appointment(startDate?, endDate?, timeOfDay?, providerId?, providerName?, excludeDates?) | "Let me check what's available." | - | Booking, Modification |
| check_appointments | check_appointments(startDate, endDate, demographicId?, providerId?, findAvailable?) | "Let me look that up." | - | Booking, Modification |
| create_appointment | create_appointment(demographicId, providerId, startTime, appointmentType, reason, language, isVirtual?) | "" (silent) | - | Booking EN/ZH |
| update_appointment | update_appointment(appointmentId, newStartTime, newProviderId?, demographicId?) | "" (silent) | - | Modification EN/ZH |
| cancel_appointment | cancel_appointment(appointmentId, reason?) | "" (silent) | - | Modification EN/ZH |
| register_new_patient | register_new_patient(firstName, lastName, dateOfBirth, gender, phone, address, city, postalCode, healthCardType, language, email?, province?, healthCardNumber?) | "" (silent) | - | Registration EN/ZH |
| add_to_waitlist | add_to_waitlist(firstName, lastName, phone, notes?) | "" (silent) | - | Registration EN/ZH |
| log_call_metadata | log_call_metadata(language, callOutcome, demographicId?, appointmentId?) | "" (silent) | - | Router, Booking, Modification, Registration |
| transfer_call | transfer_call(reason, notes?) | "" (silent) | - | Router, Patient-ID, all tracks |
| get_patient | get_patient(demographicId) | "" (silent) | - | (defined but unused in squad) |
4.2 Critical Server-Side Logic
File: /home/ubuntu/vitara-platform/admin-dashboard/server/src/routes/vapi-webhook.ts
4.2.1 Caller Phone Auto-Extraction
// LLM sends "0000000000" as placeholder in search_patient_by_phone
// Server extracts REAL phone from Vapi metadata:
const callerPhone = call.customer.number; // E.164: "+12367770690"
// Then normalizes: strip +1, use 10-digit only: "2367770690"
Design rationale: The LLM is not given the caller's real number, so it passes a placeholder. The server intercepts every search_patient_by_phone call and substitutes the real number extracted from call.customer.number (Telnyx metadata).
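Here is a minimal sketch of the substitution, assuming North American (+1) numbers as in the example above. `toTenDigit` and `phoneForLookup` are hypothetical names; the webhook's actual code is not reproduced in this document.

```typescript
// Hypothetical sketch of the placeholder substitution described above.
const PLACEHOLDER_PHONE = "0000000000";

// "+12367770690" -> "2367770690"; returns null for non-NANP input.
function toTenDigit(raw: string): string | null {
  const digits = raw.replace(/\D/g, ""); // drop "+", spaces, dashes
  if (digits.length === 11 && digits.startsWith("1")) return digits.slice(1);
  return digits.length === 10 ? digits : null;
}

// Prefer the carrier-supplied number (call.customer.number); the LLM's
// argument is used only if it is not the placeholder.
function phoneForLookup(llmArg: string, customerNumber?: string): string | null {
  if (customerNumber) return toTenDigit(customerNumber);
  return llmArg === PLACEHOLDER_PHONE ? null : toTenDigit(llmArg);
}

console.log(phoneForLookup(PLACEHOLDER_PHONE, "+12367770690")); // 2367770690
```

Returning null instead of a guessed number lets the agent fall back to the name search described in Section 5.4.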
4.2.2 Past-Date Clamping
Why: GPT-4o sometimes hallucinates past dates. The server-side guard clamps any date earlier than today forward to the current date, so appointments can never be searched for or booked in the past.
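The source does not show the guard itself. Below is a minimal sketch, assuming ISO YYYY-MM-DD dates (the internal format noted in 3.4) and a clinic timezone parameter. `todayIn` and `clampToToday` are hypothetical names.

```typescript
// Hypothetical sketch of a past-date guard.
// Returns today's date as "YYYY-MM-DD" in the given timezone.
function todayIn(timeZone: string, now: Date = new Date()): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone, year: "numeric", month: "2-digit", day: "2-digit",
  }).formatToParts(now);
  const part = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? "";
  return `${part("year")}-${part("month")}-${part("day")}`;
}

// ISO dates sort lexicographically, so plain string comparison is safe.
function clampToToday(requested: string | undefined, timeZone: string, now: Date = new Date()): string {
  const today = todayIn(timeZone, now);
  return !requested || requested < today ? today : requested;
}
```

Computing "today" in the clinic's zone matters near midnight: at 03:00 UTC on February 18 it is still February 17 in Vancouver.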
4.2.3 Provider Name → ID Resolution
// LLM may send: providerName = "Dr. Chen"
// Server fuzzy-matches against clinic's provider list
// Only treats as specific provider if providerId is purely numeric: /^\d+$/ test
// Fallback: if providerName sent but no ID, search_patient_by_phone result has OSCAR provider IDs
4.2.4 Non-Numeric Provider Handling
// If LLM sends "any" or Mandarin "任何" for providerId, regex /^\d+$/ returns false
// Server treats as "search all providers" (undefined providerId)
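Together, Sections 4.2.3 and 4.2.4 amount to a small resolution function. The sketch below is hypothetical: a plain substring match stands in for the server's unspecified fuzzy matching, and the `Provider` shape is assumed.

```typescript
// Hypothetical sketch of provider resolution; substring matching stands in
// for the server's (unspecified) fuzzy match.
interface Provider {
  id: string;
  name: string;
}

// Values that mean "no preference" and must not be name-matched.
const ANY_PROVIDER = new Set(["any", "anyone", "任何"]);

// "Dr. Chen" -> "chen"; keeps Latin letters and CJK ideographs only.
function nameKey(s: string): string {
  return s.toLowerCase().replace(/^dr\.?\s+/, "").replace(/[^a-z\u4e00-\u9fff]/g, "");
}

function resolveProviderId(providers: Provider[], providerId?: string, providerName?: string): string | undefined {
  if (providerId && /^\d+$/.test(providerId)) return providerId; // explicit OSCAR id
  const key = providerName ? nameKey(providerName) : "";
  if (key && !ANY_PROVIDER.has(key)) {
    const match = providers.find((p) => nameKey(p.name).includes(key));
    if (match) return match.id;
  }
  return undefined; // "any", "任何", or no match: search all providers
}
```

The explicit "any" set matters. Without it, a substring match on "any" would select a provider named, say, "Dr. Sanyal".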
4.2.5 Slot Collision Check
// Before create_appointment:
// 1. search existing appointments in the slot window
// 2. Check if [startTime, endTime] overlaps any existing appointment
// 3. Return error if collision detected
// 4. LLM reruns find_earliest_appointment to get next slot
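Step 2's overlap test is the standard interval check. Below is a minimal sketch, assuming ISO timestamps in one consistent timezone and half-open [start, end) intervals. `Slot` and `findCollision` are hypothetical names.

```typescript
// Hypothetical sketch of the overlap check in step 2.
interface Slot {
  start: string; // ISO timestamp, e.g. "2026-02-20T14:00:00"
  end: string;
}

// Half-open intervals overlap iff each one starts before the other ends.
function overlaps(a: Slot, b: Slot): boolean {
  const [aStart, aEnd] = [Date.parse(a.start), Date.parse(a.end)];
  const [bStart, bEnd] = [Date.parse(b.start), Date.parse(b.end)];
  return aStart < bEnd && bStart < aEnd;
}

// Returns the first existing appointment that conflicts, if any.
function findCollision(requested: Slot, existing: Slot[]): Slot | undefined {
  return existing.find((appt) => overlaps(requested, appt));
}
```

With half-open intervals, a 14:30 booking directly after a 14:00 to 14:30 appointment is not treated as a collision.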
4.3 Tool Result Schema Examples
search_patient_by_phone response:
{
"found": true,
"id": 12345,
"firstName": "John",
"lastName": "Doe",
"dateOfBirth": "1990-01-15",
"phone": "2367770690"
}
find_earliest_appointment response:
{
"slotId": "abc123",
"date": "2026-02-20",
"day": "Friday",
"startTime": "2026-02-20T14:00:00",
"endTime": "2026-02-20T14:30:00",
"providerId": "100",
"providerName": "Dr. Chen",
"clinicName": "Vitara"
}
5. ERROR RECOVERY & CONVERSATION FLOWS
5.1 Defensive Prompt Patterns
All 9 agents include these defensive sections:
EMERGENCY Detection (All Agents)
If the caller mentions ANY of these: "chest pain", "cannot breathe", "difficulty breathing",
"heart attack", "stroke", "seizure", "unconscious", "severe bleeding", "choking", "emergency",
"overdose", "suicidal" [+ ZH equivalents]:
Respond: "This sounds like a medical emergency. Please hang up and call 911 immediately."
End the call immediately. Do NOT continue.
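The keyword list is hardcoded in every prompt rather than left to the model's judgment, and a match triggers immediate escalation. The LLM is still the component that spots the keywords in the conversation, so detection still depends on the model following the instruction.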
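The English keywords can also be matched directly, for example to regression-test transcripts against the prompt's list. This is a hypothetical sketch. The ZH equivalents are not listed in the source, and nothing here is deployed code.

```typescript
// Hypothetical matcher over the EN emergency keywords listed in the prompt.
const EMERGENCY_KEYWORDS_EN: readonly string[] = [
  "chest pain", "cannot breathe", "difficulty breathing", "heart attack",
  "stroke", "seizure", "unconscious", "severe bleeding", "choking",
  "emergency", "overdose", "suicidal",
];

// Returns the first keyword found (case-insensitive substring match).
function findEmergencyKeyword(utterance: string): string | undefined {
  const text = utterance.toLowerCase();
  return EMERGENCY_KEYWORDS_EN.find((kw) => text.includes(kw));
}
```

Plain substring matching also fires on phrases like "no chest pain" or "my father had a stroke last year." That is one argument for keeping the LLM in the loop. A deterministic check, on the other hand, would catch cases where the model fails to follow the instruction.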
WRONG INTENT REDIRECT (Agent-Specific)
# Booking-EN
If patient says "reschedule", "cancel", or anything NOT about booking NEW appointment:
Say "Of course" and call `handoff_to_modification_en`
# Modification-EN
If patient says "book a new appointment" (not reschedule):
Say "Of course" and call `handoff_to_booking_en`
Prevents wasted turns — immediately redirects off-topic requests.
3-Attempt Fallback (All Agents)
After 3 unclear attempts → "Let me connect you with our staff."
Call transfer_call with reason "out_of_scope"
No infinite loop — ensures eventual escalation to human if LLM can't understand.
5.2 Booking Flow (Tested & Working)
Router: "Hi there, thanks for calling!"
Router: [get_clinic_info] → "Welcome to [clinic]! How can I help?"
User: "I'd like to book an appointment."
Router: → handoff_to_patient_id_en
Patient-ID: "One moment while I look you up" + [search_patient_by_phone]
Patient-ID: "I have John Doe on file — is that you?"
User: "Yes."
Patient-ID: "I'll get you set up." → handoff_to_booking_en
Booking: "Let me find you an appointment" + [find_earliest_appointment]
Booking: "I have Friday, Feb 20 at 2:00 PM with Dr. Chen. Does that work?"
User: "Yes." OR "No, I want March." [→ find_earliest_appointment with filters]
Booking: "What is this visit for?"
User: "General consultation."
Booking: [create_appointment] → "All set! Friday, Feb 20 at 2:00 PM..."
Booking: [log_call_metadata callOutcome="booked"] → "Take care!"
5.3 Reschedule Flow (Tested, Fixed)
Patient-ID: → handoff_to_modification_en
Modification: "Let me pull up your appointments" + [check_appointments startDate=today, endDate=6mo]
Modification: Lists first 3 appointments
User: "The second one"
Modification: "Would you like to reschedule or cancel?"
User: "Reschedule"
Modification: "When would work better?"
User: "Next week"
Modification: [find_earliest_appointment startDate=next-week] → "How about Friday, Feb 27 at 10 AM?"
User: "Yes"
Modification: [update_appointment appointmentId=X, newStartTime=2026-02-27T10:00:00]
Modification: "Done! Moved to Friday, Feb 27 at 10 AM..."
Modification: [log_call_metadata callOutcome="rescheduled"]
5.4 Error Recovery Examples
search_patient_by_phone fails
Patient-ID: "I'm having trouble looking up your information. Could you tell me your name?"
[switch to search_patient tool]
No available slots
Booking: "Nothing in that range. Would you like to try a different week or different doctor?"
[find_earliest_appointment with adjusted filters]
Slot collision (just taken)
Booking: "That slot was just taken. Let me find the next available."
[find_earliest_appointment with excludeDates: [previousSlotDate]]
6. LATENCY ARCHITECTURE
6.1 Tool Message Strategy (request-start)
Design Philosophy: Filler speech covers tool latency by speaking simultaneously.
messages:
  - type: request-start
    blocking: false  # Allow speech to start before tool completes
    content: "Let me check that for you."
  - type: request-response-delayed
    timingMilliseconds: 5000
    content: "Still looking that up."  # If tool takes >5s
  - type: request-failed
    content: "I'm sorry, I wasn't able to check that."
Current config: Most tools use empty request-start (content: ""), relying on prompt-level instruction to generate filler. A few audible tools are:
- search_patient_by_phone: "Let me pull up your file."
- find_earliest_appointment: "Let me check what's available."
- check_appointments: "Let me look that up."
Recommended improvement: Add request-response-delayed with 4000ms to slow tools (find_earliest_appointment, create_appointment, register_new_patient) to prevent dead air if backend is slow.
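The recommendation can be applied mechanically across the tool YAML files. Below is a minimal sketch, assuming a `ToolMessage` shape limited to the fields shown in this section. The filler wording and the `addDelayedMessage` name are placeholders, not deployed values.

```typescript
// Hypothetical transform adding a delayed-response message to slow tools.
interface ToolMessage {
  type: "request-start" | "request-response-delayed" | "request-failed";
  content: string;
  blocking?: boolean;
  timingMilliseconds?: number;
}

const SLOW_TOOLS = new Set(["find_earliest_appointment", "create_appointment", "register_new_patient"]);

function addDelayedMessage(toolName: string, messages: ToolMessage[]): ToolMessage[] {
  const already = messages.some((m) => m.type === "request-response-delayed");
  // Leave fast tools and tools that already have a delayed message unchanged.
  if (!SLOW_TOOLS.has(toolName) || already) return messages;
  return [
    ...messages,
    { type: "request-response-delayed", timingMilliseconds: 4000, content: "Still working on that, one moment." },
  ];
}
```

Skipping tools that already define a delayed message keeps the transform idempotent across repeated pushes.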
6.2 startSpeakingPlan (Endpointing)
# Router (aggressive endpointing; AssemblyAI STT)
startSpeakingPlan:
  waitSeconds: 0.6
  transcriptionEndpointingPlan:
    onPunctuationSeconds: 0.3
    onNoPunctuationSeconds: 0.8
    onNumberSeconds: 0.5
# EN agents (Deepgram nova-2 en)
waitSeconds: 0.6
onPunctuationSeconds: 0.3
onNoPunctuationSeconds: 0.8
# ZH agents (slower Chinese processing)
waitSeconds: 1.0
onPunctuationSeconds: 0.6
onNoPunctuationSeconds: 1.5
onNumberSeconds: 0.8
Rationale: Chinese takes longer to process (more ambiguous, character-based). Longer wait times prevent premature response generation.
6.3 stopSpeakingPlan (Interruption)
# EN agents
stopSpeakingPlan:
  numWords: 2  # Agent stops speaking after user says 2 words
# ZH agents
stopSpeakingPlan:
  numWords: 3  # Slightly more tolerant: agent stops after 3 transcribed words
6.4 Circuit Breaker Timeouts
File: admin-dashboard server SOAP adapter
const CIRCUIT_BREAKER_TIMEOUT = 4000; // 4 seconds
// Must complete within Vapi's 5-second tool timeout window
// Leaves 1s buffer for JSON serialization + network
With the previous 10s value, slow SOAP calls ran past Vapi's timeout, and the LLM received "tool execution failed" instead of a usable result. At 4s, the server gives up first and still responds inside Vapi's window.
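The 4-second limit amounts to racing the SOAP call against a timer. Below is a minimal sketch, assuming a promise-based adapter. `withTimeout` is hypothetical, and it covers only the deadline half of a circuit breaker: it neither cancels the underlying request nor stops calling a backend after repeated failures.

```typescript
// Hypothetical deadline wrapper; CIRCUIT_BREAKER_TIMEOUT mirrors the value above.
const CIRCUIT_BREAKER_TIMEOUT = 4000; // ms

async function withTimeout<T>(work: Promise<T>, ms: number = CIRCUIT_BREAKER_TIMEOUT): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  try {
    // Whichever settles first wins; a timeout surfaces as a rejected promise.
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer); // avoid a dangling timer once the race settles
  }
}
```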
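7. PHI EXPOSURE RISK ANALYSIS
7.1 PHI in Prompts
Minimal risk. Prompts are templates; no real PHI hardcoded. However:
- The server extracts PHI from call metadata: the caller's phone number from call.customer.number (Section 4.2.1).
- The LLM can see the patient's name and date of birth after lookup (the search_patient_by_phone result in Section 4.3), so this PHI becomes part of the conversation sent to the model provider.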
7.2 Logging & Redaction
File: vapi-webhook.ts, lines 193-198
const PHI_KEYS = new Set([
'name', 'firstName', 'lastName', 'dateOfBirth', 'phone', 'email',
'healthCardNumber', 'healthCardProvince', 'callerPhone', 'address',
'city', 'postalCode', 'patient', 'transcript', 'summary',
]);
Debug mode vs. production:
- Debug mode (debugManager.isActive()): Full PHI logged with [PHI-DEBUG] prefix
- Production: PHI redacted from logs (keys listed but values obscured)
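The document does not show how the redaction is applied. Below is a minimal sketch of key-based redaction over a nested payload, reusing the PHI_KEYS list. `redactPhi` and the "[REDACTED]" marker are assumptions.

```typescript
// Hypothetical key-based redaction over a nested webhook payload.
const PHI_KEYS = new Set<string>([
  "name", "firstName", "lastName", "dateOfBirth", "phone", "email",
  "healthCardNumber", "healthCardProvince", "callerPhone", "address",
  "city", "postalCode", "patient", "transcript", "summary",
]);

// Returns a copy in which every value under a PHI key is replaced.
function redactPhi(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactPhi);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = PHI_KEYS.has(key) ? "[REDACTED]" : redactPhi(child);
    }
    return out;
  }
  return value;
}
```

Because matching is by key name, a value stored under an unlisted key passes through. In this sketch, call.customer.number (the caller's phone, Section 4.2.1) would be logged unchanged.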
7.3 Recording & Compliance
From routing defaults:
# Registration-EN opening
"Welcome! I'll help you register. This takes a few minutes.
Just so you know, this call is recorded for quality and scheduling purposes.
By continuing, you consent to the recording."
Compliance considerations (HIPAA is a US law; a Canadian clinic is governed by Canadian privacy legislation):
- Calls are recorded by Vapi (recordingUrl in end-of-call-report)
- Transcripts stored in Vapi call history
- No explicit encryption or audit-trail configuration in the assistants
- Recommendation: Consult with legal for Canadian healthcare compliance (PHIPA, PIPA)
8. CONVERSATION STATE MANAGEMENT ACROSS HANDOFFS
8.1 Context Passing Strategy
Default: Full history
# In squad YAML, handoff destinations don't specify contextMode
# Vapi default: contextMode: "all" — full message history passed to next agent
Example flow:
Router conversation:
User: "Hi, I want to book"
Router: [get_clinic_info] → "Welcome to Vitara!"
Router: → calls handoff_to_patient_id_en
Patient-ID-EN receives:
- All Router messages (including clinic info)
- LLM can reference: "The clinic I just greeted them from was..."
- Patient lookup happens fresh: [search_patient_by_phone]
8.2 Patient Context Retention
Booking agent accesses Patient-ID lookup result:
Patient-ID-EN returns: {id: 12345, firstName: "John", lastName: "Doe"}
This is in the message history. Booking-EN reads it:
from conversation history: "Patient's demographicId is 12345"
Then Booking calls: create_appointment(demographicId=12345, ...)
No explicit state mechanism exists; context is implicit in the conversation history. This works because:
1. The destination LLM receives the full history, which fits within GPT-4o's context window (maxTokens limits only the response length)
2. Prompts instruct: "Patient is ALREADY identified. Their demographicId is the 'id' field from search_patient_by_phone result in conversation history."
8.3 Potential Issue: Long Conversations
From prompt engineering report:
"As conversations get long (e.g., patient asks many questions, tries multiple slots),
the context grows and GPT-4o instruction following degrades."
Mitigation (not yet applied):
# Could add to handoff destinations:
contextMode: "lastNMessages"
lastNMessages: 20 # Only pass last 20 messages, drop early history
Current status: Using default (all history). No production incidents reported yet.
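To see what the mitigation would change, here is a hypothetical illustration of a last-N policy applied to the history a destination agent receives. It is not Vapi's implementation, and the `ChatMessage` shape is simplified.

```typescript
// Hypothetical last-N context window that always keeps system messages.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

function keepLastN(history: ChatMessage[], n: number): ChatMessage[] {
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  // slice(-0) would return everything, so handle n <= 0 explicitly.
  return [...system, ...(n > 0 ? rest.slice(-n) : [])];
}
```

This exposes a trade-off with Section 8.2. If the Patient-ID lookup result is older than the last N messages, Booking can no longer read the demographicId from history, so a window would require passing the ID forward some other way. Trimming can also separate a tool result from the call that produced it.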
9. FALLBACK & ESCALATION PATHS
9.1 Escalation Triggers
- Emergency keywords detected → End call + direct to 911
- 3 unclear attempts → transfer_call(reason: "out_of_scope")
- Explicit API error → transfer_call(reason: "registration_error")
- Record not found → transfer_call(reason: "record_not_found")
- Medical questions (out of scope) → transfer_call(reason: "medical_question")
- Patient explicitly requests human → transfer_call(reason: "patient_request")
9.2 transfer_call Tool
Endpoint: https://api-dev.vitaravox.ca/api/vapi/transfer-call
function:
  name: transfer_call
  description: Transfer call to clinic staff
  parameters:
    reason: [patient_request, frustrated, medical_question, billing,
             registration_error, record_not_found, out_of_scope]
    notes: (optional) Context for staff
Server-side behavior (inferred):
1. Logs transfer reason + notes to database
2. Initiates SIP REFER or bridges to clinic phone number
3. Returns status to Vapi (call transferred or failed)
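On the server side, the reason enum is worth validating before acting on it, since the LLM may send a value outside the list. Below is a minimal sketch. The fallback to "out_of_scope" for unknown reasons is a hypothetical policy, not documented behavior.

```typescript
// Hypothetical validation of transfer_call arguments against the enum above.
const TRANSFER_REASONS = [
  "patient_request", "frustrated", "medical_question", "billing",
  "registration_error", "record_not_found", "out_of_scope",
] as const;
type TransferReason = (typeof TRANSFER_REASONS)[number];

function isTransferReason(x: unknown): x is TransferReason {
  return typeof x === "string" && (TRANSFER_REASONS as readonly string[]).indexOf(x) !== -1;
}

interface TransferArgs {
  reason: TransferReason;
  notes?: string;
}

// Unknown reasons map to "out_of_scope" so the transfer itself still proceeds.
function parseTransferArgs(raw: { reason?: unknown; notes?: unknown }): TransferArgs {
  const reason: TransferReason = isTransferReason(raw.reason) ? raw.reason : "out_of_scope";
  return typeof raw.notes === "string" ? { reason, notes: raw.notes } : { reason };
}
```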
9.3 New Patient Registration Rejection
# If clinic not accepting new patients (get_clinic_info.acceptingNewPatients=false)
Registration-EN: "Sorry, we're not accepting new patients right now.
Would you like to join our waitlist?"
YES → add_to_waitlist(firstName, lastName, phone)
log_call_metadata(callOutcome="waitlisted")
"We'll call you when a spot opens up."
NO → "Take care!"
10. CALL RECORDING & LOGGING PRACTICES
10.1 Vapi End-of-Call Report Webhook
interface VapiWebhookMessage {
  type: 'end-of-call-report';
  summary?: string;          // AI-generated summary
  transcript?: string;       // Full text transcript
  recordingUrl?: string;     // HTTPS URL to call recording
  durationSeconds?: number;
  cost?: number;             // USD cost
  endedReason?: string;      // why call ended
  metadata?: Record<string, unknown>;
}
Processing:
// vapi-webhook.ts: On end-of-call-report, server:
// 1. Saves transcript + summary to database
// 2. Stores recordingUrl (can be downloaded for archival)
// 3. Logs call duration + cost
// 4. Triggers any post-call analysis (if configured)
10.2 Call Metadata Logging
Agents call log_call_metadata at call conclusion.
Outcomes tracked:
- booked, rescheduled, cancelled
- registered, waitlisted
- transferred, no_action, clinic_info
- out_of_scope, record_not_found, registration_error
10.3 Debug Mode
File: vapi-webhook.ts
function logWebhook(action: string, data: unknown) {
  if (debugManager.isActive()) {
    logger.info({ webhook: data, _debugMode: true },
      `[PHI-DEBUG][VAPI WEBHOOK] ${action}`);
  } else {
    logger.info({ webhook: data }, `[VAPI WEBHOOK] ${action}`);
  }
}
Production: PHI redacted (keys visible, values obscured)
Debug mode: Full PHI logged (for internal testing only)
Both branches pass the payload to logger.info unchanged, so the production redaction must be applied elsewhere, presumably in the logger configuration using PHI_KEYS.
11. VAPI GITOPS INFRASTRUCTURE
11.1 GitOps Engine Architecture
File: /home/ubuntu/vitara-platform/vapi-gitops/
src/
├── pull.ts # Download platform state, preserve local changes
├── push.ts # Upload local YAML/MD to Vapi API
├── apply.ts # Orchestrator: pull → merge → push
├── call.ts # WebSocket call testing
├── types.ts # TypeScript interfaces
├── config.ts # Environment & config
├── api.ts # Vapi HTTP client
├── state.ts # State file (.vapi-state.*.json)
├── resources.ts # Load YAML/MD files
├── resolver.ts # Resolve resource IDs → Vapi UUIDs
└── delete.ts # Deletion & orphan checks
resources/
├── assistants/ # 9 agents (.md files with YAML frontmatter)
├── tools/ # 14 function tools (.yml)
├── structuredOutputs/
├── squads/ # 1 squad (vitaravox-v3.yml)
└── simulations/ # (empty for v3.0)
11.2 Markdown + YAML Frontmatter Format
Example: resources/assistants/router-v3.md
---
name: vitara-router-v3
model:
  model: gpt-4o
  provider: openai
  temperature: 0.3
  maxTokens: 400
transcriber:
  provider: assembly-ai
voice:
  provider: 11labs
  voiceId: fQj4gJSexpu8RDE2Ii5m
---
# Markdown system prompt starts here
## IDENTITY
You are a bilingual front-desk assistant...
Parsing: The GitOps engine:
1. Extracts YAML frontmatter → Vapi assistant config
2. Converts markdown body → system prompt string (sent to LLM as-is)
11.3 Reference Resolution
Local filenames resolve to Vapi UUIDs:
# In assistant file:
toolIds:
  - search-patient-by-phone-8474536c  # filename without .yml
# Engine looks up in .vapi-state.dev.json:
{
  "tools": {
    "search-patient-by-phone-8474536c": "8474536c-663f-4a94-91ae-19e6221f9af9"
  }
}
# Sends to API as:
{
  "toolIds": ["8474536c-663f-4a94-91ae-19e6221f9af9"]
}
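The lookup step can be sketched as follows. This is a hypothetical reconstruction for illustration only; the actual logic lives in src/resolver.ts, which this document does not reproduce.

```typescript
// Hypothetical sketch of resolving local tool ids via the state file.
interface VapiState {
  assistants: Record<string, string>;
  tools: Record<string, string>;
  squads: Record<string, string>;
}

function resolveToolIds(localIds: string[], state: VapiState): string[] {
  return localIds.map((localId) => {
    const uuid = state.tools[localId];
    // A missing entry usually means the tool has not been pushed yet.
    if (!uuid) throw new Error(`No UUID for tool "${localId}" in state file`);
    return uuid;
  });
}
```

Failing loudly on a missing entry is what makes the push dependency order in 11.6 matter: an assistant that references a tool not yet in the state file cannot be resolved.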
11.4 Squad Handoff Resolution
# Squad YAML:
- assistantId: router-v3  # Resolves to 4f70e214...
  assistantOverrides:
    tools:append:
      - destinations:
          - assistantName: vitara-patient-id-en-v3  # Matches assistant name field
# Engine resolves:
# 1. assistantId → UUID (state file)
# 2. assistantName → UUID by looking up assistant by name
# 3. Validates handoff destination exists in squad members
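Check 3 is a simple set-membership test. Below is a minimal sketch over a simplified, hypothetical `SquadMember` shape (assistant name plus handoff destination names); it is not the engine's real data model.

```typescript
// Hypothetical check that every handoff destination names a squad member.
interface SquadMember {
  assistantName: string;
  handoffDestinations: string[];
}

// Returns "source -> destination" for each destination outside the squad.
function findDanglingHandoffs(members: SquadMember[]): string[] {
  const names = new Set(members.map((m) => m.assistantName));
  const problems: string[] = [];
  for (const m of members) {
    for (const dest of m.handoffDestinations) {
      if (!names.has(dest)) problems.push(`${m.assistantName} -> ${dest}`);
    }
  }
  return problems;
}
```

With this check, a typo in a destination name is reported before push rather than surfacing as a failed handoff mid-call.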
11.5 State File
.vapi-state.dev.json (checked into git):
{
  "assistants": {
    "router-v3": "4f70e214-6111-4f53-86c9-48f8f7c265e1",
    "booking-en": "ac25775b-c1cc-41ae-8899-810d4ae62efd",
    ...
  },
  "tools": {
    "search-patient-by-phone-8474536c": "8474536c-663f-4a94-91ae-19e6221f9af9",
    ...
  },
  "squads": {
    "vitaravox-v3": "13fdfd19-a2cd-4ca4-8e14-ad2275095e32"
  }
}
Purpose: Maps friendly names to Vapi UUIDs (immutable after creation).
11.6 Commands
npm run pull:dev # Download state from Vapi
npm run push:dev # Upload local files to Vapi
npm run apply:dev # pull → merge → push
npm run push:dev assistants # Push only assistants
npm run push:dev resources/assistants/router-v3.md # Push single file
npm run call:dev -- -a router-v3 # Test assistant via WebSocket
npm run build # Type-check
Dependency order (push): 1. Tools → 2. Structured Outputs → 3. Assistants → 4. Squads
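The dependency order can be enforced by ranking paths under the resources/ tree shown in 11.1. Below is a minimal sketch. `PUSH_ORDER` and `sortForPush` are hypothetical names, not the engine's API.

```typescript
// Hypothetical ordering: each kind may reference only kinds earlier in the list.
const PUSH_ORDER = ["tools", "structuredOutputs", "assistants", "squads"];

// "resources/assistants/router-v3.md" -> rank of "assistants".
function pushRank(path: string): number {
  const kind = path.split("/")[1] ?? "";
  const i = PUSH_ORDER.indexOf(kind);
  return i === -1 ? PUSH_ORDER.length : i; // unknown kinds go last
}

function sortForPush(paths: string[]): string[] {
  // Copy before sorting; Array.prototype.sort is stable in ES2019+.
  return [...paths].sort((a, b) => pushRank(a) - pushRank(b));
}
```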
12. DOCUMENTED ISSUES & LESSONS LEARNED
12.1 v2.3.0 Issues (Fixed in v3.0)
- transfer_call tool missing from squad — Fixed in v3.0 by adding to all prompts
- Router LLM hallucinating phone numbers — Fixed: server extracts from call.customer.number
- firstMessage causing silence — Fixed: Patient-ID agents removed static firstMessage
- Silent handoffs were too loud — Fixed: Added content: "" to handoff messages
12.2 v3.0 P0 Fixes (Applied 2026-02-15)
- Router maxTokens 150→400 — Prevented GPT-4o output truncation on tool-call JSON
- Patient-ID prompt rewrite — Removed rigid scripting
- Defensive tool-result instruction — All prompts now wait for actual tool response
- Circuit breaker 10s→4s — Tuned for Vapi's 5s timeout
- All prompts clinic-agnostic — Replaced hardcoded clinic name with tool result
12.3 Known Limitations
- Clinic timezone hardcoded — OscarSoapAdapter.ts line: const tz = 'America/Vancouver'
  - Solution: Make this clinic-configurable
- Language detection keyword-based — Not STT-based
  - Why: AssemblyAI router is English-only
  - Limitation: Mandarin callers must explicitly request 中文
- No mid-conversation language switching — Once routed to EN/ZH, stays there
  - Improvement: Implement LLM proxy with language detection (documented in V3-MULTILINGUAL-ARCHITECTURE.md, not yet implemented)
- Chinese name formatting — No romanization/pinyin in prompts
  - Risk: Name mismatches if caller names have ambiguous spelling
- No patient data pre-population — Patient must spell name if phone lookup fails
  - Improvement: Could use customer name from Vapi metadata if available
12.4 Lessons Learned (From Documentation)
- Vapi PATCH timeout is ~15s default — Increase to 30-45s for large squad updates
- Always use tools:append in squad handoffs — Don't replace existing tools
- Silent transfers = "NEVER mention transferring" — Explicit prompt instruction required
- Past dates must be clamped server-side — LLM not reliable on current date
- First-turn tool call MUST happen — Explicit in prompt: "FIRST RESPONSE MUST CALL"
- Tool request-start messages prevent dead air — Use blocking: false for latency coverage
- Chinese requires longer startSpeakingPlan — 1.0s vs. EN's 0.6s
- Clinic-agnostic prompts scale better — Use get_clinic_info tool, not hardcoded names
13. DEPLOYMENT & TESTING STATUS
13.1 Current Deployment
- Live phone: +1 236-305-7446
- Squad ID: 13fdfd19-a2cd-4ca4-8e14-ad2275095e32
- Agents: 9/9 deployed
- Tools: 14/14 deployed
- Git status: All changes committed to GitOps repo
13.2 Testing Completeness
From V3-ARCHITECTURE-SNAPSHOT.md:
- ✅ English booking flow — TESTED, WORKING
- ✅ English reschedule flow — TESTED, WORKING after server fix
- ✅ Chinese booking flow — TESTED, WORKING (with TTS limitations on English names)
- ⏳ Chinese reschedule — Not explicitly mentioned as tested
- ⏳ Cross-track language switching — Not implemented
- ✅ Emergency keywords — Hardcoded in every prompt; covered by the pre-launch checks (13.3) rather than end-to-end call tests
- ✅ Registration flow — Covered by the pre-launch checks (13.3); end-to-end call testing not documented
- ⏳ Transfer escalation — Tool exists; backend transfer logic assumed working but not verified
13.3 Pre-Launch Checks (9 Total, All Complete)
From memory notes (2026-02-16):
- ✅ Router language detection logic
- ✅ Patient-ID phone search + name fallback
- ✅ Booking slot finding + creation
- ✅ Modification reschedule + cancel
- ✅ Registration data collection + validation
- ✅ Error recovery (3-attempt fallback)
- ✅ Emergency keyword detection
- ✅ Call metadata logging
- ✅ Schedule data flow (informational, non-blocking)
14. SPECIFIC FINDINGS WITH FILE REFERENCES
File-by-File Inventory
| File | Lines | Key Content | Status |
|---|---|---|---|
| /home/ubuntu/vitara-platform/vapi-gitops/resources/squads/vitaravox-v3.yml | 268 | Squad topology, 9 members, handoff definitions | Production |
| /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/router-v3.md | 92 | Router agent, keyword-based language routing, emergency handling | Production |
| /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/patient-id-en.md | 118 | Patient ID EN, phone search, intent detection | Production |
| /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/patient-id-zh.md | 113 | Patient ID ZH, parallel structure, Chinese grammar | Production |
| /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/booking-en.md | 115 | Booking EN, find slots, create appointment | Production |
| /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/booking-zh.md | 112 | Booking ZH, Chinese-specific date formatting | Production |
| /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/modification-en.md | 117 | Modification EN, reschedule + cancel + check | Production |
| /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/modification-zh.md | 114 | Modification ZH, same functionality in Chinese | Production |
| /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/registration-en.md | 119 | Registration EN, PHI collection, spelling rules | Production |
| /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/registration-zh.md | 116 | Registration ZH, pinyin spelling guidance | Production |
| /home/ubuntu/vitara-platform/vapi-gitops/resources/tools/*.yml | 14 files | All tool definitions, server endpoints, parameters | Production |
| /home/ubuntu/vitara-platform/admin-dashboard/server/src/routes/vapi-webhook.ts | 600+ | Webhook handler, PHI redaction, server-side logic | Production |
| /home/ubuntu/vitara-platform/docs/V3-ARCHITECTURE-SNAPSHOT.md | 150 | Deployment diagram, agent inventory, flow examples | Reference |
| /home/ubuntu/vitara-platform/docs/V3-TOOL-INVENTORY.md | 200+ | Tool specs, parameter schemas, server integration | Reference |
| /home/ubuntu/vitara-platform/docs/VAPI-PROMPT-ENGINEERING-REPORT.md | 980 | Best practices, GitOps patterns, recommendations | Reference |
| /home/ubuntu/vitara-platform/vapi-gitops/.vapi-state.dev.json | 37 | UUID mappings for all 9 agents, 14 tools, 1 squad | Deployment |
CRITICAL RECOMMENDATIONS
High Priority
- Multi-clinic timezone support — Parameterize clinic timezone in admin-dashboard OscarSoapAdapter
- Language detection enhancement — Implement LLM proxy for mid-conversation language switching (documented but not deployed)
- Add request-response-delayed messages — Cover slow API calls (find_earliest, create_appointment)
- HIPAA/PHIPA audit trail — Add legal review for Canadian healthcare compliance
Medium Priority
- Monitor Chinese output quality — Watch ZH TTS on English names and GPT-4o's space-separated characters post-launch
- Implement conversation context limiting — Add lastNMessages: 20 to handoffs for very long calls
- Revisit phone cache TTL logic — Current 1-hour TTL may cause stale clinic resolution
- Test edge cases — Out-of-province health cards, waitlist behavior, slot collision scenarios
Low Priority
- Add Liquid conditionals — Support text/voice mode switching (preparatory for future chat)
- Custom variables for multi-tenancy — Replace hardcoded clinic names with {{clinicName}}
- Romanization for Chinese names — Support pinyin input if caller struggles with spelling
CONCLUSION
VitaraVox v3.0 represents production-grade voice agent architecture with strong prompt engineering, defensive error recovery, and comprehensive server-side validation. The dual-track multilingual design properly isolates EN/ZH processing while maintaining shared booking/EMR logic. Vapi GitOps enables version-controlled, auditable agent configuration—a best practice for voice systems.
Key strengths: Clinic-agnostic prompts, PHI redaction, 3-attempt fallback, emergency keyword detection, timezone awareness, state management across handoffs.
Key gaps: Hardcoded clinic timezone, keyword-only language detection, no mid-conversation language switching, minimal HIPAA audit trail.
The system is live and functional. The team has systematically applied fixes to maxTokens tuning, defensive tool-result instructions, and prompt rewrites—evidence of mature deployment practices. Recommended next steps focus on operational hardening (timezone, HIPAA compliance) and conversation quality (language detection, latency optimization).