Voice Architecture Analysis

VitaraVox Enterprise Readiness Analysis

Date: February 17, 2026

Agent: Voice Architecture & Telephony Analyst

COMPREHENSIVE ANALYSIS: VITARAVOX v3.0 VAPI GITOPS ARCHITECTURE

EXECUTIVE SUMMARY

VitaraVox v3.0 is a production-deployed, multilingual voice agent system managing 9 Vapi assistants (dual-track EN/ZH) coordinated via a squad, with 14 tools connecting to the OSCAR EMR backend. The infrastructure is managed via Vapi GitOps, a declarative, version-controlled configuration system with an official TypeScript engine. All core functionality is deployed and tested; the system is live on phone number +1 236-305-7446.

1. SQUAD TOPOLOGY & HANDOFF PATTERNS

1.1 Squad Architecture (9 Members)

File: /home/ubuntu/vitara-platform/vapi-gitops/resources/squads/vitaravox-v3.yml

Squad ID: 13fdfd19-a2cd-4ca4-8e14-ad2275095e32 (deployed to Vapi)

Entry Point:
├─ ROUTER (4f70e214) — Language detection + emergency handling

English Track (4 agents):
├─ Patient-ID-EN (7d054785) — Phone/name lookup + intent routing
├─ Booking-EN (ac25775b) — Find slots & create appointments
├─ Modification-EN (9cd8381d) — Reschedule/cancel/check
└─ Registration-EN (9fcfd00d) — New patient registration

Chinese Track (4 agents):
├─ Patient-ID-ZH (7585c092)
├─ Booking-ZH (6ef04a40)
├─ Modification-ZH (e348cd2f)
└─ Registration-ZH (ce50df43)

1.2 Handoff Pattern Design

All 8 non-Router assistants have assistantOverrides.tools:append with silent handoff tools defined in the squad YAML:

type: handoff
function:
  name: handoff_to_booking_en
  description: "Route to English booking when patient wants to book"
destinations:
  - assistantName: vitara-booking-en-v3
    description: "English appointment booking"
    type: assistant
messages:
  - content: ""           # CRITICAL: Empty content = invisible handoff
    type: request-start

Key Design Decision: Handoff destinations use assistantName (the name field from YAML frontmatter), NOT assistantId. The GitOps engine resolves the name to the actual UUID at push time. This allows decoupling prompt changes from UUID management.

1.3 Handoff Flow Example (Booking Path)

Router (greeting + language detect via get_clinic_info)
  └─ call handoff_to_patient_id_en [silent, empty message]
    └─ Patient-ID-EN (search_patient_by_phone + confirm identity)
      └─ call handoff_to_booking_en [silent]
        └─ Booking-EN (find_earliest_appointment + create_appointment)
          └─ Optional: handoff_to_modification_en [if reschedule request]
          └─ Optional: handoff_to_router_v3 [if unrelated request]

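No cross-track handoffs are implemented: once a call enters the EN or ZH track, the conversation stays in that language track. The Router is the only agent that serves callers of both languages (it uses AssemblyAI Universal STT, which, per Section 3.2, transcribes reliably only in English within Vapi).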

2. PROMPT ENGINEERING QUALITY

2.1 System Prompt Structure (Example: Router)

File: /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/router-v3.md

---
name: vitara-router-v3
model:
  model: gpt-4o
  provider: openai
  temperature: 0.3
  maxTokens: 400        # CRITICAL: Increased 150→400 to prevent truncation
  toolIds:
    - get-clinic-info-aaec50cf
    - transfer-call-d95ed81e
    - log-call-metadata-4619b3cb
transcriber:
  provider: assembly-ai
voice:
  provider: 11labs
  voiceId: fQj4gJSexpu8RDE2Ii5m
  model: eleven_multilingual_v2
---

## IDENTITY
## CRITICAL: Current date/time is {{now | date: "%B %d, %Y %I:%M %p", "America/Vancouver"}}...

You are a bilingual front-desk scheduling assistant.

## EMERGENCY [hardcoded escalation keywords in EN + ZH]

## INVISIBLE HANDOFFS
When routing the caller, make it sound like a natural conversation. Say "Sure!" then call handoff tool.

## FLOW
### Step 1: Call get_clinic_info (FIRST TURN — MANDATORY)
In your very first response, call get_clinic_info. This gives clinic name for greeting.

### Step 2: Greet with clinic name + route
Once get_clinic_info returns, say ONE warm line: "Welcome to [clinicName]!"
Then call appropriate handoff tool (handoff_to_patient_id_en or handoff_to_patient_id_zh)

Quality Observations:

Defensive tool-result instruction: "WAIT for actual tool result before speaking about X" — explicitly prevents LLM from hallucinating tool outcomes
Single-turn tool + speech: "Call tool in your first response" — ensures filler speech covers tool latency
Language detection logic: Keyword-based (NOT STT-based) — caller must say "Mandarin", "Chinese", "中文" to trigger ZH track
maxTokens tuning: 400 tokens for Router (was 150) to prevent GPT-4o from truncating its output on complex tool calls (maxTokens caps the generated response, not the prompt)
Timezone-aware templates: Uses Liquid {{now | date: format, timezone}} with America/Vancouver

2.2 Patient-ID-EN Prompt Strengths

File: /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/patient-id-en.md

### Step 1: Look up the patient and analyze intent (FIRST TURN)

IMMEDIATELY call `search_patient_by_phone` with phone "0000000000" — this must be in your 
very first response, no exceptions. The system uses the real caller number automatically.

Say "One moment while I look you up" alongside the tool call.

**Intent detection** — check what the caller said:
- "book", "appointment" → intent = BOOK
- "reschedule", "change my appointment" → intent = RESCHEDULE
- etc.

### Step 2: Confirm patient identity
CRITICAL: WAIT for the actual `search_patient_by_phone` result before speaking. 
Read the `found` field from the ACTUAL tool response.

Strengths:
- Explicit defensive pattern: "WAIT for actual tool response"
- Server-side phone number handling: "0000000000" is a placeholder; the server extracts the real phone number from Vapi metadata
- Multi-level confirmation: confirm identity on "yes", offer search by name on "no"
- Error-aware: falls back to manual search if phone lookup fails

2.3 Registration Prompt (EN/ZH): Name Collection

File: /home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/registration-en.md

Critical spelling rule:

1. **Full name** — "What is your full legal name?"
   - If unclear, say: "Could you spell it? A as in Apple, B as in Bravo..."
   - **IMPORTANT: While the caller is spelling, stay COMPLETELY SILENT. 
     Do NOT speak or acknowledge individual letters. Wait until the caller 
     clearly finishes or pauses for several seconds before responding.**
   - After receiving spelling, repeat FULL name back once and ask "Is that correct?"

This is a PHI-handling best practice: it reduces the risk of misspelled names on health records, and staying silent while the caller spells mirrors how a human receptionist would behave.

2.4 Documented P0 Fixes (2026-02-15)

From memory notes:

Router maxTokens 150→400 — Fixed GPT-4o silent truncation on tool-call JSON
Router prompt rewritten — Replaced rigid "Say EXACTLY" scripting with warm acknowledgment
Patient-ID EN/ZH steps merged — Consolidated first-turn tool call + intent analysis
Defensive tool-result instruction — Added across all prompts
transferAssistant → handoff_to_X — Fixed function names to match squad YAML
Circuit breaker 10s→4s — SOAP phone search timeout tuned for Vapi 5s window
All prompts clinic-agnostic — Removed "Vitara" branding (replaced with clinic_info tool result)
All 9 prompts pushed to Vapi API — Verified 9/9 success

3. MULTILINGUAL DESIGN (EN/ZH)

3.1 STT/TTS Strategy

Component	Router	EN Track	ZH Track
STT (Speech→Text)	AssemblyAI Universal (bilingual detection)	Deepgram nova-2 `en`	Deepgram nova-2 `zh`
LLM	GPT-4o	GPT-4o	GPT-4o
TTS (Text→Speech)	ElevenLabs eleven_multilingual_v2	ElevenLabs eleven_multilingual_v2	Azure zh-CN-XiaoxiaoNeural
Latency (startSpeakingPlan)	0.6s (aggressive)	0.6s	1.0s (Chinese slower)
Interruption tolerance	2 words	2 words	3 words (char-based)

3.2 Language Detection Logic

File: router-v3.md

**Language:** Default is ENGLISH. Route to CHINESE only if the caller says:
- "Mandarin", "Chinese", "speak Chinese", "speak Mandarin", "中文"
- If caller's words don't make sense in English (garbled), ask: 
  "Would you like English or Mandarin? 英文还是中文?"

Why keyword-based (not STT)? AssemblyAI in Vapi is English-only. Mandarin speech gets force-transcribed as gibberish ("Please speak, man. Darin." instead of "你好"). Router must detect language via caller explicitly requesting Mandarin.

3.3 Timezone Handling

Both the EN and ZH agents use a Liquid template with the clinic timezone:

# EN
{{now | date: "%B %d, %Y %I:%M %p", "America/Vancouver"}}
# Outputs: "February 17, 2026 09:30 AM"

# ZH
{{now | date: "%Y年%m月%d日 %H:%M", "America/Vancouver"}}
# Outputs: "2026年02月17日 09:30"

Hardcoded limitation: Clinic timezone is hardcoded as 'America/Vancouver' in admin-dashboard OscarSoapAdapter.ts. Multi-clinic setups will need clinic-aware timezone config.
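
One way to lift this limitation is to store an IANA timezone on each clinic record and derive the clinic-local date from it. The following is a minimal sketch under that assumption; `ClinicConfig` and `clinicLocalDate` are hypothetical names that do not appear in OscarSoapAdapter.ts.

```typescript
// Hypothetical clinic record; the real configuration shape is not shown in the source.
interface ClinicConfig {
  clinicId: string;
  timezone: string; // IANA zone name, e.g. "America/Vancouver"
}

// Returns the clinic-local calendar date (YYYY-MM-DD) for a given instant.
function clinicLocalDate(clinic: ClinicConfig, at: Date = new Date()): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: clinic.timezone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(at);
  const pick = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? "";
  return `${pick("year")}-${pick("month")}-${pick("day")}`;
}

// The same instant falls on different calendar days in different clinics.
const instant = new Date("2026-02-17T06:00:00Z");
console.log(clinicLocalDate({ clinicId: "bc", timezone: "America/Vancouver" }, instant)); // 2026-02-16
console.log(clinicLocalDate({ clinicId: "on", timezone: "America/Toronto" }, instant));   // 2026-02-17
```

Passing the clinic record into date helpers, rather than reading a module-level constant, is what would let a multi-clinic deployment share one adapter.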

3.4 Known ZH Prompt Issues

From documentation:
- GPT-4o space-separated characters: "我想预约" becomes "我 想 预 约" in some outputs (being monitored)
- Chinese name formatting: no romanization/pinyin in prompts; names are collected as-is
- Date format: ISO YYYY-MM-DD is used internally; prompts format dates as "2月17日"

4. TOOL DEFINITIONS & SERVER FUNCTION MAPPINGS

4.1 Tool Inventory (14 Tools)

File: /home/ubuntu/vitara-platform/vapi-gitops/resources/tools/*.yml

All tools point to https://api-dev.vitaravox.ca/api/vapi/* with credential ID 02698381-2c38-494d-858e-f8c679ab803a.

Tool	LLM Function Name	Request-Start Message	Timeout	Used By
search_patient_by_phone	`search_patient_by_phone(phone)`	"Let me pull up your file."	20s	Patient-ID EN/ZH
search_patient	`search_patient(name, firstName?)`	"" (silent)	20s	Patient-ID EN/ZH
get_clinic_info	`get_clinic_info()`	"" (silent)	-	Router, Patient-ID
get_providers	`get_providers(specialty?)`	"" (silent)	-	Booking, Modification
find_earliest_appointment	`find_earliest_appointment(startDate?, endDate?, timeOfDay?, providerId?, providerName?, excludeDates?)`	"Let me check what's available."	-	Booking, Modification
check_appointments	`check_appointments(startDate, endDate, demographicId?, providerId?, findAvailable?)`	"Let me look that up."	-	Booking, Modification
create_appointment	`create_appointment(demographicId, providerId, startTime, appointmentType, reason, language, isVirtual?)`	"" (silent)	-	Booking EN/ZH
update_appointment	`update_appointment(appointmentId, newStartTime, newProviderId?, demographicId?)`	"" (silent)	-	Modification EN/ZH
cancel_appointment	`cancel_appointment(appointmentId, reason?)`	"" (silent)	-	Modification EN/ZH
register_new_patient	`register_new_patient(firstName, lastName, dateOfBirth, gender, phone, address, city, postalCode, healthCardType, language, email?, province?, healthCardNumber?)`	"" (silent)	-	Registration EN/ZH
add_to_waitlist	`add_to_waitlist(firstName, lastName, phone, notes?)`	"" (silent)	-	Registration EN/ZH
log_call_metadata	`log_call_metadata(language, callOutcome, demographicId?, appointmentId?)`	"" (silent)	-	Booking, Modification, Registration
transfer_call	`transfer_call(reason, notes?)`	"" (silent)	-	Router, Patient-ID, all tracks
get_patient	`get_patient(demographicId)`	"" (silent)	-	(defined but unused in squad)

4.2 Critical Server-Side Logic

File: /home/ubuntu/vitara-platform/admin-dashboard/server/src/routes/vapi-webhook.ts

4.2.1 Caller Phone Auto-Extraction

// LLM sends "0000000000" as placeholder in search_patient_by_phone
// Server extracts REAL phone from Vapi metadata:
const callerPhone = call.customer.number; // E.164: "+12367770690"
// Then normalizes: strip +1, use 10-digit only: "2367770690"

Design rationale: The LLM does not have access to the caller's real number, so it passes a placeholder. The server intercepts every search_patient_by_phone call and substitutes the real number extracted from call.customer.number (Telnyx metadata).
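
The substitution and normalization steps can be sketched as follows. This is an illustration only: `CallInfo`, `resolveCallerPhone`, and `phoneForSearch` are hypothetical names, and the fallback behavior when no caller number is present is an assumption, not something the webhook source confirms.

```typescript
// Reduced view of the webhook payload; only the field used above is modeled.
interface CallInfo {
  customer?: { number?: string };
}

// Converts an E.164 North American number ("+12367770690") to the 10-digit
// form ("2367770690"). Returns null when no usable number is present.
function resolveCallerPhone(call: CallInfo): string | null {
  const raw = call.customer?.number;
  if (!raw) return null;
  const digits = raw.replace(/\D/g, ""); // drop "+", spaces, dashes
  if (digits.length === 11 && digits.startsWith("1")) return digits.slice(1);
  if (digits.length === 10) return digits;
  return null; // non-NANP or malformed number
}

// Prefers the real caller number; the LLM's placeholder is never searched.
// Assumption: a non-placeholder LLM value is used only as a last resort.
function phoneForSearch(llmPhone: string, call: CallInfo): string | null {
  const real = resolveCallerPhone(call);
  if (real !== null) return real;
  return llmPhone === "0000000000" ? null : llmPhone;
}
```

Returning null instead of searching for the placeholder lets the caller of this helper report a clean "not found" and fall back to name search.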

4.2.2 Past-Date Clamping

if (startDate && new Date(startDate) < today) {
  startDate = todayISOString; // Clamp to today
}

Why: GPT-4o sometimes hallucinates past dates. Server-side guard ensures no appointments in the past are booked.
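
A self-contained version of this guard might look like the sketch below. It assumes dates arrive as ISO strings and that `today` is supplied by the caller (ideally computed in the clinic's timezone); `clampStartDate` is a hypothetical name, and the fallback for unparseable input is an assumption.

```typescript
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Clamps a requested start date so it is never earlier than `today`.
// Both values are YYYY-MM-DD; ISO dates in this format sort lexicographically,
// so plain string comparison is sufficient.
function clampStartDate(
  startDate: string | undefined,
  today: string,
): string | undefined {
  if (startDate === undefined) return undefined; // no filter supplied
  const day = startDate.slice(0, 10); // tolerate full ISO timestamps
  if (!ISO_DATE.test(day)) return today; // assumption: bad input falls back to today
  return day < today ? today : startDate;
}

console.log(clampStartDate("2025-01-01", "2026-02-17")); // 2026-02-17
console.log(clampStartDate("2026-03-01", "2026-02-17")); // 2026-03-01
```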

4.2.3 Provider Name → ID Resolution

// LLM may send: providerName = "Dr. Chen"
// Server fuzzy-matches against clinic's provider list
// Only treats as specific provider if providerId is purely numeric: /^\d+$/ test
// Fallback: if providerName sent but no ID, search_patient_by_phone result has OSCAR provider IDs

4.2.4 Non-Numeric Provider Handling

// If LLM sends "any" or Mandarin "任何" for providerId, regex /^\d+$/ returns false
// Server treats as "search all providers" (undefined providerId)
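
Sections 4.2.3 and 4.2.4 together describe a small resolution routine, sketched below. The names `Provider`, `normalizeProviderId`, and `resolveProvider` are hypothetical, and the case-insensitive substring match is only a stand-in for the server's fuzzy matcher, which the source does not show.

```typescript
interface Provider {
  id: string;
  name: string;
}

// Returns a purely numeric provider ID, or undefined meaning "all providers".
function normalizeProviderId(providerId: string | undefined): string | undefined {
  if (providerId === undefined) return undefined;
  const trimmed = providerId.trim();
  return /^\d+$/.test(trimmed) ? trimmed : undefined;
}

// Resolution order: numeric ID first, then a name match, else all providers.
function resolveProvider(
  providerId: string | undefined,
  providerName: string | undefined,
  providers: Provider[],
): string | undefined {
  const id = normalizeProviderId(providerId);
  if (id !== undefined) return id;
  if (!providerName) return undefined;
  // Stand-in for fuzzy matching: strip a leading "Dr." and compare lowercase.
  const needle = providerName.toLowerCase().replace(/^dr\.?\s*/, "").trim();
  if (needle === "") return undefined;
  const match = providers.find((p) => p.name.toLowerCase().includes(needle));
  return match?.id;
}
```

With this shape, values such as "any" or "任何" fall through to `undefined` without any language-specific handling.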

4.2.5 Slot Collision Check

// Before create_appointment:
// 1. search existing appointments in the slot window
// 2. Check if [startTime, endTime] overlaps any existing appointment
// 3. Return error if collision detected
// 4. LLM reruns find_earliest_appointment to get next slot
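
Step 2 above is a standard interval-overlap test. A minimal sketch, assuming slots carry ISO timestamps as in the tool responses; `Slot`, `overlaps`, and `findCollision` are hypothetical names, and treating intervals as half-open is an assumption about how back-to-back appointments should behave.

```typescript
interface Slot {
  startTime: string; // ISO timestamp, e.g. "2026-02-20T14:00:00"
  endTime: string;
}

// Half-open intervals [start, end): a slot ending at 14:30 does not collide
// with one starting at 14:30.
function overlaps(a: Slot, b: Slot): boolean {
  return (
    Date.parse(a.startTime) < Date.parse(b.endTime) &&
    Date.parse(b.startTime) < Date.parse(a.endTime)
  );
}

// Returns the first existing appointment that collides with the request.
function findCollision(requested: Slot, existing: Slot[]): Slot | undefined {
  return existing.find((e) => overlaps(requested, e));
}
```

When `findCollision` returns a slot, the handler would return an error so the LLM can rerun find_earliest_appointment, as described in step 4.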

4.3 Tool Result Schema Examples

search_patient_by_phone response:

{
  "found": true,
  "id": 12345,
  "firstName": "John",
  "lastName": "Doe",
  "dateOfBirth": "1990-01-15",
  "phone": "2367770690"
}

find_earliest_appointment response:

{
  "slotId": "abc123",
  "date": "2026-02-20",
  "day": "Thursday",
  "startTime": "2026-02-20T14:00:00",
  "endTime": "2026-02-20T14:30:00",
  "providerId": "100",
  "providerName": "Dr. Chen",
  "clinicName": "Vitara"
}

5. ERROR RECOVERY & CONVERSATION FLOWS

5.1 Defensive Prompt Patterns

All 9 agents include these defensive sections:

EMERGENCY Detection (All Agents)

If the caller mentions ANY of these: "chest pain", "cannot breathe", "difficulty breathing", 
"heart attack", "stroke", "seizure", "unconscious", "severe bleeding", "choking", "emergency", 
"overdose", "suicidal" [+ ZH equivalents]:
Respond: "This sounds like a medical emergency. Please hang up and call 911 immediately."
End the call immediately. Do NOT continue.

Hardcoded keywords (not LLM-inferred). Triggers immediate escalation.

WRONG INTENT REDIRECT (Agent-Specific)

# Booking-EN
If patient says "reschedule", "cancel", or anything NOT about booking NEW appointment:
Say "Of course" and call `handoff_to_modification_en`

# Modification-EN  
If patient says "book a new appointment" (not reschedule):
Say "Of course" and call `handoff_to_booking_en`

Prevents wasted turns — immediately redirects off-topic requests.

3-Attempt Fallback (All Agents)

After 3 unclear attempts → "Let me connect you with our staff." 
Call transfer_call with reason "out_of_scope"

No infinite loop — ensures eventual escalation to human if LLM can't understand.

5.2 Booking Flow (Tested & Working)

Router: "Hi there, thanks for calling!"
Router: [get_clinic_info] → "Welcome to [clinic]! How can I help?"
User:   "I'd like to book an appointment."
Router: → handoff_to_patient_id_en

Patient-ID: "One moment while I look you up" + [search_patient_by_phone]
Patient-ID: "I have John Doe on file — is that you?"
User:       "Yes."
Patient-ID: "I'll get you set up." → handoff_to_booking_en

Booking:  "Let me find you an appointment" + [find_earliest_appointment]
Booking:  "I have Thursday, Feb 20 at 2:00 PM with Dr. Chen. Does that work?"
User:     "Yes." OR "No, I want March." [→ find_earliest_appointment with filters]
Booking:  "What is this visit for?"
User:     "General consultation."
Booking:  [create_appointment] → "All set! Thursday, Feb 20 at 2:00 PM..."
Booking:  [log_call_metadata callOutcome="booked"] → "Take care!"

5.3 Reschedule Flow (Tested, Fixed)

Patient-ID: → handoff_to_modification_en

Modification: "Let me pull up your appointments" + [check_appointments startDate=today, endDate=6mo]
Modification: Lists first 3 appointments
User:         "The second one"
Modification: "Would you like to reschedule or cancel?"
User:         "Reschedule"
Modification: "When would work better?"
User:         "Next week"
Modification: [find_earliest_appointment startDate=next-week] → "How about Thursday, Feb 27 at 10 AM?"
User:         "Yes"
Modification: [update_appointment appointmentId=X, newStartTime=2026-02-27T10:00:00]
Modification: "Done! Moved to Thursday, Feb 27 at 10 AM..."
Modification: [log_call_metadata callOutcome="rescheduled"]

5.4 Error Recovery Examples

search_patient_by_phone fails

Patient-ID: "I'm having trouble looking up your information. Could you tell me your name?"
            [switch to search_patient tool]

No available slots

Booking: "Nothing in that range. Would you like to try a different week or different doctor?"
         [find_earliest_appointment with adjusted filters]

Slot collision (just taken)

Booking: "That slot was just taken. Let me find the next available."
         [find_earliest_appointment with excludeDates: [previousSlotDate]]

6. LATENCY ARCHITECTURE

6.1 Tool Message Strategy (request-start)

Design Philosophy: Filler speech covers tool latency by speaking simultaneously.

messages:
  - type: request-start
    blocking: false              # Allow speech to start before tool completes
    content: "Let me check that for you."
  - type: request-response-delayed
    timingMilliseconds: 5000
    content: "Still looking that up."  # If tool takes >5s
  - type: request-failed
    content: "I'm sorry, I wasn't able to check that."

Current config: Most tools use an empty request-start (content: ""), relying on prompt-level instructions to generate filler. The few audible tools are:
- search_patient_by_phone: "Let me pull up your file."
- find_earliest_appointment: "Let me check what's available."
- check_appointments: "Let me look that up."

Recommended improvement: Add request-response-delayed with 4000ms to slow tools (find_earliest_appointment, create_appointment, register_new_patient) to prevent dead air if backend is slow.

6.2 startSpeakingPlan (Endpointing)

# Router (fast, high-latency STT)
startSpeakingPlan:
  waitSeconds: 0.6
  transcriptionEndpointingPlan:
    onPunctuationSeconds: 0.3
    onNoPunctuationSeconds: 0.8
    onNumberSeconds: 0.5

# EN agents (Deepgram nova-2 en)
waitSeconds: 0.6
onPunctuationSeconds: 0.3
onNoPunctuationSeconds: 0.8

# ZH agents (slower Chinese processing)
waitSeconds: 1.0
onPunctuationSeconds: 0.6
onNoPunctuationSeconds: 1.5
onNumberSeconds: 0.8

Rationale: Chinese takes longer to process (more ambiguous, character-based). Longer wait times prevent premature response generation.

6.3 stopSpeakingPlan (Interruption)

# EN agents
stopSpeakingPlan:
  numWords: 2    # Agent stops speaking after user says 2 words

# ZH agents
stopSpeakingPlan:
  numWords: 3    # Slightly more tolerant (3 words ≈ 1 sentence in Chinese)

6.4 Circuit Breaker Timeouts

File: admin-dashboard server SOAP adapter

const CIRCUIT_BREAKER_TIMEOUT = 4000; // 4 seconds
// Must complete within Vapi's 5-second tool timeout window
// Leaves 1s buffer for JSON serialization + network

This fixed server-side delays that were exceeding Vapi's default timeout and returning "tool execution failed" to the LLM.
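
The timeout half of this pattern can be sketched as a promise wrapper. This is not the adapter's actual implementation: `withTimeout` and `safeSearch` are hypothetical names, the error payload is an assumption, and a full circuit breaker would also track consecutive failures, which is omitted here.

```typescript
const CIRCUIT_BREAKER_TIMEOUT = 4000; // ms, mirrors the constant above

// Rejects if `work` has not settled within `ms`; otherwise passes the result
// (or the original rejection) through and clears the timer.
function withTimeout<T>(
  work: Promise<T>,
  ms: number = CIRCUIT_BREAKER_TIMEOUT,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Usage sketch: `searchByPhone` stands in for the real SOAP call (hypothetical).
// On timeout, a structured error is returned so the LLM gets an answer in time.
async function safeSearch(
  searchByPhone: () => Promise<unknown>,
): Promise<unknown> {
  try {
    return await withTimeout(searchByPhone());
  } catch {
    return { found: false, error: "lookup_unavailable" };
  }
}
```

Note that the wrapper only stops waiting; the underlying SOAP request keeps running unless the client supports cancellation.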

7. PHI EXPOSURE RISK ANALYSIS

7.1 PHI in Prompts

Minimal risk. Prompts are templates; no real PHI hardcoded. However:

Server extracts PHI from call metadata:

Vapi webhook → call.customer.number (real phone) → server substitutes in search_patient_by_phone

LLM can see patient name after lookup:

search_patient_by_phone result: {id, firstName, lastName, dateOfBirth, phone}
This is echoed back to caller: "I have John Doe on file — is that you?"
Full conversation (including PHI) is stored in Vapi call transcript

7.2 Logging & Redaction

File: vapi-webhook.ts, lines 193-198

const PHI_KEYS = new Set([
  'name', 'firstName', 'lastName', 'dateOfBirth', 'phone', 'email',
  'healthCardNumber', 'healthCardProvince', 'callerPhone', 'address',
  'city', 'postalCode', 'patient', 'transcript', 'summary',
]);

Debug mode vs. production:
- Debug mode (debugManager.isActive()): Full PHI logged with [PHI-DEBUG] prefix
- Production: PHI redacted from logs (keys listed but values obscured)
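
The "keys listed but values obscured" behavior could be implemented with a recursive walk over the payload, as in the sketch below. The key set is copied from the snippet above; `redactPhi` and the "[REDACTED]" marker are assumptions, since the actual redaction routine is not shown in the source.

```typescript
const PHI_KEYS = new Set([
  "name", "firstName", "lastName", "dateOfBirth", "phone", "email",
  "healthCardNumber", "healthCardProvince", "callerPhone", "address",
  "city", "postalCode", "patient", "transcript", "summary",
]);

// Returns a deep copy in which every value stored under a PHI key is replaced
// by a marker. Keys stay visible so log structure remains searchable.
function redactPhi(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => redactPhi(item));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = PHI_KEYS.has(key) ? "[REDACTED]" : redactPhi(inner);
    }
    return out;
  }
  return value; // primitives under non-PHI keys pass through unchanged
}

console.log(JSON.stringify(redactPhi({ id: 12345, firstName: "John" })));
// {"id":12345,"firstName":"[REDACTED]"}
```

Because the function copies rather than mutates, the original payload can still be passed to business logic after logging.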

7.3 Recording & Compliance

From routing defaults:

# Registration-EN opening
"Welcome! I'll help you register. This takes a few minutes. 
Just so you know, this call is recorded for quality and scheduling purposes. 
By continuing, you consent to the recording."

HIPAA considerations:
- Calls are recorded by Vapi (recordingUrl in end-of-call-report)
- Transcripts are stored in Vapi call history
- No explicit HIPAA encryption/audit trail configuration in assistants
- Recommendation: Consult legal counsel on Canadian healthcare compliance (PHIPA, PIPA)

8. CONVERSATION STATE MANAGEMENT ACROSS HANDOFFS

8.1 Context Passing Strategy

Default: Full history

# In squad YAML, handoff destinations don't specify contextMode
# Vapi default: contextMode: "all" — full message history passed to next agent

Example flow:

Router conversation:
  User: "Hi, I want to book"
  Router: [get_clinic_info] → "Welcome to Vitara!"
  Router: → calls handoff_to_patient_id_en

Patient-ID-EN receives:
  - All Router messages (including clinic info)
  - LLM can reference: "The clinic I just greeted them from was..."
  - Patient lookup happens fresh: [search_patient_by_phone]

8.2 Patient Context Retention

Booking agent accesses Patient-ID lookup result:

Patient-ID-EN returns: {id: 12345, firstName: "John", lastName: "Doe"}
This is in the message history. Booking-EN reads it:
  from conversation history: "Patient's demographicId is 12345"

Then Booking calls: create_appointment(demographicId=12345, ...)

No explicit state mechanism exists; context is implicit in the conversation history. This works because:
1. The LLM reads the full history (it fits within the model's context window)
2. Prompts instruct: "Patient is ALREADY identified. Their demographicId is the 'id' field from search_patient_by_phone result in conversation history."

8.3 Potential Issue: Long Conversations

From prompt engineering report:

"As conversations get long (e.g., patient asks many questions, tries multiple slots), 
the context grows and GPT-4o instruction following degrades."

Mitigation (not yet applied):

# Could add to handoff destinations:
contextMode: "lastNMessages"
lastNMessages: 20     # Only pass last 20 messages, drop early history

Current status: Using default (all history). No production incidents reported yet.

9. FALLBACK & ESCALATION PATHS

9.1 Escalation Triggers

Emergency keywords detected → End call + direct to 911
3 unclear attempts → transfer_call(reason: "out_of_scope")
Explicit API error → transfer_call(reason: "registration_error")
Record not found → transfer_call(reason: "record_not_found")
Medical questions (out of scope) → transfer_call(reason: "medical_question")
Patient explicitly requests human → transfer_call(reason: "patient_request")

9.2 transfer_call Tool

Endpoint: https://api-dev.vitaravox.ca/api/vapi/transfer-call

function:
  name: transfer_call
  description: Transfer call to clinic staff
  parameters:
    reason: [patient_request, frustrated, medical_question, billing, 
             registration_error, record_not_found, out_of_scope]
    notes: (optional) Context for staff

Server-side behavior (inferred):
1. Logs transfer reason + notes to database
2. Initiates SIP REFER or bridges to clinic phone number
3. Returns status to Vapi (call transferred or failed)

9.3 New Patient Registration Rejection

# If clinic not accepting new patients (get_clinic_info.acceptingNewPatients=false)

Registration-EN: "Sorry, we're not accepting new patients right now. 
                Would you like to join our waitlist?"

YES → add_to_waitlist(firstName, lastName, phone)
      log_call_metadata(callOutcome="waitlisted")
      "We'll call you when a spot opens up."

NO  → "Take care!"

10. CALL RECORDING & LOGGING PRACTICES

10.1 Vapi End-of-Call Report Webhook

interface VapiWebhookMessage {
  type: 'end-of-call-report';
  summary?: string;           // AI-generated summary
  transcript?: string;        // Full text transcript
  recordingUrl?: string;      // HTTPS URL to call recording
  durationSeconds?: number;
  cost?: number;              // USD cost
  endedReason?: string;       // why call ended
  metadata?: Record<string, unknown>;
}

Processing:

// vapi-webhook.ts: On end-of-call-report, server:
// 1. Saves transcript + summary to database
// 2. Stores recordingUrl (can be downloaded for archival)
// 3. Logs call duration + cost
// 4. Triggers any post-call analysis (if configured)

10.2 Call Metadata Logging

Agents call log_call_metadata at call conclusion:

{
  "language": "en",
  "callOutcome": "booked",
  "demographicId": 12345,
  "appointmentId": 67890
}

Outcomes tracked:
- booked, rescheduled, cancelled
- registered, waitlisted
- transferred, no_action, clinic_info
- out_of_scope, record_not_found, registration_error
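
Because the LLM supplies callOutcome as free text, the server may want to validate it against this list before writing to the database. A minimal sketch of such a check; `CALL_OUTCOMES`, `CallOutcome`, and `isCallOutcome` are hypothetical names, and whether the server actually validates this field is not stated in the source.

```typescript
// The outcome values listed above, as a readonly tuple.
const CALL_OUTCOMES = [
  "booked", "rescheduled", "cancelled",
  "registered", "waitlisted",
  "transferred", "no_action", "clinic_info",
  "out_of_scope", "record_not_found", "registration_error",
] as const;

// Union type derived from the tuple, so the list has a single source of truth.
type CallOutcome = (typeof CALL_OUTCOMES)[number];

// Type guard: narrows an arbitrary string to a known outcome.
function isCallOutcome(value: string): value is CallOutcome {
  return (CALL_OUTCOMES as readonly string[]).includes(value);
}

console.log(isCallOutcome("booked"));   // true
console.log(isCallOutcome("finished")); // false
```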

10.3 Debug Mode

File: vapi-webhook.ts

function logWebhook(action: string, data: unknown) {
  if (debugManager.isActive()) {
    logger.info({ webhook: data, _debugMode: true }, 
                `[PHI-DEBUG][VAPI WEBHOOK] ${action}`);
  } else {
    logger.info({ webhook: data }, `[VAPI WEBHOOK] ${action}`);
  }
}

Production: PHI redacted (keys visible, values obscured)
Debug mode: Full PHI logged (for internal testing only)

11. VAPI GITOPS INFRASTRUCTURE

11.1 GitOps Engine Architecture

File: /home/ubuntu/vitara-platform/vapi-gitops/

src/
├── pull.ts          # Download platform state, preserve local changes
├── push.ts          # Upload local YAML/MD to Vapi API
├── apply.ts         # Orchestrator: pull → merge → push
├── call.ts          # WebSocket call testing
├── types.ts         # TypeScript interfaces
├── config.ts        # Environment & config
├── api.ts           # Vapi HTTP client
├── state.ts         # State file (.vapi-state.*.json)
├── resources.ts     # Load YAML/MD files
├── resolver.ts      # Resolve resource IDs → Vapi UUIDs
└── delete.ts        # Deletion & orphan checks

resources/
├── assistants/      # 9 agents (.md files with YAML frontmatter)
├── tools/           # 14 function tools (.yml)
├── structuredOutputs/
├── squads/          # 1 squad (vitaravox-v3.yml)
└── simulations/     # (empty for v3.0)

11.2 Markdown + YAML Frontmatter Format

Example: resources/assistants/router-v3.md

---
name: vitara-router-v3
model:
  model: gpt-4o
  provider: openai
  temperature: 0.3
  maxTokens: 400
transcriber:
  provider: assembly-ai
voice:
  provider: 11labs
  voiceId: fQj4gJSexpu8RDE2Ii5m
---

# Markdown system prompt starts here
## IDENTITY
You are a bilingual front-desk assistant...

Parsing: The GitOps engine:
1. Extracts YAML frontmatter → Vapi assistant config
2. Converts markdown body → system prompt string (sent to LLM as-is)
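
The split between frontmatter and prompt body can be sketched with a single regular expression. This illustrates the file format only and is not the engine's code: `splitFrontmatter` is a hypothetical name, the YAML text is returned unparsed (a YAML library would handle that step), and trimming the body is an assumption.

```typescript
interface ParsedAssistantFile {
  frontmatter: string; // raw YAML between the two "---" lines
  prompt: string;      // markdown body used as the system prompt
}

// Splits "---\n<yaml>\n---\n<markdown>" into its two parts.
// Files without frontmatter are treated as prompt-only.
function splitFrontmatter(source: string): ParsedAssistantFile {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) return { frontmatter: "", prompt: source.trim() };
  return { frontmatter: match[1], prompt: match[2].trim() };
}

const file = [
  "---",
  "name: vitara-router-v3",
  "---",
  "",
  "## IDENTITY",
  "You are a bilingual front-desk assistant...",
].join("\n");

const parsed = splitFrontmatter(file);
console.log(parsed.frontmatter); // name: vitara-router-v3
console.log(parsed.prompt.split("\n")[0]); // ## IDENTITY
```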

11.3 Reference Resolution

Local filenames resolve to Vapi UUIDs:

# In assistant file:
toolIds:
  - search-patient-by-phone-8474536c  # filename without .yml

# Engine looks up in .vapi-state.dev.json:
{
  "tools": {
    "search-patient-by-phone-8474536c": "8474536c-663f-4a94-91ae-19e6221f9af9"
  }
}

# Sends to API as:
{
  "toolIds": ["8474536c-663f-4a94-91ae-19e6221f9af9"]
}
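
The lookup shown above amounts to a map from local resource IDs to UUIDs. A minimal sketch, assuming the state file has the three sections shown in Section 11.5; `StateFile` and `resolveToolIds` are hypothetical names, and failing fast on an unknown reference is an assumption about the engine's behavior.

```typescript
type StateSection = Record<string, string>; // local resource ID -> Vapi UUID

interface StateFile {
  assistants: StateSection;
  tools: StateSection;
  squads: StateSection;
}

// Replaces each local tool reference with its UUID from the state file.
// Throws on an unknown reference so a typo cannot reach the API.
function resolveToolIds(localIds: string[], state: StateFile): string[] {
  return localIds.map((id) => {
    const uuid: string | undefined = state.tools[id];
    if (uuid === undefined) throw new Error(`Unknown tool reference: ${id}`);
    return uuid;
  });
}

const state: StateFile = {
  assistants: {},
  tools: {
    "search-patient-by-phone-8474536c": "8474536c-663f-4a94-91ae-19e6221f9af9",
  },
  squads: {},
};

console.log(resolveToolIds(["search-patient-by-phone-8474536c"], state));
// [ '8474536c-663f-4a94-91ae-19e6221f9af9' ]
```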

11.4 Squad Handoff Resolution

# Squad YAML:
- assistantId: router-v3            # Resolves to 4f70e214...
  assistantOverrides:
    tools:append:
      - destinations:
          - assistantName: vitara-patient-id-en-v3    # Matches assistant name field

# Engine resolves:
# 1. assistantId → UUID (state file)
# 2. assistantName → UUID by looking up assistant by name
# 3. Validates handoff destination exists in squad members
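
The three resolution steps can be sketched as follows. The data shapes are simplified and hypothetical (`AssistantRecord`, `SquadMember`, and `resolveHandoffs` are not names from the engine), so treat this as an outline of the logic rather than the engine's implementation.

```typescript
interface AssistantRecord {
  resourceId: string; // local file ID, e.g. "router-v3"
  name: string;       // frontmatter name, e.g. "vitara-router-v3"
  uuid: string;       // Vapi UUID from the state file
}

interface SquadMember {
  assistantId: string;      // local resource ID
  handoffTargets: string[]; // assistantName values from tools:append
}

// Returns member UUID -> destination UUIDs; throws on any dangling reference.
function resolveHandoffs(
  members: SquadMember[],
  assistants: AssistantRecord[],
): Map<string, string[]> {
  const byId = new Map<string, AssistantRecord>();
  const byName = new Map<string, AssistantRecord>();
  for (const a of assistants) {
    byId.set(a.resourceId, a);
    byName.set(a.name, a);
  }

  // Step 1: assistantId -> UUID for every squad member.
  const memberUuids = new Map<string, string>();
  for (const m of members) {
    const rec = byId.get(m.assistantId);
    if (!rec) throw new Error(`Unknown assistantId: ${m.assistantId}`);
    memberUuids.set(m.assistantId, rec.uuid);
  }
  const uuidSet = new Set<string>(Array.from(memberUuids.values()));

  const resolved = new Map<string, string[]>();
  for (const m of members) {
    const targets = m.handoffTargets.map((name) => {
      const dest = byName.get(name); // Step 2: assistantName -> UUID
      if (!dest) throw new Error(`Unknown assistantName: ${name}`);
      // Step 3: the destination must itself be a squad member.
      if (!uuidSet.has(dest.uuid)) throw new Error(`${name} is not a squad member`);
      return dest.uuid;
    });
    resolved.set(memberUuids.get(m.assistantId) as string, targets);
  }
  return resolved;
}
```

Validating membership at push time surfaces a broken handoff as a failed push rather than as a caller stuck mid-conversation.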

11.5 State File

.vapi-state.dev.json (checked into git):

{
  "assistants": {
    "router-v3": "4f70e214-6111-4f53-86c9-48f8f7c265e1",
    "booking-en": "ac25775b-c1cc-41ae-8899-810d4ae62efd",
    ...
  },
  "tools": {
    "search-patient-by-phone-8474536c": "8474536c-663f-4a94-91ae-19e6221f9af9",
    ...
  },
  "squads": {
    "vitaravox-v3": "13fdfd19-a2cd-4ca4-8e14-ad2275095e32"
  }
}

Purpose: Maps friendly names to Vapi UUIDs (immutable after creation).

11.6 Commands

npm run pull:dev              # Download state from Vapi
npm run push:dev              # Upload local files to Vapi
npm run apply:dev             # pull → merge → push
npm run push:dev assistants   # Push only assistants
npm run push:dev resources/assistants/router-v3.md  # Push single file
npm run call:dev -- -a router-v3  # Test assistant via WebSocket
npm run build                 # Type-check

Dependency order (push): 1. Tools → 2. Structured Outputs → 3. Assistants → 4. Squads
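
This ordering exists because each resource type can reference the ones before it (assistants reference tools, squads reference assistants). A minimal sketch of sorting a mixed batch into that order; `ResourceType`, `LocalResource`, and `sortForPush` are hypothetical names, not the engine's.

```typescript
type ResourceType = "tools" | "structuredOutputs" | "assistants" | "squads";

// Push order from the text: referenced resources go first.
const PUSH_ORDER: ResourceType[] = [
  "tools",
  "structuredOutputs",
  "assistants",
  "squads",
];

interface LocalResource {
  type: ResourceType;
  id: string;
}

// Returns a new array sorted by dependency order; the input is not mutated.
function sortForPush(resources: LocalResource[]): LocalResource[] {
  return [...resources].sort(
    (a, b) => PUSH_ORDER.indexOf(a.type) - PUSH_ORDER.indexOf(b.type),
  );
}

const batch: LocalResource[] = [
  { type: "squads", id: "vitaravox-v3" },
  { type: "assistants", id: "router-v3" },
  { type: "tools", id: "search-patient-by-phone-8474536c" },
];

console.log(sortForPush(batch).map((r) => r.type).join(" -> "));
// tools -> assistants -> squads
```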

12. DOCUMENTED ISSUES & LESSONS LEARNED

12.1 v2.3.0 Issues (Fixed in v3.0)

transfer_call tool missing from squad — Fixed in v3.0 by adding to all prompts
Router LLM hallucinating phone numbers — Fixed: server extracts from call.customer.number
firstMessage causing silence — Fixed: Patient-ID agents removed static firstMessage
Handoffs meant to be silent were audible. Fixed: added content: "" to handoff messages

12.2 v3.0 P0 Fixes (Applied 2026-02-15)

Router maxTokens 150→400: prevented GPT-4o from truncating its tool-call JSON output
Patient-ID prompt rewrite — Removed rigid scripting
Defensive tool-result instruction — All prompts now wait for actual tool response
Circuit breaker 10s→4s — Tuned for Vapi's 5s timeout
All prompts clinic-agnostic — Replaced hardcoded clinic name with tool result

12.3 Known Limitations

Clinic timezone hardcoded — OscarSoapAdapter.ts line: const tz = 'America/Vancouver'
Solution: Make this clinic-configurable
Language detection keyword-based — Not STT-based
Why: AssemblyAI router is English-only
Limitation: Mandarin callers must explicitly request 中文
No mid-conversation language switching — Once routed to EN/ZH, stays there
Improvement: Implement LLM proxy with language detection (documented in V3-MULTILINGUAL-ARCHITECTURE.md, not yet implemented)
Chinese name formatting — No romanization/pinyin in prompts
Risk: Name mismatches if caller names have ambiguous spelling
No patient data pre-population — Patient must spell name if phone lookup fails
Improvement: Could use customer name from Vapi metadata if available

12.4 Lessons Learned (From Documentation)

Vapi PATCH timeout is ~15s default — Increase to 30-45s for large squad updates
Always use tools:append in squad handoffs — Don't replace existing tools
Silent transfers = "NEVER mention transferring" — Explicit prompt instruction required
Past dates must be clamped server-side — LLM not reliable on current date
First-turn tool call MUST happen — Explicit in prompt: "FIRST RESPONSE MUST CALL"
Tool request-start messages prevent dead air — Use blocking: false for latency coverage
Chinese requires longer startSpeakingPlan — 1.0s vs. EN's 0.6s
Clinic-agnostic prompts scale better — Use get_clinic_info tool, not hardcoded names

13. DEPLOYMENT & TESTING STATUS

13.1 Current Deployment

Live phone: +1 236-305-7446
Squad ID: 13fdfd19-a2cd-4ca4-8e14-ad2275095e32
Agents: 9/9 deployed
Tools: 14/14 deployed
Git status: All changes committed to GitOps repo

13.2 Testing Completeness

From V3-ARCHITECTURE-SNAPSHOT.md:

✅ English booking flow — TESTED, WORKING
✅ English reschedule flow — TESTED, WORKING after server fix
✅ Chinese booking flow — TESTED, WORKING (with TTS limitations on English names)
⏳ Chinese reschedule — Not explicitly mentioned as tested
⏳ Cross-track language switching — Not implemented
✅ Emergency keywords — Hardcoded, not requiring testing
✅ Registration flow — Mentioned in prompts, likely tested
✅ Transfer escalation — Tool exists, backend logic assumed working

13.3 Pre-Launch Checks (9 Total, All Complete)

From memory notes (2026-02-16):

✅ Router language detection logic
✅ Patient-ID phone search + name fallback
✅ Booking slot finding + creation
✅ Modification reschedule + cancel
✅ Registration data collection + validation
✅ Error recovery (3-attempt fallback)
✅ Emergency keyword detection
✅ Call metadata logging
✅ Schedule data flow (informational, non-blocking)

14. SPECIFIC FINDINGS WITH FILE REFERENCES

File-by-File Inventory

File	Lines	Key Content	Status
`/home/ubuntu/vitara-platform/vapi-gitops/resources/squads/vitaravox-v3.yml`	268	Squad topology, 9 members, handoff definitions	Production
`/home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/router-v3.md`	92	Router agent, bilingual detection, emergency handling	Production
`/home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/patient-id-en.md`	118	Patient ID EN, phone search, intent detection	Production
`/home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/patient-id-zh.md`	113	Patient ID ZH, parallel structure, Chinese grammar	Production
`/home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/booking-en.md`	115	Booking EN, find slots, create appointment	Production
`/home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/booking-zh.md`	112	Booking ZH, Chinese-specific date formatting	Production
`/home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/modification-en.md`	117	Modification EN, reschedule + cancel + check	Production
`/home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/modification-zh.md`	114	Modification ZH, same functionality in Chinese	Production
`/home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/registration-en.md`	119	Registration EN, PHI collection, spelling rules	Production
`/home/ubuntu/vitara-platform/vapi-gitops/resources/assistants/registration-zh.md`	116	Registration ZH, pinyin spelling guidance	Production
`/home/ubuntu/vitara-platform/vapi-gitops/resources/tools/*.yml`	14 files	All tool definitions, server endpoints, parameters	Production
`/home/ubuntu/vitara-platform/admin-dashboard/server/src/routes/vapi-webhook.ts`	600+	Webhook handler, PHI redaction, server-side logic	Production
`/home/ubuntu/vitara-platform/docs/V3-ARCHITECTURE-SNAPSHOT.md`	150	Deployment diagram, agent inventory, flow examples	Reference
`/home/ubuntu/vitara-platform/docs/V3-TOOL-INVENTORY.md`	200+	Tool specs, parameter schemas, server integration	Reference
`/home/ubuntu/vitara-platform/docs/VAPI-PROMPT-ENGINEERING-REPORT.md`	980	Best practices, GitOps patterns, recommendations	Reference
`/home/ubuntu/vitara-platform/vapi-gitops/.vapi-state.dev.json`	37	UUID mappings for all 9 agents, 14 tools, 1 squad	Deployment

CRITICAL RECOMMENDATIONS¶

High Priority¶

Multi-clinic timezone support — Parameterize clinic timezone in admin-dashboard OscarSoapAdapter
Language detection enhancement — Implement LLM proxy for mid-conversation language switching (documented but not deployed)
Add request-response-delayed messages — Cover slow API calls (find_earliest, create_appointment)
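HIPAA/PHIPA audit trail — Commission a legal review of audit-trail requirements for Canadian healthcare compliance (HIPAA is US law; confirm which provincial statute, such as PHIPA in Ontario, applies to each clinic)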
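The delayed-message recommendation above could be applied uniformly with a small transform over tool definitions. This is a sketch under assumptions: the message shape (type, content, timingMilliseconds) is assumed to follow the tool message schema, the 3000 ms default is an illustrative value, and ToolConfig and withDelayedMessage are hypothetical names. The tool names are used exactly as written in the recommendation.

```typescript
// Assumed shape of a tool message; verify field names against the tool schema.
interface ToolMessage {
  type: string;
  content: string;
  timingMilliseconds?: number;
}

// Hypothetical, simplified view of a tool definition.
interface ToolConfig {
  name: string;
  messages: ToolMessage[];
}

// Slow tools named in the recommendation.
const SLOW_TOOLS: string[] = ["find_earliest", "create_appointment"];

// Returns a copy of the tool with a delayed message appended.
// Tools that are not slow, or already have a delayed message, are returned unchanged.
function withDelayedMessage(
  tool: ToolConfig,
  content: string,
  timingMilliseconds: number = 3000 // illustrative default, not from the source
): ToolConfig {
  if (SLOW_TOOLS.indexOf(tool.name) === -1) {
    return tool;
  }
  const alreadyPresent = tool.messages.some(
    (m) => m.type === "request-response-delayed"
  );
  if (alreadyPresent) {
    return tool;
  }
  const delayed: ToolMessage = {
    type: "request-response-delayed",
    content,
    timingMilliseconds,
  };
  return { ...tool, messages: [...tool.messages, delayed] };
}
```

The function does not mutate its input, so it can be run over every tool definition before a push without affecting definitions that need no change.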

Medium Priority¶

Monitor Chinese TTS on English names — Watch for space-separated character issues post-launch
Implement conversation context limiting — Add lastNMessages: 20 to handoffs for very long calls
Extend phone cache TTL logic — Current 1-hour TTL may cause stale clinic resolution
Test edge cases — Out-of-province health cards, waitlist behavior, slot collision scenarios
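One way to address the stale-resolution concern in the phone cache item is to pair the TTL with explicit invalidation. The sketch below is a generic TTL cache, not the existing implementation; the TtlCache class, its method names, and the injectable clock are all hypothetical, and the phone number in the usage note is fictional.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

// Minimal TTL cache with explicit invalidation and an injectable clock for testing.
class TtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined if it is missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      return undefined;
    }
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Drops an entry immediately, e.g. when a clinic's phone mapping is edited.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

With a one-hour TTL (new TtlCache<string>(60 * 60 * 1000)), calling invalidate("+16045550100") whenever a clinic's number mapping changes would remove the stale entry immediately instead of waiting for expiry.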

Low Priority¶

Add Liquid conditionals — Support text/voice mode switching (preparatory for future chat)
Custom variables for multi-tenancy — Replace hardcoded clinic names with {{clinicName}}
Romanization for Chinese names — Support pinyin input if caller struggles with spelling
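The custom-variable item above amounts to substituting per-clinic values into a shared prompt. The following sketch illustrates the idea with a plain placeholder replacement; it is a stand-in, not the Liquid engine, and renderTemplate and the example clinic name are hypothetical.

```typescript
// Replaces {{name}} placeholders with values from `variables`.
// Unknown placeholders are left untouched so missing values are easy to spot.
function renderTemplate(
  template: string,
  variables: Record<string, string>
): string {
  return template.replace(
    /\{\{\s*([A-Za-z_]\w*)\s*\}\}/g,
    (match: string, name: string): string => {
      const value = variables[name];
      const isOwn = Object.prototype.hasOwnProperty.call(variables, name);
      return isOwn && value !== undefined ? value : match;
    }
  );
}
```

For example, renderTemplate("Thank you for calling {{clinicName}}.", { clinicName: "Example Clinic" }) yields "Thank you for calling Example Clinic.", while a placeholder with no matching variable stays in the output as written.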

CONCLUSION¶

VitaraVox v3.0 is a production-grade voice agent architecture with strong prompt engineering, defensive error recovery, and comprehensive server-side validation. The dual-track multilingual design isolates EN/ZH processing while sharing booking/EMR logic. Vapi GitOps provides version-controlled, auditable agent configuration, a best practice for voice systems.

Key strengths: Clinic-agnostic prompts, PHI redaction, 3-attempt fallback, emergency keyword detection, timezone awareness, state management across handoffs.

Key gaps: Hardcoded clinic timezone, keyword-only language detection, no mid-conversation language switching, minimal HIPAA/PHIPA audit trail.

The system is live and functional. The team has systematically applied fixes (maxTokens tuning, defensive tool-result instructions, and prompt rewrites), which is evidence of mature deployment practices. Recommended next steps focus on operational hardening (timezone, HIPAA/PHIPA compliance) and conversation quality (language detection, latency optimization).