Conversation UX¶
Natural conversation design, filler strategy, and bilingual scripts for the v3.0 voice agent
Last Updated: 2026-02-17 (v4.0.1)
Design Philosophy¶
The Vitara voice agent is designed to sound like a warm, efficient clinic receptionist -- not an IVR menu and not a chatbot. Three qualities define the target experience:
- Responsiveness to context, not just keywords. The agent parses intent from the full sentence, not trigger words. "I was supposed to have an appointment today but something came up" means reschedule, not book.
- Economy of speech. Humans do not narrate their actions. A receptionist does not say "I am now going to search for your patient record." She says nothing, or "One sec." The fewer words the agent uses, the more human it sounds.
- Emotional calibration. Softer language when the caller mentions pain ("Let me get you seen right away"), brisk language when the caller is efficient ("Got it -- Tuesday at 2, Dr. Chen. Done.").
Filler Phrase Strategy¶
v4.0.1: Tool-Level request-start¶
Current Strategy
As of v4.0.1, filler phrases for Booking and Modification flows are handled at the tool level via Vapi request-start messages. Registration agents still use prompt-level FILLER PHRASE RULES.
How it works: When a tool is invoked, Vapi immediately plays a pre-configured message (if one exists). The LLM may also speak a brief phrase alongside the tool call (as instructed by the prompt), but the request-start message provides consistent, immediate feedback.
Three tools have audible request-start messages (English only):
| Tool | request-start Content |
|---|---|
| search_patient_by_phone | "Let me pull up your file." |
| find_earliest_appointment | "Let me check what's available." |
| check_appointments | "Let me look that up." |
The remaining 11 tools have silent request-start (empty content string). This includes register_new_patient, create_appointment, update_appointment, cancel_appointment, get_clinic_info, get_providers, search_patient, log_call_metadata, transfer_call, add_to_waitlist, and get_patient.
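The split between audible and silent tools can be written down as a small lookup. This is a minimal sketch, not the deployed configuration: the `{"type": "request-start", "content": ...}` entry shape is an assumption about the Vapi tool payload, and `build_tool_messages` is a hypothetical helper name.

```python
# Tools whose request-start message is spoken aloud (English only).
AUDIBLE_REQUEST_START = {
    "search_patient_by_phone": "Let me pull up your file.",
    "find_earliest_appointment": "Let me check what's available.",
    "check_appointments": "Let me look that up.",
}

# Tools configured with an empty content string, i.e. silent.
SILENT_TOOLS = [
    "register_new_patient", "create_appointment", "update_appointment",
    "cancel_appointment", "get_clinic_info", "get_providers",
    "search_patient", "log_call_metadata", "transfer_call",
    "add_to_waitlist", "get_patient",
]


def build_tool_messages(tool_name: str) -> list[dict]:
    """Return the request-start message list for one tool (hypothetical helper).

    The entry shape is an assumed payload format, not a confirmed schema.
    """
    if tool_name in AUDIBLE_REQUEST_START:
        content = AUDIBLE_REQUEST_START[tool_name]
    elif tool_name in SILENT_TOOLS:
        content = ""  # silent request-start
    else:
        raise KeyError(f"unknown tool: {tool_name}")
    return [{"type": "request-start", "content": content}]
```

Keeping the silent tools in an explicit list, rather than defaulting every unknown name to silence, means a newly added tool raises an error until someone decides whether it should speak.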
No Language-Specific request-start
Vapi tool request-start messages are a single string, not language-conditional. Callers on the ZH track therefore hear the same English request-start message (or silence) as callers on the EN track. For the ZH track, the LLM's own spoken phrase (instructed by the ZH prompt) provides the Mandarin filler. For example, Patient-ID-ZH says "我帮您查一下" alongside the search_patient_by_phone call.
Prompt-Level Filler (Registration Only)¶
Registration EN/ZH agents retain a FILLER PHRASE RULES section in their prompts:
- Before a tool call, say ONE brief natural phrase.
- Use varied phrases: "Let me check.", "Looking into that.", "Processing now." (EN) / "我查一下。", "帮您看看。", "正在处理。" (ZH)
- NEVER repeat the same waiting phrase more than once in the entire call.
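In the Registration agents these rules are enforced by the prompt, not by code. The sketch below only illustrates the intended behavior (varied phrases, none used twice in a call). The `FillerPicker` class is hypothetical, and returning `None` once the pool is exhausted is an assumption about what "never repeat" should mean in practice.

```python
import random

EN_FILLERS = ["Let me check.", "Looking into that.", "Processing now."]
ZH_FILLERS = ["我查一下。", "帮您看看。", "正在处理。"]


class FillerPicker:
    """Hands out each waiting phrase at most once per call (illustrative only)."""

    def __init__(self, phrases, seed=None):
        # Copy and shuffle so the order varies from call to call.
        self._remaining = list(phrases)
        random.Random(seed).shuffle(self._remaining)

    def next_phrase(self):
        # Once every phrase has been used, return None: stay silent
        # rather than repeat a phrase the caller has already heard.
        return self._remaining.pop() if self._remaining else None
```

One picker would be created per call, for example `FillerPicker(ZH_FILLERS)` on the ZH track.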
Why Not LLM-Generated Filler for Booking/Modification¶
The previous approach (v3.0 launch) had FILLER PHRASE RULES sections in Booking and Modification prompts. This had several problems:
- Latency: The LLM had to generate filler text, then call the tool. Two round-trips instead of one.
- Repetition: The LLM ran out of natural filler phrases after 2-3 tool calls.
- Inconsistency: Different runs produced different filler text.
Tool-level request-start eliminates all three problems for flows with predictable tool-call patterns.
Silent Handoff Design¶
Handoffs between agents are invisible to the caller. No agent ever says "transferring", "connecting you", "assistant", or any system terminology. From the caller's perspective, there is a brief pause (as if the receptionist is pulling up a different screen) and then the conversation continues.
How Handoffs Feel¶
| Transition | What the Caller Hears | What Happens |
|---|---|---|
| Router to Patient-ID | "Sure!" or "Of course, happy to help!" [1-2s pause] | handoff_to_patient_id_en/zh |
| Patient-ID to Booking | "I'll get you set up." [1s pause] | handoff_to_booking_en/zh |
| Patient-ID to Modification (reschedule/cancel) | "I can help with that." [1s pause] | handoff_to_modification_en/zh |
| Patient-ID to Modification (check) | "Let me pull that up." [1s pause] | handoff_to_modification_en/zh |
| Patient-ID to Registration | [silence] | handoff_to_registration_en/zh |
| Booking to Modification | "Of course" | handoff_to_modification_en/zh |
| Modification to Booking | "Of course" | handoff_to_booking_en/zh |
Rules¶
- Never announce the handoff. Do not say "Let me transfer you" or "I'll connect you with our booking team."
- Use transition phrases, not transfer phrases. "I'll get you set up" implies action, not routing.
- The receiving agent does NOT re-greet. It picks up the conversation thread where the previous agent left off.
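The rules above lend themselves to a simple lint over prompt text or call transcripts. The following is a sketch under stated assumptions: the term list is taken from the phrases this section forbids, and `find_handoff_leaks` is a hypothetical helper, not part of the existing tooling.

```python
import re

# System terminology that must never reach the caller during a handoff.
BANNED_TERMS = [
    "transferring",
    "transfer you",
    "connecting you",
    "connect you",
    "assistant",
]


def find_handoff_leaks(utterance: str) -> list[str]:
    """Return the banned terms that appear in an agent utterance."""
    lowered = utterance.lower()
    return [
        term
        for term in BANNED_TERMS
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    ]
```

Running this over the transition phrases in the table (for example "I'll get you set up.") should return an empty list, while "Let me transfer you" is flagged.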
First-Turn Tool Call Pattern¶
A key design pattern in v4.0.1: agents that need to call a tool on their first turn speak a brief phrase alongside the tool call, while the tool-level request-start message provides additional audible feedback.
How It Works¶
The agent's first LLM turn includes BOTH a brief spoken phrase AND a tool call in the same response:
- LLM generates: "One moment while I look you up" + tool call to search_patient_by_phone
- Vapi speaks the phrase while simultaneously executing the tool call
- Tool-level request-start may also play (e.g., "Let me pull up your file.")
- Tool result comes back
- Agent speaks the result
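The ordering of these steps can be simulated with two concurrent tasks. This is a toy model, not Vapi's implementation: `speak` and `run_tool` are hypothetical stand-ins for TTS playback and the server-side tool, the delays are arbitrary, and the optional request-start playback is left out for brevity.

```python
import asyncio


async def speak(text: str, log: list) -> None:
    """Stand-in for TTS playback; records what the caller would hear."""
    log.append(f"say: {text}")
    await asyncio.sleep(0.01)  # pretend the audio takes time to play


async def run_tool(name: str, log: list) -> dict:
    """Stand-in for the tool call; a real call would reach the server."""
    log.append(f"tool: {name}")
    await asyncio.sleep(0.02)  # pretend the lookup takes time
    return {"found": True, "name": "Sarah Mitchell"}


async def first_turn(log: list) -> str:
    # The phrase and the tool call start together, so the caller hears
    # something while the lookup is still in flight.
    _, result = await asyncio.gather(
        speak("One moment while I look you up", log),
        run_tool("search_patient_by_phone", log),
    )
    reply = f"I have {result['name']} on file -- is that you?"
    await speak(reply, log)
    return reply
```

Calling `asyncio.run(first_turn([]))` walks through the turn once. The point of the pattern is that the spoken phrase costs no extra wall-clock time because it overlaps with the tool call.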
Agents Using This Pattern¶
| Agent | Spoken Phrase (EN) | Spoken Phrase (ZH) | First Tool Call | request-start Message |
|---|---|---|---|---|
| Patient-ID | "One moment while I look you up" | "我帮您查一下" | search_patient_by_phone | "Let me pull up your file." |
| Booking | "Let me find you an appointment" | "我帮您查一下可以的时间" | find_earliest_appointment | "Let me check what's available." |
| Modification | "Let me pull up your appointments" | "我帮您查一下您的预约" | check_appointments | "Let me look that up." |
Patient-ID firstMessage Removal
Patient-ID agents had a firstMessage configured in the squad YAML (e.g., "Let me look up your information"). This was removed in the P1 fix because it caused a 16-second silence: the firstMessage played, then the LLM waited for user input before calling the tool. Without firstMessage, the LLM calls the tool immediately on activation.
Conversation Scripts¶
Script 1: Returning Patient, English, Quick Booking¶
Duration: 45-60 seconds | 7 turns
[Vapi plays firstMessage from squad YAML]
SYSTEM: "Hi there, thanks for calling!"
CALLER: "Hi, I need to book an appointment."
[Router calls get_clinic_info]
ROUTER: "Welcome to Maple Medical! Sure, I can help with that."
-> [SILENT HANDOFF to Patient-ID-EN]
[Patient-ID-EN: "One moment while I look you up"]
[Calls search_patient_by_phone(phone: "0000000000")]
[request-start plays: "Let me pull up your file."]
[Server substitutes real phone, found: Sarah Mitchell, id=4521]
PATIENT-ID-EN: "I have Sarah Mitchell on file -- is that you?"
CALLER: "Yes, that's me."
PATIENT-ID-EN: "I'll get you set up."
-> [SILENT HANDOFF to Booking-EN]
[Booking-EN: "Let me find you an appointment"]
[Calls find_earliest_appointment, no filters]
[request-start plays: "Let me check what's available."]
[Result: Tuesday Feb 17, 10:00 AM, Dr. Patel]
BOOKING-EN: "I have Tuesday the 17th at 10 AM with Dr. Patel.
Does that work?"
CALLER: "Do you have anything in the afternoon?"
[Booking-EN calls find_earliest_appointment(timeOfDay="afternoon")]
[Result: Tuesday Feb 17, 2:30 PM, Dr. Patel]
BOOKING-EN: "How about Tuesday at 2:30, still with Dr. Patel?"
CALLER: "Perfect."
BOOKING-EN: "What's the visit for?"
CALLER: "Just a checkup."
[Booking-EN calls create_appointment(
demographicId=4521, providerId=100, startTime="2026-02-17T14:30:00",
appointmentType="B", reason="checkup", language="en")]
BOOKING-EN: "All set -- Tuesday the 17th at 2:30 with Dr. Patel.
You'll get a text confirmation. Arrive 10 minutes early with
your health card. Anything else?"
CALLER: "No, that's it. Thanks!"
[Booking-EN calls log_call_metadata(
callOutcome="booked", language="en", demographicId=4521)]
BOOKING-EN: "Take care, Sarah."
Script 2: New Patient, English, Registration¶
Duration: 2-3 minutes | ~16 turns
[firstMessage plays, Router detects English, handoff to Patient-ID-EN]
[Patient-ID-EN: "One moment while I look you up"]
[Calls search_patient_by_phone]
[Not found]
PATIENT-ID-EN: "I'm not finding a file under this phone number.
Are you a new patient?"
CALLER: "Yes, first time calling."
-> [SILENT HANDOFF to Registration-EN]
REGISTRATION-EN: "Welcome! I'll help you register -- it just takes
a few minutes. Just so you know, this call is recorded for quality
and scheduling purposes. By continuing, you consent to the recording."
REGISTRATION-EN: "What is your full legal name? First name and last name."
CALLER: "Priya Ramanathan."
REGISTRATION-EN: "Priya Ramanathan -- did I get that right?"
CALLER: "Yes."
REGISTRATION-EN: "What is your gender? Male, Female, or Other?"
CALLER: "Female."
REGISTRATION-EN: "What is your date of birth?"
CALLER: "March 8th, 1992."
REGISTRATION-EN: "Best phone number to reach you?"
CALLER: "604-555-1234."
REGISTRATION-EN: "What is your address, including city and postal code?"
CALLER: "1455 West Broadway, Vancouver, V6H 1H6."
REGISTRATION-EN: "Do you have a BC Services Card or Personal Health Number?"
CALLER: "Yes, my PHN is 9876543210."
REGISTRATION-EN: "And finally, what is your email address?
Or we can skip this."
CALLER: "priya.r@gmail.com."
REGISTRATION-EN: "Let me confirm: Priya Ramanathan,
born March 8, 1992, phone 604-555-1234,
1455 West Broadway, Vancouver. Everything correct?"
CALLER: "Yes, perfect."
[Registration-EN calls register_new_patient(...)]
REGISTRATION-EN: "Welcome, Priya! You're registered.
Would you like to book your first appointment?"
CALLER: "Yes, please."
-> [SILENT HANDOFF to Booking-EN]
[Booking flow continues as in Script 1]
Script 3: Returning Patient, Mandarin, Reschedule¶
Duration: 45-60 seconds | 7 turns
[firstMessage plays: "Hi there, thanks for calling!"]
CALLER: "你好,我想改一下我的预约。"
[Router calls get_clinic_info]
ROUTER: "好的!"
-> [SILENT HANDOFF to Patient-ID-ZH]
[Patient-ID-ZH: "我帮您查一下"]
[Calls search_patient_by_phone]
[Found: 王建华, id=3287]
PATIENT-ID-ZH: "您好!我查到了王建华的信息,请问是您本人吗?"
CALLER: "是的。"
PATIENT-ID-ZH: "好的。"
-> [SILENT HANDOFF to Modification-ZH]
[Modification-ZH: "我帮您查一下您的预约"]
[Calls check_appointments(demographicId=3287)]
[request-start plays: "Let me look that up."]
[Result: Feb 20, 9:00 AM, Dr. Patel]
MODIFICATION-ZH: "您下一次预约是2月20号星期五上午9点,
Patel医生。您想改到什么时候?"
CALLER: "下个星期二下午有吗?"
[Modification-ZH calls find_earliest_appointment(
startDate="2026-02-24", timeOfDay="afternoon")]
[Result: Feb 24, 2:00 PM, Dr. Patel]
MODIFICATION-ZH: "2月24号星期二下午两点,还是Patel医生,
可以吗?"
CALLER: "可以,就这个吧。"
[Modification-ZH calls update_appointment(...)]
MODIFICATION-ZH: "好的,已经帮您改到2月24号下午两点了。
我们会发短信确认。还有其他需要吗?"
CALLER: "没有了,谢谢。"
[Modification-ZH calls log_call_metadata(
callOutcome="rescheduled", language="zh")]
MODIFICATION-ZH: "祝您健康,再见!"
Script 4: Caller on Behalf of Someone Else¶
Duration: ~60 seconds for identification | 5 turns
[firstMessage plays, Router detects English, handoff to Patient-ID-EN]
[Patient-ID-EN calls search_patient_by_phone]
[Found: Lisa Wong, id=2195]
PATIENT-ID-EN: "I have Lisa Wong on file -- is that you?"
CALLER: "Oh no, I'm calling for my husband.
His name is David Wong."
PATIENT-ID-EN: "Of course! What is David's date of birth?"
CALLER: "June 12th, 1985."
[Patient-ID-EN calls search_patient(
name="David Wong", dateOfBirth="1985-06-12")]
[Found: David Wong, id=2196]
PATIENT-ID-EN: "Got it -- David Wong, born June 12, '85.
What does he need?"
CALLER: "He needs to book a follow-up with Dr. Patel."
PATIENT-ID-EN: "I'll get that set up."
-> [SILENT HANDOFF to Booking-EN with David's context]
Script 5: Emergency Detection¶
Duration: 5 seconds | 1 turn
[firstMessage plays: "Hi there, thanks for calling!"]
CALLER: "Hi, my dad is having chest pain and he can't breathe --
do I bring him in or what should I do?"
ROUTER: "This sounds like a medical emergency. Please hang up
and call 911 immediately."
[Call ends.]
Emergency override supersedes ALL flows. The agent does not ask clarifying questions, does not look up the patient, does not mention the clinic. One message. End.
Mandarin equivalent:
Anti-Patterns to Avoid¶
| Anti-Pattern | Example | Better Alternative |
|---|---|---|
| Restating the obvious | "I understand you want to book an appointment." | Skip to action: "Let me check what's available." |
| System narration | "I'm now going to search our database." | Say nothing, or "One sec." |
| Excessive hedging | "I'd be happy to help you with that!" | "Sure." / "Of course." |
| Over-confirmation | "Tuesday, February 17th, 2026, at 10:00 AM Pacific with Doctor Rajesh Patel, MD." | "Tuesday the 17th at 10 with Dr. Patel." |
| Robotic date formatting | "February 17th, 2026" | "Tuesday the 17th" or "next Tuesday" |
| Action narration | "I am booking that appointment for you now." | [silence] then "All set -- Tuesday at 10." |
| Thank-you loops | "Thank you. And your phone? Thank you. And your address? Thank you." | "Got it." / just move to next question |
| "Unfortunately" | "Unfortunately, no appointments are available." | "Nothing in that range -- want to try next week?" |
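The date-related rows in this table come down to a formatting choice: weekday plus ordinal day, no year, and no ":00" on the hour. A minimal sketch of that formatting follows, assuming slots arrive as naive local `datetime` values; `spoken_slot` and `ordinal` are hypothetical helper names, and in the deployed agents this phrasing is produced by the LLM under prompt guidance rather than by code.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def ordinal(day: int) -> str:
    """Return the day with its English ordinal suffix (1st, 2nd, 3rd, 4th)."""
    if 11 <= day % 100 <= 13:
        suffix = "th"  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def spoken_slot(start: datetime) -> str:
    """Format an appointment slot the way a receptionist would say it."""
    weekday = WEEKDAYS[start.weekday()]  # Monday is index 0
    hour = start.hour % 12 or 12         # 12-hour clock, no AM/PM
    time = str(hour) if start.minute == 0 else f"{hour}:{start.minute:02d}"
    return f"{weekday} the {ordinal(start.day)} at {time}"
```

Dropping AM/PM is a deliberate simplification here; a clinic with early-morning and evening slots on the same day would need to add it back when the time is ambiguous.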
Bilingual Considerations¶
Code-Switching¶
Code-switching (mixing EN and ZH in a single utterance) is the norm for bilingual Chinese-Canadians:
- "我想cancel我的appointment" (I want to cancel my appointment)
- "Dr. Chen有空吗?" (Is Dr. Chen available?)
- "我的health card号码是..." (My health card number is...)
The ZH track handles this at the STT level (Deepgram Nova-2 zh transcribes English loan words as-is) and GPT-4o understands mixed-language input natively. The output language remains consistent -- the ZH agent always responds in Chinese:
- Caller: "我想cancel" --> Agent: "好的,帮您取消" (NOT "好的,I'll cancel that")
- Caller: "Doctor Chen" --> Agent: "陈医生" (NOT "Doctor Chen")
Cultural Differences¶
| Aspect | EN Track | ZH Track |
|---|---|---|
| Formality | First-name basis acceptable ("Hi Sarah!") | Formal address preferred ("您" not "你") |
| Questions | Direct ("What's your date of birth?") | Softened with "请问" ("请问您的出生日期?") |
| Refusal | Direct "No" acceptable | Indirect: "这个时间不太方便" (this time isn't very convenient) |
| Sign-off | "Take care!" / "See you Tuesday!" | "祝您健康!" (wishing you health) |
| Privacy framing | Standard | Explain why: "我们需要您的出生日期来找到您的档案" |
ZH Prompt Tone¶
The ZH prompts should feel native to Chinese conversation, not like translated English. For example:
| Literal Translation | Natural Chinese |
|---|---|
| "您好!我来帮您注册。整个过程大概需要几分钟。温馨提示,本次通话将被录音..." | "您好!帮您登记一下信息,几分钟就好。这个电话有录音的,您方便吗?" |
The natural version is half the length, more colloquial, and uses "方便吗" (is that convenient?) as a softer consent mechanism.
Known UX Issues¶
GPT-4o Space-Separated Chinese
GPT-4o occasionally outputs space-separated Chinese characters (e.g., "您 好" instead of "您好"). This affects TTS quality in the ZH track. Monitor frequency; may need post-launch LLM evaluation.
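One possible mitigation, sketched here as an assumption and not as something the current pipeline does, is to collapse whitespace between adjacent Chinese characters before the text reaches TTS. The character range covers only the basic CJK Unified Ideographs block, and `collapse_cjk_spaces` is a hypothetical helper.

```python
import re

# Whitespace that sits between two CJK ideographs (U+4E00 to U+9FFF).
_CJK_GAP = re.compile(r"(?<=[\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])")


def collapse_cjk_spaces(text: str) -> str:
    """Remove spaces between Chinese characters, leaving other spacing alone."""
    return _CJK_GAP.sub("", text)
```

Because the pattern requires an ideograph on both sides, spaces inside English phrases and between an English word and a Chinese character are left unchanged.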
Cold-Start WSDL Delays
The first SOAP call after PM2 restart requires a WSDL fetch that can take 2-4 seconds. If the first caller of the day triggers search_patient_by_phone during cold start, they hear a longer-than-normal pause. Mitigation: warm SOAP clients on server startup.
Missing CONVERSATION STYLE Sections
The P0/P1 prompt rewrites removed CONVERSATION STYLE sections from several agents. These sections defined tone, vocabulary, and sign-off behavior. They need to be restored to maintain consistency.
English request-start on ZH Track
Tool-level request-start messages are English-only (e.g., "Let me pull up your file."). ZH track callers may hear an English phrase before the ZH agent speaks. The LLM's own Mandarin phrase (e.g., "我帮您查一下") mitigates this, but the English request-start may still be audible.
Mid-Call Language Switching
In v3.0, language is locked at the Router level. If a caller starts in English and switches to Chinese mid-call, the EN track agent continues in English. This is a deliberate design choice (per-track STT/TTS cannot switch mid-conversation), but may be confusing for some callers. A future enhancement could detect sustained language switching and route back to the Router.
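The future enhancement mentioned above could start from a heuristic like the one below. Everything in it is an assumption for illustration: the token-based language guess, the 0.5 threshold, the default of two consecutive turns as "sustained", and the helper names. Counting each English word as one token (not one per letter) keeps code-switched utterances such as "我想cancel我的appointment" classified as Chinese.

```python
import re

# One token per CJK ideograph, one token per run of Latin letters.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+")
_CJK = re.compile(r"[\u4e00-\u9fff]")


def dominant_language(text: str):
    """Guess "zh" or "en" for one caller turn; None if there is nothing to judge."""
    tokens = _TOKEN.findall(text)
    if not tokens:
        return None
    cjk = sum(1 for token in tokens if _CJK.fullmatch(token))
    return "zh" if cjk / len(tokens) >= 0.5 else "en"


def should_reroute(track: str, caller_turns: list[str], sustained: int = 2) -> bool:
    """True when the last `sustained` caller turns are all in the other language."""
    recent = caller_turns[-sustained:]
    if len(recent) < sustained:
        return False
    return all(dominant_language(turn) not in (track, None) for turn in recent)
```

Requiring several consecutive turns avoids rerouting on a single borrowed word or a one-off aside to someone else in the room.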
Voice Agent Design Principles¶
- The caller is always right about what they need. If they say "book," book. If they change their mind, pivot. Never force a flow.
- Silence is not failure. A 1-2 second pause while processing is natural. Filling every pause with words is unnatural.
- Economy of words is the highest virtue. "Tuesday at 10?" is better than "I have an available appointment on Tuesday at 10 AM."
- Thread, do not restart. Use information from earlier in the conversation. Say "still with Dr. Patel" instead of re-introducing the doctor.
- Action demonstrates understanding. Do not say "I understand." Just do the thing the caller asked for.
- Personalize the goodbye. Reference the appointment, use the caller's name, or wish them well in a context-specific way.
- Healthcare calls are emotional. Even routine booking calls carry the weight of health anxiety. Be warm but efficient. Never dismissive, never saccharine.
- Bilingual is not just translation. Each language track should feel native to its culture, not like a translated script.
Related Documentation¶
- Squad Architecture -- Agent handoff matrix and call flow
- Agent Behaviors -- Per-agent behavior details
- Tool Inventory -- Tool reference with request-start config