Voice Agent Best Practices¶

VitaraVox Enterprise Readiness Analysis¶

Date: February 17, 2026¶

Agent: Voice Agent UX & Best Practices Researcher¶

Enterprise Voice AI Agents in Healthcare: 2025-2026 Competitive Landscape and Best Practices¶

1. Vapi.ai -- Latest Features, Enterprise Tier, HIPAA/Compliance, Known Limitations¶

Features and Architecture¶

Vapi remains developer-first, offering a real-time voice AI orchestration layer that chains STT, LLM, and TTS providers via API. The standout feature for complex healthcare workflows is Squads -- multi-agent orchestration that allows specialized assistants to hand off conversations while maintaining context. Companies like Fleetworks run 240,000 calls/day through Squads. Vapi also launched Vapi Evals in 2025, a testing framework supporting exact match, regex, and AI-judge validation methods for agent behavior.

Vapi GitOps provides config-as-code tooling with slug-based tool references and environment separation (dev/staging/prod), which is directly relevant to your current v3.0 setup.

HIPAA and Compliance¶

HIPAA can be enabled at the organization level or assistant level (compliancePlan.hipaaEnabled=true)
When HIPAA mode is on, Vapi does NOT store structured outputs -- this limits Insights and Call Logs functionality
PHI may only pass through the /call endpoint; all other API endpoints must not contain PHI
BAA available as a $1,000/month add-on on pay-as-you-go plans; included in the Enterprise tier
SOC 2 certified
Missing: ISO 27001, RBAC, default access logging

Known Limitations for Healthcare¶

Issue	Impact on Healthcare
Latency stacking -- 4-5 API hops per turn	Adds 30-40s of cumulative dead air per 5-min call
Memory/context loss mid-call	Patients asked to repeat name/DOB -- unacceptable in clinical contexts
Breaking changes on platform updates	Production agents can break without warning
No native multi-campaign management	Difficult to manage multiple clinic deployments
Limited reporting for business stakeholders	Insufficient for healthcare QA requirements
Support quality frequently criticized	Multiple users report poor documentation and support
Real cost vs. advertised	Advertised $0.05/min; real cost $0.25-$0.33/min after LLM, STT, TTS, telephony
No omnichannel	Phone-only; no native SMS/chat/web follow-up
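
To make the cost and dead-air rows concrete, the sketch below works through the arithmetic using only the figures quoted in the table ($0.05/min advertised, $0.25-$0.33/min all-in, 30-40s of dead air per 5-minute call). The function names are illustrative and not part of any Vapi SDK.

```python
ADVERTISED_PER_MIN = 0.05      # advertised platform rate, USD/min
REAL_PER_MIN = (0.25, 0.33)    # all-in range after LLM, STT, TTS, telephony


def call_cost_range(minutes: float) -> tuple:
    """All-in (low, high) cost in USD for one call of the given length."""
    return tuple(round(rate * minutes, 2) for rate in REAL_PER_MIN)


def markup_range() -> tuple:
    """How many times the advertised rate the all-in rate works out to."""
    return tuple(round(rate / ADVERTISED_PER_MIN, 1) for rate in REAL_PER_MIN)


def dead_air_share(dead_air_s: float, call_s: float = 300.0) -> float:
    """Fraction of a call spent in silence (default call length: 5 minutes)."""
    return round(dead_air_s / call_s, 3)


print(call_cost_range(5))                      # (1.25, 1.65)
print(markup_range())                          # (5.0, 6.6)
print(dead_air_share(30), dead_air_share(40))  # 0.1 0.133
```

On these numbers a 5-minute call costs $1.25-$1.65 rather than $0.25, and 10-13% of the call is silence.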

Actionable for Vitaravox¶

Your current maxTokens fix (150 to 400 on Router) addresses a real Vapi limitation where GPT-4o tool-call JSON silently truncates. The 4s circuit breaker timeout (under Vapi's 5s tool timeout) is correctly calibrated. The HIPAA structured-output limitation means you cannot rely on Vapi's built-in analytics for PHI-containing calls -- you need your own logging pipeline.
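
The relationship between the 4s deadline and Vapi's 5s tool timeout can be sketched as follows. This is a minimal illustration, not the production code: it shows only the deadline half of a circuit breaker (failure counting and open/half-open state are omitted), and `call_with_deadline` and `slow_soap_lookup` are hypothetical names.

```python
import asyncio

VAPI_TOOL_TIMEOUT_S = 5.0  # platform tool timeout cited above
CIRCUIT_BREAKER_S = 4.0    # our own deadline, leaving 1s of headroom


async def call_with_deadline(coro, fallback: dict,
                             timeout_s: float = CIRCUIT_BREAKER_S) -> dict:
    """Await a tool call; return `fallback` if it exceeds the deadline."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # Answer before the platform gives up, so the agent can apologize
        # or retry instead of going silent.
        return fallback


async def slow_soap_lookup(delay_s: float) -> dict:
    """Hypothetical stand-in for an OSCAR SOAP request."""
    await asyncio.sleep(delay_s)
    return {"ok": True}
```

Keeping the deadline below the platform timeout means the server always returns a structured result, even if that result is only a fallback.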

Sources:
Vapi HIPAA Documentation
Vapi AI Review 2025 (Dograh)
Vapi AI Review 2026 (Retell)
Vapi AI Review 2026 (Softailed)
Vapi AI Review 2026 (Lindy)
Vapi Squads Introduction
Vapi AI Pricing Guide 2026 (CloudTalk)

2. Competing Platforms¶

Retell.ai¶

Compliance: HIPAA, SOC 2, GDPR out of the box
Latency: ~714ms response time -- competitive with Vapi
Uptime: 99.99% (vs. Vapi's 99.94%)
QA: Launched Retell Assure (December 2025) -- monitors 100% of calls automatically, flags failures, assigns scores, and recommends remediation. This is a major differentiator vs. Vapi, which has no runtime QA equivalent
Growth: 300%+ quarter-over-quarter user growth, $40M+ ARR as of January 2026
Healthcare: 31+ languages, 85% containment rates, 80% reduction in call handling costs in healthcare deployments
Pricing: Starts at $0.07+/min (more transparent than Vapi)
Limitation: Less developer flexibility than Vapi for custom orchestration; no equivalent to Squads

Bland.ai¶

Compliance: SOC 2 Type II, HIPAA compliant (HIPAA has no formal certification program)
Strengths: Excellent audit tools, system-level logging of all transcripts and model responses; enterprise governance focus
Omnichannel: Voice, SMS, and chat from one platform
Limitations: Lacks ISO 27001, RBAC, on-prem deployment options; pricing not published; enterprise-only positioning
Healthcare fit: Good for large health systems needing governance and audit trails, but less accessible for smaller clinics

Voiceflow¶

Nature: Visual conversation design platform, NOT real-time voice infrastructure
Healthcare: Good for prototyping and designing conversation flows before implementing in Vapi/Retell
Pricing: Free tier available; Pro at $60/editor/month; Business at $150/editor/month
Limitation: Does not handle actual voice orchestration -- must pair with a voice runtime

Parloa¶

Funding: Raised EUR 310M ($350M) Series D in January 2026 -- largest in the European AI agent space
Product: AI Agent Management Platform (AMP) -- design, manage, and evolve AI agents using natural language
Healthcare: Appointment scheduling, prescription refills, insurance verification at scale; EHR integration through standardized protocols
Multilingual: Adapts across dialects and contexts
Limitation: Enterprise-focused, likely expensive; less developer customization than Vapi

PolyAI¶

Scale: 100+ enterprise customers, 2,000+ live deployments, 45 languages, 25+ countries
Funding: $86M Series D (December 2025), $200M+ total, $750M valuation
Product: Agent Studio (April 2025) -- voice-first, omnichannel platform with safety filters, analytics, and workflow management
ROI: Forrester study found 391% ROI with average savings of $10.3M per customer
Limitation: Enterprise-tier pricing; less suitable for small clinics or startups

Summary Comparison¶

Feature	Vapi	Retell	Bland	Parloa	PolyAI
HIPAA	Yes ($1K/mo)	Yes (built-in)	Yes	Yes	Yes
SOC 2	Yes	Yes	Type II	Yes	Yes
ISO 27001	No	Unknown	No	Yes	Yes
Multi-agent	Squads	Limited	Yes	AMP	Agent Studio
QA/Analytics	Basic	Assure (100%)	Audit logs	Enterprise	Enterprise
Latency	~500ms+	~714ms	Unknown	Unknown	Unknown
Languages	Provider-dependent	31+	Unknown	Multi-dialect	45
Starting price	$0.05/min (real: $0.25+)	$0.07+/min	Enterprise only	Enterprise only	Enterprise only

Sources:
Retell AI vs. Bland AI
Top 5 Best AI Voice Agent Platforms (Retell)
Parloa Healthcare Voice AI
Parloa EUR 310M Series D
PolyAI $86M Series D
Top 10 AI Voice Agent Platforms Guide 2026 (Vellum)
Bland AI Alternatives 2026 (Retell)

3. Enterprise Voice Agent Architecture -- What Production-Grade Looks Like in 2026¶

The Four Pillars¶

Every production voice agent rests on four components working in real-time concert:

STT (Ears) -- Speech-to-Text transcription
LLM (Brain) -- Intent understanding, reasoning, tool calling
TTS (Voice) -- Text-to-Speech synthesis
Orchestrator (Conductor) -- Manages real-time flow, state, handoffs, failover

Latency Targets¶

Target: Sub-500ms round-trip for positive user perception
Threshold: Degradation above 800ms produces sharp satisfaction drops
State of the art: Leading implementations achieve 300-500ms (down from 800-1200ms in 2024)
Your current architecture: Vapi's pipeline adds latency from chaining 4-5 API hops
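
The targets above translate into a simple per-turn budget check. The sketch below sums per-hop latencies and classifies the total against the 500ms and 800ms thresholds. The per-hop numbers are illustrative placeholders rather than measurements, and the labels are our own.

```python
TARGET_MS = 500    # sub-500ms round trip reads as responsive
DEGRADED_MS = 800  # above this, satisfaction drops sharply


def turn_latency_ms(stages: dict) -> float:
    """Total round-trip latency for one turn, summed across pipeline hops."""
    return sum(stages.values())


def classify(total_ms: float) -> str:
    """Map a round-trip latency onto the thresholds listed above."""
    if total_ms < TARGET_MS:
        return "on-target"
    if total_ms <= DEGRADED_MS:
        return "borderline"
    return "degraded"


# Illustrative per-hop numbers only; measure your own pipeline.
example_turn = {"stt": 150, "llm": 380, "tts": 120, "network": 90}
print(turn_latency_ms(example_turn), classify(turn_latency_ms(example_turn)))
# 740 borderline
```

Logging this per turn, rather than per call, makes it easier to see which hop pushes a turn past the 800ms threshold.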

Production Architecture Pattern¶

                    +------------------+
                    |   Telephony      |
                    |   (Vapi/Twilio)  |
                    +--------+---------+
                             |
                    +--------v---------+
                    |   Orchestrator   |
                    |   (Squad Router) |
                    +--------+---------+
                             |
              +--------------+--------------+
              |              |              |
     +--------v---+  +------v------+  +----v--------+
     |    STT     |  |    LLM      |  |    TTS      |
     | (Deepgram/ |  | (GPT-4o/    |  | (ElevenLabs/|
     |  Assembly) |  |  Claude)    |  |  Azure)     |
     +------------+  +------+------+  +-------------+
                            |
                   +--------v---------+
                   |  Tool Execution  |
                   |  (OSCAR SOAP,    |
                   |   FHIR, APIs)    |
                   +------------------+

Key Architecture Decisions for Healthcare¶

Multi-state agent architecture -- Your current Squad model (Router + specialized agents) aligns with industry best practice. Single-agent architectures collapse under complex healthcare workflows.
Stateful context preservation -- The industry has moved toward explicit context passing between agents rather than relying on LLM memory. Your approach of passing patient context via handoff tool parameters is correct.
Tool-level latency management -- Your 4s circuit breaker for SOAP calls is well-calibrated. The industry standard is to keep tool execution under the platform's tool timeout (Vapi: 5s).
Request-start audio messages -- Your v3.0 approach of tool-level request-start messages replacing LLM-generated filler phrases aligns with the emerging pattern of deterministic audio during async operations.
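
Explicit context passing can be sketched as a small typed payload that one agent fills in and the next agent receives as handoff tool parameters. The field names here are hypothetical and would need to match the actual handoff tool schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class HandoffContext:
    """State carried across an agent handoff (field names are hypothetical)."""
    language: str                        # "en" or "zh"
    intent: str                          # e.g. "booking", "registration"
    patient_name: Optional[str] = None
    date_of_birth: Optional[str] = None  # ISO 8601 date string

    def to_tool_arguments(self) -> dict:
        """Serialize only the fields already collected from the caller."""
        return {k: v for k, v in asdict(self).items() if v is not None}


ctx = HandoffContext(language="zh", intent="booking", patient_name="Li Wei")
print(ctx.to_tool_arguments())
# {'language': 'zh', 'intent': 'booking', 'patient_name': 'Li Wei'}
```

Because the receiving agent gets the collected fields as data rather than relying on conversational memory, it has no reason to ask the patient to repeat them.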

Market Scale¶

Production voice agent implementations grew 340% year-over-year in 2025. 43% of US medical groups expanded voice AI use in 2024, with 70% reporting operational improvements.

Sources:
The Voice AI Stack for Building Agents in 2026 (AssemblyAI)
The State of Voice Agents in 2026
Voice AI Trends 2026: Enterprise Adoption & ROI Guide
2025 Product Recap: Building the Voice AI Agent Platform for Enterprise (Regal)
From AI Pilots to Production Reality

4. LLM Choices for Multilingual Healthcare (GPT-4o vs Claude vs Gemini)¶

GPT-4o for Mandarin Medical¶

GPT-4o has been rigorously tested on Chinese medical licensing exams:

84.2-88.2% accuracy on the Chinese National Medical Licensing Examination (2020/2021 editions)
All models performed better in Chinese than English on Chinese medical queries -- significant finding
In TCM (Traditional Chinese Medicine), GPT-4o, Qwen 2.5 Max, and Doubao 1.5 Pro showed highest alignment with licensed practitioners
Caveat: Research notes "performance disparity might stem from LLMs being primarily trained on English datasets and lacking deep familiarity with Chinese culture, linguistic nuances, and TCM concepts"

Claude (Opus 4, Sonnet) for Medical¶

Claude 3 Opus achieved highest accuracy for most medical exam question groups except prosthetic dentistry in a Polish/English comparative study
Claude Opus 4 (May 2025) brings "unmatched clarity in communication, long session thinking, and emotionally intelligent writing"
Strong at structured reasoning and tool calling -- relevant for multi-step booking workflows
Limitation: Less tested specifically on Mandarin medical terminology compared to GPT-4o

Gemini for Multilingual¶

Gemini 2.5 Pro (June 2025): Mainstream language pairs (English-Mandarin, Spanish-Arabic) at ~98% accuracy
140+ language support
Advantage: Deep Google infrastructure integration
Limitation: Less tested in real-time voice agent tool-calling scenarios

Global Medical Exam Performance¶

Model	Global Medical Exam Accuracy
GPT-o1	95.4%
DeepSeek-R1	92.0%
GPT-4o	89.4%
Claude (various)	Competitive, varies by specialty

Recommendation for Vitaravox v3.0¶

Your decision to use GPT-4o for both EN and ZH tracks at launch is well-supported by the evidence. GPT-4o's 84-88% accuracy on Chinese medical exams, combined with its strong tool-calling capabilities, makes it the safest launch choice. The space-separated Chinese characters issue you noted is a known GPT-4o artifact that should be monitored.

For a post-launch bake-off on the ZH track:

Qwen 2.5 Max (Alibaba, 119 languages) is worth testing -- it showed top TCM alignment
DeepSeek-R1 scored 92% on medical exams but your lesson learned about DeepSeek V3's unreliable tool_choice:"auto" (3-15% failure) is a critical blocker
Claude Opus 4 could be tested for the EN track where its structured reasoning shines

Sources:
GPT Performance on Chinese National Medical Licensing Examination (Nature)
LLMs in Traditional Chinese Medicine Diagnosis (Nature Digital Medicine)
Comparing ChatGPT, Gemini, Claude on Medical Examinations (Nature)
Gemini 3 Multilingual Power 140 Languages (Skywork)
Top 9 Large Language Models February 2026 (Shakudo)

5. STT/TTS Best Practices for Medical Speech Recognition¶

Speech-to-Text (STT)¶

Deepgram Nova-3 Medical (March 2025)¶

Median WER: 3.45% on medical terminology -- 63.6% reduction vs. next-best competitor
Structured transcriptions that integrate with EHR systems
Pricing: $0.0077/min streaming -- more than 2x cheaper than leading cloud providers
Language support: English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, Dutch
Critical gap for Vitaravox: Mandarin Chinese is NOT listed for Nova-3 Medical specifically

Deepgram Nova-3 General¶

14.5% WER on Artificial Analysis benchmarks -- best accuracy among real-time models
Nova-3 expanded with 11 new languages across Europe and Asia
Your ZH track currently uses nova-2 with the zh language code -- this is correct given Nova-3 Medical's Mandarin gap

AssemblyAI Universal¶

Your Router uses AssemblyAI Universal for bilingual (EN/ZH) detection -- this remains the right choice
AssemblyAI's transcriber schema differs from Deepgram's: it does NOT accept the endpointing, languageDetection, or model fields, and including the model property causes a 400 error -- your lesson learned is correct
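
One way to guard against this class of 400 error is to strip provider-specific fields before a transcriber config is pushed. The sketch below is a minimal version of that idea. The three field names come from the note above, while the `"assembly-ai"` provider key and the shape of the config dict are assumptions to check against your actual config files.

```python
# Fields each provider is known to reject, per the lesson learned above.
# The "assembly-ai" key is an assumed provider identifier.
UNSUPPORTED_FIELDS = {
    "assembly-ai": {"endpointing", "languageDetection", "model"},
}


def sanitize_transcriber(config: dict) -> dict:
    """Return a copy of the config without fields the provider rejects."""
    blocked = UNSUPPORTED_FIELDS.get(config.get("provider", ""), set())
    return {k: v for k, v in config.items() if k not in blocked}


def rejected_fields(config: dict) -> list:
    """List offending fields, e.g. to fail a CI check before deploy."""
    blocked = UNSUPPORTED_FIELDS.get(config.get("provider", ""), set())
    return sorted(k for k in config if k in blocked)
```

Running `rejected_fields` in the GitOps pipeline would surface the mistake at review time instead of at deploy time.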

Key STT Considerations for Healthcare¶

Mishearing a single word can be life-threatening -- "hypertension" and "hypotension" are opposite conditions
Hospital environments have overlapping speech, machine noise, background voices
Most ASR systems are still trained on clean audio -- real-world clinical performance degrades

Text-to-Speech (TTS)¶

ElevenLabs¶

V3 model (GA February 2026): Audio tags for inline tone/emotion/delivery control
Multilingual v2: 32 languages including Mandarin Chinese
HIPAA: Zero Retention Mode + BAA available -- no content or data retained, end-to-end encryption
Scribe v2 (January 2026): 90+ language STT with real-time variant
Limitation: eleven_turbo_v2_5 is English-only -- your lesson learned about using eleven_multilingual_v2 for CJK is critical

Azure Speech¶

500+ neural voices across 140+ languages
zh-CN-XiaoxiaoNeural (your ZH track TTS choice) -- one of Microsoft's highest-quality Mandarin voices
Compliance: SOC 1/2/3, ISO 27001, HIPAA, FedRAMP, PCI DSS
Part of Microsoft Foundry ecosystem
Advantage over ElevenLabs for ZH: Native Mandarin optimization, more natural tones and prosody for Chinese

Recommendation¶

Your current split architecture -- ElevenLabs for EN, Azure for ZH -- is the optimal configuration. ElevenLabs V3 offers superior English expressiveness, while Azure's XiaoxiaoNeural provides better Mandarin naturalness than ElevenLabs' multilingual model.
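
The split can be expressed as a per-track lookup plus a guard that encodes the CJK lesson noted in the ElevenLabs section. This is a sketch under assumptions: the dict keys (`provider`, `voice`, `model`) are illustrative and do not necessarily match Vapi's voice schema.

```python
VOICE_BY_TRACK = {
    "en": {"provider": "elevenlabs"},
    "zh": {"provider": "azure", "voice": "zh-CN-XiaoxiaoNeural"},
}
CJK_TRACKS = {"zh", "ja", "ko"}


def voice_for_track(track: str) -> dict:
    """Return a copy of the TTS settings for a language track."""
    if track not in VOICE_BY_TRACK:
        raise ValueError(f"no TTS voice configured for track {track!r}")
    return dict(VOICE_BY_TRACK[track])


def validate_voice(track: str, voice: dict) -> None:
    """Reject ElevenLabs models other than the multilingual one on CJK tracks."""
    if (track in CJK_TRACKS
            and voice.get("provider") == "elevenlabs"
            and voice.get("model") != "eleven_multilingual_v2"):
        raise ValueError("CJK tracks on ElevenLabs need eleven_multilingual_v2")
```

A check like `validate_voice` is cheap to run in CI and catches a misconfigured model before it reaches a patient call.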

Sources:
Deepgram Nova-3 Medical Launch
Deepgram Nova-3 Medical (AI News)
Best Medical Speech Recognition Software 2025 (AssemblyAI)
ElevenLabs V3 Launch
ElevenLabs vs Azure AI Speech 2026 (Aloa)
ElevenLabs Mandarin Chinese TTS
How to Choose STT and TTS for Voice Agents (Softcery)

6. Conversation Design Patterns for Patient-Facing Voice AI¶

Core Principles¶

Start with high-volume, straightforward tasks -- appointment scheduling, FAQ, reminders -- then expand to complex use cases after proving value and building trust.
Multimodal follow-up -- 2026 best practice is voice + text confirmation. After a booking call, send SMS/email confirmation with appointment details. Vapi's phone-only limitation means you need a separate channel for this.
Natural conversational design -- Generative AI should "feel like talking to a person," using contextual dialogue rather than rigid scripted trees. Your v3.0 Router prompt rewrite (removing "Say EXACTLY 'One moment please'" rigid scripting) directly aligns with this.
Language accessibility -- Systems should handle entire interactions in the patient's preferred language. Your dual-track EN/ZH architecture is forward-thinking; most competitors offer translation layers rather than native language tracks.
Clear escalation paths -- Every conversational AI system needs clear paths to human support when AI reaches its limits. Patients should be able to say "agent" or "operator" at any time.

Healthcare-Specific Patterns¶

Pattern	Implementation
Warm acknowledgment	Replace "Please hold" with contextual response acknowledging what patient said
Zero text on tool calls	Tool-level `request-start` messages instead of LLM-generated filler (your v3.0 approach)
Silent transfers	"NEVER mention transferring" in all squad prompts (your approach)
Defensive tool-result	"WAIT for actual tool result before speaking about the patient" (your P0 fix)
Clinic-agnostic prompts	Remove clinic name references; let `get_clinic_info` populate dynamically (your approach)
Patient confirmation	Always repeat back critical details (date, time, provider) before confirming

Emerging 2026 Pattern: "Clinician Partnership"¶

AI presents opportunities for healthcare professionals to expand their role as trusted experts. The voice agent handles logistics; the clinician handles care. This framing -- "the AI books your appointment, your doctor provides your care" -- improves patient trust.

Sources:
AI Contact Center Trends 2026 (Healthcare IT News)
Conversational AI for Healthcare: Complete Guide 2026
Transforming Healthcare Delivery with Conversational AI (Nature Digital Medicine)
Voice AI Healthcare Use Cases 2025 (My AI Front Desk)

7. Failover and Escalation -- Industry Standards¶

Escalation Triggers¶

Keyword-based: Patient says "agent," "operator," "help," "emergency"
Emotion-aware models: Detect frustration and route to live nurses before satisfaction drops
Critical symptom detection: AI trained to recognize phrases like "crushing chest pain" or suicidal ideation, immediately bypassing normal protocol to trigger emergency escalation
Confidence threshold: When LLM confidence drops below a threshold, auto-escalate rather than guessing
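
The triggers above can be combined into a single routing check that runs on every caller utterance. The sketch below is deliberately naive: it uses keyword matching where a production system would more likely use a trained classifier, and the 0.6 confidence threshold is a hypothetical value. Emergency phrases are checked first so they always win.

```python
import re
from typing import Optional

EMERGENCY_PHRASES = ("crushing chest pain", "emergency")
HUMAN_KEYWORDS = ("agent", "operator", "help")
MIN_CONFIDENCE = 0.6  # hypothetical threshold; tune against real calls


def escalation_route(utterance: str, confidence: float = 1.0) -> Optional[str]:
    """Return "emergency", "human", or None if the AI should continue."""
    text = utterance.lower()
    if any(phrase in text for phrase in EMERGENCY_PHRASES):
        return "emergency"
    # Word boundaries keep "helpful" from matching "help".
    if any(re.search(rf"\b{re.escape(k)}\b", text) for k in HUMAN_KEYWORDS):
        return "human"
    if confidence < MIN_CONFIDENCE:
        return "human"
    return None
```

Ordering matters: a caller who says "help, I have crushing chest pain" should reach the emergency path, not the general human queue.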

Handoff Best Practices¶

Clean summaries, not raw transcripts -- Escalations pass structured summaries to human agents
Context preservation -- Patient should not repeat information already provided to the AI
Priority assignment -- Urgent medical queries get different routing than billing questions
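Status notification -- When handing off to a human, the patient is informed of the transfer and the estimated wait time (unlike agent-to-agent squad transfers, which stay silent)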
Warm transfer -- AI introduces the human agent to the context before disconnecting
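
A structured summary with a priority can be as small as the sketch below. The category names and priority values are hypothetical, chosen only to illustrate that urgent medical queries should outrank billing questions.

```python
from dataclasses import dataclass, field

# Lower number = more urgent. Categories and values are illustrative.
PRIORITY = {"medical_urgent": 1, "scheduling": 2, "billing": 3}


@dataclass
class EscalationSummary:
    reason: str
    category: str
    collected: dict = field(default_factory=dict)

    @property
    def priority(self) -> int:
        """Unknown categories default to the middle priority."""
        return PRIORITY.get(self.category, 2)

    def briefing(self) -> str:
        """One-line summary for the human agent instead of a raw transcript."""
        facts = "; ".join(f"{k}: {v}" for k, v in self.collected.items())
        return f"[P{self.priority}] {self.reason}. Already collected: {facts or 'nothing'}"


summary = EscalationSummary("Caller asked for a person", "billing", {"name": "A. Chen"})
print(summary.briefing())
# [P3] Caller asked for a person. Already collected: name: A. Chen
```

The `collected` dict is what lets the human agent skip questions the patient has already answered.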

System Reliability¶

Multi-datacenter deployment with automatic failover
Gartner projection: By 2026, conversational AI will reduce agent labor costs by $80B, and 1 in 10 agent interactions will be automated
Nearly half of U.S. hospitals plan voice AI implementation by 2026

Recommendation for Vitaravox¶

Your P1 items -- adding transfer_call tool to Booking + Registration agents, and handoff_to_router_v3 to Registration agents -- are critical. The industry standard is that every agent must have an escape route to either another agent or a human. No agent should be a dead end.

Sources:
How to Implement AI Voice Agents in Healthcare (Retell)
Leading Voice AI Agents for Healthcare Triage 2025 (Prosper)
Best AI Voice Agents 2026 (GetVoIP)
AI Voice Agents: What They Are and How They Work 2026 (AssemblyAI)

8. Analytics and Quality Assurance¶

The QA Gap¶

Traditional QA teams review 1-2% of calls manually. Modern AI-powered QA evaluates 100% of calls automatically.

Leading Solutions¶

Solution	Capability
Retell Assure (Dec 2025)	Monitors 100% of calls, flags failures, assigns scores, recommends remediation
Cresta	Proprietary AI evaluates 100% of interactions against customizable scorecards
Genesys	Real-time analytics, AI-assisted agent coaching, dead-air detection
PolyAI Agent Studio	Built-in safety filters, analytics, workflow management
Vapi Evals	Functional testing via mock conversations -- pre-deployment only, not runtime QA

Key Metrics to Monitor¶

Containment rate -- % of calls fully handled without human escalation (target: 85%+)
Average handle time -- including AI processing time and dead air
First-call resolution -- did the patient's issue get resolved?
Latency per turn -- target sub-500ms
Sentiment drift -- real-time detection of patient frustration
Compliance violations -- missed disclosures, unauthorized PHI handling
Tool success rate -- % of API calls (OSCAR SOAP, etc.) that succeed
Escalation rate -- and reasons for escalation
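
Several of these metrics reduce to simple ratios over per-call records. The sketch below computes containment, escalation, and tool success rates. The `CallRecord` fields are a hypothetical minimal schema, not the shape of Vapi's call objects.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    escalated: bool     # did the call end with a human handoff?
    tool_calls: int     # tool invocations attempted during the call
    tool_failures: int  # invocations that errored or timed out


def containment_rate(calls: list) -> float:
    """Share of calls handled end to end without human escalation."""
    if not calls:
        return 0.0
    return sum(not c.escalated for c in calls) / len(calls)


def escalation_rate(calls: list) -> float:
    """Share of calls that ended with a human handoff."""
    if not calls:
        return 0.0
    return sum(c.escalated for c in calls) / len(calls)


def tool_success_rate(calls: list) -> float:
    """Share of tool invocations that succeeded across all calls."""
    total = sum(c.tool_calls for c in calls)
    if total == 0:
        return 0.0
    return (total - sum(c.tool_failures for c in calls)) / total


calls = [CallRecord(False, 3, 0), CallRecord(True, 2, 1),
         CallRecord(False, 5, 1), CallRecord(False, 0, 0)]
print(containment_rate(calls), tool_success_rate(calls))  # 0.75 0.8
```

None of these fields contain PHI, so they can be stored and charted even when the transcript itself must stay in a restricted datastore.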

Gap for Vitaravox¶

Vapi's HIPAA mode disables structured output storage, which cripples built-in analytics for PHI-containing calls. You need a parallel analytics pipeline -- your server-side webhook (/api/vapi) should log call metadata, tool success/failure rates, and conversation quality metrics to your own HIPAA-compliant datastore. The log_call_metadata function you absorbed into Booking/Modification/Registration is the right foundation for this.

Sources:
Top 10 Enterprise AI Voice Agent Vendors 2026 (Retell)
Top 10 Voice AI Agents for Regulated Customer Success 2026
Voice AI in 2026 (AssemblyAI)

9. FHIR R4 Integration¶

Current State¶

96% of US hospitals have adopted FHIR APIs
FHIR R4 is the dominant version (22/38 respondents in industry surveys)
FHIR R6 expected 2026 with deeper AI and remote monitoring integration

Integration Patterns¶

SMART-on-FHIR -- Standard for third-party app authorization. Epic and Cerner both support it. AI agents use SMART-on-FHIR to securely fetch/update patient data.
HL7 v2 to FHIR R4 transformation -- An autonomous agent monitors HL7 v2 feeds, transforms to FHIR R4, writes structured data back to EHR with full audit trails.
REST API sync -- HL7/FHIR or REST APIs sync appointments, demographics, and insurance data for contextual voice agent responses.
Bi-directional EHR connectivity -- Voice AI platforms provide real-time data synchronization, handling complex appointment logic across provider types and locations.
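
For the REST sync pattern, a booking ultimately becomes a FHIR R4 Appointment resource. The sketch below builds a minimal one as a plain dict. The field names follow the R4 Appointment resource, while the helper name and the patient and practitioner IDs are hypothetical. A real server may require additional elements, so treat this as a starting shape.

```python
from datetime import datetime, timedelta, timezone


def build_appointment(patient_id: str, practitioner_id: str,
                      start: datetime, minutes: int) -> dict:
    """Build a minimal FHIR R4 Appointment resource as a dict."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware (FHIR instants carry an offset)")
    end = start + timedelta(minutes=minutes)
    return {
        "resourceType": "Appointment",
        "status": "booked",
        "start": start.isoformat(),
        "end": end.isoformat(),
        "participant": [
            {"actor": {"reference": f"Patient/{patient_id}"}, "status": "accepted"},
            {"actor": {"reference": f"Practitioner/{practitioner_id}"}, "status": "accepted"},
        ],
    }


appt = build_appointment("123", "456",
                         datetime(2026, 3, 2, 9, 30, tzinfo=timezone.utc), 15)
print(appt["start"], appt["end"])
# 2026-03-02T09:30:00+00:00 2026-03-02T09:45:00+00:00
```

Rejecting naive datetimes up front avoids silently booking in the server's timezone instead of the clinic's.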

OSCAR EMR Context¶

OSCAR's CXF SOAP API (shipping since OSCAR 12) is the universal connector -- OSCAR does not expose FHIR natively
Your OscarSoapAdapter approach is architecturally correct for current OSCAR deployments
WELL Health Technologies supports OSCAR EMR and is driving modernization in Canadian provinces
Tali AI offers OSCAR Pro integration for AI scribing
AlloMia offers AI voice agent integration with leading Canadian EMRs including OSCAR

Future Path for Vitaravox¶

Your architecture correctly separates the OSCAR-specific adapter (OscarSoapAdapter) from the booking engine abstraction. When clinics running Epic/Cerner adopt the platform, you add a FhirR4Adapter implementing the same interface. This multi-adapter pattern is the industry standard for serving heterogeneous EMR environments.
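
The multi-adapter pattern can be sketched with a shared interface and a registry keyed by EMR type. `OscarSoapAdapter` and `FhirR4Adapter` are the names used above. The `find_open_slots` method, the registry keys, and the canned return values are hypothetical stand-ins for real SOAP and FHIR calls.

```python
from typing import Optional, Protocol


class EmrAdapter(Protocol):
    """Interface the booking engine depends on (method name is hypothetical)."""
    def find_open_slots(self, provider_id: str, date: str) -> list: ...


class OscarSoapAdapter:
    def find_open_slots(self, provider_id: str, date: str) -> list:
        # A real adapter would call OSCAR's SOAP schedule service here.
        return ["09:00", "09:15"]


class FhirR4Adapter:
    def find_open_slots(self, provider_id: str, date: str) -> list:
        # A real adapter would search FHIR Slot resources here.
        return ["10:00"]


ADAPTERS = {"oscar-soap": OscarSoapAdapter, "fhir-r4": FhirR4Adapter}


def adapter_for(emr_type: str) -> EmrAdapter:
    """Instantiate the adapter registered for a clinic's EMR type."""
    try:
        return ADAPTERS[emr_type]()
    except KeyError:
        raise ValueError(f"unsupported EMR type: {emr_type!r}") from None


def first_slot(adapter: EmrAdapter, provider_id: str, date: str) -> Optional[str]:
    """Booking-engine code sees only the interface, never the EMR."""
    slots = adapter.find_open_slots(provider_id, date)
    return slots[0] if slots else None
```

Because `first_slot` depends only on `EmrAdapter`, supporting a new EMR means adding one class and one registry entry, with no change to the booking logic.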

Sources:
FHIR Healthcare Interoperability Guide 2025
Building AI Agents for Epic & Cerner EHRs
7 HIPAA-Compliant AI Agent Use Cases (Augment Code)
AI Integration with Canada's Leading EMRs (AlloMia)
Tali AI - OSCAR Pro Integration

10. Multi-Tenant Architecture¶

Isolation Models¶

Model	Cost	Security	Use Case
Shared schema + tenant ID	Cheapest	Risky for PHI	NOT suitable for healthcare
Schema-per-tenant	Moderate	Good isolation, per-tenant migrations/backups	Small-to-medium clinic deployments
Database-per-tenant	Most expensive	Full isolation	Enterprise health systems demanding full compliance

For healthcare: Physical isolation is common for ultra-sensitive applications such as healthcare SaaS, while most business SaaS relies on logical isolation.

Architecture Best Practices for 2025-2026¶

Every data store (blob, vector, key-value) must be scoped to the tenant -- vector stores should never allow cross-tenant queries
Separate AI inference layer from core SaaS logic -- dedicated ML services per tenant or with strict tenant partitioning
Microservices + serverless components -- 2025-2026 guidance favors these over monoliths
Edge computing for latency-sensitive voice processing

Healthcare Voice AI Multi-Tenant Platforms¶

Synthflow: No-code voice AI with multi-location routing, directs callers to nearest clinic based on location
Prosper AI ($5M raise, October 2025): Positions itself as the default voice AI platform for healthcare's "$450B admin crisis" -- deep EHR integrations, blueprints for both patient-facing and back-office workflows
Cognigy: AI agents for healthcare with enterprise multi-tenant support

Vitaravox Multi-Tenant Design¶

Your current architecture needs these additions for multi-clinic support:

Clinic configuration store -- timezone (currently hardcoded as America/Vancouver), operating hours, provider list, EMR adapter type, Vapi squad ID per clinic
Tenant-scoped SOAP/FHIR clients -- each clinic's EMR connection isolated with separate credentials
Per-clinic Vapi squads OR shared squad with clinic context injected via get_clinic_info tool
Onboarding pipeline -- your 9 pre-launch checks should be automated per tenant
Audit trail per tenant -- separate PHI logging per clinic for compliance
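
The clinic configuration store can start as a typed record plus a registry, which is enough to remove the hardcoded timezone. The sketch below is illustrative: the field names, default hours, and example IDs are hypothetical, and a real store would be backed by a database rather than an in-memory dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicConfig:
    clinic_id: str
    timezone: str        # IANA name, e.g. "America/Vancouver"
    emr_adapter: str     # e.g. "oscar-soap" or "fhir-r4"
    vapi_squad_id: str
    hours: tuple = ("09:00", "17:00")


class ClinicRegistry:
    """In-memory stand-in for a per-tenant configuration store."""

    def __init__(self) -> None:
        self._by_id = {}

    def register(self, cfg: ClinicConfig) -> None:
        if cfg.clinic_id in self._by_id:
            raise ValueError(f"clinic already registered: {cfg.clinic_id}")
        self._by_id[cfg.clinic_id] = cfg

    def get(self, clinic_id: str) -> ClinicConfig:
        # Fail loudly: falling back to a default clinic could leak PHI
        # across tenants.
        if clinic_id not in self._by_id:
            raise LookupError(f"unknown clinic: {clinic_id}")
        return self._by_id[clinic_id]


registry = ClinicRegistry()
registry.register(ClinicConfig("clinic-a", "America/Vancouver", "oscar-soap", "squad-a"))
registry.register(ClinicConfig("clinic-b", "America/Toronto", "oscar-soap", "squad-b"))
print(registry.get("clinic-b").timezone)  # America/Toronto
```

Every request handler would resolve its `ClinicConfig` first and pass it down, so the timezone, credentials, and audit log are all scoped to one tenant.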

Sources:
How to Build Scalable Multi Tenant Architectures for AI SaaS (Brim Labs)
SaaS Architecture Best Practices 2025 (The Algo)
Multi-Tenancy in SaaS: Architecture, Benefits & Trends
Prosper AI Raises $5M (Healthcare IT Today)
Architectural Approaches for AI/ML in Multitenant Solutions (Microsoft)

Strategic Takeaways for Vitaravox¶

What You Are Doing Right¶

Squad architecture -- Multi-agent with specialized roles is the 2026 standard
Dual-track EN/ZH -- Native language tracks rather than translation layers
GPT-4o for both tracks -- Validated by Chinese medical exam research (84-88% accuracy)
ElevenLabs EN + Azure ZH -- Optimal TTS split
OscarSoapAdapter abstraction -- Ready for multi-EMR future
Tool-level request-start messages -- Replacing LLM filler phrases
GitOps config-as-code -- Industry best practice for voice agent management

Critical Gaps to Address¶

Runtime QA -- Vapi has no equivalent to Retell Assure. Build your own 100%-call monitoring pipeline.
Multi-tenant readiness -- Hardcoded timezone, single-clinic SOAP client, no per-clinic configuration store.
Omnichannel follow-up -- SMS/email appointment confirmations after voice booking (Vapi cannot do this natively).
P1 handoff completeness -- Every agent needs escape routes to either another agent or a human.
SOAP client warmup on PM2 startup -- Cold-start WSDL fetch penalty is a known issue; warm on boot.
ISO 27001 gap -- Neither Vapi nor your stack has this. Canadian healthcare (PHIPA/PIPA/HIA) may require it for enterprise clinic sales.
Deepgram Nova-3 Medical for EN track -- 3.45% WER on medical terminology would be a significant upgrade from Nova-2, but confirm Vapi supports it as a provider option.

Platform Risk¶

Vapi's known issues (breaking updates, poor support, no RBAC, no ISO 27001, real cost roughly 5-7x the advertised rate) represent genuine platform risk. If Retell AI ships a Squads-equivalent multi-agent feature, or if Parloa's AMP becomes accessible below enterprise pricing, a platform migration should be evaluated. Your GitOps approach and adapter pattern make such a migration feasible.