First Clinic Launch Plan: March 30, 2026
4-Week Execution Roadmap
Created: February 18, 2026
Constraints
- Deadline: March 30, 2026 (hard)
- Target: Single clinic deployment (English primary, Mandarin secondary)
- Infrastructure: Current OCI ARM instance — no cloud migration
- Automation: Claude Code available for code changes and deployments
- Non-goals: Redis, ECS/Fargate, LiteLLM, LiveKit, horizontal scaling
What Must Be True on March 30
- Real patient data is protected (rotated secrets, encrypted credentials, idempotent operations)
- English booking/rescheduling/cancellation calls work reliably end-to-end
- Mandarin track is either validated or explicitly disabled for launch
- One clinic is fully onboarded (all 7 required pre-launch checks pass)
- Operational runbook exists (what to do when things break at 2am)
- Backup is tested (restore verified, not just dump verified)
Week 1 (Feb 19-25): Security Hardening + Stability
Theme: Make the system safe for real patient data.
Day 1-2: Secret Rotation & Environment Hardening
| Task | Advisory Ref | Detail |
|---|---|---|
| Rotate JWT_SECRET | Security Finding #1 | Generate 64-char random hex: openssl rand -hex 32. Update .env AND ecosystem.config.cjs |
| Rotate JWT_REFRESH_SECRET | Security Finding #1 | Same — separate 64-char random hex |
| Rotate DATABASE_URL password | Security Finding #1 | ALTER USER vitara WITH PASSWORD '...'; then update .env |
| Fix CORS_ORIGIN in .env | Security Finding #1 | Change to https://dev.vitaravox.ca (match ecosystem.config.cjs) |
| Verify OSCAR_SOAP_PASSWORD | Security Finding #1 | Confirm this is the real clinic credential, not a test default |
| Verify ENCRYPTION_KEY is real | Security Finding #6 | Already set (64-char hex) — confirm it encrypts/decrypts correctly |
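Validation: Reload PM2 with the new environment (pm2 restart ecosystem.config.cjs --update-env; a plain pm2 restart keeps the environment the process was started with). Then confirm that admin dashboard login works (rotating the JWT secrets invalidates existing sessions, so everyone must log in again), that Vapi webhooks authenticate, and that OSCAR SOAP connects.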
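A boot-time self-check like the sketch below can back up the last two rows of the table. It assumes two things. First, credential encryption uses AES-256-GCM, the cipher named in onboarding check 7. Second, ENCRYPTION_KEY holds the 32-byte key as 64 hex characters. It also assumes the JWT secrets were generated with openssl rand -hex 32 as above. The function names are hypothetical. A round trip with a fresh probe only proves that the key is well-formed and usable. The stronger test is decrypting a credential that was already stored by the codebase's own encryption helper.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Same output shape as `openssl rand -hex 32`: 32 random bytes as 64 hex characters.
export function generateSecret(): string {
  return randomBytes(32).toString("hex");
}

// Rejects unset values and short or placeholder strings such as "changeme".
export function isHex64(value: string | undefined): value is string {
  return typeof value === "string" && /^[0-9a-f]{64}$/i.test(value);
}

// Encrypts and decrypts a probe string with AES-256-GCM under the given key.
export function encryptionKeyRoundTrips(hexKey: string | undefined): boolean {
  if (!isHex64(hexKey)) return false;
  const key = Buffer.from(hexKey, "hex"); // 32 bytes = AES-256
  const iv = randomBytes(12); // 96-bit nonce, the standard size for GCM
  const probe = "vitara-encryption-self-check";

  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(probe, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();

  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if the tag does not verify
  const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
  return plaintext === probe;
}

// Fail fast at boot instead of on the first encrypted OSCAR credential.
export function assertSecretsLookRotated(env: Record<string, string | undefined>): void {
  for (const name of ["JWT_SECRET", "JWT_REFRESH_SECRET"]) {
    if (!isHex64(env[name])) throw new Error(`${name} is missing or not a 64-char hex secret`);
  }
  if (env.JWT_SECRET === env.JWT_REFRESH_SECRET) throw new Error("JWT secrets must differ");
  if (!encryptionKeyRoundTrips(env.ENCRYPTION_KEY)) throw new Error("ENCRYPTION_KEY failed round-trip");
}
```

To use it, call assertSecretsLookRotated(process.env) before server.listen() so that a bad rotation stops the process. That failure then shows up in the PM2 error log.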
Day 3-4: Idempotency + Webhook Audit Trail
| Task | Advisory Ref | Detail |
|---|---|---|
| Add toolCallId dedup | Security Finding #10 | Create processed_tool_calls table (tool_call_id TEXT PK, result JSONB, created_at TIMESTAMPTZ, expires_at TIMESTAMPTZ). Check before processing, cache result, return cached on retry. 24h TTL with daily cleanup. |
| Add webhook tool-call audit logging | Security Finding #19 | On every tool call: write to audit_logs table with action: 'vapi_tool_call', tool name, clinicId, callId, demographicId (if available), outcome (success/error). Use existing audit infrastructure. |
```sql
-- Migration: add processed_tool_calls
CREATE TABLE processed_tool_calls (
  tool_call_id TEXT PRIMARY KEY,
  result       JSONB NOT NULL,
  clinic_id    TEXT,
  tool_name    TEXT,
  created_at   TIMESTAMPTZ DEFAULT NOW(),
  expires_at   TIMESTAMPTZ DEFAULT NOW() + INTERVAL '24 hours'
);
CREATE INDEX idx_ptc_expires ON processed_tool_calls(expires_at);

-- Daily cleanup job (24h TTL), run separately from the migration:
-- DELETE FROM processed_tool_calls WHERE expires_at < NOW();
```
Validation: Send the same tool-call webhook twice with the same toolCallId. The second request should return the cached result without running the tool again. Check the audit_logs table for the tool-call entries.
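The check-then-process flow has one gap. If Vapi retries while the first attempt is still running, both requests find no cached row and both run the tool. For create_appointment, that means a second booking attempt. The sketch below closes the gap for a single server process. It uses in-memory maps as stand-ins for processed_tool_calls. It also uses a hypothetical writeAudit callback in place of the existing audit infrastructure. runToolCallOnce and purgeExpired are illustrative names.

```typescript
type ToolResult = Record<string, unknown>;

interface AuditEntry {
  action: "vapi_tool_call";
  toolName: string;
  clinicId: string;
  callId: string;
  demographicId?: string;
  outcome: "success" | "error";
}

const TTL_MS = 24 * 60 * 60 * 1000; // mirrors expires_at = created_at + 24h

// In-memory stand-in for processed_tool_calls (hypothetical; use the table in production).
const processed = new Map<string, { result: ToolResult; expiresAt: number }>();
// Executions still running; a retry that arrives mid-flight awaits the same promise.
const inFlight = new Map<string, Promise<ToolResult>>();

export async function runToolCallOnce(
  toolCallId: string,
  meta: Omit<AuditEntry, "action" | "outcome">,
  execute: () => Promise<ToolResult>,
  writeAudit: (entry: AuditEntry) => Promise<void>,
): Promise<ToolResult> {
  const cached = processed.get(toolCallId);
  if (cached && cached.expiresAt > Date.now()) return cached.result; // retry after completion

  const pending = inFlight.get(toolCallId);
  if (pending) return pending; // retry while the first attempt is still running

  const run = (async () => {
    let result: ToolResult;
    try {
      result = await execute();
    } catch (err) {
      // Failures are not cached, so a later retry gets a fresh attempt.
      await writeAudit({ action: "vapi_tool_call", ...meta, outcome: "error" });
      throw err;
    }
    processed.set(toolCallId, { result, expiresAt: Date.now() + TTL_MS });
    await writeAudit({ action: "vapi_tool_call", ...meta, outcome: "success" });
    return result;
  })();
  inFlight.set(toolCallId, run);
  // Clear the in-flight entry once settled; attached after set(), so the order is always safe.
  run.finally(() => inFlight.delete(toolCallId)).catch(() => {});
  return run;
}

// Equivalent of the daily cleanup: drop entries whose expiry has passed.
export function purgeExpired(nowMs: number = Date.now()): number {
  let removed = 0;
  for (const [id, entry] of processed) {
    if (entry.expiresAt <= nowMs) {
      processed.delete(id);
      removed++;
    }
  }
  return removed;
}
```

In production, the map of finished results would be the processed_tool_calls table from the migration above, so cached results survive a restart. The in-flight map can stay in memory only because this plan runs one server process. With several processes, the claim on a tool_call_id would have to come from the database instead.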
Day 5: PM2 Stability Investigation
| Task | Advisory Ref | Detail |
|---|---|---|
| Investigate 6666 restart count | Infra Finding #1 | pm2 reset vitara-admin-api to zero the counter. Monitor for 24h. If restarts accumulate, check PM2 error logs for crash patterns. |
| Add PM2 restart delay | Infra recommendation | Add restart_delay: 5000 and max_restarts: 50 to ecosystem.config.cjs |
| Deploy from compiled JS | Best practice | Switch from tsx src/index.ts to node dist/index.js in ecosystem.config — faster startup, lower memory |
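```javascript
// ecosystem.config.cjs (updated)
module.exports = {
  apps: [{
    name: 'vitara-admin-api',
    script: 'dist/index.js', // compiled JS from tsc (outDir: dist), not tsx
    cwd: '/home/ubuntu/vitara-platform/admin-dashboard/server',
    env: {
      NODE_ENV: 'production',
      PORT: 3002,
      CORS_ORIGIN: 'https://dev.vitaravox.ca'
    },
    watch: false,
    max_memory_restart: '500M',
    restart_delay: 5000,  // wait 5 s between restarts instead of restarting instantly
    max_restarts: 50,     // consecutive unstable restarts (uptime < min_uptime) before PM2 gives up
    kill_timeout: 10000   // ms PM2 waits after SIGINT before SIGKILL; match graceful shutdown timeout
  }]
};
```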
Build + deploy:
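```bash
cd /home/ubuntu/vitara-platform/admin-dashboard/server
# && stops the chain if tsc fails, so the running app is never deleted for a broken build
npx tsc && \
  pm2 delete vitara-admin-api && \
  pm2 start ecosystem.config.cjs && \
  pm2 save
```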
Validation: pm2 list shows 0 restarts. Monitor for 48h over weekend.
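Setting kill_timeout: 10000 only helps if the server actually exits within that window. On stop or restart, PM2 sends SIGINT, and once kill_timeout elapses it sends SIGKILL. The sketch below shows a matching shutdown path for a Node http.Server (Express's app.listen() returns one). The 9-second budget, the function names, and the place where DB pools get closed are all assumptions.

```typescript
import * as http from "node:http";

// Stop accepting connections, let in-flight requests finish, and give up slightly
// before PM2's kill_timeout (10 s) so the exit is ours rather than SIGKILL's.
export function shutdownGracefully(server: http.Server, budgetMs = 9000): Promise<"drained" | "timeout"> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      server.closeAllConnections?.(); // Node 18.2+: force-close whatever is left
      resolve("timeout");
    }, budgetMs);
    server.close(() => {
      clearTimeout(timer);
      resolve("drained");
    });
    server.closeIdleConnections?.(); // Node 18.2+: idle keep-alive sockets would hold close() open
  });
}

// Wire it to the signals PM2 sends on stop/restart.
export function installShutdownHandlers(server: http.Server): void {
  for (const signal of ["SIGINT", "SIGTERM"] as const) {
    process.once(signal, async () => {
      const outcome = await shutdownGracefully(server);
      // Close DB pools and other resources here before exiting (codebase-specific).
      process.exit(outcome === "drained" ? 0 : 1);
    });
  }
}
```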
Week 2 (Feb 26 - Mar 4): Voice Quality + Mandarin Decision
Theme: Make calls sound professional. Decide Mandarin fate.
Day 1-2: P1 Prompt Fixes (All 5 Items)
| Task | P1 Ref | Detail |
|---|---|---|
| Restore CONVERSATION STYLE to all 8 non-Router prompts | P1 #1 | Add back warm, professional tone guidance. Not filler phrases — just style: "Be warm, concise, and professional. Use natural transitions." |
| Add slot collision check to Booking EN/ZH prompts | P1 #2 | Add instruction: "Before calling create_appointment, confirm the patient doesn't already have an appointment that day by checking the tool result." Server already prevents double-booking, but LLM should warn the patient. |
| Add transfer_call tool to Booking + Registration assistants | P1 #4 | Add transfer-call-d95ed81e to toolIds in booking-en.md, booking-zh.md, registration-en.md, registration-zh.md YAML frontmatter |
| Add handoff_to_router_v3 to Registration EN/ZH in squad YAML | P1 #5 | Add handoff tool to Registration squad members in vitaravox-v3.yml so patients can escape the registration flow |
| Warm SOAP clients on PM2 startup | P1 #3 | Add warmSoapClients() call in index.ts after server.listen() — pre-fetch WSDL for Schedule, Demographic, Provider services |
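Whether a patient "already has an appointment that day" depends on whose day it is: 11:30 pm in Vancouver is already the next day in UTC. If the booking tool result carries an explicit same-day flag for the LLM to act on, the comparison has to use the clinic's timezone. The sketch below assumes appointment start times are ISO-8601 instants and the clinic timezone is an IANA name taken from the clinic config. The function names and the America/Vancouver example are illustrative.

```typescript
// Calendar date (YYYY-MM-DD) of an instant as seen in the given IANA timezone.
export function clinicDate(isoTimestamp: string, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(new Date(isoTimestamp));
  const part = (type: string) => parts.find((p) => p.type === type)?.value ?? "";
  return `${part("year")}-${part("month")}-${part("day")}`;
}

// Existing appointment starts that fall on the same clinic-local day as the requested slot.
export function sameDayAppointments(requestedStart: string, existingStarts: string[], timeZone: string): string[] {
  const target = clinicDate(requestedStart, timeZone);
  return existingStarts.filter((start) => clinicDate(start, timeZone) === target);
}
```

formatToParts is used instead of relying on a locale's default date order, which can change between ICU versions.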
SOAP warmup implementation:
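```typescript
// In index.ts. EmrAdapterFactory and logger are the project's existing imports.

// Only adapters that implement warmClients() can be pre-warmed.
interface WarmableAdapter {
  warmClients(): Promise<void>;
}

function isWarmable(adapter: unknown): adapter is WarmableAdapter {
  return typeof (adapter as Partial<WarmableAdapter> | null | undefined)?.warmClients === 'function';
}

async function warmSoapClients(): Promise<void> {
  // Warm the adapter for the launch clinic (clinicId from config)
  const clinicId = process.env.LAUNCH_CLINIC_ID;
  if (!clinicId) {
    logger.warn('LAUNCH_CLINIC_ID not set; skipping SOAP warmup');
    return;
  }
  try {
    logger.info({ clinicId }, 'Warming SOAP clients for launch clinic...');
    const adapter = await EmrAdapterFactory.getInstance().getAdapter(clinicId);
    if (isWarmable(adapter)) {
      await adapter.warmClients(); // pre-fetch WSDL for Schedule, Demographic, Provider
      logger.info({ clinicId }, 'SOAP clients warmed successfully');
    } else {
      logger.warn({ clinicId }, 'EMR adapter has no warmClients(); warmup skipped');
    }
  } catch (err) {
    logger.warn({ err }, 'SOAP client warmup failed (non-fatal)');
  }
}

// Call from the server.listen() callback without awaiting, so warmup never delays startup:
//   server.listen(PORT, () => { void warmSoapClients(); });
```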
Push prompts to Vapi:
Validation: Make 3 test calls (booking, reschedule, registration). Verify that the conversation tone sounds natural. Verify that the first call of the day has no long delay.
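Warmup is non-fatal, but a WSDL fetch can hang, for example against an OSCAR host that accepts the connection and never answers. Unless the SOAP client sets its own request timeout, nothing ever logs that warmup stalled. A small timeout wrapper makes that visible. withTimeout and the 15-second budget are illustrative. The wrapper only stops waiting; it does not cancel the underlying request.

```typescript
// Resolves with the task's value, or rejects once `ms` elapses. It stops waiting;
// it does not cancel the underlying request.
export function withTimeout<T>(task: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical call site in the server.listen() callback:
//   void withTimeout(warmSoapClients(), 15_000, 'SOAP warmup')
//     .catch((err) => logger.warn({ err }, 'SOAP warmup did not finish'));
```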
Day 3-4: Mandarin Testing Sprint
This is a focused 2-day test to make a go/no-go decision.
Test matrix (8 scenarios):
| # | Scenario | Pass Criteria |
|---|---|---|
| 1 | Call, say "中文" ("Chinese") to trigger ZH track | Router detects and hands off to Patient-ID-ZH within 3s |
| 2 | Chinese caller identifies as existing patient | Phone lookup succeeds, name confirmed in Chinese |
| 3 | Chinese caller books appointment | Slot found, time communicated in Chinese, booking confirmed |
| 4 | Chinese caller reschedules | Existing appointment listed, new slot found, reschedule confirmed |
| 5 | Chinese caller registers as new patient | Name, DOB, phone collected in Chinese, registered |
| 6 | Chinese caller says English name (e.g., "John") | TTS doesn't mangle the English name |
| 7 | Chinese caller triggers emergency keywords (胸痛, "chest pain") | 911 redirect fires |
| 8 | Chinese caller requests human (转人工, "transfer to a human agent") | transfer_call fires |
Scoring:
- 7-8 pass → Mandarin launches with English
- 5-6 pass → Mandarin launches with documented limitations
- <5 pass → Mandarin disabled for launch. Router prompt updated: "We currently support English only. Mandarin support coming soon."
How to disable Mandarin if needed: in the router-v3.md prompt, replace the language detection section with:

```markdown
## LANGUAGE
Currently English only. If the caller speaks Mandarin or requests Chinese:
Say: "I'm sorry, we currently support English only.
For Mandarin assistance, please call the clinic directly at [clinic phone]."
Do NOT route to Chinese track agents.
```
This is a business decision, not a technical failure. A buggy Mandarin experience is worse than no Mandarin. Better to launch English-only and add Mandarin in a v3.1 patch.
Day 5: Fix Whatever Mandarin Testing Reveals
Reserve this day for fixing issues found in Mandarin testing. Issues seen before that are likely to come up again:
- GPT-4o inserting spaces between Chinese characters → monitor; may need the prompt instruction "Never add spaces between Chinese characters"
- Azure TTS pronunciation of English names in Chinese context → may need phonetic hints
- Deepgram nova-2 ZH transcription accuracy → if poor, consider switching to AssemblyAI Universal for ZH track
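Monitoring the spacing issue is easier with a check than by reading transcripts. The check below flags assistant messages where two Han characters are separated only by whitespace. It is a sketch that assumes you can pull the assistant's text from call logs. The regex uses Unicode script property escapes (\p{Script=Han}), which require the u flag.

```typescript
// Two Han characters separated only by whitespace, e.g. "预 约" instead of "预约".
// Built with the RegExp constructor; \p{...} property escapes require the "u" flag.
const SPACED_HAN = new RegExp("\\p{Script=Han}\\s+\\p{Script=Han}", "u");

export function hasSpacedChinese(text: string): boolean {
  return SPACED_HAN.test(text);
}

// Share of assistant messages showing the pattern; track it per day to see whether a prompt fix helped.
export function spacedChineseRate(messages: string[]): number {
  if (messages.length === 0) return 0;
  return messages.filter(hasSpacedChinese).length / messages.length;
}
```

A space between an English name and Chinese text (for example, "John 先生") is not flagged, because only Han-to-Han gaps match.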
Week 3 (Mar 5-11): Clinic Onboarding + Operational Readiness
Theme: Configure the actual clinic. Build the safety net.
Day 1-2: Clinic Onboarding
Complete all 7 required pre-launch checks for the target clinic:
| Check | What's Needed | Who Provides It |
|---|---|---|
| 1. Clinic info | Name, phone, address, timezone | Clinic admin |
| 2. Business hours | Mon-Fri hours, closed days | Clinic admin |
| 3. Providers | Provider names + OSCAR provider IDs | Clinic admin + OSCAR |
| 4. EMR connection | OSCAR SOAP URL + credentials, verified connectivity | VitaraVox team |
| 5. Vapi phone | Assign Telnyx number, configure in Vapi squad | VitaraVox team |
| 6. Privacy officer | Name + email (PIPEDA requirement) | Clinic admin |
| 7. Encrypted credentials | Verify AES-256-GCM encryption works for stored OSCAR creds | Automated check |
Run onboarding validation:
```bash
curl -s https://api-dev.vitaravox.ca/api/admin/clinics/{clinicId}/onboarding \
  -H "Authorization: Bearer {token}" | jq '.data.checks'
```
All 7 required checks must show passed: true.
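To gate a deploy on this, the same check can run as a script. This sketch assumes data.checks is an array of objects with at least name and passed fields; the curl/jq call gives the path but not the exact shape. The optional required field is a guess. It needs Node 18+ for the global fetch.

```typescript
// Assumed shape of each entry under data.checks (fields beyond `passed` are guesses).
interface OnboardingCheck {
  name: string;
  passed: boolean;
  required?: boolean; // if absent, the check is treated as required
}

// Names of required checks that have not passed; an empty array means ready to launch.
export function failingRequiredChecks(checks: OnboardingCheck[]): string[] {
  return checks.filter((c) => c.required !== false && !c.passed).map((c) => c.name);
}

// Calls the onboarding endpoint and throws unless every required check passes.
export async function assertClinicReady(baseUrl: string, clinicId: string, token: string): Promise<void> {
  const url = `${baseUrl}/api/admin/clinics/${encodeURIComponent(clinicId)}/onboarding`;
  const res = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
  if (!res.ok) throw new Error(`onboarding endpoint returned HTTP ${res.status}`);
  const body = (await res.json()) as { data?: { checks?: OnboardingCheck[] } };
  const checks = body.data?.checks;
  if (!Array.isArray(checks) || checks.length === 0) throw new Error("no onboarding checks returned");
  const failing = failingRequiredChecks(checks);
  if (failing.length > 0) throw new Error(`onboarding incomplete: ${failing.join(", ")}`);
}
```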
Day 3: Backup Verification
| Task | Detail |
|---|---|
| Test backup script | Run bash /home/ubuntu/vitara-platform/scripts/backup-db.sh manually |
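| Test restore | Create a test database and restore the latest backup into it (pg_restore for custom/tar/directory-format dumps; psql for plain SQL dumps). Then verify data integrity, e.g. by comparing row counts of key tables against production |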
| Verify cron | crontab -l shows daily 2:00 AM backup |
| Test off-site copy | scp or rsync latest backup to a second location (even another directory is better than nothing) |
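Running crontab -l confirms the schedule, but not that the job actually ran. The sketch below finds the newest file in the backup directory and checks two things: that it is younger than 26 hours (one daily run plus slack) and that it is not empty. Both the directory and the 26-hour threshold are assumptions, since this plan does not say where backup-db.sh writes its dumps.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

export interface LatestBackup {
  path: string;
  sizeBytes: number;
  ageHours: number;
}

// Newest regular file in `dir` by modification time, or null if the directory has none.
export function latestBackup(dir: string, nowMs: number = Date.now()): LatestBackup | null {
  let newest: { path: string; mtimeMs: number; size: number } | null = null;
  for (const name of readdirSync(dir)) {
    const path = join(dir, name);
    const stat = statSync(path);
    if (stat.isFile() && (newest === null || stat.mtimeMs > newest.mtimeMs)) {
      newest = { path, mtimeMs: stat.mtimeMs, size: stat.size };
    }
  }
  if (newest === null) return null;
  return { path: newest.path, sizeBytes: newest.size, ageHours: (nowMs - newest.mtimeMs) / 3_600_000 };
}

// A daily 2:00 AM job should never leave the newest dump older than ~26 h; a
// zero-byte file can mean the dump failed part-way.
export function backupIsFresh(dir: string, maxAgeHours = 26, nowMs: number = Date.now()): boolean {
  const latest = latestBackup(dir, nowMs);
  return latest !== null && latest.sizeBytes > 0 && latest.ageHours <= maxAgeHours;
}
```

Freshness does not replace the restore test. It only catches a cron job that silently stopped running between restore tests.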
Day 4: Monitoring Setup
| Task | Detail |
|---|---|
| Uptime Kuma health check | Verify GET /health is monitored, alerts fire on failure |
| PM2 error monitoring | pm2 monit is an interactive terminal dashboard and does not send alerts, so use a simple cron job that checks pm2 jlist for any status other than online |
| Slack/email alerts | Configure Uptime Kuma to notify on downtime (Slack webhook or email) |
| OSCAR connectivity alert | Health endpoint already checks OSCAR — verify it reports degraded when OSCAR is unreachable |
Simple PM2 watchdog (cron every 5 min):
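```bash
#!/bin/bash
# /home/ubuntu/vitara-platform/scripts/pm2-watchdog.sh
# crontab entry: */5 * * * * /home/ubuntu/vitara-platform/scripts/pm2-watchdog.sh
# cron's PATH is minimal; make sure pm2, node and jq are on it.
export PATH="/usr/local/bin:/usr/bin:/bin:$PATH"
APP="vitara-admin-api"
# Select by name: .[0] is just whichever process PM2 lists first.
STATUS=$(pm2 jlist | jq -r --arg app "$APP" '.[] | select(.name == $app) | .pm2_env.status')
if [ "$STATUS" != "online" ]; then
  # Slack incoming webhooks expect a JSON body with a "text" field.
  jq -n --arg text "ALERT: $APP is ${STATUS:-missing}" '{text: $text}' | \
    curl -s -X POST -H 'Content-type: application/json' --data @- \
      https://hooks.slack.com/services/YOUR/WEBHOOK/URL
  pm2 restart "$APP"
fi
```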
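The pre-launch gate "PM2 restart count stable (<5 in 48h)" can read the same pm2 jlist output. Each process carries a restart_time counter that pm2 reset sets back to zero, so reading it 48 hours after a reset gives the number directly. The sketch parses that JSON and selects the app by name, not by array position, so it still works if another process runs under the same PM2 daemon. Only the fields used are typed. appHealth and isStable are illustrative names.

```typescript
// The subset of `pm2 jlist` fields used here; PM2 emits many more.
interface Pm2Process {
  name: string;
  pm2_env: { status: string; restart_time: number };
}

export interface AppHealth {
  found: boolean;
  status: string;
  restarts: number;
}

// Finds the app by name in `pm2 jlist` JSON and reports its status and restart counter.
export function appHealth(jlistJson: string, appName: string): AppHealth {
  const processes = JSON.parse(jlistJson) as Pm2Process[];
  const app = processes.find((p) => p.name === appName);
  if (!app) return { found: false, status: "missing", restarts: 0 };
  return { found: true, status: app.pm2_env.status, restarts: app.pm2_env.restart_time };
}

// Checklist gate: online, with fewer than maxRestarts since the last `pm2 reset`.
export function isStable(health: AppHealth, maxRestarts = 5): boolean {
  return health.found && health.status === "online" && health.restarts < maxRestarts;
}

// Usage (needs `import { execFileSync } from "node:child_process"`):
//   appHealth(execFileSync("pm2", ["jlist"], { encoding: "utf8" }), "vitara-admin-api")
```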
Day 5: Operational Runbook
Create a single-page runbook for the on-call person (you, for now):
| Scenario | Action |
|---|---|
| Server unreachable | SSH to OCI, check pm2 status, restart if needed: pm2 restart vitara-admin-api |
| OSCAR SOAP timeout | Check /health endpoint. If OSCAR is down, nothing to do — circuit breaker protects. Notify clinic. |
| Vapi webhook errors | Check pm2 logs vitara-admin-api --lines 50. Look for auth failures or 500s. |
| Database connection refused | sudo systemctl status postgresql. If down: sudo systemctl start postgresql |
| SSL certificate expired | sudo certbot renew && sudo nginx -s reload |
| Need to see PHI for debugging | POST /api/admin/debug {"enabled": true} — auto-expires in 4 hours |
| PM2 keeps restarting | pm2 logs vitara-admin-api --err --lines 100 to find crash cause. May need to roll back last deploy. |
| Patient booked wrong slot | Check audit_logs + OSCAR directly. Manual fix in OSCAR admin UI. |
| Need to disable voice agent | Set clinic status to inactive in admin dashboard. Calls go to transfer number. |
Week 4 (Mar 12-18): Staging Calls + Buffer
Theme: Simulate real usage. Fix what breaks. Keep buffer for surprises.
Day 1-2: Staging Call Marathon
Run 20+ end-to-end calls simulating real patients:
| Call Type | Count | Variations |
|---|---|---|
| New patient booking | 5 | Morning slot, afternoon slot, specific doctor, any doctor, next available |
| Existing patient booking | 3 | Phone lookup success, phone lookup fail → name search |
| Reschedule | 3 | Pick from list, change doctor, change week |
| Cancel | 2 | With reason, without reason |
| Registration (new patient) | 3 | Full flow with health card, without health card, add to waitlist |
| Edge cases | 4 | Emergency keywords, request human, caller hangs up mid-flow, no available slots |
For each call, verify:
- [ ] Patient identified correctly (or fallback to name search works)
- [ ] Appointment booked/modified in OSCAR (check OSCAR admin UI)
- [ ] CallLog written to database with correct metadata
- [ ] No duplicate bookings (check OSCAR schedule view)
- [ ] Conversation sounded natural (not robotic, no dead air >3s)
- [ ] Call ended cleanly (log_call_metadata fired)
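The "no dead air >3s" item is easier to check consistently from timestamps than by ear. This sketch assumes each call's transcript can be exported as turns with start and end offsets in seconds. The Turn shape is hypothetical and would need mapping from whatever the call log or Vapi's call data actually stores.

```typescript
// Hypothetical transcript shape; map from whatever the call log actually stores.
interface Turn {
  role: "assistant" | "user";
  startSec: number; // seconds from call start
  endSec: number;
}

export interface Gap {
  afterRole: Turn["role"]; // a gap after a user turn often means the agent was waiting on a tool
  fromSec: number;
  seconds: number;
}

// Silences longer than thresholdSec between consecutive turns.
export function deadAir(turns: Turn[], thresholdSec = 3): Gap[] {
  if (turns.length === 0) return [];
  const ordered = [...turns].sort((a, b) => a.startSec - b.startSec);
  const gaps: Gap[] = [];
  // Track the turn that ends latest so far, so overlapping speech (barge-in) is not counted as silence.
  let latest = ordered[0];
  for (const turn of ordered.slice(1)) {
    const silence = turn.startSec - latest.endSec;
    if (silence > thresholdSec) {
      gaps.push({ afterRole: latest.role, fromSec: latest.endSec, seconds: silence });
    }
    if (turn.endSec > latest.endSec) latest = turn;
  }
  return gaps;
}
```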
Day 3: Fix Issues from Staging Calls
Reserve this entire day for fixing whatever the staging marathon reveals. Common patterns:
- Prompt tweaks (LLM says the wrong thing in edge cases)
- Timing issues (dead air during tool calls → add request-response-delayed messages)
- OSCAR data issues (provider IDs don't match, schedule not configured)
Day 4-5: Buffer
Do not schedule work here. This is your safety net for:
- Issues discovered during staging that take longer than a day
- Clinic admin delays in providing information
- OSCAR configuration issues on the clinic side
- Last-minute Mandarin fixes if it was conditionally included
If nothing goes wrong (unlikely), use this time for:
- Writing a "What's New" email to the clinic staff
- Updating the changelog
- Setting up a post-launch check-in schedule with the clinic
Go-Live: March 30 Week (Mar 19-30)
Pre-Launch Checklist (Mar 19)
Security:
- [ ] JWT_SECRET and JWT_REFRESH_SECRET are not dev defaults
- [ ] DATABASE_URL uses a strong password
- [ ] ENCRYPTION_KEY encrypts/decrypts correctly
- [ ] VAPI_WEBHOOK_SECRET is set and enforced
- [ ] toolCallId idempotency is active
- [ ] Webhook tool calls are audited
Voice Quality:
- [ ] All 8 non-Router prompts have CONVERSATION STYLE sections
- [ ] transfer_call available on all agents
- [ ] handoff_to_router_v3 on Registration agents
- [ ] SOAP clients warm on startup
- [ ] Mandarin decision made and implemented
Clinic Configuration:
- [ ] All 7 onboarding checks pass
- [ ] OSCAR SOAP connection verified
- [ ] Clinic hours + holidays configured
- [ ] Providers mapped to OSCAR IDs
- [ ] Privacy officer documented
- [ ] Vapi phone number assigned and tested
Operations:
- [ ] PM2 restart count stable (<5 in 48h)
- [ ] Backup tested with successful restore
- [ ] Uptime Kuma monitoring active
- [ ] Alert notifications working (Slack/email)
- [ ] Runbook written and accessible
- [ ] Data retention job running (3:00 AM daily)
Soft Launch (Mar 24-28)
- Day 1: Enable for clinic staff only (internal testing with real OSCAR data)
- Day 2-3: Enable for first 10% of callers (route subset of calls to Vapi number)
- Day 4-5: Monitor call logs, fix issues, expand to 50%
Full Launch (Mar 30)
- Route all calls to Vapi number
- Monitor first 4 hours actively (watch PM2 logs + call logs in real time)
- Have OSCAR admin UI open to verify appointments are landing correctly
- Keep clinic's original phone line as instant rollback (just revert the phone routing)
Mandarin Decision Matrix
Make the call by end of Week 2 (March 4).
```text
                 ┌─────────────────────────────┐
                 │  Mandarin Testing Results   │
                 │    (8 scenarios tested)     │
                 └──────────────┬──────────────┘
                                │
          ┌─────────────────────┼─────────────────────┐
          │                     │                     │
      7-8 pass              5-6 pass               <5 pass
          │                     │                     │
          ▼                     ▼                     ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│ LAUNCH WITH       │ │ LAUNCH WITH       │ │ ENGLISH ONLY      │
│ FULL ZH           │ │ DOCUMENTED        │ │                   │
│                   │ │ LIMITATIONS       │ │ Disable ZH in     │
│ No changes        │ │                   │ │ Router prompt     │
│ needed            │ │ Add warning to    │ │                   │
│                   │ │ clinic: "Mandarin │ │ "Mandarin support │
│                   │ │ may have accent   │ │ coming soon"      │
│                   │ │ recognition       │ │                   │
│                   │ │ limitations"      │ │ Add to v3.1       │
│                   │ │                   │ │ roadmap           │
└───────────────────┘ └───────────────────┘ └───────────────────┘
```
Advisory Items Mapped to This Plan
From Security Analysis (30 findings)
| Finding | Severity | Week | Action |
|---|---|---|---|
| #1 Hardcoded JWT defaults | CRITICAL | Week 1 | Rotate all secrets |
| #2 Dev mode auth skipped | CRITICAL | Week 1 | Verify production mode is enforced |
| #6 Encryption key not enforced | CRITICAL | Week 1 | Verify key works |
| #10 Missing idempotency | MEDIUM | Week 1 | Add toolCallId dedup table |
| #19 Missing webhook audit | MEDIUM | Week 1 | Add tool-call audit logging |
| #3 VAPI_API_KEY in webhook | HIGH | Deferred | Low risk for single clinic |
| #4 metadata.clinicId unvalidated | HIGH | Deferred | Single clinic = low risk |
| #7 Token management gaps | HIGH | Deferred | Acceptable for pilot |
| #12 Rate limiting bypasses | MEDIUM | Deferred | Single clinic = low traffic |
From Infrastructure Advisory
| Item | Week | Action |
|---|---|---|
| 6666 PM2 restarts | Week 1 | Reset counter, add restart limits, switch to compiled JS |
| No centralized logging | Deferred | PM2 logs + Uptime Kuma sufficient for 1 clinic |
| Single server (no HA) | Deferred | Acceptable risk for pilot with monitoring |
| No disaster recovery plan | Week 3 | Verify backup + create runbook |
| Dev passwords in production | Week 1 | Rotate all |
From P1 Fix List
| Item | Week | Action |
|---|---|---|
| Restore CONVERSATION STYLE | Week 2 | Add to all 8 non-Router prompts |
| Slot collision in prompts | Week 2 | Add instruction to Booking + Modification |
| SOAP warmup on startup | Week 2 | Add warmSoapClients() to index.ts |
| transfer_call on Booking + Registration | Week 2 | Add to toolIds + squad YAML |
| handoff_to_router_v3 on Registration | Week 2 | Add to squad YAML |
Explicitly Deferred (Post-Launch)
| Item | Why Deferred |
|---|---|
| Redis (distributed state) | Single instance handles 1 clinic |
| ECS Fargate (auto-scaling) | OCI ARM is sufficient for pilot volume |
| LiteLLM (LLM proxy) | GPT-4o hardwired is fine for 1 clinic |
| LiveKit (voice pipeline) | Vapi works, don't touch before launch |
| Observability (Datadog/Grafana) | PM2 logs + health checks are enough for pilot |
| Multi-clinic timezone support | Single clinic = one timezone |
| Canadian data residency (Azure OpenAI) | Acceptable risk for pilot with BAA |
| WAF / DDoS protection | Low traffic, low risk for pilot |
Risk Register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| OSCAR SOAP connection fails on launch day | Medium | High | Test daily during Week 3-4. Have clinic's direct line as fallback. |
| PM2 crash loop returns | Low | High | Compiled JS + restart limits. Watchdog cron restarts and alerts. |
| LLM hallucinates appointment details | Low | High | Server-side validation catches wrong dates/times, and the server already rejects double-bookings. The prompt-level slot collision check makes the agent warn the patient before booking. |
| Mandarin calls garbled | Medium | Medium | Mandarin go/no-go decision by Mar 4. Easy to disable in Router prompt. |
| Clinic staff unfamiliar with system | Medium | Medium | Pre-launch training call. Written runbook for common questions. |
| Patient data breach | Very Low | Very High | Rotated secrets, encrypted creds, HMAC auth, audit trail, PHI redaction. |
Post-Launch Roadmap (April+)
After a successful pilot, return to the enterprise stack plan in this order:
| Month | Phase | Trigger |
|---|---|---|
| April | Phase 1: Redis | Preparing for second clinic |
| May | Phase 2: RDS + Observability | Before third clinic |
| June | Phase 3: ECS Fargate | Before scaling past 5 clinics |
| July | Phase 4: LiteLLM | When per-clinic cost tracking needed |
| Q3-Q4 | Phase 5: LiveKit | When 5-language support or Vapi costs demand it |