First Clinic Launch Plan: March 30, 2026

4-Week Execution Roadmap

Created: February 18, 2026

Constraints

Deadline: March 30, 2026 (hard)
Target: Single clinic deployment (English primary, Mandarin secondary)
Infrastructure: Current OCI ARM instance — no cloud migration
Automation: Claude Code available for code changes and deployments
Non-goals: Redis, ECS/Fargate, LiteLLM, LiveKit, horizontal scaling

What Must Be True on March 30

Real patient data is protected (rotated secrets, encrypted credentials, idempotent operations)
English booking/rescheduling/cancellation calls work reliably end-to-end
Mandarin track is either validated or explicitly disabled for launch
One clinic is fully onboarded (all 7 required pre-launch checks pass)
Operational runbook exists (what to do when things break at 2am)
Backup is tested (restore verified, not just dump verified)

Week 1 (Feb 19-25): Security Hardening + Stability

Theme: Make the system safe for real patient data.

Day 1-2: Secret Rotation & Environment Hardening

Task	Advisory Ref	Detail
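Rotate JWT_SECRET	Security Finding #1	Generate 64-char random hex: `openssl rand -hex 32`. Update `.env` and any copy in `ecosystem.config.cjs` (a stale value set in PM2's env can take precedence over `.env`)
Rotate JWT_REFRESH_SECRET	Security Finding #1	Same procedure, with its own separately generated 64-char hex value (never reuse JWT_SECRET)
Rotate DATABASE_URL password	Security Finding #1	`ALTER USER vitara WITH PASSWORD '...';` then update the password inside `DATABASE_URL` in `.env` (URL-encode any special characters)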
Fix CORS_ORIGIN in `.env`	Security Finding #1	Change to `https://dev.vitaravox.ca` (match ecosystem.config.cjs)
Verify OSCAR_SOAP_PASSWORD	Security Finding #1	Confirm this is the real clinic credential, not a test default
Verify ENCRYPTION_KEY is real	Security Finding #6	Already set (64-char hex) — confirm it encrypts/decrypts correctly

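Validation: Restart with `pm2 restart ecosystem.config.cjs --update-env` so PM2 also picks up values changed in the ecosystem file, then confirm admin dashboard login works, Vapi webhooks authenticate, and OSCAR SOAP connects. Expect existing admin sessions to be logged out: tokens signed with the old JWT secrets no longer verify.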
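To make the ENCRYPTION_KEY check concrete, here is a minimal round-trip sketch using Node's built-in `crypto` module. It assumes the 64-char hex key is used directly as a 32-byte AES-256-GCM key with a 12-byte IV; `checkEncryptionKey` is a hypothetical helper, and the app's real format (IV length, where the auth tag is stored) may differ. The stronger test is still decrypting an already-stored OSCAR credential with the app's own decrypt function.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical helper: confirms a hex key survives an AES-256-GCM round trip.
// Assumption: the key is used as raw key bytes (no KDF) with a 12-byte IV.
function checkEncryptionKey(hexKey: string): boolean {
  // 64 hex chars = 32 bytes, the key size AES-256 requires.
  if (!/^[0-9a-fA-F]{64}$/.test(hexKey)) return false;
  const key = Buffer.from(hexKey, "hex");
  const iv = randomBytes(12);
  const plaintext = "vitara-encryption-self-test";

  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const authTag = cipher.getAuthTag();

  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(authTag); // final() throws if the tag does not verify
  const decrypted = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  return decrypted.toString("utf8") === plaintext;
}
```

On the server, a one-off script could print `checkEncryptionKey(process.env.ENCRYPTION_KEY ?? "")`; a `false` means the key is malformed, not merely unused.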

Day 3-4: Idempotency + Webhook Audit Trail

Task	Advisory Ref	Detail
Add toolCallId dedup	Security Finding #10	Create `processed_tool_calls` table (tool_call_id TEXT PK, result JSONB, clinic_id, tool_name, created_at, expires_at; migration below). Check it before processing, cache the result, and return the cached result on retry. 24h TTL with daily cleanup.
Add webhook tool-call audit logging	Security Finding #19	On every tool call: write to `audit_logs` table with `action: 'vapi_tool_call'`, tool name, clinicId, callId, demographicId (if available), outcome (success/error). Use existing audit infrastructure.

-- Migration: add processed_tool_calls
CREATE TABLE processed_tool_calls (
  tool_call_id TEXT PRIMARY KEY,
  result JSONB NOT NULL,
  clinic_id TEXT,
  tool_name TEXT,
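  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  expires_at TIMESTAMPTZ NOT NULL DEFAULT NOW() + INTERVAL '24 hours'
);
-- Supports the daily cleanup: DELETE FROM processed_tool_calls WHERE expires_at < NOW();
CREATE INDEX idx_ptc_expires ON processed_tool_calls(expires_at);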

Validation: Send the same tool-call webhook payload twice (same toolCallId). The second response should be the cached result, and no second booking should appear in OSCAR. Check the audit_logs table for the tool call entries.
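The check-then-cache flow can be sketched as below. `ToolCallStore`, `InMemoryToolCallStore`, and `runIdempotent` are hypothetical names; the in-memory store stands in for the `processed_tool_calls` table, where `get` would be a `SELECT` filtered on `expires_at > NOW()` and `put` an `INSERT ... ON CONFLICT (tool_call_id) DO NOTHING`. The in-flight map collapses concurrent retries of one toolCallId onto a single execution, which covers the single PM2 instance in scope but would not coordinate across multiple instances.

```typescript
// Hypothetical persistence interface; production would back it with processed_tool_calls.
interface ToolCallStore {
  get(toolCallId: string): Promise<unknown | undefined>;
  put(toolCallId: string, result: unknown, ttlMs: number): Promise<void>;
}

// In-memory stand-in for the table, for illustration and tests only.
class InMemoryToolCallStore implements ToolCallStore {
  private rows = new Map<string, { result: unknown; expiresAt: number }>();

  async get(toolCallId: string): Promise<unknown | undefined> {
    const row = this.rows.get(toolCallId);
    return row && row.expiresAt > Date.now() ? row.result : undefined;
  }

  async put(toolCallId: string, result: unknown, ttlMs: number): Promise<void> {
    // First writer wins, mirroring INSERT ... ON CONFLICT DO NOTHING.
    if (!this.rows.has(toolCallId)) {
      this.rows.set(toolCallId, { result, expiresAt: Date.now() + ttlMs });
    }
  }
}

const TTL_MS = 24 * 60 * 60 * 1000; // 24h, matching the migration default
const inFlight = new Map<string, Promise<unknown>>();

// Runs handler at most once per toolCallId (within this process and the TTL).
function runIdempotent<T>(
  store: ToolCallStore,
  toolCallId: string,
  handler: () => Promise<T>,
): Promise<T> {
  // Synchronous check-and-set: concurrent retries share the same promise.
  const pending = inFlight.get(toolCallId);
  if (pending) return pending as Promise<T>;

  const run = (async () => {
    const cached = await store.get(toolCallId);
    if (cached !== undefined) return cached as T; // retry: return the cached result
    const result = await handler();
    await store.put(toolCallId, result, TTL_MS); // failures are not cached, so a retry re-runs
    return result;
  })().finally(() => inFlight.delete(toolCallId));

  inFlight.set(toolCallId, run);
  return run;
}
```

Not caching failures is a deliberate choice in this sketch: a retry after an OSCAR timeout gets a fresh attempt rather than a replayed error.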

Day 5: PM2 Stability Investigation

Task	Advisory Ref	Detail
Investigate 6666 restart count	Infra Finding #1	`pm2 reset vitara-admin-api` to zero the counter. Monitor for 24h. If restarts accumulate, check PM2 error logs for crash patterns.
Add PM2 restart delay	Infra recommendation	Add `restart_delay: 5000` and `max_restarts: 50` to ecosystem.config.cjs
Deploy from compiled JS	Best practice	Switch from `tsx src/index.ts` to `node dist/index.js` in ecosystem.config — faster startup, lower memory

// ecosystem.config.cjs — updated
module.exports = {
  apps: [{
    name: 'vitara-admin-api',
    script: 'dist/index.js',                    // compiled, not tsx
    cwd: '/home/ubuntu/vitara-platform/admin-dashboard/server',
    env: {
      NODE_ENV: 'production',
      PORT: 3002,
      CORS_ORIGIN: 'https://dev.vitaravox.ca'
    },
    watch: false,
    max_memory_restart: '500M',
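    restart_delay: 5000,                         // wait 5s before each restart
    max_restarts: 50,                            // stop after 50 consecutive unstable restarts (uptime < min_uptime, default 1s)
    kill_timeout: 10000                          // ms from SIGINT to SIGKILL; keep above the app's graceful shutdown timeout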
  }]
};

Build + deploy:

# Chained with && so a failed build never deletes the running app
cd /home/ubuntu/vitara-platform/admin-dashboard/server && \
  npx tsc && \
  pm2 delete vitara-admin-api && \
  pm2 start ecosystem.config.cjs && \
  pm2 save

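Validation: `pm2 list` shows 0 restarts after the switch, and the count stays near zero while you monitor for 48h, ideally over a weekend (the launch checklist allows fewer than 5 in 48h). If the app fails to boot under plain `node` (for example, path aliases or extensionless ESM imports that tsx resolved), check `pm2 logs vitara-admin-api --err` and revert to the tsx-based config.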
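`kill_timeout` only helps if the app actually shuts down on the signal PM2 sends. By default PM2 sends SIGINT on stop/restart and escalates to SIGKILL once `kill_timeout` elapses, so the app's own shutdown deadline should sit below 10000 ms. A minimal sketch, assuming a plain Node `http.Server`; `installGracefulShutdown` is a hypothetical helper, and the server may already have equivalent logic that only needs its timeout checked against the config.

```typescript
import * as http from "node:http";

// Hypothetical helper: stop accepting connections, let in-flight requests finish,
// and force-exit before PM2's kill_timeout (10000 ms) escalates to SIGKILL.
function installGracefulShutdown(
  server: http.Server,
  timeoutMs = 8000,
  exit: (code: number) => void = (code) => process.exit(code),
): (signal: string) => void {
  let shuttingDown = false;

  const shutdown = (signal: string): void => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log(`${signal} received; closing HTTP server`);

    // Safety net in case open connections keep server.close() from completing.
    const forceExit = setTimeout(() => exit(1), timeoutMs);
    forceExit.unref();

    server.close((err) => {
      clearTimeout(forceExit);
      exit(err ? 1 : 0);
    });
  };

  process.on("SIGINT", () => shutdown("SIGINT")); // PM2's default kill signal
  process.on("SIGTERM", () => shutdown("SIGTERM"));
  return shutdown;
}
```

The 8000 ms default leaves a 2-second margin under `kill_timeout`, so the process normally exits on its own terms instead of being killed mid-request.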

Week 2 (Feb 26 - Mar 4): Voice Quality + Mandarin Decision

Theme: Make calls sound professional. Decide Mandarin fate.

Day 1-2: P1 Prompt Fixes (All 5 Items)

Task	P1 Ref	Detail
Restore CONVERSATION STYLE to all 8 non-Router prompts	P1 #1	Add back warm, professional tone guidance. Not filler phrases — just style: "Be warm, concise, and professional. Use natural transitions."
Add slot collision check to Booking EN/ZH prompts	P1 #2	Add instruction: "Before calling create_appointment, confirm the patient doesn't already have an appointment that day by checking the tool result." Server already prevents double-booking, but LLM should warn the patient.
Add `transfer_call` tool to Booking + Registration assistants	P1 #4	Add `transfer-call-d95ed81e` to toolIds in booking-en.md, booking-zh.md, registration-en.md, registration-zh.md YAML frontmatter
Add `handoff_to_router_v3` to Registration EN/ZH in squad YAML	P1 #5	Add handoff tool to Registration squad members in vitaravox-v3.yml so patients can escape registration flow
Warm SOAP clients on PM2 startup	P1 #3	Add `warmSoapClients()` call in index.ts after `server.listen()` — pre-fetch WSDL for Schedule, Demographic, Provider services

SOAP warmup implementation:

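// In index.ts (assumes EmrAdapterFactory and logger are already imported there).
// LAUNCH_CLINIC_ID is a new variable: add it to .env before deploying.
async function warmSoapClients(): Promise<void> {
  const clinicId = process.env.LAUNCH_CLINIC_ID;
  if (!clinicId) {
    logger.warn('LAUNCH_CLINIC_ID not set; skipping SOAP client warmup');
    return;
  }
  try {
    logger.info({ clinicId }, 'Warming SOAP clients for launch clinic...');
    const adapter = await EmrAdapterFactory.getInstance().getAdapter(clinicId);
    if (adapter && 'warmClients' in adapter) {
      // Only adapters that implement warmClients() (e.g. OSCAR SOAP) can be warmed.
      await (adapter as unknown as { warmClients: () => Promise<void> }).warmClients();
      logger.info({ clinicId }, 'SOAP clients warmed successfully');
    } else {
      logger.warn({ clinicId }, 'Adapter has no warmClients(); nothing to warm');
    }
  } catch (err) {
    logger.warn({ err }, 'SOAP client warmup failed (non-fatal)');
  }
}

// Call from the existing server.listen() callback without awaiting,
// so warmup never delays startup or request handling:
//   void warmSoapClients();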
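`warmClients()` itself is not shown in this plan. One way to keep a slow or unreachable OSCAR from stalling warmup is to warm each service independently with a per-service timeout, as in this sketch. `warmServices`, `withTimeout`, and the `warmOne` callback are hypothetical; `warmOne` would wrap whatever the adapter uses to create a SOAP client from each WSDL (Schedule, Demographic, Provider).

```typescript
type WarmResult = { service: string; ok: boolean; error?: string };

// Rejects if p has not settled within ms; always clears its timer.
function withTimeout<T>(p: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical: warms every service in parallel; one failure or hang does not block the others.
async function warmServices(
  services: string[],
  warmOne: (service: string) => Promise<unknown>,
  timeoutMs = 10_000,
): Promise<WarmResult[]> {
  const settled = await Promise.allSettled(
    services.map((service) => withTimeout(warmOne(service), timeoutMs, service)),
  );
  return settled.map((outcome, i) =>
    outcome.status === "fulfilled"
      ? { service: services[i], ok: true }
      : { service: services[i], ok: false, error: String(outcome.reason) },
  );
}
```

`Promise.allSettled` is the key choice here: a failed Demographic WSDL fetch still lets Schedule and Provider warm, and the per-service results can be logged individually.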

Push prompts to Vapi:

cd /home/ubuntu/vitara-platform/vapi-gitops
npm run push:dev

Validation: Make 3 test calls (booking, reschedule, registration). Verify the conversation tone sounds natural, and verify that the first call after a PM2 restart has no long pause on its first tool call.

Day 3-4: Mandarin Testing Sprint

This is a focused 2-day test to make a go/no-go decision.

Test matrix (8 scenarios):

#	Scenario	Pass Criteria
1	Call, say "中文" ("Chinese") to trigger ZH track	Router detects and hands off to Patient-ID-ZH within 3s
2	Chinese caller identifies as existing patient	Phone lookup succeeds, name confirmed in Chinese
3	Chinese caller books appointment	Slot found, time communicated in Chinese, booking confirmed
4	Chinese caller reschedules	Existing appointment listed, new slot found, reschedule confirmed
5	Chinese caller registers as new patient	Name, DOB, phone collected in Chinese, registered
6	Chinese caller says English name (e.g., "John")	TTS doesn't mangle the English name
7	Chinese caller triggers emergency keywords (胸痛, "chest pain")	911 redirect fires
8	Chinese caller requests human (转人工, "transfer to a human agent")	transfer_call fires

Scoring:

7-8 pass → Mandarin launches with English
5-6 pass → Mandarin launches with documented limitations
<5 pass → Mandarin disabled for launch. Router prompt updated: "We currently support English only. Mandarin support coming soon."

How to disable Mandarin if needed:

# In router-v3.md prompt, replace language detection section with:
## LANGUAGE
Currently English only. If the caller speaks Mandarin or requests Chinese:
Say: "I'm sorry, we currently support English only.
      For Mandarin assistance, please call the clinic directly at [clinic phone]."
Do NOT route to Chinese track agents.

This is a business decision, not a technical failure. A buggy Mandarin experience is worse than no Mandarin. Better to launch English-only and add Mandarin in a v3.1 patch.

Day 5: Fix Whatever Mandarin Testing Reveals

Reserve this day for fixing issues found in Mandarin testing. Known issues to watch for:

GPT-4o inserting spaces between Chinese characters → monitor; if it persists, add the prompt instruction "Never add spaces between Chinese characters"
Azure TTS pronunciation of English names in Chinese context → may need phonetic hints
Deepgram nova-2 ZH transcription accuracy → if poor, consider switching to AssemblyAI Universal for ZH track
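For the first item, "monitor" can be made concrete by scanning assistant text from call transcripts for spaces between Chinese characters. A minimal sketch, assuming transcripts are available to the server as plain strings (for example from call logs); `countSpacedHan` and `flagSpacedHan` are hypothetical helpers. The regex uses Unicode property escapes, so the TypeScript target must be ES2018 or later.

```typescript
// A Han character followed by ASCII spaces/tabs or the ideographic space (U+3000),
// where the next character is also Han. The lookahead lets adjacent gaps all match.
const SPACED_HAN = /\p{Script=Han}[ \t\u3000]+(?=\p{Script=Han})/gu;

// Hypothetical helper: number of spurious gaps between Chinese characters.
function countSpacedHan(text: string): number {
  return text.match(SPACED_HAN)?.length ?? 0;
}

// Hypothetical helper: transcript lines that contain at least one such gap.
function flagSpacedHan(lines: string[]): string[] {
  return lines.filter((line) => countSpacedHan(line) > 0);
}
```

A space between a Chinese character and a Latin name (e.g. before "John") is not counted, since that spacing is legitimate.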

Week 3 (Mar 5-11): Clinic Onboarding + Operational Readiness

Theme: Configure the actual clinic. Build the safety net.

Day 1-2: Clinic Onboarding

Complete all 7 required pre-launch checks for the target clinic:

Check	What's Needed	Who Provides It
1. Clinic info	Name, phone, address, timezone	Clinic admin
2. Business hours	Mon-Fri hours, closed days	Clinic admin
3. Providers	Provider names + OSCAR provider IDs	Clinic admin + OSCAR
4. EMR connection	OSCAR SOAP URL + credentials, verified connectivity	VitaraVox team
5. Vapi phone	Assign Telnyx number, configure in Vapi squad	VitaraVox team
6. Privacy officer	Name + email (PIPEDA requirement)	Clinic admin
7. Encrypted credentials	Verify AES-256-GCM encryption works for stored OSCAR creds	Automated check

Run onboarding validation:

curl -s https://api-dev.vitaravox.ca/api/admin/clinics/{clinicId}/onboarding \
  -H "Authorization: Bearer {token}" | jq '.data.checks'

All 7 required checks must show passed: true.

Day 3: Backup Verification

Task	Detail
Test backup script	Run `bash /home/ubuntu/vitara-platform/scripts/backup-db.sh` manually
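Test restore	Create a scratch database and restore the latest backup into it (`pg_restore` for custom-format dumps, `psql -f` for plain SQL dumps), then compare row counts of key tables such as `audit_logs` against production
Verify cron	`crontab -l` shows daily 2:00 AM backup
Test off-site copy	`scp` or `rsync` the latest backup to a second location (another host is ideal; another directory on the same disk guards against accidental deletion, not against losing the instance)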
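"Verify cron" confirms the schedule exists, not that it produced a file. A small freshness check can catch a silently failing job. This sketch assumes `backup-db.sh` writes one file per run into a single directory; the 26-hour threshold (24h plus slack) is an assumption, and `newestAgeHours`, `backupMtimes`, and `backupIsFresh` are hypothetical helpers.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: age in hours of the newest mtime, or null if there are none.
function newestAgeHours(mtimesMs: number[], now: number = Date.now()): number | null {
  if (mtimesMs.length === 0) return null;
  return (now - Math.max(...mtimesMs)) / 3_600_000;
}

// Hypothetical helper: mtimes of the regular files in the backup directory.
function backupMtimes(dir: string): number[] {
  return readdirSync(dir)
    .map((name) => statSync(join(dir, name)))
    .filter((stats) => stats.isFile())
    .map((stats) => stats.mtimeMs);
}

// A daily 2:00 AM job should never leave the newest file much older than 24h.
function backupIsFresh(dir: string, maxAgeHours = 26, now: number = Date.now()): boolean {
  const age = newestAgeHours(backupMtimes(dir), now);
  return age !== null && age <= maxAgeHours;
}
```

Running `backupIsFresh()` against the directory `backup-db.sh` writes to, from a cron job that alerts on `false`, turns "the cron line exists" into "a backup actually landed".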

Day 4: Monitoring Setup

Task	Detail
Uptime Kuma health check	Verify `GET /health` is monitored, alerts fire on failure
PM2 error monitoring	`pm2 monit` is an interactive dashboard and sends no alerts, so add a simple cron that checks `pm2 jlist` for any status other than `online` (watchdog below)
Slack/email alerts	Configure Uptime Kuma to notify on downtime (Slack webhook or email)
OSCAR connectivity alert	Health endpoint already checks OSCAR — verify it reports degraded when OSCAR is unreachable

Simple PM2 watchdog (cron every 5 min):

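#!/bin/bash
# /home/ubuntu/vitara-platform/scripts/pm2-watchdog.sh
# Run from the ubuntu user's crontab so it talks to the same PM2 daemon:
#   */5 * * * * /home/ubuntu/vitara-platform/scripts/pm2-watchdog.sh
# Cron has a minimal PATH; use absolute paths to pm2/jq (see `which pm2`) if needed.
APP=vitara-admin-api
WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
# Select the app by name rather than assuming it is the first PM2 process.
STATUS=$(pm2 jlist | jq -r --arg app "$APP" '.[] | select(.name == $app) | .pm2_env.status')
if [ "$STATUS" != "online" ]; then
  # Slack incoming webhooks expect a JSON body with a "text" field.
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "{\"text\": \"ALERT: $APP is ${STATUS:-not registered in PM2}\"}" "$WEBHOOK_URL"
  pm2 restart "$APP"
fi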

Day 5: Operational Runbook

Create a single-page runbook for the on-call person (you, for now):

Scenario	Action
Server unreachable	SSH to OCI, check `pm2 status`, restart if needed: `pm2 restart vitara-admin-api`
OSCAR SOAP timeout	Check `/health` endpoint. If OSCAR is down, nothing to do — circuit breaker protects. Notify clinic.
Vapi webhook errors	Check `pm2 logs vitara-admin-api --lines 50`. Look for auth failures or 500s.
Database connection refused	`sudo systemctl status postgresql`. If down: `sudo systemctl start postgresql`
SSL certificate expired	`sudo certbot renew && sudo nginx -s reload`
Need to see PHI for debugging	`POST /api/admin/debug {"enabled": true}` — auto-expires in 4 hours
PM2 keeps restarting	`pm2 logs vitara-admin-api --err --lines 100` to find crash cause. May need to roll back last deploy.
Patient booked wrong slot	Check `audit_logs` + OSCAR directly. Manual fix in OSCAR admin UI.
Need to disable voice agent	Set clinic status to `inactive` in admin dashboard. Calls go to transfer number.

Week 4 (Mar 12-18): Staging Calls + Buffer

Theme: Simulate real usage. Fix what breaks. Keep buffer for surprises.

Day 1-2: Staging Call Marathon

Run 20+ end-to-end calls simulating real patients:

Call Type	Count	Variations
New patient booking	5	Morning slot, afternoon slot, specific doctor, any doctor, next available
Existing patient booking	3	Phone lookup success, phone lookup fail → name search
Reschedule	3	Pick from list, change doctor, change week
Cancel	2	With reason, without reason
Registration (new patient)	3	Full flow with health card, without health card, add to waitlist
Edge cases	4	Emergency keywords, request human, caller hangs up mid-flow, no available slots

For each call, verify:

[ ] Patient identified correctly (or fallback to name search works)
[ ] Appointment booked/modified in OSCAR (check OSCAR admin UI)
[ ] CallLog written to database with correct metadata
[ ] No duplicate bookings (check OSCAR schedule view)
[ ] Conversation sounded natural (not robotic, no dead air >3s)
[ ] Call ended cleanly (log_call_metadata fired)

Day 3: Fix Issues from Staging Calls

Reserve this entire day for fixing whatever the staging marathon reveals. Common patterns:

Prompt tweaks (LLM says the wrong thing in edge cases)
Timing issues (dead air during tool calls → add request-response-delayed messages)
OSCAR data issues (provider IDs don't match, schedule not configured)

Day 4-5: Buffer

Do not schedule work here. This is your safety net for:

Issues discovered during staging that take longer than a day
Clinic admin delays in providing information
OSCAR configuration issues on the clinic side
Last-minute Mandarin fixes if it was conditionally included

If nothing goes wrong (unlikely), use this time for:

Writing a "What's New" email to the clinic staff
Updating the changelog
Setting up a post-launch check-in schedule with the clinic

Go-Live Window (Mar 19-30)

Pre-Launch Checklist (Mar 19)

Security:
  [  ] JWT_SECRET is not a dev default
  [  ] DATABASE_URL uses strong password
  [  ] ENCRYPTION_KEY encrypts/decrypts correctly
  [  ] VAPI_WEBHOOK_SECRET is set and enforced
  [  ] toolCallId idempotency is active
  [  ] Webhook tool calls are audited

Voice Quality:
  [  ] All 8 non-Router prompts have CONVERSATION STYLE sections
  [  ] transfer_call available on all agents
  [  ] handoff_to_router_v3 on Registration agents
  [  ] SOAP clients warm on startup
  [  ] Mandarin decision made and implemented

Clinic Configuration:
  [  ] All 7 onboarding checks pass
  [  ] OSCAR SOAP connection verified
  [  ] Clinic hours + holidays configured
  [  ] Providers mapped to OSCAR IDs
  [  ] Privacy officer documented
  [  ] Vapi phone number assigned and tested

Operations:
  [  ] PM2 restart count stable (<5 in 48h)
  [  ] Backup tested with successful restore
  [  ] Uptime Kuma monitoring active
  [  ] Alert notifications working (Slack/email)
  [  ] Runbook written and accessible
  [  ] Data retention job running (3:00 AM daily)

Soft Launch (Mar 24-28)

Day 1: Enable for clinic staff only (internal testing with real OSCAR data)
Day 2-3: Enable for first 10% of callers (route subset of calls to Vapi number)
Day 4-5: Monitor call logs, fix issues, expand to 50%

Full Launch (Mar 30)

Route all calls to Vapi number
Monitor first 4 hours actively (watch PM2 logs + call logs in real time)
Have OSCAR admin UI open to verify appointments are landing correctly
Keep clinic's original phone line as instant rollback (just revert the phone routing)

Mandarin Decision Matrix

Make the call by end of Week 2 (March 4).

                       ┌───────────────────────────┐
                       │ Mandarin Testing Results  │
                       │ (8 scenarios tested)      │
                       └─────────────┬─────────────┘
                                     │
             ┌───────────────────────┼───────────────────────┐
             │                       │                       │
         7-8 pass                5-6 pass                 <5 pass
             │                       │                       │
             ▼                       ▼                       ▼
   ┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐
   │ LAUNCH WITH       │   │ LAUNCH WITH       │   │ ENGLISH ONLY      │
   │ FULL ZH           │   │ DOCUMENTED        │   │                   │
   │                   │   │ LIMITATIONS       │   │ Disable ZH in     │
   │ No changes        │   │                   │   │ Router prompt     │
   │ needed            │   │ Add warning to    │   │                   │
   │                   │   │ clinic: "Mandarin │   │ "Mandarin support │
   │                   │   │ may have accent   │   │  coming soon"     │
   │                   │   │ recognition       │   │                   │
   │                   │   │ limitations"      │   │ Add to v3.1       │
   │                   │   │                   │   │ roadmap           │
   └───────────────────┘   └───────────────────┘   └───────────────────┘

Advisory Items Mapped to This Plan

From Security Analysis (30 findings)

Finding	Severity	Week	Action
#1 Hardcoded JWT defaults	CRITICAL	Week 1	Rotate all secrets
#2 Dev mode auth skipped	CRITICAL	Week 1	Verify production mode is enforced
#6 Encryption key not enforced	CRITICAL	Week 1	Verify key works
#10 Missing idempotency	MEDIUM	Week 1	Add toolCallId dedup table
#19 Missing webhook audit	MEDIUM	Week 1	Add tool-call audit logging
#3 VAPI_API_KEY in webhook	HIGH	Deferred	Low risk for single clinic
#4 metadata.clinicId unvalidated	HIGH	Deferred	Single clinic = low risk
#7 Token management gaps	HIGH	Deferred	Acceptable for pilot
#12 Rate limiting bypasses	MEDIUM	Deferred	Single clinic = low traffic

From Infrastructure Advisory

Item	Week	Action
6666 PM2 restarts	Week 1	Reset counter, add restart limits, switch to compiled JS
No centralized logging	Deferred	PM2 logs + Uptime Kuma sufficient for 1 clinic
Single server (no HA)	Deferred	Acceptable risk for pilot with monitoring
No disaster recovery plan	Week 3	Verify backup + create runbook
Dev passwords in production	Week 1	Rotate all

From P1 Fix List

Item	Week	Action
Restore CONVERSATION STYLE	Week 2	Add to all 8 non-Router prompts
Slot collision in prompts	Week 2	Add instruction to Booking + Modification
SOAP warmup on startup	Week 2	Add warmSoapClients() to index.ts
transfer_call on Booking + Registration	Week 2	Add to toolIds + squad YAML
handoff_to_router_v3 on Registration	Week 2	Add to squad YAML

Explicitly Deferred (Post-Launch)

Item	Why Deferred
Redis (distributed state)	Single instance handles 1 clinic
ECS Fargate (auto-scaling)	OCI ARM is sufficient for pilot volume
LiteLLM (LLM proxy)	GPT-4o hardwired is fine for 1 clinic
LiveKit (voice pipeline)	Vapi works, don't touch before launch
Observability (Datadog/Grafana)	PM2 logs + health checks are enough for pilot
Multi-clinic timezone support	Single clinic = one timezone
Canadian data residency (Azure OpenAI)	Acceptable risk for pilot with BAA
WAF / DDoS protection	Low traffic, low risk for pilot

Risk Register

Risk	Likelihood	Impact	Mitigation
OSCAR SOAP connection fails on launch day	Medium	High	Test daily during Week 3-4. Have clinic's direct line as fallback.
PM2 crash loop returns	Low	High	Compiled JS + restart limits. Watchdog cron restarts and alerts.
LLM hallucinates appointment details	Low	High	Server-side validation catches wrong dates/times, and the server itself rejects double-bookings; the prompt-level slot collision check only warns the patient.
Mandarin calls garbled	Medium	Medium	Mandarin go/no-go decision by Mar 4. Easy to disable in Router prompt.
Clinic staff unfamiliar with system	Medium	Medium	Pre-launch training call. Written runbook for common questions.
Patient data breach	Very Low	Very High	Rotated secrets, encrypted creds, HMAC auth, audit trail, PHI redaction.

Post-Launch Roadmap (April+)

After a successful pilot, return to the enterprise stack plan in this order:

Month	Phase	Trigger
April	Phase 1: Redis	Preparing for second clinic
May	Phase 2: RDS + Observability	Before third clinic
June	Phase 3: ECS Fargate	Before scaling past 5 clinics
July	Phase 4: LiteLLM	When per-clinic cost tracking needed
Q3-Q4	Phase 5: LiveKit	When 5-language support or Vapi costs demand it