Infrastructure & Operations Analysis

VitaraVox Enterprise Readiness Analysis

Date: February 17, 2026 | Updated: 2026-03-09 (v4.3.0)

Agent: Infrastructure & Operations Analyst

Update Log (v4.3.0 — 2026-03-09)

Since the original audit, the following changes have been deployed:

Change	Impact on This Analysis
SMS Booking Confirmation (Telnyx)	New outbound SMS path; Telnyx API key in env
OSCAR OAuth REST (preferRest flag)	Dual adapter path: SOAP + OAuth REST with split circuit breakers
Provider Config v3.1	3-level inheritance (global → clinic → provider)
Graceful Shutdown	10s drain timeout implemented in `index.ts:169-188`
Debug Manager	`VITARA_DEBUG` with 4h auto-expiry for trace-level PHI logging
Audit Middleware	Now global (all POST/PUT/PATCH/DELETE), not route-specific
express.json size limit	500KB limit added (`index.ts:49`)
9 pre-launch onboarding checks	Clinic readiness validation before go-live

Sections below retain the original audit text. Inline annotations marked [v4.3.0] where findings have been addressed.

INFRASTRUCTURE & OPERATIONAL READINESS ANALYSIS - VitaraVox Platform

EXECUTIVE SUMMARY

VitaraVox is a single-server, voice-enabled EMR appointment system running on a moderately sized Linux instance. Its foundational security is good, but several operational gaps stand between it and true enterprise production deployment. The current setup is development-grade with production aspirations: feature-rich, but lacking scalability, disaster recovery automation, and zero-downtime deployment capabilities.

Status: READY FOR BETA/PILOT | NOT READY FOR MULTI-CLINIC ENTERPRISE SCALE

1. INFRASTRUCTURE TOPOLOGY

Current Deployment

Server: Single Ubuntu 24.04 LTS on Oracle Cloud Infrastructure (OCI) ARM, Toronto region
Architecture: Monolithic; multi-tenancy exists in the application and database only, and the infrastructure layer is tenant-unaware
Deployment Model: Traditional - no Kubernetes, no auto-scaling

Running Services (Production)

PostgreSQL 16 (primary database) - 3 instances visible in process list (main + replicas or backup instances)
nginx (reverse proxy) - 1 master + 2 workers + cache manager
Node.js PM2-managed server - vitara-admin-api (PID 1542090, 24h uptime)
Running: /usr/bin/node --require tsx/dist/preflight.cjs src/index.ts
Port: 3002 (behind nginx reverse proxy)
Memory: ~144MB
Heap: 77.97% used
Standalone Services (co-located infrastructure):
Mattermost (internal comms) - Node.js server
Outline (wiki/documentation) - Node.js server
n8n (workflow automation) - Node.js process
Zatuka Stack components (Vikunja, Uptime Kuma)

Scalability Assessment

Current: Vertical only (single instance)
Bottleneck: PostgreSQL connections (5 concurrent vitara connections observed in pg_stat_activity; the configured maximum is not visible)
Load handling: Rate limiting in place (auth: 5/min, webhook: 300/min, api: 100/min)
No clustering: No horizontal scaling, no multi-server replication

2. DATABASE STRATEGY

Primary Database: PostgreSQL 16

Connection: postgresql://vitara:vitara_dev_password@localhost:5432/vitara_platform
Connections Observed: 5 concurrent (no explicit pool settings visible in config; Prisma's built-in pool applies)

Schema

Multi-tenant design:
- clinics (root entity)
- clinic_config (per-clinic OSCAR credentials, encrypted)
- clinic_providers (provider display names + metadata)
- clinic_hours + clinic_holidays (scheduling constraints)
- waitlist (registration waitlist when closed)
- call_logs (Vapi call analytics - indexed on clinic_id, created_at, vapi_call_id)
- audit_logs (PIPEDA compliance - indexed on clinic_id, user_id, created_at, action)
- onboarding_progress (clinic go-live checklist)
- support_tickets + ticket_messages (support system)
- users + notifications (multi-tenant auth)

Data Security (Encryption)

ENCRYPTION_KEY=8065ff53b55a09ffd320e64327288f898017513a6715ff7378e6817d4b7a7f68 (64 hex chars = 32 bytes, the key size for AES-256)
Encrypted Fields:
  - oscar_consumer_secret_encrypted
  - oscar_token_secret_encrypted
  - clinic_config.vapi_webhook_secret (implied)
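The audit states only that these fields are encrypted with AES under a 32-byte key. As an illustration of what field-level encryption with such a key can look like, here is a minimal sketch. The GCM mode, the 12-byte random IV, the `iv:tag:ciphertext` hex layout, and the function names `encryptSecret` and `decryptSecret` are all assumptions for this example, not details taken from the platform's code.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumption: AES-256-GCM. The audit only says "AES" with a 64-hex-char key.
function parseKey(keyHex: string): Buffer {
  const key = Buffer.from(keyHex, "hex");
  if (key.length !== 32) {
    throw new Error("ENCRYPTION_KEY must be 64 hex chars (32 bytes)");
  }
  return key;
}

// Hypothetical storage layout: hex(iv) + ":" + hex(authTag) + ":" + hex(ciphertext)
function encryptSecret(plaintext: string, keyHex: string): string {
  const key = parseKey(keyHex);
  const iv = randomBytes(12); // fresh random IV for every value
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((b) => b.toString("hex")).join(":");
}

function decryptSecret(payload: string, keyHex: string): string {
  const key = parseKey(keyHex);
  const parts = payload.split(":");
  if (parts.length !== 3) throw new Error("malformed encrypted payload");
  const [ivHex, tagHex, dataHex] = parts;
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(ivHex, "hex"));
  decipher.setAuthTag(Buffer.from(tagHex, "hex")); // decryption fails if data was tampered with
  const plain = Buffer.concat([decipher.update(Buffer.from(dataHex, "hex")), decipher.final()]);
  return plain.toString("utf8");
}
```

An authenticated mode such as GCM is chosen here because it detects tampering with the stored value; whether the platform uses it is not visible in the audit.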

Backup Strategy

Script: /home/ubuntu/vitara-platform/scripts/backup-db.sh
- Uses pg_dump with gzip compression
- Retention: 14 days of daily backups
- Location: /home/ubuntu/vitara-platform/backups/db/
- Cron: "0 2 * * *" (daily at 2:00 AM) - via install-cron.sh
- No off-site replication visible
- Last backup: 2026-02-10 (visible in backups/ directory structure: vapi-20260210/)

Risk: Single-server backup with no cross-region replication. Patient data (OSCAR) is NOT backed up by Vitara; that is the clinic's responsibility via OSCAR's native backup.

3. APPLICATION ARCHITECTURE

Tech Stack

Component	Version	Notes
Node.js	18.19.1	TypeScript compiled with ES2020 target
Express	4.18.2	Minimal REST framework
TypeScript	5.3.3	Strict mode enabled
Prisma	5.22.0	ORM + migrations
PostgreSQL Driver	@prisma/client 5.22.0	Connection pooling via Prisma

Server Architecture (`admin-dashboard/server`)

Entry Point: /src/index.ts (166 lines at the original audit; v4.3.0 line references elsewhere in this document reach index.ts:188)

Key Middleware Stack [v4.3.0 corrected order per index.ts]:
1. helmet() - Security headers (CSP, HSTS, etc.) (index.ts:39)
2. requestLogger - Structured logging via Pino (index.ts:42)
3. cors() - CORS with credentialed requests (index.ts:45)
4. express.json({ limit: '500kb' }) - Body parsing (index.ts:49)
5. auditMiddleware - Global POST/PUT/PATCH/DELETE mutation logging (index.ts:52)
6. Rate limiting (3-tier: auth 5/min, webhook 300/min, api 100/min), applied per route
7. Vapi webhook authentication (HMAC-SHA256 + API key + Bearer token support)

Route Organization:
- /api/auth - Login/JWT refresh (5/min rate limit)
- /api/vapi - Webhook tool handlers (300/min rate limit), HIGHEST TRAFFIC
- /vapi-webhook - Legacy webhook URL (backward compat)
- /api/* - Dashboard/clinic management (100/min rate limit)
- GET /health - Real health checks (used by Uptime Kuma monitoring)

Critical Services

1. Health Service (`health.service.ts`)

Performs real, parallelized health checks:
- PostgreSQL (SELECT 1)
- OSCAR Bridge REST (GET /health)
- Vapi API (GET /assistant with Bearer token)
- Returns: status (healthy/degraded/down), latency per service, uptime
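A parallel health check of this shape can be sketched as below. The probe functions are passed in by the caller, and the rule used to derive the overall status (database failure means `down`, any other failure means `degraded`) is an assumption, since the audit does not say how `health.service.ts` decides. The HTTP mapping follows the Health Endpoint description in Section 7 (200 for healthy/degraded, 503 for down). All names are illustrative.

```typescript
type CheckResult = { name: string; ok: boolean; latencyMs: number };
type OverallStatus = "healthy" | "degraded" | "down";

// Assumed rule: the database is the only hard dependency.
function overallStatus(results: CheckResult[]): OverallStatus {
  const db = results.find((r) => r.name === "database");
  if (db && !db.ok) return "down";
  return results.every((r) => r.ok) ? "healthy" : "degraded";
}

// 200 for healthy/degraded, 503 for down (as described for GET /health).
function httpStatusFor(status: OverallStatus): number {
  return status === "down" ? 503 : 200;
}

// Run one probe and time it; a rejected probe counts as a failed check.
async function timed(name: string, probe: () => Promise<void>): Promise<CheckResult> {
  const start = Date.now();
  try {
    await probe();
    return { name, ok: true, latencyMs: Date.now() - start };
  } catch {
    return { name, ok: false, latencyMs: Date.now() - start };
  }
}

// All probes start at once, so total latency is roughly that of the slowest probe.
async function checkHealth(probes: Record<string, () => Promise<void>>) {
  const results = await Promise.all(
    Object.entries(probes).map(([name, probe]) => timed(name, probe)),
  );
  const status = overallStatus(results);
  return {
    status,
    httpStatus: httpStatusFor(status),
    services: results,
    uptimeSeconds: Math.round(process.uptime()),
  };
}
```

A caller would pass probes keyed by service name, for example `database`, `oscarBridge`, and `vapi`, each resolving on success and rejecting on failure.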

2. Vapi Webhook Authentication (`vapi-auth.ts`)

Supports 3 auth methods (in order):
1. HMAC-SHA256 signature verification (x-vapi-signature + x-vapi-timestamp)
   - 5-minute replay window
   - Constant-time comparison to prevent timing attacks
2. API key header (x-api-key)
3. Bearer token (Authorization: Bearer <token>)

Security: In production, BLOCKS ALL REQUESTS if VAPI_WEBHOOK_SECRET is not set. Dev mode skips auth.
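The HMAC path described above (signature plus timestamp, 5-minute replay window, constant-time comparison) can be illustrated with a short sketch. The signed string format `timestamp.rawBody`, the millisecond timestamp, the hex encoding, and the function names are assumptions; the audit does not show how `vapi-auth.ts` builds the signed payload.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const REPLAY_WINDOW_MS = 5 * 60 * 1000; // 5-minute replay window from the audit

// Assumption: signature = hex(HMAC-SHA256(secret, `${timestamp}.${rawBody}`))
function signPayload(secret: string, timestamp: string, rawBody: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest("hex");
}

function verifySignature(
  secret: string,
  signature: string, // value of x-vapi-signature
  timestamp: string, // value of x-vapi-timestamp, assumed to be epoch milliseconds
  rawBody: string,
  nowMs: number = Date.now(),
): boolean {
  const ts = Number(timestamp);
  // Reject missing, malformed, or stale timestamps before doing any crypto.
  if (!Number.isFinite(ts) || Math.abs(nowMs - ts) > REPLAY_WINDOW_MS) return false;

  const expected = Buffer.from(signPayload(secret, timestamp, rawBody), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

Verification has to run against the raw request body, not a re-serialized JSON object, because any difference in whitespace or key order changes the digest.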

3. Audit Middleware (`audit.service.ts`)

Captures POST/PUT/PATCH/DELETE mutations
Redacts 23 sensitive fields (passwords, secrets, tokens, encryption keys)
Logs: user ID, email, action, resource, resourceId, clinic ID, IP, user agent, response time
Non-blocking writes (async catch-and-log pattern)
Compliance: PIPEDA 4.1.4
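Field redaction of the kind listed above can be sketched as a recursive walk over the request body. The audit reports 23 redacted field names but does not list them, so the fragment list below is a hypothetical subset, and matching by substring on a normalized key is an assumption about how such a list might be applied.

```typescript
// Hypothetical subset of sensitive name fragments; the real list has 23 entries.
const SENSITIVE_FRAGMENTS = ["password", "secret", "token", "encryptionkey"];

// Normalize snake_case, kebab-case, and camelCase keys before matching.
function isSensitive(key: string): boolean {
  const normalized = key.toLowerCase().replace(/[_-]/g, "");
  return SENSITIVE_FRAGMENTS.some((fragment) => normalized.includes(fragment));
}

// Returns a deep copy with sensitive values replaced; the input is not mutated.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = isSensitive(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}
```

Redacting on a copy keeps the audit write from altering the body that the route handler still needs.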

4. Job Scheduler (`scheduler.ts`)

Uses node-cron
Runs data retention purge daily at 3:00 AM
Single job visible (data retention)

OSCAR Adapter Pattern (Critical for Booking)

Two Adapters Available:

OscarBridgeAdapter (Legacy, REST-based)
Calls OSCAR via REST bridge at http://15.222.50.48:3000/api/v1
Thin wrapper around bridge endpoints
Problem: Bridge is DEV-ONLY; customers don't have this
X-API-Key authentication
OscarSoapAdapter (Production, SOAP-based)
Direct SOAP connection to OSCAR CXF web services
Uses node-soap + WSSecurity (UsernameToken only, NO Timestamp element)
Circuit breakers per service (4s timeout, 50% error threshold, 30s reset)
Handles JAXB Calendar serialization quirks (OSCAR returns Date objects, not strings)
OAuth 1.0a for patient registration (REST API path)
Bridge URL as fallback for phone search (SOAP has no phone search)
Timezone-aware: Configurable clinic timezone (default: America/Vancouver)
[v4.3.0] OscarUniversalAdapter (Hybrid, preferred)
preferRest flag routes through OAuth REST when available (Kai-hosted EMRs)
Split circuit breakers: separate breakers for SOAP vs REST paths
OAuth REST bypasses the Kai Cloudflare WAF (whose content inspection blocks SOAP requests)
Provider 3-tier fallback: REST → SOAP → Bridge
DEFAULT_EMR_TYPE now defaults to oscar-universal (not oscar-soap)
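The REST → SOAP → Bridge fallback can be pictured as an ordered list of paths tried in turn. The sketch below is illustrative only: the option names, the behavior when `preferRest` is false (REST skipped, SOAP first), and the helper names are assumptions, not the adapter's actual interface.

```typescript
type AdapterPath = "rest" | "soap" | "bridge";

interface FallbackOptions {
  preferRest: boolean;      // the preferRest flag from clinic config
  restAvailable: boolean;   // hypothetical: OAuth credentials are configured
  bridgeAvailable: boolean; // hypothetical: a bridge URL is configured
}

// Assumed ordering: REST first only when preferred and available, then SOAP, then Bridge.
function fallbackOrder(opts: FallbackOptions): AdapterPath[] {
  const order: AdapterPath[] = [];
  if (opts.preferRest && opts.restAvailable) order.push("rest");
  order.push("soap");
  if (opts.bridgeAvailable) order.push("bridge");
  return order;
}

// Try each path in order and return the first success; report every failure if all paths fail.
async function firstSuccessful<T>(
  order: AdapterPath[],
  call: (path: AdapterPath) => Promise<T>,
): Promise<T> {
  const errors: string[] = [];
  for (const path of order) {
    try {
      return await call(path);
    } catch (err) {
      errors.push(`${path}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all adapter paths failed (${errors.join("; ")})`);
}
```

Because each path is attempted sequentially, the per-path timeout matters: three paths at 4s each would exceed Vapi's 5s tool-call timeout, so a real implementation would need a shared time budget.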

Circuit Breaker Configuration:

Timeout: 4000ms (must be < Vapi's 5s tool-call timeout)
Error Threshold: 50%
Reset Timeout: 30s
Services: ScheduleService, DemographicService, ProviderService
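To make the three settings concrete, here is a minimal circuit breaker sketch using the values above (4000ms timeout, 50% error threshold, 30s reset). The audit does not name the library in use, so this is a hand-rolled illustration; the minimum sample size `minCalls`, the class and method names, and the injectable clock are assumptions.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private successes = 0;
  private failures = 0;
  private openedAt = 0;
  private readonly errorThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly minCalls: number; // assumption: do not trip on tiny samples
  private readonly now: () => number;

  constructor(errorThreshold = 0.5, resetTimeoutMs = 30_000, minCalls = 5, now: () => number = () => Date.now()) {
    this.errorThreshold = errorThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.minCalls = minCalls;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  // After the reset timeout, an open breaker lets trial calls through (half-open).
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  record(success: boolean): void {
    if (this.state === "open") return; // late results from before the trip are ignored
    if (this.state === "half-open") {
      if (success) this.reset("closed");
      else this.trip();
      return;
    }
    if (success) this.successes++;
    else this.failures++;
    const total = this.successes + this.failures;
    if (total >= this.minCalls && this.failures / total >= this.errorThreshold) this.trip();
  }

  private trip(): void {
    this.reset("open");
    this.openedAt = this.now();
  }

  private reset(state: BreakerState): void {
    this.state = state;
    this.successes = 0;
    this.failures = 0;
  }
}

// Reject if the wrapped call takes longer than timeoutMs (4000ms keeps it under Vapi's 5s limit).
function withTimeout<T>(promise: Promise<T>, timeoutMs = 4000): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs}ms`)), timeoutMs);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function callThroughBreaker<T>(breaker: CircuitBreaker, fn: () => Promise<T>): Promise<T> {
  if (!breaker.canRequest()) throw new Error("circuit open");
  try {
    const result = await withTimeout(fn());
    breaker.record(true);
    return result;
  } catch (err) {
    breaker.record(false);
    throw err;
  }
}
```

One instance per service (ScheduleService, DemographicService, ProviderService) keeps a failing service from tripping the others. This sketch counts over the lifetime of the closed state rather than a rolling window, which a production breaker would normally use.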

4. RATE LIMITING & DDoS PROTECTION

Express Rate Limiting (Built-in)

authLimiter:    5 requests/minute per IP
webhookLimiter: 300 requests/minute per IP  
apiLimiter:     100 requests/minute per IP

Trust Proxy: app.set('trust proxy', 1) - Reads real IP from first proxy (nginx)
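The three tiers behave like per-IP counters over a one-minute window. The sketch below shows that logic with a simple fixed-window counter; it is not the middleware the platform uses (the audit does not name it), and the class name, the in-memory `Map`, and the fixed-window strategy are assumptions for illustration.

```typescript
class FixedWindowLimiter {
  private readonly hits = new Map<string, { count: number; windowStart: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed for this key (here, the client IP).
  allow(key: string, nowMs: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || nowMs - entry.windowStart >= this.windowMs) {
      // First request from this key, or the previous window has expired.
      this.hits.set(key, { count: 1, windowStart: nowMs });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count++;
    return true;
  }
}

// Limits taken from the audit: auth 5/min, webhook 300/min, api 100/min.
const limiters = {
  auth: new FixedWindowLimiter(5),
  webhook: new FixedWindowLimiter(300),
  api: new FixedWindowLimiter(100),
};
```

The key passed to `allow` must be the real client IP, which is why the trust proxy setting matters: without it, every request would appear to come from nginx and share one counter. An in-memory counter like this also resets on every process restart and is not shared across processes.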

WAF / Advanced DDoS

NOT IMPLEMENTED: No Cloudflare, AWS WAF, or equivalent
RISK: Direct exposure to DDoS attacks on public IP

5. SSL/TLS & REVERSE PROXY

Nginx Configuration

Master Process: nginx (root)
Worker Processes: 2 workers + cache manager
Inferred Config:
HTTPS termination (SSL/TLS)
Reverse proxy to Node.js on 3002
Response compression (gzip visible in logs)
Cache manager process visible

SSL/TLS Status

Obtained via: Inferred from nginx + Let's Encrypt standard practice
Certificate Path: Not accessible (typical: /etc/nginx/ssl/)
Reason: nginx runs as root and filesystem access to the certificate directory is restricted
HSTS: Present in response headers (max-age=31536000; includeSubDomains)
Modern TLS: Likely TLS 1.2+ (nginx >= 1.14)

Reverse Proxy Headers

Request headers show proper proxy forwarding:

x-real-ip: 99.185.125.26
x-forwarded-for: 99.185.125.26
x-forwarded-proto: https

6. PM2 PROCESS MANAGEMENT

Current Process

Process ID: vitara-admin-api
Status: online (6666 restarts! ⚠️)
Uptime: 24h
Script: tsx src/index.ts
Exec Mode: fork_mode
Node.js: 18.19.1 with NODE_ENV=production
Heap Usage: 77.97% (16.16 MiB / 20.72 MiB)
Event Loop Latency: 0.45ms (p95: 1.42ms)

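Configuration

Restart Strategy: Unknown (likely always/continuous)
6666 restarts ⚠️ CRITICAL CONCERN. PM2's restart counter is cumulative since the process was registered (or since the last pm2 reset), and the 24h uptime shows no restart in the past day, so the period over which these restarts accumulated is unknown. Had they all occurred within 24h, the rate would be ~278 per hour.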
Log Paths:
Out: /home/ubuntu/.pm2/logs/vitara-admin-api-out.log
Error: /home/ubuntu/.pm2/logs/vitara-admin-api-error.log
Monitoring: PM2 Plus (not enabled) - shows heapdump/profiling available via CLI

Gap: No Ecosystem Config Found

No ecosystem.config.js in repo
PM2 started ad-hoc (not via config file)
Risk: Restart strategy not version-controlled
Missing: Watch & reload, cluster mode, auto-restart on crash (if enabled, why so many restarts?)

7. MONITORING & LOGGING

Application Logging (Pino)

Production: JSON structured output for log aggregation
Development: Pretty-printed with colors
Log Levels: trace, debug, info, warn, error, fatal
Current Level: info (production) | debug (dev)
Module: pino@10.3.1 + pino-http@11.0.0

Health Endpoint

GET /health - Returns detailed service health (database, OSCAR bridge, Vapi)
Used by Uptime Kuma (visible in /opt/zatuka-stack/ - separate service)
Returns HTTP 200 (healthy/degraded), HTTP 503 (down)

Request Logging

Every request logged with:
- Request ID (UUID for correlation)
- Method, URL, query, params, headers
- Response status code, latency
- User-Agent, IP, Referer

Sample log: 401 response to /api/notifications with full request/response context

Log Aggregation

Logs written to: /home/ubuntu/.pm2/logs/vitara-admin-api-*.log
Rotation: Not verified. PM2 does not rotate logs on its own; rotation requires the pm2-logrotate module or a system logrotate rule
Centralized logging: NOT VISIBLE - no Elasticsearch/Splunk/Datadog integration

Missing Monitoring

No Prometheus metrics export
No Grafana dashboards visible
No distributed tracing (Jaeger, Datadog APM)
No error tracking (Sentry)
No APM agent (New Relic, Datadog)

8. ENVIRONMENT MANAGEMENT

Secrets & Configuration

Current (.env):

PORT=3002
NODE_ENV=production
CORS_ORIGIN=http://localhost:5174 (Note: dev URL in prod config!)

JWT_SECRET=vitara-jwt-secret-dev-2026-change-in-prod (⚠️ WEAK DEFAULT)
VAPI_API_KEY=0fec5f0b-12e8-4782-b961-9740818da17e
VAPI_WEBHOOK_SECRET=0b02f50574bee8b21f59210f19d8bc1a1a880675127ba7dae41c778e88552e49

OSCAR_BRIDGE_URL=http://15.222.50.48:3000/api/v1
OSCAR_SOAP_URL=https://15.222.50.48:8443/oscar
OSCAR_SOAP_USERNAME=129
OSCAR_SOAP_PASSWORD=admin2025 (⚠️ PLAINTEXT PASSWORD!)

DATABASE_URL=postgresql://vitara:vitara_dev_password@localhost:5432/vitara_platform (⚠️ DEV PASSWORD)
ENCRYPTION_KEY=8065ff53... (64-char hex, looks good)

Environment Validation

Framework: Zod schema validation at startup
Behavior:
Production: Fails fast if required secrets missing (EXIT 1)
Development: Continues with warnings
Validation Rules:
JWT_SECRET: min 16 chars (production), default fallback (dev)
ENCRYPTION_KEY: exactly 64 hex chars (production), optional (dev)
VAPI_WEBHOOK_SECRET: required (production), skipped (dev)
All EMR URLs have defaults
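The fail-fast behavior and the three rules above can be sketched without any dependency. The platform uses a Zod schema; the plain-function version below is a swapped-in equivalent so the example stays self-contained, and the names `validateEnv` and `loadEnvOrExit` are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Returns a list of problems; an empty list means the environment passes all three rules.
function validateEnv(env: Env): string[] {
  const problems: string[] = [];
  if (!env.JWT_SECRET || env.JWT_SECRET.length < 16) {
    problems.push("JWT_SECRET must be at least 16 characters");
  }
  if (!/^[0-9a-fA-F]{64}$/.test(env.ENCRYPTION_KEY ?? "")) {
    problems.push("ENCRYPTION_KEY must be exactly 64 hex characters");
  }
  if (!env.VAPI_WEBHOOK_SECRET) {
    problems.push("VAPI_WEBHOOK_SECRET is required");
  }
  return problems;
}

// Production: exit with code 1 on any problem. Development: warn and continue.
function loadEnvOrExit(env: Env = process.env): void {
  const problems = validateEnv(env);
  if (problems.length === 0) return;
  if (env.NODE_ENV === "production") {
    console.error(`Invalid environment:\n- ${problems.join("\n- ")}`);
    process.exit(1);
  }
  console.warn(`Environment warnings (development):\n- ${problems.join("\n- ")}`);
}
```

Note that a length-only rule accepts the current `JWT_SECRET` value, which is 41 characters long despite being the dev default. Catching that would need an additional check, for example rejecting values that contain "dev" or "change-in-prod".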

Secrets Management

Method: .env file (gitignored)
Rotation: Manual
Secure Storage: Unknown (likely plaintext on disk until deployed)
No: AWS Secrets Manager, Vault, or equivalent

9. INFRASTRUCTURE-AS-CODE & DEPLOYMENT

Terraform

File: /home/ubuntu/vitara-platform/terraform/oscar-ec2.tf
Scope: OSCAR EMR instance deployment (NOT main VitaraPlatform)
Provider: AWS (ca-central-1 region)
EC2 Instance: t3a.medium, 30GB gp3 SSD

Detected:
- OSCAR EMR (dev instance) is deployed to a separate AWS EC2 instance in ca-central-1 (isolation good!)
- VitaraVox platform runs on a separate OCI ARM instance in Toronto region (NOT on AWS)
- No Terraform for the VitaraVox platform itself, only for the dev OSCAR EC2
- User-data script includes Docker, Node.js setup for OSCAR

Vapi GitOps

Directory: /home/ubuntu/vitara-platform/vapi-gitops/
Pattern: Declarative YAML configs for Vapi squads/assistants
v3 Squad Config: /resources/squads/vitaravox-v3.yml
Tools: 14+ YAML files in /resources/tools/ (squad member definitions)
Push Script: npm run push:dev (via GitOps)

Vapi v3 Architecture:
- Router (entry point)
- Patient-ID EN/ZH (language detection)
- Booking EN/ZH (appointment booking)
- Modification EN/ZH (reschedule/cancel)
- Registration EN/ZH (new patient signup)
- All use handoff tools for routing

10. ZERO-DOWNTIME DEPLOYMENT CAPABILITY

Current State: ⚠️ LIMITED

Graceful Shutdown Implementation:

// In src/index.ts
function gracefulShutdown(signal: string) {
  server.close(() => {
    logger.info('All connections drained, exiting');
    process.exit(0);
  });
  // Force exit after 10s if connections don't drain
  setTimeout(() => process.exit(1), 10_000);
}
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));

[v4.3.0] Improved:
- ✅ Graceful SIGTERM handling (index.ts:169-188)
- ✅ 10s drain timeout with forced exit, critical for non-atomic rescheduling (book-then-cancel)
- ❌ No health check for connection draining
- ❌ No load balancer integration
- ❌ No database migration strategy for zero-downtime
- ❌ No blue-green or canary deployment
- ❌ PM2 cluster mode not enabled (would allow rolling restarts)

Database Migrations

Tool: Prisma (npm run db:migrate)
Gap: No automated pre-deployment migrations in CI/CD
Manual process: Human must run migrations before deploying code

11. DISASTER RECOVERY READINESS

What's Protected

✅ Daily PostgreSQL backups (14-day retention)
✅ Encrypted credentials in database
✅ Audit trail (audit_logs table)
✅ Configuration version-controlled (git)

Critical Gaps

❌ No cross-region replication
❌ No RTO/RPO defined
❌ Backup not tested for restore (potential corruption unknown)
❌ OSCAR patient data NOT backed up by VitaraPlatform
❌ No failover mechanism (single instance = single point of failure)
❌ No documented recovery procedure

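OSCAR Patient Data

Ownership: Clinic (OSCAR instance)
VitaraPlatform Role: Accesses OSCAR via SOAP/OAuth (patient lookups plus the booking and registration calls listed in Section 14); it is not the system of record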
Backup Responsibility: Clinic's OSCAR admin
Data Loss Risk: If clinic's OSCAR is compromised, call history still in Vitara DB

12. SCALABILITY & CONCURRENCY

Current Limits

Database Connections:
- Visible: 5 concurrent vitara connections
- Unknown max: Not visible in psql configs accessed
- Prisma pooling: Enabled via @prisma/client
- Risk: Under load, connection exhaustion possible

Node.js Memory:
- Heap: 77.97% used on single process
- Uptime: 24h without memory leak visible
- Requests: Multiple concurrent (no limit enforced above rate limiting)

Circuit Breaker Limits:
- OSCAR SOAP timeout: 4000ms
- Vapi webhook timeout: ~5000ms (Vapi's standard)
- Error threshold: 50% before breaking

Can This Scale to Multi-Clinic?

Current Architecture Can Support:
- ✅ Up to ~50-100 clinics (PostgreSQL multi-tenancy designed)
- ✅ Up to ~1000 concurrent calls (rate limiting + circuit breakers)
- ✅ Clinic data isolation (no data leakage between clinics)

Current Architecture CANNOT Support:
- ❌ 1000+ concurrent calls (single Node.js process, single server)
- ❌ Geographic distribution (single region)
- ❌ High availability (no redundancy)
- ❌ Independent clinic scaling (all share single instance)
- ❌ Zero-downtime deployments (no clustering or load balancing)

13. SECURITY POSTURE

What's Good

✅ Helmet security headers
✅ HTTPS/TLS enforced
✅ Rate limiting on auth endpoints
✅ HMAC-SHA256 webhook signature verification
✅ Constant-time token comparison (prevents timing attacks)
✅ Encryption at rest (clinic secrets encrypted with AES)
✅ Audit logging for all mutations (PIPEDA 4.1.4)
✅ Input validation via Zod schema
✅ CORS properly configured with credentials
✅ Sensitive fields redacted in audit logs
✅ Node TLS reject unauthorized disabled for self-signed OSCAR cert (acceptable for private network)

What's Weak

❌ JWT secret is still the dev default in production (the value itself says "change-in-prod")
❌ Database password in .env: vitara_dev_password (DEV VALUE!)
❌ OSCAR SOAP password in plaintext: admin2025
❌ CORS_ORIGIN set to dev URL: http://localhost:5174
❌ No rate limiting on /health endpoint (could be a DoS vector)
❌ No CSRF protection visible (SPA doesn't need it, but check middleware)
❌ No API key rotation policy
❌ No IP whitelisting for critical endpoints
❌ Webhook signing secret stored in config (no rotation)

Compliance Status

PIPEDA: Partial compliance
✅ Audit logging
✅ Encryption at rest
⚠️ Access controls (not visible)
❌ Data minimization (full PHI in PHI-DEBUG mode)
❌ Breach notification procedure not documented
PHIPA (Ontario): Unknown
PIPA (BC): Unknown

14. VAPI INTEGRATION & WEBHOOK HANDLING

Webhook Endpoints

POST /api/vapi/search-patient-by-phone
POST /api/vapi/search-patient
POST /api/vapi/find-earliest-appointment
POST /api/vapi/create-appointment
POST /api/vapi/update-appointment
POST /api/vapi/cancel-appointment
POST /api/vapi/register-new-patient
POST /api/vapi/get-clinic-info
POST /api/vapi/check-appointments
POST /api/vapi/add-to-waitlist
POST /api/vapi/get-patient
POST /api/vapi/get-providers
POST /api/vapi/transfer-call
POST /api/vapi/log-call-metadata

Rate Limit: 300 req/min (bursts allowed)

Tool Inventory

14 Vapi tools defined in /vapi-gitops/resources/tools/:
- search-patient-4889f4e5.yml
- update-appointment-635f59ef.yml
- get-clinic-info-aaec50cf.yml
- check-appointments-74246333.yml
- transfer-call-d95ed81e.yml
- search-patient-by-phone-8474536c.yml
- add-to-waitlist-0153bac0.yml
- cancel-appointment-f6cef2e7.yml
- register-new-patient-9a888e09.yml
- find-earliest-appointment-7fc7534d.yml
- get-patient-d86dee47.yml
- log-call-metadata-4619b3cb.yml
- get-providers-1ffa2c33.yml
- create-appointment-65213356.yml

15. KEY OPERATIONAL METRICS

Metric	Value	Assessment
Application Restarts (PM2 cumulative counter)	6666	🔴 CRITICAL - needs investigation
Heap Usage	77.97%	🟡 High but stable
Event Loop Latency (p95)	1.42ms	🟢 Healthy
DB Connections	5 concurrent	🟡 Low utilization but pooling unknown
Uptime	24h	🟢 Stable despite restarts
Backup Retention	14 days	🟡 Adequate for dev, low for production
Database Size	Unknown	⚠️ Not visible
Rate Limit Headroom	Low	🟡 Webhook at 300/min, typical burst traffic unknown
SSL/TLS Certificate Expiry	Unknown	❌ Not monitorable from here

16. CRITICAL RECOMMENDATIONS

Immediate (Week 1)

Investigate 6666 PM2 restarts - Memory leak? Segfault? Review the PM2 error log
Fix production .env:
Change JWT_SECRET from dev value
Change DATABASE_URL password from vitara_dev_password
Change OSCAR_SOAP_PASSWORD from admin2025
Change CORS_ORIGIN from localhost:5174 to production domain
Enable SSL certificate monitoring - if the certificate is from Let's Encrypt (inferred in Section 5), it is valid for only about 90 days from issuance
Set VAPI_DEFAULT_SQUAD_ID in production (currently uses v3 squad hardcoded)

Short Term (Month 1)

Implement log aggregation - Centralized logging (ELK, Datadog, or CloudWatch)
Create PM2 ecosystem.config.js - Version-control restart strategy, add cluster mode
Database backup testing - Monthly restore dry-run to S3 or backup server
Tune database connection pooling - Set max_connections in postgresql.conf and Prisma's connection_limit (a DATABASE_URL parameter) based on load
WAF deployment - Cloudflare or AWS WAF in front of nginx
SSL certificate auto-renewal - Verify the certbot renewal timer is active and that certbot renew --dry-run succeeds

Medium Term (Quarter 1)

Enable PM2 cluster mode - Rolling restarts without downtime (4 workers per instance)
Multi-region replication - PostgreSQL streaming replication to standby
Load balancer + health checks - AWS ALB or HAProxy (prepare for multi-instance)
Distributed tracing - Add Jaeger/Datadog APM for troubleshooting
Database migrations in CI/CD - Automated pre-deployment Prisma migrations
Secret rotation policy - Quarterly for API keys, immediately for breaches
Comprehensive DR plan - Document RTO/RPO, test failover quarterly

Long Term (Year 1)

Kubernetes migration - EKS or GKE for true multi-clinic scaling
Disaster recovery site - Geo-distributed failover (AWS multi-region)
HA PostgreSQL - Managed RDS with automatic failover
Compliance automation - Regular PIPEDA audits, SIEM integration
Performance optimization - Query optimization, caching layer (Redis), CDN for static assets

17. DEPLOYMENT SCRIPT ANALYSIS

Backup Script (`/home/ubuntu/vitara-platform/scripts/backup-db.sh`)

Uses pg_dump with gzip compression
Automatic daily cron via install-cron.sh
Retention: 14 days
No verification of restore capability
No off-site sync

CONCLUSION

VitaraVox Infrastructure Status: BETA-READY, PRODUCTION-ASPIRING

Strengths:
- Multi-tenant database design
- Real health checking
- Proper HMAC webhook auth
- Audit logging for compliance
- Encrypted credential storage
- Circuit breaker pattern for resilience

Critical Weaknesses:
- Single-server architecture (no HA)
- 6666 PM2 restarts unexplained
- Production secrets use dev values
- No centralized logging
- No zero-downtime deployment
- Limited backup testing
- No disaster recovery plan
- Dev logging in production config

Next Launch Target: Pilot program with 1-2 clinics, after fixing secrets and investigating restarts. Full enterprise deployment requires Kubernetes, multi-region replication, and comprehensive monitoring.

v4.3.0 UPDATE SUMMARY (2026-03-09)

Original Finding	Status	Detail
Middleware order undocumented	FIXED	Full stack order with source refs added to deployment docs
No graceful shutdown drain	IMPROVED	10s drain timeout implemented (`index.ts:169-188`)
OSCAR adapter single path	IMPROVED	Dual SOAP + OAuth REST with split circuit breakers
No SMS capability	ADDED	Telnyx SMS with 5-guard consent chain, 6 templates
No debug mode	ADDED	`VITARA_DEBUG` with 4h auto-expiry, env or API activation
Audit middleware route-specific	FIXED	Now global on all mutations (POST/PUT/PATCH/DELETE)
Body size unbounded	FIXED	500KB limit on `express.json()`
No onboarding validation	ADDED	9 pre-launch checks before clinic go-live

Remaining Critical Gaps (unchanged from original audit):

Single-server architecture (no HA)
PM2 restart investigation needed
Production secrets still use dev values (pre-launch hardening planned)
No centralized logging
No WAF/DDoS protection
No cross-region backup replication