Monitoring & Troubleshooting
Common issues and solutions for the PM2 + SOAP + REST production stack
Last Updated: 2026-03-09 (v4.3.0)
Quick Reference
| Component | How to Check | Port |
|---|---|---|
| API Server | pm2 status vitara-admin-api | 3002 |
| PostgreSQL | pg_isready -h localhost -p 5432 | 5432 |
| NGINX | sudo systemctl status nginx | 80/443 |
| OSCAR SOAP | curl -s <OSCAR_URL>/ws/ScheduleService?wsdl | 8080/443 |
| Vapi API | curl -s -H "Authorization: Bearer $VAPI_API_KEY" https://api.vapi.ai/assistant | n/a |
1. API Issues
401 Unauthorized
Symptom: API returns 401 Unauthorized on any endpoint.
Causes:
- Missing or invalid x-api-key header on Vapi webhook requests
- Expired or invalid JWT token on admin API requests
- VAPI_WEBHOOK_SECRET not set in production (server rejects all webhook requests)
Diagnosis:
# Check PM2 process is running
pm2 status vitara-admin-api
# Look for auth rejection logs
pm2 logs vitara-admin-api --lines 50 | grep -i "auth\|401\|VAPI AUTH"
# Verify VAPI_WEBHOOK_SECRET is set in environment
pm2 env vitara-admin-api | grep VAPI_WEBHOOK_SECRET
Solution:
# If VAPI_WEBHOOK_SECRET is missing, add it to the .env file
cd /home/ubuntu/vitara-platform/admin-dashboard/server
nano .env # Add VAPI_WEBHOOK_SECRET=<your-secret>
# Rebuild and restart
npx tsc && pm2 restart vitara-admin-api
Vapi Webhook Auth Methods
The server accepts three auth methods in order: HMAC-SHA256 signature (x-vapi-signature + x-vapi-timestamp), API key (x-api-key header), or Bearer token. Ensure the method configured in Vapi matches VAPI_WEBHOOK_SECRET.
429 Rate Limited
Symptom: API returns 429 Too Many Requests.
Causes:
- Exceeding rate limit (5/min auth, 100/min API, 300/min webhooks)
- Vapi retrying failed tool calls rapidly
Diagnosis:
Solution:
- Wait for the rate limit window to reset (1 minute)
- If Vapi is retrying, check why the original tool call is failing (likely a downstream timeout)
- Implement exponential backoff in any custom API clients
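For custom API clients, the backoff advice can be sketched as below. This is a minimal illustration, not code from the Vitara server: the names backoffDelayMs and withBackoff, the 1-second base delay, and the 60-second cap (chosen to match the 1-minute rate limit window) are all assumptions.

```typescript
// Hypothetical helper names; nothing here comes from the Vitara codebase.
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Calls `call` and retries while it reports HTTP 429, sleeping between attempts.
// `sleep` is injectable so the wait can be skipped in tests.
async function withBackoff(
  call: () => Promise<{ status: number }>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<{ status: number }> {
  let response = await call();
  for (let attempt = 0; response.status === 429 && attempt < maxAttempts - 1; attempt++) {
    await sleep(backoffDelayMs(attempt));
    response = await call();
  }
  return response;
}
```

With these defaults the waits are 1 s, 2 s, 4 s, and 8 s. Adding random jitter to each delay is a common refinement when several clients retry at once.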
503 Service Unavailable
Symptom: API returns 503 Service Unavailable.
Causes:
- PM2 process vitara-admin-api is down or restarting
- PostgreSQL connection pool exhausted
- OSCAR SOAP circuit breaker is OPEN (too many recent failures)
Diagnosis:
# Check PM2 status
pm2 status vitara-admin-api
# Check for circuit breaker messages
pm2 logs vitara-admin-api --lines 100 | grep -i "circuit.breaker\|OPEN\|503"
# Check health endpoint directly (bypasses NGINX)
curl -s http://localhost:3002/health | python3 -m json.tool
Solution:
# If process is stopped or errored
pm2 restart vitara-admin-api
# If circuit breaker is open, check OSCAR connectivity first
curl -s <OSCAR_SOAP_URL>/ws/ScheduleService?wsdl | head -5
# Circuit breaker auto-resets after 30 seconds (half-open state)
# Monitor recovery:
pm2 logs vitara-admin-api --lines 20 | grep "HALF-OPEN\|CLOSED"
Circuit Breaker Behavior
The circuit breaker opens after 50% of requests fail within the monitoring window. When open, all SOAP calls fail immediately with 503 for 30 seconds. After that, one test request is allowed (half-open). If it succeeds, the breaker closes.
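The state transitions described above can be sketched as a small state machine. This is an illustrative model only, not the server's implementation: the class name, the 10-second monitoring window, and the minimum sample count are assumptions. Only the 50% threshold and the 30-second open period come from the text.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF-OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private openedAt = 0;
  private outcomes: { at: number; ok: boolean }[] = [];

  constructor(
    private readonly now: () => number = () => Date.now(),
    private readonly windowMs = 10000,  // assumed monitoring window
    private readonly minSamples = 4,    // assumed minimum calls before tripping
    private readonly failureRatio = 0.5, // from the text: 50% of requests fail
    private readonly openMs = 30000,    // from the text: open for 30 seconds
  ) {}

  // true: the call may proceed; false: fail fast (the API answers 503).
  allowRequest(): boolean {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.openMs) {
      this.state = "HALF-OPEN"; // exactly one test request is let through
      return true;
    }
    return this.state === "CLOSED";
  }

  // Report the outcome of a call that was allowed through.
  record(ok: boolean): void {
    if (this.state === "OPEN") return; // late results are ignored while open
    if (this.state === "HALF-OPEN") {
      if (ok) {
        this.state = "CLOSED";
        this.outcomes = [];
      } else {
        this.trip();
      }
      return;
    }
    const t = this.now();
    this.outcomes.push({ at: t, ok });
    this.outcomes = this.outcomes.filter((o) => t - o.at <= this.windowMs);
    const failures = this.outcomes.filter((o) => !o.ok).length;
    if (
      this.outcomes.length >= this.minSamples &&
      failures / this.outcomes.length >= this.failureRatio
    ) {
      this.trip();
    }
  }

  getState(): BreakerState {
    return this.state;
  }

  private trip(): void {
    this.state = "OPEN";
    this.openedAt = this.now();
    this.outcomes = [];
  }
}
```

The clock is injected so the 30-second recovery can be exercised without waiting. A failed test request in the half-open state reopens the breaker for another full period.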
2. Database Issues
Connection Failed
Symptom: ECONNREFUSED or Connection refused in PM2 logs.
Causes:
- PostgreSQL service not running
- Wrong DATABASE_URL in .env
- Port 5432 conflict
Diagnosis:
# Check PostgreSQL is running
pg_isready -h localhost -p 5432
# Check service status
sudo systemctl status postgresql
# Verify connection manually
psql -h localhost -U vitara -d vitara_platform -c "SELECT 1"
# Check what's using port 5432
sudo ss -tlnp | grep 5432
Solution:
# Start PostgreSQL if stopped
sudo systemctl start postgresql
# If DATABASE_URL is wrong, check .env
cat /home/ubuntu/vitara-platform/admin-dashboard/server/.env | grep DATABASE_URL
# Restart PM2 after fixing .env
pm2 restart vitara-admin-api
Authentication Failed
Symptom: FATAL: password authentication failed for user "vitara" in logs.
Causes:
- Password in DATABASE_URL does not match PostgreSQL user password
- User does not exist in PostgreSQL
Diagnosis:
# Check the DATABASE_URL password matches
cat /home/ubuntu/vitara-platform/admin-dashboard/server/.env | grep DATABASE_URL
# Try connecting with the credentials
psql "postgresql://vitara:<password>@localhost:5432/vitara_platform" -c "SELECT 1"
# List PostgreSQL users
sudo -u postgres psql -c "\du"
Solution:
# Reset the user password in PostgreSQL
sudo -u postgres psql -c "ALTER USER vitara WITH PASSWORD 'new_password';"
# Update DATABASE_URL in .env to match
# Then restart
pm2 restart vitara-admin-api
Migration Failed
Symptom: Tables do not exist, or Prisma reports errors about missing relations.
Diagnosis:
# Check current migration status
cd /home/ubuntu/vitara-platform/admin-dashboard/server
npx prisma migrate status
# List existing tables
psql -h localhost -U vitara -d vitara_platform -c "\dt"
Solution:
# Run pending migrations
cd /home/ubuntu/vitara-platform/admin-dashboard/server
npx prisma migrate deploy
# If schema is out of sync (development only!)
npx prisma db push
# Verify tables were created
psql -h localhost -U vitara -d vitara_platform -c "\dt"
# Restart PM2
pm2 restart vitara-admin-api
Never use prisma migrate reset in production
This drops all data. Use prisma migrate deploy to apply pending migrations without data loss.
3. OSCAR SOAP Issues
Production EMR Path
The production integration uses OscarSoapAdapter (WS-Security over SOAP). The REST bridge (OscarBridgeAdapter) is legacy and used only in development/fallback. Set DEFAULT_EMR_TYPE=oscar-soap in .env.
WS-Security SecurityError
Symptom: SecurityError or WSSecurityException from OSCAR SOAP calls.
Cause: The SOAP client is sending a <wsu:Timestamp> element in the WS-Security header. OSCAR's CXF WSS4J configuration has no Timestamp action configured and rejects it.
Diagnosis:
Solution:
Verify the WS-Security options in OscarSoapAdapter.ts are correct:
// CORRECT: hasTimeStamp MUST be false
client.setSecurity(new soap.WSSecurity(username, password, {
  passwordType: 'PasswordText',
  mustUnderstand: true,
  hasTimeStamp: false, // REQUIRED: Timestamp causes SecurityError
  hasNonce: false
}));
Critical: hasTimeStamp: false is REQUIRED
OSCAR CXF has no Timestamp action configured in its WSS4J policy. Including a <wsu:Timestamp> element causes an immediate SecurityError. This is the #1 cause of SOAP auth failures after code changes.
If someone has modified the adapter:
# Check the current setting
grep -n "hasTimeStamp" /home/ubuntu/vitara-platform/admin-dashboard/server/src/adapters/OscarSoapAdapter.ts
# Rebuild and restart after fixing
cd /home/ubuntu/vitara-platform/admin-dashboard/server
npx tsc && pm2 restart vitara-admin-api
WSDL Fetch Timeout / Cold-Start
Symptom: First SOAP call after PM2 restart times out or takes 8-15 seconds. Subsequent calls are fast.
Cause: node-soap fetches and parses the WSDL on first use. This cold-start fetch can exceed the 4-second circuit breaker timeout, causing the first request to fail.
Diagnosis:
Solution:
# After PM2 restart, warm the SOAP clients manually
curl -s http://localhost:3002/health | python3 -m json.tool
# The health endpoint triggers OSCAR connectivity checks
# which forces WSDL fetching
# For a more thorough warm-up, call the API once:
curl -s -X POST http://localhost:3002/api/vapi \
-H "Content-Type: application/json" \
-H "x-api-key: <your-webhook-secret>" \
-d '{"message":{"type":"tool-calls","toolCalls":[]}}'
Future Fix
A server-startup SOAP client warm-up is planned but not yet implemented. Until then, expect the first tool call after PM2 restart to be slow or fail.
SOAP Connection Timeout
Symptom: ETIMEDOUT or ECONNREFUSED on SOAP calls. Circuit breaker opens.
Causes:
- OSCAR server is down or unreachable
- Firewall blocking outbound connections to OSCAR host
- Wrong OSCAR_SOAP_URL in .env
Diagnosis:
# Test raw connectivity to OSCAR
curl -s --connect-timeout 5 <OSCAR_SOAP_URL>/ws/ScheduleService?wsdl | head -5
# Check configured URL
pm2 env vitara-admin-api | grep OSCAR_SOAP
# Check for timeout errors
pm2 logs vitara-admin-api --lines 50 | grep -i "timeout\|ETIMEDOUT\|ECONNREFUSED"
Solution:
# Verify network connectivity
ping -c 3 <OSCAR_HOST>
telnet <OSCAR_HOST> <OSCAR_PORT>
# If firewall is blocking, check security group / iptables
sudo iptables -L -n | grep <OSCAR_PORT>
# After connectivity is restored, the circuit breaker will
# auto-recover in ~30 seconds (half-open → closed)
pm2 logs vitara-admin-api | grep "HALF-OPEN"
Timeout Budget
SOAP calls have a 4-second timeout (circuit breaker). Vapi tool calls have a 5-second timeout. If OSCAR takes longer than 4 seconds to respond, the circuit breaker trips. This leaves only 1 second for Node.js processing + network overhead.
SOAP Credential Errors
Symptom: 401 Unauthorized or Authentication failed from OSCAR SOAP.
Causes:
- Wrong OSCAR_SOAP_USERNAME or OSCAR_SOAP_PASSWORD
- OSCAR user account locked or disabled
- Password expired
Diagnosis:
pm2 env vitara-admin-api | grep OSCAR_SOAP_USERNAME
pm2 logs vitara-admin-api --lines 30 | grep -i "auth\|credential\|401"
Solution:
# Verify credentials work with a direct SOAP test
# (replace with actual values)
curl -s -X POST "<OSCAR_URL>/ws/ScheduleService" \
-H "Content-Type: text/xml" \
-d '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Header>
<wsse:Security xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd">
<wsse:UsernameToken>
<wsse:Username>YOUR_USERNAME</wsse:Username>
<wsse:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText">YOUR_PASSWORD</wsse:Password>
</wsse:UsernameToken>
</wsse:Security>
</soapenv:Header>
<soapenv:Body/>
</soapenv:Envelope>'
Patient Not Found
Symptom: SOAP searchDemographic returns empty results, but patient exists in OSCAR.
Causes:
- Phone number format mismatch (OSCAR stores as 6045551234, Vapi sends +16045551234)
- Name spelling differences or accent characters
- Searching the wrong OSCAR instance
Diagnosis:
# Check recent search attempts
pm2 logs vitara-admin-api --lines 50 | grep -i "searchDemographic\|patient.*not found"
Solution:
- The server normalizes +1XXXXXXXXXX to XXXXXXXXXX before SOAP search. If the phone format differs, check the normalizePhone() function in the adapter.
- Verify against OSCAR directly: log in to the OSCAR web UI and search for the patient.
- Non-numeric providerId values like "any" or "任何" must be handled server-side as "search all providers."
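The two normalization rules above can be sketched as follows. This is an approximation for comparison while debugging, not a copy of the adapter: the real normalizePhone() may handle more cases, and resolveProviderId is a hypothetical name.

```typescript
// Illustrative only: the real normalizePhone() in the adapter may differ.
function normalizePhone(raw: string): string {
  // Drop "+", spaces, dashes, and parentheses, keeping digits only.
  const digits = raw.replace(/\D/g, "");
  // Strip the leading North American country code from 11-digit numbers.
  return digits.length === 11 && digits.charAt(0) === "1" ? digits.slice(1) : digits;
}

// Hypothetical helper: returns the provider ID when it is numeric,
// or null to mean "search all providers" (e.g. for "any" or "任何").
function resolveProviderId(providerId: string): string | null {
  const trimmed = providerId.trim();
  return /^\d+$/.test(trimmed) ? trimmed : null;
}
```

If a search still returns nothing, comparing the normalized value against the phone number stored in OSCAR usually shows whether the formats diverge.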
4. OSCAR Bridge Issues (Legacy)
Legacy / Development Only
The REST bridge (OscarBridgeAdapter) is the legacy EMR path retained for development and fallback. Production uses OscarSoapAdapter (SOAP). Set DEFAULT_EMR_TYPE=oscar-soap in .env for production.
Bridge Connection Failed
Symptom: Health check shows oscarBridge: down.
Diagnosis:
# Check if bridge is configured
pm2 env vitara-admin-api | grep OSCAR_BRIDGE
# Test bridge directly
curl -s -H "X-API-Key: <bridge-api-key>" http://15.222.50.48:3000/api/v1/health
# Check PM2 logs
pm2 logs vitara-admin-api --lines 20 | grep -i "bridge\|OSCAR Bridge"
Solution:
# If bridge is not needed in production, ensure SOAP is primary:
# In .env:
# DEFAULT_EMR_TYPE=oscar-soap
# Restart
pm2 restart vitara-admin-api
Bridge Timeout
The bridge phone-search timeout was reduced from 10s to 4s to match the SOAP adapter. If the bridge is slow, it will trip the same 4-second budget.
5. Voice Agent Issues
Vapi Webhook Not Calling Server
Symptom: Voice calls connect, but no tool-call requests arrive at the server.
Causes:
- Wrong webhook URL configured in Vapi squad/assistant
- SSL certificate issues preventing Vapi from reaching the server
- NGINX not proxying to port 3002
- Firewall blocking inbound HTTPS
Diagnosis:
# Check PM2 is receiving ANY requests
pm2 logs vitara-admin-api --lines 50 | grep -i "POST\|webhook\|tool-call"
# Test the webhook endpoint externally
curl -s -X POST https://api.vitaravox.ca/api/vapi \
-H "Content-Type: application/json" \
-H "x-api-key: <secret>" \
-d '{"message":{"type":"tool-calls","toolCalls":[]}}'
# Check NGINX is proxying correctly
sudo nginx -t
curl -s http://localhost:3002/health
Solution:
- Verify the webhook URL in the Vapi dashboard matches https://api.vitaravox.ca/api/vapi
- Ensure the SSL certificate is valid (see NGINX Issues)
- Check that NGINX proxies /api/vapi to localhost:3002
- Verify firewall allows inbound 443
Call Drops Mid-Conversation
Symptom: Calls disconnect unexpectedly during tool execution.
Causes:
- Tool response exceeds Vapi's 5-second timeout
- OSCAR SOAP is slow, circuit breaker trips (4s timeout)
- Vapi maxDuration exceeded on the call
- Server crash during request processing
Diagnosis:
# Check for slow responses or timeouts
pm2 logs vitara-admin-api --lines 100 | grep -i "timeout\|slow\|circuit\|OPEN\|crash"
# Check PM2 process stability
pm2 status vitara-admin-api
# Look at "restart" count — high restart count = crashes
# Check Vapi call logs in the Vapi dashboard for detailed error info
Solution:
- If SOAP is slow: check OSCAR server performance, network latency
- If circuit breaker is tripping: the 4-second SOAP timeout is non-negotiable (Vapi allows 5s total). Fix the underlying OSCAR latency.
- If PM2 is restarting: check for uncaught exceptions (see PM2 Issues)
Wrong Language Response
Symptom: Agent responds in English when patient speaks Mandarin, or vice versa.
Causes:
- Router STT (AssemblyAI Universal) misidentified the language
- Patient was transferred to the wrong language track
- GPT-4o outputting space-separated Chinese characters (known issue)
Diagnosis:
# Check Vapi call logs for transcription output
# In Vapi dashboard: Calls → select call → Transcript tab
# Check which assistant handled the call
pm2 logs vitara-admin-api --lines 50 | grep -i "assistant\|transfer\|handoff\|language"
Solution:
- Router STT: The Router uses AssemblyAI Universal Multilingual for bilingual detection. If detection is poor, review the audio quality (background noise degrades accuracy).
- Wrong track transfer: Verify Router prompt contains correct handoff_to_* tool references.
- GPT-4o Chinese spacing: Monitor ZH track outputs. If persistent, may need post-launch LLM swap for the ZH assistants.
STT Configuration
- Router: AssemblyAI Universal (bilingual EN/ZH detection)
- EN track: Deepgram Nova-2 en
- ZH track: Deepgram Nova-2 zh
- Do NOT use Deepgram nova-3 multi: it forces English on Mandarin speech.
Silent Transfers Failing
Symptom: Patient hears "I'm transferring you to..." instead of a seamless handoff, or transfer fails entirely.
Causes:
- Squad prompts missing "NEVER mention transferring" instruction
- transferAssistant used in prompt instead of actual tool function name (e.g., handoff_to_booking_en)
- Missing handoff tool on the squad member
Diagnosis:
# Check the Vapi squad configuration
cd /home/ubuntu/vitara-platform/vapi-gitops
cat squads/dev/*.yaml | grep -i "handoff\|transfer"
# Verify tool names match prompt references
grep -r "handoff_to\|transferAssistant" prompts/
Solution:
- All prompts must reference the actual handoff tool function names (handoff_to_booking_en, not transferAssistant)
- Every non-Router assistant prompt must include: "NEVER mention transferring or handing off"
- After fixing prompts, push the updated prompts to Vapi.
6. NGINX Issues
502 Bad Gateway
Symptom: NGINX returns 502 Bad Gateway.
Causes:
- PM2 process vitara-admin-api is not running
- NGINX proxy_pass points to the wrong port
- PM2 process is running but not accepting connections
Diagnosis:
# Check PM2 is running
pm2 status vitara-admin-api
# Check NGINX error log
sudo tail -50 /var/log/nginx/error.log
# Test direct connection to API (bypass NGINX)
curl -s http://localhost:3002/health
# Verify NGINX config
sudo nginx -t
Solution:
# If PM2 is stopped
pm2 restart vitara-admin-api
# If NGINX config is wrong, verify proxy_pass port
sudo grep -n "proxy_pass" /etc/nginx/sites-enabled/*
# proxy_pass should point to http://localhost:3002
# After fixing:
sudo nginx -t && sudo nginx -s reload
SSL Certificate Problems
Symptom: ERR_CERT_DATE_INVALID, SSL_ERROR_EXPIRED_CERT_ALERT, or Vapi cannot reach the webhook.
Diagnosis:
# Check certificate expiry
sudo openssl x509 -in /etc/letsencrypt/live/api.vitaravox.ca/fullchain.pem -noout -dates
# Check certificate chain
openssl s_client -connect api.vitaravox.ca:443 -servername api.vitaravox.ca </dev/null 2>/dev/null | openssl x509 -noout -dates
# Check certbot renewal status
sudo certbot certificates
Solution:
# Renew certificate
sudo certbot renew
# If renewal fails (port 80 in use)
sudo certbot renew --nginx
# Reload NGINX to pick up new cert
sudo nginx -s reload
Certbot Auto-Renewal
Certbot installs a systemd timer for auto-renewal. Verify it is active: sudo systemctl status certbot.timer. If the timer is not running, renewal will not happen automatically.
NGINX Not Passing Headers
Symptom: API receives requests but all auth headers are missing.
Diagnosis:
# Check NGINX config for header forwarding
sudo grep -A 5 "proxy_set_header" /etc/nginx/sites-enabled/*
Solution:
Ensure the NGINX server block includes:
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Critical: pass through auth headers
proxy_pass_request_headers on;
7. PM2 Issues
Process Crashing / Restart Loops
Symptom: pm2 status shows high restart count or errored status.
Diagnosis:
# Check current status and restart count
pm2 status vitara-admin-api
# Check error logs
pm2 logs vitara-admin-api --err --lines 100
# Check for uncaught exceptions
pm2 logs vitara-admin-api --lines 200 | grep -i "uncaught\|unhandled\|FATAL\|Error:"
Solution:
# Common causes:
# 1. Missing environment variables (Zod validation fails in production)
pm2 env vitara-admin-api | grep -E "JWT_SECRET|ENCRYPTION_KEY|VAPI_WEBHOOK_SECRET"
# 2. Database connection string wrong
pm2 env vitara-admin-api | grep DATABASE_URL
# 3. TypeScript build errors (running stale JS)
cd /home/ubuntu/vitara-platform/admin-dashboard/server
npx tsc # Check for compile errors
pm2 restart vitara-admin-api
# 4. Port already in use
sudo ss -tlnp | grep 3002
Production Startup
In production (NODE_ENV=production), the server exits immediately if required secrets are missing: JWT_SECRET, JWT_REFRESH_SECRET, ENCRYPTION_KEY, VAPI_WEBHOOK_SECRET. Check pm2 logs vitara-admin-api --err for "Environment validation failed" messages.
Memory Leaks
Symptom: Memory usage grows continuously. PM2 eventually restarts the process due to OOM.
Diagnosis:
# Real-time monitoring
pm2 monit
# Check memory usage
pm2 status vitara-admin-api
# Look at the "mem" column
# Check for leaking SOAP clients (connections not closed)
pm2 logs vitara-admin-api --lines 100 | grep -i "heap\|memory\|OOM"
Solution:
# Set max memory restart threshold (e.g., 512MB)
pm2 start ecosystem.config.js --max-memory-restart 512M
# Or restart manually if leaking
pm2 restart vitara-admin-api
# For persistent leaks, enable heap profiling:
# NODE_OPTIONS="--max-old-space-size=512 --heapsnapshot-signal=SIGUSR2"
# Then: kill -USR2 <pid> to generate snapshot
Rebuild and Restart
The standard rebuild workflow for the PM2 process:
cd /home/ubuntu/vitara-platform/admin-dashboard/server
npx tsc && pm2 restart vitara-admin-api
To verify after restart:
pm2 status vitara-admin-api
pm2 logs vitara-admin-api --lines 10
curl -s http://localhost:3002/health | python3 -m json.tool
8. Booking Issues
Slot Collision
Symptom: Two patients book the same time slot, or booking returns "slot already taken" when the slot should be available.
Cause: Race condition between availability check and booking, or stale availability data.
Diagnosis:
pm2 logs vitara-admin-api --lines 100 | grep -i "collision\|already.*booked\|advisory.*lock\|slot.*taken"
How the Protection Works:
- BookingEngine acquires a PostgreSQL advisory lock on provider+date+slot
- Performs a fresh availability check (slot may have been taken since search)
- Creates the appointment via OSCAR SOAP
- Releases the lock
Solution:
# Check if advisory locks are stuck (unlikely but possible)
psql -h localhost -U vitara -d vitara_platform -c "
SELECT pid, granted, objid
FROM pg_locks
WHERE locktype = 'advisory';"
# If there are stuck locks, identify the holding connection
psql -h localhost -U vitara -d vitara_platform -c "
SELECT pid, state, query_start, query
FROM pg_stat_activity
WHERE pid IN (
SELECT pid FROM pg_locks WHERE locktype = 'advisory' AND granted = true
);"
Graceful Degradation
If the advisory lock acquisition fails (PostgreSQL error), the BookingEngine proceeds without the lock rather than rejecting the booking. This means collision protection is best-effort during database issues.
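The lock, re-check, book, release sequence, including the degraded path, can be sketched as below. This is a model under stated assumptions, not the BookingEngine source: the Db interface, the hash-based key derivation, and all function names are hypothetical. Only pg_try_advisory_lock and pg_advisory_unlock are real PostgreSQL functions.

```typescript
// Minimal database surface for this sketch. Advisory locks are held per session,
// so `db` must be one dedicated connection, not a pool that may switch connections.
interface Db {
  query(sql: string, params: number[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// FNV-1a style hash over UTF-16 code units, returned as a signed 32-bit int
// so it fits the (int4, int4) form of the advisory lock functions.
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash | 0;
}

// Assumed key scheme: one key for the provider, one for date plus slot.
function lockKeys(providerId: string, date: string, slot: string): [number, number] {
  return [fnv1a32(providerId), fnv1a32(`${date}T${slot}`)];
}

async function bookWithLock(
  db: Db,
  keys: [number, number],
  isSlotFree: () => Promise<boolean>,
  createAppointment: () => Promise<string>,
): Promise<string> {
  let locked = false;
  let lockErrored = false;
  try {
    const res = await db.query("SELECT pg_try_advisory_lock($1, $2) AS locked", keys);
    locked = res.rows[0]?.locked === true;
  } catch {
    lockErrored = true; // graceful degradation: continue without the lock
  }
  if (!locked && !lockErrored) {
    throw new Error("Slot is currently being booked by another caller");
  }
  try {
    if (!(await isSlotFree())) throw new Error("Slot already taken");
    return await createAppointment(); // e.g. the OSCAR SOAP booking call
  } finally {
    if (locked) await db.query("SELECT pg_advisory_unlock($1, $2)", keys);
  }
}
```

The try variant of the lock returns immediately instead of queueing, which is consistent with the second caller being told to pick another slot rather than waiting.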
Advisory Lock Timeout
Symptom: Booking returns "Slot is currently being booked by another caller".
Cause: Another concurrent booking request holds the advisory lock for the same provider+date+slot.
Solution:
- This is working as intended. The second caller should retry with a different slot.
- If this happens frequently, it indicates high contention for the same provider. Consider suggesting alternative providers or time slots.
No Availability Found
Symptom: findAvailableSlots returns empty results, but the provider has an open schedule in OSCAR.
Causes:
- OSCAR schedule template codes not configured for the provider
- All slots filtered as non-bookable (schedule code in NON_BOOKABLE_CODES: L, P, V, A, a, B, H, R, E, G, M, m, d, t)
- Business hours in the database do not match OSCAR's schedule
- maxAdvanceBookingDays or minAdvanceBookingHours filtering out valid slots
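Two of the filters above are easy to reproduce by hand when checking why a slot disappears. The sketch below is illustrative: the code list is copied from this page, while the Slot shape and both function names are assumptions rather than the server's actual types.

```typescript
// Codes copied from the list above; matching is case-sensitive ("A" and "a" are both listed).
const NON_BOOKABLE_CODES = new Set([
  "L", "P", "V", "A", "a", "B", "H", "R", "E", "G", "M", "m", "d", "t",
]);

// Assumed shape for illustration only.
interface Slot {
  startTime: string;
  scheduleCode: string;
}

// Keep only slots whose schedule code is not in the non-bookable list.
function bookableSlots(slots: Slot[]): Slot[] {
  return slots.filter((s) => !NON_BOOKABLE_CODES.has(s.scheduleCode));
}

// True when the slot is far enough ahead, but not too far, to be offered.
function withinBookingWindow(
  slotStart: Date,
  now: Date,
  minAdvanceBookingHours: number,
  maxAdvanceBookingDays: number,
): boolean {
  const leadMs = slotStart.getTime() - now.getTime();
  return (
    leadMs >= minAdvanceBookingHours * 3600000 &&
    leadMs <= maxAdvanceBookingDays * 86400000
  );
}
```

If every slot for a provider carries one of the listed codes, the result is empty even though the OSCAR schedule looks open.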
Diagnosis:
# Check what the adapter returns raw
pm2 logs vitara-admin-api --lines 100 | grep -i "getDayWorkSchedule\|scheduleCode\|NON_BOOKABLE\|available"
# Enable debug logging temporarily (set the variables in the shell; --update-env applies them)
VITARA_DEBUG=true LOG_LEVEL=trace pm2 restart vitara-admin-api --update-env
pm2 logs vitara-admin-api --lines 200 | grep "PHI-DEBUG"
# Remember to disable debug after investigation
VITARA_DEBUG=false LOG_LEVEL=info pm2 restart vitara-admin-api --update-env
Solution:
- Verify the provider has schedule templates configured in OSCAR
- Check that the schedule codes are bookable (not in the non-bookable list)
- Verify ScheduleSettings in the database match the clinic's actual hours
9. Onboarding Issues
EMR Connection Test Failure
Symptom: Onboarding step 2 (EMR Connection) fails validation.
Causes:
- OSCAR SOAP credentials not saved or encrypted incorrectly
- OSCAR server unreachable from the Vitara server
- Wrong OSCAR URL format
Diagnosis:
# Check the onboarding pre-launch validation
pm2 logs vitara-admin-api --lines 50 | grep -i "emr_connection\|SOAP\|onboarding"
# Test OSCAR SOAP connectivity
curl -s --connect-timeout 5 <OSCAR_SOAP_URL>/ws/ScheduleService?wsdl | head -5
Solution:
- Verify the OSCAR URL includes the context path (e.g., https://oscar.clinic.com/oscar)
- Verify SOAP username/password are correct (test via OSCAR web UI login)
- Ensure the Vitara server can reach the OSCAR host (no firewall blocking)
- Re-save EMR credentials through the admin UI
Pre-Launch Validation Failures
Symptom: The 10-point pre-launch check fails on one or more items.
The checks and their requirements:
| # | Check ID | Required? | What It Validates |
|---|---|---|---|
| 1 | clinic_info | Yes | Name, phone, address, timezone set |
| 2 | business_hours | Yes | At least 1 open day configured |
| 3 | active_providers | Yes | At least 1 provider with OSCAR ID |
| 4 | emr_connection | Yes | SOAP credentials present and encrypted |
| 5 | vapi_assigned | Yes | Vapi squad ID linked to clinic |
| 6 | privacy_officer | Yes | Privacy officer name and email set |
| 7 | credentials_encrypted | Yes | OSCAR credentials encrypted with AES-256-GCM |
| 8 | test_call | No | Informational only |
| 9 | oscar_config_synced | No | Informational; checks sync within 7 days |
| 10 | schedule_data_flow | No | Informational; tests slot retrieval via adapter |
Diagnosis:
# Run validation via API
curl -s -H "Authorization: Bearer <jwt-token>" \
http://localhost:3002/api/clinic/onboarding/validate | python3 -m json.tool
Non-Blocking Checks
Checks test_call, schedule_data_flow, and oscar_config_synced are informational and do not block go-live. All other checks must pass.
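The blocking rule can be expressed as a small function for scripting against the validation output. This is a sketch: the check IDs come from the table above, but the CheckResult shape is an assumption and may not match the actual response of the validate endpoint.

```typescript
// The seven checks marked "Yes" in the table above.
const REQUIRED_CHECKS = [
  "clinic_info",
  "business_hours",
  "active_providers",
  "emr_connection",
  "vapi_assigned",
  "privacy_officer",
  "credentials_encrypted",
];

// Assumed result shape; adapt to the real API response.
interface CheckResult {
  id: string;
  passed: boolean;
}

// Returns the required checks that failed or are missing from the results.
// An empty array means nothing blocks go-live.
function goLiveBlockers(results: CheckResult[]): string[] {
  const passed = new Set(results.filter((r) => r.passed).map((r) => r.id));
  return REQUIRED_CHECKS.filter((id) => !passed.has(id));
}
```

A required check that is absent from the results is treated as a blocker, which errs on the side of not going live on incomplete data.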
10. Logs Reference
PM2 Log Commands
# View recent logs (stdout + stderr)
pm2 logs vitara-admin-api --lines 100
# Stream logs in real-time
pm2 logs vitara-admin-api
# Error logs only
pm2 logs vitara-admin-api --err --lines 50
# Output logs only (no errors)
pm2 logs vitara-admin-api --out --lines 50
# Flush (clear) log files
pm2 flush vitara-admin-api
Pino Structured Logging
The server uses Pino for structured JSON logging in production and pretty-printed output in development.
Log Levels (from most to least verbose):
| Level | When Used |
|---|---|
| trace | PHI-DEBUG mode only (sensitive data, NEVER in production) |
| debug | Development default: detailed operation traces |
| info | Production default: SOAP client creation, circuit breaker state changes, onboarding events |
| warn | Advisory lock failures, degraded health, circuit breaker OPEN |
| error | Uncaught exceptions, database failures, auth rejections |
| fatal | Process about to exit |
Change Log Level at Runtime:
# Temporarily increase verbosity (set the variable in the shell; --update-env applies it on restart)
LOG_LEVEL=debug pm2 restart vitara-admin-api --update-env
# Enable PHI-DEBUG mode (trace + PHI data; use with extreme caution)
VITARA_DEBUG=true LOG_LEVEL=trace pm2 restart vitara-admin-api --update-env
# Reset to production defaults
LOG_LEVEL=info VITARA_DEBUG=false pm2 restart vitara-admin-api --update-env
PHI-DEBUG Mode
Setting VITARA_DEBUG=true enables [PHI-DEBUG] trace logging that may include Protected Health Information (patient names, phone numbers, demographics). Never enable in production unless actively debugging a critical issue, and disable immediately after.
Log Locations
| Source | Location |
|---|---|
| PM2 stdout | ~/.pm2/logs/vitara-admin-api-out.log |
| PM2 stderr | ~/.pm2/logs/vitara-admin-api-error.log |
| NGINX access | /var/log/nginx/access.log |
| NGINX error | /var/log/nginx/error.log |
| PostgreSQL | /var/log/postgresql/ |
Searching Logs
# Search for errors in PM2 logs
pm2 logs vitara-admin-api --lines 500 --nostream | grep -i "error\|fatal"
# Search for specific request patterns
pm2 logs vitara-admin-api --lines 500 --nostream | grep "tool-calls"
# Search NGINX for 5xx errors
sudo grep " 50[0-9] " /var/log/nginx/access.log | tail -20
# Search for circuit breaker events
pm2 logs vitara-admin-api --lines 500 --nostream | grep "circuit.breaker\|OPEN\|HALF-OPEN\|CLOSED"
11. Health Checks
Quick Health Check
# Via NGINX (tests full path)
curl -s https://api.vitaravox.ca/health | python3 -m json.tool
# Direct to API (bypasses NGINX)
curl -s http://localhost:3002/health | python3 -m json.tool
Expected healthy response:
{
"status": "healthy",
"timestamp": "2026-02-16T12:00:00.000Z",
"services": {
"database": { "status": "healthy", "latencyMs": 2 },
"oscarBridge": { "status": "healthy", "latencyMs": 45 },
"vapiApi": { "status": "healthy", "latencyMs": 120 }
},
"uptime": 86400
}
Comprehensive Health Check Script
Save as ~/check-vitara-health.sh and run with bash ~/check-vitara-health.sh:
#!/usr/bin/env bash
set -euo pipefail
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
pass() { echo -e " ${GREEN}PASS${NC} $1"; }
fail() { echo -e " ${RED}FAIL${NC} $1"; }
warn() { echo -e " ${YELLOW}WARN${NC} $1"; }
echo "=== VitaraVox Health Check ==="
echo " Timestamp: $(date -Iseconds)"
echo ""
# 1. PM2 process
echo "[PM2]"
if pm2 pid vitara-admin-api > /dev/null 2>&1 && [ "$(pm2 pid vitara-admin-api)" != "" ]; then
# Read the restart counter from PM2's JSON process list
RESTARTS=$(pm2 jlist 2>/dev/null | python3 -c "
import sys, json
data = json.load(sys.stdin)
for p in data:
    if p['name'] == 'vitara-admin-api':
        print(p.get('pm2_env', {}).get('restart_time', 'N/A'))
" 2>/dev/null || echo "N/A")
pass "vitara-admin-api running (restarts: ${RESTARTS})"
if [ "$RESTARTS" != "N/A" ] && [ "$RESTARTS" -gt 10 ]; then
warn "High restart count ($RESTARTS): check for crash loops"
fi
else
fail "vitara-admin-api NOT running"
fi
echo ""
# 2. PostgreSQL
echo "[PostgreSQL]"
if pg_isready -h localhost -p 5432 -q 2>/dev/null; then
pass "PostgreSQL accepting connections on port 5432"
else
fail "PostgreSQL not responding on port 5432"
fi
if psql -h localhost -U vitara -d vitara_platform -c "SELECT 1" > /dev/null 2>&1; then
TABLE_COUNT=$(psql -h localhost -U vitara -d vitara_platform -t -c "
SELECT count(*) FROM information_schema.tables
WHERE table_schema = 'public';" 2>/dev/null | tr -d ' ')
pass "Database vitara_platform accessible (${TABLE_COUNT} tables)"
else
fail "Cannot connect to vitara_platform database"
fi
echo ""
# 3. API Health
echo "[API Server]"
# "|| true" keeps set -e from aborting the script when the API is down
HEALTH=$(curl -s --connect-timeout 5 http://localhost:3002/health 2>/dev/null || true)
if [ -n "$HEALTH" ]; then
STATUS=$(echo "$HEALTH" | python3 -c "import sys,json; print(json.load(sys.stdin).get('status','unknown'))" 2>/dev/null || echo "parse_error")
if [ "$STATUS" = "healthy" ]; then
pass "API health: healthy"
elif [ "$STATUS" = "degraded" ]; then
warn "API health: degraded"
echo "$HEALTH" | python3 -m json.tool 2>/dev/null
else
fail "API health: $STATUS"
fi
else
fail "API not responding on port 3002"
fi
echo ""
# 4. NGINX
echo "[NGINX]"
if sudo systemctl is-active nginx > /dev/null 2>&1; then
pass "NGINX service active"
else
fail "NGINX service not active"
fi
if sudo nginx -t 2>&1 | grep -q "successful"; then
pass "NGINX config valid"
else
fail "NGINX config invalid"
fi
# SSL check
# "|| true" keeps set -e and pipefail from aborting the script when the TLS probe fails
CERT_EXPIRY=$(echo | openssl s_client -connect api.vitaravox.ca:443 -servername api.vitaravox.ca 2>/dev/null | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2 || true)
if [ -n "$CERT_EXPIRY" ]; then
EXPIRY_EPOCH=$(date -d "$CERT_EXPIRY" +%s 2>/dev/null || echo 0)
NOW_EPOCH=$(date +%s)
DAYS_LEFT=$(( (EXPIRY_EPOCH - NOW_EPOCH) / 86400 ))
if [ "$DAYS_LEFT" -gt 14 ]; then
pass "SSL certificate valid ($DAYS_LEFT days remaining)"
elif [ "$DAYS_LEFT" -gt 0 ]; then
warn "SSL certificate expires in $DAYS_LEFT days — renew soon"
else
fail "SSL certificate EXPIRED"
fi
else
warn "Could not check SSL certificate"
fi
echo ""
# 5. Disk & Memory
echo "[System Resources]"
DISK_PCT=$(df / --output=pcent | tail -1 | tr -d ' %')
if [ "$DISK_PCT" -lt 80 ]; then
pass "Disk usage: ${DISK_PCT}%"
elif [ "$DISK_PCT" -lt 90 ]; then
warn "Disk usage: ${DISK_PCT}% — getting full"
else
fail "Disk usage: ${DISK_PCT}% — critically full"
fi
MEM_PCT=$(free | awk '/Mem:/ {printf "%.0f", $3/$2*100}')
if [ "$MEM_PCT" -lt 80 ]; then
pass "Memory usage: ${MEM_PCT}%"
elif [ "$MEM_PCT" -lt 90 ]; then
warn "Memory usage: ${MEM_PCT}%"
else
fail "Memory usage: ${MEM_PCT}% — critically high"
fi
echo ""
echo "=== Check Complete ==="
PM2 Monitoring
# Real-time dashboard
pm2 monit
# Process list with memory/CPU
pm2 status
# Detailed process info
pm2 show vitara-admin-api
# JSON process info (for scripting)
pm2 jlist
12. Debug Mode (VITARA_DEBUG)
Source: lib/debug-manager.ts:1-164
Debug mode enables verbose PHI logging in PM2 logs for diagnosing production issues. It auto-expires after 4 hours to prevent accidental exposure.
DEBUG MODE LIFECYCLE
Activate (env or API)
│
▼
┌────────────────────────┐
│ ACTIVE │
│ • Log level → trace │
│ • PHI appears in logs │
│ • 4h timer starts │
└────────┬───────────────┘
│
┌────┴────────┐
│ │
4h expiry Manual disable
│ (API call)
▼ │
┌────────────────────────┐
│ INACTIVE │
│ • Log level → normal │
│ • PHI stripped │
│ • Timer cleared │
└────────────────────────┘
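The lifecycle above can be modelled in a few lines. This sketch is not the contents of lib/debug-manager.ts: the class and its method names are hypothetical, and it checks expiry lazily when status is read instead of running the timer shown in the diagram. The field names mirror the status response documented under Checking Status.

```typescript
const DEBUG_TTL_MS = 4 * 60 * 60 * 1000; // 4 hours, per the text

type DebugSource = "env" | "api";

interface DebugStatus {
  active: boolean;
  activatedAt: string | null;
  expiresAt: string | null;
  source: DebugSource | null;
}

// Hypothetical model of the ACTIVE/INACTIVE lifecycle.
class DebugMode {
  private activatedAt: number | null = null;
  private source: DebugSource | null = null;

  // The clock is injectable so expiry can be checked without waiting 4 hours.
  constructor(private readonly now: () => number = () => Date.now()) {}

  enable(source: DebugSource): void {
    this.activatedAt = this.now();
    this.source = source;
  }

  disable(): void {
    this.activatedAt = null;
    this.source = null;
  }

  status(): DebugStatus {
    // Lazy expiry: deactivate once the TTL has elapsed.
    if (this.activatedAt !== null && this.now() - this.activatedAt >= DEBUG_TTL_MS) {
      this.disable();
    }
    if (this.activatedAt === null) {
      return { active: false, activatedAt: null, expiresAt: null, source: null };
    }
    return {
      active: true,
      activatedAt: new Date(this.activatedAt).toISOString(),
      expiresAt: new Date(this.activatedAt + DEBUG_TTL_MS).toISOString(),
      source: this.source,
    };
  }
}
```

Comparing expiresAt from the status endpoint with the current time is a quick way to confirm whether PHI logging is still on.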
Activation Methods
| Method | Command | Requires Restart? |
|---|---|---|
| Environment | Set VITARA_DEBUG=true in .env | Yes (pm2 restart) |
| Runtime API | POST /api/admin/debug {"enabled": true} | No |
Checking Status
# Via API
curl -s -H "Authorization: Bearer <jwt>" \
http://localhost:3002/api/admin/debug | python3 -m json.tool
# Response:
# {
# "active": true,
# "activatedAt": "2026-03-09T10:00:00.000Z",
# "expiresAt": "2026-03-09T14:00:00.000Z",
# "source": "api"
# }
Disabling¶
# Via API (immediate, no restart)
curl -s -X POST -H "Authorization: Bearer <jwt>" \
-H "Content-Type: application/json" \
-d '{"enabled": false}' \
http://localhost:3002/api/admin/debug
# Or wait for auto-expiry (4 hours)
Security
Debug mode does NOT disable any security controls (HMAC, CORS, rate limiting). It only enables verbose logging that includes PHI (patient names, DOB, phone numbers) in PM2 logs. Never enable unless actively diagnosing a critical issue.
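Because debug mode exposes PHI until it expires, it can help to know how long it will stay active. The sketch below computes the minutes remaining from the `expiresAt` value returned by the status endpoint. `debug_minutes_left` is a hypothetical helper, and it assumes GNU `date` (as shipped on Ubuntu) for the `-d` option.

```bash
# Hypothetical helper: minutes until debug mode auto-expires.
#   $1 = expiresAt from GET /api/admin/debug (ISO 8601, UTC, e.g. 2026-03-09T14:00:00.000Z)
#   $2 = current time in epoch seconds (optional, defaults to now)
# Assumption: GNU date is available for `date -d`.
debug_minutes_left() {
  ts="${1%Z}"       # drop the trailing Z
  ts="${ts%%.*}"    # drop fractional seconds, if any
  expires=$(date -u -d "${ts}Z" +%s) || return 1
  now="${2:-$(date -u +%s)}"
  left=$(( (expires - now) / 60 ))
  # Clamp to zero once the expiry time has passed
  if [ "$left" -lt 0 ]; then
    left=0
  fi
  echo "$left"
}
```

With the sample response shown under Checking Status, calling this at 10:00 UTC on the activation day would report the full 4-hour window in minutes.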
13. SMS Troubleshooting¶
SMS Not Sending¶
Symptom: Appointment booked/rescheduled/cancelled but no SMS received.
Guard chain (source: sms.service.ts:55-100): All 5 checks must pass for an SMS to send.
SMS GUARD CHAIN
fireSmsBehindWebhook() called
│
▼
1. TELNYX_API_KEY set? ── NO ──► skip (no Telnyx configured)
│ YES
▼
2. smsSenderNumber set? ── NO ──► skip (clinic has no SMS number)
│ YES
▼
3. smsConsent ≠ false? ── NO ──► skip (patient declined)
│ YES
▼
4. Patient phone valid? ── NO ──► skip (no E.164 phone)
│ YES
▼
5. smsEnabled ≠ false? ── NO ──► skip (clinic disabled SMS)
│ YES
▼
SEND via Telnyx API
Diagnosis:
# Check env vars
pm2 env vitara-admin-api | grep TELNYX
# Check clinic SMS config
psql -U vitara -d vitara_platform -c "
SELECT \"smsSenderNumber\", \"smsEnabled\", \"smsLanguage\"
FROM \"ClinicConfig\"
WHERE \"clinicId\" = '<clinic-id>';"
# Check recent SMS attempts in logs
pm2 logs vitara-admin-api --lines 100 | grep -i "sms\|telnyx"
Solutions:
| Guard Failed | Fix |
|---|---|
| No TELNYX_API_KEY | Add to .env, restart PM2 |
| No smsSenderNumber | Set via Admin UI → Clinic → SMS Config |
| smsConsent = false | Patient declined; cannot override |
| Invalid phone | Check patientPhone in OSCAR; must be 10-digit Canadian |
| smsEnabled = false | Enable via Admin UI or API: POST /api/clinic/config/sms |
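To reason about which guard stopped a message, the five checks can be modelled as a short-circuiting function. This is a sketch of the order described in the guard chain, not the code in sms.service.ts. `sms_guard_result` is a hypothetical name, and the phone pattern (+1 followed by 10 digits) is an assumption about what counts as a valid E.164 Canadian number.

```bash
# Hypothetical model of the guard order; the real checks live in sms.service.ts.
# Args: $1 TELNYX_API_KEY  $2 smsSenderNumber  $3 smsConsent
#       $4 patient phone   $5 smsEnabled
# Prints the first guard that fails, or "send" if all five pass.
sms_guard_result() {
  [ -n "$1" ] || { echo "skip: TELNYX_API_KEY not set"; return; }
  [ -n "$2" ] || { echo "skip: smsSenderNumber not set"; return; }
  # Only an explicit "false" blocks; an unset value counts as consent
  [ "$3" != "false" ] || { echo "skip: patient declined"; return; }
  # Assumption: valid means E.164 with country code +1 and 10 digits
  printf '%s' "$4" | grep -Eq '^\+1[0-9]{10}$' || { echo "skip: invalid phone"; return; }
  [ "$5" != "false" ] || { echo "skip: clinic disabled SMS"; return; }
  echo "send"
}
```

Feeding it the values found during diagnosis (the env var, the ClinicConfig row, and the tool-call arguments) shows which row of the table above applies.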
SMS Consent Not Passed¶
Symptom: smsConsent is always true even when patient declined.
Cause: The voice agent must pass smsConsent: false in the create_appointment, update_appointment, or cancel_appointment tool call if the patient declined during the Patient-ID phase.
Diagnosis: Check Vapi call transcript for the consent disclosure response. If the patient said "no texts" but the tool call still has smsConsent: true, the prompt may need updating.
See: SMS Integration and Prompt Engineering — SMS Consent UX
14. Cache & Adapter Issues¶
EMR Adapter Cache¶
Source: EmrAdapterFactory.ts:43-130
The adapter factory caches EMR adapter instances per clinic with a 5-minute TTL. After a credential change, a cached adapter can keep using the old credentials until its entry expires.
Symptom: OSCAR calls fail with auth errors after credentials were updated.
Diagnosis:
Solution:
# Restart PM2 to clear all adapter caches
pm2 restart vitara-admin-api
# The 5-min TTL means waiting also works, but restart is faster
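If a restart is not an option, the wait can be bounded. The sketch below estimates how long to wait after a credential change before every adapter cached earlier has aged out. `adapter_cache_wait` is a hypothetical helper, and it assumes the 5-minute TTL runs from when the adapter was cached and is not refreshed on use, which this guide does not confirm.

```bash
# Hypothetical helper: worst-case seconds left until adapters cached before a
# credential change have expired.
#   $1 = epoch seconds when the credentials were changed
#   $2 = current epoch seconds
# Assumption: fixed 300-second TTL measured from cache time, not from last use.
adapter_cache_wait() {
  ttl=300
  remaining=$(( $1 + ttl - $2 ))
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  echo "$remaining"
}

# Example: credentials changed 60 seconds ago
#   adapter_cache_wait "$(( $(date +%s) - 60 ))" "$(date +%s)"
```

A result of 0 means any remaining auth errors are unlikely to be caused by the adapter cache.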
Circuit Breaker Stuck Open¶
Symptom: All OSCAR calls return 503 even though OSCAR is reachable.
The circuit breaker opens when the failure rate reaches 50% within its monitoring window. Once open, it stays open for 30 seconds, then enters the half-open state.
Diagnosis:
# Check breaker state
pm2 logs vitara-admin-api --lines 200 | grep -i "circuit\|OPEN\|HALF-OPEN\|CLOSED"
# Verify OSCAR is actually reachable
curl -s --connect-timeout 5 <OSCAR_URL>/ws/ScheduleService?wsdl | head -5
Solution:
- Wait 30 seconds for the breaker to enter half-open; one successful test request then closes it automatically
- If OSCAR is down: fix OSCAR first, breaker auto-recovers
- If persistent: restart PM2 to reset breaker state
Split Circuit Breakers
SOAP and REST adapters have independent circuit breakers. A SOAP failure does not trip the REST breaker and vice versa. For preferRest clinics, the REST breaker must be healthy for most operations.
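The breaker timing described in this section can be sketched as two small functions, which may help when reading timestamps in the logs. Both names are hypothetical, the 50% and 30-second values come from this guide, and treating exactly 50% as tripping the breaker is an assumption.

```bash
# Hypothetical model of the breaker behaviour described above.

# breaker_should_open FAILURES TOTAL
# Exit status 0 if the failure rate within the window is 50% or more.
breaker_should_open() {
  [ "$2" -gt 0 ] && [ $(( $1 * 100 )) -ge $(( $2 * 50 )) ]
}

# breaker_effective_state STATE OPENED_AT NOW   (times in epoch seconds)
# An OPEN breaker is treated as HALF-OPEN once 30 seconds have passed.
breaker_effective_state() {
  if [ "$1" = "OPEN" ] && [ $(( $3 - $2 )) -ge 30 ]; then
    echo "HALF-OPEN"
  else
    echo "$1"
  fi
}
```

Since the SOAP and REST breakers are independent, the same reasoning applies to each one separately.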
Getting Help¶
If issues persist after following this guide:
- Collect diagnostics: PM2 logs, pm2 status output, and the health check script output
- Check documentation: vitdocs.vitaravox.ca
- Contact support: support@vitaravox.com
Include in your support request:
- Error messages from PM2 logs
- Output of pm2 status
- Output of the health check script
- Steps to reproduce the issue
- Environment (production / development)