
Monitoring & Troubleshooting

Common issues and solutions for the PM2 + SOAP + REST production stack

Last Updated: 2026-03-09 (v4.3.0)


Quick Reference

| Component | How to Check | Port |
|---|---|---|
| API Server | pm2 status vitara-admin-api | 3002 |
| PostgreSQL | pg_isready -h localhost -p 5432 | 5432 |
| NGINX | sudo systemctl status nginx | 80/443 |
| OSCAR SOAP | curl -s <OSCAR_URL>/ws/ScheduleService?wsdl | 8080/443 |
| Vapi API | curl -s -H "Authorization: Bearer $VAPI_API_KEY" https://api.vapi.ai/assistant | n/a (external) |

1. API Issues

401 Unauthorized

Symptom: API returns 401 Unauthorized on any endpoint.

Causes:

  • Missing or invalid x-api-key header on Vapi webhook requests
  • Expired or invalid JWT token on admin API requests
  • VAPI_WEBHOOK_SECRET not set in production (server rejects all webhook requests)

Diagnosis:

# Check PM2 process is running
pm2 status vitara-admin-api

# Look for auth rejection logs
pm2 logs vitara-admin-api --lines 50 | grep -i "auth\|401\|VAPI AUTH"

# Verify VAPI_WEBHOOK_SECRET is set in environment
# (pm2 env expects the numeric process id shown by pm2 status, not the name)
pm2 env <pm2-id> | grep VAPI_WEBHOOK_SECRET

Solution:

# If VAPI_WEBHOOK_SECRET is missing, add it to the .env file
cd /home/ubuntu/vitara-platform/admin-dashboard/server
nano .env  # Add VAPI_WEBHOOK_SECRET=<your-secret>

# Rebuild and restart
npx tsc && pm2 restart vitara-admin-api

Vapi Webhook Auth Methods

The server accepts three auth methods in order: HMAC-SHA256 signature (x-vapi-signature + x-vapi-timestamp), API key (x-api-key header), or Bearer token. Ensure the method configured in Vapi matches VAPI_WEBHOOK_SECRET.
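
As an illustration of the first method, here is a minimal sketch of an HMAC-SHA256 check. The signed string (`timestamp.body`), the hex encoding, the epoch-millisecond timestamp, and the 5-minute tolerance are all assumptions made for the example; the server's actual verification logic is not shown in this guide and may differ.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: the signed payload format and the tolerance window
// are assumptions, not taken from the server source.
function verifyHmacSignature(
  secret: string,
  timestamp: string, // value of x-vapi-timestamp (assumed epoch milliseconds)
  rawBody: string, // raw request body, before JSON parsing
  signatureHex: string, // value of x-vapi-signature (assumed hex)
  nowMs: number = Date.now(),
  toleranceMs: number = 5 * 60 * 1000,
): boolean {
  // Reject stale or malformed timestamps to limit replay attacks.
  const sentMs = Number(timestamp);
  if (!Number.isFinite(sentMs) || Math.abs(nowMs - sentMs) > toleranceMs) {
    return false;
  }

  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest();
  const received = Buffer.from(signatureHex, "hex");

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The constant-time comparison avoids leaking how many leading bytes of a forged signature were correct.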

429 Rate Limited

Symptom: API returns 429 Too Many Requests.

Causes:

  • Exceeding rate limit (5/min auth, 100/min API, 300/min webhooks)
  • Vapi retrying failed tool calls rapidly

Diagnosis:

pm2 logs vitara-admin-api --lines 100 | grep -i "rate.limit\|429"

Solution:

  • Wait for the rate limit window to reset (1 minute)
  • If Vapi is retrying, check why the original tool call is failing (likely a downstream timeout)
  • Implement exponential backoff in any custom API clients
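
For custom clients, exponential backoff could look like the sketch below. The base delay, cap, and attempt count are illustrative values, not limits defined by the server, and `doRequest` is a hypothetical callback standing in for whatever HTTP call the client makes.

```typescript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retries only on HTTP 429; at most `maxAttempts` requests are sent in total.
async function fetchWithBackoff(
  doRequest: () => Promise<{ status: number }>,
  maxAttempts = 5,
): Promise<{ status: number }> {
  let response = await doRequest();
  for (let attempt = 0; response.status === 429 && attempt < maxAttempts - 1; attempt++) {
    // Full jitter spreads retries so clients do not retry in lockstep.
    const delay = Math.random() * backoffDelayMs(attempt);
    await new Promise((resolve) => setTimeout(resolve, delay));
    response = await doRequest();
  }
  return response;
}
```

With these defaults the worst-case waits are roughly 0.5 s, 1 s, 2 s, and 4 s, which stays well inside a 1-minute rate-limit window.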

503 Service Unavailable

Symptom: API returns 503 Service Unavailable.

Causes:

  • PM2 process vitara-admin-api is down or restarting
  • PostgreSQL connection pool exhausted
  • OSCAR SOAP circuit breaker is OPEN (too many recent failures)

Diagnosis:

# Check PM2 status
pm2 status vitara-admin-api

# Check for circuit breaker messages
pm2 logs vitara-admin-api --lines 100 | grep -i "circuit.breaker\|OPEN\|503"

# Check health endpoint directly (bypasses NGINX)
curl -s http://localhost:3002/health | python3 -m json.tool

Solution:

# If process is stopped or errored
pm2 restart vitara-admin-api

# If circuit breaker is open, check OSCAR connectivity first
curl -s <OSCAR_SOAP_URL>/ws/ScheduleService?wsdl | head -5

# Circuit breaker auto-resets after 30 seconds (half-open state)
# Monitor recovery:
pm2 logs vitara-admin-api --lines 20 | grep "HALF-OPEN\|CLOSED"

Circuit Breaker Behavior

The circuit breaker opens after 50% of requests fail within the monitoring window. When open, all SOAP calls fail immediately with 503 for 30 seconds. After that, one test request is allowed (half-open). If it succeeds, the breaker closes.
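
The state transitions described above can be sketched as a small state machine. This is an illustration only: the class name, the minimum sample size, and the use of simple counters instead of a rolling time window are assumptions, and reopening on a failed probe is the conventional behavior rather than something this guide confirms.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF-OPEN";

class CircuitBreakerSketch {
  private state: BreakerState = "CLOSED";
  private successes = 0;
  private failures = 0;
  private openedAt = 0;
  private probeInFlight = false;
  private failureRatio: number;
  private resetMs: number;
  private minSamples: number;

  constructor(failureRatio = 0.5, resetMs = 30000, minSamples = 4) {
    this.failureRatio = failureRatio;
    this.resetMs = resetMs;
    this.minSamples = minSamples;
  }

  // Returns true if a call may be attempted at time nowMs.
  canRequest(nowMs: number): boolean {
    if (this.state === "OPEN" && nowMs - this.openedAt >= this.resetMs) {
      this.state = "HALF-OPEN";
      this.probeInFlight = false;
    }
    if (this.state === "HALF-OPEN") {
      if (this.probeInFlight) return false; // only one probe at a time
      this.probeInFlight = true;
      return true;
    }
    return this.state === "CLOSED";
  }

  recordSuccess(): void {
    if (this.state === "HALF-OPEN") {
      // Probe succeeded: close the breaker and start counting afresh.
      this.state = "CLOSED";
      this.successes = 0;
      this.failures = 0;
      return;
    }
    this.successes++;
  }

  recordFailure(nowMs: number): void {
    this.failures++;
    const total = this.successes + this.failures;
    const tripped = total >= this.minSamples && this.failures / total >= this.failureRatio;
    if (this.state === "HALF-OPEN" || tripped) {
      this.state = "OPEN"; // fail fast until resetMs has elapsed
      this.openedAt = nowMs;
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

Passing the clock in as `nowMs` keeps the sketch deterministic, which makes the 30-second recovery easy to reason about when reading the HALF-OPEN and CLOSED log lines.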


2. Database Issues

Connection Failed

Symptom: ECONNREFUSED or Connection refused in PM2 logs.

Causes:

  • PostgreSQL service not running
  • Wrong DATABASE_URL in .env
  • Port 5432 conflict

Diagnosis:

# Check PostgreSQL is running
pg_isready -h localhost -p 5432

# Check service status
sudo systemctl status postgresql

# Verify connection manually
psql -h localhost -U vitara -d vitara_platform -c "SELECT 1"

# Check what's using port 5432
sudo ss -tlnp | grep 5432

Solution:

# Start PostgreSQL if stopped
sudo systemctl start postgresql

# If DATABASE_URL is wrong, check .env
cat /home/ubuntu/vitara-platform/admin-dashboard/server/.env | grep DATABASE_URL

# Restart PM2 after fixing .env
pm2 restart vitara-admin-api

Authentication Failed

Symptom: FATAL: password authentication failed for user "vitara" in logs.

Causes:

  • Password in DATABASE_URL does not match PostgreSQL user password
  • User does not exist in PostgreSQL

Diagnosis:

# Check the DATABASE_URL password matches
cat /home/ubuntu/vitara-platform/admin-dashboard/server/.env | grep DATABASE_URL

# Try connecting with the credentials
psql "postgresql://vitara:<password>@localhost:5432/vitara_platform" -c "SELECT 1"

# List PostgreSQL users
sudo -u postgres psql -c "\du"

Solution:

# Reset the user password in PostgreSQL
sudo -u postgres psql -c "ALTER USER vitara WITH PASSWORD 'new_password';"

# Update DATABASE_URL in .env to match
# Then restart
pm2 restart vitara-admin-api

Migration Failed

Symptom: Tables do not exist, or Prisma reports errors about missing relations.

Diagnosis:

# Check current migration status
cd /home/ubuntu/vitara-platform/admin-dashboard/server
npx prisma migrate status

# List existing tables
psql -h localhost -U vitara -d vitara_platform -c "\dt"

Solution:

# Run pending migrations
cd /home/ubuntu/vitara-platform/admin-dashboard/server
npx prisma migrate deploy

# If schema is out of sync (development only!)
npx prisma db push

# Verify tables were created
psql -h localhost -U vitara -d vitara_platform -c "\dt"

# Restart PM2
pm2 restart vitara-admin-api

Never use prisma migrate reset in production

This drops all data. Use prisma migrate deploy to apply pending migrations without data loss.


3. OSCAR SOAP Issues

Production EMR Path

The production integration uses OscarSoapAdapter (WS-Security over SOAP). The REST bridge (OscarBridgeAdapter) is legacy and used only in development/fallback. Set DEFAULT_EMR_TYPE=oscar-soap in .env.

WS-Security SecurityError

Symptom: SecurityError or WSSecurityException from OSCAR SOAP calls.

Cause: The SOAP client is sending a <wsu:Timestamp> element in the WS-Security header. OSCAR's CXF WSS4J configuration has no Timestamp action configured and rejects it.

Diagnosis:

pm2 logs vitara-admin-api --lines 50 | grep -i "SecurityError\|WSS\|security"

Solution:

Verify the WS-Security options in OscarSoapAdapter.ts are correct:

// CORRECT — hasTimeStamp MUST be false
client.setSecurity(new soap.WSSecurity(username, password, {
  passwordType: 'PasswordText',
  mustUnderstand: true,
  hasTimeStamp: false,   // REQUIRED: Timestamp causes SecurityError
  hasNonce: false
}));

Critical: hasTimeStamp: false is REQUIRED

OSCAR CXF has no Timestamp action configured in its WSS4J policy. Including a <wsu:Timestamp> element causes an immediate SecurityError. This is the #1 cause of SOAP auth failures after code changes.

If someone has modified the adapter:

# Check the current setting
grep -n "hasTimeStamp" /home/ubuntu/vitara-platform/admin-dashboard/server/src/adapters/OscarSoapAdapter.ts

# Rebuild and restart after fixing
cd /home/ubuntu/vitara-platform/admin-dashboard/server
npx tsc && pm2 restart vitara-admin-api

WSDL Fetch Timeout / Cold-Start

Symptom: First SOAP call after PM2 restart times out or takes 8-15 seconds. Subsequent calls are fast.

Cause: node-soap fetches and parses the WSDL on first use. This cold-start fetch can exceed the 4-second circuit breaker timeout, causing the first request to fail.

Diagnosis:

pm2 logs vitara-admin-api --lines 30 | grep -i "WSDL\|cold.start\|timeout\|Schedule client created"

Solution:

# After PM2 restart, warm the SOAP clients manually
curl -s http://localhost:3002/health | python3 -m json.tool

# The health endpoint triggers OSCAR connectivity checks
# which forces WSDL fetching

# For a more thorough warm-up, call the API once:
curl -s -X POST http://localhost:3002/api/vapi \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-webhook-secret>" \
  -d '{"message":{"type":"tool-calls","toolCalls":[]}}'

Future Fix

A server-startup SOAP client warm-up is planned but not yet implemented. Until then, expect the first tool call after PM2 restart to be slow or fail.

SOAP Connection Timeout

Symptom: ETIMEDOUT or ECONNREFUSED on SOAP calls. Circuit breaker opens.

Causes:

  • OSCAR server is down or unreachable
  • Firewall blocking outbound connections to OSCAR host
  • Wrong OSCAR_SOAP_URL in .env

Diagnosis:

# Test raw connectivity to OSCAR
curl -s --connect-timeout 5 <OSCAR_SOAP_URL>/ws/ScheduleService?wsdl | head -5

# Check configured URL
# (pm2 env expects the numeric process id shown by pm2 status, not the name)
pm2 env <pm2-id> | grep OSCAR_SOAP

# Check for timeout errors
pm2 logs vitara-admin-api --lines 50 | grep -i "timeout\|ETIMEDOUT\|ECONNREFUSED"

Solution:

# Verify network connectivity
ping -c 3 <OSCAR_HOST>
telnet <OSCAR_HOST> <OSCAR_PORT>

# If firewall is blocking, check security group / iptables
sudo iptables -L -n | grep <OSCAR_PORT>

# After connectivity is restored, the circuit breaker will
# auto-recover in ~30 seconds (half-open → closed)
pm2 logs vitara-admin-api | grep "HALF-OPEN"

Timeout Budget

SOAP calls have a 4-second timeout (enforced by the circuit breaker), and Vapi tool calls have a 5-second timeout. That leaves only 1 second for Node.js processing and network overhead. If OSCAR takes longer than 4 seconds to respond, the SOAP call fails and counts toward tripping the circuit breaker.
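
A minimal sketch of how such a budget can be enforced around any promise-returning call follows. The constant and function names are hypothetical; the 4-second and 5-second values are the ones stated above.

```typescript
const VAPI_TOOL_TIMEOUT_MS = 5000;
const SOAP_TIMEOUT_MS = 4000;
// What remains for Node.js processing and network overhead.
const OVERHEAD_BUDGET_MS = VAPI_TOOL_TIMEOUT_MS - SOAP_TIMEOUT_MS;

function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer so it cannot leak.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

A SOAP call would be wrapped as `withTimeout(soapCall(), SOAP_TIMEOUT_MS)`, where `soapCall` is a placeholder. Note that `Promise.race` does not cancel the underlying request; it only stops waiting for it.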

SOAP Credential Errors

Symptom: 401 Unauthorized or Authentication failed from OSCAR SOAP.

Causes:

  • Wrong OSCAR_SOAP_USERNAME or OSCAR_SOAP_PASSWORD
  • OSCAR user account locked or disabled
  • Password expired

Diagnosis:

# (pm2 env expects the numeric process id shown by pm2 status, not the name)
pm2 env <pm2-id> | grep OSCAR_SOAP_USERNAME
pm2 logs vitara-admin-api --lines 30 | grep -i "auth\|credential\|401"

Solution:

# Verify credentials work with a direct SOAP test
# (replace with actual values)
curl -s -X POST "<OSCAR_URL>/ws/ScheduleService" \
  -H "Content-Type: text/xml" \
  -d '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
        <soapenv:Header>
          <wsse:Security xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd">
            <wsse:UsernameToken>
              <wsse:Username>YOUR_USERNAME</wsse:Username>
              <wsse:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText">YOUR_PASSWORD</wsse:Password>
            </wsse:UsernameToken>
          </wsse:Security>
        </soapenv:Header>
        <soapenv:Body/>
      </soapenv:Envelope>'

Patient Not Found

Symptom: SOAP searchDemographic returns empty results, but patient exists in OSCAR.

Causes:

  • Phone number format mismatch (OSCAR stores as 6045551234, Vapi sends +16045551234)
  • Name spelling differences or accent characters
  • Searching the wrong OSCAR instance

Diagnosis:

# Check recent search attempts
pm2 logs vitara-admin-api --lines 50 | grep -i "searchDemographic\|patient.*not found"

Solution:

  • The server normalizes +1XXXXXXXXXX to XXXXXXXXXX before SOAP search. If the phone format differs, check the normalizePhone() function in the adapter.
  • Verify against OSCAR directly: log in to the OSCAR web UI and search for the patient.
  • Non-numeric providerId values like "any" or "任何" must be handled server-side as "search all providers."
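
For reference when reviewing the adapter, the normalization described above amounts to something like the following. This is an illustrative sketch under the assumption that OSCAR stores bare 10-digit numbers; the real normalizePhone() may handle more cases.

```typescript
// Illustrative only: not the adapter's actual normalizePhone() implementation.
function normalizePhoneSketch(input: string): string {
  // Drop "+", spaces, dashes, and parentheses.
  const digits = input.replace(/\D/g, "");
  // Strip the North American country code from 11-digit numbers.
  return digits.length === 11 && digits.startsWith("1") ? digits.slice(1) : digits;
}
```

If a search fails for a number that looks correct, comparing the output of a function like this against the value stored in OSCAR is a quick way to spot a format mismatch.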

4. OSCAR Bridge Issues (Legacy)

Legacy / Development Only

The REST bridge (OscarBridgeAdapter) is the legacy EMR path retained for development and fallback. Production uses OscarSoapAdapter (SOAP). Set DEFAULT_EMR_TYPE=oscar-soap in .env for production.

Bridge Connection Failed

Symptom: Health check shows oscarBridge: down.

Diagnosis:

# Check if bridge is configured
# (pm2 env expects the numeric process id shown by pm2 status, not the name)
pm2 env <pm2-id> | grep OSCAR_BRIDGE

# Test bridge directly
curl -s -H "X-API-Key: <bridge-api-key>" http://15.222.50.48:3000/api/v1/health

# Check PM2 logs
pm2 logs vitara-admin-api --lines 20 | grep -i "bridge\|OSCAR Bridge"

Solution:

# If bridge is not needed in production, ensure SOAP is primary:
# In .env:
# DEFAULT_EMR_TYPE=oscar-soap

# Restart
pm2 restart vitara-admin-api

Bridge Timeout

The bridge phone-search timeout was reduced from 10s to 4s to match the SOAP adapter. If the bridge is slow, it will trip the same 4-second budget.


5. Voice Agent Issues

Vapi Webhook Not Calling Server

Symptom: Voice calls connect, but no tool-call requests arrive at the server.

Causes:

  • Wrong webhook URL configured in Vapi squad/assistant
  • SSL certificate issues preventing Vapi from reaching the server
  • NGINX not proxying to port 3002
  • Firewall blocking inbound HTTPS

Diagnosis:

# Check PM2 is receiving ANY requests
pm2 logs vitara-admin-api --lines 50 | grep -i "POST\|webhook\|tool-call"

# Test the webhook endpoint externally
curl -s -X POST https://api.vitaravox.ca/api/vapi \
  -H "Content-Type: application/json" \
  -H "x-api-key: <secret>" \
  -d '{"message":{"type":"tool-calls","toolCalls":[]}}'

# Check NGINX is proxying correctly
sudo nginx -t
curl -s http://localhost:3002/health

Solution:

  1. Verify the webhook URL in the Vapi dashboard matches https://api.vitaravox.ca/api/vapi
  2. Ensure the SSL certificate is valid (see NGINX Issues)
  3. Check that NGINX proxies /api/vapi to localhost:3002
  4. Verify firewall allows inbound 443

Call Drops Mid-Conversation

Symptom: Calls disconnect unexpectedly during tool execution.

Causes:

  • Tool response exceeds Vapi's 5-second timeout
  • OSCAR SOAP is slow, circuit breaker trips (4s timeout)
  • Vapi maxDuration exceeded on the call
  • Server crash during request processing

Diagnosis:

# Check for slow responses or timeouts
pm2 logs vitara-admin-api --lines 100 | grep -i "timeout\|slow\|circuit\|OPEN\|crash"

# Check PM2 process stability
pm2 status vitara-admin-api
# Look at "restart" count — high restart count = crashes

# Check Vapi call logs in the Vapi dashboard for detailed error info

Solution:

  • If SOAP is slow: check OSCAR server performance, network latency
  • If circuit breaker is tripping: the 4-second SOAP timeout is non-negotiable (Vapi allows 5s total). Fix the underlying OSCAR latency.
  • If PM2 is restarting: check for uncaught exceptions (see PM2 Issues)

Wrong Language Response

Symptom: Agent responds in English when patient speaks Mandarin, or vice versa.

Causes:

  • Router STT (AssemblyAI Universal) misidentified the language
  • Patient was transferred to the wrong language track
  • GPT-4o outputting space-separated Chinese characters (known issue)

Diagnosis:

# Check Vapi call logs for transcription output
# In Vapi dashboard: Calls → select call → Transcript tab

# Check which assistant handled the call
pm2 logs vitara-admin-api --lines 50 | grep -i "assistant\|transfer\|handoff\|language"

Solution:

  • Router STT: The Router uses AssemblyAI Universal Multilingual for bilingual detection. If detection is poor, review the audio quality (background noise degrades accuracy).
  • Wrong track transfer: Verify Router prompt contains correct handoff_to_* tool references.
  • GPT-4o Chinese spacing: Monitor ZH track outputs. If the problem persists, the ZH assistants may need a different LLM after launch.

STT Configuration

  • Router: AssemblyAI Universal (bilingual EN/ZH detection)
  • EN track: Deepgram Nova-2 en
  • ZH track: Deepgram Nova-2 zh
  • Do NOT use Deepgram nova-3 multi — it forces English on Mandarin speech.

Silent Transfers Failing

Symptom: Patient hears "I'm transferring you to..." instead of a seamless handoff, or transfer fails entirely.

Causes:

  • Squad prompts missing "NEVER mention transferring" instruction
  • transferAssistant used in prompt instead of actual tool function name (e.g., handoff_to_booking_en)
  • Missing handoff tool on the squad member

Diagnosis:

# Check the Vapi squad configuration
cd /home/ubuntu/vitara-platform/vapi-gitops
cat squads/dev/*.yaml | grep -i "handoff\|transfer"

# Verify tool names match prompt references
grep -r "handoff_to\|transferAssistant" prompts/

Solution:

  • All prompts must reference the actual handoff tool function names (handoff_to_booking_en, not transferAssistant)
  • Every non-Router assistant prompt must include: "NEVER mention transferring or handing off"
  • After fixing prompts, push to Vapi:

cd /home/ubuntu/vitara-platform/vapi-gitops
npm run push:dev

6. NGINX Issues

502 Bad Gateway

Symptom: NGINX returns 502 Bad Gateway.

Causes:

  • PM2 process vitara-admin-api is not running
  • NGINX proxy_pass points to the wrong port
  • PM2 process is running but not accepting connections

Diagnosis:

# Check PM2 is running
pm2 status vitara-admin-api

# Check NGINX error log
sudo tail -50 /var/log/nginx/error.log

# Test direct connection to API (bypass NGINX)
curl -s http://localhost:3002/health

# Verify NGINX config
sudo nginx -t

Solution:

# If PM2 is stopped
pm2 restart vitara-admin-api

# If NGINX config is wrong, verify proxy_pass port
sudo grep -n "proxy_pass" /etc/nginx/sites-enabled/*

# proxy_pass should point to http://localhost:3002
# After fixing:
sudo nginx -t && sudo nginx -s reload

SSL Certificate Problems

Symptom: ERR_CERT_DATE_INVALID, SSL_ERROR_EXPIRED_CERT_ALERT, or Vapi cannot reach the webhook.

Diagnosis:

# Check certificate expiry
sudo openssl x509 -in /etc/letsencrypt/live/api.vitaravox.ca/fullchain.pem -noout -dates

# Check certificate chain
openssl s_client -connect api.vitaravox.ca:443 -servername api.vitaravox.ca </dev/null 2>/dev/null | openssl x509 -noout -dates

# Check certbot renewal status
sudo certbot certificates

Solution:

# Renew certificate
sudo certbot renew

# If renewal fails (port 80 in use)
sudo certbot renew --nginx

# Reload NGINX to pick up new cert
sudo nginx -s reload

Certbot Auto-Renewal

Certbot installs a systemd timer for auto-renewal. Verify it is active: sudo systemctl status certbot.timer. If the timer is not running, renewal will not happen automatically.

NGINX Not Passing Headers

Symptom: API receives requests but all auth headers are missing.

Diagnosis:

# Check NGINX config for header forwarding
sudo grep -A 5 "proxy_set_header" /etc/nginx/sites-enabled/*

Solution:

Ensure the NGINX server block includes:

proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Critical: pass through auth headers
proxy_pass_request_headers on;

Then validate and reload NGINX:

sudo nginx -t && sudo nginx -s reload

7. PM2 Issues

Process Crashing / Restart Loops

Symptom: pm2 status shows high restart count or errored status.

Diagnosis:

# Check current status and restart count
pm2 status vitara-admin-api

# Check error logs
pm2 logs vitara-admin-api --err --lines 100

# Check for uncaught exceptions
pm2 logs vitara-admin-api --lines 200 | grep -i "uncaught\|unhandled\|FATAL\|Error:"

Solution:

# Common causes:
# 1. Missing environment variables (Zod validation fails in production)
# (pm2 env expects the numeric process id shown by pm2 status, not the name)
pm2 env <pm2-id> | grep -E "JWT_SECRET|ENCRYPTION_KEY|VAPI_WEBHOOK_SECRET"

# 2. Database connection string wrong
pm2 env <pm2-id> | grep DATABASE_URL

# 3. TypeScript build errors (running stale JS)
cd /home/ubuntu/vitara-platform/admin-dashboard/server
npx tsc  # Check for compile errors
pm2 restart vitara-admin-api

# 4. Port already in use
sudo ss -tlnp | grep 3002

Production Startup

In production (NODE_ENV=production), the server exits immediately if required secrets are missing: JWT_SECRET, JWT_REFRESH_SECRET, ENCRYPTION_KEY, VAPI_WEBHOOK_SECRET. Check pm2 logs vitara-admin-api --err for "Environment validation failed" messages.

Memory Leaks

Symptom: Memory usage grows continuously. PM2 eventually restarts the process due to OOM.

Diagnosis:

# Real-time monitoring
pm2 monit

# Check memory usage
pm2 status vitara-admin-api
# Look at the "mem" column

# Check for leaking SOAP clients (connections not closed)
pm2 logs vitara-admin-api --lines 100 | grep -i "heap\|memory\|OOM"

Solution:

# Set max memory restart threshold (e.g., 512MB)
pm2 start ecosystem.config.js --max-memory-restart 512M

# Or restart manually if leaking
pm2 restart vitara-admin-api

# For persistent leaks, enable heap profiling:
# NODE_OPTIONS="--max-old-space-size=512 --heapsnapshot-signal=SIGUSR2"
# Then: kill -USR2 <pid> to generate snapshot

Rebuild and Restart

The standard rebuild workflow for the PM2 process:

cd /home/ubuntu/vitara-platform/admin-dashboard/server
npx tsc && pm2 restart vitara-admin-api

To verify after restart:

pm2 status vitara-admin-api
pm2 logs vitara-admin-api --lines 10
curl -s http://localhost:3002/health | python3 -m json.tool

8. Booking Issues

Slot Collision

Symptom: Two patients book the same time slot, or booking returns "slot already taken" for a slot that should be available.

Cause: Race condition between availability check and booking, or stale availability data.

Diagnosis:

pm2 logs vitara-admin-api --lines 100 | grep -i "collision\|already.*booked\|advisory.*lock\|slot.*taken"

How the Protection Works:

  1. BookingEngine acquires a PostgreSQL advisory lock on provider+date+slot
  2. Performs a fresh availability check (slot may have been taken since search)
  3. Creates the appointment via OSCAR SOAP
  4. Releases the lock
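
The four steps above can be sketched as follows. The `Db` interface, the key derivation, and the `bookSlot` callback are hypothetical stand-ins for the BookingEngine internals; `pg_try_advisory_lock` and `pg_advisory_unlock` are standard PostgreSQL functions.

```typescript
// Hypothetical minimal database interface. Session-level advisory locks must
// be taken and released on the same connection, so `db` should be a single
// client, not a pool.
interface Db {
  query(sql: string, params: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// FNV-1a hash folded into a signed 32-bit integer, usable as a lock key.
function lockKey(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash | 0;
}

async function withSlotLock<T>(
  db: Db,
  providerId: string,
  date: string,
  slot: string,
  bookSlot: () => Promise<T>,
): Promise<T> {
  const keyA = lockKey(providerId);
  const keyB = lockKey(`${date}T${slot}`);

  // Step 1: try to take the lock without blocking.
  const res = await db.query("SELECT pg_try_advisory_lock($1, $2) AS locked", [keyA, keyB]);
  if (res.rows[0]?.locked !== true) {
    throw new Error("Slot is currently being booked by another caller");
  }
  try {
    // Steps 2 and 3: re-check availability and create the appointment.
    return await bookSlot();
  } finally {
    // Step 4: release the lock even if booking failed.
    await db.query("SELECT pg_advisory_unlock($1, $2)", [keyA, keyB]);
  }
}
```

Using the non-blocking `pg_try_advisory_lock` means a second caller gets an immediate answer instead of queueing behind the first booking.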

Solution:

# Check if advisory locks are stuck (unlikely but possible)
psql -h localhost -U vitara -d vitara_platform -c "
  SELECT pid, granted, objid
  FROM pg_locks
  WHERE locktype = 'advisory';"

# If there are stuck locks, identify the holding connection
psql -h localhost -U vitara -d vitara_platform -c "
  SELECT pid, state, query_start, query
  FROM pg_stat_activity
  WHERE pid IN (
    SELECT pid FROM pg_locks WHERE locktype = 'advisory' AND granted = true
  );"

Graceful Degradation

If the advisory lock acquisition fails (PostgreSQL error), the BookingEngine proceeds without the lock rather than rejecting the booking. This means collision protection is best-effort during database issues.

Advisory Lock Timeout

Symptom: Booking returns "Slot is currently being booked by another caller".

Cause: Another concurrent booking request holds the advisory lock for the same provider+date+slot.

Solution:

  • This is working as intended. The second caller should retry with a different slot.
  • If this happens frequently, it indicates high contention for the same provider. Consider suggesting alternative providers or time slots.

No Availability Found

Symptom: findAvailableSlots returns empty results, but the provider has an open schedule in OSCAR.

Causes:

  • OSCAR schedule template codes not configured for the provider
  • All slots filtered as non-bookable (schedule code in NON_BOOKABLE_CODES: L, P, V, A, a, B, H, R, E, G, M, m, d, t)
  • Business hours in the database do not match OSCAR's schedule
  • maxAdvanceBookingDays or minAdvanceBookingHours filtering out valid slots
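
To make the second cause concrete, here is a sketch of the filtering step. The codes are copied from the list above; the `Slot` shape and function name are hypothetical.

```typescript
const NON_BOOKABLE_CODES = new Set([
  "L", "P", "V", "A", "a", "B", "H", "R", "E", "G", "M", "m", "d", "t",
]);

interface Slot {
  startTime: string;
  scheduleCode: string;
}

function bookableSlots(slots: Slot[]): Slot[] {
  // Codes are case-sensitive: "d" is listed as non-bookable, but "D" is not.
  return slots.filter((s) => !NON_BOOKABLE_CODES.has(s.scheduleCode));
}
```

If every slot in a provider's template carries one of these codes, the result is empty even though the schedule looks open in OSCAR.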

Diagnosis:

# Check what the adapter returns raw
pm2 logs vitara-admin-api --lines 100 | grep -i "getDayWorkSchedule\|scheduleCode\|NON_BOOKABLE\|available"

# Enable debug logging temporarily
# (--update-env reads variables from the calling shell, so set them inline)
VITARA_DEBUG=true LOG_LEVEL=trace pm2 restart vitara-admin-api --update-env
pm2 logs vitara-admin-api --lines 200 | grep "PHI-DEBUG"

# Remember to disable debug after investigation
VITARA_DEBUG=false LOG_LEVEL=info pm2 restart vitara-admin-api --update-env

Solution:

  • Verify the provider has schedule templates configured in OSCAR
  • Check that the schedule codes are bookable (not in the non-bookable list)
  • Verify ScheduleSettings in the database match the clinic's actual hours

9. Onboarding Issues

EMR Connection Test Failure

Symptom: Onboarding step 2 (EMR Connection) fails validation.

Causes:

  • OSCAR SOAP credentials not saved or encrypted incorrectly
  • OSCAR server unreachable from the Vitara server
  • Wrong OSCAR URL format

Diagnosis:

# Check the onboarding pre-launch validation
pm2 logs vitara-admin-api --lines 50 | grep -i "emr_connection\|SOAP\|onboarding"

# Test OSCAR SOAP connectivity
curl -s --connect-timeout 5 <OSCAR_SOAP_URL>/ws/ScheduleService?wsdl | head -5

Solution:

  1. Verify the OSCAR URL includes the context path (e.g., https://oscar.clinic.com/oscar)
  2. Verify SOAP username/password are correct (test via OSCAR web UI login)
  3. Ensure the Vitara server can reach the OSCAR host (no firewall blocking)
  4. Re-save EMR credentials through the admin UI

Pre-Launch Validation Failures

Symptom: The 10-point pre-launch check fails on one or more items.

The checks and their requirements:

| # | Check ID | Required? | What It Validates |
|---|---|---|---|
| 1 | clinic_info | Yes | Name, phone, address, timezone set |
| 2 | business_hours | Yes | At least 1 open day configured |
| 3 | active_providers | Yes | At least 1 provider with OSCAR ID |
| 4 | emr_connection | Yes | SOAP credentials present and encrypted |
| 5 | vapi_assigned | Yes | Vapi squad ID linked to clinic |
| 6 | privacy_officer | Yes | Privacy officer name and email set |
| 7 | credentials_encrypted | Yes | OSCAR credentials encrypted with AES-256-GCM |
| 8 | test_call | No | Informational only |
| 9 | oscar_config_synced | No | Informational; checks sync within 7 days |
| 10 | schedule_data_flow | No | Informational; tests slot retrieval via adapter |

Diagnosis:

# Run validation via API
curl -s -H "Authorization: Bearer <jwt-token>" \
  http://localhost:3002/api/clinic/onboarding/validate | python3 -m json.tool

Non-Blocking Checks

Checks test_call, schedule_data_flow, and oscar_config_synced are informational and do not block go-live. All other checks must pass.
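
When scripting against the validation endpoint, the blocking rule can be expressed as in the sketch below. The `CheckResult` shape is an assumption about the response format, which this guide does not document; only the check IDs are taken from the list above.

```typescript
// Informational check IDs; failures here do not block go-live.
const INFORMATIONAL_CHECKS = new Set(["test_call", "oscar_config_synced", "schedule_data_flow"]);

// Assumed result shape, for illustration only.
interface CheckResult {
  id: string;
  passed: boolean;
}

// Returns the IDs of failed checks that block go-live.
function blockingFailures(results: CheckResult[]): string[] {
  return results
    .filter((r) => !r.passed && !INFORMATIONAL_CHECKS.has(r.id))
    .map((r) => r.id);
}
```

An empty return value means the clinic can go live even if informational checks are still failing.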


10. Logs Reference

PM2 Log Commands

# View recent logs (stdout + stderr)
pm2 logs vitara-admin-api --lines 100

# Stream logs in real-time
pm2 logs vitara-admin-api

# Error logs only
pm2 logs vitara-admin-api --err --lines 50

# Output logs only (no errors)
pm2 logs vitara-admin-api --out --lines 50

# Flush (clear) log files
pm2 flush vitara-admin-api

Pino Structured Logging

The server uses Pino for structured JSON logging in production and pretty-printed output in development.

Log Levels (from most to least verbose):

| Level | When Used |
|---|---|
| trace | PHI-DEBUG mode only (sensitive data, NEVER in production) |
| debug | Development default: detailed operation traces |
| info | Production default: SOAP client creation, circuit breaker state changes, onboarding events |
| warn | Advisory lock failures, degraded health, circuit breaker OPEN |
| error | Uncaught exceptions, database failures, auth rejections |
| fatal | Process about to exit |

Change Log Level at Runtime:

# Temporarily increase verbosity (revert with the reset command below)
# --update-env reads variables from the calling shell, so set them inline
LOG_LEVEL=debug pm2 restart vitara-admin-api --update-env

# Enable PHI-DEBUG mode (trace + PHI data; use with extreme caution)
VITARA_DEBUG=true LOG_LEVEL=trace pm2 restart vitara-admin-api --update-env

# Reset to production defaults
LOG_LEVEL=info VITARA_DEBUG=false pm2 restart vitara-admin-api --update-env

PHI-DEBUG Mode

Setting VITARA_DEBUG=true enables [PHI-DEBUG] trace logging that may include Protected Health Information (patient names, phone numbers, demographics). Never enable in production unless actively debugging a critical issue, and disable immediately after.

Log Locations

| Source | Location |
|---|---|
| PM2 stdout | ~/.pm2/logs/vitara-admin-api-out.log |
| PM2 stderr | ~/.pm2/logs/vitara-admin-api-error.log |
| NGINX access | /var/log/nginx/access.log |
| NGINX error | /var/log/nginx/error.log |
| PostgreSQL | /var/log/postgresql/ |

Searching Logs

# Search for errors in PM2 logs
pm2 logs vitara-admin-api --lines 500 --nostream | grep -i "error\|fatal"

# Search for specific request patterns
pm2 logs vitara-admin-api --lines 500 --nostream | grep "tool-calls"

# Search NGINX for 5xx errors
sudo grep " 50[0-9] " /var/log/nginx/access.log | tail -20

# Search for circuit breaker events
pm2 logs vitara-admin-api --lines 500 --nostream | grep "circuit.breaker\|OPEN\|HALF-OPEN\|CLOSED"

11. Health Checks

Quick Health Check

# Via NGINX (tests full path)
curl -s https://api.vitaravox.ca/health | python3 -m json.tool

# Direct to API (bypasses NGINX)
curl -s http://localhost:3002/health | python3 -m json.tool

Expected healthy response:

{
  "status": "healthy",
  "timestamp": "2026-02-16T12:00:00.000Z",
  "services": {
    "database": { "status": "healthy", "latencyMs": 2 },
    "oscarBridge": { "status": "healthy", "latencyMs": 45 },
    "vapiApi": { "status": "healthy", "latencyMs": 120 }
  },
  "uptime": 86400
}

Comprehensive Health Check Script

Save as ~/check-vitara-health.sh and run with bash ~/check-vitara-health.sh:

#!/usr/bin/env bash
set -euo pipefail

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

pass() { echo -e "  ${GREEN}PASS${NC} $1"; }
fail() { echo -e "  ${RED}FAIL${NC} $1"; }
warn() { echo -e "  ${YELLOW}WARN${NC} $1"; }

echo "=== VitaraVox Health Check ==="
echo "  Timestamp: $(date -Iseconds)"
echo ""

# 1. PM2 process
echo "[PM2]"
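PM2_PID=$(pm2 pid vitara-admin-api 2>/dev/null | tr -d '[:space:]' || true)
# pm2 pid can print 0 for a stopped process, so treat 0 as not running
if [ -n "$PM2_PID" ] && [ "$PM2_PID" != "0" ]; then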
  RESTARTS=$(pm2 jlist 2>/dev/null | python3 -c "
import sys, json
data = json.load(sys.stdin)
for p in data:
    if p['name'] == 'vitara-admin-api':
        print(p.get('pm2_env', {}).get('restart_time', 'N/A'))
" 2>/dev/null || echo "N/A")
  pass "vitara-admin-api running (restarts: ${RESTARTS})"
  # Compare only when RESTARTS is a plain integer (it may be empty or N/A)
  if [[ "$RESTARTS" =~ ^[0-9]+$ ]] && [ "$RESTARTS" -gt 10 ]; then
    warn "High restart count ($RESTARTS) — check for crash loops"
  fi
else
  fail "vitara-admin-api NOT running"
fi
echo ""

# 2. PostgreSQL
echo "[PostgreSQL]"
if pg_isready -h localhost -p 5432 -q 2>/dev/null; then
  pass "PostgreSQL accepting connections on port 5432"
else
  fail "PostgreSQL not responding on port 5432"
fi

if psql -h localhost -U vitara -d vitara_platform -c "SELECT 1" > /dev/null 2>&1; then
  TABLE_COUNT=$(psql -h localhost -U vitara -d vitara_platform -t -c "
    SELECT count(*) FROM information_schema.tables
    WHERE table_schema = 'public';" 2>/dev/null | tr -d ' ')
  pass "Database vitara_platform accessible (${TABLE_COUNT} tables)"
else
  fail "Cannot connect to vitara_platform database"
fi
echo ""

# 3. API Health
echo "[API Server]"
# "|| true" keeps set -e from aborting the script when the API is down
HEALTH=$(curl -s --connect-timeout 5 http://localhost:3002/health 2>/dev/null || true)
if [ -n "$HEALTH" ]; then
  STATUS=$(echo "$HEALTH" | python3 -c "import sys,json; print(json.load(sys.stdin).get('status','unknown'))" 2>/dev/null || echo "parse_error")
  if [ "$STATUS" = "healthy" ]; then
    pass "API health: healthy"
  elif [ "$STATUS" = "degraded" ]; then
    warn "API health: degraded"
    echo "$HEALTH" | python3 -m json.tool 2>/dev/null
  else
    fail "API health: $STATUS"
  fi
else
  fail "API not responding on port 3002"
fi
echo ""

# 4. NGINX
echo "[NGINX]"
if sudo systemctl is-active nginx > /dev/null 2>&1; then
  pass "NGINX service active"
else
  fail "NGINX service not active"
fi

if sudo nginx -t 2>&1 | grep -q "successful"; then
  pass "NGINX config valid"
else
  fail "NGINX config invalid"
fi

# SSL check
# "|| true" keeps set -e / pipefail from aborting if the TLS handshake fails
CERT_EXPIRY=$(echo | openssl s_client -connect api.vitaravox.ca:443 -servername api.vitaravox.ca 2>/dev/null | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2 || true)
if [ -n "$CERT_EXPIRY" ]; then
  EXPIRY_EPOCH=$(date -d "$CERT_EXPIRY" +%s 2>/dev/null || echo 0)
  NOW_EPOCH=$(date +%s)
  DAYS_LEFT=$(( (EXPIRY_EPOCH - NOW_EPOCH) / 86400 ))
  if [ "$DAYS_LEFT" -gt 14 ]; then
    pass "SSL certificate valid ($DAYS_LEFT days remaining)"
  elif [ "$DAYS_LEFT" -gt 0 ]; then
    warn "SSL certificate expires in $DAYS_LEFT days — renew soon"
  else
    fail "SSL certificate EXPIRED"
  fi
else
  warn "Could not check SSL certificate"
fi
echo ""

# 5. Disk & Memory
echo "[System Resources]"
DISK_PCT=$(df / --output=pcent | tail -1 | tr -d ' %')
if [ "$DISK_PCT" -lt 80 ]; then
  pass "Disk usage: ${DISK_PCT}%"
elif [ "$DISK_PCT" -lt 90 ]; then
  warn "Disk usage: ${DISK_PCT}% — getting full"
else
  fail "Disk usage: ${DISK_PCT}% — critically full"
fi

MEM_PCT=$(free | awk '/Mem:/ {printf "%.0f", $3/$2*100}')
if [ "$MEM_PCT" -lt 80 ]; then
  pass "Memory usage: ${MEM_PCT}%"
elif [ "$MEM_PCT" -lt 90 ]; then
  warn "Memory usage: ${MEM_PCT}%"
else
  fail "Memory usage: ${MEM_PCT}% — critically high"
fi

echo ""
echo "=== Check Complete ==="

PM2 Monitoring

# Real-time dashboard
pm2 monit

# Process list with memory/CPU
pm2 status

# Detailed process info
pm2 show vitara-admin-api

# JSON process info (for scripting)
pm2 jlist

12. Debug Mode (VITARA_DEBUG)

Source: lib/debug-manager.ts:1-164

Debug mode enables verbose PHI logging in PM2 logs for diagnosing production issues. It auto-expires after 4 hours to prevent accidental exposure.

DEBUG MODE LIFECYCLE

  Activate (env or API)
┌────────────────────────┐
│ ACTIVE                 │
│  • Log level → trace   │
│  • PHI appears in logs │
│  • 4h timer starts     │
└────────┬───────────────┘
    ┌────┴────────┐
    │             │
 4h expiry    Manual disable
    │         (API call)
    ▼             │
┌────────────────────────┐
│ INACTIVE               │
│  • Log level → normal  │
│  • PHI stripped        │
│  • Timer cleared       │
└────────────────────────┘
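
The 4-hour timer in the lifecycle above comes down to simple epoch arithmetic. The sketch below is illustrative only and is not the logic in lib/debug-manager.ts; the function names `debug_seconds_left` and `debug_state` are hypothetical, and both take Unix epoch seconds as arguments.

```shell
# Illustrative only: mirrors the 4-hour auto-expiry rule described above.
DEBUG_TTL_SECONDS=$((4 * 60 * 60))

# debug_seconds_left ACTIVATED_EPOCH NOW_EPOCH
# Prints the seconds remaining before auto-expiry (0 once expired).
debug_seconds_left() {
  local activated="$1" now="$2"
  local left=$(( activated + DEBUG_TTL_SECONDS - now ))
  if [ "$left" -lt 0 ]; then left=0; fi
  echo "$left"
}

# debug_state ACTIVATED_EPOCH NOW_EPOCH
# Prints ACTIVE while the timer is still running, INACTIVE afterwards.
debug_state() {
  if [ "$(debug_seconds_left "$1" "$2")" -gt 0 ]; then
    echo "ACTIVE"
  else
    echo "INACTIVE"
  fi
}
```

For example, `debug_state "$(date -d '2026-03-09T10:00:00Z' +%s)" "$(date +%s)"` would report whether a session activated at that time is still inside its window (this assumes GNU `date`).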

Activation Methods

| Method | Command | Requires Restart? |
|---|---|---|
| Environment | Set VITARA_DEBUG=true in .env | Yes (pm2 restart) |
| Runtime API | POST /api/admin/debug {"enabled": true} | No |

Checking Status

# Via API
curl -s -H "Authorization: Bearer <jwt>" \
  http://localhost:3002/api/admin/debug | python3 -m json.tool

# Response:
# {
#   "active": true,
#   "activatedAt": "2026-03-09T10:00:00.000Z",
#   "expiresAt": "2026-03-09T14:00:00.000Z",
#   "source": "api"
# }

Disabling

# Via API (immediate, no restart)
curl -s -X POST -H "Authorization: Bearer <jwt>" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}' \
  http://localhost:3002/api/admin/debug

# Or wait for auto-expiry (4 hours)

Security

Debug mode does NOT disable any security controls (HMAC, CORS, rate limiting). It only enables verbose logging that includes PHI (patient names, DOB, phone numbers) in PM2 logs. Never enable unless actively diagnosing a critical issue.


13. SMS Troubleshooting

SMS Not Sending

Symptom: Appointment booked/rescheduled/cancelled but no SMS received.

Guard chain (source: sms.service.ts:55-100): All 5 checks must pass for an SMS to send.

SMS GUARD CHAIN

  fireSmsBehindWebhook() called
  1. TELNYX_API_KEY set?  ── NO ──► skip (no Telnyx configured)
         │ YES
  2. smsSenderNumber set?  ── NO ──► skip (clinic has no SMS number)
         │ YES
  3. smsConsent ≠ false?  ── NO ──► skip (patient declined)
         │ YES
  4. Patient phone valid?  ── NO ──► skip (no E.164 phone)
         │ YES
  5. smsEnabled ≠ false?  ── NO ──► skip (clinic disabled SMS)
         │ YES
  SEND via Telnyx API
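
When working through the chain by hand, it can help to see which guard would stop a given set of values. The function below is a rough shell model of the five checks, not the code in sms.service.ts; the name `sms_guard`, its argument order, and the E.164 regular expression are assumptions made for illustration.

```shell
# sms_guard TELNYX_API_KEY SMS_SENDER_NUMBER SMS_CONSENT PATIENT_PHONE SMS_ENABLED
# Prints "send" if all five guards pass, otherwise "skip: <reason>".
sms_guard() {
  local api_key="$1" sender="$2" consent="$3" phone="$4" enabled="$5"

  # 1. Telnyx API key must be set
  [ -n "$api_key" ] || { echo "skip: no Telnyx configured"; return 0; }
  # 2. Clinic must have a sender number
  [ -n "$sender" ] || { echo "skip: clinic has no SMS number"; return 0; }
  # 3. Only an explicit "false" blocks; an unset value passes
  [ "$consent" != "false" ] || { echo "skip: patient declined"; return 0; }
  # 4. E.164 shape: "+" followed by 2 to 15 digits, first digit non-zero
  printf '%s' "$phone" | grep -Eq '^\+[1-9][0-9]{1,14}$' \
    || { echo "skip: no E.164 phone"; return 0; }
  # 5. Only an explicit "false" disables SMS for the clinic
  [ "$enabled" != "false" ] || { echo "skip: clinic disabled SMS"; return 0; }

  echo "send"
}
```

Calling `sms_guard "$TELNYX_API_KEY" "+16045550100" true "6045550123" true` reports the phone guard, because the patient number lacks the leading `+1`.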

Diagnosis:

# Check env vars
pm2 env vitara-admin-api | grep TELNYX

# Check clinic SMS config
psql -U vitara -d vitara_platform -c "
  SELECT \"smsSenderNumber\", \"smsEnabled\", \"smsLanguage\"
  FROM \"ClinicConfig\"
  WHERE \"clinicId\" = '<clinic-id>';"

# Check recent SMS attempts in logs
pm2 logs vitara-admin-api --lines 100 --nostream | grep -i "sms\|telnyx"

Solutions:

| Guard Failed | Fix |
|---|---|
| No TELNYX_API_KEY | Add to .env, restart PM2 |
| No smsSenderNumber | Set via Admin UI → Clinic → SMS Config |
| smsConsent = false | Patient declined; cannot override |
| Invalid phone | Check patientPhone in OSCAR; must be 10-digit Canadian |
| smsEnabled = false | Enable via Admin UI or API: POST /api/clinic/config/sms |

Symptom: smsConsent is always true even when patient declined.

Cause: The voice agent must pass smsConsent: false in the create_appointment, update_appointment, or cancel_appointment tool call if the patient declined during the Patient-ID phase.

Diagnosis: Check Vapi call transcript for the consent disclosure response. If the patient said "no texts" but the tool call still has smsConsent: true, the prompt may need updating.

See: SMS Integration and Prompt Engineering — SMS Consent UX


14. Cache & Adapter Issues

EMR Adapter Cache

Source: EmrAdapterFactory.ts:43-130

The adapter factory caches EMR adapter instances per clinic with a 5-minute TTL. Stale cache can cause issues after credential changes.

Symptom: OSCAR calls fail with auth errors after credentials were updated.

Diagnosis:

pm2 logs vitara-admin-api --lines 50 --nostream | grep -i "adapter.*cache\|cache.*hit\|cache.*miss\|warmUp"

Solution:

# Restart PM2 to clear all adapter caches
pm2 restart vitara-admin-api

# The 5-min TTL means waiting also works, but restart is faster

Circuit Breaker Stuck Open

Symptom: All OSCAR calls return 503 even though OSCAR is reachable.

The circuit breaker opens when the failure rate within its monitoring window reaches 50%. Once open, it stays open for 30 seconds, then enters the half-open state.
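
The two numbers in this description (50% and 30 seconds) can be read as the following arithmetic. This is a sketch of the stated rule only, not the breaker implementation used by the adapters. The function names are hypothetical, and treating exactly 50% as tripping is an assumption, as is ignoring any minimum request volume.

```shell
FAILURE_THRESHOLD_PCT=50   # from the description above
OPEN_SECONDS=30            # from the description above

# should_trip FAILURES TOTAL
# Succeeds (exit 0) when the failure rate in the window reaches the threshold.
# Assumption: exactly 50% trips; an empty window never trips.
should_trip() {
  local failures="$1" total="$2"
  [ "$total" -gt 0 ] && \
    [ $(( failures * 100 )) -ge $(( FAILURE_THRESHOLD_PCT * total )) ]
}

# breaker_state OPENED_AT_EPOCH NOW_EPOCH
# Prints the state of a breaker that tripped at OPENED_AT_EPOCH.
breaker_state() {
  local opened_at="$1" now="$2"
  if [ $(( now - opened_at )) -lt "$OPEN_SECONDS" ]; then
    echo "OPEN"        # calls are rejected without reaching OSCAR
  else
    echo "HALF-OPEN"   # the next call acts as the test request
  fi
}
```

For example, `should_trip 5 10 && echo "would open"` prints the message, while `breaker_state 100 110` prints `OPEN` because only 10 of the 30 seconds have elapsed.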

Diagnosis:

# Check breaker state
pm2 logs vitara-admin-api --lines 200 --nostream | grep -i "circuit\|OPEN\|HALF-OPEN\|CLOSED"

# Verify OSCAR is actually reachable
curl -s --connect-timeout 5 "<OSCAR_URL>/ws/ScheduleService?wsdl" | head -5

Solution:

  • Wait 30 seconds for half-open → one test request → auto-close
  • If OSCAR is down: fix OSCAR first, breaker auto-recovers
  • If persistent: restart PM2 to reset breaker state

Split Circuit Breakers

SOAP and REST adapters have independent circuit breakers. A SOAP failure does not trip the REST breaker and vice versa. For preferRest clinics, the REST breaker must be healthy for most operations.


Getting Help

If issues persist after following this guide:

  1. Collect diagnostics:

    pm2 logs vitara-admin-api --lines 200 --nostream > /tmp/vitara-logs.txt
    pm2 status > /tmp/vitara-status.txt
    curl -s http://localhost:3002/health > /tmp/vitara-health.json
    

  2. Check documentation: vitdocs.vitaravox.ca

  3. Contact support: support@vitaravox.com

Include in your support request:

  • Error messages from PM2 logs
  • Output of pm2 status
  • Output of the health check script
  • Steps to reproduce the issue
  • Environment (production / development)