Troubleshooting Guide
This guide covers common issues you may encounter when installing, configuring, or running OpsKnight.
Table of Contents
- Installation Issues
- Database Connection Problems
- Authentication Failures
- Notification Delivery Issues
- Performance Troubleshooting
- Debug Logging
Installation Issues
npm install fails with permission errors
Symptoms:
- EACCES: permission denied errors
- EPERM: operation not permitted
Solutions:
- Don't use sudo with npm. Instead, fix npm permissions:
mkdir ~/.npm-global
npm config set prefix '~/.npm-global'
echo 'export PATH=~/.npm-global/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
- Use a Node version manager like nvm or fnm:
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
nvm install 20
nvm use 20
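To confirm that the permission fix took effect, it can help to check whether the npm prefix directory is writable by your user. The following is a minimal sketch; the function name check_npm_prefix is hypothetical, and a writable prefix is only a proxy for a working global install.

```shell
# check_npm_prefix DIR: report whether DIR exists and is writable
# by the current user. (Hypothetical helper, not part of OpsKnight.)
check_npm_prefix() {
  dir=$1
  if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "writable: $dir"
  else
    echo "not writable: $dir"
    return 1
  fi
}

# Usage (requires npm on PATH):
#   check_npm_prefix "$(npm config get prefix)"
```

If the check reports "not writable", global installs will likely keep failing with EACCES until the prefix points at a directory you own.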
Build fails with "out of memory"
Symptoms:
- JavaScript heap out of memory
- Build process killed
Solutions:
- Increase Node.js memory limit:
export NODE_OPTIONS="--max-old-space-size=4096"
npm run build
- Use the standalone build (recommended for production):
npm run build
# This creates .next/standalone with optimized bundle
Optional dependencies fail to install
Symptoms:
- Warnings about twilio, resend, or @aws-sdk/client-sns
- These are optional and won't prevent OpsKnight from running
Solution: Install only the providers you need:
# For Twilio SMS/WhatsApp
npm install twilio
# For Resend email
npm install resend
# For AWS SNS
npm install @aws-sdk/client-sns
# For SendGrid email
npm install @sendgrid/mail
Database Connection Problems
"Connection refused" or "Connection timed out"
Symptoms:
- ECONNREFUSED error
- Connection timed out after deployment
Solutions:
- Verify PostgreSQL is running:
# Check if PostgreSQL is running
pg_isready -h localhost -p 5432
# For Docker
docker ps | grep postgres
- Check your DATABASE_URL format:
postgresql://USER:PASSWORD@HOST:PORT/DATABASE?sslmode=require
- For cloud databases (Supabase, Neon, etc.):
  - Enable SSL: Add ?sslmode=require to the connection string
  - Check firewall rules allow your IP
  - Verify the database is in the same region as your app
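A quick structural check of DATABASE_URL can catch typos before you start debugging the network. The sketch below is an illustration only: check_database_url is a hypothetical helper, and it checks the overall shape (scheme, user@host/database, sslmode) rather than fully parsing the URL.

```shell
# check_database_url URL: rough shape check for a PostgreSQL URL.
# (Hypothetical helper; it does not contact the database.)
check_database_url() {
  url=$1
  case $url in
    postgresql://*|postgres://*) ;;
    *) echo "error: scheme must be postgresql:// or postgres://"; return 1 ;;
  esac
  rest=${url#*://}
  case $rest in
    *@*/*) ;;
    *) echo "error: expected USER:PASSWORD@HOST:PORT/DATABASE"; return 1 ;;
  esac
  case $url in
    *sslmode=require*) echo "ok: sslmode=require present" ;;
    *) echo "ok: no sslmode=require (add it for cloud databases)" ;;
  esac
}

# Usage:
#   check_database_url "$DATABASE_URL"
```

Passwords containing characters such as @ or / need to be percent-encoded in the URL, otherwise a shape check like this one (and the database driver) may misread them.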
"Prisma Client not initialized"
Symptoms:
- @prisma/client did not initialize yet
- PrismaClient is unable to be run in the browser
Solutions:
- Regenerate Prisma Client:
npx prisma generate
- For production builds:
# Include in your build script
npm run build
# This runs prisma generate automatically
Migration errors
Symptoms:
- Migration failed during deployment
- P3009: migrate found failed migrations
Solutions:
- Check migration status:
npx prisma migrate status
- Use the safe migration script:
npm run prisma:migrate:safe
- For failed migrations, use auto-recovery:
npm run prisma:auto-recover
- Manual recovery (last resort):
-- Connect to your database and mark failed migration as applied
UPDATE "_prisma_migrations"
SET finished_at = NOW(), applied_steps_count = 1
WHERE migration_name = 'YYYYMMDDHHMMSS_migration_name'
  AND finished_at IS NULL;
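Before running a manual UPDATE, it is worth listing which migrations are actually unfinished. The sketch below prints a read-only query against Prisma's _prisma_migrations table; the helper name failed_migrations_query is hypothetical, and the psql invocation assumes psql is installed and accepts your DATABASE_URL.

```shell
# failed_migrations_query: print a read-only query that lists
# migrations which started but never finished.
# (Hypothetical helper; it only prints SQL and changes nothing.)
failed_migrations_query() {
  cat <<'SQL'
SELECT migration_name, started_at, finished_at, rolled_back_at
FROM "_prisma_migrations"
WHERE finished_at IS NULL
ORDER BY started_at;
SQL
}

# Usage (requires psql and a reachable database):
#   psql "$DATABASE_URL" -c "$(failed_migrations_query)"
```

Note that psql may reject Prisma-specific URL parameters such as connection_limit, so you may need to strip them first. Prisma also ships `npx prisma migrate resolve --applied <name>` and `--rolled-back <name>`, which update the same table without hand-written SQL.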
Authentication Failures
"Invalid credentials" but credentials are correct
Symptoms:
- Login fails with valid email/password
- "Invalid credentials" error
Solutions:
- Check if the user exists:
# Use the OpsKnight CLI
npm run ops user:list
- Reset the password:
npm run ops user:reset-password --email [email protected]
- Verify the encryption key hasn't changed (see Encryption Key Issues)
Session expires immediately
Symptoms:
- Logged out after every page refresh
- "Session expired" error immediately after login
Solutions:
- Check NEXTAUTH_URL matches your actual URL:
# .env
NEXTAUTH_URL=https://your-actual-domain.com
- Verify cookies are being set:
  - Open browser DevTools → Application → Cookies
  - Look for next-auth.session-token cookie
  - Check if Secure flag matches your protocol (HTTPS vs HTTP)
- For reverse proxy setups, ensure headers are forwarded:
# nginx example
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
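A mismatch between NEXTAUTH_URL and the URL users actually visit is easy to miss by eye (http vs https, a stray subdomain). The following sketch compares the two after trimming one trailing slash; check_nextauth_url is a hypothetical helper, and the example domain is a placeholder.

```shell
# check_nextauth_url CONFIGURED ACTUAL: compare the configured
# NEXTAUTH_URL with the URL shown in the browser, ignoring a single
# trailing slash. (Hypothetical helper for illustration.)
check_nextauth_url() {
  configured=${1%/}
  actual=${2%/}
  if [ "$configured" = "$actual" ]; then
    echo "match"
  else
    echo "mismatch: configured=$configured actual=$actual"
    return 1
  fi
}

# Usage:
#   check_nextauth_url "$NEXTAUTH_URL" "https://your-actual-domain.com"
```

The comparison is deliberately strict about the scheme, since a session cookie marked Secure will not be sent over plain HTTP.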
OIDC/SSO login fails
Symptoms:
- "Invalid state" or "Invalid nonce" errors
- Redirect loop during SSO login
Solutions:
- Verify callback URL is configured correctly:
  - Callback URL should be: https://your-domain.com/api/auth/callback/oidc
  - This must exactly match what's configured in your IdP
- Check the OIDC configuration in Settings → Security → SSO
- For Azure AD: Ensure email claim is included in the token
- Enable debug logging (see Debug Logging)
Notification Delivery Issues
Emails not being sent
Symptoms:
- No emails received
- "Email provider not configured" warnings in logs
Solutions:
- Verify email provider is configured:
  - Go to Settings → System → Email Providers
  - Ensure a provider is enabled and has valid credentials
- Check the from email address:
  - For Resend/SendGrid: Domain must be verified
  - For SMTP: Check your mail server allows this sender
- Test email delivery:
  - Go to Settings → Notifications → Test Email
  - Check logs for delivery errors
SMS not being delivered
Symptoms:
- SMS notifications not received
- "Twilio package not installed" error
Solutions:
- Install the Twilio package:
npm install twilio
- Verify Twilio configuration:
  - Account SID and Auth Token in Settings → System → Notification Providers
  - From Number must be a valid Twilio phone number
  - For trial accounts: Target number must be verified
- Check phone number format:
  - Must be in E.164 format: +15551234567
  - Include country code
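The E.164 requirement can be checked mechanically: a plus sign, a non-zero leading digit, and at most 15 digits in total, with no spaces or punctuation. A minimal sketch follows; is_e164 is a hypothetical helper, and it validates the format only, not whether the number exists.

```shell
# is_e164 NUMBER: succeed if NUMBER looks like an E.164 phone number
# ("+" followed by 2 to 15 digits, the first of which is not 0).
# (Hypothetical helper; format check only.)
is_e164() {
  printf '%s\n' "$1" | grep -Eq '^\+[1-9][0-9]{1,14}$'
}

# Usage:
#   is_e164 "+15551234567" && echo "format ok"
```

Numbers stored with spaces, dashes, or a leading 00 instead of + will fail this check and should be normalized before being saved.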
Push notifications not working
Symptoms:
- "Push notifications not enabled" error
- Browser doesn't prompt for notification permission
Solutions:
- Check VAPID keys are configured:
  - Settings → System → Notification Providers → Web Push
  - Generate new keys if needed:
npx web-push generate-vapid-keys
- Verify HTTPS:
  - Push notifications only work over HTTPS
  - Exception: localhost for development
- Check browser permissions:
  - Click the lock icon in the URL bar
  - Ensure notifications are "Allowed"
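The HTTPS rule above can be expressed as a small check on the site's origin. This is a sketch under the assumptions stated in the list (HTTPS, or localhost for development); can_use_web_push is a hypothetical helper and does not inspect VAPID keys or browser permissions.

```shell
# can_use_web_push ORIGIN: succeed if ORIGIN is served over HTTPS,
# or is plain-HTTP localhost (the development exception).
# (Hypothetical helper for illustration.)
can_use_web_push() {
  case $1 in
    https://*) return 0 ;;
    http://localhost|http://localhost:*|http://localhost/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   can_use_web_push "http://localhost:3000" && echo "push can work here"
```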
Performance Troubleshooting
Slow page loads
Symptoms:
- Pages take several seconds to load
- Timeout errors
Solutions:
- Check database query performance:
# Enable query logging
LOG_LEVEL=debug npm start
- Optimize database:
-- List existing indexes (to spot missing ones)
SELECT schemaname, tablename, indexname
FROM pg_indexes
WHERE schemaname = 'public';
-- Refresh planner statistics
ANALYZE;
- Increase connection pool:
DATABASE_URL="...?connection_limit=20"
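When the connection string already carries a query string such as sslmode=require, the pool setting has to be appended with & rather than ?. The sketch below handles both cases; with_connection_limit is a hypothetical helper, and it leaves the URL untouched if connection_limit is already present.

```shell
# with_connection_limit URL LIMIT: print URL with connection_limit=LIMIT
# added, choosing "?" or "&" depending on whether a query string exists.
# (Hypothetical helper for illustration.)
with_connection_limit() {
  url=$1
  limit=$2
  case $url in
    *connection_limit=*) printf '%s\n' "$url" ;;   # already set; unchanged
    *\?*) printf '%s\n' "$url&connection_limit=$limit" ;;
    *)    printf '%s\n' "$url?connection_limit=$limit" ;;
  esac
}

# Usage:
#   DATABASE_URL=$(with_connection_limit "$DATABASE_URL" 20)
```

A larger pool only helps if PostgreSQL's own connection limit can accommodate it across all running instances.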
High memory usage
Symptoms:
- Server runs out of memory
- OOM killer terminates process
Solutions:
- For standalone builds, memory usage should be lower:
node .next/standalone/server.js
- Set appropriate memory limits:
# Docker
docker run --memory=512m opsknight
# Node.js
NODE_OPTIONS="--max-old-space-size=512" node server.js
- Check for memory leaks using Node.js diagnostics:
node --inspect .next/standalone/server.js
# Connect Chrome DevTools to take heap snapshots
Cron jobs not running
Symptoms:
- Escalations not triggered on schedule
- Scheduled jobs stuck as "pending"
Solutions:
- Check cron scheduler status:
  - Go to Settings → System → Background Jobs
  - Verify "Cron Scheduler" shows as "Running"
- Enable internal cron:
# Default is enabled
ENABLE_INTERNAL_CRON=true
- Check for lock issues:
-- View scheduler state
SELECT * FROM "CronSchedulerState";
-- Clear stale lock if needed
UPDATE "CronSchedulerState"
SET "lockedBy" = NULL, "lockedAt" = NULL
WHERE id = 'singleton';
Debug Logging
Enable verbose logging
# Set log level
LOG_LEVEL=debug npm start
# For JSON output (better for log aggregation)
LOG_FORMAT=json npm start
Log levels
| Level | Description |
|---|---|
| error | Only errors |
| warn | Errors and warnings |
| info | Normal operation logs (default) |
| debug | Detailed debugging information |
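The table reads as a cumulative scale: each level includes everything above it. The sketch below models that ordering to show which messages a given LOG_LEVEL would let through. Both function names are hypothetical, and the fallback for unknown values is an assumption, not documented OpsKnight behavior.

```shell
# level_rank LEVEL: map a level name to its position in the table.
level_rank() {
  case $1 in
    error) echo 0 ;;
    warn)  echo 1 ;;
    info)  echo 2 ;;
    debug) echo 3 ;;
    *)     echo 2 ;;   # assumption: unknown values behave like info
  esac
}

# should_log CONFIGURED MESSAGE_LEVEL: succeed if a message at
# MESSAGE_LEVEL would be emitted when LOG_LEVEL is CONFIGURED.
should_log() {
  [ "$(level_rank "$2")" -le "$(level_rank "$1")" ]
}

# Usage:
#   should_log info debug || echo "debug messages are hidden at info"
```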
Common log locations
- Application logs: stdout/stderr (or configured log destination)
- Database logs: PostgreSQL log files
- Cron scheduler: Look for [Cron] prefix in logs
- Notifications: Look for component: 'sms', component: 'email', etc.
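Since application logs go to stdout/stderr, the markers above can be isolated with a plain fixed-string filter. A minimal sketch follows; filter_logs is a hypothetical helper, and the exact text of a marker may differ when LOG_FORMAT=json is set.

```shell
# filter_logs MARKER: keep only log lines from stdin that contain
# MARKER, matched as a literal string rather than a regex.
# (Hypothetical helper for illustration.)
filter_logs() {
  grep -F -- "$1"
}

# Usage:
#   LOG_LEVEL=debug npm start 2>&1 | filter_logs '[Cron]'
```

Matching literally matters here because square brackets would otherwise be interpreted as a regular-expression character class.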
Encryption Key Issues
If you see CRITICAL: Encryption Key failed canary check:
- The encryption key has changed or is invalid
- Data encrypted with the old key cannot be decrypted
Recovery steps:
- Restore the original ENCRYPTION_KEY from backup
- If the key is lost, you'll need to re-enter all encrypted credentials (API keys, etc.)
Getting Help
If you're still stuck:
- Search existing issues: GitHub Issues
- Check discussions: GitHub Discussions
- Open a new issue with:
- OpsKnight version
- Node.js version
- PostgreSQL version
- Relevant log output (redact sensitive data!)
- Steps to reproduce
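Before pasting log output into an issue, database passwords embedded in connection strings should be masked. The sketch below covers only that one case; redact_logs is a hypothetical helper, and API keys, tokens, and other secrets still need to be removed by hand.

```shell
# redact_logs: read log text on stdin and replace the password part of
# any postgres:// or postgresql:// URL with ***.
# (Hypothetical helper; it does not catch other kinds of secrets.)
redact_logs() {
  sed -E 's#(postgres(ql)?://[^:/@[:space:]]+:)[^@[:space:]]+@#\1***@#g'
}

# Usage:
#   npm start 2>&1 | redact_logs > opsknight.log
```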
Last updated for v1