Troubleshooting Preview Environments

This guide helps you diagnose and resolve common issues with Preview Environments.

Quick Diagnostics

Before diving into specific issues, run these checks:

1. Check GitHub Actions Status

# Navigate to your PR on GitHub
→ Click "Actions" tab
→ Find "Preview Environment" workflow
→ Check for errors or failures

2. Check Coder Workspace Status

# Visit Coder Dashboard
→ Go to https://coder.internal.gotofu.com/workspaces
→ Find workspace: coder-preview-{PR-number}
→ Check status: should be "Running" and "Healthy"

3. Verify Preview URL

# From PR comments, find the preview URL
→ It should match: https://app--dev--{workspace}--{owner}.coder.internal.gotofu.com/
→ Click the link to test access
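To run the same check from a terminal, the URL pattern above can be filled in by a small bash sketch. It assumes the workspace is named coder-preview-{PR-number} and that {owner} is your Coder username; preview_url and check_preview are hypothetical helper names, not existing tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "Verify Preview URL" check (not existing tooling).

# Print the expected preview URL for a PR number and Coder owner, following
# the pattern app--dev--{workspace}--{owner} with workspace coder-preview-{PR}.
preview_url() {
  local pr="$1" owner="$2"
  printf 'https://app--dev--coder-preview-%s--%s.coder.internal.gotofu.com/\n' "$pr" "$owner"
}

# Print the HTTP status code the preview URL returns; "000" means no HTTP
# response arrived at all (DNS, VPN or connection problem).
check_preview() {
  local url
  url=$(preview_url "$1" "$2")
  curl -s -o /dev/null --max-time 10 -w '%{http_code}\n' "$url"
}
```

For example, check_preview 123 john prints the status code for PR 123's preview owned by john; a 000 points to the access issues covered under "Cannot access preview URL".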

Common Issues

Environment Creation Problems

Issue: “Workflow fails at workspace creation step”

Symptoms:

  • GitHub Actions shows error in “Create workspace” step
  • No workspace appears in Coder dashboard

Common Causes:

  1. Coder service issue

    Check:
    - Is Coder dashboard accessible?
    - Are other workspaces working?
    
    Solution:
    - Wait a few minutes and retry
    - Contact DevOps if Coder is down
    
  2. AWS capacity issues

    Error: "InsufficientInstanceCapacity"
    
    Solution:
    - GitHub Actions will automatically retry
    - If the error persists, retry at a different time of day
    - Contact DevOps to check AWS status
    
  3. Invalid parameters

    Error: "Parameter validation failed"
    
    Check:
    - PR URL is correct
    - Tokens are valid (not expired)
    
    Solution:
    - Verify PR is against 'main' branch
    - Check if workflow parameters are correct
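Before retrying after a suspected Coder outage (cause 1), you can confirm whether the Coder API answers at all. A minimal sketch, assuming Coder's /api/v2/buildinfo endpoint is served from the dashboard host; coder_is_up is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the Coder API answers with an HTTP 2xx status.
# Assumes the /api/v2/buildinfo endpoint is served on the dashboard host.
coder_is_up() {
  local base="${1:-https://coder.internal.gotofu.com}" code
  # -w prints only the status code; curl prints 000 if no response arrives.
  # "|| true" keeps a failed curl from aborting scripts that use set -e.
  code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$base/api/v2/buildinfo" || true)
  case "$code" in
    2??) return 0 ;;
    *) echo "Coder API not healthy (HTTP $code)" >&2; return 1 ;;
  esac
}
```

If coder_is_up succeeds but workspace creation still fails, the AWS capacity or parameter causes are the more likely ones.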
    

Issue: “Workspace created but build fails”

Symptoms:

  • Workspace appears in Coder
  • Build status shows “Failed”
  • Health check never passes

Common Causes:

  1. Compilation errors

    Check build logs for:
    - Rust compilation errors
    - TypeScript type errors
    - Python dependency issues
    
    Solution:
    - Fix errors in your code
    - Push new commit to trigger rebuild
    - Verify changes work locally first
    
  2. Database migration issues

    Error: "Migration failed" or "Could not apply migrations"
    
    Solution:
    - Check migration files for syntax errors
    - Ensure migrations are sequential
    - Test migrations locally: mise run db-migrate
    
  3. Docker image pull failures

    Error: "Failed to pull image" or "Image not found"
    
    Solution:
    - Usually a temporary network issue
    - Wait and retry
    - Check if image exists in registry
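When the logs are long, a rough first pass is to scan them for the error signatures listed above. The patterns below (Rust error[E0425]-style codes, TypeScript error TS2322-style codes, common Python import and pip errors, and the migration and image-pull messages quoted above) are heuristics chosen for this sketch, and classify_build_log is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a build log on stdin and print which failure
# categories it appears to match. Patterns are heuristics, not exhaustive.
classify_build_log() {
  local log
  log=$(cat)
  grep -qE 'error\[E[0-9]+\]' <<<"$log" && echo "rust-compile"
  grep -qE 'error TS[0-9]+' <<<"$log" && echo "typescript"
  grep -qE 'ModuleNotFoundError|No matching distribution found' <<<"$log" && echo "python-deps"
  grep -qiE 'migration failed|could not apply migrations' <<<"$log" && echo "migration"
  grep -qiE 'failed to pull image|image not found' <<<"$log" && echo "image-pull"
  return 0
}
```

For example, pipe a GitHub Actions log into it with gh run view <run-id> --log | classify_build_log, or pipe docker compose -f docker-compose.preview.yml logs --no-color from the workspace terminal.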
    

Issue: “Build takes too long (>60 minutes)”

Symptoms:

  • Workspace building for over an hour
  • No error messages but not completing

Diagnosis:

# Access workspace terminal via Coder
→ Open workspace in Coder dashboard
→ Click "Terminal"
→ Check processes: ps aux | grep docker
→ Check logs: cd ~/bonsai && docker compose -f docker-compose.preview.yml logs -f

Solutions:

  1. Check build progress

    # In workspace terminal
    cd ~/bonsai
    docker compose -f docker-compose.preview.yml ps
    docker compose -f docker-compose.preview.yml logs --tail=100
    
  2. Restart stuck services

    # Restart all services
    docker compose -f docker-compose.preview.yml restart
    
    # Or restart specific service
    docker compose -f docker-compose.preview.yml restart bonsapi
    
  3. Nuclear option: Rebuild workspace

    # Remove preview label from PR
    → Wait 1 minute
    → Re-add preview label
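Before taking the nuclear option, it can help to find out which service is actually stuck. One heuristic, sketched below under the assumption that the workspace runs Docker Compose v2, is to list running services that have written no log lines recently; stalled_services is a hypothetical helper, and a quiet service is not necessarily broken.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list running services that have written no log lines
# within a time window (default 10m). Quiet does not always mean stuck;
# treat the output as a hint of where to look.
stalled_services() {
  local f="${1:-docker-compose.preview.yml}" window="${2:-10m}" svc lines
  for svc in $(docker compose -f "$f" ps --services --status running); do
    # Count log lines emitted by this service since the window started.
    lines=$(docker compose -f "$f" logs --since "$window" --no-log-prefix "$svc" 2>/dev/null | wc -l)
    if (( lines == 0 )); then
      echo "$svc"
    fi
  done
}
```

Run it from ~/bonsai, for example stalled_services docker-compose.preview.yml 15m, then inspect each listed service with logs --tail=100.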
    

Access Issues

Issue: “Cannot access preview URL”

Symptoms:

  • Preview URL returns 404 or connection timeout
  • “This site can’t be reached”

Diagnosis:

  1. Verify workspace is running

    Coder Dashboard → Find workspace
    Status should be: "Running" + "Healthy"
    
    If not:
    - Status "Stopped": Start workspace
    - Status "Unhealthy": Check build logs
    
  2. Check URL format

    Correct format:
    https://app--dev--coder-preview-123--john.coder.internal.gotofu.com/
    
    Common mistakes:
    - Missing 'https://'
    - Extra spaces
    - Wrong subdomain
    
  3. Verify network connectivity

    # Test if you can reach Coder
    ping coder.internal.gotofu.com
    
    # If fails: Check VPN connection
    # internal.gotofu.com requires VPN
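ping only tells you whether ICMP gets through. To tell a DNS failure (typical when the VPN is down) apart from a refused or slow connection, you can look at curl's exit code instead. A sketch; explain_curl_exit and probe_host are hypothetical names, and the meanings follow curl's documented exit codes 6, 7 and 28.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: probe a host over HTTPS and translate curl's exit code
# (documented codes 6, 7 and 28) into a likely cause.
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable: got an HTTP response" ;;
    6)  echo "could not resolve host: internal names usually need the VPN" ;;
    7)  echo "could not connect: host resolved but the connection was refused or blocked" ;;
    28) echo "timed out: host resolved but did not answer in time" ;;
    *)  echo "curl exited with code $1 (see EXIT CODES in man curl)" ;;
  esac
}

probe_host() {
  local host="${1:-coder.internal.gotofu.com}" rc
  # Capture curl's exit status without tripping set -e.
  if curl -s -o /dev/null --connect-timeout 5 "https://$host/"; then rc=0; else rc=$?; fi
  explain_curl_exit "$rc"
  return "$rc"
}
```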
    

Solutions:

  1. Start workspace

    Coder Dashboard → Workspace → Click "Start"
    Wait 2-3 minutes for services to start
    
  2. Restart Coder agent

    # In workspace terminal (this disconnects the web terminal session;
    # reconnect once the agent is back)
    sudo systemctl restart coder
    
  3. Verify nginx is running

    # In workspace terminal
    cd ~/bonsai
    docker compose -f docker-compose.preview.yml ps nginx-proxy
    
    # Should show "Up"
    # If not:
    docker compose -f docker-compose.preview.yml restart nginx-proxy
    

Issue: “502 Bad Gateway” error

Symptoms:

  • Can reach URL but see “502 Bad Gateway”
  • Intermittent access

Common Causes:

  1. Service not ready

    Services are still starting up
    
    Solution:
    - Wait 5-10 minutes
    - Check service status
    
  2. Backend service crashed

    # Check service status
    docker compose -f docker-compose.preview.yml ps
    
    # Check logs for crashes
    docker compose -f docker-compose.preview.yml logs bonsapi --tail=100
    
  3. Nginx configuration issue

    # Test nginx config
    docker exec bonsai-nginx-proxy nginx -t
    
    # Restart nginx if needed
    docker compose -f docker-compose.preview.yml restart nginx-proxy
    

Solutions:

  1. Restart failing service

    # Identify failed service
    docker compose -f docker-compose.preview.yml ps
    
    # Restart it
    docker compose -f docker-compose.preview.yml restart <service-name>
    
  2. Check resource usage

    # In workspace terminal
    docker stats
    
    # If memory/CPU maxed out:
    # Restart heavy services or rebuild
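To identify the failed service without reading the whole ps table, you can compare the services defined in the compose file with those currently running. A sketch assuming Docker Compose v2 (for the --status and --services options of ps); missing_services and report_stopped_services are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: list services defined in the compose file that are
# not currently in the "running" state.

# Pure part: print each name in $1 (newline-separated) that is absent from $2.
missing_services() {
  local all="$1" running="$2" svc
  while IFS= read -r svc; do
    [ -n "$svc" ] || continue
    if ! grep -qxF -- "$svc" <<<"$running"; then
      printf '%s\n' "$svc"
    fi
  done <<<"$all"
}

# Wrapper that asks Docker Compose for both lists.
report_stopped_services() {
  local f="${1:-docker-compose.preview.yml}"
  missing_services \
    "$(docker compose -f "$f" config --services)" \
    "$(docker compose -f "$f" ps --services --status running)"
}
```

One-shot services that exit once they finish (for example a migration job, if the stack has one) are listed too; that is expected and does not mean they failed.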
    

Issue: “Authentication loop” or “Cannot sign in”

Symptoms:

  • Redirected to login repeatedly
  • “Authentication failed” error
  • Stuck on auth page

Common Causes:

  1. Cookie issues

    Solution:
    - Clear browser cookies for the domain
    - Try incognito/private window
    - Try different browser
    
  2. Clerk configuration issue

    Check:
    - Environment variables set correctly
    - CLERK_SECRET_KEY present
    - Domain configuration correct
    
    Solution:
    - Verify the .env file in the workspace
    - Restart the webapp service
    
  3. Session storage issues

    # Check Redis is running
    docker compose -f docker-compose.preview.yml ps redis
    
    # If not healthy:
    docker compose -f docker-compose.preview.yml restart redis
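For the Clerk check in cause 2, a quick way to confirm that required variables are present and non-empty is to grep the env file. A sketch assuming a simple KEY=value file at ~/bonsai/.env (the path is an assumption); check_env_vars is a hypothetical helper, and it does not catch a value set to an empty quoted string such as KEY="".

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given variable names are missing
# or empty in an env file (simple KEY=value lines; no multi-line values).
check_env_vars() {
  local file="$1"; shift
  local name missing=0
  for name in "$@"; do
    # Match "NAME=" followed by at least one character, optionally after "export ".
    if ! grep -qE "^(export[[:space:]]+)?${name}=.+" "$file"; then
      echo "missing or empty: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_env_vars ~/bonsai/.env CLERK_SECRET_KEY NEXT_PUBLIC_BONSAPI_HOST. It exits non-zero if any listed variable is missing.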
    

Update/Sync Issues

Issue: “Environment not updating after push”

Symptoms:

  • Pushed new commits but changes not reflected
  • Old code still running

Diagnosis:

  1. Check if workflow triggered

    GitHub → Actions → Look for new "Preview Environment" run
    It should trigger on every push to the PR branch
    
    If not triggering:
    - Verify PR has 'preview' label
    - Check workflow file for conditions
    
  2. Check rebuild status

    Coder Dashboard → Workspace → Check "Latest Build"
    Should show recent build time
    

Solutions:

  1. Manual restart

    Coder Dashboard → Workspace → Click "Restart"
    This forces a rebuild with the latest code
    
  2. Check build logs

    Coder Dashboard → Workspace → "Builds" → Latest
    Look for errors in build process
    
  3. Force rebuild via GitHub

    # Remove and re-add label
    PR → Remove 'preview' label
    Wait 1 minute
    PR → Add 'preview' label back
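To confirm the workspace really is running old code, compare the commit checked out in the workspace with the PR's head commit. A sketch assuming the checkout at ~/bonsai reflects what is deployed and that the GitHub CLI (gh) is installed and authenticated; check_deployed_commit is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the workspace checkout with the PR head commit.
# Assumes ~/bonsai reflects the running code and gh is authenticated.
check_deployed_commit() {
  local pr="$1" repo_dir="${2:-$HOME/bonsai}" deployed head
  deployed=$(git -C "$repo_dir" rev-parse HEAD) || return 2
  head=$(gh pr view "$pr" --repo tofu2-limited/bonsai --json headRefOid --jq .headRefOid) || return 2
  if [ "$deployed" = "$head" ]; then
    echo "up to date: $deployed"
  else
    echo "stale: workspace has $deployed, PR head is $head"
    return 1
  fi
}
```

check_deployed_commit 123 exits with 1 and prints both commit SHAs when the workspace lags behind the PR.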
    

Issue: “Rebuild fails after working initially”

Symptoms:

  • Environment worked before
  • After update, build fails
  • Previously green builds now red

Common Causes:

  1. New code has errors

    Solution:
    - Check build logs for compile errors
    - Test locally: mise run ci
    - Fix errors and push again
    
  2. New dependencies

    If you added new dependencies:
    - Ensure they're in package.json/Cargo.toml
    - Check for platform compatibility
    - Verify versions are correct
    
  3. Database migration issues

    Error: "Migration conflict" or "Duplicate migration"
    
    Solution:
    - Check migration file names
    - Ensure sequential numbering
    - Verify no conflicts with main branch
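Two files with the same version prefix, often the result of merging main into a branch, are a common source of "Duplicate migration" errors. A sketch, assuming migration files are named <digits>_<description>.sql and that you pass the migrations directory yourself, since its path is not given here; duplicate_migration_versions is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print version prefixes shared by more than one file in
# a migrations directory. Assumes names like <digits>_<description>.sql.
duplicate_migration_versions() {
  local dir="$1"
  # Keep only the leading digits before the first "_", then report repeats.
  ls -1 "$dir" | sed -n 's/^\([0-9][0-9]*\)_.*/\1/p' | sort | uniq -d
}
```

If the Atlas CLI is installed, atlas migrate validate performs a more thorough integrity check of the migration directory.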
    

Performance Issues

Issue: “Environment is very slow”

Symptoms:

  • Pages load slowly
  • API responses delayed
  • Timeouts

Common Causes:

  1. Resource exhaustion

    # Check resource usage
    docker stats
    
    High CPU or memory usage can indicate:
    - A heavy background job running
    - A memory leak
    - Inefficient code
    
    Solution:
    - Check background job queues
    - Review code for performance issues
    - Restart services to free memory
    
  2. Database performance

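    # Check database size
    docker compose -f docker-compose.preview.yml exec database psql -U postgres -c "SELECT pg_size_pretty(pg_database_size('bonsai'));"
    
    # List active queries, longest-running first
    docker compose -f docker-compose.preview.yml exec database psql -U postgres -c "SELECT pid, now() - query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC;"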
    
  3. Too many concurrent processes

    # Check running containers
    docker ps
    
    # Stop unnecessary services
    docker compose -f docker-compose.preview.yml stop <service-name>
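docker stats streams continuously; for a one-off ranking of the heaviest containers you can take a single snapshot and sort it. A sketch; top_memory and sort_by_mem are hypothetical helpers, while .Name, .MemPerc and .CPUPerc are standard docker stats format fields.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: rank containers by memory usage from one docker stats snapshot.

# Sort tab-separated "name<TAB>mem%<TAB>cpu%" lines by the memory column, highest first.
sort_by_mem() {
  LC_ALL=C sort -t $'\t' -k2,2 -rn
}

# Print the top N containers (default 5) by memory percentage.
top_memory() {
  local n="${1:-5}"
  docker stats --no-stream --format '{{.Name}}\t{{.MemPerc}}\t{{.CPUPerc}}' | sort_by_mem | head -n "$n"
}
```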
    

Solutions:

  1. Restart services

    docker compose -f docker-compose.preview.yml restart
    
  2. Clear cache

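    # Clear Redis cache (FLUSHALL deletes every key, including any
    # sessions stored in Redis, so you will need to sign in again)
    docker exec -it bonsai-redis redis-cli FLUSHALL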
    
  3. Optimize queries

    - Review database query performance
    - Add indexes if needed
    - Optimize N+1 queries
    

Issue: “Out of disk space”

Symptoms:

  • “No space left on device” errors
  • Cannot write files
  • Database errors

Diagnosis:

# Check disk usage
df -h

# Check Docker disk usage
docker system df

# Find large directories
du -sh ~/bonsai/* | sort -h
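To flag only the filesystems that are nearly full, the POSIX df -P output can be filtered on its capacity column. A sketch; full_filesystems is a hypothetical helper and its 90% default threshold is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read POSIX `df -P` output on stdin and print the mount
# point and usage of filesystems at or above a threshold percentage.
full_filesystems() {
  local threshold="${1:-90}"
  # Column 5 is "Capacity" (e.g. 87%), column 6 is "Mounted on"; skip the header.
  awk -v t="$threshold" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= t) print $6, $5 }'
}
```

For example: df -P | full_filesystems 85.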

Solutions:

  1. Clean Docker cache

    # Remove unused images
    docker image prune -a
    
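    # Remove unused volumes (their data is deleted permanently; keep the
    # database container running so its data volume stays in use)
    docker volume prune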
    
    # Remove build cache
    docker builder prune
    
  2. Clean build artifacts

    cd ~/bonsai
    
    # Clean Rust builds
    cargo clean
    
    # Clean Node modules (will need reinstall)
    rm -rf node_modules apps/webapp/node_modules
    
  3. Request larger disk

    Contact DevOps to:
    - Increase EBS volume size
    - Or provision new workspace with larger disk
    

Integration Issues

For integration-specific issues, see the Integration Setup guide.

Issue: “OAuth callback fails”

Quick Fixes:

  • Verify the redirect URI is configured correctly
  • Check that the URL matches exactly, including the protocol
  • Ask the developer who owns the OAuth app to verify its settings

Issue: “Integration disconnects frequently”

Diagnosis:

  • Check token expiration times
  • Verify webhook URLs if applicable
  • Review integration logs

Solutions:

  • Reconnect integration
  • Check for API rate limits
  • Verify credentials haven’t expired

Service-Specific Issues

BonsAPI (Backend)

Check if running:

docker compose -f docker-compose.preview.yml ps bonsapi

View logs:

docker compose -f docker-compose.preview.yml logs -f bonsapi

Common errors:

  • Database connection failed → Check DATABASE_URL
  • Redis connection failed → Restart Redis
  • RabbitMQ connection failed → Check RabbitMQ health

Webapp (Frontend)

Check if running:

docker compose -f docker-compose.preview.yml ps webapp

View logs:

docker compose -f docker-compose.preview.yml logs -f webapp

Common errors:

  • API connection failed → Check NEXT_PUBLIC_BONSAPI_HOST
  • Build errors → Check TypeScript compilation
  • Module not found → Reinstall dependencies

Database (PostgreSQL)

Check if running:

docker compose -f docker-compose.preview.yml ps database

Access database:

docker compose -f docker-compose.preview.yml exec database psql -U postgres -d bonsai

Common issues:

  • Connection refused → Database not started
  • Authentication failed → Check credentials
  • Too many connections → Restart services
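To tell "not started" apart from "still starting", PostgreSQL's pg_isready can be run inside the database service. A sketch, assuming Docker Compose v2 and the postgres user and bonsai database used elsewhere in this guide; db_ready and explain_pg_isready are hypothetical helpers, and the exit-status meanings (0 to 3) follow the pg_isready documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check whether PostgreSQL in the "database" service
# accepts connections and explain pg_isready's exit status.
explain_pg_isready() {
  case "$1" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections (usually still starting up)" ;;
    2) echo "no response: server not running or unreachable" ;;
    3) echo "no attempt made: check the connection parameters" ;;
    *) echo "unexpected exit status $1" ;;
  esac
}

db_ready() {
  local f="${1:-docker-compose.preview.yml}" rc
  # Make sure the container is up first; otherwise exec would fail with
  # Docker's own error rather than a pg_isready status.
  if ! docker compose -f "$f" ps --services --status running | grep -qx database; then
    echo "database service is not running"
    return 2
  fi
  if docker compose -f "$f" exec -T database pg_isready -U postgres -d bonsai >/dev/null; then
    rc=0
  else
    rc=$?
  fi
  explain_pg_isready "$rc"
  return "$rc"
}
```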

Background Workers

Check status:

# View all workers
docker compose -f docker-compose.preview.yml ps | grep bonsai-

# Check specific worker
docker compose -f docker-compose.preview.yml logs bonsai-invoice

Common issues:

  • Not processing jobs → Check RabbitMQ queues
  • Crashing repeatedly → Review error logs
  • High CPU usage → Check for infinite loops

Advanced Debugging

Accessing Workspace Terminal

# Via Coder web terminal
Coder Dashboard → Workspace → Terminal

# Or via SSH (run coder config-ssh once on your machine first)
ssh coder.coder-preview-{PR-number}

Useful Commands

# See all services
docker compose -f docker-compose.preview.yml ps

# Follow logs for all services
docker compose -f docker-compose.preview.yml logs -f

# Check specific service logs
docker compose -f docker-compose.preview.yml logs -f <service-name>

# Restart specific service
docker compose -f docker-compose.preview.yml restart <service-name>

# Check resource usage
docker stats

# Check disk space
df -h

# Check memory
free -h

# Check running processes
ps aux | grep docker

Database Debugging

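# Access database shell
docker compose -f docker-compose.preview.yml exec database psql -U postgres -d bonsai

-- Then, inside psql:

-- List tables
\dt

-- Check recent migrations
-- (if the table is not found, try atlas_schema_revisions.atlas_schema_revisions)
SELECT * FROM atlas_schema_revisions ORDER BY executed_at DESC LIMIT 5;

-- Check database size
SELECT pg_size_pretty(pg_database_size('bonsai'));

-- Find long-running queries (longest first)
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;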

RabbitMQ Debugging

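# Access RabbitMQ management UI
# Via Coder workspace apps, or forward the port to your machine:
coder port-forward coder-preview-{PR-number} --tcp 15672:15672
# then open http://localhost:15672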

# Check queues via CLI
docker exec bonsai-rabbitmq rabbitmqctl list_queues

# Check connections
docker exec bonsai-rabbitmq rabbitmqctl list_connections
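When a background worker is not processing jobs, a queue that has messages waiting but no consumers usually means the worker for that queue is down. A sketch that reads the output of rabbitmqctl list_queues name messages consumers; queues_without_consumers is a hypothetical helper, and it skips rows whose last two fields are not integers, so informational lines and table headers are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from `rabbitmqctl list_queues name messages consumers`
# output on stdin, print "queue messages" for queues with a backlog but no consumers.
queues_without_consumers() {
  # Only rows whose last two fields are integers are data rows.
  awk 'NF >= 3 && $(NF-1) ~ /^[0-9]+$/ && $NF ~ /^[0-9]+$/ && $(NF-1) > 0 && $NF == 0 { print $1, $(NF-1) }'
}
```

For example: docker exec bonsai-rabbitmq rabbitmqctl list_queues name messages consumers | queues_without_consumers. Then check the logs of the worker that should consume each listed queue.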

Getting Help

Before Asking for Help

Collect this information:

  1. Preview URL

    Example: https://app--dev--coder-preview-123--john.coder.internal.gotofu.com/
    
  2. PR Number and Link

    Example: #123 - https://github.com/tofu2-limited/bonsai/pull/123
    
  3. Error Message

    Copy exact error message from:
    - Browser console (F12)
    - GitHub Actions logs
    - Coder build logs
    - Service logs
    
  4. Steps to Reproduce

    1. Go to...
    2. Click...
    3. See error...
    
  5. What You’ve Tried

    - Restarted services
    - Cleared cache
    - etc.
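Much of this can be gathered in one step from the workspace terminal. A sketch; collect_diagnostics is a hypothetical helper and /tmp/preview-diagnostics.txt is an arbitrary default path. Service logs can contain tokens or keys, so review the file before posting it in Slack.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gather the outputs this guide asks for into one file
# that can be attached to a help request. Run it inside the workspace.
collect_diagnostics() {
  local f="${1:-docker-compose.preview.yml}" out="${2:-/tmp/preview-diagnostics.txt}"
  {
    echo "== date";         date -u
    echo "== compose ps";   docker compose -f "$f" ps
    echo "== recent logs";  docker compose -f "$f" logs --no-color --tail=100
    echo "== docker stats"; docker stats --no-stream
    echo "== disk";         df -h
    echo "== memory";       free -h
  } > "$out" 2>&1
  # Print the path so it can be copied or attached.
  echo "$out"
}
```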
    

Where to Get Help

For Infrastructure Issues:

  • Slack: #devops
  • Contact: DevOps team

For Application Issues:

  • Slack: #engineering
  • Tag: @backend-team or @frontend-team

For Integration Issues:

  • Slack: #engineering
  • Tag: @integration-team

For Urgent Production Impact:

  • Slack: #incidents
  • Follow incident response process

Prevention Tips

Before Creating Preview

  1. ✅ Run mise run ci locally
  2. ✅ Test changes on local environment
  3. ✅ Ensure migrations work
  4. ✅ Verify no compilation errors
  5. ✅ Check that the PR targets the ‘main’ branch

During Testing

  1. ✅ Monitor resource usage
  2. ✅ Check logs regularly
  3. ✅ Document issues as you find them
  4. ✅ Clean up test data periodically
  5. ✅ Report problems early

After Testing

  1. ✅ Document any environment-specific issues
  2. ✅ Close PR when done (auto-deletes environment)
  3. ✅ Share learnings with team
  4. ✅ Update this documentation if needed

Next Steps