Database Access #

This runbook covers how to safely access production and development databases for troubleshooting, data inspection, and emergency operations.

When to Use This Runbook #

  • Investigating data-related issues in production
  • Running read-only queries to diagnose problems
  • Verifying data integrity after incidents
  • Performing emergency data fixes (with proper approvals)
  • Analyzing slow queries or database performance

Overview #

BonsAI uses PostgreSQL databases across environments:

| Environment | Database | Access Method |
| --- | --- | --- |
| Local | Coder workspace | CloudBeaver or direct connection |
| Development | RDS PostgreSQL | ? |
| Production | RDS PostgreSQL | Via Tailscale VPN |

Local/Development Database Access #

CloudBeaver provides a web-based interface for database management.

Access from Coder Workspace:

  1. Open your Coder workspace
  2. Click the “Database” app in the workspace dashboard
  3. Access at http://localhost:5432

First-Time Setup:

# If CloudBeaver is not running
mise run dev

Then follow the setup wizard:

  1. Username: admin
  2. Password: admin123
  3. Add PostgreSQL connection:
    • Host: database
    • Port: 5432
    • Database: bonsai
    • Username: postgres
    • Password: postgres
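
The connection values above can also be assembled into a single libpq-style URL for tools that accept one. A minimal sketch using the local defaults listed in the wizard:

```shell
# Local-dev connection values from the setup wizard above
host=database
port=5432
db=bonsai
user=postgres
pass=postgres

# Assemble a libpq-style connection URL
conn_url="postgresql://${user}:${pass}@${host}:${port}/${db}"
echo "$conn_url"
```

From inside the workspace, `psql "$conn_url"` should then connect (note the `database` hostname typically resolves only on the workspace's internal network, not from your laptop).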

See: CloudBeaver Documentation for detailed usage.

Direct Database Connection #

For command-line access:

# Connect to local database
psql postgresql://postgres:postgres@localhost:5432/bonsai

# Or using Docker
docker exec -it bonsai-database psql -U postgres -d bonsai

Production Database Access #

Important Security Notice #

CRITICAL: Production database access should be:

  • Read-only unless absolutely necessary
  • Approved by a team lead or senior engineer
  • Documented in incident logs
  • Used only for troubleshooting and emergency fixes

Prerequisites #

  1. Tailscale VPN Access

    • Contact DevOps team for Tailscale setup
    • Requires company authentication
  2. AWS SSO Configuration

    aws configure sso
    
    • SSO start URL: https://tofu-bonsai.awsapps.com/start
    • Region: eu-central-1
    • Role: full_access (production) or bonsai_developers (dev only)
  3. Database Credentials

    • Stored in 1Password
    • Access with appropriate permissions only

Accessing Production Database #

Step 1: Get Database Connection Details

# Set your AWS profile
export AWS_PROFILE=bonsai-prod

# Get database endpoint from Doppler
doppler secrets get DATABASE_URL --project bonsai --config prod --plain

The DATABASE_URL format:

postgresql://username:password@host:5432/database_name
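
If you need the individual fields (for example to pass `-h`/`-p`/`-d` flags instead of a URL), they can be pulled out with plain shell parameter expansion. A sketch with a made-up URL; real values come from Doppler. Note this simple split breaks if the password itself contains `@` or `:`:

```shell
# Hypothetical example URL (not real credentials)
url="postgresql://app_user:s3cret@db.internal.example:5432/bonsai"

# Strip the scheme, then peel off each component
no_scheme="${url#postgresql://}"   # app_user:s3cret@db.internal.example:5432/bonsai
creds="${no_scheme%%@*}"           # app_user:s3cret
hostport_db="${no_scheme#*@}"      # db.internal.example:5432/bonsai
host="${hostport_db%%:*}"          # db.internal.example
port_db="${hostport_db#*:}"        # 5432/bonsai
port="${port_db%%/*}"              # 5432
db="${port_db#*/}"                 # bonsai
user="${creds%%:*}"                # app_user

echo "host=$host port=$port db=$db user=$user"
```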

Step 2: Connect via Tailscale (NOT YET CONFIGURED)

Once Tailscale is set up:

# Ensure Tailscale is running
sudo tailscale up

# Connect to database
psql "postgresql://username:password@internal-host:5432/bonsai"

Step 3: Using CloudBeaver for Production

CloudBeaver can be configured to connect to production:

  1. Add new connection in CloudBeaver
  2. Use Tailscale-accessible hostname
  3. Enter credentials from Doppler
  4. Enable read-only mode for safety

Safe Query Practices #

Always Start with Read-Only Queries #

-- Check whether the current session is read-only
SHOW transaction_read_only;

-- Set to read-only (session level)
SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY;

-- Verify data before changes
SELECT * FROM users WHERE id = 'target-id';

Use Transactions for Safety #

-- Start transaction
BEGIN;

-- Make your changes
UPDATE documents SET status = 'processed' WHERE id = 'doc-123';

-- Verify the change
SELECT * FROM documents WHERE id = 'doc-123';

-- If correct, commit. If not, rollback.
COMMIT;  -- or ROLLBACK;

Query Timeouts #

Set a statement timeout to prevent long-running queries:

-- Set 30 second timeout
SET statement_timeout = '30s';

-- Your query here
SELECT * FROM large_table WHERE ...;

Common Database Operations #

Investigating User Issues #

-- Find user by email
SELECT id, email, created_at, updated_at
FROM users
WHERE email = 'user@example.com';

-- Check user's documents
SELECT id, name, status, created_at
FROM documents
WHERE user_id = 'user-uuid'
ORDER BY created_at DESC
LIMIT 10;

-- Check user's invoices
SELECT id, invoice_number, status, total_amount
FROM invoices
WHERE organization_id IN (
  SELECT organization_id FROM users WHERE id = 'user-uuid'
)
ORDER BY created_at DESC;

Checking Document Processing Status #

-- Find stuck documents
SELECT id, name, status, created_at, updated_at
FROM documents
WHERE status = 'processing'
  AND updated_at < NOW() - INTERVAL '1 hour'
ORDER BY created_at DESC;

-- Check processing queue
SELECT
  status,
  COUNT(*) as count,
  AVG(EXTRACT(EPOCH FROM (NOW() - created_at))) as avg_age_seconds
FROM documents
GROUP BY status;

Analyzing Performance Issues #

-- Find slow queries (requires pg_stat_statements extension)
SELECT
  query,
  calls,
  total_exec_time / 1000 as total_time_sec,
  mean_exec_time / 1000 as mean_time_sec,
  max_exec_time / 1000 as max_time_sec
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Check table sizes
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Check index usage
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

Checking Database Health #

-- Active connections
SELECT
  datname,
  count(*) as connections,
  state
FROM pg_stat_activity
GROUP BY datname, state
ORDER BY connections DESC;

-- Long-running queries
SELECT
  pid,
  now() - pg_stat_activity.query_start AS duration,
  state,
  query
FROM pg_stat_activity
WHERE state != 'idle'
  AND query_start < NOW() - INTERVAL '5 minutes'
ORDER BY duration DESC;

-- Lock information
SELECT
  l.locktype,
  l.mode,
  l.granted,
  a.query,
  a.pid
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted;

Database Backup and Restore #

Checking Backup Status #

Backups are automated through AWS RDS:

# List recent RDS snapshots
aws rds describe-db-snapshots \
  --db-instance-identifier bonsai-prod-db \
  --query 'DBSnapshots[*].[DBSnapshotIdentifier,SnapshotCreateTime,Status]' \
  --output table

Creating Manual Backup #

# Create manual snapshot
aws rds create-db-snapshot \
  --db-instance-identifier bonsai-prod-db \
  --db-snapshot-identifier bonsai-manual-$(date +%Y%m%d-%H%M%S)
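
RDS rejects snapshot identifiers that break its naming rules (letters, digits, and hyphens only; must start with a letter; no trailing or doubled hyphens). A pre-flight check of the generated name, sketched against those documented constraints:

```shell
snapshot_id="bonsai-manual-$(date +%Y%m%d-%H%M%S)"

# Validate against the RDS identifier rules before calling the API
valid=yes
case "$snapshot_id" in
  [a-zA-Z]*) : ;;               # must start with a letter
  *) valid=no ;;
esac
case "$snapshot_id" in
  *--*|*-) valid=no ;;          # no doubled or trailing hyphens
esac
case "$snapshot_id" in
  *[!a-zA-Z0-9-]*) valid=no ;;  # letters, digits, hyphens only
esac
echo "$snapshot_id valid=$valid"
```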

Exporting Specific Tables #

For data exports or point-in-time backups:

# Export single table
pg_dump -h <host> -U <user> -d bonsai -t users --data-only > users_backup.sql

# Export schema only
pg_dump -h <host> -U <user> -d bonsai --schema-only > schema_backup.sql

# Export with specific format (custom format for pg_restore)
pg_dump -h <host> -U <user> -d bonsai -Fc > bonsai_backup.dump

Point-in-Time Recovery (PITR) #

RDS supports Point-in-Time Recovery:

# Restore to specific timestamp
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier bonsai-prod-db \
  --target-db-instance-identifier bonsai-restored-$(date +%Y%m%d) \
  --restore-time "2025-10-21T10:30:00Z"

IMPORTANT: PITR creates a new database instance. Never restore over production!
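
The `--restore-time` value must be a UTC timestamp in ISO 8601 form, as in the example above. One portable way to generate one (here the current time; substitute the moment you actually need):

```shell
# ISO 8601 UTC timestamp in the format --restore-time expects
restore_time="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
echo "$restore_time"
```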

Emergency Data Fixes #

Before Making Changes #

  1. Get approval - From team lead or senior engineer
  2. Take backup - Manual snapshot or table export
  3. Test in dev first - If possible, reproduce the fix
  4. Document everything - Commands, timestamps, results
  5. Have rollback plan - Know how to undo the change

Example: Fixing Stuck Document #

-- 1. Document current state
SELECT id, status, updated_at FROM documents WHERE id = 'doc-123';

-- 2. Take a backup (in a separate query/transaction)
CREATE TABLE documents_backup_20251021 AS
SELECT * FROM documents WHERE id = 'doc-123';

-- 3. Make the change in a transaction
BEGIN;
UPDATE documents
SET status = 'failed',
    error_message = 'Manually failed after investigation',
    updated_at = NOW()
WHERE id = 'doc-123';

-- 4. Verify
SELECT id, status, updated_at FROM documents WHERE id = 'doc-123';

-- 5. Commit or rollback
COMMIT;  -- or ROLLBACK if something looks wrong

Example: Resetting User Password #

-- This is just an example; actual password resets should go through the API
BEGIN;

-- Document the change
INSERT INTO audit_log (action, user_id, timestamp, details)
VALUES ('password_reset', 'user-uuid', NOW(), 'Emergency reset via DBA');

-- Update password (hashed appropriately)
UPDATE users
SET password_hash = <new_hashed_password>,
    password_updated_at = NOW()
WHERE id = 'user-uuid';

COMMIT;

Troubleshooting Database Issues #

Cannot Connect to Database #

Symptoms: Connection timeout or refused

Checks:

  1. Verify Tailscale is connected (for production)
  2. Check security groups allow your IP
  3. Verify credentials in Doppler
  4. Check RDS instance status:
    aws rds describe-db-instances \
      --db-instance-identifier bonsai-prod-db \
      --query 'DBInstances[0].DBInstanceStatus'
    

High Connection Count #

Symptoms: “Too many connections” error

Investigation:

-- Check connection count by state
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;

-- Find idle connections
SELECT pid, usename, application_name, state, query
FROM pg_stat_activity
WHERE state = 'idle'
  AND query_start < NOW() - INTERVAL '10 minutes';

Solutions:

  1. Identify and terminate idle connections:

    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE state = 'idle'
      AND query_start < NOW() - INTERVAL '30 minutes';
    
  2. Scale up connection pool limits in BonsAPI

  3. Check for connection leaks in application code

Slow Query Performance #

Symptoms: High query execution time, timeouts

Investigation:

-- Enable query logging (requires superuser)
ALTER DATABASE bonsai SET log_min_duration_statement = 1000; -- 1 second

-- Check for missing indexes
SELECT
  schemaname || '.' || tablename as table,
  seq_scan,
  seq_tup_read,
  idx_scan,
  seq_tup_read / seq_scan as avg_seq_tup
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;

Solutions:

  1. Add indexes for frequently queried columns
  2. Analyze query execution plans: EXPLAIN ANALYZE <query>
  3. Update table statistics: ANALYZE table_name;
  4. Consider query optimization or caching

Disk Space Issues #

Symptoms: Database writes failing, “disk full” errors

Investigation:

# Check RDS storage
aws rds describe-db-instances \
  --db-instance-identifier bonsai-prod-db \
  --query 'DBInstances[0].[AllocatedStorage,StorageType]'

-- Check database size
SELECT
  pg_database.datname,
  pg_size_pretty(pg_database_size(pg_database.datname)) as size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;

Solutions:

  1. Increase RDS storage (auto-scaling should handle this)
  2. Archive old data
  3. VACUUM old tables to reclaim space (note: VACUUM FULL takes an ACCESS EXCLUSIVE lock and rewrites the table, so run it during low traffic):
    VACUUM FULL table_name;
    

Database Migrations #

For schema changes and migrations, see:

Checking Migration Status #

# Via Atlas CLI
mise run db:status

# Or check migrations table
psql -c "SELECT version, applied_at FROM atlas_schema_revisions ORDER BY applied_at DESC LIMIT 10;"

Best Practices #

Query Safety #

  • Always use LIMIT for SELECT queries on large tables
  • Test queries in development first
  • Use transactions for any data modifications
  • Set statement timeouts to prevent runaway queries
  • Never run DELETE or UPDATE without a WHERE clause
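
The session-level safeguards above can also be enforced from the client side: libpq's `PGOPTIONS` environment variable passes `-c` settings to every session the connection opens, so read-only mode and a timeout apply before you type anything. A sketch (the `psql` call itself is shown as a comment, not executed here):

```shell
# Enforce read-only mode and a 30s statement timeout for the whole session
ro_opts="-c default_transaction_read_only=on -c statement_timeout=30s"

# Review the options, then connect with them applied:
#   PGOPTIONS="$ro_opts" psql "$DATABASE_URL"
echo "$ro_opts"
```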

Connection Management #

  • Close connections when done
  • Use connection pooling (built into BonsAPI)
  • Don’t leave long-running transactions open
  • Monitor connection count regularly

Data Privacy #

  • Never export PII without authorization
  • Mask sensitive data in logs and outputs
  • Follow GDPR and data protection policies
  • Audit all production database access

Performance #

  • Use appropriate indexes
  • Avoid SELECT * in production code
  • Regular VACUUM and ANALYZE
  • Monitor and optimize slow queries

See Also #