Jailer Integration Plan
Executive Summary
Goal: Provide developers with fast, manageable database subsets while maintaining staging environment relevance and testing production backups.
Flow:
text
Production DB (30GB)
↓ Daily backup
Staging Server (Ubuntu, DigitalOcean MySQL)
↓ Import full dump (tests backup validity)
↓ Extract Jailer subsets (1-5GB)
↓ Upload to storage
Dev Machines (macOS, Docker)
↓ Download subset
↓ Import locally
Benefits:
- ✅ Tests production backups daily (staging import)
- ✅ Keeps staging relevant (fresh prod data)
- ✅ Devs get manageable subsets (30GB → 2GB)
- ✅ Fast dev imports (25min → 5min)
- ✅ Automated daily refresh
Architecture Overview
Components
text
┌─────────────────────────────────────────────────────────────────┐
│ PRODUCTION │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ MySQL 8.0 (DigitalOcean Managed) │ │
│ │ - 30GB database │ │
│ │ - Daily mysqldump backup │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
↓
Daily at 2:00 AM UTC
↓
┌─────────────────────────────────────────────────────────────────┐
│ STAGING │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Ubuntu 22.04 Server │ │
│ │ ┌────────────────────────────────────────────────────┐ │ │
│ │ │ 1. Import full dump (30GB) │ │ │
│ │ │ - Validates backup integrity │ │ │
│ │ │ - Keeps staging fresh │ │ │
│ │ │ - Takes 25-40 mins │ │ │
│ │ └────────────────────────────────────────────────────┘ │ │
│ │ ┌────────────────────────────────────────────────────┐ │ │
│ │ │ 2. Jailer extraction (multiple profiles) │ │ │
│ │ │ - Small (1% sample, 300MB) │ │ │
│ │ │ - Medium (10% sample, 2GB) │ │ │
│ │ │ - Recent (6 months, 3GB) │ │ │
│ │ │ - Takes 15-30 mins total │ │ │
│ │ └────────────────────────────────────────────────────┘ │ │
│ │ ┌────────────────────────────────────────────────────┐ │ │
│ │ │ 3. Upload to DigitalOcean Spaces (S3) │ │ │
│ │ │ - jailer-dumps/small-YYYYMMDD.sql.gz │ │ │
│ │ │ - jailer-dumps/medium-YYYYMMDD.sql.gz │ │ │
│ │ │ - jailer-dumps/recent-YYYYMMDD.sql.gz │ │ │
│ │ │ - jailer-dumps/latest-small.sql.gz (symlink) │ │ │
│ │ └────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ MySQL: DigitalOcean Managed Database │
└─────────────────────────────────────────────────────────────────┘
↓
Download on demand
↓
┌─────────────────────────────────────────────────────────────────┐
│ DEV MACHINES (macOS) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Docker Compose (MySQL 8.0) │ │
│ │ - Download latest subset from Spaces │ │
│ │ - Import locally (2-5 mins) │ │
│ │ - Work with realistic subset │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Phase 1: Staging Server Setup (Ubuntu)
1.1 Prerequisites
Server Specs (Recommended):
- CPU: 4+ cores
- RAM: 16GB+
- Disk: 200GB SSD
- OS: Ubuntu 22.04 LTS
Access Required:
- SSH access to staging server
- MySQL connection to DigitalOcean managed DB (staging)
- DigitalOcean Spaces API keys (S3-compatible storage)
- Read access to production DB backups
1.2 Install Dependencies
bash
# Update system
sudo apt update && sudo apt upgrade -y
# Install MySQL client (for accessing managed DB)
sudo apt install -y mysql-client
# Install Java (required for Jailer)
sudo apt install -y openjdk-17-jdk
# Install compression tools
sudo apt install -y pigz pv
# Install s3cmd for DigitalOcean Spaces
sudo apt install -y s3cmd
# Verify installations
java -version
mysql --version
pigz --version
s3cmd --version
1.3 Install Jailer
bash
# Create directory
sudo mkdir -p /opt/jailer
cd /opt/jailer
# Download latest Jailer (check https://github.com/Wisser/Jailer/releases)
JAILER_VERSION="16.3"
sudo wget https://github.com/Wisser/Jailer/releases/download/v${JAILER_VERSION}/jailer_${JAILER_VERSION}.zip
# Extract
sudo unzip jailer_${JAILER_VERSION}.zip
sudo chmod +x jailer.sh
# Verify
./jailer.sh version
# Create symlink for easier access
sudo ln -s /opt/jailer/jailer.sh /usr/local/bin/jailer
1.4 Configure MySQL Connection
Create /opt/jailer-dumps/.my.cnf:
ini
[client]
host = staging-db.digitalocean.com
port = 25060
user = doadmin
password = YOUR_STAGING_DB_PASSWORD
ssl-ca = /opt/jailer-dumps/ca-certificate.crt
Download the DigitalOcean MySQL CA certificate:
bash
sudo mkdir -p /opt/jailer-dumps
cd /opt/jailer-dumps
# The CA certificate is specific to your managed database cluster: download it
# from the DigitalOcean control panel (Databases → your cluster → Connection
# Details → "Download CA certificate") and place it in this directory as
# ca-certificate.crt (e.g. scp it to /opt/jailer-dumps/ca-certificate.crt)
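# Optional sanity check (a sketch, assuming the .my.cnf above is in place):
# confirm the TLS connection to the managed database works.
mysql --defaults-extra-file=/opt/jailer-dumps/.my.cnf -e "SELECT VERSION();"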
sudo chmod 600 .my.cnf
1.5 Configure DigitalOcean Spaces (S3)
bash
# Configure s3cmd
s3cmd --configure
# Enter when prompted:
# - Access Key: YOUR_SPACES_ACCESS_KEY
# - Secret Key: YOUR_SPACES_SECRET_KEY
# - Region: nyc3 (or your region)
# - Endpoint: nyc3.digitaloceanspaces.com
# - DNS-style bucket: %(bucket)s.nyc3.digitaloceanspaces.com
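# The answers are written to ~/.s3cfg; the essential keys end up looking roughly
# like this (values here are placeholders, not real credentials):
#   access_key  = YOUR_SPACES_ACCESS_KEY
#   secret_key  = YOUR_SPACES_SECRET_KEY
#   host_base   = nyc3.digitaloceanspaces.com
#   host_bucket = %(bucket)s.nyc3.digitaloceanspaces.com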
# Test connection
s3cmd ls s3://ixdf-jailer-dumps/
# Create the bucket if it doesn't exist
s3cmd mb s3://ixdf-jailer-dumps
Phase 2: Staging Server Scripts
2.1 Directory Structure
bash
/opt/jailer-dumps/
├── .my.cnf # MySQL credentials
├── ca-certificate.crt # DigitalOcean SSL cert
├── config/
│ ├── exclusions.txt # Tables to exclude
│ └── profiles/
│ ├── small.conf # 1% sample config
│ ├── medium.conf # 10% sample config
│ └── recent.conf # 6 months config
├── model/ # Jailer data model (generated)
├── dumps/ # Temporary dump storage
│ ├── production-full.sql.gz
│ └── subsets/
│ ├── small-YYYYMMDD.sql.gz
│ ├── medium-YYYYMMDD.sql.gz
│ └── recent-YYYYMMDD.sql.gz
├── scripts/
│ ├── 1-import-production-dump.sh
│ ├── 2-generate-jailer-model.sh
│ ├── 3-extract-jailer-subsets.sh
│ ├── 4-upload-to-spaces.sh
│ └── daily-refresh.sh # Main orchestrator
└── logs/
    └── daily-refresh-YYYYMMDD.log
2.2 Script: Import Production Dump
File: /opt/jailer-dumps/scripts/1-import-production-dump.sh
bash
#!/bin/bash
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
BASE_DIR="$(dirname "$SCRIPT_DIR")"
DUMP_DIR="$BASE_DIR/dumps"
LOG_FILE="$BASE_DIR/logs/import-$(date +%Y%m%d-%H%M%S).log"
exec > >(tee -a "$LOG_FILE") 2>&1
echo "================================"
echo "STEP 1: Import Production Dump"
echo "Started: $(date)"
echo "================================"
# Download production dump from forge server
echo "Downloading production dump..."
scp forge@information-architecture.org:/home/forge/backups/lastbackup--dev.sql.gz \
"$DUMP_DIR/production-full.sql.gz"
DUMP_SIZE=$(du -h "$DUMP_DIR/production-full.sql.gz" | cut -f1)
echo "Downloaded: $DUMP_SIZE"
# Decompress with pigz
echo "Decompressing with pigz..."
pigz -d -k -f "$DUMP_DIR/production-full.sql.gz"
echo "Decompressed: $(du -h "$DUMP_DIR/production-full.sql" | cut -f1)"
# Import to staging database
echo "Importing to staging database..."
DB_NAME="ixdf_staging"
# Drop and recreate database
mysql --defaults-extra-file="$BASE_DIR/.my.cnf" -e "DROP DATABASE IF EXISTS \`$DB_NAME\`"
mysql --defaults-extra-file="$BASE_DIR/.my.cnf" -e "CREATE DATABASE \`$DB_NAME\` DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci"
# Import with optimizations. The SET statements must run in the SAME session as
# the import, so they are prepended to the piped stream (a separate mysql
# invocation opens a new session, and the settings would be lost).
# Note: SET sql_log_bin requires SUPER/SYSTEM_VARIABLES_ADMIN and is usually not
# permitted on a managed database, so it is omitted here.
{
    echo "SET foreign_key_checks = 0; SET unique_checks = 0; SET autocommit = 0;"
    pv "$DUMP_DIR/production-full.sql"
    echo "COMMIT;"
} | mysql --defaults-extra-file="$BASE_DIR/.my.cnf" "$DB_NAME" \
    --max-allowed-packet=1G \
    --net-buffer-length=1M
echo "✅ Import complete!"
echo "Finished: $(date)"
echo ""2.3 Script: Generate Jailer Model
File: /opt/jailer-dumps/scripts/2-generate-jailer-model.sh
bash
#!/bin/bash
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
BASE_DIR="$(dirname "$SCRIPT_DIR")"
MODEL_DIR="$BASE_DIR/model"
LOG_FILE="$BASE_DIR/logs/model-$(date +%Y%m%d-%H%M%S).log"
exec > >(tee -a "$LOG_FILE") 2>&1
echo "================================"
echo "STEP 2: Generate Jailer Model"
echo "Started: $(date)"
echo "================================"
# Clean old model
rm -rf "$MODEL_DIR"
mkdir -p "$MODEL_DIR"
# Extract DB credentials from .my.cnf
DB_HOST=$(grep 'host' "$BASE_DIR/.my.cnf" | cut -d'=' -f2 | tr -d ' ')
DB_PORT=$(grep 'port' "$BASE_DIR/.my.cnf" | cut -d'=' -f2 | tr -d ' ')
DB_USER=$(grep 'user' "$BASE_DIR/.my.cnf" | cut -d'=' -f2 | tr -d ' ')
DB_PASS=$(grep 'password' "$BASE_DIR/.my.cnf" | cut -d'=' -f2 | tr -d ' ')
DB_NAME="ixdf_staging"
CA_CERT="$BASE_DIR/ca-certificate.crt"
echo "Analyzing database schema..."
# Build the Jailer data model from the schema ("build-model" is the Jailer CLI
# command for this step; verify the exact flag syntax against the usage output
# of your installed Jailer version).
jailer build-model \
-jdbcjar /opt/jailer/lib/mysql-connector-java.jar \
-datamodel "$MODEL_DIR" \
-url "jdbc:mysql://${DB_HOST}:${DB_PORT}/${DB_NAME}?useSSL=true&requireSSL=true&verifyServerCertificate=true&trustCertificateKeyStoreUrl=file:${CA_CERT}" \
-user "$DB_USER" \
-password "$DB_PASS"
echo "✅ Model generated!"
echo "Tables analyzed: $(ls "$MODEL_DIR" | wc -l)"
echo "Finished: $(date)"
echo ""2.4 Script: Extract Jailer Subsets
File: /opt/jailer-dumps/scripts/3-extract-jailer-subsets.sh
bash
#!/bin/bash
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
BASE_DIR="$(dirname "$SCRIPT_DIR")"
MODEL_DIR="$BASE_DIR/model"
DUMP_DIR="$BASE_DIR/dumps/subsets"
CONFIG_DIR="$BASE_DIR/config"
LOG_FILE="$BASE_DIR/logs/extract-$(date +%Y%m%d-%H%M%S).log"
DATE_STAMP=$(date +%Y%m%d)
exec > >(tee -a "$LOG_FILE") 2>&1
echo "================================"
echo "STEP 3: Extract Jailer Subsets"
echo "Started: $(date)"
echo "================================"
mkdir -p "$DUMP_DIR"
# DB connection details
DB_HOST=$(grep 'host' "$BASE_DIR/.my.cnf" | cut -d'=' -f2 | tr -d ' ')
DB_PORT=$(grep 'port' "$BASE_DIR/.my.cnf" | cut -d'=' -f2 | tr -d ' ')
DB_USER=$(grep 'user' "$BASE_DIR/.my.cnf" | cut -d'=' -f2 | tr -d ' ')
DB_PASS=$(grep 'password' "$BASE_DIR/.my.cnf" | cut -d'=' -f2 | tr -d ' ')
DB_NAME="ixdf_staging"
CA_CERT="$BASE_DIR/ca-certificate.crt"
JDBC_URL="jdbc:mysql://${DB_HOST}:${DB_PORT}/${DB_NAME}?useSSL=true&requireSSL=true&verifyServerCertificate=true&trustCertificateKeyStoreUrl=file:${CA_CERT}"
# Build exclusions
EXCLUSIONS=""
while IFS= read -r line; do
[[ "$line" =~ ^#.*$ || -z "$line" ]] && continue
EXCLUSIONS="$EXCLUSIONS -exclude $line"
done < "$CONFIG_DIR/exclusions.txt"
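# After this loop, EXCLUSIONS holds one "-exclude <table>" pair per non-comment,
# non-empty line of exclusions.txt (e.g. "-exclude logging__logs -exclude leads ...").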
# Profile 1: Small (1% sample)
echo "Extracting: Small (1% sample)..."
jailer export \
-datamodel "$MODEL_DIR" \
-jdbcjar /opt/jailer/lib/mysql-connector-java.jar \
-url "$JDBC_URL" \
-user "$DB_USER" \
-password "$DB_PASS" \
-format SQL \
-e "member" \
-where "MOD(id, 100) = 0" \
-threads 4 \
$EXCLUSIONS \
-d "$DUMP_DIR/small-${DATE_STAMP}.sql"
pigz -f "$DUMP_DIR/small-${DATE_STAMP}.sql"
echo "✅ Small: $(du -h "$DUMP_DIR/small-${DATE_STAMP}.sql.gz" | cut -f1)"
# Profile 2: Medium (10% sample)
echo "Extracting: Medium (10% sample)..."
jailer export \
-datamodel "$MODEL_DIR" \
-jdbcjar /opt/jailer/lib/mysql-connector-java.jar \
-url "$JDBC_URL" \
-user "$DB_USER" \
-password "$DB_PASS" \
-format SQL \
-e "member" \
-where "MOD(id, 10) = 0" \
-threads 4 \
$EXCLUSIONS \
-d "$DUMP_DIR/medium-${DATE_STAMP}.sql"
pigz -f "$DUMP_DIR/medium-${DATE_STAMP}.sql"
echo "✅ Medium: $(du -h "$DUMP_DIR/medium-${DATE_STAMP}.sql.gz" | cut -f1)"
# Profile 3: Recent (last 6 months)
echo "Extracting: Recent (6 months)..."
jailer export \
-datamodel "$MODEL_DIR" \
-jdbcjar /opt/jailer/lib/mysql-connector-java.jar \
-url "$JDBC_URL" \
-user "$DB_USER" \
-password "$DB_PASS" \
-format SQL \
-e "member" \
-where "created_at > DATE_SUB(NOW(), INTERVAL 6 MONTH)" \
-threads 4 \
$EXCLUSIONS \
-d "$DUMP_DIR/recent-${DATE_STAMP}.sql"
pigz -f "$DUMP_DIR/recent-${DATE_STAMP}.sql"
echo "✅ Recent: $(du -h "$DUMP_DIR/recent-${DATE_STAMP}.sql.gz" | cut -f1)"
echo "Finished: $(date)"
echo ""2.5 Script: Upload to Spaces
File: /opt/jailer-dumps/scripts/4-upload-to-spaces.sh
bash
#!/bin/bash
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
BASE_DIR="$(dirname "$SCRIPT_DIR")"
DUMP_DIR="$BASE_DIR/dumps/subsets"
LOG_FILE="$BASE_DIR/logs/upload-$(date +%Y%m%d-%H%M%S).log"
DATE_STAMP=$(date +%Y%m%d)
S3_BUCKET="s3://ixdf-jailer-dumps"
exec > >(tee -a "$LOG_FILE") 2>&1
echo "================================"
echo "STEP 4: Upload to Spaces"
echo "Started: $(date)"
echo "================================"
# Upload dated files
echo "Uploading small-${DATE_STAMP}.sql.gz..."
s3cmd put "$DUMP_DIR/small-${DATE_STAMP}.sql.gz" \
"$S3_BUCKET/small-${DATE_STAMP}.sql.gz" \
--acl-private
echo "Uploading medium-${DATE_STAMP}.sql.gz..."
s3cmd put "$DUMP_DIR/medium-${DATE_STAMP}.sql.gz" \
"$S3_BUCKET/medium-${DATE_STAMP}.sql.gz" \
--acl-private
echo "Uploading recent-${DATE_STAMP}.sql.gz..."
s3cmd put "$DUMP_DIR/recent-${DATE_STAMP}.sql.gz" \
"$S3_BUCKET/recent-${DATE_STAMP}.sql.gz" \
--acl-private
# Update "latest" symlinks by copying
echo "Updating latest-small.sql.gz..."
s3cmd cp "$S3_BUCKET/small-${DATE_STAMP}.sql.gz" \
"$S3_BUCKET/latest-small.sql.gz"
echo "Updating latest-medium.sql.gz..."
s3cmd cp "$S3_BUCKET/medium-${DATE_STAMP}.sql.gz" \
"$S3_BUCKET/latest-medium.sql.gz"
echo "Updating latest-recent.sql.gz..."
s3cmd cp "$S3_BUCKET/recent-${DATE_STAMP}.sql.gz" \
"$S3_BUCKET/latest-recent.sql.gz"
# Create metadata file
cat > "/tmp/metadata-${DATE_STAMP}.json" <<EOF
{
"date": "$(date -Iseconds)",
"files": {
"small": {
"filename": "small-${DATE_STAMP}.sql.gz",
"size": "$(du -b "$DUMP_DIR/small-${DATE_STAMP}.sql.gz" | cut -f1)",
"size_human": "$(du -h "$DUMP_DIR/small-${DATE_STAMP}.sql.gz" | cut -f1)"
},
"medium": {
"filename": "medium-${DATE_STAMP}.sql.gz",
"size": "$(du -b "$DUMP_DIR/medium-${DATE_STAMP}.sql.gz" | cut -f1)",
"size_human": "$(du -h "$DUMP_DIR/medium-${DATE_STAMP}.sql.gz" | cut -f1)"
},
"recent": {
"filename": "recent-${DATE_STAMP}.sql.gz",
"size": "$(du -b "$DUMP_DIR/recent-${DATE_STAMP}.sql.gz" | cut -f1)",
"size_human": "$(du -h "$DUMP_DIR/recent-${DATE_STAMP}.sql.gz" | cut -f1)"
}
}
}
EOF
s3cmd put "/tmp/metadata-${DATE_STAMP}.json" \
"$S3_BUCKET/latest-metadata.json" \
--acl-private
rm "/tmp/metadata-${DATE_STAMP}.json"
# Cleanup old dumps (keep last 7 days on server)
echo "Cleaning up old local dumps..."
find "$DUMP_DIR" -name "*.sql.gz" -mtime +7 -delete
# Cleanup old dumps on Spaces (keep last 30 days)
echo "Cleaning up old remote dumps..."
s3cmd ls "$S3_BUCKET/" | \
awk '{print $4}' | \
grep -E '[0-9]{8}\.sql\.gz$' | \
while read -r file; do
file_date=$(echo "$file" | grep -oE '[0-9]{8}')
days_old=$(( ($(date +%s) - $(date -d "$file_date" +%s)) / 86400 ))
if [ "$days_old" -gt 30 ]; then
echo "Deleting old file: $file (${days_old} days old)"
s3cmd rm "$file"
fi
done
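# Alternative (a sketch): a bucket lifecycle rule can replace this cleanup loop,
# assuming Spaces honours S3 expiration rules set via s3cmd:
# s3cmd expire "$S3_BUCKET" --expiry-days=30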
echo "✅ Upload complete!"
echo "Finished: $(date)"
echo ""
# List current files
echo "Current files in Spaces:"
s3cmd ls "$S3_BUCKET/" | tail -10
2.6 Script: Daily Orchestrator
File: /opt/jailer-dumps/scripts/daily-refresh.sh
bash
#!/bin/bash
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
BASE_DIR="$(dirname "$SCRIPT_DIR")"
LOG_FILE="$BASE_DIR/logs/daily-refresh-$(date +%Y%m%d).log"
exec > >(tee -a "$LOG_FILE") 2>&1
echo "========================================"
echo "DAILY JAILER REFRESH"
echo "Started: $(date)"
echo "========================================"
echo ""
# Run all steps in sequence
bash "$SCRIPT_DIR/1-import-production-dump.sh"
bash "$SCRIPT_DIR/2-generate-jailer-model.sh"
bash "$SCRIPT_DIR/3-extract-jailer-subsets.sh"
bash "$SCRIPT_DIR/4-upload-to-spaces.sh"
echo "========================================"
echo "✅ DAILY REFRESH COMPLETE!"
echo "Finished: $(date)"
echo "Duration: $SECONDS seconds"
echo "========================================"
# Send notification (optional - integrate with Slack, email, etc.)
# curl -X POST https://hooks.slack.com/... \
#   -d '{"text":"Jailer daily refresh completed successfully!"}'
2.7 Configuration Files
File: /opt/jailer-dumps/config/exclusions.txt
text
# Tables to exclude from Jailer subsets
# Based on scripts/database/backup--dev.sh
# Logging
logging__admin_logs
logging__logs
# Leads
leads
subscription__contact_list_subscribers
lead__download_requests
guest_downloaders
# Notifications
notification__message_log_email
notification__message_log_postcard
notification__message_log_sms
notification__message_queue_email
notification__message_queue_postcard
notification__message_queue_sms
notification__notification_log
notification__notification_queue
# Session data
session_references
jobs--failed
# Statistics
statistic__metric_values
# Membership renewals
membership__renewal_results
# OAuth tokens
oauth_access_tokens
2.8 Setup Cron Job
bash
# Edit crontab
sudo crontab -e
# Add daily refresh at 3:00 AM UTC (after production backup completes)
0 3 * * * /opt/jailer-dumps/scripts/daily-refresh.sh
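# Note: `sudo crontab -e` edits root's crontab, so s3cmd must also be configured
# for root (run `sudo s3cmd --configure`, or point the upload script at a config
# file via `s3cmd -c /path/to/.s3cfg`).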
# Or for testing, run every hour
#0 * * * * /opt/jailer-dumps/scripts/daily-refresh.sh
2.9 Make Scripts Executable
bash
chmod +x /opt/jailer-dumps/scripts/*.sh
Phase 3: Dev Machine Setup (macOS)
3.1 Install Dependencies
bash
# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install tools
brew install pigz pv s3cmd
# Verify
pigz --version
pv --version
s3cmd --version
3.2 Configure DigitalOcean Spaces Access
bash
# Configure s3cmd
s3cmd --configure
# Enter:
# - Access Key: YOUR_SPACES_ACCESS_KEY (read-only recommended)
# - Secret Key: YOUR_SPACES_SECRET_KEY
# - Region: nyc3
# - Endpoint: nyc3.digitaloceanspaces.com
# Test
s3cmd ls s3://ixdf-jailer-dumps/
3.3 Update Makefile
Add Jailer download and import commands:
File: Makefile
makefile
# Jailer subsets from DigitalOcean Spaces
S3_BUCKET := s3://ixdf-jailer-dumps
JAILER_LOCAL_DIR := tools/jailer/subsets
.PHONY: db\:update\:jailer\:small db\:update\:jailer\:medium db\:update\:jailer\:recent db\:jailer\:list
db\:jailer\:list:
@echo "📦 Available Jailer subsets in Spaces:"
@s3cmd ls $(S3_BUCKET)/ | grep 'latest-' || echo "No subsets found"
@echo ""
@echo "Download with:"
@echo " make db:update:jailer:small - 1% sample (~300MB)"
@echo " make db:update:jailer:medium - 10% sample (~2GB)"
@echo " make db:update:jailer:recent - Last 6 months (~3GB)"
db\:update\:jailer\:small:
@echo "📥 Downloading small Jailer subset (1%, ~300MB)..."
@mkdir -p $(JAILER_LOCAL_DIR)
s3cmd get $(S3_BUCKET)/latest-small.sql.gz $(JAILER_LOCAL_DIR)/latest-small.sql.gz --force
@echo "✅ Downloaded: $$(du -h $(JAILER_LOCAL_DIR)/latest-small.sql.gz | cut -f1)"
@echo "Decompressing..."
pigz -d -k -f $(JAILER_LOCAL_DIR)/latest-small.sql.gz
@echo "Importing to local database..."
./scripts/database/import.sh \
-d $${DB_DATABASE:-ixdf} \
-u root \
-p $${DB_PASSWORD:-password} \
-h 127.0.0.1 \
-P $${FORWARD_DB_PORT:-3306} \
-s $(JAILER_LOCAL_DIR) \
-f latest-small.sql
@echo "✅ Small subset imported!"
db\:update\:jailer\:medium:
@echo "📥 Downloading medium Jailer subset (10%, ~2GB)..."
@mkdir -p $(JAILER_LOCAL_DIR)
s3cmd get $(S3_BUCKET)/latest-medium.sql.gz $(JAILER_LOCAL_DIR)/latest-medium.sql.gz --force
@echo "✅ Downloaded: $$(du -h $(JAILER_LOCAL_DIR)/latest-medium.sql.gz | cut -f1)"
@echo "Decompressing..."
pigz -d -k -f $(JAILER_LOCAL_DIR)/latest-medium.sql.gz
@echo "Importing to local database..."
./scripts/database/import.sh \
-d $${DB_DATABASE:-ixdf} \
-u root \
-p $${DB_PASSWORD:-password} \
-h 127.0.0.1 \
-P $${FORWARD_DB_PORT:-3306} \
-s $(JAILER_LOCAL_DIR) \
-f latest-medium.sql
@echo "✅ Medium subset imported!"
db\:update\:jailer\:recent:
@echo "📥 Downloading recent Jailer subset (6 months, ~3GB)..."
@mkdir -p $(JAILER_LOCAL_DIR)
s3cmd get $(S3_BUCKET)/latest-recent.sql.gz $(JAILER_LOCAL_DIR)/latest-recent.sql.gz --force
@echo "✅ Downloaded: $$(du -h $(JAILER_LOCAL_DIR)/latest-recent.sql.gz | cut -f1)"
@echo "Decompressing..."
pigz -d -k -f $(JAILER_LOCAL_DIR)/latest-recent.sql.gz
@echo "Importing to local database..."
./scripts/database/import.sh \
-d $${DB_DATABASE:-ixdf} \
-u root \
-p $${DB_PASSWORD:-password} \
-h 127.0.0.1 \
-P $${FORWARD_DB_PORT:-3306} \
-s $(JAILER_LOCAL_DIR) \
-f latest-recent.sql
@echo "✅ Recent subset imported!"3.4 Update .gitignore
bash
echo "tools/jailer/subsets/" >> .gitignore3.5 Developer Workflow
bash
# List available subsets
make db:jailer:list
# Download and import small subset (fastest; for quick tests and bug fixes)
make db:update:jailer:small
# Download and import medium subset (balanced, daily work)
make db:update:jailer:medium
# Download and import recent subset (most complete, specific features)
make db:update:jailer:recent
Phase 4: Monitoring & Maintenance
4.1 Staging Server Monitoring
Create monitoring script: /opt/jailer-dumps/scripts/check-health.sh
bash
#!/bin/bash
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
BASE_DIR="$(dirname "$SCRIPT_DIR")"
S3_BUCKET="s3://ixdf-jailer-dumps"
echo "==================================="
echo "Jailer Health Check"
echo "==================================="
echo ""
# Check last successful run
LAST_LOG=$(ls -t "$BASE_DIR"/logs/daily-refresh-*.log 2>/dev/null | head -1)
if [ -n "$LAST_LOG" ]; then
echo "Last run log: $LAST_LOG"
if grep -q "✅ DAILY REFRESH COMPLETE" "$LAST_LOG"; then
echo "Status: ✅ SUCCESS"
echo "Completed: $(grep 'Finished:' "$LAST_LOG" | tail -1)"
else
echo "Status: ❌ FAILED"
echo "Check log for errors: $LAST_LOG"
fi
else
echo "Status: ⚠️ No logs found"
fi
echo ""
# Check Spaces uploads
echo "Latest files in Spaces:"
s3cmd ls "$S3_BUCKET/" | grep 'latest-' || echo "No files found"
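# Optional (a sketch; the "Last mod" field label may vary across s3cmd versions):
# show when the latest small subset was last modified.
s3cmd info "$S3_BUCKET/latest-small.sql.gz" 2>/dev/null | grep 'Last mod' || echo "latest-small.sql.gz not found"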
echo ""
# Check disk space
echo "Disk usage:"
df -h /opt/jailer-dumps
echo ""
# Check staging DB connection
echo "Staging DB connection:"
if mysql --defaults-extra-file="$BASE_DIR/.my.cnf" -e "SELECT 1" &>/dev/null; then
echo "✅ Connected"
else
echo "❌ Failed to connect"
fi
4.2 Alerting
Slack notifications (optional):
Add to daily-refresh.sh (register the trap near the top of the script so the notification also fires when a step fails):
bash
# Because daily-refresh.sh uses `set -e`, a failing step aborts the script before
# any trailing status check runs. Register the notification as an EXIT trap
# instead of appending it at the end of the script.
notify_slack() {
    local exit_code=$?
    if [ "$exit_code" -eq 0 ]; then
        STATUS="✅ SUCCESS"
        COLOR="good"
    else
        STATUS="❌ FAILED"
        COLOR="danger"
    fi
    curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
        -H 'Content-Type: application/json' \
        -d "{
            \"attachments\": [{
                \"color\": \"$COLOR\",
                \"title\": \"Jailer Daily Refresh\",
                \"text\": \"Status: $STATUS\",
                \"fields\": [
                    {\"title\": \"Date\", \"value\": \"$(date)\", \"short\": true},
                    {\"title\": \"Duration\", \"value\": \"${SECONDS}s\", \"short\": true}
                ]
            }]
        }"
}
trap notify_slack EXIT
4.3 Log Rotation
Create: /etc/logrotate.d/jailer-dumps
text
/opt/jailer-dumps/logs/*.log {
daily
rotate 30
compress
delaycompress
notifempty
create 0644 root root
sharedscripts
}
4.4 Backup Validation
The daily staging import serves as backup validation:
- If import fails → backup is corrupted
- If import succeeds → backup is valid
- Automatically tests restore procedure
Add validation to import script:
bash
# After import, run basic sanity checks
echo "Running sanity checks..."
# Check table count
TABLE_COUNT=$(mysql --defaults-extra-file="$BASE_DIR/.my.cnf" "$DB_NAME" -N -e "
SELECT COUNT(*) FROM information_schema.TABLES WHERE TABLE_SCHEMA='$DB_NAME'
")
if [ "$TABLE_COUNT" -lt 100 ]; then
echo "❌ ERROR: Only $TABLE_COUNT tables imported (expected 100+)"
exit 1
fi
# Check member count
MEMBER_COUNT=$(mysql --defaults-extra-file="$BASE_DIR/.my.cnf" "$DB_NAME" -N -e "
SELECT COUNT(*) FROM member
")
if [ "$MEMBER_COUNT" -lt 1000 ]; then
echo "❌ ERROR: Only $MEMBER_COUNT members (expected 1000+)"
exit 1
fi
echo "✅ Sanity checks passed"
echo " - Tables: $TABLE_COUNT"
echo " - Members: $MEMBER_COUNT"Phase 5: Team Onboarding
5.1 Developer Documentation
Update: README.md
Add the following section:
Database Setup
We use Jailer subsets for fast local development (2GB instead of 30GB).
Quick Start
bash
# List available subsets
make db:jailer:list
# Download and import (recommended for daily work)
make db:update:jailer:medium
# First time only: Install dependencies
brew install pigz pv s3cmd
s3cmd --configure # Get credentials from 1Password
Subset Profiles
| Profile | Size | Use Case | Command |
|---|---|---|---|
| Small | ~300MB | Quick tests, bug fixes | make db:update:jailer:small |
| Medium | ~2GB | Daily development | make db:update:jailer:medium |
| Recent | ~3GB | Recent data (6 months) | make db:update:jailer:recent |
| Full | ~30GB | Rare, special cases | make db:update |
Import Times
- Small: ~2 minutes
- Medium: ~5 minutes
- Recent: ~8 minutes
- Full: ~25 minutes
Freshness
Jailer subsets are regenerated daily at 3 AM UTC from production data.
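To check how fresh the current subsets are, you can also pull the metadata file the upload script publishes (see 2.5). A minimal sketch:
bash
s3cmd get s3://ixdf-jailer-dumps/latest-metadata.json /tmp/latest-metadata.json --force
cat /tmp/latest-metadata.json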
5.2 DigitalOcean Spaces Access
Create read-only access keys for developers:
- Go to DigitalOcean → API → Spaces Keys
- Create new key: "Dev Team - Jailer Subsets (Read Only)"
- Store in 1Password vault: "Engineering / Jailer Subsets Access"
- Share with team
Key configuration in s3cmd:
text
Access Key: <READ_ONLY_KEY>
Secret Key: <READ_ONLY_SECRET>
Region: nyc3
Endpoint: nyc3.digitaloceanspaces.com
5.3 Troubleshooting Guide
Update: docs/troubleshooting-jailer.md
Create new file with the following content:
Jailer Troubleshooting
"s3cmd: command not found"
Install s3cmd:
bash
brew install s3cmd
"403 Forbidden" when downloading
Reconfigure s3cmd with correct credentials:
bash
s3cmd --configure
# Get keys from 1Password: "Engineering / Jailer Subsets Access"
"Import failed: foreign key constraint"
Jailer maintains referential integrity, so this shouldn't happen. If it does:
- Check which table failed
- Try the full dump instead: make db:update
- Report to #backend channel
Subset too old
Check when last generated:
bash
make db:jailer:list
# Shows file dates
If the latest subset is older than 2 days, check the staging server:
bash
ssh staging-server
sudo /opt/jailer-dumps/scripts/check-health.sh
Need specific data not in subset
Options:
- Use a larger subset: make db:update:jailer:recent
- Use the full dump: make db:update
- Request a custom subset from DevOps
Phase 6: Rollout Plan
Week 1: Staging Setup
- [ ] Install Jailer on staging server
- [ ] Configure MySQL connections
- [ ] Set up DigitalOcean Spaces
- [ ] Create all scripts
- [ ] Test manual execution
- [ ] Validate dump integrity
Week 2: Automation
- [ ] Set up cron job
- [ ] Monitor first automated run
- [ ] Verify Spaces uploads
- [ ] Test downloads from dev machine
- [ ] Set up alerting
Week 3: Beta Testing
- [ ] Share with 2-3 developers
- [ ] Test all three profiles
- [ ] Gather feedback
- [ ] Fix issues
- [ ] Update documentation
Week 4: Team Rollout
- [ ] Create onboarding video
- [ ] Share credentials via 1Password
- [ ] Update team documentation
- [ ] Announce in #backend channel
- [ ] Provide support during transition
Success Metrics
Performance
| Metric | Current | Target | Measured |
|---|---|---|---|
| Dev import time | 25 mins | 5 mins | Weekly avg |
| Staging refresh time | N/A | 60 mins | Daily log |
| Download time | N/A | 2 mins | Per subset |
| Disk usage (dev) | 60GB | 5GB | Per machine |
Reliability
| Metric | Target |
|---|---|
| Daily refresh success rate | >95% |
| Backup validation (staging import) | 100% |
| Subset integrity (FK checks) | 100% |
Adoption
| Metric | Target |
|---|---|
| % devs using Jailer subsets | >80% |
| Avg subset age when downloaded | <24 hours |
| Support tickets | <2/month |
Costs
DigitalOcean Spaces
Storage:
- Small subset: 300MB × 30 days = 9GB
- Medium subset: 2GB × 30 days = 60GB
- Recent subset: 3GB × 30 days = 90GB
- Total: ~160GB, within the 250GB included in the $5/month base Spaces subscription
Bandwidth:
- 10 devs × 3 downloads/week × 2GB = 240GB/month
- Cost: Free (first 1TB free)
Total: ~$5/month (covered by the base Spaces subscription)
Staging Server Time
Daily task:
- Import: 30 mins
- Jailer extraction: 20 mins
- Upload: 5 mins
- Total: ~55 mins/day CPU time
Cost: Negligible (runs during off-hours)
Appendix: Advanced Topics
A. Custom Subset Profiles
Create custom extraction for specific use cases:
File: /opt/jailer-dumps/config/profiles/specific-course.sh
bash
# Extract specific course and all related data
COURSE_ID=123
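# Assumes MODEL_DIR, JDBC_URL, DB_USER, DB_PASS and DUMP_DIR are set as in
# 3-extract-jailer-subsets.sh (run this profile from that script's context).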
jailer export \
-datamodel "$MODEL_DIR" \
-jdbcjar /opt/jailer/lib/mysql-connector-java.jar \
-url "$JDBC_URL" \
-user "$DB_USER" \
-password "$DB_PASS" \
-format SQL \
-e "course__courses" \
-where "id = $COURSE_ID" \
-d "$DUMP_DIR/course-${COURSE_ID}.sql"B. Parallel Extraction
Speed up subset generation by running profiles in parallel:
bash
# In 3-extract-jailer-subsets.sh, wrap each profile's jailer export + pigz steps
# in a shell function (e.g. extract_small, extract_medium, extract_recent),
# then run the functions concurrently:
extract_small &
extract_medium &
extract_recent &
wait  # Wait for all three extractions to complete
C. Incremental Updates
For very large databases, consider extracting only new data:
bash
# Store last extraction timestamp
LAST_RUN=$(cat "$BASE_DIR/.last-extraction-timestamp" 2>/dev/null || echo "1970-01-01")
# Extract only new/updated records
jailer export \
-e "member" \
-where "updated_at > '$LAST_RUN'" \
...
# Update timestamp
date -Iseconds > "$BASE_DIR/.last-extraction-timestamp"
D. Multi-Region Distribution
For distributed teams, replicate Spaces to multiple regions:
bash
# In upload script, sync to EU region
s3cmd sync s3://ixdf-jailer-dumps-us/ s3://ixdf-jailer-dumps-eu/
Summary
This plan provides:
- ✅ Daily backup validation - Staging imports test production backups
- ✅ Fresh staging data - Staging stays relevant with daily refresh
- ✅ Fast dev imports - 25min → 5min with Jailer subsets
- ✅ Multiple profiles - Small, medium, recent for different use cases
- ✅ Automated pipeline - Cron job handles entire workflow
- ✅ Easy distribution - DigitalOcean Spaces for downloads
- ✅ Monitoring & alerts - Health checks and notifications
- ✅ Cost effective - ~$5/month total costs
Timeline: 4 weeks from setup to full team adoption
ROI: 20 mins saved per dev per week × 10 devs = 3.3 hours/week saved