Operations Runbooks
These runbooks are practical procedures for common operator tasks. They assume the root Docker Compose stack and local defaults unless noted otherwise.
Before You Touch Production
Record the current state:
docker compose ps
curl -fsS http://localhost:8080/ready
curl -fsS http://localhost:8081/health
docker compose exec core php artisan cdn:edge:list
docker compose exec core php artisan cdn:readiness:check
Capture the active config snapshot:
curl -s http://localhost:8080/api/v1/config/snapshots \
-H "Authorization: Bearer $CDNLITE_API_TOKEN"
Write down:
- Active snapshot version.
- Edge nodes with recent heartbeat.
- Domains being changed.
- Expected rollback action.
- Who approved the change.
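The state-recording commands above can be collected into one timestamped directory so the evidence travels with the change ticket. This is a minimal sketch: the function name capture_state and the output file names are hypothetical, and it assumes the local defaults used in this runbook.

```shell
# Sketch: save pre-change state into one directory.
# capture_state and the file names are hypothetical; the commands are the
# ones listed in this runbook.
capture_state() {
  out_dir="${1:-state-$(date -u +%Y%m%dT%H%M%SZ)}"
  mkdir -p "$out_dir" || return 1

  # A failing check is reported on stderr but does not stop the capture.
  docker compose ps > "$out_dir/compose-ps.txt" 2>&1 || echo "compose ps failed" >&2
  curl -fsS http://localhost:8080/ready > "$out_dir/core-ready.txt" 2>&1 || echo "core not ready" >&2
  curl -fsS http://localhost:8081/health > "$out_dir/edge-health.txt" 2>&1 || echo "edge not healthy" >&2
  # -T disables TTY allocation so the output can be redirected to a file.
  docker compose exec -T core php artisan cdn:edge:list > "$out_dir/edge-list.txt" 2>&1 || echo "edge list failed" >&2
  docker compose exec -T core php artisan cdn:readiness:check > "$out_dir/readiness.txt" 2>&1 || echo "readiness check failed" >&2
  curl -s http://localhost:8080/api/v1/config/snapshots \
    -H "Authorization: Bearer ${CDNLITE_API_TOKEN:-}" > "$out_dir/snapshots.json" || echo "snapshot list failed" >&2

  echo "$out_dir"
}
```

Calling capture_state with no argument writes to a directory named after the current UTC time and prints that path.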
Add A Domain
- Create the domain.
- Add at least one primary origin.
- Add DNS records.
- Verify nameserver delegation.
- Activate the domain.
- Pull edge config.
- Send test traffic with a Host header.
- Watch analytics and security events.
Commands:
docker compose exec core php artisan cdn:domain:create \
--domain=example.com \
--name=Example
docker compose exec core php artisan cdn:domain:list
docker compose exec core php artisan cdn:domain:verify-ns --domain_id="$DOMAIN_ID"
docker compose exec core php artisan cdn:domain:activate --domain_id="$DOMAIN_ID"
docker compose exec core php artisan cdn:edge:sync-config
Rollback:
- Disable or delete the new domain record if traffic has not been delegated.
- Restore old DNS delegation if registrar changes already happened.
- Roll back the config snapshot if the edge pulled a bad config.
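For the "send test traffic with a Host header" step, a small helper that prints only the HTTP status makes before/after comparisons easier. This is a sketch: probe_host and the EDGE_URL variable are hypothetical names, and the default URL is the local edge address used elsewhere in this runbook.

```shell
# Sketch: print the HTTP status for a request sent through the local edge.
# probe_host and EDGE_URL are hypothetical; the curl flags are standard.
probe_host() {
  host="$1"
  edge_url="${EDGE_URL:-http://localhost:8081/}"
  # -o /dev/null discards the body; -w prints the status code only.
  curl -s -o /dev/null -w '%{http_code}' -H "Host: $host" "$edge_url"
}
```

For example, probe_host example.com prints the status code the edge returns for that host, or 000 if curl could not connect.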
Rotate An Edge Token
Use this when an edge secret might be exposed or during scheduled rotation.
- Generate a new token.
- Register the new token in core.
- Update the edge agent secret.
- Restart the edge agent.
- Confirm heartbeat.
- Confirm config pull.
NEW_TOKEN=$(openssl rand -hex 32)
docker compose exec core php artisan cdn:edge:rotate-token \
--edge_id=edge-prod-1 \
--token="$NEW_TOKEN"
After updating the edge environment:
docker compose restart edge-agent
docker compose logs --tail=120 edge-agent
docker compose exec core php artisan cdn:edge:show --edge_id=edge-prod-1
Rollback:
- Rotate again to a known-good token.
- Do not re-enable a token that may be exposed.
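If openssl is missing or fails, NEW_TOKEN ends up empty, and registering an empty token would be worse than not rotating at all. A guard such as the following can run before the rotate-token command. It is a sketch: is_valid_token is a hypothetical helper, and the only fact it relies on is that openssl rand -hex 32 prints 64 lowercase hex characters.

```shell
# Sketch: accept only a token of exactly 64 lowercase hex characters,
# the shape produced by `openssl rand -hex 32`.
is_valid_token() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;   # contains a non-hex character
  esac
  [ "${#1}" -eq 64 ]
}

# Usage:
#   NEW_TOKEN=$(openssl rand -hex 32)
#   is_valid_token "$NEW_TOKEN" || { echo "token generation failed" >&2; exit 1; }
```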
Roll Back A Bad Config Snapshot
Symptoms:
- Edge starts serving unexpected origins.
- New rules block legitimate traffic.
- DNS/routing changes produce wrong config.
- edge-sync-status.json shows a newer bad version.
Procedure:
curl -s http://localhost:8080/api/v1/config/snapshots \
-H "Authorization: Bearer $CDNLITE_API_TOKEN"
curl -s -X POST http://localhost:8080/api/v1/config/snapshots/diff \
-H "Authorization: Bearer $CDNLITE_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"from_version":12,"to_version":13}'
curl -s -X POST http://localhost:8080/api/v1/config/snapshots/12/rollback \
-H "Authorization: Bearer $CDNLITE_API_TOKEN"
Then force or wait for edge pull:
docker compose exec edge-agent sh /opt/cdnlite-agent/pull_config.sh
docker compose exec edge-agent sh -c 'cat "$EDGE_SYNC_STATUS_PATH"'
Post-check:
- Confirm edge serves expected origin.
- Confirm readiness warnings clear.
- Keep the bad version number in the incident notes.
- Fix the database-backed state that generated the bad snapshot.
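During an incident it is easy to mistype the version numbers in the diff request. The sketch below builds the request body from two arguments and refuses anything that is not a plain integer. diff_payload is a hypothetical helper; the JSON field names are the ones used in the diff example above.

```shell
# Sketch: build the snapshot diff request body from two version numbers.
diff_payload() {
  from="$1"; to="$2"
  for v in "$from" "$to"; do
    case "$v" in
      # Reject empty input, non-digits, and leading zeros (invalid in JSON).
      ''|*[!0-9]*|0?*)
        echo "version must be a plain non-negative integer: '$v'" >&2
        return 1 ;;
    esac
  done
  printf '{"from_version":%s,"to_version":%s}\n' "$from" "$to"
}

# Usage:
#   curl -s -X POST http://localhost:8080/api/v1/config/snapshots/diff \
#     -H "Authorization: Bearer $CDNLITE_API_TOKEN" \
#     -H "Content-Type: application/json" \
#     -d "$(diff_payload 12 13)"
```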
Investigate Edge 502
Decision tree:
- Does curl http://localhost:8081/health pass?
- Does the request include the expected Host header?
- Is the host present in config.json?
- Is the domain active and proxied?
- Is the primary origin healthy?
- Is a backup origin available?
- Is a WAF/IP/rate-limit rule blocking the request?
Commands:
curl -i http://localhost:8081/ \
-H 'Host: example.com'
docker compose exec edge-agent sh -c 'cat "$EDGE_CONFIG_PATH"'
docker compose logs --tail=150 edge
docker compose logs --tail=150 edge-agent
Fix patterns:
| Finding | Action |
|---|---|
| Unknown host | Activate domain and pull config. |
| Config missing | Fix edge auth, then pull config. |
| Origin unhealthy | Restore origin or promote backup. |
| Rule block | Disable or narrow the rule. |
| Cache stale | Purge URL/prefix or wait for TTL. |
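The "is the host present in config.json" question can be answered quickly by piping the edge config through a literal search. This is a rough sketch and not a parser: config_has_host is a hypothetical helper, and it assumes the hostname appears somewhere in the config as a double-quoted JSON string, which this runbook does not confirm.

```shell
# Sketch: succeed if the config on stdin contains the host as a quoted string.
# Assumption: hostnames appear in config.json as "example.com" (with quotes).
config_has_host() {
  # -F matches the text literally (dots are not wildcards); -q is quiet.
  grep -F -q "\"$1\""
}

# Usage:
#   docker compose exec -T edge-agent sh -c 'cat "$EDGE_CONFIG_PATH"' \
#     | config_has_host example.com && echo present || echo missing
```

A match only shows the string is present; whether the domain is active and proxied still has to be checked separately.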
Recover Empty Analytics
- Check edge metric queue.
- Check edge agent logs.
- Check collector auth.
- Recalculate aggregates.
- Verify dashboard filters.
docker compose exec edge-agent sh -c 'test -f "$METRIC_PATH" && wc -l "$METRIC_PATH" || true'
docker compose logs --tail=120 edge-agent
curl -s -X POST http://localhost:8080/api/v1/usage/recalculate \
-H "Authorization: Bearer $CDNLITE_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"bucket":"hour"}'
If metrics are queued but not ingested, focus on edge auth and collector responses. If metrics are ingested but the dashboard is empty, check domain filters and bucket selection.
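The triage rule in this section can be written down as a tiny helper so it is applied the same way every time. This is only a sketch: analytics_hint is a hypothetical name, the "ingested" argument stands for whatever evidence you have that the collector accepted metrics, and the last branch (nothing queued) is an assumption that falls back to the edge agent logs step from the checklist.

```shell
# Sketch of the triage rule for empty analytics.
analytics_hint() {
  queued_lines="$1"   # line count of the edge metric queue
  ingested="$2"       # "yes" if the collector has accepted metrics, else "no"
  if [ "$ingested" = "yes" ]; then
    echo "check domain filters and bucket selection"
  elif [ "$queued_lines" -gt 0 ]; then
    echo "check edge auth and collector responses"
  else
    # Assumption: with nothing queued, start from the edge agent logs.
    echo "nothing queued: check edge agent logs"
  fi
}
```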
DNS Publishing Incident
Symptoms:
- DNS record API returns integration errors.
- PowerDNS test fails.
- Customer zone is missing.
- Edge DNS records do not match expected anycast settings.
Procedure:
docker compose --profile powerdns ps
curl -fsS http://localhost:8089/health
docker compose exec core php artisan cdn:settings:test-powerdns
docker compose logs --tail=120 core
docker compose logs --tail=120 powerdns
Questions to answer:
- Did the PowerDNS API URL change?
- Did the API key rotate?
- Is strict PowerDNS mode enabled?
- Is the customer zone supposed to be lazily created?
- Are platform edge DNS settings complete?
Rollback:
- Restore previous PowerDNS settings.
- Rebuild customer zones only after the API test passes.
- Avoid live external DNS mutation when the mock can reproduce the issue.
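The rule "rebuild customer zones only after the API test passes" can be enforced with a gate in front of any rebuild step. This is a sketch: powerdns_gate is a hypothetical name, and it assumes cdn:settings:test-powerdns exits non-zero when the test fails, which should be verified before relying on it.

```shell
# Sketch: continue only when the PowerDNS API test passes.
# Assumption: the artisan command returns a non-zero exit status on failure.
powerdns_gate() {
  if docker compose exec -T core php artisan cdn:settings:test-powerdns; then
    echo "PowerDNS API test passed: safe to rebuild zones"
  else
    echo "PowerDNS API test failed: fix settings first" >&2
    return 1
  fi
}

# Usage: powerdns_gate && <your zone rebuild step>
```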
SSL Issuance Failure
Symptoms:
- ACME status remains pending.
- DNS challenge record is missing.
- Certificate renewal job logs errors.
- Force HTTPS cannot be enabled.
Procedure:
docker compose logs --tail=160 ssl-scheduler
docker compose exec core php artisan cdn:ssl:list --domain_id="$DOMAIN_ID"
docker compose exec core php artisan cdn:settings:test-powerdns
Checklist:
- ACME directory is staging for tests.
- Contact email is set.
- DNS propagation seconds are realistic.
- _acme-challenge records are DNS-only.
- CDNLITE_SSL_SECRET_KEY did not change.
- Domain is active and proxied when required by the workflow.
Fallback:
- Import a manual certificate.
- Keep force HTTPS disabled until a valid active certificate exists.
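When the DNS challenge record looks missing, it helps to query the exact name the ACME DNS-01 challenge uses. For a wildcard certificate the challenge is placed on the base domain, so the leading "*." is dropped. The helper below is a sketch: acme_challenge_name is a hypothetical name, and the dig call in the usage comment assumes dig is installed on the operator machine.

```shell
# Sketch: derive the DNS-01 challenge record name for a domain.
acme_challenge_name() {
  domain=${1#\*.}   # strip a leading "*." from wildcard names
  printf '_acme-challenge.%s\n' "$domain"
}

# Usage (requires dig):
#   dig +short TXT "$(acme_challenge_name example.com)"
```

An empty answer from dig while ACME status is pending points at DNS publishing rather than at the ACME client.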
Release Validation
Run these before a release that changes runtime behavior:
docker compose config --quiet
find core -name '*.php' -print0 | xargs -0 -n1 php -l
pytest -q core/tests
(cd dash && npm ci && npm run typecheck && npm test && npm run build)
(cd docs && npm ci && npm run docs:build)
Then run stack tests:
docker compose up -d --build --wait
./ci/smoke.sh
docker compose --profile powerdns up -d --build
EDGE_AGENT_IDLE=1 CDNLITE_CACHE_DEFAULT_TTL=1s ./ci/e2e.sh
./ci/powerdns_dns_checks.sh
Release notes should include:
- Changed behavior.
- API or CLI changes.
- Required environment changes.
- Migration or rollback steps.
- Test commands that passed.
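To back the "test commands that passed" item with evidence, each validation command can be run through a wrapper that appends its result to a log. This is a sketch: run_step and RELEASE_LOG are hypothetical names, not part of the project's tooling.

```shell
# Sketch: run one validation command and record PASS or FAIL in a log.
RELEASE_LOG="${RELEASE_LOG:-release-checks.log}"

run_step() {
  if "$@"; then
    echo "PASS: $*" >> "$RELEASE_LOG"
  else
    status=$?   # exit status of the command that just failed
    echo "FAIL($status): $*" >> "$RELEASE_LOG"
    return "$status"
  fi
}

# Usage: chain with && so the first failure stops the run.
#   run_step docker compose config --quiet && run_step ./ci/smoke.sh
```

The resulting log can be pasted into the release notes as the list of commands that passed.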
