Runbook — JWT_SECRET rotation

Plan step: A2.5. Owner: backend lead. Stated cadence: 90 days, or immediately on suspected leak (docs/security/PROTOCOL.md §1). Blast radius: every issued JWT becomes invalid the moment a new secret takes effect. Every authenticated user must re-login. There is no per-user staggered rollout: JWT_SECRET is a single process-level signing key.

This runbook is the comms-coordinated playbook referenced by the trust portal roadmap on cryptography.md and session-management.md. It does not trigger a rotation; it tells the operator what a rotation looks like when one is decided.


When to rotate

Scheduled rotations (90 days)

The default cadence. Forced re-login is a small cost on a 90-day calendar; stretching the interval beyond 90 days lengthens the period in which a leaked secret stays usable if the leak is discovered late.

Schedule the next scheduled rotation before finishing the current one — set a calendar reminder for last_rotation_date + 90 days so the cadence does not silently slip.
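
The date arithmetic behind the reminder can be sketched in shell. This is a minimal sketch, not part of the existing tooling: the function names next_rotation and rotation_due are hypothetical, and it assumes a date command that accepts -d (GNU coreutils; BSD/macOS date uses different flags).

```shell
# Sketch only: assumes GNU date (-d); function names are hypothetical.

# next_rotation LAST_ROTATION_DATE (YYYY-MM-DD): print the date 90 days later.
next_rotation() {
  local last_epoch
  last_epoch=$(date -u -d "$1" +%s) || return 1
  # Work in UTC epoch seconds so daylight-saving shifts cannot move the day.
  date -u -d "@$(( last_epoch + 90 * 86400 ))" +%F
}

# rotation_due LAST_ROTATION_DATE [TODAY]: succeed once 90 days have passed.
# TODAY defaults to the current UTC date.
rotation_due() {
  local today due
  today=${2:-$(date -u +%F)}
  due=$(next_rotation "$1") || return 1
  # YYYYMMDD values compare correctly as plain integers.
  [ "$(printf '%s' "$today" | tr -d -)" -ge "$(printf '%s' "$due" | tr -d -)" ]
}
```

For example, next_rotation 2025-01-01 gives the date to put in the calendar, and rotation_due 2025-01-01 can serve as the scheduled-trigger check during pre-flight.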

Emergency rotations (immediate)

Rotate without waiting for the cadence if any of the following holds:

  • A JWT_SECRET value (current or any prior version) was committed to a repo, written to a log, posted in chat, sent in an email, or in any way exited the Secret Manager + on-VM-process boundary.
  • A VM-side compromise is suspected (regardless of whether exfiltration is confirmed).
  • A backend developer with current access to Secret Manager leaves the company.
  • The secret was visible in a screen-share, demo recording, or third-party support session, even momentarily.

Emergency rotations do not require pre-announcement; the user impact (forced re-login) is the same and is preferable to leaving the suspected-compromised secret in place.


Pre-flight

Run all of the following before issuing the new secret version:

  1. Confirm the trigger. For scheduled rotation: today is at least last_rotation_date + 90 days. For emergency: identify the leak vector, even if uncertain.
  2. Identify on-call. A second engineer must be reachable for the next 30 minutes — rotation hits every authenticated user simultaneously.
  3. Pick the window. Scheduled rotations land Sunday 06:00–08:00 Europe/Brussels by convention — minimum traffic, support team on standby, US/EU/IL operators all reach the same morning. Emergency rotations override the window.
  4. Write the user-facing notice. A pre-baked notice template is below. For scheduled rotation, send it 48 hours and again 1 hour before the window. Emergency rotation: send during or immediately after.
  5. Verify the local stack is healthy and on the current secret version: gcloud secrets versions list claim-guard-jwt-secret --project=train-cvit2 shows the current latest.
  6. Confirm Secret Manager access. The operator running the rotation must have roles/secretmanager.secretVersionAdder or higher on claim-guard-jwt-secret. The VM service account (claim-guard-vm@…) keeps its existing secretAccessor role — no IAM change at rotation time.
  7. Generate the new secret value outside shell history. Recommended:
    NEW_SECRET=$(openssl rand -base64 48)
    
    Length floor: 32 bytes of entropy. The current secret is 48 bytes base64 (~64 chars); preserve that shape so log parsers and length-tagged tooling don't notice the change.
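
Before adding the value to Secret Manager, the length floor and the printable-characters requirement can be checked locally. The helper below is a sketch under stated assumptions: check_secret_shape is a hypothetical name, and it assumes the secret is standard base64 (as produced by openssl rand -base64) and that a base64 command supporting -d is available.

```shell
# Sketch only: check_secret_shape is a hypothetical helper name.

# check_secret_shape SECRET: succeed only if SECRET consists solely of
# base64 characters and decodes to at least 32 bytes (the length floor).
check_secret_shape() {
  local decoded_len
  case "$1" in
    ''|*[!A-Za-z0-9+/=]*) return 1 ;;  # empty, or has a non-base64 character
  esac
  # $(( ... )) normalizes any whitespace that wc may print around the count.
  decoded_len=$(( $(printf '%s' "$1" | base64 -d 2>/dev/null | wc -c) ))
  [ "$decoded_len" -ge 32 ]
}

# Usage (not run here):
#   NEW_SECRET=$(openssl rand -base64 48)
#   check_secret_shape "$NEW_SECRET" || echo "secret rejected" >&2
```

Rejecting anything outside the base64 alphabet also guards against the non-printable-character failure mode that the Recovery section mentions.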

Procedure

All commands run from a workstation with gcloud auth login against roee@dtectvision.ai and gcloud config set project train-cvit2.

# 1. Add a new secret version. Does NOT yet take effect anywhere
#    because the server fetches `versions/latest` only at boot.
printf '%s' "$NEW_SECRET" | gcloud secrets versions add \
  claim-guard-jwt-secret \
  --data-file=- \
  --project=train-cvit2

# 2. Verify the new version is at the head.
gcloud secrets versions list claim-guard-jwt-secret \
  --project=train-cvit2 \
  --limit=3
# Expected: the just-added version is `ENABLED` and at the top.

# 3. Restart pm2 on the production VM so the new value is read on
#    boot via server/src/lib/secrets.js. Do this OVER IAP-SSH only.
gcloud compute ssh claim-guard-app-1 \
  --tunnel-through-iap \
  --zone=europe-west1-b \
  --command='pm2 restart all && pm2 logs --lines 50 --nostream'
# Expected pm2 output: clean boot, no JWT_SECRET error, no
# Secret Manager error, no DATABASE_URL error.

# 4. Smoke-test from outside the cloud project.
curl -fsS https://app.dtectvision.ai/api/health
# Expected: HTTP 200, body { status: "ok", ... }.
# (If A1.3 has not yet landed, hit the current public IP instead.)

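After step 3, every previously issued JWT is invalid. Users will see a 401 with either code: TOKEN_EXPIRED or Invalid token, depending on how server/src/middleware/auth.js maps the verification failure. (jsonwebtoken itself checks the signature before exp, so a token signed with the old secret fails with JsonWebTokenError "invalid signature" rather than TokenExpiredError, even when its exp has not yet been reached.) They must log in again to receive a JWT signed with the new secret.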
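
When picking a token to test rejection with, it helps to confirm that it really was issued before the restart. The sketch below decodes a JWT's payload segment without verifying the signature, so its claims can be read. It assumes the tokens carry an iat claim (jsonwebtoken adds one by default, but that is an assumption about this codebase), and jwt_payload is a hypothetical helper name.

```shell
# Sketch only: jwt_payload is a hypothetical helper. It does NOT verify
# the signature; it only decodes the middle (payload) segment.

# jwt_payload TOKEN: print the decoded JSON payload of a JWT.
jwt_payload() {
  local seg
  # Take the second dot-separated segment and map base64url to base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # JWT segments drop '=' padding; restore it so base64 -d accepts the input.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
```

Comparing the printed iat (seconds since the epoch) with the pm2 restart timestamp shows whether the token predates the rotation.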


Verification

| Check | Command | Expected |
| --- | --- | --- |
| New secret version is latest | gcloud secrets versions list claim-guard-jwt-secret --project=train-cvit2 --limit=1 | the version added in step 1 is ENABLED |
| Server is using the new secret | pm2 logs --lines 200 on the VM (over IAP-SSH) | no "JWT_SECRET environment variable is required" error; no Secret Manager errors |
| Existing tokens are rejected | from a separate session: curl -H "Authorization: Bearer <jwt-issued-before-rotation>" https://app.dtectvision.ai/api/auth/me | HTTP 401 |
| New login succeeds | curl -X POST -d '{"email":"…","password":"…"}' -H "Content-Type: application/json" https://app.dtectvision.ai/api/auth/login | HTTP 200, response includes a token |
| New token works | curl -H "Authorization: Bearer <new-jwt>" https://app.dtectvision.ai/api/auth/me | HTTP 200, returns the user record |
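
The HTTP checks above can be made pass/fail instead of read by eye. The following is a small sketch, not an existing script: expect_status is a hypothetical helper, and OLD_JWT in the usage comment is a placeholder variable for a token issued before the rotation.

```shell
# Sketch only: expect_status and OLD_JWT are hypothetical names.

# expect_status EXPECTED COMMAND [ARGS...]
# Runs COMMAND, which must print an HTTP status code on stdout, and
# compares that code with EXPECTED. Prints a one-line PASS/FAIL result.
expect_status() {
  local expected actual
  expected=$1
  shift
  actual=$("$@") || true   # a failed command leaves actual empty and fails below
  if [ "$actual" = "$expected" ]; then
    echo "PASS: got $actual"
  else
    echo "FAIL: expected $expected, got ${actual:-nothing}"
    return 1
  fi
}

# Usage (not run here): the old token must now be rejected.
#   expect_status 401 curl -s -o /dev/null -w '%{http_code}' \
#     -H "Authorization: Bearer $OLD_JWT" \
#     https://app.dtectvision.ai/api/auth/me
```

Here curl's -w '%{http_code}' prints only the status code and -o /dev/null discards the body, so the helper receives exactly one value to compare.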

Recovery

If pm2 fails to boot after the rotation (e.g. Secret Manager throws, or the new secret was generated with a non-printable character that broke the env-var pipeline), the working escape hatch is:

# 1. From the operator workstation: disable the broken version.
#    (The VM service account only holds secretAccessor, so it cannot
#    disable versions.)
gcloud secrets versions disable <new-version-id> \
  --secret=claim-guard-jwt-secret \
  --project=train-cvit2

# 2. On the VM, over IAP-SSH: restart so the previous version is now `latest`.
pm2 restart all
pm2 logs --lines 100 --nostream

The previous version remains ENABLED automatically, so disabling the new one re-elevates it. Users with tokens issued under the original secret will still be rejected — pm2 restart already booted them out once. They must re-login one more time once the recovery is done. There is no version of "rollback that preserves sessions" — a JWT signing-key rotation cannot be transparent.

If the issue is "new secret is fine, the rotation still went bad somewhere," do not delete versions. disable is reversible; destroy is not.


Post-rotation

  1. Update the calendar reminder. Next scheduled rotation = today + 90 days.
  2. Add a journal entry to the top of docs/security/HARDENING-LOG.md under today's date. Include: trigger (scheduled vs. emergency + reason), version IDs (old → new), pm2 restart timestamp, the smoke-test response, and the user-impact window. Use the heading ### A2.5 — JWT_SECRET rotation [done <date>] so future runbook iterations can grep for it.
  3. After 7 days, destroy old versions that are no longer the most recent two:
    gcloud secrets versions destroy <old-version-id> \
      --secret=claim-guard-jwt-secret --project=train-cvit2
    
    Keep latest and latest - 1 enabled as the recovery floor. Older versions are not consulted by the server (it always reads versions/latest) and only widen the leak surface if Secret Manager itself were ever compromised.
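
The "keep the most recent two" rule can be expressed as a filter over a list of version IDs. This is a sketch under assumptions: versions_to_destroy is a hypothetical helper, version IDs are taken to be the integers Secret Manager assigns, and the input is expected to contain only versions that have not already been destroyed (for instance, taken from gcloud secrets versions list with --format='value(name)'). It does not enforce the 7-day wait.

```shell
# Sketch only: versions_to_destroy is a hypothetical helper name.

# versions_to_destroy: read one version ID (or full version resource
# name) per line on stdin; print every ID except the two highest.
versions_to_destroy() {
  sed 's|.*/||' |          # reduce a full resource name to its trailing ID
    grep -E '^[0-9]+$' |   # ignore anything that is not a numeric ID
    sort -rn |             # newest (highest) first
    tail -n +3             # skip the two newest, print the rest
}
```

Each printed ID is a candidate for the destroy command above. Review the list before acting on it, because destroy is irreversible.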

User-facing notice templates

Scheduled (T-48h and T-1h)

Scheduled maintenance — Sunday <DATE> 06:00–08:00 CET

ClaimGuard will perform a scheduled security maintenance window at the time above. You will be logged out as part of the maintenance and asked to log in again. No user action is required in advance; bookmarked links and saved data are unaffected. If you have trouble logging in after <DATE 09:00 CET>, please contact support at security@dtectvision.ai.

Emergency (during / immediately after)

You have been logged out as part of an unscheduled security action. Your data is unaffected. Please log in again. If you have any concerns or questions, contact security@dtectvision.ai.

Do not include the words "rotation," "compromise," or "JWT" in the user-facing copy. They invite questions you can answer in a follow-up if asked, but should not lead the message.
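
The wording rule is easy to check mechanically before a notice goes out. A minimal sketch, with notice_ok as a hypothetical helper name; the match is case-insensitive and also catches variants such as "compromised".

```shell
# Sketch only: notice_ok is a hypothetical helper name.

# notice_ok: read notice copy on stdin; fail if it contains any of the
# words that should stay out of user-facing copy.
notice_ok() {
  ! grep -Eiq 'rotation|compromise|jwt'
}

# Usage (not run here; notice.txt is a placeholder file name):
#   notice_ok < notice.txt || echo "notice contains a discouraged word" >&2
```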


Cross-references

  • docs/security/PROTOCOL.md §1 Secrets — 90-day cadence rule.
  • Cryptography — public-facing claim about rotation.
  • Session management — public-facing claim about the rotation's user impact.
  • server/src/lib/secrets.js — Secret Manager bootstrap. Reads versions/latest at boot.
  • server/src/middleware/auth.js: JWT_SECRET is mandatory at boot; signature verification happens here.
  • docs/security/HARDENING-LOG.md — the journal to which each completed rotation is recorded.