Backups¶
Statement¶
ClaimGuard takes daily snapshots of the production VM's boot disk. Snapshots
are retained for 7 days, run during a low-load window, and stored in the
GCP region containing the VM. Recovery is a documented `gcloud compute disks create --source-snapshot` operation that produces a bootable disk
identical to the snapshot point. Snapshot creation, deletion, and restore
events are recorded in the project's immutable audit log.
Implementation¶
What's backed up¶
- The boot disk of `claim-guard-app-1`, named `deepfakebench3` (a leftover from a previous repurpose; the disk itself was not renamed when the VM became `claim-guard-app-1`).
- The boot disk is the only persistent storage the VM uses. It contains the application code, the on-VM Postgres database files, configuration, and operational state.
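That the boot disk is the VM's only attached disk can be spot-checked with a describe call. A sketch (must be run against the live project; a single line of output confirms there is exactly one attached disk):

```shell
# List the source of every disk attached to the VM; exactly one line
# means the boot disk is the only persistent storage.
gcloud compute instances describe claim-guard-app-1 \
  --zone=europe-west1-b --project=train-cvit2 \
  --format='value(disks[].source)'
```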
Snapshot policy¶
A regional snapshot-schedule resource policy named `claim-guard-daily`
is attached to the disk. Its parameters:
$ gcloud compute resource-policies describe claim-guard-daily \
--region=europe-west1 --project=train-cvit2
description: ClaimGuard daily snapshots — 7d retention, A2.2
snapshotSchedulePolicy:
schedule:
dailySchedule:
daysInCycle: 1
startTime: '02:00'
retentionPolicy:
maxRetentionDays: 7
onSourceDiskDelete: APPLY_RETENTION_POLICY
- Cadence: every 24 hours, starting at 02:00. GCP interprets the schedule's `startTime` as UTC, so the run falls in the early hours of Israel time, inside the VM's low-load window, which begins around midnight local time.
- Retention: 7 days. Older snapshots are auto-deleted.
- On source-disk delete: retention policy still applies (i.e., snapshots aren't immediately purged if the disk is deleted; they age out normally).
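A policy with these parameters can be reproduced with two commands. This is a sketch of what plan step A2.2 would look like using standard gcloud flags; the authoritative commands actually run are recorded in docs/security/HARDENING-LOG.md:

```shell
# Create the snapshot schedule: daily at 02:00 (UTC), 7-day retention,
# snapshots survive source-disk deletion until they age out.
gcloud compute resource-policies create snapshot-schedule claim-guard-daily \
  --region=europe-west1 --project=train-cvit2 \
  --daily-schedule --start-time=02:00 \
  --max-retention-days=7 \
  --on-source-disk-delete=apply-retention-policy \
  --description='ClaimGuard daily snapshots — 7d retention, A2.2'

# Attach the policy to the boot disk.
gcloud compute disks add-resource-policies deepfakebench3 \
  --zone=europe-west1-b --project=train-cvit2 \
  --resource-policies=claim-guard-daily
```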
The policy is verified attached:
$ gcloud compute disks describe deepfakebench3 \
--zone=europe-west1-b --project=train-cvit2 \
--format='value(resourcePolicies)'
projects/train-cvit2/regions/europe-west1/resourcePolicies/claim-guard-daily
Storage location and encryption¶
By default, GCP Compute Engine stores snapshots multi-regionally within
the source region's geography; for a europe-west1 disk, snapshot data is
replicated across multiple locations in Europe.
This is the GCP default; we have not overridden it.
Snapshots are encrypted at rest with Google-managed encryption keys (GMEK) by default. ClaimGuard does not currently use customer-managed encryption keys (CMEK) for snapshots; that's a roadmap item for a later SOC 2 cycle.
Recovery procedure¶
To restore the VM from a snapshot, an operator with appropriate IAM
roles (`roles/compute.instanceAdmin.v1` and `roles/compute.storageAdmin`)
runs:
# 1. List available snapshots
gcloud compute snapshots list \
--filter="sourceDisk~deepfakebench3" \
--project=train-cvit2
# 2. Create a new disk from the chosen snapshot
gcloud compute disks create claim-guard-app-1-restore \
--source-snapshot=<snapshot-name> \
--zone=europe-west1-b \
--project=train-cvit2
# 3. Stop the current VM
gcloud compute instances stop claim-guard-app-1 \
--zone=europe-west1-b --project=train-cvit2
# 4. Detach the current boot disk and attach the restored one as boot
# (or create a new VM from the restored disk and migrate the
# static external IP `claim-guard-vm-ip` to it)
# 5. Start, verify pm2 resurrects, smoke-test /api/health
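Step 4 above can be sketched concretely for the in-place variant (reattaching the restored disk from step 2 to the same stopped VM; keep the old disk around until the smoke test in step 5 passes):

```shell
# Detach the current boot disk from the stopped VM.
gcloud compute instances detach-disk claim-guard-app-1 \
  --disk=deepfakebench3 \
  --zone=europe-west1-b --project=train-cvit2

# Attach the restored disk as the boot device.
gcloud compute instances attach-disk claim-guard-app-1 \
  --disk=claim-guard-app-1-restore --boot \
  --zone=europe-west1-b --project=train-cvit2
```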
A full validated restore drill has not yet been run end-to-end. Adding a quarterly restore drill is a planned roadmap item (see below).
Audit trail¶
Every snapshot creation and deletion is recorded in the project's
`_Required` audit log bucket, which has 400-day immutable retention. See
Audit logging. Representative query:
gcloud logging read \
'resource.type="gce_disk"
AND protoPayload.methodName=~"compute.snapshots.(insert|delete)"' \
--project=train-cvit2 --limit=50 --format=json
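The JSON output can be reduced to a quick event summary with standard text tools. A minimal sketch over a saved query result; the sample entries below are hand-written stand-ins shaped like Cloud Audit Logs records, not real log data:

```shell
# events.json: illustrative stand-in for saved `gcloud logging read` output.
cat > events.json <<'EOF'
[{"protoPayload": {"methodName": "v1.compute.snapshots.insert"},
  "timestamp": "2026-04-29T02:00:14Z"},
 {"protoPayload": {"methodName": "v1.compute.snapshots.delete"},
  "timestamp": "2026-04-29T02:01:02Z"}]
EOF

# One line per snapshot event: the method name only.
grep -o '"methodName": "[^"]*"' events.json
```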
Status¶
implemented — verified 2026-04-29.
The snapshot policy was provisioned and attached as plan step A2.2;
the journal entry of the same name in docs/security/HARDENING-LOG.md
records the exact commands and pre-state (zero snapshots existed before
A2.2 landed).
Known gaps and roadmap¶
- Postgres lives on the VM's boot disk. This means the database is recovered as part of disk-level recovery rather than via a dedicated database backup mechanism. RPO is bounded by the snapshot interval (24 hours). Migrating Postgres to Cloud SQL with point-in-time recovery is plan step A2.3 and is the proper long-term fix.
- No off-region replication of snapshots. All copies live in
europe-west1. A regional outage would defer recovery until the region returns. Cross-region snapshot copy is a roadmap item.
- No quarterly restore drill yet. The recovery procedure above is documented but not validated end-to-end. First drill scheduled before SOC 2 fieldwork.
- CMEK on snapshots — Google-managed keys today; CMEK with a customer-controlled KMS key is a roadmap item for the next compliance cycle.
- Application-layer backups (Postgres logical dumps) are not in place. Adding `pg_dump` output to a separate GCS bucket on a daily cadence is a low-cost addition that would make per-table or per-row recovery possible without a full disk restore. Tracked as a P2 follow-up.
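The P2 follow-up could look roughly like this. A sketch, assuming the on-VM database is named `claimguard` and a dedicated bucket such as `gs://claim-guard-pg-dumps` exists; neither name appears in this doc:

```shell
# Daily logical backup sketch: custom-format dump streamed straight to GCS.
# Database name and bucket are assumptions, not provisioned resources.
pg_dump --format=custom claimguard \
  | gsutil cp - "gs://claim-guard-pg-dumps/claimguard-$(date -u +%F).dump"
```

A custom-format dump supports selective restore via `pg_restore --table`, which is what enables per-table recovery without touching the disk snapshots.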