Backups

Statement

ClaimGuard takes daily snapshots of the production VM's boot disk. Snapshots are retained for 7 days, run during a low-load window, and stored in GCP's European multi-region (the default location for a europe-west1 disk). Recovery is a documented gcloud compute disks create --source-snapshot operation that produces a bootable disk identical to the disk at the snapshot point. Snapshot creation, deletion, and restore events are recorded in the project's immutable audit log.

Implementation

What's backed up

  • The boot disk of claim-guard-app-1, named deepfakebench3 (a leftover name from the machine's earlier role; the disk was not renamed when the VM was repurposed as claim-guard-app-1).
  • The boot disk is the only persistent storage the VM uses. It contains the application code, the on-VM Postgres database files, configuration, and operational state.
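A quick way to spot-check that the boot disk really is the VM's only attached disk (a verification sketch, not a captured transcript; expect exactly one line of output, ending in .../disks/deepfakebench3):

```shell
# List the source disk of every disk attached to the instance.
gcloud compute instances describe claim-guard-app-1 \
  --zone=europe-west1-b --project=train-cvit2 \
  --format='value(disks[].source)'
```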

Snapshot policy

A regional snapshot-schedule resource policy named claim-guard-daily is attached to the disk. Its parameters:

$ gcloud compute resource-policies describe claim-guard-daily \
    --region=europe-west1 --project=train-cvit2
description: ClaimGuard daily snapshots — 7d retention, A2.2
snapshotSchedulePolicy:
  schedule:
    dailySchedule:
      daysInCycle: 1
      startTime: '02:00'
  retentionPolicy:
    maxRetentionDays: 7
    onSourceDiskDelete: APPLY_RETENTION_POLICY
  • Cadence: every 24 hours, starting at 02:00. GCP interprets snapshot-schedule start times as UTC; the VM's low-load window is overnight, starting around midnight Israel time.
  • Retention: 7 days. Older snapshots are auto-deleted.
  • On source-disk delete: retention policy still applies (i.e., snapshots aren't immediately purged if the disk is deleted; they age out normally).
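For reference, the policy above can be recreated with roughly the following commands. This is a sketch reconstructed from the describe output; the exact commands used at provisioning time are recorded in docs/security/HARDENING-LOG.md.

```shell
# Create the daily snapshot schedule (7-day retention, 02:00 start).
gcloud compute resource-policies create snapshot-schedule claim-guard-daily \
  --region=europe-west1 --project=train-cvit2 \
  --description='ClaimGuard daily snapshots — 7d retention, A2.2' \
  --daily-schedule --start-time=02:00 \
  --max-retention-days=7 \
  --on-source-disk-delete=apply-retention-policy

# Attach it to the boot disk.
gcloud compute disks add-resource-policies deepfakebench3 \
  --resource-policies=claim-guard-daily \
  --zone=europe-west1-b --project=train-cvit2
```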

The policy is confirmed attached to the disk:

$ gcloud compute disks describe deepfakebench3 \
    --zone=europe-west1-b --project=train-cvit2 \
    --format='value(resourcePolicies)'
projects/train-cvit2/regions/europe-west1/resourcePolicies/claim-guard-daily

Storage location and encryption

By default, GCP Compute Engine stores a snapshot in the multi-region geographically closest to its source disk; for a europe-west1 disk that is the eu multi-region, so snapshot data is stored redundantly across multiple European locations rather than in europe-west1 alone. We have not overridden this default.

Snapshots are encrypted at rest with Google-managed encryption keys (GMEK) by default. ClaimGuard does not currently use customer-managed encryption keys (CMEK) for snapshots; that's a roadmap item for a later SOC 2 cycle.
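When CMEK lands, a one-off snapshot of the disk could be keyed roughly like this. Sketch only: the key ring and key names below are placeholders, not provisioned resources, and the exact key layout will be decided in the CMEK roadmap work.

```shell
# Hypothetical: take an ad-hoc snapshot encrypted with a customer-managed
# KMS key. The keyRings/cryptoKeys path is a placeholder.
gcloud compute disks snapshot deepfakebench3 \
  --zone=europe-west1-b --project=train-cvit2 \
  --snapshot-names=claim-guard-manual-cmek-test \
  --kms-key=projects/train-cvit2/locations/europe-west1/keyRings/claim-guard/cryptoKeys/snapshots
```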

Recovery procedure

To restore the VM from a snapshot, an operator with appropriate IAM roles (roles/compute.instanceAdmin.v1 and roles/compute.storageAdmin) runs:

# 1. List available snapshots
gcloud compute snapshots list \
  --filter="sourceDisk~deepfakebench3" \
  --project=train-cvit2

# 2. Create a new disk from the chosen snapshot
gcloud compute disks create claim-guard-app-1-restore \
  --source-snapshot=<snapshot-name> \
  --zone=europe-west1-b \
  --project=train-cvit2

# 3. Stop the current VM
gcloud compute instances stop claim-guard-app-1 \
  --zone=europe-west1-b --project=train-cvit2

# 4. Detach the current boot disk and attach the restored one as boot
#    (or create a new VM from the restored disk and migrate the
#    static external IP `claim-guard-vm-ip` to it)

# 5. Start the VM, verify pm2 resurrects the app processes, and
#    smoke-test /api/health
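Steps 4 and 5 can be sketched concretely as follows. This is a sketch, not a rehearsed runbook: as noted below, these exact commands have not been drilled end-to-end.

```shell
# 4. Swap boot disks on the stopped VM.
gcloud compute instances detach-disk claim-guard-app-1 \
  --disk=deepfakebench3 \
  --zone=europe-west1-b --project=train-cvit2
gcloud compute instances attach-disk claim-guard-app-1 \
  --disk=claim-guard-app-1-restore --boot \
  --zone=europe-west1-b --project=train-cvit2

# 5. Start and smoke-test via the static external IP.
gcloud compute instances start claim-guard-app-1 \
  --zone=europe-west1-b --project=train-cvit2
curl -fsS "http://$(gcloud compute addresses describe claim-guard-vm-ip \
  --region=europe-west1 --project=train-cvit2 \
  --format='value(address)')/api/health"
```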

A full validated restore drill has not yet been run end-to-end. Adding a quarterly restore drill is a planned roadmap item (see below).

Audit trail

Every snapshot creation and deletion is recorded in the project's _Required audit log bucket with 400-day immutable retention. See Audit logging. Representative query:

gcloud logging read \
  'resource.type="gce_disk"
   AND protoPayload.methodName=~"compute.snapshots.(insert|delete)"' \
  --project=train-cvit2 --limit=50 --format=json

Status

Implemented; verified 2026-04-29.

The snapshot policy was provisioned and attached as plan step A2.2; the journal entry of the same name in docs/security/HARDENING-LOG.md records the exact commands and pre-state (zero snapshots existed before A2.2 landed).

Known gaps and roadmap

  • Postgres lives on the VM's boot disk. This means the database is recovered as part of disk-level recovery rather than via a dedicated database backup mechanism. RPO is bounded by the snapshot interval (24 hours). Migrating Postgres to Cloud SQL with point-in-time recovery is plan step A2.3 and is the proper long-term fix.
  • No replication of snapshots outside Europe. Snapshots sit in the eu multi-region and the VM in europe-west1; a europe-west1 outage would defer recovery until the region returns. Cross-region snapshot copy is a roadmap item.
  • No quarterly restore drill yet. The recovery procedure above is documented but not validated end-to-end. First drill scheduled before SOC 2 fieldwork.
  • CMEK on snapshots — Google-managed keys today; CMEK with a customer-controlled KMS key is a roadmap item for the next compliance cycle.
  • Application-layer backups (Postgres logical dumps) — not in place. Adding pg_dump to a separate GCS bucket on a daily cadence is a low-cost addition that would make per-table or per-row recovery possible without a full disk restore. Tracked as a P2 follow-up.
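The pg_dump follow-up could be as small as a cron-driven script along these lines. Sketch only: the bucket name and database name are placeholders, not provisioned resources.

```shell
#!/usr/bin/env bash
# Nightly logical dump of the on-VM Postgres to GCS.
# Placeholders: gs://claim-guard-pg-dumps bucket, "claimguard" database.
set -euo pipefail
STAMP="$(date -u +%Y%m%dT%H%M%SZ)"
# Custom-format dump streams to stdout and straight into the bucket,
# so no large temp file lands on the boot disk.
pg_dump --format=custom --dbname=claimguard \
  | gsutil cp - "gs://claim-guard-pg-dumps/claimguard-${STAMP}.dump"
```

Restores from a custom-format dump can then use pg_restore for per-table recovery without touching the disk snapshots.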