Business Continuity¶

Statement¶

ClaimGuard's business-continuity (BC) and disaster-recovery (DR) story today is built on three primitives: GCP's regional availability for the production VM and its boot disk, daily snapshot-based backups with a documented restore procedure, and the founder pair as the incident-response substrate. There is no multi-region failover, no hot standby, no documented RTO/RPO target, and no tabletop-validated DR plan.

The business-continuity control is operating: the technical substrate (daily encrypted snapshots, a 400-day immutable audit log, Secret Manager bootstrap, and documented gcloud restore commands) is in place and verifiable. The planning artifacts (RTO/RPO targets, named recovery roles, exercised playbooks) are not.

Implementation¶

Recovery primitives in place¶

Daily VM boot-disk snapshots, 7-day retention (claim-guard-daily resource policy). Every snapshot is a full, encrypted point-in-time copy of the application code, the on-VM Postgres database, and operational state. See Backups for the schedule, the gcloud compute disks create --source-snapshot recovery procedure, and known gaps.
Cloud audit logs at 400-day immutable retention so the configuration history of the project is reconstructable from outside the VM. See Audit logging (cloud).
Secrets in Secret Manager so a freshly-rebuilt VM can come up by attaching the same service account and pulling secrets at boot — no out-of-band copy of .env material is required. See Secrets management.
Documented rebuild path (local stack only, today). The scripts/setup.sh script brings up the local stack idempotently; the cloud equivalent (a gcloud compute instances create recipe) is not yet written and is queued (see Roadmap).
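The snapshot-restore path described above can be sketched as two gcloud invocations: create a disk from a snapshot, then boot a new VM from that disk with the same service account attached. This is a minimal sketch, not the procedure documented in Backups. The function name and every resource name passed to it are hypothetical placeholders, and the commands are only built as argument lists, never executed.

```python
import shlex


def restore_commands(snapshot, new_disk, new_vm, zone, service_account):
    """Build (but do not run) the gcloud commands for a snapshot restore.

    All arguments are caller-supplied placeholders; nothing here is taken
    from the real ClaimGuard project configuration.
    """
    # Step 1: materialise a new boot disk from the chosen snapshot.
    create_disk = [
        "gcloud", "compute", "disks", "create", new_disk,
        f"--source-snapshot={snapshot}",
        f"--zone={zone}",
    ]
    # Step 2: boot a VM from that disk, attaching the service account so
    # the instance can read its secrets from Secret Manager at startup.
    create_vm = [
        "gcloud", "compute", "instances", "create", new_vm,
        f"--zone={zone}",
        f"--disk=name={new_disk},boot=yes",
        f"--service-account={service_account}",
        "--scopes=cloud-platform",
    ]
    return [create_disk, create_vm]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in restore_commands(
        "example-snapshot", "restored-disk", "restored-vm",
        "europe-west1-b", "example-sa@example-project.iam.gserviceaccount.com",
    ):
        print(shlex.join(argv))
```

Printing the commands first keeps the sketch safe to run during a tabletop exercise; an operator would review the output and then execute it by hand.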

Known limits of the current setup¶

Single VM, single zone (europe-west1-b). A zonal outage halts the application until the zone returns.
Single region for snapshots. All snapshot copies live in europe-west1. A regional outage defers recovery until the region returns.
No warm standby capacity. Recovery starts at "create a new VM from a snapshot", so RTO is non-zero.
Postgres on the VM. The database lives on the boot disk, so a disk-level restore is the only database-level restore available. A separate logical-dump backup (e.g., pg_dump to GCS) is a low-cost addition that is not in place.

What we have not defined yet¶

Recovery Time Objective (RTO). No written target.
Recovery Point Objective (RPO). Bounded by snapshot interval (24 hours) and snapshot retention (7 days), but no per-data-class RPO target is recorded.
Named recovery roles. Today the founder pair handles recovery the same way they handle Incident response; no separate "BC coordinator" role is named.
Tabletop / drill cadence. No drill has been run end-to-end.
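The RPO bound stated above follows from simple arithmetic: on a disk-level restore, everything written after the most recent snapshot is lost, so the worst case equals the snapshot interval. The sketch below makes that explicit. The 24-hour interval and 7-day retention come from this page; the function names and the candidate targets in the example are illustrative assumptions, not recorded objectives.

```python
from datetime import datetime, timedelta, timezone

SNAPSHOT_INTERVAL = timedelta(hours=24)   # from the snapshot schedule
SNAPSHOT_RETENTION = timedelta(days=7)    # from the retention policy


def data_loss_window(last_snapshot_at, failure_at):
    """Time span of writes lost when restoring from the last snapshot."""
    if failure_at < last_snapshot_at:
        raise ValueError("failure precedes the snapshot")
    return failure_at - last_snapshot_at


def meets_rpo(last_snapshot_at, failure_at, rpo_target):
    """True if the loss window fits within a candidate RPO target."""
    return data_loss_window(last_snapshot_at, failure_at) <= rpo_target


def oldest_restore_point(now):
    """Earliest point in time still covered by retained snapshots."""
    return now - SNAPSHOT_RETENTION


if __name__ == "__main__":
    snap = datetime(2026, 5, 5, 2, 0, tzinfo=timezone.utc)
    fail = datetime(2026, 5, 6, 1, 30, tzinfo=timezone.utc)
    print(data_loss_window(snap, fail))                 # 23:30:00
    print(meets_rpo(snap, fail, timedelta(hours=24)))   # True
    print(meets_rpo(snap, fail, timedelta(hours=4)))    # False
```

A failure just before the next scheduled snapshot loses close to 24 hours of writes, which is why any RPO target tighter than the snapshot interval cannot be met with snapshots alone.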

Out of scope¶

Office / endpoint continuity is out of scope for this page; ClaimGuard is a cloud-hosted application without a physical office whose loss would interrupt operations.
Customer-side continuity (e.g., a customer's ability to continue operations if ClaimGuard is down) is the customer's concern, addressed in their integration design — outside the scope of this page.

Status¶

implemented — verified 2026-05-06. Backup, restore, and audit-history substrates are in place and operational. Known gaps below are planning-artifact and drill items for SOC 2 fieldwork.

What's in place:

Daily encrypted snapshots with documented restore procedure.
400-day immutable audit log so configuration history is independent of the running VM.
Secrets in Secret Manager so a rebuilt VM is self-bootstrapping.
A small enough operational footprint that recovery is genuinely feasible by a single founder following the written procedures.
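The self-bootstrapping claim above amounts to a rebuilt VM reading each secret from Secret Manager at boot and materialising its environment from the results. The following is a minimal sketch under assumptions: the secret names, the KEY=value output format, and both function names are hypothetical, and the real boot script may work differently. Only the gcloud secrets versions access command itself is a real CLI call.

```python
import subprocess


def fetch_secret(name):
    """Read the latest version of one secret via the gcloud CLI."""
    result = subprocess.run(
        ["gcloud", "secrets", "versions", "access", "latest",
         f"--secret={name}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def render_env(secret_names, fetch=fetch_secret):
    """Render KEY=value lines for the given secrets.

    `fetch` is injectable so the rendering can be exercised without
    touching a real GCP project.
    """
    lines = []
    for name in secret_names:
        value = fetch(name).rstrip("\n")
        if "\n" in value:
            # Multi-line values do not fit a simple KEY=value file.
            raise ValueError(f"{name} is multi-line")
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"
```

Passing a stub for `fetch` lets a drill confirm the rendering logic offline; on a real VM the default fetcher relies on the attached service account for authentication.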

Known gaps¶

No written RTO/RPO targets. This is the single biggest gap.
No multi-region or multi-zone redundancy. A single zonal outage is a service interruption.
No dedicated database backup beyond disk snapshots (no pg_dump to GCS).
No documented "rebuild a fresh VM from scratch" runbook. The pieces are documented across the trust portal and the hardening log; the connecting walkthrough does not yet exist as one artifact.
No drill. The procedure is documented but not validated by exercise.
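One cheap check a first drill could start with is confirming that a recent, usable snapshot actually exists. The sketch below assumes its input is the JSON printed by gcloud compute snapshots list --format=json, whose entries carry name, status, and creationTimestamp fields. The function names and the 26-hour freshness threshold (the 24-hour interval plus slack) are assumptions made for illustration.

```python
import json
from datetime import datetime, timedelta, timezone


def newest_ready_snapshot(snapshots_json):
    """Return the most recent snapshot whose status is READY, or None."""
    ready = [s for s in json.loads(snapshots_json)
             if s.get("status") == "READY"]
    if not ready:
        return None
    # creationTimestamp is RFC 3339 with a numeric UTC offset.
    return max(ready,
               key=lambda s: datetime.fromisoformat(s["creationTimestamp"]))


def is_fresh(snapshot, now, max_age=timedelta(hours=26)):
    """True if the snapshot is no older than max_age at time `now`."""
    created = datetime.fromisoformat(snapshot["creationTimestamp"])
    return now - created <= max_age
```

A snapshot still in progress is ignored on purpose, since only a READY snapshot can serve as a restore source.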

Roadmap¶

RTO/RPO definition — write down per-data-class targets (probably: "best-effort, ≤24h RPO, ≤24h RTO" at current scale and customer set). Before SOC 2 fieldwork.
End-to-end DR runbook — combine Backups, Secrets management, and the rebuild instructions into a single "if the VM is gone, do these steps" walkthrough.
Tabletop / restore drill at least once before SOC 2 fieldwork. First drill should validate both the snapshot restore and the Secret Manager bootstrap of a fresh VM.
pg_dump to a separate GCS bucket on a daily cadence so per-row recovery is possible without a full disk restore.
Cross-region snapshot copy — see Backups roadmap.
Postgres → Cloud SQL with point-in-time recovery (the Cloud SQL migration step), which collapses the BC story for the database to "Cloud SQL guarantees PITR within retention window."
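The daily pg_dump item in the roadmap above could look roughly like the sketch below: dump the database in pg_dump's custom format, then copy the file to a dated object in a separate bucket. The database name, bucket name, object prefix, and working directory are all hypothetical placeholders, and the commands are built as argument lists without being run.

```python
from datetime import date


def dump_commands(dbname, bucket, day, workdir="/tmp"):
    """Build (but do not run) a pg_dump and an upload command.

    `dbname`, `bucket`, and the pg_dump/ object prefix are placeholders.
    """
    filename = f"{dbname}-{day.isoformat()}.dump"
    local_path = f"{workdir}/{filename}"
    # Custom format is compressed and readable by pg_restore.
    pg_dump = ["pg_dump", "--format=custom", f"--file={local_path}", dbname]
    # Copy the dump to a dated object in the separate backup bucket.
    upload = ["gcloud", "storage", "cp", local_path,
              f"gs://{bucket}/pg_dump/{filename}"]
    return [pg_dump, upload]


if __name__ == "__main__":
    for argv in dump_commands("exampledb", "example-backup-bucket",
                              date(2026, 5, 6)):
        print(" ".join(argv))
```

The custom format is chosen here because pg_restore can restore selected tables from it into a scratch database, which is what makes recovery possible without a full disk restore.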