Business Continuity
Statement
ClaimGuard's business-continuity (BC) and disaster-recovery (DR) story today is built on three primitives: GCP's regional availability for the production VM and its boot disk, daily snapshot-based backups with a documented restore procedure, and the founder pair as the incident-response substrate. There is no multi-region failover, no hot standby, no documented RTO/RPO target, and no tabletop-validated DR plan.
This control sits at partial: the technical pieces that any DR plan would build on are in place (snapshots, audit logs, infrastructure documented as `gcloud` commands), but the planning artifacts (RTO/RPO targets, named recovery roles, exercised playbooks) are not.
Implementation
Recovery primitives in place
- Daily VM boot-disk snapshots, 7-day retention (`claim-guard-daily` resource policy). Every snapshot is a full, encrypted point-in-time copy of the application code, the on-VM Postgres database, and operational state. See Backups for the schedule, the `gcloud compute disks create --source-snapshot` recovery procedure, and known gaps.
- Cloud audit logs at 400-day immutable retention so the configuration history of the project is reconstructable from outside the VM. See Audit logging (cloud).
- Secrets in Secret Manager so a freshly rebuilt VM can come up by attaching the same service account and pulling secrets at boot; no out-of-band copy of `.env` material is required. See Secrets management.
- Documented rebuild path. The `scripts/setup.sh` script brings up the local stack idempotently; the cloud equivalent (a documented `gcloud compute instances create` recipe) is queued (see roadmap).
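The snapshot-restore primitive above can be sketched as two `gcloud` calls: create a disk from a snapshot, then boot a VM from it with the same service account so it can pull secrets at boot. This is a minimal sketch, not the documented procedure in Backups. The snapshot, disk, VM, and service-account names are hypothetical placeholders, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: restore the boot disk from a snapshot and boot a replacement VM.
# All resource names below are hypothetical placeholders, not real ClaimGuard resources.
set -euo pipefail

ZONE="${ZONE:-europe-west1-b}"            # zone named on this page
SNAPSHOT="${SNAPSHOT:-example-snapshot}"  # hypothetical snapshot name
NEW_DISK="${NEW_DISK:-restored-boot-disk}"
NEW_VM="${NEW_VM:-restored-vm}"
SERVICE_ACCOUNT="${SERVICE_ACCOUNT:-app-vm@example-project.iam.gserviceaccount.com}"

# Print each command by default; set DRY_RUN=0 to execute for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

restore_vm() {
  # 1. Materialise a new disk from the chosen snapshot.
  run gcloud compute disks create "$NEW_DISK" \
    --source-snapshot="$SNAPSHOT" --zone="$ZONE"
  # 2. Boot a VM from that disk, attaching the same service account
  #    so the instance can read Secret Manager at boot.
  run gcloud compute instances create "$NEW_VM" \
    --zone="$ZONE" \
    --disk="name=$NEW_DISK,boot=yes,auto-delete=yes" \
    --service-account="$SERVICE_ACCOUNT" \
    --scopes=cloud-platform
}

restore_vm
```

Running it in the default dry-run mode is a cheap way to review the exact commands during a tabletop exercise before anything is created.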
Known limits of the current setup
- Single VM, single zone (`europe-west1-b`). A zonal outage halts the application until the zone returns.
- Single region for snapshots. All snapshot copies live in `europe-west1`. A regional outage defers recovery until the region returns.
- No warm standby capacity. Recovery starts at "create a new VM from a snapshot", so RTO is non-zero.
- Postgres on the VM. A disk-level recovery is the same as a database-level recovery. A separate logical-dump backup (e.g., `pg_dump` to GCS) is a low-cost addition that is not in place.
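The logical-dump addition mentioned above could look roughly like the following. It is a sketch under assumptions: the database name and bucket are hypothetical, the dump is streamed to GCS through `gcloud storage cp -` rather than staged on disk, and nothing runs unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: daily logical dump of the on-VM Postgres to a GCS bucket.
# DB_NAME and BUCKET are hypothetical; substitute real values.
set -euo pipefail

DB_NAME="${DB_NAME:-appdb}"
BUCKET="${BUCKET:-gs://example-db-dumps}"

# Build a date-stamped object path for a given day (YYYY-MM-DD).
dump_object() {
  local day="$1"
  echo "${BUCKET}/${DB_NAME}/${day}.dump"
}

backup_db() {
  local target
  target="$(dump_object "$(date -u +%F)")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: show the pipeline instead of executing it.
    echo "pg_dump --format=custom --dbname=${DB_NAME} | gcloud storage cp - ${target}"
  else
    # Custom format is compressed and can be restored selectively with pg_restore.
    pg_dump --format=custom --dbname="$DB_NAME" | gcloud storage cp - "$target"
  fi
}

backup_db
```

A custom-format dump can be restored into a scratch database with `pg_restore`, which is what makes table-level or row-level recovery possible without restoring the whole disk.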
What we have not defined yet
- Recovery Time Objective (RTO). No written target.
- Recovery Point Objective (RPO). Worst-case data loss is bounded by the snapshot interval (24 hours), and the oldest recoverable point by snapshot retention (7 days), but no per-data-class RPO target is recorded.
- Named recovery roles. Today the founder pair handles recovery the same way they handle Incident response; no separate "BC coordinator" role is named.
- Tabletop / drill cadence. No drill has been run end-to-end.
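Once an RPO is written down, it can be checked mechanically against the age of the newest snapshot. The sketch below shows the arithmetic only. The 24-hour value is a placeholder rather than an agreed target, and `within_rpo` is a hypothetical helper that takes Unix epoch seconds.

```shell
#!/usr/bin/env bash
# Sketch: flag when the newest snapshot is older than a candidate RPO.
# The 24-hour figure is a placeholder, not an agreed target.
set -euo pipefail

RPO_HOURS="${RPO_HOURS:-24}"

# Succeeds (exit 0) when a snapshot taken at $1 is within the RPO at time $2.
# Both arguments are Unix epoch seconds.
within_rpo() {
  local snapshot_epoch="$1" now_epoch="$2"
  local age_seconds=$(( now_epoch - snapshot_epoch ))
  [ "$age_seconds" -le $(( RPO_HOURS * 3600 )) ]
}

# Example with fixed timestamps: 23 hours old passes, 25 hours old fails.
now=1777420800
if within_rpo $(( now - 23 * 3600 )) "$now"; then echo "23h old: within RPO"; fi
if ! within_rpo $(( now - 25 * 3600 )) "$now"; then echo "25h old: RPO breached"; fi
```

In practice the snapshot timestamp would come from the snapshot listing for the boot disk; that lookup is left out here to keep the check itself testable offline.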
Out of scope
- Office / endpoint continuity is out of scope for this page; ClaimGuard is a cloud-hosted application without a physical office whose loss would interrupt operations.
- Customer-side continuity (e.g., a customer's ability to continue operations if ClaimGuard is down) is the customer's concern, addressed in their integration design — outside the scope of this page.
Status
partial — verified 2026-04-29.
What's in place:
- Daily encrypted snapshots with documented restore procedure.
- 400-day immutable audit log so configuration history is independent of the running VM.
- Secrets in Secret Manager so a rebuilt VM is self-bootstrapping.
- A small enough operational footprint that recovery is genuinely feasible by a single founder following a written runbook.
Known gaps
- No written RTO/RPO targets. This is the single biggest gap.
- No multi-region or multi-zone redundancy. A single zonal outage is a service interruption.
- No dedicated database backup beyond disk snapshots (no `pg_dump` to GCS).
- No documented "rebuild a fresh VM from scratch" runbook. The pieces are documented across the trust portal and the hardening log; the connecting walkthrough does not yet exist as one artifact.
- No drill. The procedure is documented but not validated by exercise.
Roadmap
- RTO/RPO definition — write down per-data-class targets (probably: "best-effort, ≤24h RPO, ≤24h RTO" at current scale and customer set). Before SOC 2 fieldwork.
- End-to-end DR runbook — combine Backups, Secrets management, and the rebuild instructions into a single "if the VM is gone, do these steps" walkthrough.
- Tabletop / restore drill at least once before SOC 2 fieldwork. First drill should validate both the snapshot restore and the Secret Manager bootstrap of a fresh VM.
- `pg_dump` to a separate GCS bucket on a daily cadence so per-row recovery is possible without a full disk restore.
- Cross-region snapshot copy (see Backups roadmap).
- Postgres → Cloud SQL with point-in-time recovery (plan step A2.3) — collapses the BC story for the database to "Cloud SQL guarantees PITR within retention window."
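One way the cross-region snapshot item might be approached is a snapshot schedule whose storage location is the `eu` multi-region instead of a single region. This is an assumption about the approach, not the plan recorded in Backups. The policy and disk names are hypothetical, the start time is illustrative, and the commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: a daily snapshot schedule that stores copies in the EU multi-region.
# POLICY and DISK are hypothetical names; start time is illustrative.
set -euo pipefail

REGION="${REGION:-europe-west1}"
ZONE="${ZONE:-europe-west1-b}"
POLICY="${POLICY:-example-daily-multiregion}"
DISK="${DISK:-example-boot-disk}"

# Print each command by default; set DRY_RUN=0 to execute for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_schedule() {
  # Create the schedule with a multi-region storage location.
  run gcloud compute resource-policies create snapshot-schedule "$POLICY" \
    --region="$REGION" \
    --daily-schedule \
    --start-time=02:00 \
    --max-retention-days=7 \
    --storage-location=eu
  # Attach the schedule to the boot disk.
  run gcloud compute disks add-resource-policies "$DISK" \
    --resource-policies="$POLICY" --zone="$ZONE"
}

create_schedule
```

Whether this replaces the existing `claim-guard-daily` policy or runs alongside it is a decision for the Backups roadmap; the sketch deliberately uses a separate policy name.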