Business Continuity

Statement

ClaimGuard's business-continuity (BC) and disaster-recovery (DR) story today is built on three primitives: GCP's regional availability for the production VM and its boot disk, daily snapshot-based backups with a documented restore procedure, and the two founders acting as the incident-response team. There is no multi-region failover, no hot standby, no documented RTO/RPO target, and no tabletop-validated DR plan.

This control sits at partial: the technical pieces that any DR plan would build on are in place (snapshots, audit logs, infrastructure documented as gcloud commands), but the planning artifacts (RTO/RPO targets, named recovery roles, exercised playbooks) are not.

Implementation

Recovery primitives in place

  • Daily VM boot-disk snapshots, 7-day retention (claim-guard-daily resource policy). Every snapshot is a full, encrypted point-in-time copy of the application code, the on-VM Postgres database, and operational state. See Backups for the schedule, the restore procedure (gcloud compute disks create --source-snapshot), and known gaps.
  • Cloud audit logs at 400-day immutable retention so the configuration history of the project is reconstructable from outside the VM. See Audit logging (cloud).
  • Secrets in Secret Manager so a freshly-rebuilt VM can come up by attaching the same service account and pulling secrets at boot — no out-of-band copy of .env material is required. See Secrets management.
  • Documented rebuild path. The scripts/setup.sh script brings up the local stack idempotently; the cloud equivalent (a documented gcloud compute instances create recipe) is queued (see roadmap).
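
The primitives above can be strung together roughly as follows. This is a minimal sketch rather than the queued recipe itself: the disk name, VM name, SNAPSHOT_NAME, and SA_EMAIL are placeholders invented for illustration, and only the zone comes from this page. The script prints the commands by default and runs them only when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: rebuild the production VM from a snapshot (dry run by default).
set -eu

ZONE="europe-west1-b"                    # zone named on this page
SNAPSHOT="${SNAPSHOT:-SNAPSHOT_NAME}"    # placeholder: export a real snapshot name
DISK="claimguard-restore-disk"           # hypothetical disk name
VM="claimguard-restore-vm"               # hypothetical VM name
SERVICE_ACCOUNT="SA_EMAIL"               # placeholder: the production service account

# Print the command unless DRY_RUN=0 is set explicitly.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restore_vm() {
  # 1. Materialise a new boot disk from the chosen snapshot.
  run gcloud compute disks create "$DISK" \
    --source-snapshot="$SNAPSHOT" --zone="$ZONE"
  # 2. Boot a VM from that disk with the same service account attached,
  #    so it can read its secrets from Secret Manager at boot.
  run gcloud compute instances create "$VM" --zone="$ZONE" \
    --disk="name=$DISK,boot=yes" \
    --service-account="$SERVICE_ACCOUNT" --scopes=cloud-platform
}

restore_vm
```

Once the VM is up, a boot script could fetch each secret with gcloud secrets versions access latest --secret=NAME, where NAME stands for whichever secret names the project actually uses.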

Known limits of the current setup

  • Single VM, single zone (europe-west1-b). A zonal outage halts the application until the zone returns.
  • Single region for snapshots. All snapshot copies live in europe-west1. A regional outage defers recovery until the region returns.
  • No warm standby capacity. Recovery starts at "create a new VM from a snapshot", so the RTO is never zero.
  • Postgres on the VM. A disk-level recovery is the same as a database-level recovery. A separate logical-dump backup (e.g., pg_dump to GCS) is a low-cost addition that is not in place.
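
The logical-dump addition mentioned in the last item could look something like the sketch below. It is not in place today, and the database name, bucket name, and object prefix are assumptions made up for illustration. As written it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: daily logical Postgres dump copied to a GCS bucket (dry run by default).
set -eu

DB_NAME="claimguard"               # hypothetical database name
BUCKET="gs://BUCKET_NAME"          # placeholder: a bucket separate from the VM
STAMP="$(date -u +%Y%m%dT%H%M%SZ)" # UTC timestamp for the object name
DUMP_FILE="/tmp/${DB_NAME}-${STAMP}.dump"

# Print the command unless DRY_RUN=0 is set explicitly.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

dump_to_gcs() {
  # Custom format keeps the dump compressed and lets pg_restore pick single tables.
  run pg_dump --format=custom --file="$DUMP_FILE" "$DB_NAME"
  run gcloud storage cp "$DUMP_FILE" "$BUCKET/pg_dump/"
  run rm -f "$DUMP_FILE"
}

dump_to_gcs
```

A dump in custom format can be restored selectively (for example with pg_restore --table), which is what separates a database-level recovery from a full disk restore.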

What we have not defined yet

  • Recovery Time Objective (RTO). No written target.
  • Recovery Point Objective (RPO). Bounded by snapshot interval (24 hours) and snapshot retention (7 days), but no per-data-class RPO target is recorded.
  • Named recovery roles. Today the founder pair handles recovery the same way they handle Incident response; no separate "BC coordinator" role is named.
  • Tabletop / drill cadence. No drill has been run end-to-end.
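
Until a target is written down, the implied RPO bound can at least be checked mechanically. The sketch below assumes a 24-hour target (the snapshot interval noted above, not an agreed figure) and works on Unix timestamps; fetching the newest snapshot's creation time from gcloud and converting it to epoch seconds is left out.

```shell
#!/bin/sh
# Sketch: is the newest snapshot recent enough for an assumed RPO target?
set -eu

RPO_HOURS=24   # assumption: matches the daily snapshot interval

# within_rpo CREATED_EPOCH NOW_EPOCH RPO_HOURS
# Succeeds when the snapshot age does not exceed the RPO.
within_rpo() {
  [ $(( $2 - $1 )) -le $(( $3 * 3600 )) ]
}

# Illustrative values: a snapshot taken 30 hours before "now".
now=1700000000
created=$(( now - 30 * 3600 ))

if within_rpo "$created" "$now" "$RPO_HOURS"; then
  echo "OK: newest snapshot is within ${RPO_HOURS}h"
else
  echo "STALE: newest snapshot is older than ${RPO_HOURS}h"
fi
```

With the illustrative values the check reports STALE, since 30 hours exceeds the assumed 24-hour target.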

Out of scope

  • Office / endpoint continuity is out of scope for this page; ClaimGuard is a cloud-hosted application without a physical office whose loss would interrupt operations.
  • Customer-side continuity (e.g., a customer's ability to continue operations if ClaimGuard is down) is the customer's concern, addressed in their integration design — outside the scope of this page.

Status

partial — verified 2026-04-29.

What's in place:

  • Daily encrypted snapshots with documented restore procedure.
  • 400-day immutable audit log so configuration history is independent of the running VM.
  • Secrets in Secret Manager so a rebuilt VM is self-bootstrapping.
  • A small enough operational footprint that recovery is genuinely feasible by a single founder following the written restore procedure.

Known gaps

  • No written RTO/RPO targets. This is the single biggest gap.
  • No multi-region or multi-zone redundancy. A single zonal outage is a service interruption.
  • No dedicated database backup beyond disk snapshots (no pg_dump to GCS).
  • No documented "rebuild a fresh VM from scratch" runbook. The pieces are documented across the trust portal and the hardening log; the connecting walkthrough does not yet exist as one artifact.
  • No drill. The procedure is documented but not validated by exercise.

Roadmap

  • RTO/RPO definition — write down per-data-class targets (probably: "best-effort, ≤24h RPO, ≤24h RTO" at current scale and customer set). Before SOC 2 fieldwork.
  • End-to-end DR runbook — combine Backups, Secrets management, and the rebuild instructions into a single "if the VM is gone, do these steps" walkthrough.
  • Tabletop / restore drill at least once before SOC 2 fieldwork. First drill should validate both the snapshot restore and the Secret Manager bootstrap of a fresh VM.
  • pg_dump to a separate GCS bucket on a daily cadence so per-row recovery is possible without a full disk restore.
  • Cross-region snapshot copy — see Backups roadmap.
  • Postgres → Cloud SQL with point-in-time recovery (plan step A2.3) — collapses the BC story for the database to "Cloud SQL guarantees PITR within retention window."
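
For the first drill, recording how long each step takes would give the RTO definition some measured input. The following is a rough sketch under stated assumptions: the step labels are invented, the true commands are stand-ins for the real restore and bootstrap steps, and the 24-hour target is the tentative figure from the first roadmap item.

```shell
#!/bin/sh
# Sketch: time each drill step and compare the total against a tentative RTO.
set -eu

RTO_TARGET_SECONDS=$(( 24 * 3600 ))   # tentative: 24h, not yet an agreed target

# time_step LABEL COMMAND [ARGS...]
# Runs the command and prints its duration, e.g. "restore-disk: 42s".
time_step() {
  label="$1"; shift
  start="$(date +%s)"
  "$@"
  end="$(date +%s)"
  echo "$label: $(( end - start ))s"
}

drill_start="$(date +%s)"
time_step "restore-disk" true        # stand-in for the snapshot restore
time_step "secret-bootstrap" true    # stand-in for the Secret Manager bootstrap
drill_total=$(( $(date +%s) - drill_start ))

if [ "$drill_total" -le "$RTO_TARGET_SECONDS" ]; then
  echo "drill total ${drill_total}s: within tentative RTO"
else
  echo "drill total ${drill_total}s: exceeds tentative RTO"
fi
```

Keeping the per-step output alongside the drill notes would show which step dominates recovery time.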