AI Risk Management¶
Statement¶
ClaimGuard manages the risks of its single production AI integration
(Google Gemini in tools/master_tool/) through three principles
encoded in the product itself: a human-in-the-loop approval
pattern, an explicit-scope prompt that constrains what the model
is allowed to opine on, and a structured-output contract so the
model's response is parseable and auditable rather than free-form
narrative that downstream code cannot reason about.
There is no formal AI risk register distinct from the broader Risk management page, no scheduled model-evaluation cadence, and no AI-specific incident-response runbook. Those gaps are what keep this control at partial; closing them is the remaining work.
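For concreteness, the sketch below shows what the structured-output side of that contract can look like. It is an illustration only, not the ClaimGuard implementation: the AdvisoryAssessment shape, its field names, and the parse_model_output helper are assumptions, and the actual contract lives in tools/master_tool/.

```python
import json
from dataclasses import dataclass, field

# Hypothetical shape for illustration only; the real contract lives in
# tools/master_tool/ and may use different field names.
ALLOWED_VERDICTS = {"INDICATORS_PRESENT", "NO_INDICATORS", "INSUFFICIENT_DATA"}

@dataclass(frozen=True)
class AdvisoryAssessment:
    verdict: str                      # one of ALLOWED_VERDICTS, never a final determination
    indicators: list[str] = field(default_factory=list)  # tool-grounded signals only
    rationale: str = ""               # short explanation shown to the human reviewer
    advisory: bool = True             # the UI labels this as AI-generated, non-binding input

def parse_model_output(raw: str) -> AdvisoryAssessment:
    """Parse the model's JSON response. Anything unparseable or outside the
    contract degrades to INSUFFICIENT_DATA instead of a guess."""
    try:
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("response is not a JSON object")
        verdict = data.get("verdict")
        if verdict not in ALLOWED_VERDICTS:
            verdict = "INSUFFICIENT_DATA"
        indicators = data.get("indicators", [])
        if not isinstance(indicators, list):
            indicators = []
        return AdvisoryAssessment(
            verdict=verdict,
            indicators=[str(i) for i in indicators],
            rationale=str(data.get("rationale", "")),
        )
    except (ValueError, TypeError):
        # A free-form or injected response fails parsing and carries no authority.
        return AdvisoryAssessment(verdict="INSUFFICIENT_DATA")
```

Because downstream code consumes only the parsed structure, an off-contract or free-form response has nowhere to land except the INSUFFICIENT_DATA path.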
Implementation¶
What we treat as the AI risks¶
The risks we have identified for the current integration:
| Risk | Why it matters here | Today's mitigation |
|---|---|---|
| Hallucination on claim analysis | Gemini could produce a confident but wrong assessment, mislabel an indicator, or invent a corroborating fact. | Human reviewer is the decision-maker; AI output is labeled and advisory. The prompt explicitly directs the model to return INSUFFICIENT_DATA rather than guess. |
| Prompt scope creep | The model could opine on out-of-scope topics (policy coverage, criminal intent, customer character). | The master prompt's "STRICT SCOPE LIMITATION" section bounds the model to tool-grounded signals only. See Prompt guardrails. |
| Vendor model behavior drift | Google can change gemini-2.5-pro behavior under the same model name without our knowledge. | Documented as a gap on AI transparency; roadmap item is to pin to a versioned model identifier and run a periodic eval set. |
| Prompt injection via user-controlled fields | Claim narratives are operator-supplied text concatenated into the prompt. A malicious narrative could, in principle, attempt to override scope limits. | The prompt's standing instructions are the structural defense; output is structured JSON parsed downstream so a free-form override has limited blast radius. See Prompt guardrails for the honest gap inventory here. |
| Training-on-prompts | Prompt content could be retained and used by the vendor for model training. | Governed by Google Cloud DPA; documented under Subprocessors. Stronger contractual commitments are a roadmap item. |
| Data minimization | We send claim narrative + evidence summaries; we do not send raw uploads, credentials, or unrelated claims. | Code-grounded — tools/master_tool/master_api.py constructs the prompt from a fixed set of fields; see the sketch after this table. |
| Output authority misalignment | A user could read AI-generated text and treat it as a final determination. | Application UI labels AI output; reviewer role is mandatory in the workflow; the prompt forbids accusatory language. |
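The data-minimization and prompt-injection rows above are the most directly code-grounded. As a rough sketch, assuming a build_prompt helper, field names, and prompt text that are not the actual master_api.py code:

```python
# Illustration only; the real construction lives in tools/master_tool/master_api.py
# and its field names and prompt wording differ.
PROMPT_FIELDS = ("claim_id", "claim_narrative", "evidence_summaries")  # fixed allowlist

STANDING_INSTRUCTIONS = (
    "IMPORTANT PRINCIPLES: output is advisory and must avoid accusatory language.\n"
    "STRICT SCOPE LIMITATION: assess only tool-grounded signals; "
    "return INSUFFICIENT_DATA rather than guess.\n"
)

def build_prompt(claim: dict) -> str:
    """Concatenate only allowlisted fields after the standing instructions.
    Raw uploads, credentials, and unrelated claims never enter the prompt."""
    body = "\n".join(f"{name}: {claim.get(name, '')}" for name in PROMPT_FIELDS)
    return STANDING_INSTRUCTIONS + "\n" + body
```

The allowlist is what makes the data-minimization row checkable in PR review; the operator-supplied narrative is still concatenated verbatim, which is why the standing instructions remain a structural rather than absolute defense against injection.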
How AI risk decisions are made today¶
- Per-feature, before deploying a new AI surface. No new AI integration has been added since the trust portal opened; if one is added, the Vendor management process plus a written AI-specific risk evaluation gates the rollout.
- Per-prompt change. Edits to the master prompt or any sibling prompt file go through the same PR review as code changes (see Change management). A reviewer is expected to read the prompt's "IMPORTANT PRINCIPLES" and "STRICT SCOPE LIMITATION" sections to verify they remain intact.
- At deferral renewal time. The annual .snyk/ risk-review pass also covers any model-behavior assumptions that have been recorded.
What does not exist¶
- No standalone AI risk register. Risks above live on this page; there is no normalized table with risk IDs, owners, severity, mitigation status, and target dates (an illustrative entry shape is sketched after this list).
- No periodic model evaluation. We do not run a held-out claim set against the live model on a cadence. Behavior drift would be observed only via user-reported anomalies.
- No AI-specific incident-response runbook. A bad-output incident routes through general Incident response today.
- No quantitative AI fairness / bias evaluation. Out of scope at current claim volume but should be on the roadmap for a regulated-customer engagement.
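As an illustration of the first gap, a normalized register entry might carry fields like the ones below. The identifiers, owner label, and values are hypothetical; no such record exists today.

```python
# Hypothetical entry shape for a future AI risk register; illustrative only.
EXAMPLE_REGISTER_ENTRY = {
    "id": "AIR-001",                              # assumed ID scheme
    "risk": "Hallucination on claim analysis",
    "owner": "master_tool maintainer",            # assumed owner label
    "severity": "medium",
    "mitigation_status": "partial",
    "mitigations": ["human-in-the-loop reviewer", "INSUFFICIENT_DATA fallback"],
    "target_date": None,                          # unset until the register exists
}
```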
Status¶
partial — verified 2026-04-29.
What's in place:
- A small, named risk inventory specific to the live AI integration.
- Code-grounded mitigations: human-in-the-loop reviewer, structured-output contract, scope-limited prompt, data minimization at the prompt-construction step.
- AI prompt edits go through the same review process as code edits.
Known gaps¶
- No standalone AI risk register.
- No periodic model-eval cadence.
- No AI-specific incident runbook.
- No fairness / bias evaluation cadence.
- No model-version pinning (alias gemini-2.5-pro rather than a versioned identifier). Cross-listed on AI transparency.
Roadmap¶
- Held-out evaluation set of representative claims, run quarterly against the live model. Compare results against the previous run to detect drift; a sketch of such a harness follows this list.
- Versioned model pin so behavior drift is opt-in.
- AI-specific incident-response addendum to Incident response: what to do if a SEV-2 "AI produced an off-the-rails determination" event surfaces.
- AI risk register as a single doc page or sub-section of Risk management.
- Fairness / bias evaluation before the first regulated-customer engagement.
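To make the first two roadmap items concrete, a quarterly drift check could look roughly like the sketch below. The run_assessment wrapper, file locations, and the pinned model identifier are all assumptions; this is an outline of the intended check, not working tooling.

```python
import json
from pathlib import Path
from typing import Callable

# Roadmap sketch only. run_assessment stands in for whatever wrapper calls the
# model; PINNED_MODEL is a placeholder for a real versioned identifier (not the
# gemini-2.5-pro alias); paths are assumptions.
PINNED_MODEL = "gemini-2.5-pro-<versioned-id>"

def quarterly_drift_check(
    heldout_claims: list[dict],
    run_assessment: Callable[[dict, str], dict],
    results_path: Path = Path("eval/previous_results.json"),
) -> list[str]:
    """Re-run the held-out claim set against the pinned model and report any
    claim whose verdict differs from the previous quarter's recorded result."""
    current = {c["claim_id"]: run_assessment(c, PINNED_MODEL) for c in heldout_claims}
    previous = json.loads(results_path.read_text()) if results_path.exists() else {}

    drifted = [
        claim_id
        for claim_id, result in current.items()
        if claim_id in previous and previous[claim_id].get("verdict") != result.get("verdict")
    ]

    # Persist this run so the next quarter has a baseline to compare against.
    results_path.parent.mkdir(parents=True, exist_ok=True)
    results_path.write_text(json.dumps(current, indent=2))
    return drifted
```

Comparing verdicts rather than full rationale text keeps the check deterministic even when wording varies between runs.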