Reference Schema

AI SDLC Audit Trail: 12 Fields the FCA Will Ask About

A field-by-field reference for the audit-trail schema regulated firms need when agents touch the software delivery pipeline — what to capture, why, where to store it, and the failure modes that void the evidence.

Published 30 April 2026 · By Sunny Patel, Founder, Agentic AI Associates

Why this is the most-deferred decision in regulated agentic AI

If you do one thing right at the start of an agentic SDLC programme in a regulated firm, do this. Get the audit-trail schema right in Phase 0 and the rest of the programme stays operationally clean. Get it wrong and you are rebuilding it under regulatory pressure six months in.

Below is the 12-field schema we install in regulated engagements. Field by field: what to capture, why a supervisor cares, and the failure modes that look like compliance but are not.

The 12-field schema

01.

event_id

What to capture
Globally unique identifier for the action — UUIDv7 recommended for time-orderability
Why a supervisor cares
Lets you reference any single event in incident response, audit queries, and supervisory disclosures
Common failure modes
Reusing IDs across replays. Using DB autoincrement (creates ordering ambiguity across shards).
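If your runtime's uuid module does not ship a UUIDv7 generator, the minimal Python sketch below builds one following the RFC 9562 bit layout. The function name uuid7 is illustrative. Because the millisecond timestamp occupies the most significant bits, IDs sort by creation time.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7 (RFC 9562 layout): 48-bit Unix ms timestamp, version, variant, random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the most significant bits
    value |= 0x7 << 76                 # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                # RFC 9562 variant bits
    value |= rand_b
    return uuid.UUID(int=value)
```

Generate the ID once, when the action happens, and carry it unchanged through retries and replays. This sketch does not add the optional within-millisecond monotonic counter, so two IDs minted in the same millisecond order randomly. That is one reason to keep timestamp_utc as its own field.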
02.

timestamp_utc

What to capture
ISO-8601 with millisecond precision, always UTC
Why a supervisor cares
Reconstructs sequence; aligns with other regulated logs (trade, transaction, communication) on the same clock
Common failure modes
Local time zones. Second-level precision (loses ordering for fast multi-step runs).
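A minimal Python sketch of the formatting rule, assuming events are stamped in-process with the standard datetime module. It refuses naive datetimes instead of guessing a zone, and it renders UTC with a "Z" suffix. The function name is illustrative.

```python
from datetime import datetime, timezone
from typing import Optional


def timestamp_utc(now: Optional[datetime] = None) -> str:
    """ISO-8601, UTC, millisecond precision, 'Z' suffix."""
    if now is None:
        now = datetime.now(timezone.utc)
    if now.tzinfo is None:
        # A naive datetime could be local time; reject it rather than guess.
        raise ValueError("timestamp must be timezone-aware")
    utc = now.astimezone(timezone.utc)
    # timespec="milliseconds" truncates microseconds to three digits.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```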
03.

agent_id + version_hash

What to capture
Identifier of the agent definition + a content hash of the agent code, prompt, and tool list at runtime
Why a supervisor cares
A regulator's question "exactly which agent did this?" should be answerable in one query. Hashes prevent ambiguity from later edits.
Common failure modes
Logging agent name only. Letting agent definitions mutate without versioning. Storing the hash but not the underlying versioned definition.
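One way to compute version_hash is SHA-256 over a canonical JSON encoding of the three inputs named above. The canonicalisation choices (sorted keys, compact separators, and tool order kept as given because order can influence agent behaviour) and the "sha256:" prefix are assumptions. The function is hypothetical.

```python
import hashlib
import json


def version_hash(agent_code: str, prompt: str, tools: list[str]) -> str:
    """Content hash over exactly what ran: agent code, prompt, and tool list."""
    canonical = json.dumps(
        {"agent_code": agent_code, "prompt": prompt, "tools": tools},
        sort_keys=True,            # stable key order, so equal content hashes equally
        separators=(",", ":"),     # no incidental whitespace
        ensure_ascii=False,
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The hash only proves identity. The versioned definition it was computed from still has to be stored, so a reviewer can re-hash it and compare.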
04.

model + grounding_source

What to capture
Exact model identifier (provider + name + version + region) and the retrieval index, knowledge base, or memory store used
Why a supervisor cares
Lets you reproduce the inference inputs. Required for model-risk attestation (SS1/23 alignment) and for incident replay.
Common failure modes
"claude-3" with no version. Vector-store ID without the index version. Mixing inference-time and training-time grounding.
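A small Python sketch of typed records that refuse to be written with missing parts. The class and field names are hypothetical. Validation catches empty fields only. It cannot tell that a value like a model family name is an alias rather than a pinned version, so an allow-list of pinned versions would be a further step.

```python
from dataclasses import dataclass


def _require(value: str, label: str) -> None:
    if not value.strip():
        raise ValueError(f"{label} must be recorded")


@dataclass(frozen=True)
class ModelRef:
    provider: str
    name: str
    version: str   # exact pinned version, never a family alias
    region: str

    def __post_init__(self) -> None:
        for label in ("provider", "name", "version", "region"):
            _require(getattr(self, label), f"model {label}")


@dataclass(frozen=True)
class GroundingRef:
    source_id: str      # retrieval index, knowledge base, or memory store
    index_version: str  # the snapshot actually queried at inference time

    def __post_init__(self) -> None:
        _require(self.source_id, "grounding source_id")
        _require(self.index_version, "grounding index_version")
```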
05.

principal

What to capture
Human or service identity that initiated the run, typically an OIDC subject claim or a service-account ARN
Why a supervisor cares
Ties an action to an accountable human or named service, not a faceless agent
Common failure modes
Agent self-initiated runs with no parent principal. Service-account chains where the upstream principal is lost.
06.

smf_owner

What to capture
The Senior Manager Function holder accountable for this agent, recorded as a stable internal reference
Why a supervisor cares
Direct mapping from event to SM&CR-attesting human; supports the quarterly attestation pack
Common failure modes
Logging the role name but not the holder's ID (hard to query when SMF holders rotate). Not maintaining a historical mapping of which holder was accountable when.
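To answer "who was accountable on that date?" after holders rotate, you can keep an effective-dated register alongside the stamped reference. In this Python sketch the register contents, the agent ID, and the reference format are all hypothetical. The lookup uses bisect to find the entry in force on a given date.

```python
import bisect
from datetime import date

# Hypothetical effective-dated register: (effective_from, smf_holder_ref), sorted by date.
SMF_REGISTER = {
    "agent:code-review": [
        (date(2025, 1, 1), "smf-ref-001"),
        (date(2026, 3, 1), "smf-ref-002"),
    ],
}


def smf_owner_at(agent_id: str, on: date) -> str:
    """Return the SMF holder reference accountable for agent_id on a given date."""
    history = SMF_REGISTER[agent_id]
    dates = [effective_from for effective_from, _ in history]
    i = bisect.bisect_right(dates, on) - 1  # last entry effective on or before `on`
    if i < 0:
        raise LookupError(f"no SMF owner recorded for {agent_id} on {on}")
    return history[i][1]
```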
07.

data_classes_touched

What to capture
Set of data classifications referenced, e.g. {public, internal, customer_pii, financial_account}
Why a supervisor cares
Allows a single query to answer "did any agent ever touch class X?". Critical for boundary enforcement and breach detection.
Common failure modes
Free-text class labels. Coarse-grained classes that mask sensitive subsets. Logging "customer_data" instead of typed classes.
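A controlled vocabulary is the direct fix for free-text labels. In this Python sketch an Enum holds the four example classes from above. Labels outside it, such as "customer_data", are rejected at write time. The enum and function names are hypothetical, and a real taxonomy would come from your firm's data classification policy.

```python
from enum import Enum


class DataClassification(str, Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CUSTOMER_PII = "customer_pii"
    FINANCIAL_ACCOUNT = "financial_account"


def parse_data_classes(labels: list[str]) -> frozenset[DataClassification]:
    """Map raw labels onto the controlled vocabulary; reject anything unknown."""
    try:
        return frozenset(DataClassification(label) for label in labels)
    except ValueError as exc:
        raise ValueError(f"unknown data class in {labels!r}") from exc
```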
08.

tools_invoked

What to capture
Ordered list of tool calls, each with hashes of its inputs and outputs and its result status
Why a supervisor cares
Reconstructs the action sequence. Supports retrospective tool-level boundary checks.
Common failure modes
Logging tool names without inputs. Storing PII from tool-call payloads in cleartext (store hashes; keep the cleartext under separate access control if it is needed at all).
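A minimal Python sketch of an ordered tool-call record that keeps only payload hashes. It assumes payloads are JSON-serialisable. The canonical JSON encoding, the status vocabulary, and all names are illustrative assumptions.

```python
import hashlib
import json
from dataclasses import dataclass


def payload_hash(payload: object) -> str:
    """SHA-256 over a canonical JSON encoding, so equal payloads hash equally."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ToolCall:
    seq: int          # position within the run, preserving call order
    tool: str
    input_hash: str
    output_hash: str
    status: str       # e.g. "ok" or "error"; this vocabulary is an assumption


def record_tool_call(calls: list[ToolCall], tool: str, inputs: object,
                     outputs: object, status: str) -> ToolCall:
    """Append a call with hashed payloads; the cleartext is never stored here."""
    call = ToolCall(len(calls), tool, payload_hash(inputs), payload_hash(outputs), status)
    calls.append(call)
    return call
```

Hashes still let a reviewer confirm which payload was used, by re-hashing the cleartext retrieved from the separately controlled store.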
09.

decision_rationale

What to capture
The agent's articulated reason for the action, captured at the time, in plain text or structured form
Why a supervisor cares
Required for the Consumer Duty consumer understanding outcome where AI influences customer-facing actions; supports incident triage
Common failure modes
Capturing only the LLM output, not the reasoning. Summarising rationale at write time (loses fidelity).
10.

outcome

What to capture
Enumerated result: success | refused_by_policy | escalated_to_human | failed | aborted | timed_out
Why a supervisor cares
Lets you query the population of outcomes by agent, period, or use case for Model Risk Committee review
Common failure modes
Free-text outcomes. Conflating "refused by policy" with "failed".
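Encoding the six outcomes as an Enum makes free-text values fail at write time and keeps "refused_by_policy" distinct from "failed". This Python sketch also counts outcomes per agent for Model Risk Committee review. It assumes events are dicts with agent_id and outcome keys. The names are hypothetical.

```python
from collections import Counter
from enum import Enum


class Outcome(str, Enum):
    SUCCESS = "success"
    REFUSED_BY_POLICY = "refused_by_policy"
    ESCALATED_TO_HUMAN = "escalated_to_human"
    FAILED = "failed"
    ABORTED = "aborted"
    TIMED_OUT = "timed_out"


def outcomes_by_agent(events: list[dict]) -> dict[str, Counter]:
    """Count outcomes per agent_id across a population of events."""
    counts: dict[str, Counter] = {}
    for event in events:
        outcome = Outcome(event["outcome"])  # raises ValueError on free-text outcomes
        counts.setdefault(event["agent_id"], Counter())[outcome] += 1
    return counts
```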
11.

cost_units

What to capture
Tokens consumed, tool fees, infrastructure metered units, and a denormalised total in pence
Why a supervisor cares
Budget enforcement, chargeback, and cost-attribution analytics. Also flags anomalous runs.
Common failure modes
Input and output token counts merged into one figure (providers usually price them differently, so they must be split). Forgetting tool-call costs.
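A Python sketch of a cost_units record that keeps input and output tokens separate and derives the denormalised pence total. The price card is entirely hypothetical. Decimal avoids binary floating-point drift in money arithmetic, and the two-decimal rounding is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical price card, in pence per 1,000 tokens; input and output priced separately.
PRICE_PENCE_PER_1K = {"input": Decimal("0.24"), "output": Decimal("1.20")}


def cost_units(input_tokens: int, output_tokens: int, tool_fees_pence: Decimal,
               infra_pence: Decimal = Decimal("0")) -> dict:
    """Split token counts plus a denormalised total in pence."""
    token_pence = (
        Decimal(input_tokens) / 1000 * PRICE_PENCE_PER_1K["input"]
        + Decimal(output_tokens) / 1000 * PRICE_PENCE_PER_1K["output"]
    )
    total = (token_pence + tool_fees_pence + infra_pence).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "tool_fees_pence": str(tool_fees_pence),
        "infra_pence": str(infra_pence),
        "total_pence": str(total),  # strings keep the record JSON-safe and exact
    }
```

For example, 10,000 input tokens (2.40p) plus 2,000 output tokens (2.40p) plus 1.5p of tool fees gives a total of 6.30p under this price card.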
12.

review_status

What to capture
unreviewed | sampled_clean | sampled_flagged | escalated | resolved
Why a supervisor cares
Drives the monthly Model Risk Committee sample-review queue and quarterly attestation evidence
Common failure modes
Status set once and never updated. No process to mark events as reviewed.
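Because the primary store is append-only, one option is to record review changes as separate review entries keyed by event_id and to derive the current status from the latest entry, rather than updating events in place. In this Python sketch the transition table is an assumption to adapt to your own review process, and all names are hypothetical.

```python
# Hypothetical transition table; which moves are allowed is a local policy choice.
ALLOWED = {
    "unreviewed": {"sampled_clean", "sampled_flagged"},
    "sampled_flagged": {"escalated", "resolved"},
    "escalated": {"resolved"},
    "sampled_clean": set(),
    "resolved": set(),
}


def advance_review(history: list[tuple[str, str]], new_status: str, reviewer: str) -> None:
    """Append a (status, reviewer) entry; earlier entries are never overwritten."""
    current = history[-1][0] if history else "unreviewed"
    if new_status not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current} to {new_status}")
    history.append((new_status, reviewer))
```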

Storage architecture in one paragraph

Write events to an append-only, hash-chained store with WORM-grade immutability. AWS S3 Object Lock (Compliance mode) and Azure immutable Blob storage are the two most common implementations we see in regulated UK firms. Layer a daily root-hash anchor written to a separate system. Keep PII out of the primary store using hashes plus an entitlement-controlled cleartext sidecar keyed by event_id. Index for common queries (by agent, by SMF owner, by data class, by date range) on a separate read replica or search index — never on the immutable store directly.

What supervisors actually ask for

Five questions, asked in some form, in every supervisory engagement we have observed where AI was a topic:

  1. Show me an agent decision from yesterday and tell me who is accountable. Schema fields needed: 1, 2, 3, 6, 9. If you cannot answer in under two minutes, the audit trail is not yet fit for purpose.
  2. Show me every time an agent touched class X data in the last 90 days. Schema field: 7. Indexed access to data_classes_touched is non-negotiable.
  3. Show me the rate of refused_by_policy outcomes by agent. Schema field: 10. This is the supervisor checking whether your boundary is being tested or just nominal.
  4. What evidence do you have that this agent has not been tampered with since deployment? Schema field: 3. The version hash is your answer; without it you have a credibility problem regardless of other controls.
  5. Walk me through how a single decision was made. Schema fields: 1, 4, 7, 8, 9. End-to-end replay capability — what the agent saw, what tools it used, why it chose what it did.
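Question 2 reduces to a filter on data_classes_touched and timestamp_utc. The Python sketch below is a linear scan over event dicts, standing in for the indexed query you would run against the read replica or search index. Field names follow the schema, and the function name is hypothetical. The "Z" is swapped for "+00:00" because older Python versions' fromisoformat does not accept "Z".

```python
from datetime import datetime, timedelta, timezone


def events_touching(events: list[dict], data_class: str, now: datetime,
                    days: int = 90) -> list[dict]:
    """Events whose data_classes_touched includes data_class within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [
        e for e in events
        if data_class in e["data_classes_touched"]
        and datetime.fromisoformat(e["timestamp_utc"].replace("Z", "+00:00")) >= cutoff
    ]
```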

Frequently asked questions

How long do I need to keep this data?

Default to seven years for FCA-regulated workloads, aligned to most existing record-keeping obligations under SYSC. Some Consumer Duty-related records benefit from longer retention; fraud and AML records may be governed by separate schedules. Confirm with your firm's record-retention policy and your DPO.

Where should the audit trail physically live?

Append-only, hash-chained storage with object lock or equivalent immutability. Common implementations: AWS S3 with Object Lock (Compliance mode), Azure Blob with immutable storage policies, or a managed compliance-grade store. Avoid using your APM (Datadog, New Relic) as the source of truth — APMs are excellent for operational telemetry but most do not meet the tamper-evident expectations a supervisor will probe.

Can I redact PII before writing?

You can and often should. The pattern is: write the structured event with hashes of sensitive fields, store cleartext (if needed at all) in a separately access-controlled store keyed by the same event_id, with an entitlement model that lets only named roles retrieve it. This way the audit trail is fully queryable for governance purposes without becoming a PII liability.
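A minimal in-memory Python sketch of the pattern. The key, the reader roles, and the SidecarStore class are hypothetical stand-ins for a secrets manager, your entitlement system, and a real access-controlled store. It uses a keyed HMAC rather than a bare hash because low-entropy values such as names or account numbers can be recovered from an unkeyed hash by brute force.

```python
import hashlib
import hmac

# Hypothetical: in practice the key lives in a secrets manager, never in the audit store.
HASH_KEY = b"replace-with-managed-secret"
# Hypothetical entitlement model: only these named roles may read cleartext.
CLEARTEXT_READERS = {"dpo", "incident_lead"}


class SidecarStore:
    """Cleartext kept apart from the audit trail, keyed by the same event_id."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, str]] = {}

    def put(self, event_id: str, fields: dict[str, str]) -> None:
        self._rows[event_id] = dict(fields)

    def get(self, event_id: str, role: str) -> dict[str, str]:
        if role not in CLEARTEXT_READERS:
            raise PermissionError(f"role {role!r} may not read cleartext")
        return self._rows[event_id]


def redact(event_id: str, sensitive: dict[str, str], sidecar: SidecarStore) -> dict[str, str]:
    """Park cleartext in the sidecar and return keyed hashes for the audit event."""
    sidecar.put(event_id, sensitive)
    return {
        name: hmac.new(HASH_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
        for name, value in sensitive.items()
    }
```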

Do I really need decision_rationale on every action?

Yes for any action that influences a regulated decision or a consumer-facing outcome. The cheap answer is to capture the agent's scratchpad or chain-of-thought verbatim. The robust answer is to require structured rationale (a typed object: justification, alternatives_considered, policy_constraints) — agents with this requirement built into the prompt produce significantly better audit evidence.
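The typed rationale object above can be enforced at write time. This Python sketch assumes the agent's structured output has already been parsed from JSON into a dict. The three field names come from the text, and the class, the validation rule (non-empty justification), and the function are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rationale:
    justification: str
    alternatives_considered: tuple[str, ...]
    policy_constraints: tuple[str, ...]

    def __post_init__(self) -> None:
        if not self.justification.strip():
            raise ValueError("justification is required")


def parse_rationale(raw: dict) -> Rationale:
    """Validate the agent's structured rationale before it is written to the trail."""
    return Rationale(
        justification=str(raw["justification"]),  # KeyError if the agent omitted it
        alternatives_considered=tuple(raw.get("alternatives_considered", [])),
        policy_constraints=tuple(raw.get("policy_constraints", [])),
    )
```

Rejecting a malformed rationale at write time surfaces the gap immediately, rather than during a later sample review.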

How do I prove tamper-evidence?

Three layers. (1) Storage-level immutability (Object Lock, immutable blob, WORM appliance). (2) Hash-chained event sequencing — each event's hash includes the previous event's hash, so any tampering breaks the chain at the affected point. (3) Periodic anchoring — write a daily root hash to a separate system (or, for higher-stakes use, a public ledger). Most regulated firms get to the first two and stop; the third is appropriate for firms with the highest supervisory exposure.
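Layer (2) can be sketched in a few lines of Python. Each link stores the previous link's hash, and its own hash covers that previous hash plus a canonical JSON encoding of the event. verify() reports the first link where the chain breaks. The genesis value, the encoding, and all names are illustrative assumptions, and a real deployment would write links to the immutable store rather than a list.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first link


def chain_hash(prev_hash: str, event: dict) -> str:
    """Hash of the previous link's hash plus a canonical encoding of this event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list[dict], event: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev_hash": prev, "hash": chain_hash(prev, event)})


def verify(log: list[dict]) -> int:
    """Return the index of the first broken link, or -1 if the chain is intact."""
    prev = GENESIS
    for i, link in enumerate(log):
        if link["prev_hash"] != prev or link["hash"] != chain_hash(prev, link["event"]):
            return i
        prev = link["hash"]
    return -1
```

Someone who can rewrite the entire chain can also recompute every hash, which is why layer (3) matters. Writing the day's final link hash to a separate system gives you a value that a rewritten chain would no longer match.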

How does this differ from a typical application audit log?

A typical application audit log captures user actions on application surfaces. An AI SDLC audit trail captures agent decisions, the inputs they relied on, the rationale they articulated, and the actions they took. The granularity is finer (per-decision, not per-request), the schema is richer (rationale, model, grounding), and the retention policy is stricter. They are complementary; the application audit log does not substitute.

Is this overkill for a small fintech?

No. The cost of building the schema right at the start of an agentic SDLC programme is roughly two engineer-weeks. The cost of bolting it on after a supervisor or audit asks for it is closer to three engineer-months plus a credibility tax. Smaller firms have less margin to absorb the second cost. Build it once, properly, in Phase 0 of the framework.

Build this into your programme

A Phase-Gate Diagnostic produces the audit-trail schema, storage architecture, and access-control model your engineering and compliance teams need to ship in Phase 0.
