Reference Architecture · 15 min read

Agentic SDLC for Regulated Engineering Teams

A reference architecture for AI-augmented software delivery in FCA-supervised firms — phased framework, audit-trail design, SM&CR control mapping, and the failure modes we see most often in 2026.

Published 30 April 2026 · By Sunny Patel, Founder, Agentic AI Associates

The thesis in one paragraph

Agentic SDLC fails in regulated firms not because the agents are bad, but because the firms try to retrofit autonomy onto a pipeline never designed for non-deterministic actors. The fix is not better prompts. It is a phased adoption that earns audit trust before it grants action latitude — and an audit-trail schema designed from the first commit, not bolted on after the FCA shows up.

Why this is on every regulated CTO's desk in 2026

Three forces are converging. First, frontier models in 2025–2026 became measurably useful at code generation, test design, and dependency review for production codebases over 100k LOC. Second, frameworks like LangGraph, Bedrock Agents, and Claude Code matured to the point that multi-step agents can hold state across hours-long delivery tasks. Third, the FCA's AI Approach and the broader Consumer Duty regime crystallised expectations: regulated firms are accountable for AI outcomes, model risk, and consumer-facing decisions, with named Senior Manager Function holders attesting.

The result: every Head of Engineering at an FCA-supervised firm has been told, by their CTO or COO, to "figure out the AI in the pipeline" — without compromising any of the controls that make their pipeline supervised in the first place.

This page is the framework we hand to those Heads of Engineering. It is not a vendor pitch and it is not a maturity model. It is a phased delivery and audit architecture you can implement, and the failure modes we have observed often enough to name.

The five-phase framework

The phases are sequential. Skipping is the most common failure mode we see; the second is collapsing all five into a single pilot and "evaluating" it.

Phase | Name | Duration | Output
Phase 0 | Boundary mapping | 1–2 weeks | Agentic Capability Boundary document: what the agent layer is allowed to touch, split by regulated and non-regulated data domains.
Phase 1 | Read-only augmentation | 4–6 weeks | Agents that read, summarise, and propose, but never act. PR review, test generation suggestions, design-doc critique. Audit trail is observation-grade.
Phase 2 | Suggested actions, human approval | 6–8 weeks | Agents that draft commits, IaC changes, and tickets. Every action requires named human approval. Audit trail is decision-grade.
Phase 3 | Bounded autonomous action | 8–12 weeks | Agents that act within a tightly scoped envelope (specific repos, pre-approved change types, value caps). Audit trail is regulatory-grade.
Phase 4 | Multi-agent delivery | Ongoing | Coordinated agents handling spec → code → test → deploy → monitor across approved envelopes. Each agent has a named SMF-attested owner.

Phase 0 — Boundary mapping

The phase most firms skip and most regret skipping. Before any agent touches the codebase, you draft a one-page Agentic Capability Boundary: a document that names every system the agent layer will read or act on, classifies the data therein under your existing data-classification schema, and identifies which Senior Manager Function holder owns each system.

The Boundary is a refusal as much as a permission. It is the document you point to when someone asks "could the agent get to the customer KYC store?" — the answer is "no, and here is the SMF24-signed page that says so."

Phase 0 takes one to two weeks and is led by your Head of Engineering with input from the CISO, the data-protection lead, and whichever Senior Manager will own the agentic delivery scope (usually SMF24 in firms where it exists, otherwise SMF4 or SMF18).
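
The Boundary works best when the agent runtime can consult it, not only people. A minimal sketch of that idea follows, assuming the one-page document is mirrored as structured data. The `BoundaryEntry` type, the `may_access` helper, and the two example systems are hypothetical names chosen for illustration, not part of any tool mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryEntry:
    system: str      # system the agent layer may touch
    data_class: str  # label from the firm's existing data-classification schema
    smf_owner: str   # Senior Manager Function holder who owns the system
    access: str      # "read" or "act"


# Hypothetical entries; a real Boundary names every in-scope system.
BOUNDARY = {
    entry.system: entry
    for entry in [
        BoundaryEntry("ci-pipeline", "internal", "SMF24", "act"),
        BoundaryEntry("design-docs", "internal", "SMF24", "read"),
    ]
}


def may_access(system: str, mode: str) -> bool:
    """Default deny: a system that is not named in the Boundary is refused."""
    entry = BOUNDARY.get(system)
    if entry is None:
        return False
    if mode == "read":
        return True  # every listed system is at least readable
    if mode == "act":
        return entry.access == "act"
    return False  # unknown modes are refused as well
```

Because unknown systems are refused by default, the KYC question is answered by absence: `may_access("customer-kyc-store", "read")` returns `False` without anyone having to write a deny rule.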

Phase 1 — Read-only augmentation

The agents in Phase 1 do four jobs and only four: summarise (existing code, PRs, design docs), review (proposing changes for human consideration), generate-for-review (test cases, IaC suggestions, dependency notes), and retrieve-and-synthesise (linking related issues, prior incidents, internal documentation).

They do not commit, push, or invoke deployments, and they do not call paid third-party APIs without per-action human approval. The audit trail captures every prompt, retrieval, and output, but the audit grade is "observation": you are recording what the agent saw and proposed, not what it changed.

The win in Phase 1 is calibration. Your engineers learn what the agents are good and bad at, your CISO and DPO see the audit data flow in real conditions, and your Senior Manager develops a defensible posture before authority increases.

Most firms try to skip directly to Phase 2 or 3. We have not yet seen one succeed. Phase 1 is four to six weeks of trust-building you cannot buy back later.

Phase 2 — Suggested actions, human approval

Phase 2 introduces action proposals: the agent drafts a commit, an IaC change, a Jira ticket, a database migration plan. Every proposal is presented to a named human reviewer who approves, modifies, or rejects. The reviewer's identity is logged alongside the agent's proposal.

The audit grade upgrades to "decision": you are recording the joint decision of agent and human, with both contributions distinguishable and both retrievable. Crucially, you can answer the question "who made this change?" with both names attached, and you can answer "why?" with the agent's articulated rationale.
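
A decision-grade record can be sketched as two linked, immutable objects: the agent's proposal and the human's verdict on it. The field names and the `decide` helper are assumptions for illustration. The only requirements taken from the text are that the reviewer is named, the verdict is approve, modify, or reject, and both contributions stay distinguishable.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

VERDICTS = {"approved", "modified", "rejected"}


@dataclass(frozen=True)
class Proposal:
    agent_id: str
    summary: str    # what the agent wants to change
    rationale: str  # the agent's articulated "why"


@dataclass(frozen=True)
class Decision:
    proposal: Proposal                      # agent contribution, kept verbatim
    reviewer: str                           # named human reviewer
    verdict: str                            # one of VERDICTS
    modified_summary: Optional[str] = None  # human contribution when modified


def decide(proposal: Proposal, reviewer: str, verdict: str,
           modified_summary: Optional[str] = None) -> Decision:
    """Build a decision record, refusing anonymous or ambiguous reviews."""
    if not reviewer.strip():
        raise ValueError("a named human reviewer is required")
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict}")
    if verdict == "modified" and not modified_summary:
        raise ValueError("a modified verdict must record what was changed")
    return Decision(proposal, reviewer, verdict, modified_summary)


def who_and_why(decision: Decision) -> Tuple[str, str, str]:
    """Answer 'who made this change?' with both names, and 'why?'."""
    return (decision.proposal.agent_id, decision.reviewer,
            decision.proposal.rationale)
```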

This is the first phase where the Consumer Duty consumer understanding outcome applies, even for internal engineering tools, because it sets the precedent for how the firm will handle agent-influenced decisions when they later reach customer surfaces.

Phase 3 — Bounded autonomous action

The agent acts within a tightly drawn envelope. The envelope is defined by: specific repositories, pre-approved change types (e.g. dependency upgrades within a semver range, test-coverage-only commits, IaC drift remediation), value caps (e.g. a deployment that consumes under a defined £-cap of cloud spend), and time windows (e.g. business hours, post-deploy soak periods).
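
The four envelope dimensions map naturally onto a single pre-flight check. The sketch below is one possible shape, not a prescribed implementation: the `Envelope` and `ProposedAction` types, the example repository name, and the £50 cap are all hypothetical, and the time window is a naive same-day comparison that ignores weekends and time zones.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, List


@dataclass(frozen=True)
class Envelope:
    repos: FrozenSet[str]         # specific repositories
    change_types: FrozenSet[str]  # pre-approved change types
    spend_cap_gbp: float          # value cap on cloud spend
    window_start: time            # start of the permitted time window
    window_end: time              # end of the permitted time window


@dataclass(frozen=True)
class ProposedAction:
    repo: str
    change_type: str
    estimated_spend_gbp: float
    at: datetime


def violations(env: Envelope, action: ProposedAction) -> List[str]:
    """Return every reason the action falls outside the envelope."""
    problems = []
    if action.repo not in env.repos:
        problems.append("repo outside envelope")
    if action.change_type not in env.change_types:
        problems.append("change type not pre-approved")
    if action.estimated_spend_gbp > env.spend_cap_gbp:
        problems.append("spend above cap")
    if not (env.window_start <= action.at.time() < env.window_end):
        problems.append("outside time window")
    return problems


# Hypothetical envelope: one repo, one change type, a 50 GBP cap, 09:00-17:00.
ENVELOPE = Envelope(
    repos=frozenset({"payments-lib"}),
    change_types=frozenset({"dependency_upgrade"}),
    spend_cap_gbp=50.0,
    window_start=time(9, 0),
    window_end=time(17, 0),
)
```

The agent acts only when `violations` returns an empty list. Returning all reasons rather than the first gives the audit trail a complete explanation whenever the outcome is a policy refusal.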

The audit grade reaches "regulatory": each autonomous action is logged with a complete decision trail, attestation by the SMF owner, and a sample-review schedule (typically 5–10% of actions reviewed monthly by the Model Risk Committee).
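
The sample itself should be hard to game. One option, offered here as an illustration rather than something the framework prescribes, is to derive selection from a hash of the event ID and the review period, so the same action is always in or out of the sample and nobody chooses after the fact. The function name and the 5% default rate are assumptions.

```python
import hashlib


def selected_for_review(event_id: str, period: str, rate: float = 0.05) -> bool:
    """Deterministically pick roughly `rate` of events for a review period.

    The SHA-256 digest of "period:event_id" is mapped to a number in [0, 1);
    the event is sampled when that number falls below the rate.
    """
    digest = hashlib.sha256(f"{period}:{event_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Including the period in the hash means an event skipped in one month's sample is not permanently exempt if it is re-sampled under a different period label.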

Phase 3 needs the control plane built in, not bolted on. By this stage you have it, because Phases 1 and 2 forced you to build the audit pipeline first.

Phase 4 — Multi-agent delivery

Phase 4 is the steady state, not a finish line. Multiple agents — each with a single responsibility, a permitted scope, an SMF owner, and a budget — coordinate to deliver work end-to-end within their combined envelopes. A spec agent writes the design from a Linear ticket; a coding agent implements; a test agent expands coverage; a review agent runs static and dynamic checks; a deploy agent pushes through pre-approved environments.

None of the individual agents is autonomous in the sense the technology press uses the word. The system appears autonomous because the human approval gates are placed at the joints, not at every action.
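
Placing gates at the joints can be sketched as a pipeline runner that only asks for approval after named stages. This is a toy model under stated assumptions: each agent is reduced to a function that appends its name to a work trail, and `run_pipeline`, `STAGES`, and the `approve` callback are hypothetical names.

```python
from typing import Callable, List, Optional, Set, Tuple

Work = List[str]
Stage = Tuple[str, Callable[[Work], Work]]

# Stand-ins for real agents: each one just records that it ran.
STAGES: List[Stage] = [
    ("spec", lambda work: work + ["spec"]),
    ("code", lambda work: work + ["code"]),
    ("test", lambda work: work + ["test"]),
    ("deploy", lambda work: work + ["deploy"]),
]


def run_pipeline(
    work: Work,
    stages: List[Stage],
    gated_after: Set[str],
    approve: Callable[[str, Work], bool],
) -> Tuple[Optional[str], Work]:
    """Run stages in order, pausing for human approval only at gated joints.

    Returns (stage_where_halted, work). The first element is None when the
    pipeline ran to completion.
    """
    for name, step in stages:
        work = step(work)
        if name in gated_after and not approve(name, work):
            return name, work  # halted at this joint; later stages never run
    return None, work
```

With `gated_after={"test"}`, the spec, code, and test agents run without interruption, and a single human decision stands between the tested change and deployment.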

The audit-trail schema (the part nobody else writes about)

Every regulated agentic SDLC implementation we have run starts and ends with the audit-trail schema. It is the single most important architectural decision and the one most often deferred. Below is the minimum schema we install in Phase 0 and never have to redesign in later phases.

Field | Purpose
event_id | Immutable UUID for the action
timestamp_utc | ISO-8601 with millisecond precision
agent_id + version_hash | Pin the exact agent definition responsible
model + grounding_source | Reproduce the inference inputs
principal | Human or service identity that initiated the run
smf_owner | Senior Manager Function holder accountable for this agent (often SMF24, Chief Operations)
data_classes_touched | Set notation, e.g. {public, internal, customer_pii}; checked against the agent's permitted_classes
tools_invoked | Ordered list of tool calls with their input/output hashes
decision_rationale | The agent's articulated reason, recorded in support of the Consumer Duty consumer understanding outcome
outcome | One of: success, refused_by_policy, escalated_to_human, failed
cost_units | Tokens, tool fees, infrastructure; used for budget enforcement and chargeback
review_status | One of: unreviewed, sampled_clean, sampled_flagged; feeds the monthly Model Risk Committee
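
The schema translates directly into a typed record. The sketch below is one way to express it, with some assumptions: the two compound rows are split into separate fields, `tools_invoked` is simplified to a tuple of strings, and the `classes_outside_permission` helper is a hypothetical name for the check against permitted classes.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import FrozenSet, Iterable, Set, Tuple

OUTCOMES = {"success", "refused_by_policy", "escalated_to_human", "failed"}
REVIEW_STATUSES = {"unreviewed", "sampled_clean", "sampled_flagged"}


def _now_utc_ms() -> str:
    # ISO-8601 in UTC with millisecond precision.
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds")


@dataclass(frozen=True)
class AuditEvent:
    agent_id: str
    version_hash: str
    model: str
    grounding_source: str
    principal: str
    smf_owner: str
    data_classes_touched: FrozenSet[str]
    tools_invoked: Tuple[str, ...]  # simplified: real entries carry I/O hashes
    decision_rationale: str
    outcome: str
    cost_units: float
    review_status: str = "unreviewed"
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp_utc: str = field(default_factory=_now_utc_ms)

    def __post_init__(self) -> None:
        # Reject values outside the enumerations listed in the schema.
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome}")
        if self.review_status not in REVIEW_STATUSES:
            raise ValueError(f"unknown review_status: {self.review_status}")


def classes_outside_permission(event: AuditEvent,
                               permitted_classes: Iterable[str]) -> Set[str]:
    """Data classes the event touched that the agent was not permitted to."""
    return set(event.data_classes_touched) - set(permitted_classes)
```

Freezing the dataclass mirrors the append-only intent: a later review outcome is written as a new record rather than by mutating an existing one.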

The schema is written to an append-only, hash-chained store with 7-year retention. We typically use Amazon S3 with Object Lock in compliance mode or immutable storage for Azure Blob Storage. The storage lock prevents records from being overwritten or deleted during retention; the hash chain makes any tampering detectable and gives you a forensic posture if a regulator ever asks.
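
The hash-chaining idea is small enough to show in full. In this minimal sketch an in-memory list stands in for the immutable object store, and `append_event` and `verify_chain` are hypothetical helper names. Each entry's hash covers the previous entry's hash plus a canonical JSON form of the record, so altering any earlier record invalidates every hash after it.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, record: Dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain: List[Dict], record: Dict) -> Dict:
    """Append a record, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS_HASH
    entry = {
        "prev_hash": prev_hash,
        "record": record,
        "entry_hash": _entry_hash(prev_hash, record),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: List[Dict]) -> bool:
    """Recompute every hash from the start; any edit breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Verification on its own only detects tampering. Preventing it is the job of the storage-level lock, which is why the two are used together.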

A common shortcut — using your APM (Datadog, New Relic) as the audit store — fails the FCA's "tamper-evident" expectations and re-creates the work later under regulatory pressure. Build it once, properly, in Phase 0.

SM&CR mapping

For every agent you put into production, name the SMF holder who owns it. In our 2026 engagements with UK firms, the typical mapping is:

  • SMF24 (Chief Operations) — owns agents that touch operational processes, including engineering automation
  • SMF4 (Chief Risk) — owns model-risk attestation across all agents and the AI Risk Register
  • SMF18 (Other Overall Responsibility) — fallback for firms without SMF24, often the CTO carries this
  • SMF16 (Compliance Oversight) — owns the controls that ensure agent outputs meet regulatory perimeter (e.g. financial promotions, advice boundaries)

The mapping is not academic. Each agent's record in the agent register cites its SMF owner, and the SMF owner signs an attestation each quarter that the agent operated within its boundary. If the agent did not, the record shows what it did instead and what was done about it.
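
The quarterly attestation becomes easy to chase when the agent register is queryable. A minimal sketch, assuming the register is a list of records and quarters are labelled with strings such as "2026-Q1"; the `AgentRecord` type, the example agents, and the `missing_attestations` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class AgentRecord:
    agent_id: str
    smf_owner: str  # e.g. "SMF24"
    attested_quarters: Set[str] = field(default_factory=set)


def missing_attestations(register: Iterable[AgentRecord],
                         quarter: str) -> List[str]:
    """Agents whose SMF owner has not yet attested for the given quarter."""
    return sorted(
        record.agent_id
        for record in register
        if quarter not in record.attested_quarters
    )


# Hypothetical register with one attested and one outstanding agent.
REGISTER = [
    AgentRecord("dep-upgrader", "SMF24", {"2026-Q1"}),
    AgentRecord("review-agent", "SMF24"),
]
```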

Failure modes we have catalogued

  • Phase-skipping pilots. Firm tries Phase 3 in week one, gets a working demo, declares victory, has no audit trail. Recovery: full restart at Phase 0, three months lost.
  • "Sandbox" overreach. Agent runs in a sandbox that turns out to share an IAM identity with the production data plane. Recovery: incident, one-time exemption with the FCA, network re-segmentation.
  • Audit trail in the APM. Datadog or New Relic used as the source of truth. Tampering not preventable. Recovery: rebuild audit pipeline at month six, double work.
  • Anonymous model use. Agent can choose any model on any provider; no version-pinning. SMF holder cannot attest. Recovery: pin model + version per agent, lose flexibility, regret nothing.
  • No human-in-the-loop budget. Phase 2 designed but reviewers' time was never carved out; queue grows; team gives up. Recovery: 0.5 FTE per active agent for the first six months.
  • Tool-call sprawl. Agent given a generic "execute_shell" tool; permits anything. Recovery: typed, narrow tools per agent, list approved by SMF.
  • The "we'll write the policy later" failure. The agent ships before the policy. Recovery: policy + register written before any code change, even if it slows Phase 1 by a week.
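
The tool-call sprawl recovery, typed and narrow tools per agent, can be illustrated with a small dispatcher. This is a sketch under assumptions: the `dep-upgrader` agent, the `bump_dependency` tool, and the `call_tool` function are hypothetical, and the tool returns a description of the change rather than performing it.

```python
import re
from typing import Callable, Dict

SEMVER = re.compile(r"\d+\.\d+\.\d+")


def bump_dependency(package: str, version: str) -> str:
    """A narrow tool: it can only describe one dependency bump."""
    if not SEMVER.fullmatch(version):
        raise ValueError(f"not a plain semver version: {version!r}")
    return f"bump {package} to {version}"


# Per-agent allowlist; in the framework this list is approved by the SMF owner.
APPROVED_TOOLS: Dict[str, Dict[str, Callable[..., str]]] = {
    "dep-upgrader": {"bump_dependency": bump_dependency},
}


def call_tool(agent_id: str, tool_name: str, **kwargs: str) -> str:
    """Dispatch a tool call only if the tool is approved for this agent."""
    tools = APPROVED_TOOLS.get(agent_id, {})
    if tool_name not in tools:
        raise PermissionError(f"{tool_name} is not approved for {agent_id}")
    return tools[tool_name](**kwargs)
```

A request for `execute_shell` raises `PermissionError` before anything runs, and a malformed version string is rejected by the tool's own validation, so neither a missing allowlist entry nor unexpected input widens what the agent can do.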

What this costs in time and money

For a regulated firm with 30–80 engineers and a working CI/CD pipeline, our experience is:

  • Phase 0 + 1: 5–8 weeks elapsed, 1.5 FTE engineering effort, plus 0.2 FTE risk + compliance. Direct vendor cost: minimal (model API spend under £2k/month).
  • Phase 2: +6–8 weeks, 2.0 FTE, control-plane build begins in parallel.
  • Phase 3: +8–12 weeks, 3.0 FTE peak, control plane reaches regulatory grade.
  • Phase 4: ongoing, ~2.0 FTE platform team plus per-agent ownership distributed across product teams.

Total elapsed: 6–9 months from board sign-off to a multi-agent delivery system in production. Total cost of build: typically £400k–£700k including platform, control-plane, and operational team carve-out for the first 18 months. Returns we have measured for firms in Phase 4 sit between 18% and 35% reduction in lead time per change, with no measurable regression in quality or incident rates when the framework is followed.

Apply this to your firm

A Phase-Gate Diagnostic runs the framework above against your specific regulatory perimeter, engineering capacity, and existing controls. Two weeks, £6,500, written architecture and operating-model assessment delivered to your board.
