Reference Framework · 12 min read

Agent Studio: Build vs Buy for Regulated Firms (2026)

Compare 7 agent platforms on 12 criteria: LangGraph, Bedrock, Copilot Studio, and more. The scoring matrix for FCA-regulated UK firms.

Published 30 April 2026 · By Sunny Patel, Founder, Agentic AI Associates

TL;DR

If you are a £bn-AUM, FCA-regulated firm asking whether to build or buy your agent studio, the honest answer is hybrid: buy the agent platform, build the control plane. The platform decision (LangGraph, Bedrock, Copilot Studio, Vertex, Writer, Glean) is a make-or-buy of orchestration mechanics. The control-plane decision — policy, identity, audit, model-risk attestation, SM&CR approval — is non-negotiably yours, because the FCA holds your Senior Manager accountable, not the vendor.

Below are the 12-criterion matrix we use in Phase-Gate Diagnostic engagements, the decision tree we hand to boards, and the three patterns we have seen succeed in regulated UK firms in 2025–2026.

Why "build vs buy" is the wrong question

In the conversations we have with CTOs and Heads of Engineering at FCA-supervised firms — wealth managers, payments platforms, insurance carriers, savings apps — the build-vs-buy framing arrives pre-loaded with a mistake: it assumes the agent platform is the strategic asset. It is not. The strategic asset is the control plane: the layer that governs which agents can access which data under which policy, with what audit, attested by which Senior Manager.

We have yet to see a regulated firm where the agent platform itself is the differentiator. We have seen many where the control plane is — because the control plane is what your auditor, your compliance team, your CRO, and the FCA all want a single place to inspect. Vendors do not give you that single place. You have to design it.

Reframed properly, the question becomes:

1. Which agent platform minimises time-to-production and ongoing engineering load while satisfying our regulatory perimeter?
2. Which control-plane components must we own outright, regardless of platform choice?
3. How do we sequence the two so that platform decisions do not foreclose control-plane options later?

The matrix below answers the first question. The decision tree at the end answers the second and third.

The 12-criterion scoring matrix

Seven platforms, twelve criteria, scored as we have observed them perform in 2025–2026 engagements with UK regulated firms. Where a platform's behaviour depends on tier or configuration, we have noted the realistic enterprise default rather than the marketing claim.

#	Criterion	Weight	LangGraph	Bedrock Agents	Copilot Studio	Vertex AI Agent Builder	Writer Palmyra	Glean Workflow	Custom build
1	Audit-trail completeness	Critical for FCA	Strong: full state graph capturable	Strong: CloudTrail + Bedrock logs	Partial: Purview integration but coarse	Strong: Cloud Logging + Trace	Partial: workflow logs, no token-level trace	Strong: full step + retrieval log	As-built: depends on instrumentation
2	Data residency (UK/EU)	Critical for FCA	Whatever you deploy	eu-west-2 + UK Sovereign region	UK data boundary (M365)	europe-west2 (London)	EU only on enterprise tier	Dedicated EU tenant	Whatever you deploy
3	Model portability	High	Any LLM via adapter	Bedrock catalogue (~50 models, no Anthropic in some regions)	GPT-4 family + Phi (locked)	Gemini + Anthropic + Llama on Vertex	Palmyra + select frontier (locked stack)	OpenAI/Anthropic/Gemini swap	Anything: you own the call
4	On-prem / air-gap option	High for some regulated firms	Yes: open source	No	No	No	No (private dedicated only)	No	Yes
5	Identity + RBAC integration	High	Build it (OIDC libs available)	IAM-native	Entra ID native, strongest in class	IAM-native	SAML/SCIM standard	SAML/SCIM, Slack/Drive/Confluence ACLs respected	Whatever you build
6	SM&CR mapping support	Critical for FCA	Build it	Manual via tags + IAM	Manual via Purview labels	Manual via labels + IAM	Workflow-level approvals	Workflow-level approvals	You design it
7	Tool/skill ecosystem	High	Vast: any Python lib	Bedrock action groups + Lambda	Power Platform (~1,400 connectors)	Vertex extensions + functions	Writer skills marketplace (limited)	Curated enterprise connectors	Whatever you ship
8	Eval + observability	High	LangSmith (paid) or roll your own	Bedrock Evaluations (basic) + Studio	Copilot Studio Analytics (basic)	Vertex AI Evaluation	Built-in eval suite (good)	Strong analytics, weak eval	Build with Langfuse/Helicone/Arize
9	Cost predictability	Medium	Token + infra (variable)	Token + agent invocation (variable)	Per-user/month (predictable)	Token + agent (variable)	Per-user/month (predictable)	Per-user/month (predictable)	Whatever you build to
10	Time-to-first-agent (regulated context)	Medium	6–10 weeks	4–8 weeks	2–4 weeks	4–8 weeks	4–6 weeks	2–6 weeks	12–20 weeks
11	Vendor concentration risk	High	Low: open source	High: AWS lock-in	Very high: MS lock-in	High: GCP lock-in	High	High	Low (your code)
12	Best for	n/a	Engineering-led firms with platform team	AWS-native shops with regulated workloads	M365/Dynamics shops, internal copilots	GCP-native shops, multi-modal needs	Regulated enterprise, content-heavy	Knowledge-worker productivity layer	Firms with strategic AI moat ambitions

A platform's score on any single criterion is rarely the deciding factor. What decides the outcome is which criteria carry the most weight for your firm. A wealth manager with £5bn AUM and a UK-only client base will weight criteria 1, 2, 6, and 11 (audit, residency, SM&CR, concentration) above criteria 7 and 9 (ecosystem, cost predictability). A neobank with a US expansion plan inverts that.
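To make the weighting point concrete, here is a minimal sketch of a weighted score. The platform names, the 1–5 scores, and the weights are all hypothetical and are not taken from the matrix; the only thing it illustrates is that the same scores can rank differently under two weighting profiles.

```python
# Hypothetical 1-5 scores for two unnamed platforms (NOT the matrix values).
SCORES = {
    "platform_x": {"audit": 5, "residency": 5, "smcr": 4,
                   "concentration": 4, "ecosystem": 2, "cost": 2},
    "platform_y": {"audit": 3, "residency": 3, "smcr": 2,
                   "concentration": 2, "ecosystem": 5, "cost": 5},
}

# Hypothetical weights: the wealth manager stresses criteria 1, 2, 6, 11;
# the neobank stresses criteria 7 and 9.
WEIGHTS = {
    "uk_wealth_manager": {"audit": 3, "residency": 3, "smcr": 3,
                          "concentration": 3, "ecosystem": 1, "cost": 1},
    "neobank_us_expansion": {"audit": 1, "residency": 1, "smcr": 1,
                             "concentration": 1, "ecosystem": 3, "cost": 3},
}


def weighted_score(scores, weights):
    """Weighted average of the criterion scores."""
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


def rank(profile):
    """Platforms ordered best-first under the given weighting profile."""
    weights = WEIGHTS[profile]
    return sorted(SCORES,
                  key=lambda p: weighted_score(SCORES[p], weights),
                  reverse=True)


print(rank("uk_wealth_manager"))
print(rank("neobank_us_expansion"))
```

With these illustrative numbers the ranking flips between the two profiles, which is the behaviour the paragraph above describes.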

Three patterns that work

Pattern A — Buy platform, build control plane (the £bn fintech default)

This is the pattern we have seen succeed most often in regulated UK firms with £500m+ AUM or £100m+ revenue. The firm picks an agent platform, typically Bedrock Agents (if AWS-native), Copilot Studio (if M365-native), or LangGraph self-hosted (if engineering-led with a platform team), and builds an in-house control plane that sits in front of every agent invocation.

The control plane is the single place where:

An agent identity is resolved to a Senior Manager-approved scope
The data classification of every retrieval is checked against the agent's permitted classes
The decision-and-tool-call log is written to an immutable audit store with 7-year retention
Budget and rate limits are enforced per-agent, per-team, and per-purpose
Model risk attestation is bound to model + version + grounding source
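The five responsibilities above can be sketched as a single gate that every agent invocation passes through. This is a minimal illustration, not a reference implementation: the class and field names are hypothetical, the audit store is an in-memory list (a real one would be a write-once store with retention controls), and hash chaining is a swapped-in stand-in for immutability that only makes tampering detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class AgentScope:
    approved_by: str         # Senior Manager who approved this scope (hypothetical field)
    data_classes: frozenset  # data classifications the agent may retrieve
    budget_pence: int        # remaining spend allowance for this agent
    model: str               # attested model + version string


@dataclass
class ControlPlane:
    scopes: dict                               # agent_id -> AgentScope
    audit_log: list = field(default_factory=list)

    def _audit(self, event):
        # Chain each record to the previous hash so edits are detectable.
        prev = self.audit_log[-1]["hash"] if self.audit_log else ""
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.audit_log.append({"event": event, "prev": prev, "hash": digest})

    def authorise(self, agent_id, data_class, model, cost_pence):
        """Allow or deny one invocation and log the decision either way."""
        scope = self.scopes.get(agent_id)
        if scope is None:
            reason = "unknown agent"
        elif data_class not in scope.data_classes:
            reason = "data class not permitted"
        elif model != scope.model:
            reason = "model not attested"
        elif cost_pence > scope.budget_pence:
            reason = "budget exceeded"
        else:
            reason = None
            scope.budget_pence -= cost_pence
        self._audit({"agent": agent_id, "data_class": data_class,
                     "model": model, "cost_pence": cost_pence,
                     "allowed": reason is None, "reason": reason})
        return reason is None


cp = ControlPlane(scopes={
    "kyc-agent": AgentScope("SMF24 holder", frozenset({"internal"}), 500, "model-a@v1"),
})
print(cp.authorise("kyc-agent", "internal", "model-a@v1", 120))
print(cp.authorise("kyc-agent", "client-pii", "model-a@v1", 120))
```

The design choice worth noting is that denials are logged as well as approvals, so the audit trail shows what an agent attempted, not only what it was allowed to do.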

Time to first agent in production: 8–12 weeks. Time to scaled deployment across 5+ business areas: 6–9 months. Vendor lock-in is real but bounded — the control plane is portable, so a platform swap is contained to the agent-orchestration layer.

Pattern B — Vendor stack with shadow control plane (the time-to-market path)

This is the right pattern for SMEs in regulated industries (sub-£100m revenue), where the cost of a custom control plane is disproportionate and time-to-market is the binding constraint.

Choose a single vendor stack that bundles agent platform, identity, retrieval, and observability — typically Glean, Writer, or Microsoft (Copilot Studio + Purview + Sentinel). Layer a thin "shadow control plane" on top consisting of:

A board-signed agent register (spreadsheet, then a simple internal app) listing every deployed agent with owner, scope, model, data classes, and approval date
A weekly export of the vendor's audit log into your own immutable store
A monthly model-risk review meeting feeding into the AI Risk Register
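As an illustration of what "operationally live" could mean for the agent register, the sketch below models a register entry with the fields listed above and reports deployed agents that are missing from the register or lack an approval date. The type and function names are hypothetical, and the list of deployed agent IDs is assumed to come from the vendor's admin export.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RegisterEntry:
    agent_id: str
    owner: str
    scope: str
    model: str
    data_classes: tuple
    approval_date: Optional[date]  # None means not yet signed off


def register_gaps(deployed_agent_ids, register):
    """Map each deployed agent with a register problem to a short reason."""
    by_id = {entry.agent_id: entry for entry in register}
    gaps = {}
    for agent_id in deployed_agent_ids:
        entry = by_id.get(agent_id)
        if entry is None:
            gaps[agent_id] = "not in register"
        elif entry.approval_date is None:
            gaps[agent_id] = "no approval date"
    return gaps


register = [
    RegisterEntry("hr-helper", "Head of People", "HR policy Q&A",
                  "vendor-default", ("internal",), date(2026, 1, 15)),
    RegisterEntry("claims-triage", "Head of Claims", "claims routing",
                  "vendor-default", ("client-pii",), None),
]
print(register_gaps(["hr-helper", "claims-triage", "sales-bot"], register))
```

An empty result is the condition a Senior Manager would want to be able to show on demand; anything else is a gap to close before the next review meeting.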

You accept platform lock-in in exchange for buying back six months of engineering time. This is defensible to the FCA if and only if the agent register is operationally live and the attesting Senior Manager can demonstrate it.

Pattern C — Build everything (rare, justified)

Justified in three cases: (a) AI delivery is the product, not a productivity layer; (b) sovereign or air-gapped deployment requirements rule out every vendor; or (c) the firm operates at a scale where vendor fees exceed £400k/year and platform-team economics flip.

Time to first production agent: 16–24 weeks. Total cost of ownership for the first 18 months is rarely below £600k including platform team, eval infrastructure, and observability. Worth it when AI is a moat. Almost never worth it when AI is an enabler.
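One way to read the two figures together is as simple arithmetic: £400k/year in vendor fees over the same 18 months comes to the £600k floor quoted for a custom build. The sketch below checks that, and also shows how long a £30k/month fee level takes to reach the same figure. It is an interpretation rather than a stated derivation, and it ignores ongoing build costs after month 18, which the text does not quantify.

```python
VENDOR_FEES_PER_YEAR_GBP = 400_000  # threshold in case (c)
BUILD_TCO_FLOOR_GBP = 600_000       # 18-month floor for building everything
HORIZON_MONTHS = 18


def vendor_fees_over(months, fees_per_year_gbp):
    """Cumulative vendor fees over a number of months."""
    return fees_per_year_gbp * months / 12


def months_to_match(build_cost_gbp, fees_per_month_gbp):
    """Months of vendor fees needed to equal a given build cost."""
    return build_cost_gbp / fees_per_month_gbp


print(vendor_fees_over(HORIZON_MONTHS, VENDOR_FEES_PER_YEAR_GBP))  # 600000.0
print(months_to_match(BUILD_TCO_FLOOR_GBP, 30_000))                # 20.0
```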

The decision tree

Five questions, asked in order. Stop at the first No.

1. Is agentic AI a strategic moat for the firm, or an enabler? If moat, jump to question 5. If enabler, continue.
2. Is the firm AWS-, M365-, or GCP-native? If yes, default to the platform's native agent layer (Bedrock Agents, Copilot Studio, Vertex AI Agent Builder). Vendor concentration risk is already incurred elsewhere; agents shouldn't compound it.
3. Do we have a platform engineering team of 2+ senior engineers with capacity? If yes, LangGraph self-hosted is in play and gives the cleanest control-plane integration. If no, stick with the native vendor.
4. Are our data residency or air-gap requirements impossible to meet on the chosen vendor's UK/EU regions? If yes, escalate to Pattern C. If no, lock the platform and design the control plane.
5. Have we sized the control-plane build at 2–4 engineers for 6–9 months, with a designated SMF owner? If yes, proceed. If no, the platform decision is premature; fix the control-plane staffing first.
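The five questions can be sketched as a small function. This is one reading of the tree, with assumptions stated plainly: it treats question 5 as a final gate on every path, maps the "moat" answer to Pattern C, and returns an empty candidate list where the questions give no default (for example, a firm that is not cloud-native and has no platform team). The function name, parameter names, and return shape are hypothetical.

```python
NATIVE_AGENT_LAYER = {
    "aws": "Bedrock Agents",
    "m365": "Copilot Studio",
    "gcp": "Vertex AI Agent Builder",
}


def decide(is_moat, native_cloud, has_platform_team,
           residency_unmet, control_plane_staffed):
    """Return platform candidates and whether to proceed."""
    if is_moat:
        # Q1: moat firms skip straight to the staffing gate (Q5).
        candidates = ["Custom build (Pattern C)"]
    elif residency_unmet:
        # Q4: residency or air-gap needs that no vendor region can meet.
        candidates = ["Custom build (Pattern C)"]
    else:
        candidates = []
        if native_cloud in NATIVE_AGENT_LAYER:  # Q2
            candidates.append(NATIVE_AGENT_LAYER[native_cloud])
        if has_platform_team:                   # Q3
            candidates.append("LangGraph self-hosted")
    # Q5: without staffing and an SMF owner, the platform choice is premature.
    if not control_plane_staffed:
        return {"candidates": candidates, "proceed": False,
                "next_step": "fix control-plane staffing first"}
    return {"candidates": candidates, "proceed": True,
            "next_step": "lock the platform and design the control plane"}


print(decide(is_moat=False, native_cloud="aws", has_platform_team=False,
             residency_unmet=False, control_plane_staffed=True))
```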

The most common failure mode we see is firms that ship a vendor-platform pilot in 6 weeks, declare success, and then spend the next year discovering they have no control plane and no path to SM&CR-attestable scale.

What we do with this in a Phase-Gate Diagnostic

In a two-week Phase-Gate Diagnostic engagement (£6,500), the matrix above is one of three deliverables. We run it against your specific regulatory perimeter, AWS/Azure/GCP commitments, and existing engineering capacity. The other two deliverables are a control-plane reference architecture and a 12-month phased delivery plan with FCA-control checkpoints.

The output is a written assessment your board can act on — not a PowerPoint deck and not an open-ended consulting engagement. We have run this for firms making first agent decisions, and for firms recovering from a stalled pilot. In both cases the same matrix applies.

Frequently asked questions

When does building a custom agent platform make sense?

When agentic AI is a strategic moat (not a cost line), when no vendor stack covers your regulatory perimeter (rare in UK fintech but possible in defence, pharma R&D, or sovereign cloud mandates), or when you operate at a scale where platform-fee economics break against custom infrastructure (typically £30k+/month in vendor fees).

Why does LangGraph score well despite being open source?

For regulated firms, the open-source licence is a feature, not a liability. It means full visibility of the orchestration logic for audit, no vendor data residency surprises, and low lock-in if you need to migrate. The trade-off is that you carry the platform-team cost, typically two senior engineers, that you would otherwise pay for as part of a vendor SaaS fee.

Does Copilot Studio satisfy FCA AI governance requirements?

Partially. It covers identity, ACLs, and basic logging well via Entra and Purview. Where it falls short for FCA-supervised firms is fine-grained audit at the token/decision level, model risk attestation, and SM&CR-level approval workflows for autonomous actions. These need to be layered on top of Copilot Studio with a control plane the FCA-supervised entity owns.

How do you score vendor concentration risk in practice?

We map every workload to its platform, model, retrieval store, and identity provider, then ask: if the platform vendor raised prices 5×, deprecated an API, or had a 30-day outage, what is the migration cost in calendar weeks and engineering effort? Anything over 12 weeks of migration is a Critical concentration risk that needs board-level mitigation.
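A minimal sketch of the 12-week rule follows. The workload names and week estimates are hypothetical; only the threshold comes from the answer above, and "over 12 weeks" is read strictly, so exactly 12 weeks does not trigger the flag.

```python
CRITICAL_THRESHOLD_WEEKS = 12  # from the rule above


def critical_workloads(migration_weeks):
    """Names of workloads whose estimated migration exceeds the threshold."""
    return sorted(name for name, weeks in migration_weeks.items()
                  if weeks > CRITICAL_THRESHOLD_WEEKS)


# Hypothetical estimates of migration effort in calendar weeks.
estimates = {
    "onboarding-agent": 6,
    "claims-triage": 12,
    "advice-suitability": 20,
}
print(critical_workloads(estimates))  # ['advice-suitability']
```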

What is a "control plane" in this context?

A control plane is the unified policy, identity, audit, and budget layer that sits between your agents and the data, knowledge bases, tools, and models they access. It enforces who can do what, logs every decision, and gives the FCA SMF holder a single place to attest to. In our taxonomy it is distinct from the agent platform (LangGraph, Bedrock, etc.) — the platform runs agents; the control plane governs them.

Run this against your firm

A Phase-Gate Diagnostic is two weeks, £6,500, ends with a written architecture and operating-model assessment, and pays for itself the first time it stops a wrong platform decision.
