Reference Framework · 12 min read
Agent Studio: Build vs Buy for Regulated Enterprises (2026)
A 12-criterion scoring matrix across seven agent platforms — LangGraph, Bedrock Agents, Copilot Studio, Vertex AI Agent Builder, Writer Palmyra, Glean, and custom — judged for FCA-regulated and £bn-AUM organisations.
Published 30 April 2026 · By Sunny Patel, Founder, Agentic AI Associates
TL;DR
If you are a £bn-AUM, FCA-regulated firm asking whether to build or buy your agent studio, the honest answer is hybrid: buy the agent platform, build the control plane. The platform decision (LangGraph, Bedrock, Copilot Studio, Vertex, Writer, Glean) is a make-or-buy of orchestration mechanics. The control-plane decision — policy, identity, audit, model-risk attestation, SM&CR approval — is non-negotiably yours, because the FCA holds your Senior Manager accountable, not the vendor.
Below is the 12-criterion matrix we use in Phase-Gate Diagnostic engagements, the decision tree we hand to boards, and the three patterns we have seen succeed in regulated UK firms in 2025–2026.
Why "build vs buy" is the wrong question
In the conversations we have with CTOs and Heads of Engineering at FCA-supervised firms — wealth managers, payments platforms, insurance carriers, savings apps — the build-vs-buy framing arrives pre-loaded with a mistake: it assumes the agent platform is the strategic asset. It is not. The strategic asset is the control plane: the layer that governs which agents can access which data under which policy, with what audit, attested by which Senior Manager.
We have yet to see a regulated firm where the agent platform itself is the differentiator. We have seen many where the control plane is — because the control plane is what your auditor, your compliance team, your CRO, and the FCA all want a single place to inspect. Vendors do not give you that single place. You have to design it.
Reframed properly, the question becomes:
- Which agent platform minimises time-to-production and ongoing engineering load while satisfying our regulatory perimeter?
- Which control-plane components must we own outright, regardless of platform choice?
- How do we sequence the two so that platform decisions do not foreclose control-plane options later?
The matrix below answers the first question. The decision tree at the end answers the second and third.
The 12-criterion scoring matrix
Seven platforms, twelve criteria, scored as we have observed them perform in 2025–2026 engagements with UK regulated firms. Where a platform's behaviour depends on tier or configuration, we have noted the realistic enterprise default rather than the marketing claim.
| # | Criterion | Weight | LangGraph | Bedrock Agents | Copilot Studio | Vertex AI Agent Builder | Writer Palmyra | Glean Workflow | Custom build |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Audit-trail completeness | Critical for FCA | Strong: full state graph capturable | Strong: CloudTrail + Bedrock logs | Partial: Purview integration but coarse | Strong: Cloud Logging + Trace | Partial: workflow logs, no token-level trace | Strong: full step + retrieval log | As-built: depends on instrumentation |
| 2 | Data residency (UK/EU) | Critical for FCA | Whatever you deploy | eu-west-2 + UK Sovereign region | UK data boundary (M365) | europe-west2 (London) | EU only on enterprise tier | Dedicated EU tenant | Whatever you deploy |
| 3 | Model portability | High | Any LLM via adapter | Bedrock catalogue (~50 models, no Anthropic in some regions) | GPT-4 family + Phi (locked) | Gemini + Anthropic + Llama on Vertex | Palmyra + select frontier (locked stack) | OpenAI/Anthropic/Gemini swap | Anything: you own the call |
| 4 | On-prem / air-gap option | High for some regulated firms | Yes: open source | No | No | No | No (private dedicated only) | No | Yes |
| 5 | Identity + RBAC integration | High | Build it (OIDC libs available) | IAM-native | Entra ID native: strongest in class | IAM-native | SAML/SCIM standard | SAML/SCIM, Slack/Drive/Confluence ACLs respected | Whatever you build |
| 6 | SM&CR mapping support | Critical for FCA | Build it | Manual via tags + IAM | Manual via Purview labels | Manual via labels + IAM | Workflow-level approvals | Workflow-level approvals | You design it |
| 7 | Tool/skill ecosystem | High | Vast: any Python lib | Bedrock action groups + Lambda | Power Platform (~1,400 connectors) | Vertex extensions + functions | Writer skills marketplace (limited) | Curated enterprise connectors | Whatever you ship |
| 8 | Eval + observability | High | LangSmith (paid) or roll your own | Bedrock Evaluations (basic) + Studio | Copilot Studio Analytics (basic) | Vertex AI Evaluation | Built-in eval suite (good) | Strong analytics, weak eval | Build with Langfuse/Helicone/Arize |
| 9 | Cost predictability | Medium | Token + infra (variable) | Token + agent invocation (variable) | Per-user/month (predictable) | Token + agent (variable) | Per-user/month (predictable) | Per-user/month (predictable) | Whatever you build to |
| 10 | Time-to-first-agent (regulated context) | Medium | 6–10 weeks | 4–8 weeks | 2–4 weeks | 4–8 weeks | 4–6 weeks | 2–6 weeks | 12–20 weeks |
| 11 | Vendor concentration risk | High | Low: open source | High: AWS lock-in | Very high: MS lock-in | High: GCP lock-in | High | High | Low (your code) |
| 12 | Best for | | Engineering-led firms with platform team | AWS-native shops with regulated workloads | M365/Dynamics shops, internal copilots | GCP-native shops, multi-modal needs | Regulated enterprise, content-heavy | Knowledge-worker productivity layer | Firms with strategic AI moat ambitions |
A platform's score on any single criterion is rarely the deciding factor. The pattern of which criteria matter most for your firm is. A wealth manager with £5bn AUM and a UK-only client base will weight criteria 1, 2, 6, and 11 (audit, residency, SM&CR, concentration) above criteria 7 and 9 (ecosystem, cost predictability). A neobank with a US expansion plan inverts that.
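One way to make that weighting explicit is to turn the qualitative ratings into numbers and apply firm-specific weights. The sketch below is illustrative only: the 0 to 3 scores, the two weight profiles, and the names `SCORES`, `WEIGHTS_WEALTH_MANAGER`, `WEIGHTS_NEOBANK` and `rank_platforms` are assumptions made for this example, not figures from the matrix or from any engagement.

```python
# Illustrative only: scores (0 = weak, 3 = strong) and weights are assumed
# for this example. For "concentration", a higher score means lower risk.
SCORES = {
    "LangGraph":      {"audit": 3, "residency": 3, "smcr": 1, "concentration": 3, "ecosystem": 3, "cost": 1},
    "Bedrock Agents": {"audit": 3, "residency": 3, "smcr": 2, "concentration": 1, "ecosystem": 2, "cost": 1},
    "Copilot Studio": {"audit": 2, "residency": 3, "smcr": 2, "concentration": 0, "ecosystem": 3, "cost": 3},
}

# Wealth manager: criteria 1, 2, 6 and 11 weighted above 7 and 9.
WEIGHTS_WEALTH_MANAGER = {"audit": 3, "residency": 3, "smcr": 3, "concentration": 3, "ecosystem": 1, "cost": 1}
# Neobank with a US expansion plan: the same weights inverted.
WEIGHTS_NEOBANK = {"audit": 1, "residency": 1, "smcr": 1, "concentration": 1, "ecosystem": 3, "cost": 3}


def rank_platforms(scores, weights):
    """Return (platform, weighted total) pairs, highest total first."""
    totals = {
        platform: sum(weights[criterion] * value for criterion, value in by_criterion.items())
        for platform, by_criterion in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(rank_platforms(SCORES, WEIGHTS_WEALTH_MANAGER))
    print(rank_platforms(SCORES, WEIGHTS_NEOBANK))
```

With these assumed inputs the two weight profiles put different platforms first, which is the point: the ranking follows from the weights, not from any single cell in the matrix.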
Three patterns that work
Pattern A — Buy platform, build control plane (the £bn fintech default)
The pattern we have seen succeed most often in regulated UK firms with £500m+ AUM or £100m+ revenue. The firm picks a vendor agent platform — typically Bedrock Agents (if AWS-native), Copilot Studio (if M365-native), or LangGraph self-hosted (if engineering-led with a platform team) — and builds an in-house control plane that sits in front of every agent invocation.
The control plane is the single place where:
- An agent identity is resolved to a Senior Manager-approved scope
- The data classification of every retrieval is checked against the agent's permitted classes
- The decision-and-tool-call log is written to an immutable audit store with 7-year retention
- Budget and rate limits are enforced per-agent, per-team, and per-purpose
- Model risk attestation is bound to model + version + grounding source
Time to first agent in production: 8–12 weeks. Time to scaled deployment across 5+ business areas: 6–9 months. Vendor lock-in is real but bounded — the control plane is portable, so a platform swap is contained to the agent-orchestration layer.
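The control-plane responsibilities listed above can be sketched as a single gate that every agent invocation passes through. This is a minimal sketch under stated assumptions, not a reference implementation: `AgentScope`, `ControlPlane`, `authorise` and all field names are hypothetical, the in-memory list stands in for an immutable audit store, and retention and per-team or per-purpose limits are left to the storage and policy layers.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical record of what a Senior Manager has approved for one agent."""
    agent_id: str
    approved_by_smf: str
    permitted_data_classes: frozenset
    daily_budget_gbp: float
    attested_model: str  # model + version covered by the model-risk attestation


@dataclass
class ControlPlane:
    scopes: dict                                    # agent_id -> AgentScope
    spend_gbp: dict = field(default_factory=dict)   # agent_id -> spend so far today
    audit_log: list = field(default_factory=list)   # stand-in for an immutable store

    def authorise(self, agent_id, data_class, model, est_cost_gbp):
        """Check one invocation and log the decision, whether allowed or denied."""
        scope = self.scopes.get(agent_id)
        spent = self.spend_gbp.get(agent_id, 0.0)
        if scope is None:
            reason = "unknown agent"
        elif data_class not in scope.permitted_data_classes:
            reason = "data class not permitted"
        elif model != scope.attested_model:
            reason = "model not covered by attestation"
        elif spent + est_cost_gbp > scope.daily_budget_gbp:
            reason = "budget exceeded"
        else:
            reason = None
            self.spend_gbp[agent_id] = spent + est_cost_gbp
        # Denials are logged too, so the audit trail shows what was refused and why.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent_id": agent_id,
            "data_class": data_class,
            "model": model,
            "decision": "allow" if reason is None else "deny",
            "reason": reason,
        })
        return reason is None
```

Because the gate holds no platform-specific logic, the same checks would apply whether the agent runs on a vendor platform or on self-hosted orchestration, which is what keeps a platform swap contained.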
Pattern B — Vendor stack with shadow control plane (the time-to-market path)
This is the right pattern for SMEs in regulated industries (sub-£100m revenue), where the cost of a custom control plane is disproportionate and time-to-market is the binding constraint.
Choose a single vendor stack that bundles agent platform, identity, retrieval, and observability — typically Glean, Writer, or Microsoft (Copilot Studio + Purview + Sentinel). Layer a thin "shadow control plane" on top consisting of:
- A board-signed agent register (spreadsheet, then a simple internal app) listing every deployed agent with owner, scope, model, data classes, and approval date
- A weekly export of the vendor's audit log into your own immutable store
- A monthly model-risk review meeting feeding into the AI Risk Register
You accept platform lock-in in exchange for buying back six months of engineering time. Defensible to the FCA if and only if the agent register is operationally live and the Senior Manager attesting can demonstrate it.
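For the weekly audit-log export, one common way to make your own copy tamper-evident is to hash-chain the records as they are written. The sketch below assumes the vendor export arrives as a list of JSON-serialisable dictionaries; `chain_records`, `verify_chain` and the record fields are hypothetical names for illustration. Hash chaining detects later edits but does not prevent them, so it complements write-once storage and does not replace it.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for the first export


def chain_records(records, prev_hash=GENESIS_HASH):
    """Wrap each exported record with the hash of its predecessor."""
    chained = []
    for record in records:
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        chained.append({"record": record, "prev_hash": prev_hash, "hash": digest})
        prev_hash = digest
    return chained


def verify_chain(chained, prev_hash=GENESIS_HASH):
    """Recompute every hash; any edited, removed or reordered record breaks the chain."""
    for entry in chained:
        payload = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    # Hypothetical export rows; a real vendor log will have its own schema.
    week = chain_records([
        {"agent": "policy-qa", "action": "retrieve", "data_class": "internal"},
        {"agent": "policy-qa", "action": "answer", "data_class": "internal"},
    ])
    print(verify_chain(week))
```

Passing the last hash of one week as `prev_hash` for the next links successive exports into a single chain that can be checked end to end at the monthly review.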
Pattern C — Build everything (rare, justified)
Justified in three cases: (a) AI delivery is the product (not a productivity layer), (b) sovereign or air-gapped deployment requirements cannot be met by any vendor, or (c) the firm operates at a scale where vendor fees exceed £400k/year and platform-team economics flip.
Time to first production agent: 16–24 weeks. Total cost of ownership for the first 18 months is rarely below £600k including platform team, eval infrastructure, and observability. Worth it when AI is a moat. Almost never worth it when AI is an enabler.
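The two figures above line up on a simple comparison. As a rough sketch, assuming flat vendor fees and ignoring any engineering cost on the vendor path (the function names are hypothetical), 18 months of fees at £400k/year come to £600k, which matches the quoted floor for custom-build TCO:

```python
def vendor_fees_gbp(annual_fee_gbp, months):
    """Vendor fees accrued over a period, assuming a flat annual rate."""
    return annual_fee_gbp * months / 12


def custom_build_is_cheaper(annual_fee_gbp, custom_tco_gbp, months=18):
    """Compare flat vendor fees with a custom-build TCO over the same window."""
    return vendor_fees_gbp(annual_fee_gbp, months) > custom_tco_gbp


if __name__ == "__main__":
    print(vendor_fees_gbp(400_000, 18))               # 600000.0
    print(custom_build_is_cheaper(400_000, 600_000))  # False: exactly at break-even
    print(custom_build_is_cheaper(500_000, 600_000))  # True
```

On these numbers alone, the economics only start to favour a custom build once vendor fees run above roughly £400k/year, and only if the custom TCO stays near its floor.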
The decision tree
Five questions, asked in order. Each answer tells you whether to continue, jump ahead, or stop.
1. Is agentic AI a strategic moat for the firm, or an enabler? If moat, jump to question 5. If enabler, continue.
2. Is the firm AWS-, M365-, or GCP-native? If yes, default to that platform's native agent layer (Bedrock, Copilot Studio, Vertex). Vendor concentration risk is already incurred elsewhere; agents shouldn't compound it.
3. Do we have a platform engineering team of 2+ senior engineers with capacity? If yes, LangGraph self-hosted is in play and gives the cleanest control-plane integration. If no, stick with the native vendor.
4. Are our data residency or air-gap requirements impossible to meet on the chosen vendor's UK/EU regions? If yes, escalate to Pattern C. If no, lock the platform and design the control plane.
5. Have we sized the control-plane build at 2–4 engineers for 6–9 months, with a designated SMF owner? If yes, proceed. If no, the platform decision is premature: fix the control-plane staffing first.
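The five questions above can be written down as a small function, which is a useful check that the branching is unambiguous. This is one reading of the tree, not a definitive encoding: `Firm`, `decide`, the outcome strings and the field names are hypothetical, and where the tree is silent (for example, a firm with no native cloud and no platform team) the sketch simply returns no candidate platforms.

```python
from dataclasses import dataclass
from typing import Optional

NATIVE_AGENT_LAYER = {
    "aws": "Bedrock Agents",
    "m365": "Copilot Studio",
    "gcp": "Vertex AI Agent Builder",
}


@dataclass
class Firm:
    ai_is_moat: bool
    native_cloud: Optional[str]        # "aws", "m365", "gcp", or None
    has_platform_team: bool            # 2+ senior engineers with capacity
    residency_unmet_by_vendor: bool    # residency/air-gap impossible on vendor UK/EU regions
    control_plane_staffed: bool        # 2-4 engineers for 6-9 months plus an SMF owner


def decide(firm):
    """Return (outcome, candidate_platforms) following the five questions in order."""
    candidates = []
    if not firm.ai_is_moat:                                  # Q1: enabler, so continue
        native = NATIVE_AGENT_LAYER.get(firm.native_cloud)   # Q2
        if native:
            candidates.append(native)
        if firm.has_platform_team:                           # Q3
            candidates.append("LangGraph self-hosted")
        if firm.residency_unmet_by_vendor:                   # Q4
            return "escalate-to-pattern-c", []
    # Q5 applies on both paths; the moat path arrives here with no platform chosen.
    if not firm.control_plane_staffed:
        return "fix-control-plane-staffing-first", candidates
    return "proceed", candidates


if __name__ == "__main__":
    print(decide(Firm(False, "aws", False, False, True)))
```

Note that question 5 is the only hard stop that applies on every path: an unstaffed control plane blocks the decision regardless of which platform is in play.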
The most common failure mode we see is a firm that ships a vendor-platform pilot in 6 weeks, declares success, and then spends the next year discovering it has no control plane and no path to SM&CR-attestable scale.
What we do with this in a Phase-Gate Diagnostic
In a two-week Phase-Gate Diagnostic engagement (£6,500), the matrix above is one of three deliverables. We run it against your specific regulatory perimeter, AWS/Azure/GCP commitments, and existing engineering capacity. The other two deliverables are a control-plane reference architecture and a 12-month phased delivery plan with FCA-control checkpoints.
The output is a written assessment your board can act on — not a PowerPoint deck and not an open-ended consulting engagement. We have run this for firms making first agent decisions, and for firms recovering from a stalled pilot. In both cases the same matrix applies.
Frequently asked questions
When does building a custom agent platform make sense?
When agentic AI is a strategic moat (not a cost line), when no vendor stack covers your regulatory perimeter (rare in UK fintech but possible in defence, pharma R&D, or sovereign cloud mandates), or when you operate at a scale where platform-fee economics break against custom infrastructure (typically £30k+/month in vendor fees, or roughly £360k+/year).
Why does LangGraph score well despite being open source?
For regulated firms, the open-source licence is a feature, not a liability. It means full visibility of the orchestration logic for audit, no vendor data residency surprises, and low lock-in if you need to migrate. The trade-off is that you carry the platform-team cost (typically two senior engineers) that you would otherwise pay for as part of a vendor SaaS fee.
Does Copilot Studio satisfy FCA AI governance requirements?
Partially. It covers identity, ACLs, and basic logging well via Entra and Purview. Where it falls short for FCA-supervised firms is fine-grained audit at the token/decision level, model risk attestation, and SM&CR-level approval workflows for autonomous actions. These need to be layered on top of Copilot Studio with a control plane the FCA-supervised entity owns.
How do you score vendor concentration risk in practice?
We map every workload to its platform, model, retrieval store, and identity provider, then ask: if the platform vendor raised prices fivefold, deprecated an API, or had a 30-day outage, what would migration cost in calendar weeks and engineering effort? Any workload that would take more than 12 weeks to migrate is a Critical concentration risk that needs board-level mitigation.
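The mapping described above can be kept as a simple inventory and checked mechanically. In this sketch the `Workload` fields mirror the four dependencies named in the answer, while the class name, the scenario labels and the example estimates are hypothetical; only the 12-week threshold comes from the text, and engineering effort is not modelled.

```python
from dataclasses import dataclass

CRITICAL_THRESHOLD_WEEKS = 12  # threshold stated in the text


@dataclass
class Workload:
    name: str
    platform: str
    model: str
    retrieval_store: str
    identity_provider: str
    migration_weeks: dict  # scenario name -> estimated calendar weeks to migrate


def critical_concentration_risks(workloads):
    """Return {workload name: (worst scenario, weeks)} for anything over the threshold."""
    flagged = {}
    for workload in workloads:
        scenario, weeks = max(workload.migration_weeks.items(), key=lambda item: item[1])
        if weeks > CRITICAL_THRESHOLD_WEEKS:
            flagged[workload.name] = (scenario, weeks)
    return flagged


if __name__ == "__main__":
    # Hypothetical inventory with made-up estimates, for illustration only.
    inventory = [
        Workload("complaints-triage", "vendor-a", "model-x", "store-1", "idp-1",
                 {"price_rise": 6, "api_deprecation": 16, "outage_30d": 10}),
        Workload("policy-qa", "vendor-a", "model-x", "store-1", "idp-1",
                 {"price_rise": 4, "api_deprecation": 8, "outage_30d": 5}),
    ]
    print(critical_concentration_risks(inventory))
```

Flagging on the worst scenario per workload keeps the output short enough for a board paper while still naming the event that drives the risk.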
What is a "control plane" in this context?
A control plane is the unified policy, identity, audit, and budget layer that sits between your agents and the data, knowledge bases, tools, and models they access. It enforces who can do what, logs every decision, and gives the FCA SMF holder a single place to attest to. In our taxonomy it is distinct from the agent platform (LangGraph, Bedrock, etc.) — the platform runs agents; the control plane governs them.
Run this against your firm
A Phase-Gate Diagnostic is two weeks, £6,500, ends with a written architecture and operating-model assessment, and pays for itself the first time it stops a wrong platform decision.