Strategy · Frameworks · Operating Model

AI Transformation
Playbook

How I lead AI adoption in regulated environments — the rubric I use to triage initiatives, the matrix I score vendors with, the maturity model for scaling an internal AI function, and the four real case studies the playbook came out of.

Initiative rubric | Vendor matrix | Maturity model | 4 real case studies
At a glance 30-second read

What it is

The operating playbook I bring to an AI transformation role. Concrete frameworks — not principles. A 5-axis rubric for evaluating initiatives, a 7-criterion vendor matrix, a 5-stage AI-function maturity model, and the operational standards that make rollouts repeatable. Backed by four transformation projects from a 15-year career in regulated environments.

Maps to

AI Transformation Lead · Director / Head of AI · AI Strategy & Operations · Enterprise AI Architect

What this proves

  • I can run an AI rollout end-to-end, not just build the tech
  • I evaluate AI work the same way I evaluate any infra project — risk, ROI, reversibility, governance fit, time-to-value
  • I've shipped transformation programs at Microsoft, Wells Fargo, Starbucks, and Veradigm
  • I know what governance-as-an-enabler actually looks like in practice
  • I'm PMP-certified — I run the program as well as the build
PMP-Certified | 15 yrs · Regulated | Microsoft | Wells Fargo | Starbucks | Veradigm
01 — Initiative Rubric five axes · 1–5 each · 25 max

How I triage an AI proposal.

Every AI initiative I see gets scored 1–5 on five axes before it gets a yes. A 20+ is a build. A 12–19 is a pilot with a kill criterion. Below 12 is a no — even if the demo is impressive.

01
Business Pain
Is this a named problem with a known owner and a measurable cost today? If a VP can't name what breaks if we don't build it, it isn't a yes.
02
Time to Value
Can we get a measurable win in <90 days? Transformation programs lose air after the first quarter without a visible deliverable.
03
Reversibility
If this fails, can we roll it back without breaking a downstream system or losing data? AI initiatives that touch identity, money, or PHI need a documented exit path before they ship.
04
Governance Fit
Does this clear identity, security, privacy, and compliance review at the architecture stage — or are we hoping nobody notices? "Hoping" is a no.
05
Scale Path
Does success here unlock the next three things, or is this a one-off? A 20-score initiative usually becomes a platform.

Concrete example: a customer-support agent scored 22 (4+5+4+4+5), with clear pain, a fast pilot, full reversibility via feature flag, governance clearance because no PHI was involved, and an architecture that would lift sales and ops next. A "let's put AI on the dashboard" idea scored 8 (2+2+2+1+1) and was declined; the team built it anyway, and nobody used it. The rubric was right.
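The triage rule above can be sketched in a few lines of Python. This is a minimal illustration, not a tool the playbook ships: the class name `RubricScore`, the field names, and the decision labels are assumptions chosen to mirror the five axes and the 20+ / 12–19 / below-12 bands.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class RubricScore:
    """One initiative scored 1-5 on each of the five axes (names are illustrative)."""
    business_pain: int
    time_to_value: int
    reversibility: int
    governance_fit: int
    scale_path: int

    def __post_init__(self):
        # Reject anything outside the 1-5 scale the rubric defines.
        for value in astuple(self):
            if not 1 <= value <= 5:
                raise ValueError("each axis is scored 1-5")

    @property
    def total(self) -> int:
        # Unweighted sum of the five axes, 25 max.
        return sum(astuple(self))

    def decision(self) -> str:
        if self.total >= 20:
            return "build"
        if self.total >= 12:
            return "pilot (with kill criterion)"
        return "no"


# The two examples from the text.
support_agent = RubricScore(4, 5, 4, 4, 5)
dashboard_ai = RubricScore(2, 2, 2, 1, 1)

print(support_agent.total, support_agent.decision())  # 22 build
print(dashboard_ai.total, dashboard_ai.decision())    # 8 no
```

Keeping the total an unweighted sum is deliberate in this sketch: it matches the "25 max" framing and makes a score easy to defend in a review.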

02 — Vendor Evaluation Matrix seven criteria · weighted

How I score an AI vendor.

For every shortlisted platform — agent framework, NHI tooling, copilot, model API, automation runtime — the same matrix. I weight criteria differently by domain (identity-sensitive weights compliance heavier; productivity tools weight DX heavier), but the criteria are constant.

Criterion | What I'm actually asking | Typical weight
Identity & auth model | How does it authenticate, how are tokens scoped and rotated, does it produce non-human identities and where do they go? | 15–25%
Compliance posture | SOC 2 Type II, HIPAA BAA, data residency, DPA terms, sub-processor list, breach notification SLA. | 15–20%
Integration surface | What APIs, SDKs, MCP/ADK support, identity provider integrations, SCIM, SAML, OIDC. How many days to a working pilot. | 15%
Observability | Logs, traces, metrics, eval hooks, prompt + response capture (with PII handling). If I can't see what it did, I can't govern it. | 10–15%
Total cost of ownership | Pricing model, token economics, autoscaling cost behavior, the cost of integration we own forever. | 10–15%
Roadmap & vendor risk | Funding stage, customer concentration, lock-in (proprietary vs. open formats), exit path, acquisition exposure. | 10%
Developer experience | Docs, examples, support quality, time-to-first-call, debuggability. A bad DX adds permanent integration tax. | 5–10%

Scoring is 1–5 per criterion, weighted, and normalized to a 100-point scale. Below 70 is a no; 70–84 is a pilot under a defined contract; 85+ is a yes. The matrix prevents the most common procurement failure: picking the most polished demo instead of the most operable platform.
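A sketch of how the weighted score might be computed. Several things here are assumptions rather than part of the matrix as written: the specific weights (one set that falls inside the typical ranges and sums to 100%), the normalization (weighted average divided by 5, so all 5s give 100 and all 1s give 20), the reading of exactly 85 as a yes, and the names `WEIGHTS`, `vendor_score`, and `verdict`.

```python
# Illustrative weights: each sits inside the "typical weight" range and they sum to 1.0.
WEIGHTS = {
    "identity_auth": 0.20,
    "compliance": 0.20,
    "integration": 0.15,
    "observability": 0.15,
    "tco": 0.10,
    "vendor_risk": 0.10,
    "developer_experience": 0.10,
}


def vendor_score(ratings, weights=WEIGHTS):
    """Return a 0-100 score from 1-5 ratings (assumed normalization: weighted mean / 5)."""
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the weighted criteria")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    if any(not 1 <= r <= 5 for r in ratings.values()):
        raise ValueError("each criterion is rated 1-5")
    weighted_mean = sum(weights[name] * ratings[name] for name in weights)
    return weighted_mean / 5 * 100


def verdict(score):
    if score >= 85:
        return "yes"
    if score >= 70:
        return "pilot under a defined contract"
    return "no"


# Hypothetical vendor, ratings invented for illustration only.
example = {
    "identity_auth": 5,
    "compliance": 4,
    "integration": 4,
    "observability": 3,
    "tco": 4,
    "vendor_risk": 3,
    "developer_experience": 5,
}
score = vendor_score(example)
print(round(score, 1), verdict(score))  # 81.0 pilot under a defined contract
```

Changing the weights by domain means passing a different `weights` mapping; the criteria and the thresholds stay fixed.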

03 — Case Studies four transformation programs, four employers

Where the playbook came from.

Real transformation work from a 15-year career in regulated environments. Different domains, same operating pattern — score the work, partner across functions, ship the foundation, leave the runbooks.

Microsoft
Non-Human Identity Governance
Cloud Solutions Architect

Built Python automation for Entra ID Governance and Non-Human Identity at scale — before "NHI governance" was a named discipline in the industry. Reduced ungoverned service-account sprawl across multiple business units; integrated discovery and lifecycle ops with the broader IAM platform.

Lesson: the operating model has to ship before the policy. The policy never gets followed otherwise.

Starbucks
Identity Automation at Scale
Senior Security Engineer

PowerShell automation for identity operations across 60+ Active Directory domains. Bulk audit, lifecycle, and remediation — replaced thousands of hours of manual ticket work per quarter with a script tree the security team could extend without me.

Lesson: scale doesn't come from headcount; it comes from leaving the team something they can run when you leave.

Wells Fargo
Payments Infrastructure
Engineer · Payment Systems

Engineering on payment infrastructure for SWIFT, CHIPS, and Fedwire. The kind of system where a wrong move is a regulatory event, not a P1. Learned what "reversibility" actually means when the rollback path matters more than the build.

Lesson: the boring parts — change windows, kill criteria, the rollback rehearsal — are the parts that make the rest possible.

Veradigm
HIPAA-Regulated IAM
Identity Platform Engineer

Designed and operated HIPAA-regulated IAM for 16,000+ users across a healthcare-data environment. PHI guardrails, access reviews, joiner-mover-leaver automation, audit-ready trails. The compliance model wasn't a checkbox; it was the architecture.

Lesson: governance done right speeds the team up. Done wrong, it's the reason nobody ships.

04 — AI Function Maturity five stages

Scaling an internal AI function.

Most organizations are at stage 1 or 2 when they hire an AI Transformation Lead. The job is to move them up the ladder, not to skip steps. I size the ambition to the stage; over-reaching produces theater, not transformation.

Stage 1
Ad-hoc
Teams using AI tools individually, no standards, no review. Some lift, lots of risk. The starting line.
Stage 2
Sanctioned
Approved tool list, basic usage policy, named owner per vendor. Risk goes down; capability gain is incremental.
Stage 3
Governed
Initiative rubric in use, vendor matrix in use, NHI governance for agent credentials, eval harnesses on AI features that ship. Where the playbook lives.
Stage 4
Platform
Internal AI platform with shared tools — agent runtime, prompt registry, eval pipeline, vector store, governance plane. Business units ship on top of it.
Stage 5
Compounding
Every new initiative inherits the platform. Time to first value is days, not quarters. The AI function pays for itself measurably.

I aim for stage 2 → 3 in the first six months and stage 3 → 4 within twelve to eighteen months. Stage 5 is two-plus years of compounded work; promising it in year one is how transformation programs fail their first review.

05 — Operational Standards the artifacts I bring

What ships with every AI initiative.

Standards are the difference between a portfolio of demos and a function that compounds. I make these mandatory for any AI feature that touches production — non-negotiable, scoped to the stage on the maturity ladder.

Threat Model
One-page document per AI feature. Identifies prompt-injection vectors, data exfiltration paths, NHI exposure, abuse cases. Reviewed at architecture stage by Security and Identity. Example: the threat model that ships with Linear MCP Server.
Eval Harness
Test set, scoring function, regression baseline. Every AI feature gets one before launch. The eval is what makes "AI improvements" measurable instead of vibes. Pattern shown in Document Pipeline.
Runbook
Operational playbook for the on-call team — failure modes, manual override, rollback path, contact tree. Written before launch, not after the first incident.
NHI Registry
Every service account, OAuth token, API key, and short-lived credential the system uses — discovered, classified, owned, lifecycle-managed. The pattern lives in AI Agent Governance Console.
Cost Telemetry
Token cost per feature, per request, per business unit. Reviewed monthly. AI feature cost has to be observable from day one or it goes uncontrolled fast.
HITL Gate Map
Which actions require human approval, who approves, what gets logged. Destructive actions, irreversible decisions, high-stakes outputs — none of them ship without an explicit gate.
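As one illustration of these artifacts, a HITL gate map can be as small as a lookup table plus a check that fails closed. The sketch below is hypothetical throughout: the action names, approver roles, logged fields, and the `check` function are invented for the example and do not come from a specific system.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Gate:
    approver_role: str            # who approves
    log_fields: Tuple[str, ...]   # what gets logged with the decision


# Hypothetical gated actions: destructive or irreversible operations.
GATE_MAP = {
    "delete_record": Gate("data_owner", ("actor", "record_id", "approver")),
    "issue_refund": Gate("finance_lead", ("actor", "amount", "approver")),
}

# Actions explicitly reviewed and judged safe to run without approval.
UNGATED = frozenset({"summarize_ticket"})


def check(action: str, approver_role: Optional[str] = None) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a requested action."""
    if action in UNGATED:
        return "allow"
    gate = GATE_MAP.get(action)
    if gate is None:
        # Unmapped action: fail closed rather than guess.
        return "deny"
    if approver_role == gate.approver_role:
        return "allow"
    return "needs_approval"


print(check("summarize_ticket"))              # allow
print(check("issue_refund"))                  # needs_approval
print(check("issue_refund", "finance_lead"))  # allow
print(check("drop_database"))                 # deny
```

Denying unmapped actions is the design choice that matters here: a new capability has to be added to the map, and therefore reviewed, before it can run at all.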
06 — Cross-Functional Partnership how the function actually works

The operating model.

An AI function isn't a team — it's a set of standing partnerships. Same six relationships in every org, slightly different names.

Executive Leadership
Quarterly · Strategy alignment

Translate the AI roadmap into business outcomes. Defend the rubric. Re-balance investment across stages. Bring the hard "we shouldn't build this" calls upstream early.

Cadence
QBR + ad-hoc on initiative scoring

A short deck, never a long one. The rubric and the matrix are the slides.

Security & Identity
Weekly · Architecture review

Every AI feature goes through identity and security review at the architecture stage, not at launch. Treat them as design partners, not gatekeepers. They see threat models the engineering team would miss.

Cadence
Weekly office hours + threat-model reviews

Standing 30-min weekly. New initiatives book a slot.

Cloud Operations
Ongoing · Platform & cost

Capacity, secrets management, deployment patterns, cost attribution. The AI function inherits the existing platform — doesn't build a shadow one — and adds AI-specific layers (NHI, prompt registry, eval pipeline) on top.

Cadence
Embedded in deploy reviews

No separate AI-ops org — that's how silos start.

Lines of Business
Continuous · Demand intake

Every business unit has named AI requests in flight. Run a structured intake — rubric score on submission, kill-criterion on every pilot, "no" given fast and explained. The fastest way to lose business-unit trust is to leave their requests rotting.

Cadence
Bi-weekly intake review

Backlog visible in shared system. No black box.

Legal & Compliance
Per initiative · Risk classification

DPA review for vendors, data classification per feature, jurisdiction analysis, audit trail standards. Brought in at the matrix-scoring stage, not the procurement stage.

Cadence
Triggered by matrix score

Any vendor scoring 70 or above enters legal review.

Engineering
Continuous · Build & eval

The AI function provides the platform; engineering teams build on top. My role is to remove the integration tax — standard SDKs, shared eval harness, prompt registry, NHI plumbing — so the team's surface area is the AI feature itself.

Cadence
Daily / continuous

Embedded engineers on big initiatives.

07 — Where The Playbook Is Already Running portfolio proof

The standards aren't aspirational.

Each of these projects implements one of the operational standards above. The playbook isn't a deck — it's the way the projects are actually built.