How to Diagnose a Team's AI Maturity in 30 Minutes

The Holak Scale v2.1e describes 12 levels of AI adoption, from resistance to a custom agentic OS. The question that comes up in every conversation with a manager is: “OK, but how do I know where my team is?”

The answer is simple: one 30-minute interview per person. No surveys, no full-day workshop, no slides. Below is the protocol I use with clients.

What you walk away with

A single A4 page:

the individual’s level
the organisation’s level as the individual sees it
the gap, if any
two “what next” recommendations
one thing to avoid

That’s it. Nothing more is required to plan the first step.

Session format

30 minutes, 1:1, camera optional. Ideally, run it in a tool that already has a chat panel, so you can ask the interviewee to show something live.

Schedule:

Time	What	Goal
0-5 min	Context	Role, industry, tools, time using AI
5-20 min	8 calibration questions	Data for scoring
20-25 min	Live observation	Validation of claims
25-30 min	Feedback	Spoken summary + one written recommendation

Calibration questions

Eight questions, two per phase. Don’t ask “which level are you on?” - the answers are distorted. Ask about behaviour.

Start phase (level 0-1)

1. “Show me how you usually start a session with AI. What do you actually type?” - looking for: no account vs one-off try vs habit. No account = below 1.
2. “When did AI last surprise you - good or bad?” - no such moment = level 0-1. A fresh surprise = at least 2.

Intentional use phase (level 2-4)

3. “Do you have a prompt template you use more than once a month? Show me one.” - none = level 1-2. A “cite 3 sources” template = 3. A template with role, goal, constraints and output format = solid 3.
4. “What’s in your custom instructions / model settings?” - “nothing, never touched it” = level 1-3. A few lines about yourself = 4. Half a page reviewed quarterly = solid 4.

Context and knowledge phase (level 5-8)

5. “Show me an AGENTS.md or CLAUDE.md from any project of yours.” - no such file = level 4 or below. File exists, last touched six months ago = level 5 with the documentation-graveyard anti-pattern. File touched this week = 6.
6. “Do you use any MCP / connectors / AI plugins? If so, name three and tell me when each last didn’t disappoint you.” - none = 5-7. Names + concrete stories = 8. List with no stories = 6-7 dressed up as 8.

Autonomy phase (level 9-11)

7. “Tell me about the most recent task an agent did end-to-end - without you intervening in the middle.” - none = below 9. Short anecdote = 9. “I have one daily” + example = solid 9.
8. “In the last month, have you designed a system of multiple agents collaborating? What was it? Does that system serve a specific business process with owners, audit logs and cost metrics?” - no = not above 9. Multi-agent process without owners and metrics = 10. A designed agentic OS for a concrete business outcome (e.g. QA, releases, compliance, customer support) with owners, logs, cost controls and a rollback procedure = 11.

Live observation (5 minutes)

The most important part. It typically corrects the self-assessment downward by 1-2 levels.

Ask: “Open your favourite AI tool and do something you do once a week. Think out loud.”

Signals:

Opens a clean window with no settings - level 1-2, regardless of what they said earlier.
Pastes a prompt from a document - prompt fetishism, level 3 with the anti-pattern.
The model already knows the context, the prompt is short - level 4+.
Invokes a skill / project / configured workspace - level 5-7.
Performs MCP actions (writes a file, pushes, posts to Slack) - level 8.
Types a goal and walks away from the screen - level 9.
Shows a panel of multiple agents with roles and logs - level 10.
Talks about a business process with an owner, audit logs, cost metrics and a rollback procedure - level 11.
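
If you take notes in a script rather than on paper, the signals above can be kept as a small lookup. This is only a sketch in Python: the signal keys are hypothetical labels I made up for illustration, the level ranges are copied from the list, and capping the claimed level at the top of the observed range is one assumed way to apply the "corrects downward" rule.

```python
# Signal keys are hypothetical labels; the level ranges come from the list above.
OBSERVATION_SIGNALS = {
    "clean_window_no_settings": (1, 2),
    "pastes_prompt_from_document": (3, 3),
    "short_prompt_model_knows_context": (4, 11),  # "level 4+": no upper cap
    "invokes_skill_or_workspace": (5, 7),
    "performs_mcp_actions": (8, 8),
    "types_goal_and_walks_away": (9, 9),
    "multi_agent_panel_with_logs": (10, 10),
    "business_process_with_owner_and_rollback": (11, 11),
}


def cap_by_observation(claimed_level: int, signal: str) -> int:
    """Lower a claimed level to the top of the observed range (assumed rule)."""
    _low, high = OBSERVATION_SIGNALS[signal]
    return min(claimed_level, high)


# Someone who claimed 7 but opens a clean window with no settings.
print(cap_by_observation(7, "clean_window_no_settings"))  # 2
```

The sketch never raises a claimed level, because the observation step is described here only as a downward correction.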

Scoring

For each question, record a level 0-11. Then:

Individual level = median across 8 questions + observation. Median, not maximum. The most common beginner error: “I have custom instructions, so I’m at 4” - but every other answer says 2.
Organisational level = inferred from questions 5, 6 and 8. Probe: “Is this your personal setup or the official company stack?” If personal, the organisation is lower.
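
The individual-level rule can be sketched in a few lines of Python. The function name is hypothetical, and treating the observation as a ninth data point next to the eight answers is my reading of "8 questions + observation", not something the scale prescribes.

```python
from statistics import median_low


def individual_level(question_scores, observation_score):
    """Median of the 8 question scores plus the observation score."""
    if len(question_scores) != 8:
        raise ValueError("expected one score per calibration question (8)")
    scores = list(question_scores) + [observation_score]
    if any(not 0 <= s <= 11 for s in scores):
        raise ValueError("levels must be between 0 and 11")
    # Nine values, so the median is always one of the recorded levels.
    return median_low(scores)


# The beginner error: one answer at 4 (custom instructions), the rest around 2.
answers = [2, 2, 2, 4, 2, 1, 2, 2]
print(individual_level(answers, observation_score=2))  # 2, not the maximum of 4
```

The organisational level is left out on purpose: the protocol infers it from conversation, so there is no formula to encode.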

Red flags in answers

Signals that the self-assessment is inflated:

“We have MCP” with no answer to “who authorises actions” - still level 6-7 dressed up as 8.
“I use it daily” with no distinction between tasks - daily Q&A is still level 1.
“The whole team is at X” with no sample - an average in a management memo is not a diagnosis.
“We had a prompt-engineering training” - training doesn’t change levels, behaviour does.
“We have it all in Confluence” - Confluence is documentation for humans, not context for agents.

Report - one-page format

Email after the session:

Interviewee: Name Surname, role, company
Session date: YYYY-MM-DD, duration: 30 min

Individual level: [X / 11]
Organisational level per interviewee: [Y / 11]
Gap: [description or "none"]

Recommendations:
1. [concrete action for 1-2 weeks, e.g. "write a CLAUDE.md for the main repo, max one page, review in two weeks"]
2. [concrete action for 1 month, e.g. "pick one MCP server, install it, use it 10 times in real work"]

What to avoid:
- [concrete warning, e.g. "don't buy a multi-agent framework subscription until a single agent handles 80% of tasks"]
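
If you send many of these, the template is easy to fill from a script. A minimal sketch in Python follows; the Diagnosis class and render_report function are hypothetical names, and the date and gap text in the example are placeholders, not real session data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Diagnosis:
    interviewee: str
    session_date: date
    individual_level: int
    organisational_level: int
    gap: str  # empty string means no gap
    recommendations: tuple[str, str]
    avoid: str


def render_report(d: Diagnosis) -> str:
    """Fill the one-page template with the values from one session."""
    lines = [
        f"Interviewee: {d.interviewee}",
        f"Session date: {d.session_date.isoformat()}, duration: 30 min",
        "",
        f"Individual level: {d.individual_level} / 11",
        f"Organisational level per interviewee: {d.organisational_level} / 11",
        f"Gap: {d.gap or 'none'}",
        "",
        "Recommendations:",
        f"1. {d.recommendations[0]}",
        f"2. {d.recommendations[1]}",
        "",
        "What to avoid:",
        f"- {d.avoid}",
    ]
    return "\n".join(lines)


example = Diagnosis(
    interviewee="Name Surname, role, company",
    session_date=date(2025, 1, 15),  # placeholder date
    individual_level=4,
    organisational_level=2,
    gap="personal setup is ahead of the approved company stack",
    recommendations=(
        "write a CLAUDE.md for the main repo, max one page, review in two weeks",
        "pick one MCP server, install it, use it 10 times in real work",
    ),
    avoid="don't buy a multi-agent framework subscription until a single agent handles 80% of tasks",
)
print(render_report(example))
```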

What I usually see

From the last few dozen diagnoses:

Median in engineering teams: 3 (frameworks, prompt libraries).
Median in management teams: 2 (chat, occasional use).
Most common gap: individual 4-5, organisation 2. “I’m at 5, but at work I have to stick to vanilla ChatGPT because nothing else is approved.”
Most common inflated self-assessment: by 2 levels. “I’m at 7” → actually 5 after observation.

What’s next

Diagnosis is a start, not a goal. After the session, pick one level to cross in the next quarter. Not two, not three. One - with a concrete crossing signal (e.g. “CLAUDE.md in the main repo updated within the last week”).

Come back to the same interviewee every quarter. Four diagnoses a year show a trajectory - which matters more than a point estimate.
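
Tracking the trajectory needs nothing more than the list of past levels. A small Python sketch, with hypothetical function names and made-up example levels:

```python
def level_changes(levels):
    """Change in level between consecutive quarterly diagnoses."""
    return [later - earlier for earlier, later in zip(levels, levels[1:])]


def crossed_target(levels, target):
    """True if the latest diagnosis reached the level picked for the quarter."""
    return bool(levels) and levels[-1] >= target


history = [3, 3, 4, 5]  # hypothetical individual levels, oldest first
print(level_changes(history))      # [0, 1, 1]
print(crossed_target(history, 5))  # True
```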

Version 3 of the Holak Scale will ship this protocol as a PDF template with scoring. If you use it now and see gaps - get in touch.