The 174 AI literacy assessment rubric
The full methodology behind the 10-minute organizational assessment — three dimensions, twelve questions, the scoring logic, and how to interpret your own report honestly.
The assessment most companies offer ends with a sales call. Ours ends with a report you can forward to your CHRO, your COO, or your CEO. To make that report defensible, the rubric behind it has to be public. This is that rubric — every question, every option weight, every bucket threshold, and the logic that turns answers into a recommended rollout shape.
If you’d rather just see your own number, the 10-minute assessment is the fastest way. If you want to read the methodology first — or you’re the kind of buyer who reads methodology before clicking anything — this page is for you.
Why three dimensions
Most AI-literacy assessments measure one of two things: tool usage (“does your team use ChatGPT?”) or sentiment (“how excited are people about AI?”). Both are misleading. Tool usage misses whether the work is any good; sentiment misses whether anything is shipping.
The three dimensions in the 174 rubric — adoption, capability, and governance — exist because they’re the smallest set that captures what actually matters for a mid-market rollout:
- Adoption is whether AI is in the work, day-to-day, across the people who would benefit. It’s the breadth signal.
- Capability is whether the work being done with AI is actually good — prompts that get reused, output that gets evaluated, agents that ship. It’s the depth signal.
- Governance is whether the program can survive scale — policy, quality controls, executive sponsorship. It’s the durability signal.
Drop any one and you can construct an organization that looks healthy on the other two but falls over the moment a rollout actually begins. We considered splitting governance into “policy” and “controls” — they’re meaningfully different — but in practice mid-market buyers think of them together, and conflating them costs us very little. Three dimensions, scored 0–100 each, bucketed into Emerging / Developing / Mature.
The questions
All 12 questions are below. Each option carries a score weight; the total possible score per dimension is the sum of each question’s maximum contribution (the top option for single selects; every positive option selected together for multi selects). A buyer’s percentage in each dimension is (actual / max) * 100, rounded to the nearest integer.
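If it helps to see that structure as data, here is a minimal sketch of how a question and its option weights could be represented. The type and field names are illustrative only; the real definitions live in src/data/assessment.ts and may look different.

```ts
// Illustrative shape only -- not the actual definitions in src/data/assessment.ts.
type Dimension = "adoption" | "capability" | "governance";

interface AssessmentOption {
  label: string;
  score: number; // weight contributed when this option is chosen
}

interface AssessmentQuestion {
  id: string;                // e.g. "A1"
  dimension: Dimension;
  kind: "single" | "multi";  // multi selects sum every chosen option
  options: AssessmentOption[];
}

// A1 expressed in that shape, using the weights from the table below.
const a1: AssessmentQuestion = {
  id: "A1",
  dimension: "adoption",
  kind: "single",
  options: [
    { label: "Less than 10%", score: 0 },
    { label: "10–30%", score: 25 },
    { label: "30–60%", score: 60 },
    { label: "More than 60%", score: 100 },
  ],
};
```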
Adoption
A1. What share of your team uses AI tools in their day-to-day work?
Single select. The breadth-of-use signal.
| Option | Score |
|---|---|
| Less than 10% | 0 |
| 10–30% | 25 |
| 30–60% | 60 |
| More than 60% | 100 |
A2. Which AI tools are in active use across your team?
Multi select (up to 6). Each selection contributes — “None of the above” is its own option scored at 0 to keep the math honest. The mix matters: a team running custom agents alongside general assistants is signalling a different maturity than one running only ChatGPT.
| Option | Score |
|---|---|
| ChatGPT / Claude / general assistants | 20 |
| Copilot for code (GitHub, Cursor, Windsurf) | 20 |
| AI features inside existing SaaS (Notion AI, Slack AI) | 15 |
| Custom prompts or GPTs in shared workspaces | 20 |
| Internal agents / automations (n8n, Zapier AI, custom) | 25 |
| None of the above | 0 |
A3. How often does your team use AI for real work (not experimentation)?
Single select. Frequency-of-real-use, distinguished from playing-with-it.
| Option | Score |
|---|---|
| Rarely or never | 0 |
| A few times a month | 25 |
| Weekly | 60 |
| Daily | 100 |
A4. How would you rate your team’s satisfaction with current AI tools?
Single select. Captures whether adoption is durable or about to churn.
| Option | Score |
|---|---|
| Frustrated — outputs aren’t reliable | 0 |
| Mixed — some wins, some misses | 33 |
| Generally positive | 66 |
| Strong — it’s changed how we work | 100 |
Capability
C1. How would you rate your team’s prompting skill?
Single select. Honest answer beats aspirational.
| Option | Score |
|---|---|
| Most people just type a question | 0 |
| Some patterns, but ad-hoc | 33 |
| Documented prompts that get reused | 66 |
| Evaluated, versioned, reviewed prompts | 100 |
C2. How does your team evaluate AI output before relying on it?
Single select. Eyeballing and structured evaluation are two different rooms.
| Option | Score |
|---|---|
| We don’t — we eyeball it | 0 |
| Spot checks by a senior person | 33 |
| Informal rubrics or checklists | 66 |
| Structured evaluation with documented rubrics | 100 |
C3. How often does your team use AI in multi-step / agentic workflows?
Single select. Beyond single prompts — chained, automated, or agent-like.
| Option | Score |
|---|---|
| Never | 0 |
| Experimenting | 33 |
| A few production workflows | 66 |
| It’s a core part of how we work | 100 |
C4. How does your team currently learn new AI techniques?
Multi select (up to 5). Captures whether learning is structured or entirely incidental.
| Option | Score |
|---|---|
| YouTube and Twitter / X threads | 10 |
| Internal Slack channels and word of mouth | 15 |
| Courses or certifications | 25 |
| Internal lunch-and-learns or workshops | 25 |
| A formal AI literacy program | 25 |
Governance
G1. Do you have a written AI usage policy?
Single select. The single highest-leverage governance instrument.
| Option | Score |
|---|---|
| No | 0 |
| Drafting one | 33 |
| Yes — published but not enforced | 66 |
| Yes — published, trained, enforced | 100 |
G2. Who owns AI enablement in your organization?
Single select. Ownership is the difference between “we should do AI” and “we are doing AI.”
| Option | Score |
|---|---|
| No one in particular | 0 |
| IT or Security — mostly risk management | 33 |
| L&D or People Ops | 66 |
| A dedicated AI / transformation lead | 100 |
G3. Are there quality controls for AI output that ships externally?
Single select. Customer-facing copy, code, contracts, decisions.
| Option | Score |
|---|---|
| No controls | 0 |
| Manager review only | 33 |
| Documented review steps | 66 |
| Documented + enforced + audited | 100 |
G4. What’s the level of executive sponsorship for AI literacy?
Single select. Without an exec sponsor, the program lives until the next budget review.
| Option | Score |
|---|---|
| No exec sponsor | 0 |
| Verbal support, no budget | 33 |
| Budget allocated, no clear program | 66 |
| Funded program with exec accountability | 100 |
How scoring works
Each dimension is computed independently:
- Sum the scores for the answers given to the questions in that dimension.
- Sum the maximum possible scores for the same questions.
- Divide the first sum by the second, multiply by 100, and round to the nearest integer.
For Adoption (questions A1 + A2 + A3 + A4), the maximum is 100 + 100 + 100 + 100 = 400 (A2 reaches its 100 only when every positive option is selected). A response earning 0 + 40 + 25 + 33 = 98 (for example: under-10% usage, two tools at 20 points each, a-few-times-a-month use, and mixed satisfaction) produces an Adoption score of round(98 / 400 * 100) = 25.
For Capability and Governance, the same arithmetic applies with their respective maxima.
The overall score is the rounded average of the three dimension scores. There is no weighting — we treat the dimensions as equally important on purpose, because the failure mode of weighting is letting a single strong dimension hide weakness elsewhere.
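As a sanity check, the arithmetic above fits in a few lines. This is a sketch of the calculation as described here, not the code that generates the report.

```ts
// Per-dimension score: earned points over maximum points, times 100, rounded.
// Single selects contribute one option's weight; multi selects contribute the
// sum of every selected option.
function dimensionScore(earned: number[], maxima: number[]): number {
  const actual = earned.reduce((sum, n) => sum + n, 0);
  const max = maxima.reduce((sum, n) => sum + n, 0);
  return Math.round((actual / max) * 100);
}

// The worked Adoption example: 0 + 40 + 25 + 33 out of a possible 400.
dimensionScore([0, 40, 25, 33], [100, 100, 100, 100]); // 25

// Overall score: the unweighted, rounded average of the three dimensions.
function overallScore(adoption: number, capability: number, governance: number): number {
  return Math.round((adoption + capability + governance) / 3);
}
```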
How buckets work
| Score | Bucket |
|---|---|
| 0–39 | Emerging |
| 40–69 | Developing |
| 70–100 | Mature |
The thresholds aren’t tuned to a normal distribution — they’re tuned to the rollout decisions a buyer needs to make. A score below 40 in any dimension means the program for that dimension is not yet running. A score between 40 and 69 means the program is running but inconsistent. A score of 70 or higher means the program is durable and ready to scale or deepen.
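Expressed as code, bucketing is a plain threshold check. A sketch using the thresholds in the table above:

```ts
type Bucket = "Emerging" | "Developing" | "Mature";

// Thresholds from the table above: below 40 is Emerging, 40-69 is Developing, 70+ is Mature.
function bucketFor(score: number): Bucket {
  if (score < 40) return "Emerging";
  if (score < 70) return "Developing";
  return "Mature";
}
```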
How recommendations are generated
For every (dimension, bucket) pair, the rubric carries two pieces of copy:
- The highest-leverage gap. What’s actually limiting progress in that dimension at that maturity. Phrased as a diagnosis, not a complaint.
- The recommended next move. What 174 would suggest you do first. Phrased as a single concrete action, not a workshop list.
These pairs aren’t generated dynamically — they’re authored. We update the copy when we learn something new from a real engagement; we don’t update them per-buyer. The full mapping is in src/data/assessment.ts in the marketing repo if you want to read every variant.
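The mapping itself is small enough to sketch. The field names below are hypothetical; the authored copy and the actual shape are what you’ll find in src/data/assessment.ts.

```ts
type Dimension = "adoption" | "capability" | "governance";
type Bucket = "Emerging" | "Developing" | "Mature";

// One authored pair of copy per (dimension, bucket) combination: 3 x 3 = 9 variants.
// Field names are illustrative, not the shape shipped in src/data/assessment.ts.
interface Recommendation {
  gap: string;      // the highest-leverage gap, phrased as a diagnosis
  nextMove: string; // the single concrete action 174 would suggest first
}

type RecommendationMap = Record<Dimension, Record<Bucket, Recommendation>>;
```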
The overall rollout shape is computed from your overall score plus your weakest dimension:
- Below 40 overall: a 1–10 seat pilot, in the team most ready to move.
- 40–69 overall: a department-wide rollout, starting with the weakest dimension.
- 70+ overall: org-wide concierge — the program scales now.
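In code, the shape selection reads like this; a sketch under the same thresholds, with the weakest dimension passed in by name.

```ts
type Dimension = "adoption" | "capability" | "governance";

// Rollout shape from the overall score; the weakest dimension names where a
// department-wide rollout starts. Sketch only.
function rolloutShape(overall: number, weakest: Dimension): string {
  if (overall < 40) return "1–10 seat pilot, in the team most ready to move";
  if (overall < 70) return `department-wide rollout, starting with ${weakest}`;
  return "org-wide concierge: the program scales now";
}
```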
How to interpret your own score
Three honest principles:
- The rubric rewards honest answers. Picking aspirational options (“we’re drafting one” when you haven’t started) inflates the bucket and recommends the wrong rollout shape. The point of the assessment is to see clearly, not to feel good.
- Low scores in Emerging are useful, not damning. A 25 on Adoption is a clear instruction: visibility and access first. A 25 on Capability is a clear instruction: curriculum first. The bucket is your starting line, not a grade.
- Mature in one dimension, Emerging in another is the most common shape. Mid-market companies routinely have strong adoption and weak governance, or vice versa. The assessment surfaces that asymmetry on purpose — most rollout failures are governance failures dressed up as adoption successes.
If you’ve taken the assessment and want to retake it after a few months of work, the URL is the same. The report you generate is dated, so you can put two reports side by side and see the actual lift.
Where does your org actually stand?
Ten minutes. Three dimensions. A leadership-shareable baseline.