
How to Measure Training ROI (Without Fooling Yourself)

Most organizations say they measure training ROI. Most are actually measuring training activity—completion rates, satisfaction scores, and hours delivered. This guide explains the difference, walks through the Kirkpatrick-Phillips framework, and offers a practical approach to evidence that holds up to scrutiny.

By Joshua Baker

Training & Development Editor

Published May 14, 2026 · Updated May 14, 2026 · 5 min read


Quick Answer

Real training ROI measurement requires isolating training's effect from other variables and converting that effect to a monetary value. Most organizations skip both steps, which is why most training ROI claims are not credible. The Kirkpatrick model and Phillips ROI methodology provide a structured path to evidence that holds up—but only if you design the measurement before the training, not after.

Key Takeaways

The Kirkpatrick Four Levels—Reaction, Learning, Behavior, Results—measure different things; most organizations only collect Levels 1 and 2, which tells you almost nothing about business impact.
The Phillips ROI Methodology adds a fifth level (Return on Investment) and, critically, a method for isolating training's contribution from other variables.
Measurement must be designed before training delivery, not retrofitted afterward; pre-training baselines are essential.
Soft data (morale, teamwork) can be converted to monetary values using estimation chains, though these require transparent assumptions.
The cost of measurement is real; not every training program warrants full ROI analysis—focus rigorous evaluation on high-stakes, high-investment programs.


Why Most Training Measurement Is Theater

Every L&D function faces the same pressure: demonstrate value to leadership using data. And almost every L&D function responds to this pressure the same way, by measuring what is easy to measure. Completion rates. Post-training satisfaction surveys. Hours of content delivered. Sometimes pre- and post-training knowledge tests.

None of these measure whether the training changed anything that matters to the business. They measure activity and immediate reaction, which are at best proxies for impact and at worst actively misleading. A training program with a 95% completion rate and a 4.7/5 satisfaction score can be utterly ineffective at changing job behavior. We know this because research on training transfer has consistently found that much of what participants learn is never applied on the job; estimates of non-transfer range from 40% to 90% of training content, depending on the study and the domain.

This is the core problem with training measurement as most organizations practice it. This guide offers an alternative grounded in the Kirkpatrick model and Phillips ROI methodology—two frameworks that, used together, point toward evidence worth having.

The Kirkpatrick Four Levels

Donald Kirkpatrick developed his four-level evaluation model in the 1950s. It has been updated, debated, and extended, but the core taxonomy remains the most widely used framework for thinking about training evaluation. Understanding it clearly is the foundation for measurement that means something.

Level 1: Reaction. Did participants find the training relevant, engaging, and well-delivered? This is what most post-training surveys measure. It is the easiest level to collect and the least informative about impact. High satisfaction scores do not reliably predict learning, behavior change, or business results.

Level 2: Learning. Did participants acquire the knowledge, skills, attitudes, or confidence targeted by the training? This is typically measured by pre- and post-training assessments, skill demonstrations, or simulations. Level 2 data is meaningfully more valuable than Level 1 because it confirms whether learning actually occurred, but it still does not tell you whether that learning changed anything on the job.

Level 3: Behavior. Did participants apply what they learned to their work? This requires observation or manager assessment of job behavior after training, compared to before. Level 3 data is collected infrequently because it requires follow-up over time (typically 30–90 days post-training) and cooperation from managers. It is also the most practically important level: without behavior change, no business result is plausible.

Level 4: Results. Did the behavior change produce the business outcomes the training was intended to support? Reduced error rates, faster onboarding, shorter sales cycles, improved customer satisfaction scores: these are the organizational metrics that justify training investment. Level 4 data is the most meaningful and the most difficult to collect.

The common failure mode is treating Levels 1 and 2 as a complete measurement story. They are not. They are necessary but insufficient conditions for demonstrating training value.

The Phillips ROI Methodology: Adding Level 5

Jack Phillips extended the Kirkpatrick model with a fifth level: Return on Investment. The Phillips ROI methodology adds two critical elements that the original Kirkpatrick model does not address: isolating the effect of training from other variables, and converting outcomes to monetary values.

These additions are what separate credible ROI claims from wishful thinking. Consider a scenario where a sales training program is delivered in Q1, and sales performance improves by 15% over the following quarter. Was that the training? Was it a new product launch? A competitor’s stumble? An improvement in the economy? Without isolating training’s contribution, a 15% improvement claim is meaningless as evidence of training ROI.

Phillips proposed several techniques for isolation: control groups (compare trained vs. untrained employees on the same tasks), trend-line analysis (project what performance would have been without training based on historical trends), and participant estimation (ask participants and managers to estimate what percentage of the improvement was due to training, then apply a confidence factor to that estimate). The participant estimation method is the most practical for most organizations, though it requires careful design and transparent documentation of assumptions.

Monetary conversion—translating a business outcome into a dollar value—is the other essential step. For some outcomes, conversion is straightforward: a reduction in error rate can be multiplied by the known cost per error. For softer outcomes like improved team communication or reduced conflict, conversion requires an estimation chain—a documented set of reasonable assumptions linking the outcome to a monetary value. These chains are inherently imprecise, but they are far better than unmeasured claims, provided the assumptions are stated explicitly and conservatively.
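Once benefits have been isolated, the conversion and the Level 5 calculation are simple arithmetic. This sketch assumes both benefit figures are already attributed to training; every number, and the `Link`, `direct_value`, and `estimation_chain` names, is hypothetical. `roi_percent` applies the standard ROI formula used in the Phillips methodology (net program benefits divided by program costs, times 100), with costs assumed to be fully loaded.

```python
from dataclasses import dataclass
from math import prod


def direct_value(units_avoided: float, cost_per_unit: float) -> float:
    """Direct conversion: outcome units (e.g., errors avoided) times a known cost per unit."""
    return units_avoided * cost_per_unit


@dataclass
class Link:
    assumption: str  # stated in plain language so reviewers can challenge it
    factor: float    # quantity or multiplier this step contributes


def estimation_chain(links: list[Link]) -> float:
    """Soft-outcome conversion: multiply through a documented chain of assumptions."""
    return prod(link.factor for link in links)


def roi_percent(benefits: float, costs: float) -> float:
    """Level 5: net program benefits as a percentage of fully loaded program costs."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) / costs * 100


# Hypothetical chain: better team communication -> lower turnover -> dollars.
turnover_chain = [
    Link("Employees on teams that completed the training", 40),
    Link("Assumed drop in annual turnover rate (0.05 = 5 points)", 0.05),
    Link("Assumed fully loaded cost per departure (USD)", 20_000),
    Link("Conservative adjustment for estimate uncertainty", 0.5),
]

errors_value = direct_value(units_avoided=120, cost_per_unit=250)  # hypothetical figures
soft_value = estimation_chain(turnover_chain)
for link in turnover_chain:
    print(f"- {link.assumption}: {link.factor}")
print(f"Total benefits: ${errors_value + soft_value:,.0f}")
print(f"ROI: {roi_percent(errors_value + soft_value, costs=40_000):.0f}%")
```

Printing each link alongside the result keeps the assumptions visible in the same place as the number they produce.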

Designing Measurement Before the Training

The single most important practical insight from the Kirkpatrick-Phillips framework is this: measurement must be designed before the training is delivered, not retrofitted afterward. Without a pre-training baseline, Level 3 and Level 4 data are uninterpretable. Without a defined control group or isolation methodology, any post-training improvement is unattributable.
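When a comparison group exists, the baseline and control-group requirements combine in a difference-in-differences comparison, one common way to analyze a before-and-after control-group design (the Kirkpatrick-Phillips framework does not mandate a particular statistic). The figures below are hypothetical and echo the earlier 15% sales scenario; a real analysis would also check that the groups are comparable and that the difference is not just noise.

```python
from statistics import mean


def isolated_effect(trained_before: list[float], trained_after: list[float],
                    control_before: list[float], control_after: list[float]) -> float:
    """Difference-in-differences: the trained group's change minus the control group's
    change over the same period. Both 'before' lists are pre-training baselines."""
    trained_change = mean(trained_after) - mean(trained_before)
    control_change = mean(control_after) - mean(control_before)
    return trained_change - control_change


# Hypothetical quarterly sales index per rep (baseline quarter = 100).
trained_before, trained_after = [100, 100, 100], [112, 115, 118]
control_before, control_after = [100, 100, 100], [108, 110, 112]

naive = mean(trained_after) - mean(trained_before)
print(naive)                                           # 15
print(isolated_effect(trained_before, trained_after,
                      control_before, control_after))  # 5
```

Without the baselines, neither change can be computed; without the control group, all 15 points would be credited to training.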

In practice, this means including measurement design in the training design process. When you define the business problem the training is intended to address, simultaneously define: what the current state of that problem looks like in measurable terms, how you will observe whether behavior changes 60–90 days post-training, and what data sources will allow you to track Level 4 results over the relevant time horizon.

This is harder than collecting a post-training satisfaction survey. It requires stakeholder alignment and sometimes data infrastructure that does not currently exist. But it is the only path to evidence that holds up when L&D is asked to justify its investment to a CFO.

What Not to Measure—and Why

Not every training program warrants full Kirkpatrick-Phillips evaluation. A 20-minute compliance refresher training for 50 employees does not need a control group, isolation methodology, and monetary conversion. The cost of rigorous evaluation often exceeds the value of the information it produces for small, low-stakes programs.

A reasonable heuristic: apply full ROI methodology to training programs with the highest investment (time, money, or strategic stakes), highest visibility (programs that leadership tracks closely), and most uncertain evidence base (new programs without a track record). For lower-stakes programs, collecting Levels 1–3 is sufficient, with Level 4 metrics monitored informally rather than through formal evaluation design.
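One way to make this heuristic explicit is to score each program against its three criteria. The sketch below is hypothetical: the `Program` fields, the `evaluation_depth` function, and especially the two-of-three threshold are illustrative choices, and a team could reasonably weight investment more heavily or require all three criteria.

```python
from dataclasses import dataclass


@dataclass
class Program:
    name: str
    high_investment: bool     # large commitment of time, money, or strategic stakes
    high_visibility: bool     # leadership tracks the program closely
    uncertain_evidence: bool  # new program without a track record


def evaluation_depth(program: Program, threshold: int = 2) -> str:
    """Suggest how rigorously to evaluate a program.

    Counts how many of the three criteria apply; the threshold of 2 is a
    hypothetical cut-off for illustration, not part of any standard."""
    criteria_met = sum([program.high_investment, program.high_visibility,
                        program.uncertain_evidence])
    if criteria_met >= threshold:
        return "Full ROI methodology: Levels 1-5, isolation, monetary conversion"
    return "Levels 1-3, with Level 4 metrics monitored informally"


refresher = Program("20-minute compliance refresher", False, False, False)
leadership = Program("New leadership academy", True, True, True)
print(evaluation_depth(refresher))   # Levels 1-3, with Level 4 metrics monitored informally
print(evaluation_depth(leadership))  # Full ROI methodology: Levels 1-5, isolation, monetary conversion
```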

For a broader strategic framework for L&D decision-making, see our corporate training strategy guide. For coverage of behavioral science insights relevant to learning design, visit our behavioral science section.



Joshua Baker spent over a decade inside corporate learning and development before he started writing about it. He built and led training functions at mid-size companies across the financial services and professional services sectors, where he was responsible for everything from new-hire…

Frequently Asked Questions

What is the Kirkpatrick model?

The Kirkpatrick model is a four-level framework for evaluating training effectiveness: Level 1 (Reaction: did participants like it?), Level 2 (Learning: did they acquire the knowledge?), Level 3 (Behavior: did they apply it on the job?), and Level 4 (Results: did it improve business outcomes?). Most training programs only collect Levels 1 and 2, which are the least informative about actual impact.

What is the Phillips ROI methodology?

The Phillips ROI methodology extends Kirkpatrick's four levels by adding a fifth: return on investment. More importantly, Phillips adds specific techniques for isolating training's contribution from other variables and for converting outcomes to monetary values. These are the two steps most organizations skip, and the ones most necessary for a credible ROI claim.

How do you isolate the effect of training from other variables?

The most rigorous method is a control group: comparing trained employees with a similar untrained group on relevant performance metrics. Where control groups are impractical, Phillips recommends trend-line analysis (projecting what performance would have been without training) or participant and manager estimation with confidence adjustments. Participant estimation is the most frequently used practical approach.

Does every training program need a full ROI analysis?

No. Full ROI methodology is most justified for high-investment, high-visibility programs where leadership is actively evaluating spend. For routine compliance training or small-scale skill refreshers, basic Level 1–3 data and informal monitoring of Level 4 metrics are proportionate. Match the rigor of evaluation to the stakes of the program.

What is a pre-training baseline?

A pre-training baseline is a measurement of the current state of the performance problem the training is designed to address, collected before the training is delivered. Without a baseline, you cannot determine whether any observed improvement is meaningful. Baselines might include error rates, sales performance, customer satisfaction scores, or manager ratings of specific behaviors: whatever is relevant to the training's intended outcome.

Can soft outcomes be converted to monetary values?

Yes, with careful methodology. Phillips describes estimation chains: document the connection between the soft outcome and a monetizable downstream effect (e.g., reduced turnover costs). Use conservative estimates, state your assumptions explicitly, and apply confidence adjustments to participant estimates. This approach makes the calculation transparent and auditable, which is more credible than ignoring soft outcomes entirely.