How to Measure Training ROI (Without Fooling Yourself)
Most organizations say they measure training ROI. Most are actually measuring training activity—completion rates, satisfaction scores, and hours delivered. This guide explains the difference, walks through the Kirkpatrick-Phillips framework, and offers a practical approach to evidence that holds up to scrutiny.
Real training ROI measurement requires isolating training's effect from other variables and converting that effect to a monetary value. Most organizations skip both steps, which is why most training ROI claims are not credible. The Kirkpatrick model and Phillips ROI methodology provide a structured path to evidence that holds up—but only if you design the measurement before the training, not after.
Key Takeaways
- The Kirkpatrick Four Levels (Reaction, Learning, Behavior, Results) measure different things; most organizations collect only Levels 1 and 2, which tell you almost nothing about business impact.
- The Phillips ROI Methodology adds a fifth level (Return on Investment) and, critically, a method for isolating training's contribution from other variables.
- Measurement must be designed before training delivery, not retrofitted afterward; pre-training baselines are essential.
- Soft data (morale, teamwork) can be converted to monetary values using estimation chains, though these require transparent assumptions.
- The cost of measurement is real; not every training program warrants full ROI analysis—focus rigorous evaluation on high-stakes, high-investment programs.
Why Most Training Measurement Is Theater
Every L&D function faces the same pressure: demonstrate value to leadership using data. And almost every L&D function responds to this pressure the same way—by measuring what is easy to measure. Completion rates. Post-training satisfaction surveys. Hours of content delivered. Sometimes pre-and-post knowledge tests.
None of these measure whether the training changed anything that matters to the business. They measure activity and immediate reaction, which are at best proxies for impact and at worst actively misleading. A training program with a 95% completion rate and 4.7/5 satisfaction score can be utterly ineffective at changing job behavior. We know this because the research on training transfer has consistently found that most participants do not apply what they learned—estimates of non-transfer range from 40% to 90% of training content, depending on the study and the domain.
This is the core problem with training measurement as most organizations practice it. This guide offers an alternative grounded in the Kirkpatrick model and Phillips ROI methodology—two frameworks that, used together, point toward evidence worth having.
The Kirkpatrick Four Levels
Donald Kirkpatrick developed his four-level evaluation model in the 1950s. It has been updated, debated, and extended, but the core taxonomy remains the most widely used framework for thinking about training evaluation. Understanding it clearly is the foundation for measurement that means something.
Level 1: Reaction. Did participants find the training relevant, engaging, and well-delivered? This is what most post-training surveys measure. It is the easiest level to collect and the least informative about impact. High satisfaction scores do not predict learning, behavior change, or business results.
Level 2: Learning. Did participants acquire the knowledge, skills, attitudes, or confidence targeted by the training? This is typically measured by pre-and-post assessments, skill demonstrations, or simulations. Level 2 data is meaningfully more valuable than Level 1 because it confirms whether learning occurred, but it still does not tell you whether that learning transferred to the job.
Level 3: Behavior. Did participants apply what they learned to their work? This requires observation or manager assessment of job behavior after training, compared to before. Level 3 data is collected infrequently because it requires follow-up over time (typically 30–90 days post-training) and cooperation from managers. It is also the most practically important level: without behavior change, no business result is plausible.
Level 4: Results. Did the behavior change produce the business outcomes the training was intended to support? Reduced error rates, faster onboarding, shorter sales cycles, and improved customer satisfaction scores are the kinds of organizational metrics that justify training investment. Level 4 data is the most meaningful and the most difficult to collect.
The common failure mode is treating Levels 1 and 2 as a complete measurement story. They are not. They are necessary but insufficient conditions for demonstrating training value.
The Phillips ROI Methodology: Adding Level 5
Jack Phillips extended the Kirkpatrick model with a fifth level: Return on Investment. The Phillips ROI methodology adds two critical elements that the original Kirkpatrick model does not address: isolating the effect of training from other variables, and converting outcomes to monetary values.
These additions are what separate credible ROI claims from wishful thinking. Consider a scenario where a sales training program is delivered in Q1, and sales performance improves by 15% over the following quarter. Was that the training? Was it a new product launch? A competitor’s stumble? An improvement in the economy? Without isolating training’s contribution, a 15% improvement claim is meaningless as evidence of training ROI.
Phillips proposed several techniques for isolation: control groups (compare trained vs. untrained employees on the same tasks), trend-line analysis (project what performance would have been without training based on historical trends), and participant estimation (ask participants and managers to estimate what percentage of the improvement was due to training, then apply a confidence factor to that estimate). The participant estimation method is the most practical for most organizations, though it requires careful design and transparent documentation of assumptions.
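The participant estimation arithmetic described above can be sketched in Python. This is a minimal illustration, not an official Phillips worksheet: the `Estimate` class, its field names, and the sample figures are all hypothetical, and the sketch assumes each respondent's improvement has already been expressed as a monetary amount.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """One respondent's estimate. Field names are hypothetical."""
    improvement: float  # observed improvement, already in monetary terms
    attribution: float  # share of the improvement credited to training (0 to 1)
    confidence: float   # respondent's confidence in that share (0 to 1)


def adjusted_benefit(e: Estimate) -> float:
    """Discount the improvement by attribution, then by confidence."""
    for name in ("attribution", "confidence"):
        value = getattr(e, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return e.improvement * e.attribution * e.confidence


def total_adjusted_benefit(estimates: list[Estimate]) -> float:
    """Sum the discounted benefit across all respondents."""
    return sum(adjusted_benefit(e) for e in estimates)


# Illustrative figures only, not data from any real program.
sample = [
    Estimate(improvement=10_000, attribution=0.5, confidence=0.8),
    Estimate(improvement=6_000, attribution=0.25, confidence=0.5),
]
claimed_benefit = total_adjusted_benefit(sample)
```

With these inputs, 16,000 of raw improvement shrinks to roughly 4,750 of claimable benefit. Multiplying by the confidence factor is what keeps uncertain estimates from inflating the result, which is why documenting both percentages matters.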
Monetary conversion—translating a business outcome into a dollar value—is the other essential step. For some outcomes, conversion is straightforward: a reduction in error rate can be multiplied by the known cost per error. For softer outcomes like improved team communication or reduced conflict, conversion requires an estimation chain—a documented set of reasonable assumptions linking the outcome to a monetary value. These chains are inherently imprecise, but they are far better than unmeasured claims, provided the assumptions are stated explicitly and conservatively.
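For the straightforward case, the conversion and the final ROI figure reduce to a few lines of arithmetic. The sketch below uses the standard definitions (ROI as net benefits divided by program costs, expressed as a percentage, and the benefit-cost ratio as benefits divided by costs). The function names and all numbers are hypothetical, and the benefit is assumed to have already been isolated from other variables.

```python
def error_reduction_benefit(errors_before: int, errors_after: int,
                            cost_per_error: float) -> float:
    """Monetary value of avoided errors over the measurement period."""
    return (errors_before - errors_after) * cost_per_error


def roi_percent(benefits: float, costs: float) -> float:
    """ROI (%) = (benefits - costs) / costs * 100."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) / costs * 100


def benefit_cost_ratio(benefits: float, costs: float) -> float:
    """BCR = benefits / costs."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return benefits / costs


# Illustrative figures only: 120 errors per year before training, 90 after,
# a known cost of 500 per error, and a fully loaded program cost of 10,000.
benefits = error_reduction_benefit(120, 90, 500.0)
roi = roi_percent(benefits, costs=10_000.0)
bcr = benefit_cost_ratio(benefits, costs=10_000.0)
```

Here the program avoids 30 errors worth 15,000 in total, giving an ROI of 50% and a benefit-cost ratio of 1.5. Program costs should be fully loaded (design, delivery, participant time, and evaluation), or the ROI figure will be overstated.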
Designing Measurement Before the Training
The single most important practical insight from the Kirkpatrick-Phillips framework is this: measurement must be designed before the training is delivered, not retrofitted afterward. Without a pre-training baseline, Level 3 and Level 4 data are uninterpretable. Without a defined control group or isolation methodology, any post-training improvement is unattributable.
In practice, this means including measurement design in the training design process. When you define the business problem the training is intended to address, simultaneously define: what the current state of that problem looks like in measurable terms, how you will observe whether behavior changes 60–90 days post-training, and what data sources will allow you to track Level 4 results over the relevant time horizon.
This is harder than collecting a post-training satisfaction survey. It requires stakeholder alignment and sometimes data infrastructure that does not currently exist. But it is the only path to evidence that holds up when L&D is asked to justify its investment to a CFO.
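To see why the baseline and the comparison group both have to exist before delivery, consider one common way of combining them: a difference-in-differences comparison. The article does not prescribe this specific technique; it is offered here as one hedged illustration of the control-group approach. The function name and sample scores are hypothetical, and the sketch assumes the two groups would have followed similar trends without training.

```python
from statistics import mean


def difference_in_differences(trained_pre: list[float], trained_post: list[float],
                              control_pre: list[float], control_post: list[float]) -> float:
    """Change in the trained group minus change in the control group.

    The control group's change stands in for everything that would have
    happened anyway (new products, seasonality, market shifts).
    """
    trained_change = mean(trained_post) - mean(trained_pre)
    control_change = mean(control_post) - mean(control_pre)
    return trained_change - control_change


# Illustrative scores only (for example, a monthly performance metric).
effect = difference_in_differences(
    trained_pre=[10, 12, 14], trained_post=[13, 15, 17],
    control_pre=[10, 12, 14], control_post=[11, 13, 15],
)
```

In this example the trained group improves by 3 points and the control group by 1, so only 2 points are attributable to training. Without the pre-training numbers, none of the four inputs can be reconstructed after the fact.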
What Not to Measure—and Why
Not every training program warrants full Kirkpatrick-Phillips evaluation. A 20-minute compliance refresher training for 50 employees does not need a control group, isolation methodology, and monetary conversion. The cost of rigorous evaluation often exceeds the value of the information it produces for small, low-stakes programs.
A reasonable heuristic: apply full ROI methodology to training programs with the highest investment (time, money, or strategic stakes), highest visibility (programs that leadership tracks closely), and most uncertain evidence base (new programs without a track record). For lower-stakes programs, collecting data at Levels 1 through 3 is sufficient, with Level 4 metrics monitored informally rather than through formal evaluation design.
For a broader strategic framework for L&D decision-making, see our corporate training strategy guide. For coverage of behavioral science insights relevant to learning design, visit our behavioral science section.
Sources
- Kirkpatrick Partners – The Four Levels of Evaluation
- Phillips, J.J. (2012) – Return on Investment in Training and Performance Improvement Programs (Routledge)
- Saks & Burke (2012) – An investigation into the relationship between training evaluation and the transfer of training (Int. J. of Training & Development)
- ATD State of the Industry Report 2024