How Evaluation Got Trapped in the Frame
I believe in the power of training. Rightly so. It can inspire, equip and energize. It creates moments of insight, shared language and renewed motivation. This belief that training works is not naïve. It is grounded in 58 years of lived experience.
Still, beneath that belief, there is often a quiet longing. A need for recognition. For something that confirms it made a difference. Not just for the participant, but for the organization, the outcome, the world of work around it. We want to know it mattered.
That longing is not weakness. It is the desire to connect effort with effect. And this is where a shift occurs. Not from doubt to trust, but from asking if it worked to asking how it worked. From seeking confirmation and certainty to embracing challenge and uncertainty.
Once we cross that threshold, everything changes. We move from conviction to method. From lived truth to measurable evidence. And that is where our usual ways of evaluating training begin to fall short.
In many cases, the question of impact arises only once the program is underway or completed. By that point, most decisions have already been made. The content has been designed. Participants have been selected. Delivery has taken place. Evaluation is treated as a follow-up task rather than as an integral part of the intervention design.
This sequencing reveals a deeper issue. When training is the starting point, the focus naturally shifts to its characteristics. How well was it delivered? Was it engaging? Did participants enjoy it? These are not irrelevant questions, but they tend to dominate the evaluation process in the absence of something more fundamental: a clear and shared understanding of what the training is meant to accomplish.
Too often, the intention behind the training remains abstract. Stakeholders may refer to general themes such as collaboration, resilience or growth. But these concepts are rarely defined in observable terms. As a result, the intervention begins to shape the purpose instead of the other way around. The form starts to determine the function.
This inversion has consequences. When the training itself becomes the reference point, the evaluation will almost inevitably focus on proximal indicators. Reactions. Attendance. Satisfaction. Perceived usefulness. These measures are easy to collect and analyze, but they offer limited insight into whether the underlying issue has been addressed.
Even established models, such as Kirkpatrick’s four levels, are often applied in a way that reinforces this pattern. Reaction and learning are treated as prerequisites. Behavior and results are inferred rather than tested. The assumption is that a well-designed training will produce effects that cascade outward into performance. But if the original goal was never specified, then any outcome or lack of outcome can be rationalized.
The illusion of structure
Frameworks like Phillips’ ROI model offer the appearance of methodological precision, but their structure is misleading. Each level of evaluation refers back to the program itself, reinforcing the assumption that the intervention is the source of change. The result is a closed loop of self-confirming logic, where we end up proving what we already assumed.
Satisfaction is measured in relation to the program. Learning is defined as the absorption of the program’s content. Application refers to the use of what was learned in the program. Business impact is attributed to what was learned in the program. And ROI is calculated as the return on the investment in the program, not in the employee.
The concepts themselves, however clear and meaningful they may seem, create a closed system. Each step reinforces the centrality of the intervention. The logic begins and ends with the program. The result is evaluative myopia. We examine how well the intervention performed on its own terms, without asking whether it was the right lever to begin with.
Instead of testing whether the system adapted or the employee improved in meaningful ways, we measure how well our own assumptions played out. That is not impact. That is circular reasoning with metrics attached.
The critical hinge
A defining feature of traditional evaluation methods is the attempt to isolate the effects of training from everything else that might influence change. The question of whether the program caused the outcome is treated as the highest standard of proof.
This is exactly where the method begins to turn on itself. In trying to separate the intervention from its context, we separate the evaluation from its relevance. The more we pursue isolated attribution, the more we design our methods to exclude the very complexity we need to understand.
This is not a technical flaw. It is a conceptual trap that lures us into a seemingly safe, yet confined space. The methodology quietly locks the door behind us. The method that seeks clarity ends up narrowing its own field of vision. It asks for proof, but defines the very frame in which that proof can appear. What begins as a demand for evidence becomes a mechanism of detachment.
A more fundamental question
What was the intended outcome, and did the training contribute to its realization? Without a concrete and testable intention, it is not possible to assess whether the intervention was effective. If the desired change occurs independently of the training, the intervention was unnecessary. If the change does not occur at all, then even a well-executed program has failed in substantive terms.
Discussions about return on investment frequently obscure this issue. What is often presented as ROI is in fact the return on the program. It is a calculation of cost-efficiency based on the delivery of the intervention itself. Rarely is the return measured against the original need, because that need was never formulated in a way that could govern the evaluation process.
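To make that concrete, take the standard formula with purely hypothetical figures: return on investment is the net program benefits divided by the program costs, multiplied by one hundred. If a program costs 50,000 and the benefits attributed to it are valued at 65,000, the calculation gives (65,000 - 50,000) / 50,000 × 100, or 30 percent. Every term in that equation points back at the program: what it cost, and what was credited to it. Nothing in the calculation asks whether the original need was met, whether it could have been met more cheaply, or whether it would have been met without any program at all.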
A more meaningful approach would begin with expectations. What specific change is desired? In which context? For which group? According to which criteria? If those expectations are stated clearly, they can serve as a benchmark for both design and evaluation. If not, measurement becomes speculative. The numbers may be precise, but the reasoning behind them is not.
This kind of reflection is more difficult than reporting participation rates or compiling survey scores. It forces organizations to articulate what they want and to confront whether their interventions actually contribute to it. But it is precisely this difficulty that gives impact measurement its value. It turns evaluation into a practice of learning instead of a ritual of justification.
Reclaiming the frameworks
To be clear, models like Kirkpatrick's and Phillips' do offer value. Not as evidence structures, but as ways to categorize intent. They can help open the right conversation if we treat them as frameworks for orientation and not as proof of effectiveness.
One practical way to use them is to ask a simple question at the start of any intervention. Is this training intended to support pleasure, potential, performance, productivity or profitability?
That distinction matters. A session meant to boost morale requires a different kind of success than a program aimed at operational efficiency. Over the years, this simple classification has proven to be one of the most effective tools for clarifying expectations. It creates shared language. It brings assumptions to the surface. And it allows evaluation to begin in alignment with purpose rather than with the program.
From there, the methodological work begins. But without that first step, measurement lacks direction.
Final reflection
Training can be an effective means of supporting change, but only if it is embedded within a coherent logic of intention, action and result. That requires more than good design. It requires clarity about purpose, discipline in formulation and honesty in reflection. It requires a theory. A practical theory that elicits what stakeholders want and believe the program will help achieve. A working hypothesis that supports training design and governs the form of inquiry we commonly refer to as evaluation.
And for the record, I’m perfectly happy with an overpriced leadership retreat in Lausanne or the mythical hills above Aspen. Overflowing with beautiful tagliatelle, luxurious rooms, polite people and excellent wine. No problem. Bring it on. Just don’t expect too much. You never know how the whole thing might end in surprise. If that’s the purpose, it’s fine. I’m open to evaluating it. After all, somewhere behind the facade, something might already be quietly transforming.