Worked Examples and the Illusion of Competence
At IDTX: Evidence Informed Practice, Matt Zatonski, presenting alongside Sara Farwell, explained that as a surgeon he was not permitted to watch videos or recordings of a procedure before he was in theatre and taking part in it himself. The reasoning: watching the operation in advance risked building a sense of competence that the watcher had not earned, and in surgery a false sense of competence is a serious risk.
I paint miniatures, and I recognised the pattern immediately, though the stakes in my hobby could hardly be more different. You watch someone demonstrate a technique, edge highlighting, wet blending, a glaze that takes a flat surface and gives it depth, and you watch it enough times that you begin to feel you can do it. You start a project with that confidence, attempt the technique, and discover that the smooth, narrated, well-lit version you absorbed bears little resemblance to what your own hands produce. Turns out you are in no way even approaching competence, regardless of what you may have been tricked into thinking.
When I get it wrong on a model, the consequences are trivial. I paint over it, I strip it back, or I leave it, because a slightly botched cloak on a 28mm figure is not going to harm anyone. Matt’s example sits at the other end of the scale entirely, where overconfidence carries a cost measured in patient safety. Most of the work we do in organisations sits somewhere between those two poles; few of us are conducting surgery, but most of us operate in environments where a confidently mishandled task, a difficult conversation, a safety procedure, a financial control, a piece of regulated work, can carry real consequences that compound when nobody notices the gap between what someone believed they could do and what they could actually do.
This raised a question: does seeing a task completed ahead of doing it create a false sense of competence, and if it does, how do we keep the well-established benefits of worked examples without manufacturing overconfidence in the process?
What watching does
In 2018, Kardas and O’Brien reported a series of six experiments in which people watched others perform a skill and then estimated their own ability to do it. People who watched a short video of a performer twenty times were markedly more confident they could carry out the task themselves than people who watched it once; they predicted they would perform better, and they reported having learned more and improved more. Their actual performance, when measured, was no better than that of the group who had watched the video a single time. The watching inflated the feeling of skill without supplying any of the skill.
In one study involving juggling, the overconfidence collapsed the moment people were handed the pins and allowed even a brief attempt; having held the pins, they revised their self-estimates downward and reported they had learned less and were less capable than they had thought. A small dose of doing destroyed the illusion that watching had built. The same experiments showed the inflated confidence only appeared when people could track the performer’s specific movements; when the performer’s hands were hidden, the effect disappeared, which suggests the illusion feeds on watching the steps and concluding “I could do that.”
The mechanism underneath this is metacognitive. When information is easy to process, a clear demonstration, a fluent and confident presenter, a well-edited video, we read that ease as evidence of our own learning. Carpenter et al. (2013) showed that people who watched a fluent instructor deliver content predicted they had learned far more than people who watched a hesitant instructor deliver the same content, yet the two groups scored about the same on the subsequent test. Fluency raised the perception of learning without actually facilitating any.
There’s a related phenomenon, the illusion of explanatory depth, where people believe they understand how something works in far more detail than they do, right up until the moment they are asked to produce a full explanation and discover the gap (Rozenblit and Keil, 2002). Watching an expert produces exactly this; the whole procedure is laid out, visible, and coherent, so recognising the steps feels like possessing them, when in reality, recognising a procedure and producing it under realistic conditions are rather different.
I should add a note of caution about a term people often reach for when discussing this, the Dunning-Kruger effect. The popular version, in which the least competent are the most confident and a graph shows a peak of misplaced certainty, does not appear in the original research and rests on shaky statistical ground; a good deal of the classic pattern turns out to be an artefact of how the data is analysed (Gignac and Zajenkowski, 2020). The robust and useful finding is that people in general are poor judges of their own competence, almost everyone overestimates to some degree, and self-assessment alone is a weak basis for deciding whether someone can do a job. That is enough to build on, and it doesn’t require the mythologised version.
What this doesn’t mean
The temptation at this point is to conclude that worked examples are a problem and that demonstrations should be treated with suspicion. That would be the wrong conclusion, because the benefits of worked examples are well established.
Giving a novice a fully worked solution to study, rather than setting them loose to solve an unfamiliar problem from scratch, reliably produces better results with less wasted effort, because it directs their limited working memory toward understanding the structure of the solution instead of toward fruitless trial and error (Sweller and Cooper, 1985). A recent meta-analysis in mathematics puts the benefit at a medium and dependable effect (Barbieri et al., 2023). For someone encountering something unfamiliar, a clear example is among the most useful things we can provide.
The danger lies in passive, fluent observation that gets mistaken for practice. The same demonstration can be a powerful aid or a confidence trap depending entirely on what the person is required to do with it. A demonstration followed by study, attempt, and feedback builds capability. A demonstration watched repeatedly and never enacted builds only a feeling of undeserved confidence.
Remember, too, that examples have a shelf life. As people develop expertise, the detailed guidance that helped them as novices starts to get in the way, because they now have to reconcile the external explanation with their own developing knowledge; this is the expertise reversal effect, and it means that continuing to show fluent demonstrations to people who should be practising independently can hold them back rather than help (Kalyuga et al., 2003).
Evidence-informed design
If the problem is observation mistaken for practice, the solution is to design training so that confidence is repeatedly tested against actual competence, which means surrounding every example with generation, attempt, retrieval, and feedback. The good news for anyone building training is that most of it costs little beyond a willingness to make people do something rather than watch something.
The single highest-leverage and lowest-cost addition is the self-explanation prompt. Chi et al. (1989) found that the people who benefited most from worked examples were the ones who explained the steps to themselves as they studied, connecting each move to the principle behind it, and later work confirmed that prompting people to do this deliberately improves their understanding and their ability to transfer it to new problems. Asking someone to explain, in their own words, why a step works and what they expect to get wrong can surface the illusion of explanatory depth before they over-commit to a task they have only watched.
Next up, let’s talk fading. Rather than moving someone straight from a complete worked example to a blank problem, you remove the support gradually: show the full solution, then present the next task with the final step missing for them to complete, then the final two steps, and so on until they are working unaided. Atkinson, Renkl and Merrill (2003) found that faded sequences combined with self-explanation prompts improved performance on similar problems and, harder still, the ability to transfer the skill to different ones. Fading forces production at every stage, so the person keeps meeting the gap between recognising a step and generating it.
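For anyone who builds exercises in a spreadsheet or an authoring tool, the fading sequence described above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the procedure from Atkinson, Renkl and Merrill: the names FadedTask and backward_fading are hypothetical, the step labels are invented, and in real training each task would be a fresh problem with the same step structure rather than the same list reused.

```python
from dataclasses import dataclass


@dataclass
class FadedTask:
    shown_steps: list[str]    # steps presented fully worked out
    learner_steps: list[str]  # steps the learner must produce unaided


def backward_fading(steps: list[str]) -> list[FadedTask]:
    """Build a task sequence that removes worked steps from the end first."""
    tasks = []
    for hidden in range(len(steps) + 1):
        split = len(steps) - hidden
        tasks.append(FadedTask(shown_steps=steps[:split], learner_steps=steps[split:]))
    return tasks


# Hypothetical step labels, purely for illustration.
steps = ["thin the paint", "load the brush", "place the highlight", "tidy the edge"]
for number, task in enumerate(backward_fading(steps), start=1):
    print(f"Task {number}: {len(task.shown_steps)} shown, {len(task.learner_steps)} for the learner")
```

The first task in the sequence is the complete worked example and the last is the blank problem; everything in between asks the learner to produce a little more than the time before.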
For anything conceptual, there is a strong case for having people attempt before they are shown. Productive failure asks people to work through a novel problem and often to get it wrong before any instruction or worked solution arrives. A substantial meta-analysis by Sinha and Kapur (2021) found that problem-solving before instruction outperformed instruction before problem-solving for conceptual understanding and transfer, with the benefit growing the more faithfully the approach was designed, particularly when people were asked to generate and then compare several of their own attempts against the expert solution. The attempt does two things at once: it prepares the mind to make sense of the example when it arrives, and it calibrates confidence to reality before any false sense of competence can take hold. The caveat is that this works best for concepts and for adults who have enough prior knowledge to fail productively; for pure procedural skill in a complete novice, leading with the clear example remains the better route.
Then there is the most direct antidote of all, which is to stop accepting the feeling of competence as evidence and to require people to produce. Retrieval practice, low-stakes testing where people have to generate an answer rather than review one, is among the most robustly evidenced techniques in all of the learning sciences (Adesope, Trevisan and Sundararajan, 2017), and its value here is diagnostic as much as it is about retention; a test exposes the gap between what someone believes they know and what they can produce, and it does so to the person themselves, which is the only audience whose miscalibration we are trying to correct. Confidence ratings and completion rates tell us nothing about capability. A short, well-designed task that makes someone do the thing tells us a lot more.
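One way to make that gap visible to the learner is to ask for a predicted score before the task and set it beside the score they actually achieve. The sketch below is only an illustration under my own assumptions: the learner names and scores are invented, calibration_gap is a hypothetical helper, and a simple difference is not a validated measure of calibration.

```python
from statistics import mean

# Hypothetical data: (score predicted after watching, score on the practice task), out of 10.
results = {
    "learner_a": (8, 4),
    "learner_b": (6, 6),
    "learner_c": (9, 5),
}


def calibration_gap(predicted: float, actual: float) -> float:
    """Return predicted minus actual; a positive value means overconfidence."""
    return predicted - actual


gaps = {name: calibration_gap(p, a) for name, (p, a) in results.items()}
for name, gap in gaps.items():
    predicted, actual = results[name]
    print(f"{name}: predicted {predicted}, scored {actual}, gap {gap:+}")
print(f"mean gap: {mean(list(gaps.values())):+.1f}")
```

The number that matters for capability is still the score on the task; the gap is useful mainly as feedback to the person who made the prediction.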
Where the stakes are real, we should consider enactive practice with feedback against a defined standard. Ericsson’s work on deliberate practice established decades ago that expertise comes from effortful, goal-directed practice with immediate and informative feedback, not from exposure or experience accumulated passively (Ericsson, Krampe and Tesch-Römer, 1993). The clearest demonstration of where this leads comes from medicine, which has moved away from the old “see one, do one, teach one” apprenticeship and toward simulation-based training in which trainees practise to a defined mastery standard before they act on a patient. A meta-analysis by McGaghie et al. (2011) found this approach substantially outperformed traditional clinical training, and the principle generalises cleanly to the workplace: replace “watch the demonstration and self-certify” with “practise to an observed standard with feedback,” and you decouple progression from how ready someone feels.
Note: I must give a shout-out to Laura Watkins’ session at IDTX: Evidence Informed Practice, which focused entirely on the concept of deliberate practice and how we can effectively build it into our work.
When a demonstration works
None of this argues against demonstrations; it argues for using them as the opening of a process. When you do put a worked example or a demonstration in front of someone, a few design choices reduce the chance that it inflates confidence rather than building skill.
Show the messy version, not only the polished one. A demonstration in which the performer struggles, makes a mistake, and recovers counteracts the “that looks easy” signal that a flawless, edited performance sends, and the evidence on observational learning suggests these coping demonstrations are at least as effective for building real capability.
Require an attempt immediately afterward. Even a brief go at the task collapses the overconfidence that watching creates, so never let a demonstration stand alone without a hands-on attempt close behind it.
Prompt explanation and prediction. Ask people to explain why each step works and to predict where they will struggle, which surfaces the holes in their understanding before they discover them the hard way.
Use the person’s own performance as the feedback. Recordings or observation of their own attempt, reviewed against the standard, calibrate self-assessment far better than watching someone else succeed.
Take the example away as competence grows. Fluent demonstrations that helped at the start will start to hinder as people develop, so fade them out rather than repeating them.
The principle underneath
Watching a task done well will often mislead you into believing that you are readier than you are, because the watching is smooth and clear and the doing is neither, and the mind mistakes the ease of recognition for the substance of skill.
Keep showing people good examples, because good examples are useful tools, but refuse to let the watching pose as practice, and build every piece of training so that the moment of feeling competent is quickly followed by the moment of finding out. Design the doing in early, attach feedback to it, and hold people to a standard.
References
Adesope, O.O., Trevisan, D.A. and Sundararajan, N. (2017) ‘Rethinking the use of tests: a meta-analysis of practice testing’, Review of Educational Research, 87(3), pp. 659-701.
Atkinson, R.K., Renkl, A. and Merrill, M.M. (2003) ‘Transitioning from studying examples to solving problems: effects of self-explanation prompts and fading worked-out steps’, Journal of Educational Psychology, 95(4), pp. 774-783.
Barbieri, C.A., Miller-Cotto, D., Clerjuste, S.N. and Chawla, K. (2023) ‘A meta-analysis of the worked examples effect on mathematics performance’, Educational Psychology Review, 35, article 11.
Carpenter, S.K., Wilford, M.M., Kornell, N. and Mullaney, K.M. (2013) ‘Appearances can be deceiving: instructor fluency increases perceptions of learning without increasing actual learning’, Psychonomic Bulletin & Review, 20(6), pp. 1350-1356.
Chi, M.T.H., Bassok, M., Lewis, M.W., Reimann, P. and Glaser, R. (1989) ‘Self-explanations: how students study and use examples in learning to solve problems’, Cognitive Science, 13(2), pp. 145-182.
Ericsson, K.A., Krampe, R.T. and Tesch-Römer, C. (1993) ‘The role of deliberate practice in the acquisition of expert performance’, Psychological Review, 100(3), pp. 363-406.
Gignac, G.E. and Zajenkowski, M. (2020) ‘The Dunning-Kruger effect is (mostly) a statistical artefact: valid approaches to testing the hypothesis with individual differences data’, Intelligence, 80, 101449.
Kalyuga, S., Ayres, P., Chandler, P. and Sweller, J. (2003) ‘The expertise reversal effect’, Educational Psychologist, 38(1), pp. 23-31.
Kardas, M. and O’Brien, E. (2018) ‘Easier seen than done: merely watching others perform can foster an illusion of skill acquisition’, Psychological Science, 29(4), pp. 521-536.
McGaghie, W.C., Issenberg, S.B., Cohen, E.R., Barsuk, J.H. and Wayne, D.B. (2011) ‘Does simulation-based medical education with deliberate practice yield better results than traditional clinical education? A meta-analytic comparative review of the evidence’, Academic Medicine, 86(6), pp. 706-711.
Rozenblit, L. and Keil, F. (2002) ‘The misunderstood limits of folk science: an illusion of explanatory depth’, Cognitive Science, 26(5), pp. 521-562.
Sinha, T. and Kapur, M. (2021) ‘When problem solving followed by instruction works: evidence for productive failure’, Review of Educational Research, 91(5), pp. 761-798.
Sweller, J. and Cooper, G.A. (1985) ‘The use of worked examples as a substitute for problem solving in learning algebra’, Cognition and Instruction, 2(1), pp. 59-89.


Tom, fantastic article as usual. This reminds me of the Failure Mode and Effects Analysis (FMEA) risk assessments that are required as part of implementing processes, validations, process improvements, or major changes in structured environments. My personal experience with these is in pharmaceutical manufacturing and operations. What "good" looks like could be a fully compliant, sterile, efficacious product that gets shipped to the customer on time and ready to be given to a patient within expiry. The actual on-the-floor manufacturing, testing, stability, and batch release steps are logical but numerous and prone to human error, despite the incorporation of automation and AI in various areas. Before (and while) any process step is changed or improved (which is happening all the time in manufacturing), an FMEA needs to be completed and signed off by all stakeholders. Its full focus is on "failures" and on evaluating whether the current processes can identify and manage those risks quickly, safely, compliantly, and effectively. It's comprehensive and straightforward, but it takes time and careful, objective review. It also gatekeeps next steps on a process improvement product. Makes sense, right?
There is a direct consequence of the illusion of skill that affects people at many workplaces. AI-generated slide decks and videos are easy and cheap to make. However, watching them creates an illusion of skill and competence, with all the consequences: a sense of entitlement to promotion, a belief that one should be paid more, and resentment when people who are actually more competent get promoted. This drives job dissatisfaction and retention challenges and damages workplace culture. This is why evidence-informed L&D plays a crucial role in organisations that want to get ahead of generic competition and build a capable and loyal workforce. We now have two decades of evidence showing that a feelings-based workplace does not work.