Self-Explanation: The Evidence Behind Asking People to Articulate What They Think They Know

Mar 08, 2026

Think about the last time you had to explain something to someone else and found, somewhere in the middle of explaining it, that you understood it better than when you started. Perhaps you were walking a colleague through a process and noticed a gap in your own reasoning that you had not previously registered; perhaps you were describing a decision you had made and, in the act of articulating your rationale, recognised a flaw in it. The understanding arrived not when the information arrived but when you were required to put it into your own words, to make the implicit explicit, to generate something beyond what had been given to you.

This is not a feature unique to teaching others. The same process operates when we read something complex and pause to explain it to ourselves, when we work through a problem and narrate our reasoning rather than just executing steps, when we write in the margins of a document or talk ourselves through a decision. The conversation in our own heads, when it is a genuine attempt to account for reasoning rather than simply to summarise what we have read, appears to do something qualitatively different to passive reception of information.

Anyone who has experienced the moment of noticing a gap in their own understanding precisely because they tried to explain something will recognise the phenomenon immediately; what is less widely appreciated is that this effect has been extensively studied and quantified.

Where the research begins

In 1989, Michelene Chi and her colleagues at the University of Pittsburgh’s Learning Research and Development Center sat with physics students as they worked through textbook examples, recording what those students said aloud while they studied. The gap between the high-performing students and the low-performing students was not primarily a matter of intelligence or prior knowledge; it was a matter of what the students were doing cognitively while the material was in front of them. Students who performed well on subsequent problem-solving tests generated an average of 15.3 spontaneous explanations per worked example; students who performed poorly generated 2.8. The high performers were not simply reading more carefully, but were constructing meaning as they went, articulating the principles behind each step, questioning gaps between what the example showed and what they already understood, and repairing their own misunderstandings in real time (Chi, Bassok, Lewis, Reimann and Glaser, 1989).

More than three decades on, the self-explanation effect is supported by two formal meta-analyses, hundreds of empirical studies across multiple domains and age groups, and a coherent framework that connects to cognitive load theory, generative learning, and metacognition. For those of us in L&D, the research offers something relatively rare: a specific, evidence-grounded design principle with clear applications to how we structure training experiences, and an equally clear set of conditions under which it will not work.

The theoretical mechanism

Self-explanation refers to the process of generating explanations to oneself while studying material, typically by articulating the reasoning behind a step in a worked example, connecting new information to existing knowledge, or identifying and resolving gaps in one’s own understanding. Chi formalised the underlying mechanism across a series of papers and a definitive chapter in 2000: self-explanation drives learning through two distinct but related processes. The first is inference generation, where people go beyond what is explicitly stated in the material, drawing on general principles and prior knowledge to fill in the logical gaps that most instructional materials leave incomplete. The second is mental model repair, where the act of explanation forces learners to confront points at which their existing understanding is inconsistent with new information; the friction of that confrontation drives revision and integration (Chi, 2000).

All of this was shaped into a working model by VanLehn, Jones, and Chi (1992) through the Cascade model, which simulated what they called impasse-driven learning: the idea that self-explanation is triggered specifically at moments where the learner encounters a step they cannot justify, and that the process of resolving that impasse generates durable understanding. The implication for instructional design is that material needs to be designed to create productive impasses, to preserve some difficulty rather than eliminate it, a principle that connects directly to Bjork’s concept of desirable difficulties (Bjork, 1994).

Chi later situated self-explanation within her broader ICAP framework, developed with Ruth Wylie and published in Educational Psychologist in 2014 (Chi and Wylie, 2014). The framework classifies learning activities along a continuum from Passive (receiving information without overt response) through Active (physically engaging with material, such as highlighting) to Constructive (generating output beyond what is presented, which is where self-explanation sits) to Interactive (collaboratively constructing knowledge through dialogue). The prediction is that constructive activities produce deeper learning than active ones, and the evidence broadly supports this ordering.

Note: I’ll do another article exploring the ICAP framework in more detail, as whilst there is good evidence to support it, there are also some significant questions around it that are worth exploring, but not today.

The evidence base

The most comprehensive quantitative assessment of self-explanation comes from Bisra, Liu, Nesbit, Salimi, and Winne (2018), who published a meta-analysis in Educational Psychology Review looking at 64 studies involving 5,917 learners across a range of ages, subjects, and contexts. The overall weighted effect size was Hedges’ g = 0.55, a medium effect by standard conventions, and one that held across most domains, age groups, and task types. Critically, this effect did not disappear when researchers controlled for time on task, addressing the most immediate methodological objection: that self-explanation simply produces better outcomes because it takes longer (Bisra et al., 2018).
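For readers less familiar with the statistic, here is a minimal sketch of how Hedges' g is computed for a single two-group comparison: the difference in group means divided by the pooled standard deviation, with a small-sample correction applied. The function name and the scores are my own illustration and are not taken from any of the studies cited; a meta-analysis then combines many such values using weights this sketch does not cover.

```python
import math
from statistics import mean, stdev


def hedges_g(treatment, control):
    """Standardised mean difference with a small-sample correction."""
    n1, n2 = len(treatment), len(control)
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(
        ((n1 - 1) * stdev(treatment) ** 2 + (n2 - 1) * stdev(control) ** 2)
        / (n1 + n2 - 2)
    )
    d = (mean(treatment) - mean(control)) / pooled_sd  # Cohen's d
    # Common approximation of the correction factor: 1 - 3 / (4 * df - 1),
    # where df = n1 + n2 - 2.
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


# Hypothetical post-test scores, invented purely for illustration.
self_explain_group = [72, 65, 80, 70, 68, 75, 77, 66]
control_group = [68, 60, 74, 66, 65, 70, 71, 62]
print(round(hedges_g(self_explain_group, control_group), 2))
```

The point of standardising is that studies using different tests and scales can be compared on a common footing, which is what allows a pooled figure such as 0.55 to be reported at all.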

An earlier mathematics-specific meta-analysis found a smaller effect of g = 0.39 for near transfer, with the authors noting that evidence for long-term retention and transfer to classroom contexts was considerably thinner than the laboratory-based evidence (Rittle-Johnson, Loehr and Durkin, 2017). This distinction between controlled conditions and authentic training contexts is one that any practitioner should hold clearly in mind when reading this literature.

The most influential assessment came from Dunlosky, Rawson, Marsh, Nathan, and Willingham’s (2013) review of ten learning strategies in Psychological Science in the Public Interest, which rated self-explanation at “moderate utility.” This placed it above highlighting, rereading, and summarisation, all rated low utility, but below practice testing and distributed practice, both rated high utility. The grounds for the moderate rating were:

long-term retention effects were inadequately evidenced,
the strategy had not been sufficiently tested in authentic educational or training settings,
and the time investment required was non-trivial relative to some alternatives (Dunlosky et al., 2013).

So, self-explanation is a real, replicable effect; it is not a fragile single-lab finding or a vendor-driven claim, and it has been replicated across countries, ages, and research groups over 35 years. Overstating its position in the evidence hierarchy, however, would not serve us well, and the honest reading of the research places it as a conditionally useful strategy with significant design requirements.

Connections

Self-explanation does not exist in isolation, and understanding where it connects to other frameworks makes its application more tractable.

The most direct relationship is with cognitive load theory (Sweller, 1988). Worked examples reduce extraneous cognitive load by removing the demands of means-ends problem-solving, freeing working memory for the kind of deeper processing that self-explanation represents. The connection was explored by Renkl and Atkinson (2003), who proposed a fading procedure in which worked-out solution steps are progressively removed while users are prompted to self-explain, creating a managed transition from supported to independent problem-solving.

Generative learning theory provides the broadest umbrella. Fiorella and Mayer (2015) identify self-explanation as one of eight generative learning strategies, and their review found that 44 of 45 studies showed self-explanation outperforming control conditions. The strategy fits squarely within Mayer’s Select–Organise–Integrate model: it activates the selection of relevant information, the organisation of that information into coherent structures, and the integration of new material with prior knowledge (Fiorella and Mayer, 2015).

Self-explanation and retrieval practice are complementary: retrieval practice tests what someone can recall, and self-explanation tests whether someone understands the reasoning behind what they have encountered. The two strategies target different aspects of the same goal of durable, transferable knowledge, and there are reasonable grounds to expect they would work well in combination within a single training design, though this specific pairing has not been extensively tested in workplace contexts.

Boundary conditions

When it comes to using self-explanation in our work, we need to consider four boundary conditions that are sufficiently well-evidenced to be treated as design constraints.

The first is domain structure. Self-explanation works best in domains governed by principles and inferential structures, where gaps in a worked example can be filled by reasoning from general rules. Physics, mathematics, and clinical medicine are the most extensively studied examples. In domains where the relevant knowledge is predominantly declarative without inferential depth, such as vocabulary learning or second-language grammar acquisition, the evidence is considerably weaker; research across multiple studies found no self-explanation benefit for grammar learning (Wylie and Chi, 2014). The implication is that self-explanation is not a universal prompt to add to any training experience. It is most likely to help when the content has inferential structure that a learner can reason through.

The second boundary condition is expertise level. The expertise reversal effect, identified across multiple studies within cognitive load theory (Kalyuga, Ayres, Chandler and Sweller, 2003), applies to self-explanation. Leppink, Broers, Imbos, van der Vleuten, and Berger (2012) found a direct expertise reversal for self-explanation in statistics: low prior knowledge users benefited substantially from worked examples combined with self-explanation prompting, while high prior knowledge users benefited more from generating independent arguments without worked-example scaffolding. Renkl, Stark, Gruber, and Mandl (1998) found the same pattern with bank tellers studying compound interest, where the strategy was most effective for those with low prior knowledge in the domain. McNamara’s (2004) research with reading training found that the strategy doubled comprehension scores for low-knowledge readers and produced no measurable benefit for high-knowledge readers.

In corporate training, the audience is frequently composed of experienced practitioners who already have substantial prior knowledge in their domain. Self-explanation yields its largest returns for novices encountering unfamiliar material. So, we will get more value out of self-explanation when training new starters and those new to the world of work.

The third constraint is explanation quality. Renkl (1997) found that most people are naturally passive and superficial self-explainers when left without guidance. Roy and Chi (2005) reported that approximately 63% of users’ written self-explanation attempts were incorrect. Aleven and Koedinger (2002), working with intelligent tutoring systems, found that without feedback, fewer than 10% of student attempts at self-explanation were substantively accurate. This means the common L&D interpretation of self-explanation theory, which is to add a reflective question to an e-learning module and assume comprehension has deepened, misrepresents what the research describes. The effect in the literature is produced not by simply asking people to explain themselves, but by scaffolded, prompted self-explanation that is checked for accuracy and supported through feedback.

The fourth constraint is time cost. McEldoon, Durkin, and Rittle-Johnson (2013) directly compared self-explanation against additional practice equated for time, finding that self-explanation took roughly twice as long as standard practice and produced an advantage that was present on some measures but more modest than the unadjusted effect sizes suggest. When the question is whether self-explanation beats an equivalent amount of additional practice, the case is considerably less clear than when it is compared against a passive control (McEldoon, Durkin and Rittle-Johnson, 2013). Practitioners designing time-constrained training need to weigh this cost.

The workplace evidence gap

There is a significant point to make about the state of this research as it applies to corporate and professional training: there is very little of it. The majority of self-explanation studies have been conducted in school and university settings with student populations studying mathematics, science, or reading comprehension. The transfer of those findings to adult professional training is a matter of informed extrapolation.

The most relevant professional application exists in medical education, where Chamberland and colleagues at the University of Sherbrooke have conducted a sustained research programme. Chamberland, Mamede, Bhattacharyya, St-Louis, Bolduc, Rivard, Leblanc, and Schmidt (2015) found that combining self-explanation with worked examples from senior residents and structured prompts produced the best diagnostic accuracy outcomes in medical students, with the benefit concentrated in less familiar clinical presentations, a finding consistent with the expertise reversal pattern observed elsewhere. Outside medical education, the most directly workplace-adjacent study in the published record is Renkl et al. (1998) with bank tellers, and even that study involved student populations learning a domain new to them.

That is a thin evidential foundation for wholesale programme recommendations, though it does not argue against cautious and principled application where the conditions are favourable.

What this looks like in practice

The first design decision is whether self-explanation is the right tool at all for the people and content in front of you. Ask two questions before building anything:

Is this audience new to this material?
Does the content have inferential structure, meaning there are principles at work that can be reasoned through rather than simply memorised?

If the answer to both is yes, you have the conditions the research supports. If your audience are experienced practitioners revisiting familiar territory, or the content is primarily a list of facts without logical connective concepts, a different approach is likely to serve you better.
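If it helps to see the two questions as an explicit gate, here is a minimal sketch. The function and parameter names are hypothetical, and the wording of the concerns is my own paraphrase of the boundary conditions rather than anything prescribed by the research.

```python
def self_explanation_is_supported(
    audience_new_to_material: bool,
    content_has_inferential_structure: bool,
) -> tuple[bool, list[str]]:
    """Return (supported?, concerns) for the two screening questions."""
    concerns = []
    if not audience_new_to_material:
        concerns.append(
            "Audience already knows the material; expect smaller returns."
        )
    if not content_has_inferential_structure:
        concerns.append(
            "Content is mostly facts to memorise; little to reason through."
        )
    # The conditions are met only when no concern has been raised.
    return (not concerns, concerns)


supported, concerns = self_explanation_is_supported(
    audience_new_to_material=True,
    content_has_inferential_structure=False,
)
print(supported, concerns)
```

Returning the concerns alongside the yes/no answer is a deliberate choice: when the answer is no, the reason tells you which alternative approach to look for.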

When those conditions are met, the most practical starting point is the design of your worked examples. Rather than providing a complete model answer and moving on, structure examples so that a prompt follows each significant step, asking people to explain why that step produces that outcome. The prompt needs to be specific to the reasoning, not open-ended: “why does applying this principle here prevent the error in the next stage?” is doing the work you want; “what did you find interesting about this section?” is not. The former requires the person to resolve a specific inferential gap; the latter invites reflection without requiring comprehension.

As users progress through a sequence, begin removing completed solution steps and replacing them with gaps they must fill before moving forward. The full example at the start is scaffolding, not the destination; the goal is for everyone to be generating the reasoning themselves by the end of the sequence. A technical onboarding programme, a procedure-heavy compliance course, or a clinical or legal reasoning module are all contexts where this structure maps naturally onto the material.
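The fading idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not a reproduction of any published procedure: the function name is hypothetical, the example steps are invented, and removing steps from the end of the example first is simply one reasonable ordering chosen for the sketch.

```python
def faded_example(steps, stage, total_stages):
    """Return (text, is_gap) pairs for one stage of a fading sequence.

    Stage 0 shows every step worked out; the final stage shows none.
    Steps are removed from the end first (an assumption of this sketch).
    """
    if total_stages < 1 or not 0 <= stage <= total_stages:
        raise ValueError("stage must be between 0 and total_stages")
    # Number of steps the person must supply themselves at this stage.
    n_gaps = (len(steps) * stage) // total_stages
    n_shown = len(steps) - n_gaps
    result = []
    for i, step in enumerate(steps):
        if i < n_shown:
            result.append((step, False))
        else:
            prompt = f"Step {i + 1}: your turn. What happens here, and why?"
            result.append((prompt, True))
    return result


# Hypothetical worked example, invented for illustration only.
steps = [
    "Identify the principal.",
    "Convert the annual rate to a decimal.",
    "Raise (1 + rate) to the number of years.",
    "Multiply the result by the principal.",
]
for text, is_gap in faded_example(steps, stage=2, total_stages=4):
    print("[GAP] " if is_gap else "      ", text)
```

Each gap carries a "why" prompt rather than a bare blank, so the person is asked for the reasoning as well as the step itself.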

For eLearning, the format of the prompt matters a lot. Asking people to type a free-text explanation sounds as though it would produce the deepest processing, but without a human reviewer or a robust automated feedback system you cannot tell whether the explanation is accurate, and an inaccurate self-explanation can reinforce errors rather than repair them. Where feedback on open responses cannot be reliably provided, presenting users with a set of candidate explanations and asking them to select the most accurate produces better outcomes than open generation without feedback. The selection format still requires them to evaluate and discriminate between reasoning options but removes the risk of low-quality explanations going unchecked.
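As a rough sketch of what the selection format might look like as data, the structure below pairs every candidate explanation with its own feedback, so that no choice goes unchecked. The class names, the question, and the candidate explanations are all hypothetical and invented for illustration; they do not correspond to any particular authoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    is_most_accurate: bool
    feedback: str  # shown after selection, whichever option is picked


@dataclass(frozen=True)
class SelectionPrompt:
    question: str
    candidates: tuple[Candidate, ...]

    def respond(self, choice: int) -> tuple[bool, str]:
        """Return (correct?, feedback) for the option the person picked."""
        picked = self.candidates[choice]
        return picked.is_most_accurate, picked.feedback


# Hypothetical content, invented for illustration only.
prompt = SelectionPrompt(
    question="Why is the annual rate divided by 12 before compounding monthly?",
    candidates=(
        Candidate(
            "The quoted rate covers a year, so each month earns one twelfth of it.",
            True,
            "Yes: the rate has to match the length of the compounding period.",
        ),
        Candidate(
            "Dividing by 12 keeps the final balance from getting too large.",
            False,
            "Not quite: the division matches the rate to the period; it is not a cap.",
        ),
    ),
)

correct, feedback = prompt.respond(1)
print(correct, feedback)
```

The distractors matter as much as the correct option: they should be plausible misreadings of the reasoning, otherwise selecting the right one requires no discrimination at all.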

Feedback is not optional in any format. In a face-to-face or virtual setting, build in structured moments where explanations are verified, not simply shared: a paired activity framed as “explain your reasoning, then agree between you where you differ, and flag anything you cannot resolve” gives you a feedback mechanism and a signal about where the group’s understanding is weakest. In coaching or mentoring, the self-explanation prompt is already implicit in good practice; “talk me through why you made that decision at that point” is a well-constructed self-explanation prompt with a built-in feedback loop, and most experienced coaches are already doing something close to this without necessarily labelling it as such.

The most notable implementation risk is the reflective question that resembles self-explanation but does not produce it. End-of-module reflection prompts, “pause and consider” screens, and learning journal entries are common in modern training design and are not without value, but they do not reliably generate the inferential processing the research describes unless they are anchored to specific content gaps and followed by some form of accuracy check. The mechanism that drives the effect is not reflection in the general sense; it is the act of resolving a specific gap in understanding, under conditions where a wrong answer has consequences. Designs that carry the label of self-explanation without those features are borrowing the credibility of the research without applying its requirements and limitations.