Continuing his series on the potential of retrieval practice, spaced learning, successive relearning, and metacognitive approaches in the classroom, Kristian Still looks at the research underpinning test-enhanced learning (repeated retrieval practice) and draws out important lessons for teachers


In this series, I am attempting to elaborate and share what the recipe of test-enhanced learning (more commonly known as retrieval practice), spaced learning, interleaving, feedback, metacognition, and motivation might look like in and out of the classroom.

Having reviewed the research and cognitive science behind these concepts, there are nine clear but interlinked elements, which I am considering across nine distinct but related articles.

You might also listen to a recent episode of the SecEd Podcast (SecEd, 2022) looking at retrieval practice, spaced learning, and interleaving and featuring a practical discussion between myself, teacher Helen Webb, and Dr Tom Perry, who led the Education Endowment Foundation’s Cognitive Science Approaches in the Classroom review (Perry et al, 2021).

This series, in reviewing the evidence-base, seeks to help you reflect on what will work for you, your classroom, and your pupils. This is article two and it focuses on test-enhanced learning – more commonly known as (repeated) retrieval practice.


Repeated retrieval practice (test-enhanced learning)

Retrieval practice describes the process of recalling information from memory with little or minimal prompting. Low-stakes tests (questions or quizzes) are often used as methods of retrieval practice as these require pupils to think hard about what information they have retained and can recall … Testing learning is often a better strategy for learning than restudying or recapping.
Perry et al, 2021

What does this mean in practice? Simply, answering a question yields a higher probability of retrieving the correct answer to that question on a later test than does simply restudying the answer, particularly if correct answer feedback is provided.

This is variously known as the retrieval practice effect, the test-enhanced learning effect, or the testing effect.

Roediger & Butler (2013) report that testing is “a potent mechanism for enhancing long-term retention”.

In one of the seminal papers – Improving students’ learning with effective learning techniques (Dunlosky et al, 2013) – Dr John Dunlosky and his colleagues identify just two techniques that receive a high “utility” rating: practice testing and distributed (spaced) practice, because they “benefit learners of different ages and abilities and have been shown to boost students’ performance across many criterion tasks”. This finding was reaffirmed by Donoghue and Hattie (2021).

Retrieving information from memory causes learning, both directly and indirectly. The direct effect of retrieval practice on learning and relearning refers to the fact that practising retrieval produces superior long-term retention, transfer, durability, and accessibility. Critically, these effects are due to the act of retrieval itself and occur even in the absence of feedback (Karpicke & Roediger, 2008).

Indirectly, retrieving information from memory helps organise practised materials, informs current learning, potentiates future learning, increases transfer of knowledge to new contexts, reduces test anxiety, and informs study approaches. Retrieval has even been shown to improve retention of non-tested but related material (Chan et al, 2006; Chan, 2009).

Notably, there are recent promising findings about the impact of testing before learning takes place – or “pre-testing” (Latimier et al, 2019; Pan et al, 2020). This might be a quiz on the timelines of events, introducing key vocabulary, or categorising. This not only primes learning but also helps teachers to assess prior knowledge (we will come back to pre-testing later in this series).

Suffice to say that given “the strong evidence for its memorial benefits, many cognitive and educational psychologists now classify testing as among the most effective educational techniques discovered to date” (Pan & Rickard, 2018).

With that baseline established, how does this relate to your teaching, learning and relearning in and beyond your classroom?


In and out of the classroom

Even those simple terms “test-enhanced learning” or “retrieval practice” hide a multitude of modulators or factors at play that teachers must consider when planning retrieval practice:

  • Subject or learner characteristics: Level of education and setting, abilities, prior knowledge, atypical learners…
  • Encoding activities: Context, setting, instructions, time between learning, retrieval and the final assessment, feedback, the learning activities, the number of exposures to learning.
  • Retrieval format (the way retention is measured): The format of retrieval practice (free recall from memory, cued or recognition recall), aural or visual, the response format (covert or overt – I will explain these terms shortly), the framing (paced or self-paced).
  • Materials and modality: The types of materials or modes we use – pictures/words (e.g. paired associate learning or completion of word fragments), sentences, general knowledge. But also the difficulty or complexity of materials. Collaborative or independent tasks? Websites and apps or pen and paper.

These four categories are as identified by Jenkins’ 1979 tetrahedral model, and they all have an influence on the effectiveness of retrieval practice: “The memory phenomena that we see depend on what kinds of subjects we study, what kinds of acquisition conditions we provide, what kinds of materials we choose to work with, and what kinds of criterial measures we obtain.” (Jenkins, 1979, cited in Roediger, 2008).

It is worth pointing out that most teachers enter the retrieval practice arena at point 3 – the way retention is measured. But this is the point of assessing what has been learnt, and it risks overlooking factors from points 1 and 2.

However, rather than offering simple answers, the main message is that we should not discount these other modulators and the impact they may have on retrieval success for any given pupil. But allow me to make some brief observations.


1, Subject or learner characteristics

After Roediger and Karpicke’s (2006) seminal paper – “frequent testing in the classroom may boost educational achievement at all levels of education” – there came a flurry of research. Adesope et al (2017) confirmed: “Results show that testing effects do not vary with study settings.” In fact, the use of practice tests by secondary pupils seems to have a bigger impact than at post-secondary.

Retrieval practice seems to be a learning technique that is not moderated by individual differences such as working memory capacity, and is thus potentially beneficial for all pupils (Agarwal et al, 2021; Yang et al, 2021; Bertilsson et al, 2021).

Notably, retrieval practice during learning, when accompanied by feedback, may serve to level the playing field for pupils with lower working memory capacities (Agarwal et al, 2017).

A word of caution, however. There appears to be a general declining impact of retrieval practice as pupils get younger (Leahy et al, 2015).


2, Encoding activities

There is no end of encoding – learning – activities (encoding being the process by which incoming information from the environment is processed into memory). That, after all, is teaching.

But there are other factors too – such as whether the subsequent test pupils take is a surprise or not, or the use of corrective or elaborative feedback (or none at all) – and there are many overlaps, as Roediger (2008) states: “For example, ‘instructions’ is listed under encoding, but of course, the effect of the instructions will depend on the type of subjects receiving them and the knowledge the instructions activate.”

For reasons of space, I simply cannot delve into this discussion further. For those whose interest is piqued, Roediger’s (2008) paper offers a discussion of different formats and how these might link to later retrieval success.

His paper asks a number of questions, not least: “Does deeper, more meaningful processing during encoding enhance retention relative to less meaningful, superficial analyses?”

Sadly, the conclusion – as so often is the case with research into memory – is “it depends”. It depends on “many other conditions” – not least the learners themselves. But again, that is teaching…


3, Retrieval formats

There is no shortage of retrieval practice formats: Q&A, A&Q, matching exercises, timelines, grids to fill in, mind-mapping, dumps (write everything down that you can remember!), relay races (write everything down that you can remember and then pass to a peer), labelling, knowledge organising, ordering, summarising, flashcards, and plenty more.

Perry et al (2021) offer a useful list of low-stakes quiz formats: multiple-choice questions; short-answer fact questions; short problem-solving; true/false questions; labelling diagrams; image recognition; recitation of quotes or definitions; list-creation.

Researchers are a little more conservative, with most studies focusing on overt recall (see below) and categorising the possible approaches as either free, cued or recognition recall.

Free recall tests: For example, pupils study a word list and then recall the words from the initial presentation; short-answer questions also fall into this category.

Cued recall tests: For example, pupils study word pairs or paired-associates (e.g. flower-daisy). During retrieval, they only get the cue (flower–?). Other examples include cloze questions or fill-in-the-blank exercises. For cued recall, there is a positive testing effect in 96% of published experiments (Rickard & Pan, 2018).

Recognition recall tests: For example, the presentation of a familiar cue that has been encountered before, such as in multiple-choice questions or other “matching” exercises.

All of these retrieval formats have strengths and drawbacks, but Yang et al (2021) predict larger learning gains for difficult, free recall tests when compared to easier recognition tests. Having to retrieve the answer from long-term memory is more effortful than spotting the correct answer from clues in multiple-choice questions.

That said, you will find plenty of support for multiple-choice tests – “a cornerstone of assessment” that produces “significant enhancements” (Yang et al, 2021). And there is good reason, not least question reliability and ease of marking. You may also be seeking the lower challenge of “recognition recall” so that you can use the feedback opportunity as the learning. Two tips: make all questions required and provide feedback on correct answers.

The format of the final assessment matters too. This relates to “transfer-appropriate processing” – the principle that performance will be highest if the characteristics of the learning procedure are similar to those of the assessment procedure.

Having said this, some studies have explored how the testing effect is influenced by how cognitively demanding the retrieval processes are during the practice test. Kang et al (2007) report that pupils who took a short-answer practice test outperformed (on the final test) those who took a multiple-choice practice test – regardless of the format of the final test.

And so we find ourselves going full circle and back to retrieval practice format.

So, why is this important? Well suffice to say that difficulty is key, but that there is a fine line between difficulty and pupil motivation. And we might consider difficulty in two ways: how hard the actual practice test is but also the difficulty introduced by the spacing (how long we wait until we test/retest).

We will come back to these ideas in article three on spacing and later in the series, but we can say here that the benefits of retrieval practice are more pronounced as levels of processing during retrieval become more demanding (desirable difficulty).

So rather than telling you which kind of tests to use, what is more relevant to consider in terms of retention is the cognitive demand of the practice test (Rowland, 2014; Adesope et al, 2017; Yang et al, 2021).

And consider the purpose of the retrieval activity, too – is it to manage the entry to the classroom? Then ensure a low failure rate. Is it to activate or connect with prior learning? Then aim for a medium failure rate. Is it to potentiate learning? This is likely to have a high failure rate.

Also, Endres et al (2020) report that free recall tasks help learners “remember a broader spectrum of information” and “increased self-efficacy and situational interest” – we might say that retrieval sparks learners’ curiosity.

I’ll leave the final word to Perry et al (2021): “Planning test difficulty is particularly important – pupils should be able to retrieve at least some of the content they are tested on.”


Covert vs overt retrieval formats

Pupils retrieving information can answer in their heads (covert) or by writing or typing their responses (overt), or indeed they can do both.

Tauber et al (2018) had pupils study key terms and their corresponding definitions. Pupils then restudied the key terms/definitions or tested themselves; the pupils who tested themselves typed the definition (overt) or recalled it mentally (covert).

Pupils in all three groups made judgements about their performance and returned two days later for a test. Final recall was “moderately” greater after overt retrieval than after covert retrieval or restudy. The recommendation is that pupils use “overt retrieval when using retrieval practice as a strategy to learn complex materials”.

However, research also suggests that overt and covert retrieval have similar effects, particularly when learning simple facts (Putnam & Roediger, 2013; Smith et al, 2013).

Of course, teachers will adopt different retrieval response formats for different contexts, materials and complexity.

As for the delivery of retrieval cues in the classroom, there appears to be a Goldilocks principle. De Jonge et al (2012) looked at study time and report that giving students too short (one second) or too long (16 seconds) a time to focus on a piece of information resulted in poor immediate and delayed recall. Giving students an intermediate “think time” (of four seconds) led to “less proportional forgetting”.

We know “think time” is important. But the message seems to be don’t leave the retrieval wheels spinning for too long when you could offer another cue or give this time over to discussing the correct answer or feedback and further retrieval opportunities.


4, Materials and modality

We still don’t have a good grasp of exactly what kinds of material can and cannot be retrieved or “how much” for that matter. Some researchers will tell you that the “testing effect is alive and well with complex materials” (Karpicke & Aue, 2015), while others fear that “the complexity of learning materials might constitute another boundary condition of the testing effect” (van Gog & Sweller, 2015).

Much of the interest around modality has focused on retrieval difficulty and the work of Bjork & Bjork, including their (2011) paper Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning.

Learning strategies which slow or hinder encoding during learning produce superior long-term retention and transfer, and there is a level of difficulty that is optimal. I have already discussed above the fine line between difficulty and motivation, and how we can introduce difficulty via both the spacing we choose and the challenge of the actual practice test.

But “difficulty” has other contributory factors: the initial learning activities, the retrieval formats, learners’ prior knowledge, the extent to which other information “interferes” with what is being learned, how much information is being retained, not to mention the cognitive biases or beliefs that pupils bring with them. This last point is crucial, as many learners choose not to use retrieval practice, believing it to be too difficult or ineffective.

And related to this, it has become clear that “successfully disseminating knowledge about strategies that produce desirable difficulties is often not enough to produce changes in learning behaviours” (Zepeda et al, 2020).

But we must persist as the research is placing greater emphasis on the importance of learner motivation (Bjork & Bjork, 2020; Finn, 2020).

Speaking to this point, Helen Webb – one of the guests on the SecEd Podcast episode – highlighted the importance of explaining to students the “why” of “what” we are doing in the classroom. She said: “Pre-empt their resistance by explaining that retrieval is more difficult. Getting the desirable difficulty correct requires team-work. If students don’t engage, is the test too hard or too easy? Get buy-in by explaining that their participation informs the next steps (reteach or re-quiz with easier/harder questions).”

Test-enhanced learning needs leading as well as teaching. Many learners choose not to use retrieval practice in favour of less effective strategies like re-reading and restudy – even when presented with the evidence to the contrary! We will come back to this issue when we discuss motivation and related factors later in the series.

When all is said and done, “the first time students retrieve shouldn't be on a … final exam”, as Dr Pooja Agarwal tweeted in December.


Feedback

The critical mechanism in learning from retrieval is successful retrieval or exposure to the correct response. Providing feedback after a retrieval attempt, regardless of whether the attempt is successful, helps to ensure that retrieval will be successful in the future. Indeed, offering corrective feedback can significantly increase learning gains (Rowland, 2014; Yang et al, 2021). Of course, there is more to it than that. So much so that article five will focus purely on feedback.


How often should we ‘retrieve’?

It is clear that a greater number of test repetitions yields a larger learning enhancement (I could quote any number of papers to support this). But what can we do here to help learning stick?

First, Rawson et al (2018) showed that the power of relearning is more than just a “dosage effect” – that is, more than the result of multiple exposures. For example, correctly recalling items once in each of three spaced sessions, versus three times in a single session, yielded a 262% increase in retention test performance.

Second, the optimal spaced retrieval practice or relearning schedule depends on the memory strength of that item after initial encoding/learning (Mozer et al, 2009).

If the memory strength is relatively high, the interval between repetitions should be longer than if the memory strength is relatively low (Latimier et al, 2021). And successful knowledge encoding is suggested to be more effective when prior learning is reactivated and congruent.
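For readers curious how such a schedule might be operationalised (apps such as RememberMore automate this kind of thing), here is a minimal illustrative sketch in Python. It is not any published spacing algorithm – the starting interval and multiplier are arbitrary values chosen for illustration – but it captures the expanding-interval idea: a successful recall suggests higher memory strength, so the gap before the next review grows; a failed recall resets the item to a short gap for prompt relearning.

```python
# Illustrative expanding-interval review scheduler.
# The one-day starting gap and the 2.0 multiplier are arbitrary
# example values, not taken from any published spacing algorithm.

def next_interval(current_interval_days: int, recalled: bool,
                  multiplier: float = 2.0) -> int:
    """Return the gap (in days) before an item's next review.

    A successful recall implies higher memory strength, so the gap
    expands; a failed recall resets the item to a one-day gap.
    """
    if recalled:
        return max(1, round(current_interval_days * multiplier))
    return 1  # relearn soon after a failed retrieval attempt


# An item that keeps being recalled successfully is reviewed at
# ever-longer intervals: 1, 2, 4, then 8 days.
interval = 1
schedule = []
for _ in range(4):
    schedule.append(interval)
    interval = next_interval(interval, recalled=True)

print(schedule)  # [1, 2, 4, 8]
```

The design choice mirrors the research point above: items with low memory strength (those just failed) come back quickly, while well-encoded items are pushed further out, freeing practice time for what most needs relearning.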

You may be familiar with Rosenshine’s advice from his Principles of Instruction (2012): “Begin a lesson with a short review of previous learning.”

Retrieval practice offers a very obvious opportunity to activate prior knowledge, with van Kesteren et al (2018) reporting: “Reactivation of prior knowledge during learning of new information indeed results in stronger association of new learned information.”

Professor Graham Nuthall (2007) in The Hidden Lives of Learners (the only education book I have read three times) also offers a clue. He concludes that a student “understands, learns and remembers a concept if they have encountered all the underlying information three times”.

And “encountered” does not mean only on practice tests, but also during the natural teaching cycle – explanations, worksheets, going over class work, homework, starter retrieval quizzes, weekly or monthly quizzes, etc.

In their meta-analysis of 10 learning techniques, Donoghue and Hattie (2021) confirm the major findings from Dunlosky et al (2013), adding: “It is not the frequency of testing that matters, but the skill in using practice testing to learn and consolidate knowledge and ideas.”

We will come back to these themes in articles three (spaced learning), four (interleaving), and six (successive relearning).


Low-stakes

Finally, let’s consider a common phrase when discussing retrieval practice: “low-stakes”. The implication is that when we test for learning, the process should be non-threatening and there should be no need to report scores. In fact, if teachers utilise the metacognitive benefits of self-assessment, could the stakes be any lower?

However, if anything, I find that pupils want to share their scores or success. Perhaps the middle ground is for pupils to self-assess and keep track.

Agreed, stress inhibits retrieval processes, but while the phrase “low-stakes” may be worthy, Yang et al (2021) report no significant difference in testing benefits between high-stakes and low-stakes quizzes.


Conclusion

In the words of Paul (2015), retrieval practice “treats tests as occasions for learning, which makes sense only once we recognise and accept that we have misunderstood the nature of testing”.

Or from Yang et al (2021): “Testing is not only an assessment of learning but also an assessment for learning."


Takeaways

  • “Short, low-stakes tests or ‘quizzes’ in various formats can be a cheap, easy-to-implement way of recapping material that might strengthen pupils’ long-term ability to remember key concepts or information.” (Perry et al, 2021)
  • For the biggest impact, consider free recall testing over recognition recall.
  • How will knowledge be assessed? Does the retrieval practice format align with the final test format? Does it need to?
  • Use overt recall for more complex material, although covert recall for facts is time-efficient, especially when relearning.
  • In the initial learning phase, consider three learning exposures combined with at least three spaced exposures.
  • Delivery (think time) of retrieval cues in the classroom benefits from being “just right”, but then teachers know this already.
  • Reactivate prior congruent knowledge when introducing new knowledge.
  • There is a fine line between difficulty and motivation. Pupils should be able to retrieve at least some of the content they are tested on.
  • “Quizzing or low-stakes testing may also reveal misconceptions. How will you ensure that where these emerge pupils are supported to overcome them?” (Perry et al, 2021)
  • And if you only have time to read one paper on this topic: Testing (quizzing) boosts classroom learning: A systematic and meta-analytic review (Yang et al, 2021): https://bit.ly/3I6uGtI


  • Kristian Still – @KristianStill – is deputy head academic at Boundary Oak School in Fareham. A school leader by day, together with his co-creator Alex Warren, a full-time senior software developer, he is also working with Leeds University and Dr Richard Allen on RememberMore, a project offering resources to teachers and pupils to support personalised spaced retrieval practice. Read his previous articles for SecEd via https://bit.ly/seced-kristianstill


References: For all research references relating to this article, go to https://bit.ly/38ryU1M

Acknowledgement: This article would not have been possible without the author’s conversations with and support from Kirby Dowler, a specialist leader of education and valued colleague. Her positivity and willingness to help road-test RememberMore and her subsequent feedback has been very much appreciated.

ResearchED: Kristian will be speaking at the first ever ResearchED Berkshire taking place at Desborough College in Maidenhead on May 7. Visit https://researched.org.uk/event/researched-berkshire/

RememberMore: RememberMore delivers free, personalised, and adaptive spaced retrieval practice with feedback. For details, visit www.remembermore.app or try the app and resources via https://classroom.remembermore.app/