“Be careful what you recall”: Retrieval-induced forgetting of genuine real-life autobiographical memories

Which episodes from our lives will be remembered and which will be forgotten, and why? This question has still not been answered satisfactorily by research into autobiographical memory. Previous work has shown that retrieval-induced forgetting (RIF) might be a factor responsible for forgetting parts of the autobiographical memory content. However, none of the previous studies assessed RIF in memories for recent, controlled, personal events. We report here the results of an experiment in which autobiographical memories of real-life events were induced in a controlled, but fully naturalistic, manner under the guise of team-building exercises, while an adapted RIF paradigm was applied to these memories. Results clearly showed the influence of RIF on autobiographical memory retrieval. These findings demonstrate conclusively that RIF occurs in everyday life when remembering personal events.


Introduction
Research on autobiographical memory is typically concerned with the features of the remembered memories, their content and functions, and with the processes that lead to their retrieval. A question that is less frequently asked is what determines which episodes from our lives will be remembered and which will be forgotten (for a review, see Eysenck & Groome, 2020; Vecchi & Gatti, 2020). One might be inclined to think that this is largely determined by the nature of the episodes, such as how important, emotional, or recent they are (e.g., Berntsen & Rubin, 2002; Conway et al., 2004; Wilson & Ross, 2003). However, laboratory-based research on non-autobiographical episodic memory has offered another intriguing perspective on why we forget: if memories A and B are associated with the same cue, then retrieving memory A following presentation of the cue may have a dual effect, strengthening the retrieved memory A while weakening memory trace B. This phenomenon of forgetting due to selective remembering was first shown by Anderson et al. (1994) and is referred to as retrieval-induced forgetting (RIF; for a meta-analytic review, see Murayama et al., 2014, and for recent evidence, see Abel & Bäuml, 2020).
In the standard RIF procedure, participants learn items related to different categories (e.g., category 1: fruit, with items apple, orange, . . .; category 2: colour, with items green, blue, . . .). After the learning phase, in which all categories with their items were presented, a retrieval practice (Rp) phase takes place, in which half of the items from half of the categories are practised (Rp+), half of the items of these practice categories are not practised (Rp−), and the remaining categories are not presented (Nrp), creating a baseline to measure retrieval. For example, during the practice phase, the participant is presented with "fruit-a____," where "apple" should be retrieved, while "fruit-o____" is not presented and thus not practised; the category colour is not presented at all. During the final test participants are presented with all category cues and have to recall as many of the related items presented in the initial learning phase as they can. Typically, there are two main findings: (1) a larger percentage of items that had been practised in the retrieval practice phase (Rp+ items) is remembered, as compared to the percentage of items remembered from the non-practised categories (Nrp), a process called facilitation; and (2) crucially, the percentage of remembered non-practised items (Rp−) from the retrieval-practised categories is lower than the percentage of remembered items from the non-practised category (Nrp).
RIF has proved to be a robust effect, found for a large variety of stimuli (visual scenes: Shaw et al., 1995; mathematical operations: Phenix & Campbell, 2004; propositions: Anderson & Bell, 2001; goals: McCulloch et al., 2008; motor actions: Tempel & Frings, 2013; self-relevant information about a social event: Glazier et al., 2021; pictures: Scotti et al., 2020). In contrast, the role of RIF in autobiographical memories has rarely been investigated. This is largely due to the very nature of autobiographical memories, which does not easily allow for the kind of experimental manipulation required by the RIF paradigm. Most of the studies of RIF on autobiographical memories that have been conducted (e.g., Harris et al., 2010; Hauer & Wessel, 2006; Matsumoto et al., 2021; Stone et al., 2013) followed the method devised by Barnier et al. (2004). In Barnier et al.'s (2004) study, participants were presented in the elicitation phase with cue words (e.g., "happy" and "work") and asked to recall several episodic memories from their own lives in connection with each cue word. After receiving retrieval practice for half of the memories from half of the categories, at the final test, they had to recall all the memories they had recalled during the elicitation phase. The problem with this design is that when participants retrieve fewer Rp− memories than Nrp memories, this could simply be because they forgot which specific memories they had reported during the elicitation phase, rather than because the autobiographical memories themselves were forgotten.
In principle, this problem can be solved if, instead of using self-reported memories, one assesses the retrieval of real, experimenter-controlled events or experiences that happened to the participant. There are some studies in which specific events happening in the laboratory are considered autobiographical memories. For example, Koutstaal et al. (1999) asked participants to perform 36 actions (e.g., hammering a nail and pouring beans into a container). Participants returned to the laboratory 2 days later and did a retrieval practice on half of the actions they had performed, cued by pictures of the same actions performed by actors (Experimental Rp+ and Rp− group), or did an unrelated task and received no retrieval practice (Nrp group). Twenty minutes later, they had to recall all the actions they had performed in the first session. This procedure was very similar to a typical RIF experiment, except that the Rp+, Rp−, and Nrp categories were tested as between- rather than within-subject variables. The results showed the RIF effect: participants in the retrieval practice condition recalled fewer Rp− actions than those in the Nrp group.
In another study (Sharman, 2011), participants either performed or observed familiar or bizarre actions. All participants practised half of the actions immediately after performing/observing them, cued by the object names and one-word descriptions of the actions. RIF effects were observed in all conditions (familiar/bizarre actions and performed/observed). Although in this study participants recalled events that happened to them, the procedure did not allow sufficient time for memory consolidation. Hence, the tested memories were probably not genuine autobiographical memories.
Two studies (Conroy & Salmon, 2005, 2006) have obtained RIF in a somewhat more ecologically valid procedure in 5- to 6-year-old children. The aim of these studies was to assess the impact of selective discussion on memory for non-discussed material. The children were asked to discuss (or not discuss) a staged event. The authors found that memory for non-discussed aspects in the "discussed" condition was impaired compared to memory for the same aspects in the control no-discussion condition.
Glazier and colleagues (2021) asked participants to speak publicly on any topic of their choice and provided a standardised mixture of positive and negative feedback on the speech. They reported the classic RIF effect, thus extending the evidence to social stimuli such as positive and negative feedback. However, evidence for RIF effects regarding first-person and real-life autobiographical memories is still missing.
Finally, Cinel et al. (2018) reported autobiographical memory RIF effects across a multiple-experiment study using a naturalistic design. In this study, the RIF effect was obtained in an object-location-comment associations paradigm performed during a scavenger hunt game. This study reported relevant findings regarding how end-of-day review can lead to augmentation in human memory. In Cinel et al.'s (2018) study, the RIF effect was obtained on stimuli explicitly encoded across different university locations, and one could argue that, although participants experienced the events in the first person in a real-life context, the memories encoded were not incidental, as a large part of humans' autobiographical memories are.

The current study
This study aims to solve some of the key problems associated with earlier studies investigating whether RIF applies to autobiographical memories, by inducing genuine, real-life, experimenter-controlled autobiographical memories in adults, rather than artificial, non-ecologically valid memories, as in earlier studies. The specific autobiographical memories are induced under the guise of "team-building exercises" for groups of 5-9 undergraduate students on campus. The team-building exercises consisted of 20 clearly distinctive and memorable games, divided into two sets of 10 games, each set performed in a different location.
We argue that several characteristics make this study better suited to studying RIF of autobiographical memories than earlier ones. We used controlled, consolidated autobiographical memories that were obtained in natural situations. In previous studies, the simple actions used as items were atypical for the situation (one would not expect to be asked to hammer a nail in a laboratory experiment); in the present experiment, we used complex actions that were consistent with the context presented for the study (which was introduced as a study on team building). They followed a behavioural sequence that was in line with the proposed games. Specifically, participants were recruited for a team-building session (there is nothing unusual about such a request, as team-building games are frequently used in Social Psychology studies), and the games played were later used as the memory stimuli to be recalled in the subsequent phases. The memories were therefore appropriate for the context, as it is not surprising to play games during a team-building session, and incidental, as participants were unaware of the real aim of the study. In addition, games were not performed in the lab, but in spacious rooms in which real team-building exercises could have been held. For these reasons, we believe that the memories refer to more complex personal experiences and that the single games were a natural part of the situation that participants were experiencing. In addition, while in classical RIF studies participants are instructed to study a set of words, in our study participants' memory performance was fully incidental (i.e., they were unaware of the real aim and were not asked directly to learn stimuli, but rather were asked to play games) and multimodal (i.e., it involved complex team-building games played in the first person).
Event complexity, personal involvement, the presence of actions and social interaction, as well as lack of intentionality during acquisition, represent key features of this procedure that ensure the memories were about personal experiences (i.e., autobiographical). The retrieval practice took place 2 days after the games were played, leaving time for the memories to become consolidated into the autobiographical memory system. There is ample evidence of autobiographical memories being present after short and long time intervals, as most literature on this topic examines personal memories after very long delays, not just hours and days, but also years (just to mention some recent papers among the very large number that test autobiographical memories after short and long time intervals, see Addis et al., 2004; Lempert et al., 2017; London et al., 2009; Simons et al., 2021). The assumption that this time lapse allowed for autobiographical memory consolidation is also based on the evidence that sleep modulates human memory, aiding its consolidation (e.g., Gais et al., 2007; Stickgold, 2005; for a review, see Gais & Born, 2004). In this way, the effect of retrieval practice on real, consolidated, and autobiographical memories could be measured. Therefore, an RIF effect obtained under these conditions would indicate that RIF occurs in autobiographical memory.

Power analysis
The minimum sample size was estimated with G*Power (Faul et al., 2009) using an effect size of d_z = .88, α = .05, and 1 − β = .95. The effect size was estimated from the RIF effect reported by Cinel and colleagues (2018) in Experiment 2. The minimum sample size was 19.
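As a rough cross-check of this estimate, the following sketch approximates the required sample size for a paired comparison from the reported d_z, α, and power. It is not the G*Power computation (which uses the noncentral t distribution); it is a normal approximation with a small-sample correction term, and it assumes a two-sided test, which the text does not state. The function name `approx_paired_n` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def approx_paired_n(d_z, alpha=0.05, power=0.95):
    """Approximate participants needed for a two-sided paired t-test.

    Normal approximation plus a z^2/2 small-sample correction; an
    assumption-laden stand-in for an exact noncentral-t calculation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    n = ((z_alpha + z_power) / d_z) ** 2 + z_alpha ** 2 / 2
    return ceil(n)


n_min = approx_paired_n(0.88)
print(n_min)
```

Under these assumptions the approximation agrees with the reported minimum of 19 participants.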

Participants
In total, 35 participants took part in the experiment in six groups of 5-9 participants. All participants were undergraduate students from the University of Hull and participated in exchange for course credit. Three participants who did not complete all 15 pages of the practice booklet were removed from the analysis (see below). The remaining 32 participants consisted of 7 males and 25 females, with a mean age of 20.5 years (SD = 2.5, range 18-28 years). The sample size was based on comparable and sufficiently powered previous RIF studies (Hanczakowski & Mazzoni, 2013). It is also in line with the classic study that first demonstrated the RIF effect (Anderson et al., 1994; n = 36) and most subsequent studies demonstrating the RIF effect (Murayama et al., 2014).

Procedure and materials
Participants signed up on campus to participate in a two-session team-building experience, as part of a study into how games help group formation. This was done to ensure the participants were unaware of the real aim, which was to create specific autobiographical memories in a controlled manner. The study contained three consecutive phases: an experience phase, a retrieval practice phase, and a test phase. Until the test phase, participants were unaware of the real aim of the experiment, as we wanted to study the RIF effect using ecological paradigms tapping into incidental autobiographical memory processes. Participants completed the experience phase across two different rooms and were then told that, after 48 hr, they had to come back to another room (i.e., the lab) to complete the study.
Experience phase. The experience phase included 20 games in total, which were played in two sets presented in a fixed order due to the content of the games included in the two sets (i.e., the first set was planned to be an icebreaker session). The first 10 games (game set A) were played in a room in a building on campus under the guidance of the experimenter (room 1). After completion, participants were told that they had to leave the room because it was booked for somebody else but that they could continue in another room in another building on campus (room 2). The experimenter and all participants walked together to the new room, in which they played another set of 10 games (game set B). The two sessions were thus performed in direct succession, separated only by the time needed to walk from one building to the other (~5 min). The games played in the first room (set A) required the participants to sit on chairs at fixed positions in the room throughout the session and also required them to talk. The games in the second room (set B) required the participants to walk through the room, or to make other bodily movements, throughout the session, but did not require them to talk. In the Online Supplementary Material, we report the complete descriptions of the games included in both sets. The purpose of these differences in the nature of the games between sets A and B was to boost the formation of a link between the games played and the room in which they were played. Note that participants were expected to be very familiar with the rooms and buildings, as all the sessions were performed in a specific group of buildings in which participants took lectures and where rooms and buildings are named after famous scientists. Game set A (sitting/talking games) was always played first, as it included ice-breaking games, followed by game set B (walking/movement games). Four rooms in two different buildings on campus were used. The interiors of the rooms were overall quite similar. Room/building order was counterbalanced across participants and sets. Half of the participants practised games from set A, and the other half games from set B. Four possible booklet sets were included in the practice phase (see below: Retrieval practice phase section) and were completely counterbalanced across participants. In the test phase, half of the participants were shown the final booklets starting with room 1, while the other half started with room 2 (counterbalanced across the two sets).
Retrieval practice phase. Forty-eight hours after the experience phase, the participants came to the laboratory (a different location from rooms 1 and 2 of the experience phase). Immediately upon arrival, the retrieval practice phase started. During the procedure participants were still unaware of the real aim of the experiment, which was only disclosed following the final test phase. For the retrieval practice, each participant initially received a practice booklet, in which each page contained one location-game cue pair (e.g., ROOM 250, Larkin Building-"I have never"). The participants had 1 min to write down in one or two sentences what they had done in that game. The experimenter indicated when the minute was over, after which participants turned the page. For the actual retrieval practice phase, half of the participants received booklets containing the names of five games from set A, and the other half received booklets containing the names of five games from set B. Participants were therefore divided into four groups, depending on the practice booklets they received (containing cues referring to games 1, 4, 5, 7, and 9 from set A; to games 2, 3, 6, 8, and 10 from set A; to games 1, 4, 5, 7, and 9 from set B; or to games 2, 3, 6, 8, and 10 from set B). Each "location-game" cue pair was shown three times in the booklet in a random page order. The retrieval practice phase took about 15 min. Directly after completing the booklet, a 20-min "distraction interval" started, during which participants were required to solve a Sudoku puzzle.
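The booklet design described above (four booklet types, five cues each, every cue shown three times in random page order) can be sketched as follows. This is only an illustration: the group labels, the function names, and the simple rotation used to assign booklet types are assumptions, and the text does not say whether consecutive repeats of the same cue were allowed.

```python
import random

# Game subsets as listed in the text; the group labels are hypothetical.
PRACTICE_GROUPS = {
    "A1": ("A", (1, 4, 5, 7, 9)),
    "A2": ("A", (2, 3, 6, 8, 10)),
    "B1": ("B", (1, 4, 5, 7, 9)),
    "B2": ("B", (2, 3, 6, 8, 10)),
}


def build_booklet(group, repetitions=3, seed=None):
    """Return the shuffled pages, as (game_set, game_number), of one booklet."""
    game_set, games = PRACTICE_GROUPS[group]
    pages = [(game_set, g) for g in games] * repetitions
    random.Random(seed).shuffle(pages)  # random page order
    return pages


def assign_groups(participant_ids):
    """Rotate through the four booklet types to keep group sizes balanced."""
    labels = sorted(PRACTICE_GROUPS)
    return {pid: labels[i % len(labels)] for i, pid in enumerate(participant_ids)}


booklet = build_booklet("A1", seed=0)
assignment = assign_groups(range(1, 33))
print(len(booklet), "pages;", assignment[1], assignment[2])
```

Each booklet built this way has 15 pages, which matches the 15-page practice booklet mentioned in the Participants section.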
Test phase. Directly following the distraction interval, participants received the final test booklets. In these booklets, the only cue provided was the room and building where the games were played. Half of the final booklets started with room 1, the other half started with room 2. This meant that half of the participants started with the practised category (Rp+) and the other half started with the non-practised category (Nrp). They were asked to write down the names of all the games they remembered being played in that room. It should be noted that the names of the games were chosen to reflect the unique details of the specific event, that is, the name reflected the most characteristic aspect of the game that uniquely defined it. Thus, we assumed that remembering the name of the game was virtually equal to remembering the experience of the event. There was no time limit for completing the booklet, but once recall of the games from one room was finished and recall from the other room had started, no more games could be added to the first room booklet.

Results
In the final test, the recall scores for Rp+, Rp−, and Nrp items did not differ between the group that started their recall with the room cue linked to the practised items and the group that started their recall with the room cue linked to the non-practised items (room 1 vs. room 2: Rp+, t(30) = 0.28, p = .78; Rp−, t(30) = 0.293, p = .77; Nrp, t(30) = 0.82, p = .42). There was also no difference between the final recall rate of game set A (M = 7.13, SD = 1.18) and game set B (M = 7.22, SD = 1.36), t(31) = 0.337, p = .74. To check that the games had similar memorability, the total amount of recall in the final test was calculated for each game. Each game was recalled by at least 8 and at most 31 participants. According to the Shapiro-Wilk test, the distribution of the recall rates did not differ from the normal distribution (W = 0.921, p = .1).
The final recall scores for the Rp+, Rp−, and Nrp categories are shown in Table 1. As there were twice as many potentially recalled games for the Nrp category as for either Rp+ or Rp− items, Nrp scores were divided by 2 to make recall scores comparable for analysis. To assess the benefit of retrieval practice, we first performed an analysis of variance (ANOVA) with participants' memory performance as the dependent variable and type of item (Rp+ vs. Rp− vs. Nrp) as the within-participants factor; to exclude possible effects of which set was practised, the practised set (set A vs. set B) was included as an additional between-participants factor. The effect of type of item was significant, F(2, 60) = 28.50, p < .001. In particular, participants recalled significantly more Rp+ than Nrp items, t(31) = 7.38, p < .001, and fewer Rp− than Nrp items, t(31) = −2.32, p = .04 (Bonferroni corrected). The effect of group and the type of item × group interaction were not significant, F(1, 30) = 2.01, p = .16, and F(2, 60) = .04, p = .95, respectively.
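The scoring rule described above (five Rp+ games, five Rp− games, and ten Nrp games whose count is halved) can be illustrated with a short sketch. The function name, the data layout, and the example recall protocol are all hypothetical and are not taken from the study's data.

```python
def score_recall(recalled, practised_set, practised_games, games_per_set=10):
    """Score one participant's final recall.

    recalled: set of (game_set, game_number) pairs written at final test.
    practised_set: "A" or "B"; practised_games: the five practised numbers.
    """
    other_set = "B" if practised_set == "A" else "A"
    all_games = range(1, games_per_set + 1)
    rp_plus = sum((practised_set, g) in recalled for g in practised_games)
    rp_minus = sum(
        (practised_set, g) in recalled
        for g in all_games
        if g not in practised_games
    )
    nrp_raw = sum((other_set, g) in recalled for g in all_games)
    # Nrp covers 10 games versus 5 for Rp+ and Rp-, so it is halved.
    return {"Rp+": rp_plus, "Rp-": rp_minus, "Nrp": nrp_raw / 2}


# Made-up recall protocol for a participant who practised set A.
recalled = {("A", g) for g in (1, 4, 5, 7, 2, 3)} | {("B", g) for g in range(1, 8)}
scores = score_recall(recalled, "A", (1, 4, 5, 7, 9))
facilitation = scores["Rp+"] - scores["Nrp"]  # positive values: practice benefit
rif = scores["Nrp"] - scores["Rp-"]           # positive values: RIF
print(scores, facilitation, rif)
```

With the halved Nrp score, all three measures are on the same 0-5 scale, so the Rp+ versus Nrp and Rp− versus Nrp contrasts compare like with like.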
We further examined whether the decrease in recall of the Rp− items was due to output interference rather than to the retrieval practice of the Rp+ items. Output interference refers to the possibility that items recalled early during the final recall session (which are likely to be the Rp+ items) interfere with the recall of subsequent items (which are likely to be the Rp− items; Roediger & Schmidt, 1980). To examine if such an output interference effect may have contributed to the low recall of the Rp− items, we conducted an additional analysis.
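The details of this additional analysis are not given here. One plausible ingredient, given that the Discussion refers to the difference in output position between Rp+ and Rp− items, is a per-participant position difference like the one sketched below. The function name, the input format, and the example recall order are assumptions made for illustration only.

```python
from statistics import mean


def position_difference(recall_order, practised_games):
    """Mean output position of Rp- items minus that of Rp+ items.

    recall_order: game numbers from the practised set, in the order written.
    Returns None if either item type was not recalled at all.
    """
    rp_plus_pos = []
    rp_minus_pos = []
    for position, game in enumerate(recall_order, start=1):
        if game in practised_games:
            rp_plus_pos.append(position)
        else:
            rp_minus_pos.append(position)
    if not rp_plus_pos or not rp_minus_pos:
        return None
    return mean(rp_minus_pos) - mean(rp_plus_pos)


# Made-up recall order for a participant who practised games 1, 4, 5, 7, 9.
diff = position_difference([1, 4, 2, 7, 5, 3], {1, 4, 5, 7, 9})
print(diff)
```

A positive value means Rp− items tended to be written later than Rp+ items. Across participants, such a difference could then be entered as a predictor of Rp− recall to check whether output order accounts for the forgetting.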

Discussion
This study aimed to find out whether retrieval practice affects the forgetting of everyday, consolidated autobiographical memories. We obtained the expected retrieval practice effect for the Rp+ items (Rp+ > Nrp), but also an RIF effect for the Rp− items (Rp− < Nrp). Given the limitations of the small number of previous studies assessing RIF in autobiographical memory, and given the characteristics of the memories used in this study, the current finding represents the first evidence that RIF occurs in real-life incidental autobiographical memory. This study replicates and extends the evidence reported by seminal autobiographical memory studies on RIF (Cinel et al., 2018) using more complex stimuli and a more complex experimental procedure. Cinel and colleagues (2018) obtained the RIF effect in an object-location-comment associations paradigm performed during a scavenger hunt game. Consistent with their findings, here we show that RIF also occurs for memories encoded incidentally in real-life events.
Specifically, the stimuli traditionally used in RIF studies, such as word lists, pictures, and text passages, are not self-relevant and, more importantly, are not embedded in an organised and interconnected autobiographical knowledge base as autobiographical memories are. In studies that did examine more complex memories, stimuli were not part of an autobiographical knowledge base but very simple, random actions, unexpected in the specific context in which they were performed (Koutstaal et al., 1999; Sharman, 2011) or, when expected, they were extremely simple (Glazier et al., 2021). In contrast, this study strongly indicates that retrieving personally experienced, consolidated, and interconnected personal episodic memories linked to a particular cue can cause the forgetting of other similar memories linked to the same cue. The results are in accordance with RIF effects induced by discussions of events in children (Conroy & Salmon, 2006) and by Cinel et al. (2018).
The present finding of RIF for autobiographical memories is in line with the prediction from current theories (Brown, 2005; Conway, 2005) claiming that specific autobiographical memories are organised into categories. These categories can bind memories together based on chronological order, geographical sameness, thematic similarities, or causal relationship. In our study, the team games were grouped based on location (i.e., room, building) and type of activity (i.e., games played sitting vs. standing). These elements (location and type of activity) can serve as retrieval categories similar to the traditional semantic categories of RIF studies (e.g., fruits and animals).
There are two main theories attempting to explain the mechanism underpinning the RIF effect. According to the inhibition-based theory (Anderson, 2003; Anderson & Levy, 2002), to be able to recall several specific memories associated with a cue, other memories associated with the cue need to be inhibited. In contrast, interference-based theories claim that inhibition is not necessary to explain RIF (MacLeod et al., 2003; Raaijmakers & Jakab, 2013): retrieval of the items strongly related to the cue (practised items) interferes with the retrieval of weaker (not practised) memories during the final recall, without their memory traces being inhibited. We found that output interference is not supported by our data. Specifically, the difference in position between Rp+ and Rp− did not predict participants' memory performance in Rp− items. Thus, our findings are more consistent with an inhibition-based account. Several studies on RIF have shown the predominant role of inhibitory processes (for a review, see Anderson & Hulbert, 2021). More recent reviews document that inhibition at retrieval is not just one of the mechanisms that, by promoting memory loss, enhance other cognitive and noncognitive functions, such as facilitating retrieval of important information and minimising errors. These mechanisms might also directly affect more general mnemonic processes and create some forms of amnesia in nonclinical individuals (Anderson & Hulbert, 2021). Moreover, the links observed between memory-related inhibitory processes and frontal areas that exert control over memory processes (Anderson et al., 2016) place such inhibitory processes within an essential executive/control function in human cognition. However, the present data do not provide a direct falsification of an interference-based explanation, as the output order was not controlled during the recall test by cueing individual test items. We chose not to cue individual items in order to avoid possible ceiling effects, and cued recall with room and building names instead. In our study, therefore, the contributions of inhibition and interference processes in autobiographical memory cannot be fully disentangled, and it is possible that both processes are involved. As the final test did not use independent recall cues, and retrieval practice was not compared with extra study, we cannot claim that the RIF effect observed here is due only to inhibition processes. We can only point to the fact that recall position did not predict memory performance as a suggestion that interference might not have played a major role. Still, interference might have contributed to some degree, as forgetting of Rp− items might have been in part due to the strengthening of Rp+ items at retrieval practice. While there is clear evidence that RIF is, at least partially, the result of inhibition at retrieval (e.g., Del Prete et al., 2015; Verde, 2013), additional experiments are needed to examine its contribution to the effect.
One might wonder why retrieval practice of items does not cause facilitation of the non-practised related items (Rp− items), because retrieval of one memory could in principle facilitate retrieval of related memories, as suggested by, for example, spreading activation (Collins & Loftus, 1975) and associative memory (Raaijmakers & Shiffrin, 1981). Indeed, previous studies found that under certain circumstances, Rp− items were remembered better than Nrp items (Anderson et al., 2000; Chan et al., 2006). Two main features have been identified that may help explain why a facilitation of the Rp− items is sometimes found. The first is the length of the delay between the retrieval practice and the final test session. While RIF is typically found after a short delay (20 min), some studies reported facilitation after a long delay (at least 24 hr; e.g., Chan, 2009), which may be related to the transient nature of RIF (Bjork et al., 2006). Note, however, that long-term RIF effects have been reported (e.g., Garcia-Bajos et al., 2009; Storm et al., 2012). The other identified feature is the extent to which the individual items are semantically/temporally integrated (a process first shown by Anderson & McCulloch, 1999; for further evidence, see, e.g., Chan et al., 2006; Maxcey et al., 2018). Specifically, across three experiments, Anderson and McCulloch (1999) showed that instructing participants to interrelate category exemplars during an initial study phase reduced RIF, thus suggesting that certain semantic structures in which the items are particularly interrelated might be resistant to RIF. In our paradigm, the delay between retrieval practice and test was relatively short (20 min) and all items consisted of distinctly different, unrelated games. This likely explains why forgetting rather than facilitation of Rp− items was observed in this study.
In future studies, an independent cue (i.e., a different cue which equally well discriminates between the two sets of memories) could be used to further test the hypothesis that the RIF observed in autobiographical memory is better explained by the inhibition account than by the interference account.
Our finding that RIF of personal events occurs in naturalistic, yet staged, scenarios strongly suggests that it also plays a role in determining what we remember of our spontaneous daily-life autobiographical experiences. We all know the saying "be careful what you wish for"; perhaps we should also say "be careful what you recall". The very act of recalling autobiographical memories biases our view of ourselves and of others due to suppression of related autobiographical memories. One obvious real-world situation where this is particularly relevant is eyewitness testimony (e.g., Laney & Loftus, 2018; Schacter & Loftus, 2013). The act of repeatedly retrieving selected parts of a certain memory considered more crucial (the equivalent of the Rp+ retrieval phase) may, inadvertently, cause other related parts of the autobiographical memory (the Rp− items) not to be remembered. Thus, even though the eyewitness is entirely truthful in their testimony, they could produce biased evaluations of others and of events due to RIF (see Storm et al., 2015, for an overview of real-world RIF applications within the autobiographical and other domains).
Finally, one main limitation should be acknowledged. In the testing phase, participants were only asked to remember the names of the games they played in each room. From our perspective, the names of the games represented a "title" for a complex personal experience, and asking for the names of the games was a way to ask for the experience. In our opinion, this procedure should have induced participants to rely on autobiographical memories of the games played, and it was also the best option to ensure that participants' responses were fully quantifiable, avoiding the need to introduce qualitative judgements of participants' responses. However, we acknowledge that more basic episodic processes can be involved, and participants might have simply remembered, episodically, just the titles of the games. Episodic processes are commonly involved in autobiographical remembering (e.g., Schacter & Madore, 2016), but an autobiographical experience is certainly richer than just remembering titles of games. In our study, the names of the games were part of an experience/encoding phase that was completely different from classical word-list RIF experiments and was aimed at ensuring that memories of genuine personal experiences (autobiographical memories) were created. Games were not performed in the laboratory, but in spacious rooms in which real team-building exercises could have been held. For these reasons, we believe that the memories, including the names of games, refer to more complex personal experiences. The single games (and their names used as cues) were a natural, integral part of the situation that participants had experienced.
In conclusion, we believe that by using self-relevant memories that are embedded in an organised, interconnected, autobiographical knowledge base, this study demonstrated that RIF plays a role in determining which autobiographical memories are remembered and which are not.