Stroop Interference is a Composite Phenomenon: Evidence from Distinct Developmental Trajectories of its Components.

Only one previous developmental study of Stroop task performance (Schiller, 1966) has controlled for differences in processing speed that exist both within and between age-groups. Therefore, the question of whether the early developmental change in the magnitude of Stroop interference actually persists after controlling for processing speed needs further investigation, work that is further motivated by the possibility that any remaining differences would be caused by process(es) other than processing speed. Analysis of data from two experiments revealed that, even after controlling for processing speed using z-transformed reaction times, early developmental change persists such that the magnitude of overall Stroop interference is larger in 3rd- and 5th-graders as compared to 1st-graders. This pattern indicates that the magnitude of overall Stroop interference peaks after two or three years of reading practice (Schadler & Thissen, 1981). Furthermore, this peak is shown to be due to distinct components of Stroop interference (resulting from specific conflicts) progressively falling into place. Experiment 2 revealed that the change in the magnitude of Stroop interference specifically results from joint contributions of task, semantic and response conflicts in 3rd- and 5th-graders as compared to a sole contribution of task conflict in 1st-graders. The specific developmental trajectories of different conflicts presented in the present work provide unique evidence for multiple loci of Stroop interference in the processing stream (respectively task, semantic and response conflict) as opposed to a single (i.e., response) locus predicted by historically-favored response competition accounts.


INTRODUCTION
The Stroop task (Stroop, 1935) requires individuals to identify, as quickly and accurately as possible, the font color of written words without reading them. Despite this requirement, the typical result is that individuals' identification times are longer and more error-prone for color-incongruent Stroop words (i.e., words displayed in a color that is different from the one they designate such as "BLUE" displayed in green ink; hereafter BLUE green ), than for color-neutral items (e.g., "DOG"/ "XXX" displayed in green ink,

Developmental change in Stroop Interference
In their cross-sectional study, Comalli, Wapner, and Werner (1962) were the first to report the change in the magnitude of Stroop interference across the life-span. More specifically, they reported that this magnitude consistently decreases during the course of early development (from 7 to 19 years) and that, after a period of stabilization during early and middle adulthood (from 19 to 44 years), it increases during late adulthood (from 65 to 80 years). Since this seminal study, developmental studies of Stroop interference have continued to flourish, albeit with a specific focus on either early (childhood and adolescence) or late (adulthood and aging) developmental change. Moreover, these lines of research have evolved separately.
Interestingly, for a period of time, the importance of the late developmental change in Stroop interference had been called into question (e.g., Verhaeghen & De Meersman, 1998; see also Verhaeghen, 2011). Indeed, age-related differences in the magnitude of Stroop interference, where they had been seen at all, were merely attributed to a decrease in processing speed that is strongly associated with normal aging (e.g., Salthouse & Meinz, 1995).
Only more recent studies have demonstrated that age-related differences in Stroop interference actually persist even after controlling for general slowing (e.g., Augustinova, Clarys, Spatola, & Ferrand, 2018; Bugg, DeLosh, Davalos, & Davis, 2007; Jackson & Balota, 2013; Spieler, Balota, & Faust, 1996; Wolf et al., 2014). Given this control, these studies convincingly showed that the generalized slowing account (e.g., Myerson, Hale, Wagstaff, Poon, & Smith, 1990; Salthouse, 1996) is not sufficient to fully explain the age-related differences in the magnitude of Stroop interference in later life. They thus reinstated the relevance of a popular account of these differences in terms of the decline in inhibitory control (Hasher & Zacks, 1988). It should be remembered that in the Stroop task, the irrelevant word-dimension of Stroop words (i.e., "blue" for BLUE green) needs to be inhibited to allow individuals to generate the correct response based on the relevant color dimension of Stroop words (e.g., "green" for BLUE green). Consequently, a higher level of Stroop interference is often equated with lower inhibitory control (e.g., Miyake et al., 2000).
The inhibition-based account is also dominant in early developmental studies. Circuits in the prefrontal and anterior cingulate cortex continue to mature until the late twenties, causing the continuous improvement of the executive functions, including inhibitory control, that are based on these circuits (e.g., Davidson, Amso, Anderson, & Diamond, 2006; Luna et al., 2001; Luna, Garver, Urban, Lazar, & Sweeney, 2004; Munoz, Broughton, Goldring, & Armstrong, 1998; Prencipe et al., 2011; for reviews see Diamond, 2002, 2013). As a result, in developmental neuropsychological research and practice, inhibitory abilities are thought to present a specific developmental trajectory that is clearly captured by changes in the magnitude of Stroop interference during childhood and adolescence (e.g., Armengol, 2002; Roy et al., 2017; see also e.g., Aïte et al., 2018; Peru, Faccioli, & Tassinari, 2006; see also below).
However, exactly as in the late development discussed above, the early developmental trajectory of inhibitory abilities and that of processing speed overlap (i.e., both increase during the course of early development; e.g., Kail, 1991, 2007; Fry & Hale, 1996; Nettelbeck & Burns, 2010), making it difficult to disentangle the specific contribution of these processes to the magnitude of Stroop interference. Despite this latter fact, only one early developmental study actually controlled for processing speed (Schiller, 1966). Therefore, the aim of Experiment 1 was to examine whether, and the extent to which, the early developmental change in the magnitude of Stroop interference still occurs when age-related processing speed differences are accounted for (see e.g., Lété & Fayol, 2013; Ziegler, Lété, Bertrand, & Grainger, 2014 for this type of endeavor in children's studies). If this is the case, then a remaining question is which process or processes underlie this change; a question we aimed to address in Experiment 2.

Processes underlying the early developmental change in Stroop interference
As already mentioned, early developmental change in the magnitude of Stroop interference is often thought to reflect improved inhibitory control during childhood and adolescence. Results of several studies, including the seminal study of Comalli and colleagues (1962), are consistent with an inhibition-based account (see above). Indeed, they reported the highest Stroop interference in seven-year-olds (i.e., the youngest age-group investigated), the magnitude of which continued to decrease until the age of nineteen (see also e.g., Armengol, 2002; Peru, Faccioli, & Tassinari, 2006; Roy et al., 2017).
However, it is not the inhibition-based account that is mobilized by Comalli and colleagues (1962) to interpret their results. They argue that there is actually a one-to-one relationship between the magnitude of Stroop interference that children display and their "(…) capacity to maintain a course of action in the face of intrusion by other stimuli" (p.47).
Put differently, these authors were the first to conclude that the modification of Stroop interference during childhood is clearly determined by what is now termed the capacity of goal (or task-set) maintenance (see e.g., Chevalier & Blaye, 2008; see also De Jong, Berendsen, & Cools, 1999; Parris, Bate, Brown, & Hodgson, 2012). The distinction between goal maintenance and inhibitory capacity (e.g., De Jong et al., 1999) reflects the debate about whether selective attention operates by enhancing the processing of the relevant dimension or enhancing inhibitory control of the irrelevant dimension (e.g., Egner & Hirsch, 2006).
Finally, a quick inspection of other early developmental Stroop studies indicates that in some quarters (e.g., developmental psycholinguistics), the early developmental change in Stroop interference is thought to index the automatization of word-recognition and/or reading skills in general (e.g., Ehri & Wilce, 1983; Stanovich, Cunningham, & West, 1981). In contrast to the studies above emphasizing a continuous decrease in the magnitude of Stroop interference across childhood (e.g., Comalli et al., 1962; Roy et al., 2017), such a conceptualization predicts an initial increase in Stroop interference when children are first learning to read. Several past findings, including the one that controlled for age-group differences in processing speed (Schiller, 1966), are in line with this conceptualization. A robust Stroop interference is indeed detected in 1st-graders (6-7 years old) after only five months of reading instruction (Stanovich et al., 1981) and its magnitude has been reported to peak after two or three years of reading practice (Schiller, 1966; Schadler & Thissen, 1981).
Consistent with this latter finding, Rand, Wapner, Werner, and MacFarland (1963) found the greatest Stroop interference in a group of nine-year-olds, the magnitude of which was significantly greater than the one observed in a group of six-year-old pre-readers.
To sum up, the different lines of research presented above clearly diverge on the nature of the process underlying the early developmental change in the magnitude of Stroop interference. For some this change reflects a developmental trajectory of goal maintenance, for others the development of inhibitory control, and for others still, the developing automatization of word-reading. However, these approaches to explaining the early developmental change in Stroop interference do all appear to agree that Stroop interference is a unitary phenomenon, and as such is to be explained by a single process. This latter view contrasts with several other lines of research suggesting that Stroop interference is a more complex phenomenon, potentially influenced by more than one process.
To illustrate, Bub, Masson and Lalonde (2006) argue that the magnitude of Stroop interference reflects children's inability both to consistently maintain the task set of color naming and to suppress irrelevant responses stemming from the word-dimension of Stroop words. However, the specific contribution of goal-maintenance vs. inhibition to overall Stroop interference was not measured directly in this study. Megherbi and colleagues (2018), on the other hand, suggest that the magnitude of Stroop interference displayed by children is a result of both the effect of the mandatory decoding of the distracting words and the capacity of inhibition (see also e.g., Peru et al., 2006; Wright & Wanley, 2003). Again, the specific contribution of word-reading vs. inhibition to the overall Stroop interference was not measured directly in this study.
These latter lines of research agree that the processes underlying the early developmental change in the magnitude of Stroop interference are twofold, one of which is the improvement of inhibitory abilities. But they clearly diverge on the nature of the other process involved. For Megherbi et al. (2018), the change in the magnitude of Stroop interference during childhood also indexes a developmental trajectory of automatization of word-reading, whereas it is goal-maintenance for Bub and colleagues (2006). However, it is critical to note that, just as in studies subscribing to single-process approaches discussed above, these allegedly multiple processes were in fact merely inferred from the overall magnitude of Stroop interference (e.g., BLUE green - DOG/XXX green). Therefore, the extent to which single versus multiple processes actually determine the as discussed in the following section.

Single- vs. Multi-stage accounts of Stroop interference
The aforementioned popular idea that the magnitude of Stroop interference specifically indexes inhibitory abilities (see section 1.1. and 1.2.) is rooted in single-stage response competition accounts that have historically been favored in the Stroop literature (Augustinova, Silvert, Spatola, & Ferrand, 2018; Risko, Schmidt, & Besner, 2006). These kinds of single-stage accounts share the aforementioned idea that word-reading, routinized in skilled readers, provides the basis for a response (i.e., "blue" for BLUE green). Because this incorrect response interferes with the one cued by the relevant color dimension of color-incongruent Stroop words (i.e., "green" for BLUE green), it gives rise to so-called response conflict, the magnitude of which is determined by the degree of inhibitory control. In this view, Stroop interference is thought to specifically index the magnitude of this latter conflict, occurring in the processing stream at the level of response output (e.g., Hommel, 1997; LaBerge & Samuels, 1974; Morton & Chambers, 1973; Shiffrin & Schneider, 1977).
These latter accounts prevailed over another class of single-stage accounts, the early-selection accounts. They share the idea that a single conflict generating Stroop interference, so-called stimulus conflict, actually occurs much earlier in processing (as compared to response conflict depicted above). For instance, Seymour (1977) considers that this (early) conflict occurs at conceptual encoding of color-incongruent words (e.g., BLUE green) because the meaning of the word dimension (i.e., blue for BLUE green) and that of the color dimension (i.e., green here) both correspond to colors. Indeed, "(…) delays of processing occur whenever distinct semantic codes are simultaneously activated, and that these delays become acute when the conflicting codes are values on a single dimension or closely related dimensions." (p. 263; see also e.g., Luo, 1999; Scheibe, Shaver, & Carrier, 1967; Seymour, 1974, 1977; Stirling, 1979).
In sum, "The early-selection account focuses on the similarity between the relevant stimulus and the irrelevant stimulus, whereas the late-selection account focuses on the similarity between the irrelevant stimulus and the response. Both similarity relationships are, of course, present in the Stroop task -in fact, they constitute a confounding that makes distinguishing empirically between the two accounts difficult." (Zhang & Kornblum, 1998, p. 4). It is thus not surprising that the first multi-stage accounts assumed that color-incongruent words (e.g. BLUE green ) generate both stimulus (SC) and response conflicts (RC; hereafter SC-RC accounts, see  for this terminology and review of these accounts).
However, several other multi-stage accounts also assume that Stroop interference results from the simultaneous contribution of two distinct conflicts. In addition to response conflict as depicted above, they assume the existence of so-called task conflict (TC; hereafter TC-RC accounts, see Parris et al., submitted; for reviews) instead of the semantic conflict assumed by the aforementioned SC-RC accounts. Task conflict is thought to arise for all kinds of readable items (including color-congruent words, e.g., BLUE blue) and is thus independent of the specific color-incongruency conflict occurring for color-incongruent Stroop words (e.g., BLUE green). This is because the individual's attention is drawn to an irrelevant task (i.e., word-reading) instead of being fully focused on the relevant task (i.e., color-naming), leading to the two task sets competing (e.g., Goldfarb & Henik, 2006, 2007; Kalanthroff, Goldfarb, Usher, & Henik, 2013; Monsell, Taylor & Murphy, 2001; Parris, 2014 for empirical demonstrations; see also e.g., Bench, Frith, Grasby, Friston, Paulesu, Frackowiak, et al., 1993 for fMRI evidence).
To sum up, the SC-RC and TC-RC multi-stage accounts described above together anticipate the contribution of three distinct conflicts to overall Stroop interference. The SC-RC accounts emphasize that semantic and response conflicts need to be de-confounded and, as such, assessed separately, whereas the TC-RC accounts express the very same concerns for task and response conflicts. Given that considerable behavioral, EEG and fMRI evidence points to the viability of both SC-RC and TC-RC multi-stage accounts of Stroop interference (see above), several lines of research highlight the necessity to adopt an integrative perspective that allows for bridging these two perspectives (Parris, Hasshim, Wadsley, Augustinova & Ferrand, submitted; for reviews).
This integrative perspective posits that all three conflicts (i.e., task, semantic and response conflicts) contribute to standard Stroop interference and are thus confounded in its overall magnitude. By providing direct and simultaneous evidence for each of the three conflicts, Augustinova and colleagues (2018) showed that this is indeed the case (see section 3 for further details). Therefore, this work not only strongly reaffirmed that the standard (i.e., overall) Stroop interference constitutes a composite and not a unitary phenomenon, but also clearly showed the relevance of an integrative perspective bridging SC-RC and TC-RC multi-stage accounts.
So far, however, this integrative perspective is supported by only a single empirical study. Additional converging evidence is therefore required to strengthen its case. If it can be shown that task, semantic and response conflicts actually present with specific developmental trajectories, this would provide unique evidence in favor of an integrative multi-stage perspective of Stroop interference (Parris et al., submitted). Therefore, Experiment 2 was specifically aimed at providing this kind of evidence. As noted above, however, before addressing this more complex question, we first set out to determine whether the oft-reported developmental change in Stroop interference (color-incongruent minus color-neutral trials; BLUE green - XXXX green) survives controlling for response speed.

EXPERIMENT 1
The studies of the late developmental change in the magnitude of Stroop interference discussed above have significantly increased Stroop researchers' interest in the need to control for processing speed. However, as already mentioned, at least to our knowledge, only one early developmental study had actually involved this type of control (Schiller, 1966). Because all other past studies compared untransformed raw RTs, the early developmental change in Stroop interference itself, as well as the implication of its underlying processes (e.g., inhibition) might have been exaggerated. Additionally, the previously reported discrepancies across studies concerning the actual shape of this early developmental change (see section 1.2.) might also result -at least in part -from important differences in processing speed that exist across different age-groups, in addition to existing inter-individual differences within these groups of children (e.g., Faust, Balota, Spieler, & Ferrero, 1999 for this reasoning about Stroop interference that groups of younger vs. older adults display). Therefore, Experiment 1 was designed to examine the extent to which the early developmental change in the magnitude of Stroop interference still occurs when age-related processing speed differences are accounted for.
To this end, participating children (N=218) presenting different levels of automatization of word-recognition and/or reading skills (i.e., 1st-graders (6-7 years old), 3rd-graders (8-9 years old) and 5th-graders (10-11 years old)) were administered a standard Stroop task. Collected mean RTs to color-incongruent words (BLUE green) and color-neutral items (XXXX green) were subsequently analyzed before and after being converted into z-scores. This latter transformation was based on Faust and colleagues (1999, p. 788).
Considering the findings of the only other study that controlled for processing speed (Schiller, 1966; see also late developmental studies cited above), we expected, a priori, the early developmental change to be reflected in the magnitude of z-transformed Stroop interference. However, given the discrepancy of past studies concerning the trajectory of early developmental change in the magnitude of Stroop interference (a consistent decrease as a function of age versus a peak after a few years of reading practice), we did not further predict how the magnitude would change across those age-groups showing robust Stroop interference.
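The logic of the z-score transformation mentioned above can be illustrated with a minimal sketch. It assumes that each child's RTs are standardized against that child's own overall mean and standard deviation (pooled over all conditions) and that condition means are then computed on the standardized values; this reading of the Faust et al. (1999) procedure, the function name `z_transform`, and the example RTs are assumptions made for illustration only.

```python
from statistics import mean, stdev

def z_transform(trials):
    """Return mean z-scored RT per condition for ONE participant.

    trials: list of (condition, rt_ms) pairs.
    Assumption: standardization uses the participant's overall mean and SD
    across all conditions.
    """
    rts = [rt for _, rt in trials]
    overall_mean, overall_sd = mean(rts), stdev(rts)
    z_by_condition = {}
    for condition, rt in trials:
        z = (rt - overall_mean) / overall_sd
        z_by_condition.setdefault(condition, []).append(z)
    return {cond: mean(zs) for cond, zs in z_by_condition.items()}

# Hypothetical data: a slower child and a child exactly twice as fast.
slow_child = [("incongruent", 900), ("incongruent", 1100),
              ("neutral", 700), ("neutral", 900)]
fast_child = [(cond, rt / 2) for cond, rt in slow_child]

z_slow = z_transform(slow_child)
z_fast = z_transform(fast_child)
# Raw interference differs (200 ms vs. 100 ms), but z-scored interference
# is identical because proportional speed differences are removed.
z_interference_slow = z_slow["incongruent"] - z_slow["neutral"]
z_interference_fast = z_fast["incongruent"] - z_fast["neutral"]
```

In this sketch, two children whose RTs differ only by a proportional speed factor yield the same z-scored interference, which is the sense in which the transformation controls for processing speed.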

Participants and inclusion criteria
234 right-handed native-speakers (presenting normal or corrected-to-normal vision) were recruited from two public elementary schools in Aix-en-Provence (France). The standardized reading score on the Alouette test (Lefavrais, 2005) was used as the inclusion criterion. Thus, sixteen children whose score was 18 months below the expected level (i.e., the formal criterion used by dyslexia assessment centers in France) were excluded from data analyses. Information about gender, chronological and reading age of the remaining 218 children is provided in Table 1.

Design and Stimuli
The data were collected using a 2 (Stimulus-type: standard color-incongruent words vs. color-neutral letter strings) × 3 (Age-group: 1st-graders vs. 3rd-graders vs. 5th-graders) design, with Stimulus-type as a within-participants factor. There were 24 trials for each Stimulus-type condition, whose presentation order was randomly determined for each participant within a single block of 48 experimental trials. [green]) and strings of xxx of the same length as the color-incongruent trials. Color-incongruent words always appeared in colors that were incongruent with the meaning of their word-dimension.

Apparatus and Procedure
All aspects of this experiment were approved by the Statutory Ethics Committee of Children were seated approximately 60 cm from the computer screen. A 17-inch HP laptop computer was used for stimulus presentation and data collection was done with DMDX software (Version 2.9.01; Forster & Forster, 2003). Their task was to identify the color of the letter-strings presented on the screen (while ignoring their meanings), as quickly and accurately as possible, by pressing with their dominant hand one of four color-keys on a button box specifically designed for this experiment. To this end, children were instructed to focus on the fixation cross (i.e., not to move their eyes from it). This white cross ("+") appeared in the center of the (black) screen for 500 ms and it was then replaced by a letterstring that continued to be displayed until the child responded (or until 3500 ms had elapsed).
Once the participant had responded, the screen was cleared and a new trial began after a 1000-ms delay. The children were familiarized with these requirements during a set of 12 practice trials consisting of strings of various letters (e.g., "bbbb" presented in green color) that was then followed by a block of experimental trials.

Results and Discussion
Response times (RTs) greater than 3 SDs above or below each participant's mean latency for each condition (i.e., less than 2% of the total data) were excluded from the analyses. The data were subsequently analyzed in a 2 (Stimulus-type: standard color-incongruent words vs. color-neutral letter strings) × 3 (Age-group: 1st-graders vs. 3rd-graders vs. 5th-graders) ANOVA with Stimulus-type as a within-participants factor (see Table 2 for descriptive statistics).

EXPERIMENT 2
To test this hypothesis, and as in , we supplemented the standard color-incongruent words (e.g., BLUE green) and color-neutral words (e.g., DOG green) that are commonly used in the standard Stroop task (see section 2) with two additional types of Stroop items. This extended form of the semantic Stroop paradigm (e.g., Augustinova & Ferrand, 2014 for the original version) thus also comprises color-associated incongruent words (e.g., SKY green) and color-neutral letter-strings (e.g., XXX green). The inclusion of those two additional types of Stroop items allows for the de-confounding of the three types of conflict, as explained below (see Figure 1).

FIGURE 1 ABOUT HERE
First, the inclusion of the color-neutral letter-strings allows the separation of the effect of task conflict from the other two conflicts. Because the irrelevant dimension of most stimuli that this paradigm contains is readable (i.e., composed of letters), it is assumed that they all generate task conflict (see section 1.3.). Also, and importantly, they do so to the same extent, except for the non-readable color-neutral letter-strings (e.g., XXX green ). In line with the bimodal, interactive activation model with (amodal) semantics (McClelland & Rumelhart, 1981;Grainger & Ferrand, 1996;Ferrand & New, 2003), the processing of the written dimension of these color-neutral letter-strings (i.e., xxx) stops at the orthographic pre-lexical level. The processing of the written dimension for all other stimuli composed of words (e.g., dog, sky and blue) stops on the other hand with access to meaning (i.e., after a full chain of visual, orthographic, lexical and semantic processing has come to completion).
Consequently, and in line with the subtractive logic of this paradigm, the significant difference in mean response latencies between Stroop color-neutral words and letter-strings (e.g., DOG green -XXX green ) that was observed in Augustinova et al.'s study was taken to solely reflect differences in activation of the irrelevant reading task set and hence of the differential amount of the task conflict that this entails (see Figure 1). Indeed, because the meaning of color-neutral words (e.g., dog for DOG green ) is not related to a color (unlike sky or blue), the aforementioned contribution of task conflict to overall Stroop interference is not intermixed with that of the semantic and response conflicts that are generated by color-incongruency.
Turning now to the separation of semantic and response conflicts, numerous studies have shown that the semantic conflict is caused by color-incongruency (see Seymour's reasoning depicted in section 1.3. and e.g., Augustinova & Ferrand, 2014b for a review). Also, and importantly, in line with Seymour (1977), semantic conflict is generated to the same extent by associated (e.g., SKY green) as compared to standard (e.g., BLUE green) Stroop words (e.g., Augustinova et al., 2015 for N400-like evidence). Consequently, the significant difference in mean response latencies between color-associated and color-neutral trials (e.g., SKY green - DOG green, see Figure 1) that was also observed in the study of Augustinova and colleagues was taken as evidence of the semantic conflict that color-associated (e.g., SKY green) unlike color-neutral (DOG green) Stroop
Once the irrelevant word dimension of standard incongruent trials has been semantically processed, it indeed primes the aforementioned (pre-)response tendency that for these words (e.g., blue for BLUE green) is in the response set. It therefore interferes with the (pre-)response tendency primed by the meaning of the relevant color-dimension (green here).
Consequently, the significant difference in mean response latencies between standard and associated color-incongruent trials (e.g., BLUE green - SKY green, see Figure 1) observed in the study of Augustinova and colleagues solely results from this (pre-)motor (i.e., response) conflict occurring at the level of response processing and/or output. Indeed, both task and semantic conflict are equal in those two types of color-incongruent items (BLUE green and SKY green, see above).
Therefore, the positive difference in mean response latencies between color-neutral words and letter-strings (e.g., DOG green - XXX green) was used to capture the specific contribution of task conflict to overall Stroop interference displayed by the children participating in this experiment (see Figure 1). Additionally, the positive difference in mean response latencies between color-associated and color-neutral trials (e.g., SKY green - DOG green, see Figure 1) was used to isolate the specific contribution of semantic conflict to overall Stroop interference. And finally, the positive difference in mean response latencies between standard color-incongruent and color-associated trials (e.g., BLUE green - SKY green, see Figure 1) was used to capture the specific contribution of response conflict to overall Stroop interference.
At least some specific contribution of task conflict to overall Stroop interference was expected in all age-groups. Even in pre-readers, visual expertise for letters is known to be present (Maurer, Brem, Bucher, & Brandeis, 2005), suggesting that for all our age-groups (all of whom had at least begun to receive reading instruction) one would expect written stimuli to trigger task-irrelevant processing. In contrast, the contribution of semantic and response conflicts was expected to differ across age-groups. It should be remembered that both of these latter conflicts are generated by the color-incongruency involved in color-incongruent words (BLUE green and SKY green), which occurs once the written-dimension of these words has been processed not only visually, orthographically and lexically but, most importantly, semantically.
Because this latter processing chain, typical of fully developed word-recognition, only occurs in more advanced readers, the specific contribution of the semantic and response conflicts to overall Stroop interference was expected to be significant only in 3rd-graders (8-9 years old) and 5th-graders (10-11 years old), as opposed to 1st-graders (6-7 years old), who are only starting to receive reading instruction in France.
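The subtractive logic described in this section can be summarized in a short sketch. The three contrasts follow the text (DOG - XXX for task conflict, SKY - DOG for semantic conflict, BLUE - SKY for response conflict); the function name, the dictionary keys, the use of the letter-string baseline for the overall effect, and the mean RTs are hypothetical and serve only as an illustration.

```python
def decompose_interference(mean_rt):
    """Split overall Stroop interference into its three components.

    mean_rt: dict of mean latencies (ms or z-units) keyed by item type:
      'standard'      e.g., BLUE in green
      'associated'    e.g., SKY in green
      'neutral_word'  e.g., DOG in green
      'letter_string' e.g., XXX in green
    """
    task = mean_rt["neutral_word"] - mean_rt["letter_string"]
    semantic = mean_rt["associated"] - mean_rt["neutral_word"]
    response = mean_rt["standard"] - mean_rt["associated"]
    overall = mean_rt["standard"] - mean_rt["letter_string"]
    return {"task": task, "semantic": semantic,
            "response": response, "overall": overall}

# Made-up mean RTs (ms), not data from the experiments.
example = {"standard": 920, "associated": 870,
           "neutral_word": 840, "letter_string": 800}
components = decompose_interference(example)
# components == {'task': 40, 'semantic': 30, 'response': 50, 'overall': 120}
```

By construction the three components sum to the overall effect measured against the letter-string baseline, which is why a single overall score cannot indicate which conflict drives a developmental change.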

Participants and inclusion criteria
173 native-speakers (presenting normal or corrected-to-normal vision) were recruited from a single private elementary school in the suburbs of Clermont-Ferrand (France). The standardized reading score on the Timé test (Ecalle, 2003, 2006) was used as the inclusion criterion. Thus, twelve children whose score was 18 months below the expected level (i.e., as in Experiment 1) were excluded from data analyses. Twenty-four other children were also excluded from further analyses due to a malfunctioning microphone. Information on gender, chronological and reading age of the remaining 137 children is provided in Table 3.

Design and Stimuli
The data were collected using a 4 (Stimulus-type: standard color-incongruent words vs. associated color-incongruent words vs. color-neutral words vs. color-neutral letter strings) × 3 (Age-group: 1st-graders vs. 3rd-graders vs. 5th-graders) design, with Stimulus-type as a within-participants factor. There were 36 trials for each Stimulus-type condition, whose presentation order was randomly determined for each participant within a single block of 144 experimental trials. Thus, as in Experiment 1, 50% of the experimental trials involved color-incongruency (i.e., standard color-incongruent words and associated color-incongruent words), whereas 50% of the experimental trials were color-neutral (color-neutral words and color-neutral letter strings).

Apparatus and Procedure
All aspects of this experiment were approved by the South-Eastern Statutory Ethics Committee (

Results and Discussion
Response times (RTs) greater than 3 SDs above or below each participant's mean latency for each condition (i.e., less than 2% of the total data) were excluded from the analyses. In order to control for differences in processing speed, these were then transformed into zRTs (see section 2 for further details). Mean raw RTs and zRTs, along with error rates, were subsequently analyzed in a 4 (Stimulus-type: standard color-incongruent words vs. associated color-incongruent words vs. color-neutral words vs. color-neutral letter strings) × 3 (Age-group: 1st-graders vs. 3rd-graders vs. 5th-graders) ANOVA with Stimulus-type as a within-participants factor (see Table 4 for descriptive statistics).
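The exclusion rule stated above (RTs more than 3 SDs from a participant's mean latency in a given condition) can be sketched as follows. The function name `trim_outliers`, the use of the sample standard deviation, and the example latencies are assumptions for illustration; the section does not specify these implementation details.

```python
from statistics import mean, stdev

def trim_outliers(rts, n_sd=3.0):
    """Drop RTs lying more than n_sd SDs from the mean of `rts`.

    `rts` holds the latencies of ONE participant in ONE condition,
    so the cut-offs are participant- and condition-specific.
    """
    if len(rts) < 2:          # SD is undefined for fewer than two values
        return list(rts)
    m, sd = mean(rts), stdev(rts)
    lower, upper = m - n_sd * sd, m + n_sd * sd
    return [rt for rt in rts if lower <= rt <= upper]

# Hypothetical condition with 36 trials: 35 typical RTs and one very slow one.
condition_rts = [780 + 10 * (i % 5) for i in range(35)] + [3400]
kept = trim_outliers(condition_rts)
```

Because the cut-offs are computed within each participant and condition, a generally slow child is not penalized relative to a fast one; only responses that are atypical for that child in that condition are removed.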

Analyses of the overall Stroop interference (observed with vocal responses)
Further contrast analyses (with Bonferroni corrections to counteract problems associated with multiple comparisons) revealed that the mean zRT for standard color-incongruent words (e.g., BLUE green) was significantly greater than the one for color-neutral  Table 4 for descriptive statistics). The same contrast analyses conducted on the mean error rates reported above mirrored those on zRTs.

Analyses of different components of the overall Stroop interference
Further contrast analyses (with Bonferroni corrections to counteract problems associated with multiple comparisons) conducted with the aforementioned 4-Stimulus-type × 3-Age-group ANOVA (see above) revealed that, as expected, the overall Stroop interference displayed by 1st-graders (6-7 years old) resulted solely from the significant contribution of task conflict (see Figure 3). Indeed, the mean zRT for color-neutral words (e.g., DOG green) was significantly greater than the one for color-neutral letter-strings (e.g.,  Table 4 for descriptive statistics).
As can be seen in Figure 3, the overall Stroop interference displayed by 3rd- and 5th-graders (respectively 8-9 and 10-11 years old) resulted from a joint contribution of all three conflicts. Indeed, mean zRTs for color-neutral words (e.g., DOG green) being significantly greater than the one for color-neutral letter-strings (e.g., XXX green) in both 3rd - with the significant contribution of the response conflict.
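The logic of these component analyses can be summarized in a short sketch. The task-conflict contrast (color-neutral words vs. color-neutral letter-strings) is the one stated in the text. The semantic-conflict and response-conflict contrasts shown here are assumptions that follow the usual logic of the semantic Stroop paradigm used in the adult studies cited in this paper. The condition labels and the mean zRT values are invented for illustration.

```python
def stroop_components(means):
    """Split overall Stroop interference into three additive contrasts.

    `means` maps a (hypothetical) condition label to a mean zRT.
    """
    task = means["neutral_word"] - means["neutral_letter_string"]
    # Assumed contrast: associated incongruent vs. neutral words.
    semantic = means["associated_incongruent"] - means["neutral_word"]
    # Assumed contrast: standard vs. associated incongruent words.
    response = means["standard_incongruent"] - means["associated_incongruent"]
    overall = means["standard_incongruent"] - means["neutral_letter_string"]
    return {"task": task, "semantic": semantic,
            "response": response, "overall": overall}


# Invented mean zRTs, not data from Table 4.
example_means = {
    "neutral_letter_string": -0.30,
    "neutral_word": -0.10,
    "associated_incongruent": 0.10,
    "standard_incongruent": 0.30,
}
print(stroop_components(example_means))
```

Under these definitions the three components sum to the overall interference, so a pattern in which only the first contrast differs from zero corresponds to interference driven solely by task conflict.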

CROSS-EXPERIMENT ANALYSES
It should be remembered at this point that the Stimulus-type × Age-group interaction on zRTs, which is likely to epitomize the early developmental change in the magnitude of Stroop interference, remained non-significant in Experiment 1, whereas it was significant in Experiment 2. Given this discrepancy (such discrepancies are not uncommon; see e.g., Tse & Neely, 2007), the first aim of the cross-experiment analyses was to examine whether the Stimulus-type × Age-group interaction on zRTs actually replicates across independent experiments. To this end, we used a procedure known as Winer's z test (Winer, 1971, pp. 49-50; see e.g., Augustinova, Flaudias, & Ferrand, 2010; Tse & Neely, 2007, for applications). When the F values of Experiments 1 and 2 were combined using this approach, the Stimulus-type × Age-group interaction was indeed significant (z = 3.87, p < .01).
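The exact computation is not spelled out here, so the following is only a sketch of one common form of such a combined test. It relies on the fact that a t statistic with df degrees of freedom has variance df / (df - 2), so that a sum of independent t values divided by the square root of the summed variances is approximately standard normal. For a single-df effect, t can be obtained as the signed square root of F. How the multi-df interaction was handled in the present analyses is not stated. The function names and input values below are hypothetical and do not reproduce the z value reported above.

```python
import math


def winer_combined_z(t_values, dfs):
    """Combine independent t statistics into one approximate z score.

    z = sum(t_i) / sqrt(sum(df_i / (df_i - 2))), valid for df_i > 2.
    """
    if len(t_values) != len(dfs):
        raise ValueError("t_values and dfs must have the same length")
    if any(df <= 2 for df in dfs):
        raise ValueError("each df must exceed 2")
    variance = sum(df / (df - 2) for df in dfs)
    return sum(t_values) / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Invented inputs: two independent experiments.
z = winer_combined_z(t_values=[2.0, 2.5], dfs=[30, 40])
print(round(z, 2), two_sided_p(z) < 0.01)
```

The point of such a procedure is that two experiments which each provide moderate evidence in the same direction can jointly yield a clearly significant combined statistic.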
In light of this significant interaction, which suggests that the early developmental change in the magnitude of Stroop interference is indeed likely to replicate across studies, the second aim of the cross-experiment analyses was to examine its trajectory further. To this end, mean zRTs for standard color-incongruent words (e.g., BLUE green) and color-neutral letter-strings (e.g., XXX green) observed in Experiment 1 (using manual responses) and in Experiment 2 (using vocal responses) were combined in a single analysis.

GENERAL DISCUSSION
The results presented above provide no statistical evidence for early developmental change in the magnitude of Stroop interference with the manual response Stroop task (Experiment 1). However, Experiment 2 (using the vocal response Stroop task) and the data combined from both experiments indicate that this change still occurs, such that the magnitude of overall Stroop interference is larger in 3rd-graders (8-9 years old) and 5th-graders (10-11 years old) than in 1st-graders (6-7 years old). Inspection of these combined results and of Figure 2 (see above) indicates that, in line with past findings (e.g., Brown et al., 2002; Sharma & McKenna, 1998), the use of vocal (instead of manual) responses is likely to accentuate the overall pattern of early developmental change that was undetected with manual responses.
Also, and importantly, the trajectory of this developmental change convincingly shows that, independently of the differences in processing speed that exist both within and between age-groups, the magnitude of Stroop interference peaks after two or three years of reading practice (Schiller, 1966; Schadler & Thissen, 1981; Rand et al., 1963). Perhaps only after the peak reported here will its magnitude start decreasing, as reported by the numerous studies reviewed above (see section 1.2.). Given the age-groups included in our study, we are unable to demonstrate the actual onset of this decrease, which some have claimed is likely to stop with the maturation of executive functions (Aïte et al., 2018; Diamond, 2013; Prencipe et al., 2011). These issues are of considerable importance and remain to be addressed by future empirical studies including age-groups beyond the 5th grade (10-11 years old, i.e., the oldest age-group in the present study).
Despite this limitation, a further strength of the present study is that it addressed, for the first time, the isolable processes underlying Stroop interference and their different developmental trajectories. Indeed, all past studies investigating the early development of Stroop interference simply inferred those processes from the change in its overall magnitude. Therefore, the measures used in adult Stroop studies (e.g., ; see also Augustinova & Ferrand, 2014a, b) to assess these underlying processes were applied in Experiment 2 while the potentially confounding effect of processing speed was controlled for.
The ensuing results point to perhaps the most novel aspect of the present study: the aforementioned peak in the magnitude of Stroop interference reliably resulted from the joint contributions of task, semantic and response conflicts. By contrast, the significantly smaller magnitude of Stroop interference observed in 1st-graders (6-7 years old) resulted from a sole contribution of task conflict. This change reflects the fact that the distinct components of Stroop interference (resulting from specific conflicts) progressively fall into place, yet not at the same pace. Initially, only task conflict contributes to Stroop interference in 1st-graders, but after two or three years of reading practice (i.e., in 3rd- and 5th-graders), the magnitude of Stroop interference peaks (Schiller, 1966; Schadler & Thissen, 1981; Rand et al., 1963), at least when the vocal response modality is used, and this peak is specifically due to the joint contributions of task, semantic and response conflicts.
It is important to note that the present findings are exempt from the common limitations arising from the use of color-congruent trials (e.g., BLUE blue). Even though this type of trial is often used in the Stroop literature to supposedly measure Stroop interference, the positive difference in color-naming times between color-incongruent and color-congruent items (e.g., BLUE green - BLUE blue) actually corresponds to "(…) the sum of facilitation and interference, each in unknown amounts" (MacLeod, 1991, p. 168, italics added). Furthermore, the processing underlying facilitation on congruent trials remains unclear and has been interpreted variously as resulting from response convergence (Cohen et al., 1990), inadvertent reading (Kane & Engle, 2003; MacLeod & MacDonald, 2000), or competing influences of interference and facilitation in different portions of the RT distribution (Heathcote et al., 1991). Finally, the inclusion of congruent trials can also lead to response contingency effects (Melara & Algom, 2003; Schmidt & Besner, 2008), and it remains unclear whether facilitation would remain if response contingency were controlled for.
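The point made in the quotation can be written out as a simple identity. The notation is introduced here for illustration only and assumes that interference and facilitation are each measured against a neutral baseline:

```latex
\underbrace{RT_{\text{incongruent}} - RT_{\text{congruent}}}_{\text{difference usually reported}}
  = \underbrace{\left(RT_{\text{incongruent}} - RT_{\text{neutral}}\right)}_{\text{interference}}
  + \underbrace{\left(RT_{\text{neutral}} - RT_{\text{congruent}}\right)}_{\text{facilitation}}
```

Without a neutral condition the two terms on the right cannot be separated, which is why the incongruent minus congruent difference alone leaves the amount of interference unknown.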
The aforementioned concerns have been successfully avoided by the deployment of an appropriate color-neutral baseline in the present study (see Augustinova et al., 2016; T. L. Brown, 2011; Parris et al., submitted, for discussions). Therefore, the results of the present study have several straightforward implications that are methodological, theoretical and applied in nature.
In line with what is now common practice in adult developmental Stroop studies (e.g., Bugg et al., 2007; Spieler et al., 1996), these results (1) invite early developmental researchers using the Stroop task to control for processing speed (see e.g., Lété & Fayol, 2013; Ziegler et al., 2014, for this type of endeavor in psycholinguistic studies with children). These results also (2) invite researchers to preferentially use vocal responses, especially when the composite nature of Stroop interference is under scrutiny (see also Augustinova, Parris, & Ferrand, 2019; Augustinova & Ferrand, 2014b, for discussions of this issue in adult studies). Another implication of considerable methodological importance is that (3) congruent trials (i.e., trials producing facilitation) should be avoided, so that conclusions about the (early) developmental trajectory of inhibition (i.e., an executive function that deals with interference) do not remain tentative.
At the theoretical level, the findings reported in this paper clearly point (4)  The results from the present study also suggest that (11) potentially different neural substrates underlie each conflict type. Consistent with this notion, van Veen and Carter (2005) observed no overlap of activation between semantic and response conflict. They showed that semantic conflict activated dorso-lateral prefrontal cortex, posterior parietal cortex and the anterior cingulate cortex (ACC), whereas response conflict activated more inferior lateral prefrontal cortex, left premotor areas and regions of the ACC more anterior and ventral to those activated by semantic conflict. The authors concluded that their data were evidence for separate but analogous mechanisms for dealing with different kinds of representational conflict (see also Chen et al., 2013; Milham et al., 2001). Consistently, task conflict might also have a unique neural substrate. MacLeod and MacDonald (2000) noted that the ACC appeared to be more activated by incongruent and congruent stimuli than by repeated-letter neutral stimuli (e.g., xxx; see also Bench et al., 1993), indicating the detection of task conflict. That said, no study has yet directly investigated this possibility using the required contrast of color-neutral words to color-neutral letter-strings, so the precise location of activation within the ACC associated with task conflict is not known. The present study therefore (12) motivates a comparison of the neural substrates of all three conflict types in the same neuroimaging study.
Finally, the reported findings are equally important from an applied point of view. Indeed, (12) the fact that the specific contribution of all three types of conflict can be clearly seen within the semantic Stroop paradigm administered with vocal responses (Experiment 2) might make it possible to construct a more sensitive evaluation tool that is simple enough (i.e., a card version) to be administered in both lab and field (i.e., clinical) settings. Indeed, the evaluation tools that are currently used in developmental neuropsychological practice (e.g., Armengol, 2002; Roy et al., 2017 Heath, Curtis, Fan & McPherson, 2015) and drugs (e.g., Burton et al., 2015), these issues are of utmost importance and should therefore be addressed directly in future studies.

CONCLUSION
While we urge caution in extrapolating from data from a single task to make claims about development more generally, the present study makes one important contribution. It shows that researchers interested in the development of executive functions, especially inhibitory control, should focus more on how conflict is created in the tasks they use, and on what specific types of conflict these tasks generate. Different types of conflict may emerge, and be successfully resolved, at different ages. This point echoes that recently made by Simpson et al. (2012), who found that 4-year-old children performing a non-linguistic Stroop-like task (the Day-Night task) were insensitive to semantic conflict, but were sensitive to response conflict. These findings are broadly consistent with the lack of semantic conflict observed in younger children in Experiment 2 of the present study. A more detailed theoretical treatment of these ideas has recently been provided by Simpson and Carroll (in press).
Given the ubiquitous use of the Stroop task to study a wide variety of phenomena in psychology and cognitive science broadly defined, the present study potentially makes another important point. Currently, the Stroop task is considered as "a prototypical inhibition task (…) in which one needs to inhibit or override the tendency to produce a more dominant or automatic response (i.e., name the color word)" and the magnitude of Stroop interference as reflecting "one's ability to deliberately inhibit dominant, automatic, or prepotent responses when necessary" (Miyake et al., 2000, p. 57). Therefore, perhaps the most immediate conclusion to be drawn from the present study is that Stroop interference is complex and that its underlying processes might remain unseen and/or be misinterpreted when observed using the standard Stroop paradigm. The use of a more fine-grained implementation of a Stroop task (see also e.g., De Houwer, 2003; Hasshim & Parris, 2014; ) should therefore be considered.