The representation of stimulus conjunction in theories of associative learning: A context-dependent added-elements model.

This article briefly reviews 3 theories concerning elemental and configural approaches to stimulus representation in associative learning and presents a new context-dependent added-elements model (C-AEM). This model takes an elemental approach to stimulus representation where individual stimuli are represented by single units and stimulus compounds activate both those units and configurational units corresponding to each conjunction of 2 or more stimuli. Activity across these units is scaled such that each stimulus always contributes the same amount of activity to the system whether it is presented in isolation or in compound; the configurational units "borrow" activity from representational units for individual stimuli (and from each other). This scaling is affected by the extent to which stimuli interact with each other perceptually. Hence, the model is conceptually similar to Wagner's (2003) replaced elements model but lacks features that explicitly code for the absence of stimuli (i.e., inhibited elements). Simulations of the model are reported for a range of generalization and discrimination learning tasks, conflicting results from which have previously been taken to provide support for either configural or elemental theories of learning. (PsycInfo Database Record (c) 2020 APA, all rights reserved).

The value of Vj is not necessarily the same as the associative strength of configural unit j, since generalization between patterns means that other configural units may also be partially activated by pattern j, and these units may in turn excite US units. For example, if a network has been exposed to patterns A and AB, then when stimulus A is presented, the configural unit for pattern AB will also be partially activated due to the similarity of pattern AB to A. The extent to which these other, partially activated configural units contribute to the overall activation of the US unit is determined by the level of activation of each configural unit and the strength of its association with the US unit. Accordingly, the activation of the US unit is given by Equation 4. Here, n is the total number of configural units in the network, and Sj,i is the similarity of pattern j to pattern i.
Third, the degree to which elements of stimulus A are replaced when it is presented in compound with stimulus B (and hence, the parameters rb and sb) is determined, to some extent at least, by perceptual properties of A and B. Stimuli that belong to the same modality are assumed to interact with each other at a perceptual and representational level to a greater extent than stimuli that belong to different modalities.
If we accept that replacement is greater for stimuli within the same modality than for stimuli taken from different modalities, REM can account for some of the conflicting results of experiments conducted in Pearce's and Wagner's laboratories (such as simple and differential summation). The same is not the case, however, for complex negative patterning discriminations of the form A+ B+ C+ AB+ AC+ BC+ ABCø. REM is also computationally quite complex. As the number of stimuli that may be presented in combination increases, there is an exponential explosion in the number of ways that they may interact and, consequently, in the number of distinct populations of representational elements that must be considered. Because each additional stimulus may interact with existing populations of representational elements in three ways (context independent, context dependently activated, or context dependently inhibited), this expansion in populations may be derived from the trinomial theorem (see Appendix A). For any system consisting of n stimuli, n·3ⁿ⁻¹ distinct populations of elements may be identified. Hence, the number of populations of elements required to model systems in which 1, 2, 3, 4, or 5 stimuli may interact is 1, 6, 27, 108, and 405, respectively.
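As a check on this count, the short Python sketch below (an illustration, not the simulation code used for this article) enumerates REM populations by brute force. It assumes, as described above, that the elements of each stimulus stand independently in one of three relations to every other stimulus; the relation labels and the function name are hypothetical.

```python
from itertools import product

# Hypothetical labels for the three ways an element of one stimulus can relate to
# another stimulus under REM's statistically independent interaction.
RELATIONS = ("context-independent", "context-dependently activated",
             "context-dependently inhibited")


def rem_populations(stimuli):
    """List every distinct population of elements for a set of stimuli.

    A population is identified by the stimulus it belongs to plus its relation to
    each of the other stimuli, so each stimulus contributes 3 ** (n - 1) of them.
    """
    populations = []
    for s in stimuli:
        others = [o for o in stimuli if o != s]
        for rels in product(RELATIONS, repeat=len(others)):
            populations.append((s, tuple(zip(others, rels))))
    return populations


for n in range(1, 6):
    count = len(rem_populations("ABCDE"[:n]))
    print(n, count, n * 3 ** (n - 1))  # enumerated count vs. n * 3^(n-1)
```

The enumeration and the closed-form count agree for every n, which is simply the trinomial argument made explicit.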
To address these two issues with REM, I present a simplified model of stimulus interaction. The model is inspired by REM but is based more closely on the principles of the configurational cue model. It has the twin benefits of being able to explain a wider range of experimental findings and of being computationally much less demanding than REM.

A Context-Dependent Added Elements Model
The model incorporates features of Wagner and Rescorla's (1972) configurational cue extension to the RW model and of Wagner's (2003) REM. This Context-dependent Added Elements Model (C-AEM) is based on the assumptions that (a) whenever two or more stimuli are presented in compound they generate unique configurational cues, and (b) the relative activation of elemental stimulus representations and the configurational cues varies as a function of a parameter, r, which reflects the degree to which representations of the stimuli interact with each other. C-AEM departs from REM in two fundamental respects. First, it does not invoke the notion of inhibited elements and, hence, of replacement. Second, it is based on unitary representations of stimuli and of configurational cues. C-AEM is conceptually similar to (but not the same as) a version of REM in which replacement of elements is determined randomly, but it does not require any assumptions about the nature of the population from which a random selection of added elements is sampled.
The representational structure of a stimulus compound in C-AEM is the same as that in the RW model if it is assumed that a configurational cue is generated for each combination of stimuli within that compound. Hence, stimulus A will activate a single unit (a); the compound AB will activate units that represent stimuli A and B (a, b) and their conjunction (ab); and the compound ABC will activate units a, b, c, ab, ac, bc, and abc. Unlike in the RW model, however, the activity levels of units are context-dependently scaled such that total activity in the system is always equal to the sum of the intensities of the individual cues present. For example, if we assume that the intensities of all stimuli are individually equal to 1, then when the compound ABC is presented, the combined activity in units a, b, c, ab, ac, bc, and abc will equal 3. In this manner, each stimulus contributes the same absolute amount of activity to the system whether it is presented alone or in compound.
It is not just the intensity of a stimulus that influences the activation level of a unit, but also the context in which that stimulus is presented. Stimuli will interact with each other perceptually, and the extent of this interaction is reflected by the parameter r. For simplicity, discussion here will be restricted to an ideal system in which the intensity of all stimuli is the same and equal to 1, and all stimuli within a system interact with all others to the same degree, r. When a single stimulus, A, is presented, it will activate a single unit, a. In this case, the activation of unit a, γa, will be equal to the intensity of the stimulus, IA, which is 1. When A is presented in compound with a second stimulus, B, three units will be activated: a, b, and ab. Now, the activation of unit a will be reduced by the extent to which the two stimuli interact, r. Hence, γa = 1 − r. Stimulus A also contributes to the activation of the ab configurational unit, and its contribution to γab is equal to the reduction in activation of unit a, r. The interaction between stimuli A and B is reciprocal, and so the activation of unit b, γb, is also given by 1 − r. Similarly, B contributes r to the activation of unit ab, so that γab = 2r. In effect, units a and b lend some of their activation strength to the configurational unit ab. It should be apparent that the figures given here for the activation level of each unit correspond to the proportions of elements within the populations Ai, Bi, and the combination of Ab and Ba in Wagner's REM. The critical difference is that there are no units that specifically represent the absence of other stimuli when a stimulus is presented alone (i.e., A~b or B~a). Instead, the units that represent the individual stimuli, a and b, will be more active when the stimuli are presented alone than when they are presented in compound.
The situation is slightly more complicated when three stimuli are presented in compound, but it follows the same principle of statistically independent interaction as REM. That is, in compound ABC, γa is reduced by interaction with both B and C, and γa = (1 − r)(1 − r) = (1 − r)². Stimulus A still contributes r to the activation of unit ab (as does B), but activity in ab is itself reduced by the interaction with stimulus C. Now, γab = r(1 − r) + r(1 − r) = 2r(1 − r). Finally, activation will propagate to a unit that represents the configuration of all three stimuli, abc. Activation of this unit is γabc = 3r².
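The activation rule just described can be sketched in Python. This is a minimal illustration, assuming equal intensities (I = 1) and a single r for every pair of stimuli; the function name caem_activations is hypothetical. Each stimulus keeps a share 1 − r of its activity with respect to every other stimulus present and lends a share r to the conjunction with that stimulus, with shares for different partners combining independently. Applied to AB and ABC, this reproduces the activations given above.

```python
from itertools import combinations


def caem_activations(compound, r):
    """Activation of every C-AEM unit for a compound of equally intense (I = 1)
    stimuli that all interact to the same degree r.

    With respect to each other stimulus present, a stimulus keeps a share (1 - r)
    of its activity or lends a share r to the conjunction with that stimulus; the
    shares for different partners combine independently (multiplicatively).
    """
    stimuli = list(compound)
    gamma = {}
    for k in range(1, len(stimuli) + 1):
        for unit in combinations(stimuli, k):
            total = 0.0
            for s in unit:  # only stimuli in the unit contribute to it
                share = 1.0
                for other in stimuli:
                    if other != s:
                        share *= r if other in unit else (1 - r)
                total += share
            gamma["".join(unit).lower()] = total
    return gamma


g = caem_activations("ABC", 0.3)
print({unit: round(v, 3) for unit, v in g.items()})
print(round(sum(g.values()), 6))  # total activity equals the number of stimuli: 3.0
```

Because the shares each stimulus hands out always sum to 1, total activity equals the number of stimuli for any r, which is the scaling property stated earlier.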
The activity in any unit is given by Equation 9 where k is the number of stimuli contributing to the activation of that unit, and n is the total number of stimuli present. A more general version of Equation 9 is given in Appendix B, which allows γ to be calculated when the intensities of stimuli are not all equal and when stimuli interact with each other to different extents.
The number of units that receive activation from k stimuli is given by the binomial coefficient shown in Equation 10. For example, in a system with three stimuli (A, B, and C), three units will each receive activation from a single stimulus (a, b, and c), and another three units will each receive activation from two stimuli (ab, ac, and bc). In the former case, n = 3 and k = 1, so Equation 10 gives 3!/1!(3 − 1)! = 3!/2! = 3. In the latter case, n = 3 and k = 2, and Equation 10 gives 3!/2!(3 − 2)! = 3!/2! = 3. A single unit (abc) will receive activation from all three stimuli (3!/3!(3 − 3)! = 1). It follows from Equations 9 and 10 that the total activity across all units can be calculated by Equation 11, and equals n.
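Equations 9 to 11 are not reproduced in this copy of the text. The following is a reconstruction for the equal-intensity (I = 1), equal-r case only, chosen to agree with the worked values given above (γa = 1 − r and γab = 2r for AB; (1 − r)², 2r(1 − r), and 3r² for ABC); the general form in Appendix B may differ, and the symbol N_k is introduced here for convenience.

```latex
\begin{align*}
\gamma_k &= k\,r^{k-1}(1-r)^{n-k}
  && \text{(9) activation of a unit fed by $k$ of the $n$ stimuli present}\\
N_k &= \binom{n}{k} = \frac{n!}{k!\,(n-k)!}
  && \text{(10) number of units fed by exactly $k$ stimuli}\\
\sum_{k=1}^{n} N_k\,\gamma_k
  &= \sum_{k=1}^{n} n\binom{n-1}{k-1} r^{k-1}(1-r)^{n-k}
  && \text{using } k\tbinom{n}{k} = n\tbinom{n-1}{k-1}\\
  &= n\,\bigl(r + (1-r)\bigr)^{n-1} = n
  && \text{(11) total activity, by the binomial theorem}
\end{align*}
```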
Learning within C-AEM proceeds in a manner similar to both RW and REM, but learning is scaled by the activation of each unit. When a stimulus, or stimulus compound, is presented, the change in the associative strength, V, of a particular representational unit is given by Equation 12, where α and β are learning rate parameters associated with the unit and the US, respectively, and λ is the magnitude of the US. Vnet is the expected outcome and is determined by the sum of the products of the activation of each unit and the strength of its association with the US, as shown in Equation 13.
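Equations 12 and 13 are not reproduced in this copy. The Python sketch below assumes they take the form ΔVx = αβγx(λ − Vnet) and Vnet = Σ γxVx, which is one reading of the description above rather than the article's own code. It trains the compound AB and then tests A alone. Parameter values follow those stated for the simulations below (α = β = .05, λ = 1); the number of trials, the use of the activation form γ = k·r^(k−1)·(1 − r)^(n−k), and all function names are assumptions for illustration.

```python
from itertools import combinations


def activations(compound, r):
    """C-AEM unit activations for I = 1 and a common r, using the assumed form
    gamma = k * r**(k-1) * (1-r)**(n-k) for a unit fed by k of n stimuli."""
    n = len(compound)
    return {"".join(u).lower(): k * r ** (k - 1) * (1 - r) ** (n - k)
            for k in range(1, n + 1) for u in combinations(compound, k)}


def v_net(V, gamma):
    """Expected outcome: activation times associative strength, summed over units."""
    return sum(g * V.get(unit, 0.0) for unit, g in gamma.items())


def train(V, compound, r, lam=1.0, alpha=0.05, beta=0.05, trials=5000):
    """Assumed learning rule: dV = alpha * beta * gamma * (lam - V_net) per unit,
    with the error computed once per trial before any unit is updated."""
    gamma = activations(compound, r)
    for _ in range(trials):
        error = lam - v_net(V, gamma)
        for unit, g in gamma.items():
            V[unit] = V.get(unit, 0.0) + alpha * beta * g * error
    return V


r = 0.2
V = train({}, "AB", r)
print(round(v_net(V, activations("AB", r)), 3))  # trained compound: close to lambda
print(round(v_net(V, activations("A", r)), 3))   # A alone: overshadowed
```

Because every update is proportional to γx, the trained strengths stay proportional to the activations, so under these assumptions responding to A alone approaches (1 − r)λ/Σγx² rather than λ.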
The activation of a unit affects both its contribution to prediction of the US (Vnet), and how much is learnt about that unit following a conditioning trial.
C-AEM is computationally a much simpler model than REM whilst retaining the key principle of stimulus interaction. As the number of stimuli that may be presented in compound increases, there is an exponential growth in the number of discrete populations of elements within REM. This growth follows the function n·3ⁿ⁻¹. The number of representational units within C-AEM also grows exponentially, but at a much slower rate. The expansion in C-AEM may be derived from the binomial theorem (see Appendix C). In a system comprising n stimuli, 2ⁿ − 1 units may be activated. Hence, where REM requires 1, 6, 27, 108, and 405 populations of elements to represent systems consisting of 1, 2, 3, 4, and 5 stimuli, respectively, C-AEM requires just 1, 3, 7, 15, and 31 units.
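The two growth rates can be compared directly. In this Python sketch (an illustration only; the function name is hypothetical), C-AEM units are generated as the non-empty subsets of the stimuli that can be presented together, and REM populations are counted with the n·3ⁿ⁻¹ formula.

```python
from itertools import combinations


def caem_units(stimuli):
    """One C-AEM unit per non-empty subset of the stimuli that can co-occur."""
    return ["".join(c).lower()
            for k in range(1, len(stimuli) + 1)
            for c in combinations(stimuli, k)]


for n in range(1, 6):
    stimuli = "ABCDE"[:n]
    rem = n * 3 ** (n - 1)           # REM populations of elements
    caem = len(caem_units(stimuli))  # C-AEM units; equals 2**n - 1
    print(f"n = {n}: REM {rem:3d}, C-AEM {caem:2d}")
```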
Because C-AEM resembles both REM and the RW model, it makes predictions similar to theirs in a variety of situations. The differences between the models do, however, cause their predictions to diverge in some situations, including some in which REM is unable to account for all of the experimental data. To test the predictions of C-AEM and compare them against those made by REM, the RW model, and Configural Theory, a series of computer simulations was conducted.

Application of the Models to Empirical Data
Simulations are presented here of Configural Theory, REM and C-AEM for a selection of situations in which the RW model and Configural Theory make different predictions, and where there is empirical support for the predictions of each. This is not intended to be a comprehensive review of the capabilities of any of the models. Indeed, all of the models considered here make use of a summed error term in their learning rules, which makes it difficult for them to account for changes in the associative strength of stimuli that differ in their associative history when they are conditioned in compound (Rescorla, 2000). Rather, the discussion here is limited to some effects of similarity and generalization which inspired the development of REM, and a closely related patterning discrimination task.
Unless otherwise stated, parameter values were as follows. α was set at .05 for all stimuli and configurational cues for the RW model, all configural units for Configural Theory, all populations of elements for REM, and all representational units for C-AEM. For simulations of all models, λ was equal to 1 and β was .05 when the US was present, and λ was zero and β was .025 when the US was absent.²
For REM and C-AEM, all r values were equal, and for C-AEM the intensity parameter, I, was set to 1, meaning that Equation 9 could be used to calculate unit activations. For simulations of Configural Theory, d = 2. Where a salient contextual cue was included in the simulation, it was treated in the same manner as a stimulus, with α = .05 and I = 1.
Overshadowing and External Inhibition
Pavlov (1927) described an experiment in which two stimuli were paired, in compound, with a US. When each stimulus was presented by itself following this training, a weaker CR was provoked than when they were presented together. This overshadowing effect is readily predicted by the RW model, and the prediction is easily derived from Equation 2. Compound training will have the effect of increasing Σα and thus reducing the asymptotic value of VA to something lower than λ. Furthermore, a more salient stimulus will overshadow conditioning to a less salient stimulus to a greater extent than the more salient stimulus will be overshadowed by the less salient one (see Miles & Jenkins, 1973; Kamin, 1969). Again, this effect is predicted by Equation 2: since the learning rate parameter α reflects the salience of a stimulus, the addition of a highly salient stimulus will have a greater impact on the value of (αA / Σα) than will the addition of a less salient stimulus. The RW model, however, does not predict external inhibition. Pavlov (1927) also observed that if some additional stimulation, such as a change in the illumination of the experimental room or a loud noise from outside, coincided with the presentation of an established CS, the magnitude of the CR was diminished. Similarly, when a stimulus is presented in compound for the first time in blocking experiments, the CR is sometimes smaller than on the preceding conditioning trial when it was presented alone (e.g., Kamin, 1969). According to the RW model, presenting an additional (neutral) stimulus in combination with an established CS should not affect responding to that CS, since the associative strength of the compound is simply the sum of the associative strengths of its components.
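Equation 2 is not reproduced in this copy, but the overshadowing prediction can be illustrated by iterating the standard Rescorla-Wagner update, ΔVi = αiβ(λ − ΣV), for a reinforced compound. The Python sketch below is an illustration rather than the article's simulation code; α and β follow the values given earlier, while the raised α for the more salient stimulus, the trial count, and the function name are hypothetical. At asymptote each stimulus approaches (αi/Σα)λ.

```python
def rw_compound(alphas, beta=0.05, lam=1.0, trials=3000):
    """Rescorla-Wagner training on a single reinforced compound.

    On every trial each stimulus present changes by alpha_i * beta * (lam - sum V),
    the error term being computed from the summed strength before the update.
    """
    V = {s: 0.0 for s in alphas}
    for _ in range(trials):
        error = lam - sum(V.values())
        for s, a in alphas.items():
            V[s] += a * beta * error
    return V


equal = rw_compound({"A": 0.05, "B": 0.05})
unequal = rw_compound({"A": 0.10, "B": 0.05})  # hypothetical: A more salient
print({s: round(v, 3) for s, v in equal.items()})    # each near .5 * lambda
print({s: round(v, 3) for s, v in unequal.items()})  # A near 2/3, B near 1/3
```

The asymmetry in the second case is the salience effect described above; nothing in the update changes VA when a neutral stimulus is merely added at test, which is why the model cannot produce external inhibition.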
Configural Theory provides a ready explanation for generalization decrements. If an animal has received conditioning with the compound AB, then presentation of stimulus A alone will activate the AB configural unit only to the extent that A is similar to AB. Since the similarity of these two patterns is less than 1 (according to Equation 5, SAB,A = .5), the unit will receive less activation in response to the presentation of A alone than to the presentation of compound AB. A symmetrical effect is predicted when compound AB is presented following conditioning to stimulus A; activation of the A configural unit by compound AB is similarly determined by the similarity of AB to A. Varying the discriminability parameter, d, will affect the similarity of patterns, but for all values Configural Theory predicts symmetrical effects of overshadowing and external inhibition. Brandon, Vogel and Wagner (2000) observed neither the pattern of results predicted by the RW model nor that predicted by Configural Theory. They trained three groups of rabbits using eye-blink conditioning.
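Equation 5 is not reproduced in this copy. The Python sketch below assumes the form S = (Nc/Ni)^(d/2) · (Nc/Nj)^(d/2), where Nc is the number of stimuli two patterns share and Ni, Nj are the numbers of stimuli in each; with d = 2 this assumption reproduces the similarity values quoted in this article (for example, SAB,A = .5 and SAX,BX = .25), and with d = 1.5 it gives the 1.19λ summation figure reported later. The function name is hypothetical. Because the expression is symmetric in its two arguments, generalization from AB to A equals generalization from A to AB, which is why the theory predicts symmetrical overshadowing and external inhibition.

```python
def similarity(p, q, d=2.0):
    """Assumed form of Configural Theory's similarity rule (Equation 5):
    S = (Nc/Np)**(d/2) * (Nc/Nq)**(d/2), with Nc the number of shared stimuli."""
    p, q = set(p), set(q)
    nc = len(p & q)
    return (nc / len(p)) ** (d / 2) * (nc / len(q)) ** (d / 2)


print(similarity("AB", "A"))    # trained AB, tested with A (overshadowing)
print(similarity("A", "AB"))    # trained A, tested with AB (external inhibition)
print(similarity("ABC", "AB"))  # larger compounds, as in Brandon et al. (2000)
```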
For the first group, stimulus A was paired with a paraorbital electrical shock. A second group was trained with compound AB, and the third with compound ABC. Following conditioning, animals in all three groups received test trials with A alone, and with the compounds AB and ABC. Both overshadowing and external inhibition were observed; either adding or removing features from each training pattern resulted in a reduction in the conditioned response. These effects were not symmetrical; removing a feature from the training pattern had a greater impact on responding than adding a feature. Brandon et al.'s (2000) results are consistent with the predictions of REM. When conditioning is conducted with compound AB, four populations of elements will acquire associative strength. These are context-independent Ai and Bi elements and context-dependent Ab and Ba elements. The relative sizes of these populations are (1 − r), (1 − r), r, and r for Ai, Bi, Ab, and Ba, respectively. When A is presented by itself, it will activate context-independent Ai elements and context-dependent A~b elements. Hence, only the Ai elements are activated by both compound AB and stimulus A alone. The proportion of AB's elements that are also activated by A is ½(1 − r), because none of B's elements are activated by stimulus A. Conversely, whenever a feature is added to a pattern, it will result in the replacement of a fixed proportion, r, of the elements activated by that pattern. The elements of A that are also activated by compound AB are again the context-independent Ai elements, and the proportion of A's elements that are activated by compound AB is (1 − r). Generalization between AB and ABC follows similar rules.
Generalization of associative strength from AB to ABC will again be equal to (1 − r); addition of a stimulus results in replacement of a fixed proportion of the total elements of the original pattern.
Generalization of associative strength from ABC to AB is, however, predicted to be ⅔(1 − r), because only two of the three stimuli in compound ABC are also present in compound AB. In all cases, removal of a feature is expected to have a greater effect than the addition of a feature.
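The REM generalization values above can be tabulated with a few lines of Python. This sketch only covers the single-feature additions and removals worked through in the text and assumes a common r; the function name is hypothetical.

```python
def rem_generalization(trained, tested, r):
    """Proportion of the trained pattern's elements also active in the tested
    pattern under REM (common replacement parameter r), for patterns that differ
    by exactly one stimulus, as in the cases worked through above."""
    trained, tested = set(trained), set(tested)
    if tested > trained and len(tested - trained) == 1:
        # Feature added: a proportion r of every active element is replaced.
        return 1 - r
    if tested < trained and len(trained - tested) == 1:
        # Feature removed: each remaining stimulus keeps (1 - r) of its elements,
        # and the removed stimulus contributes none.
        return len(tested) / len(trained) * (1 - r)
    raise ValueError("only single-feature additions or removals are covered")


for trained, tested in [("A", "AB"), ("AB", "A"), ("AB", "ABC"), ("ABC", "AB")]:
    print(trained, "->", tested, round(rem_generalization(trained, tested, 0.3), 3))
```

For any r < 1, the removal cases always fall below the corresponding addition cases, which is the asymmetry that Brandon et al. (2000) reported.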
The predictions that C-AEM makes concerning the relative size of the effects of overshadowing and external inhibition are not as straightforward as those of REM. Rather, they depend on the number of features in the training and testing patterns. C-AEM makes the same predictions as REM about external inhibition; adding a stimulus will always reduce activity in units by the proportion r. Removing a stimulus, however, does not simply result in fewer units being activated, but also changes the activation level of those units. The top-left panel of Figure 1 shows how generalization between A and AB varies with r. For all values of r, the effect of overshadowing is greater than that of external inhibition. The top-right panel of Figure 1 shows corresponding predictions concerning generalization between A and ABC. Here, we can see that for values of r between about .3 and .5, the difference in the size of the effects of overshadowing and external inhibition is quite small. The bottom two panels of Figure 1 show predictions for generalization between ABC and either AB (left panel) or ABCD (right panel). In both cases, there are values of r for which the effect of external inhibition is predicted to be greater than that of overshadowing.
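Asymptotic C-AEM predictions of this kind can be approximated in closed form under two assumptions: the learning rule takes the form ΔVx = αβγx(λ − Vnet), and training with a single pattern starts from zero associative strength and runs to asymptote. Every update is then proportional to γx, so each trained unit ends with Vx = λγx/Σγ². The Python sketch below (hypothetical names, and not the code that produced Figure 1) uses this, together with the assumed activation form γ = k·r^(k−1)·(1 − r)^(n−k), to compare the decrements produced by removing and by adding a stimulus after ABC training.

```python
from itertools import combinations


def activations(compound, r):
    """C-AEM unit activations (I = 1, common r): k * r**(k-1) * (1-r)**(n-k)."""
    n = len(compound)
    return {frozenset(u): k * r ** (k - 1) * (1 - r) ** (n - k)
            for k in range(1, n + 1) for u in combinations(compound, k)}


def caem_generalization(trained, tested, r, lam=1.0):
    """Net associative strength of `tested` after training `trained` alone to
    asymptote, assuming V_x = lam * gamma_x / sum(gamma**2) for trained units."""
    g_train = activations(trained, r)
    scale = lam / sum(g * g for g in g_train.values())
    V = {unit: scale * g for unit, g in g_train.items()}
    return sum(g * V.get(unit, 0.0) for unit, g in activations(tested, r).items())


for r in (0.2, 0.4, 0.6):
    overshadow = 1 - caem_generalization("ABC", "AB", r)   # remove C
    ext_inhib = 1 - caem_generalization("ABC", "ABCD", r)  # add D
    print(f"r = {r}: overshadowing {overshadow:.3f}, external inhibition {ext_inhib:.3f}")
```

Under these assumptions, adding D always leaves (1 − r)λ, whereas the decrement from removing C varies with r; at r = .4, for example, removing C produces the smaller decrement, in line with the bottom panels described above.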
Asymmetrical generalization has been reported in several experiments with rats (González, Quinn & Fanselow, 2003; Bouton, Doyle-Burr & Vurbic, 2012) and humans (Glautier, 2004; Wheeler, Amundson & Miller, 2006; Thorwart & Lachnit, 2010). Other authors have, however, observed symmetrical effects of overshadowing and external inhibition in very similar situations (Young, 1984, cited in Pearce, 1987; Rescorla, 1999; Thorwart & Lachnit, 2009). Perhaps, then, it is premature to suggest that the experimental evidence provides particularly strong support for any one of these theories over any other. I am, however, aware of no evidence in support of C-AEM's prediction that, under some conditions, the effect of external inhibition should be greater than that of overshadowing.
REM and C-AEM also make the seemingly unreasonable prediction that adding a feature to a pattern will result in the same decrement in generalization regardless of the number of features of which that pattern is composed. Brandon et al.'s (2000) results, however, support this prediction: two groups of rabbits given conditioning with stimulus A or compound AB and then tested with compounds AB or ABC, respectively, showed equivalent decrements in responding (16% vs. 18%).

Simple Summation
If two stimuli, A and B, that have been separately paired with a US are then presented together, responding to the compound AB is sometimes observed to be greater than to either of the individual stimuli (e.g., Whitlow & Wagner, 1972). This summation effect is predicted by the RW model because first, the associative strength of a stimulus compound is assumed to be equal to the sum of the associative strengths of its constituent stimuli (i.e. VAB = VA + VB) and second, the relationship between associative strength and response strength is assumed to be at least ordinal.
Configural Theory struggles to explain simple summation. Following A+ B+ training, responding to the compound AB will depend upon generalization of associative strength from the A and B configural units. According to Equation 5, the similarity of AB to each of the individual cues is .5.
Hence, half of the associative strength of each stimulus will generalize to AB, and the net associative strength of AB will be the average of the associative strengths of A and B. The failure to predict summation might not, however, be catastrophic for Configural Theory. It should be noted, first, that summation is not a ubiquitous effect. Although it has been observed in some experiments (e.g., Aydin & Pearce, 1995; Kehoe, 1986; Kehoe, Horne, Horne & Macrae, 1994), there are others where it has not (e.g., Aydin & Pearce, 1995; Kehoe, Horne, Horne & Macrae, 1994). There are also nonassociative explanations of the summation effect, such as stimulus intensity dynamism (e.g., Hull, 1949) and disinhibition of delay (e.g., Pavlov, 1927).
Configural Theory does not have to rely on nonassociative mechanisms to explain summation in all situations. In one experiment, Pearce, George & Aydin (2002) gave rats training in which two stimuli, A and B, were each individually paired with food, as was the compound CD (A+ B+ CD+). At test, responding was greater to compound AB than to CD: summation was observed. In a second experiment, the same comparison was made between subjects. One group received A+ B+ training, and a second group was trained with just AB+ trials. At test, there was no difference in the rate of responding during presentations of compound AB for these two groups. Thus, the inclusion of CD+ trials influenced summation of responding to A and B. One explanation for this effect that Pearce et al. considered concerned the nature of the stimuli used, and the effect that this might have on generalization between stimuli. In these experiments, A and C were visual stimuli, whereas B and D were auditory stimuli. Pearce et al. suggested that stimuli from the same modality might share some common features that are not shared between stimuli belonging to different modalities. Hence, A, B, C, and D may be conceptualized as ax, by, cx, and dy, and the compounds AB and CD as abxy and cdxy. Due to the influence of generalization on the asymptotic associative strengths of the various configural units, and generalization of associative strength from ax, by, and cdxy to abxy at test, Configural Theory predicts that the net associative strength of abxy (1.1λ) will be greater than that of cdxy (λ) for the within-subject comparison. For reasons explained earlier, no summation would be expected following simple A+ B+ training, since half of the associative strength of ax and of by will generalize to abxy.
Nevertheless, several experiments have demonstrated summation following simple A+ B+ training (e.g., Hendry, 1982; Kehoe, 1986; Konorski, 1948; Thein, Westbrook & Harris, 2008). Even in these circumstances, however, it is possible for Configural Theory to explain summation if it is assumed that the experimental context is of relatively high salience. This seems to be a reasonable assumption in some cases at least, for example where aversive Pavlovian conditioning is conducted over an appetitive instrumental baseline (Hendry, 1982), or where conditioning is conducted in restrained rabbits (Kehoe, 1986) or restrained dogs (Konorski, 1948). Where the context is salient, simple A+ B+ training may be re-described as AX+ BX+ Xø, where X is the salient context, and the test compound is ABX. If the saliences of A, B, and X are equal, then at the asymptote of conditioning the net associative strengths of patterns AX and BX will be equal to λ and that of X will be 0. The associative strengths of the configural units, however, will be as follows: AX = BX = 1.33λ; X = −1.33λ. Since the similarity of ABX to AX and BX is high (.66), and relatively little inhibitory associative strength will generalize to ABX from X due to their low similarity (.33), the net associative strength of the compound is predicted to be 1.33λ. Pearce, George, Redhead, Aydin and Wynne (1999; see also Pearce, Redhead & George, 2002) reported a related effect in pigeon autoshaping. They manipulated generalization between stimuli A and B and the test compound AB by changing the salience of the background illumination of the television screen on which they were presented. Summation was observed when the background was white and was also illuminated throughout the inter-trial interval. Configural Theory can also predict simple summation if generalization between patterns is increased by giving the d parameter a value lower than 2.
For example, when d = 1.5, the net associative strength of compound AB following asymptotic conditioning with A and B will be 1.19λ.
Reducing the value of d does, however, cause some problems for Configural Theory. For example, when d < 2, Configural Theory predicts that complex negative patterning discriminations of the form A+ B+ C+ AB+ AC+ BC+ ABCø are insoluble, which is not the case (Redhead & Pearce, 1995).
Increasing d above 2 decreases generalization and results in compound AB having lower net associative strength than A or B alone.
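The Configural Theory figures in this subsection can be reproduced by solving for the asymptotic strengths of the configural units, at which every trained pattern's net associative strength, Σj Si,jVj, equals its outcome. The Python sketch below assumes the similarity form S = (Nc/Ni)^(d/2) · (Nc/Nj)^(d/2), which agrees with the values quoted in the text; the helper names are hypothetical, and lowercase letters stand for stimuli, as in the ax, by notation above.

```python
def similarity(p, q, d=2.0):
    """Assumed similarity rule: S = (Nc/Np)**(d/2) * (Nc/Nq)**(d/2)."""
    nc = len(set(p) & set(q))
    return (nc / len(p)) ** (d / 2) * (nc / len(q)) ** (d / 2)


def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [vi - f * vc for vi, vc in zip(M[i], M[c])]
    return [row[n] for row in M]


def configural_asymptote(training, d=2.0):
    """Configural-unit strengths V at which each trained pattern i satisfies
    sum_j S(i, j) * V_j = its outcome (1 for reinforced, 0 for nonreinforced)."""
    pats = list(training)
    A = [[similarity(i, j, d) for j in pats] for i in pats]
    return dict(zip(pats, solve(A, [training[p] for p in pats])))


def net_strength(test, V, d=2.0):
    """Generalized associative strength of a test pattern."""
    return sum(similarity(test, p, d) * v for p, v in V.items())


# Pearce et al. (2002): A = ax, B = by, CD = cdxy; training A+ B+ CD+.
V1 = configural_asymptote({"ax": 1, "by": 1, "cdxy": 1})
print(round(net_strength("abxy", V1), 2), round(net_strength("cdxy", V1), 2))

# Salient context X: A+ B+ re-described as AX+ BX+ Xø, tested with ABX.
V2 = configural_asymptote({"ax": 1, "bx": 1, "x": 0})
print({p: round(v, 2) for p, v in V2.items()}, round(net_strength("abx", V2), 2))

# No context, but greater generalization: d = 1.5, A+ B+, tested with AB.
V3 = configural_asymptote({"a": 1, "b": 1}, d=1.5)
print(round(net_strength("ab", V3, d=1.5), 2))
```

Under these assumptions the sketch gives approximately 1.1 and 1.0 for abxy and cdxy, configural strengths of ±1.33 with 1.33 for ABX, and 1.19 for AB when d = 1.5, matching the figures in the text.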
REM and C-AEM make the same predictions as each other concerning simple summation following conditioning with patterns that share no common features. In this situation, generalization from the training patterns to the test pattern is determined purely by the r parameter. For REM, during conditioning with stimulus A, context-independent elements Ai and context-dependently inhibited elements A~b will accrue associative strength. During presentations of test compound AB, these latter elements will not be active, and generalization of associative strength from A to AB will be based upon the proportion of context-independent elements, s = (1 − r). Associative strength will generalize from stimulus B to compound AB in the same way, and the compound will have a net associative strength of 2(1 − r). In C-AEM, stimulus A will only activate its own representational unit, a, and that unit will gain associative strength until VA = λ. The activation of unit a (and unit b) when compound AB is presented may be calculated using Equation 9 and will again be equal to (1 − r). The lower panel of Figure 2 shows the predictions of REM and C-AEM for a summation experiment in which stimuli A and B were presented alone and in compound following conditioning with A and B. The models predict that summation will occur when interaction between the stimuli is low (r < .5). For high values of the r parameter (r > .5), they predict less responding to compound AB than to either A or B individually. Hence, all three models can accommodate the results of experiments that have demonstrated response summation using stimuli drawn from different modalities (e.g., Whitlow & Wagner, 1972) and those that have failed to find evidence of summation within a single stimulus modality (e.g., Rescorla & Coldwell, 1995), if we assume that r < .5 in the former case and r ≈ .5 in the latter case.
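For C-AEM, the summation prediction reduces to simple arithmetic: separate A+ and B+ training leaves units a and b at λ and the ab unit untrained, and in compound AB each of a and b is active at 1 − r. A minimal Python sketch, with λ = 1 and a hypothetical function name:

```python
def caem_ab_after_separate_training(r, lam=1.0):
    """C-AEM: A+ and B+ trained separately to asymptote, then AB tested.

    Alone, each stimulus activates only its own unit (gamma = 1), so a and b
    reach lam and ab is never trained. In AB, a and b are each active at 1 - r.
    """
    V = {"a": lam, "b": lam, "ab": 0.0}
    gamma_ab = {"a": 1 - r, "b": 1 - r, "ab": 2 * r}
    return sum(gamma_ab[u] * V[u] for u in V)


for r in (0.2, 0.5, 0.8):
    v = caem_ab_after_separate_training(r)
    if v > 1:
        verdict = "summation"
    elif v < 1:
        verdict = "less responding to AB than to A or B"
    else:
        verdict = "no summation"
    print(f"r = {r}: V_AB = {v:.2f} ({verdict})")
```

The crossover at r = .5 is where 2(1 − r) equals 1.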
REM and C-AEM also predict lower responding to a compound than to its components when the replacement parameter is very high, an effect that has been observed by Aydin & Pearce (1995).

Differential Summation
Pearce, Aydin & Redhead (1997) gave pigeons autoshaping training in which presentations of three visual stimuli, A, B, and C, were paired with food. For one group of pigeons, these stimuli were presented individually (A+ B+ C+), whereas for a second group, they were presented in pairs (AB+ AC+ BC+). Following this training, responding to the compound ABC was assessed in each group.
The rate of responding during these test trials was lower in the group trained with the individual stimuli. This result is predicted by Configural Theory because the similarity of ABC to the compounds AB, AC, and BC is greater than the similarity of ABC to the individual stimuli A, B, and C. Although generalization between the pairs of stimuli will result in the configural units for AB, AC, and BC each having an asymptotic associative strength of .66λ, two-thirds of the associative strength of each compound will generalize to the compound ABC, resulting in a net associative strength of 1.33λ. When the three stimuli are trained individually, the asymptotic associative strengths of the configural units A, B, and C will be λ. Because only one-third of the associative strength of each will generalize to ABC, the net associative strength of the test pattern is predicted to be λ. The top-left panel of Figure 3 shows the predicted net associative strength of compound ABC for the two groups in Pearce et al.'s experiment (A+ B+ C+; ABC? or AB+ AC+ BC+; ABC?).
The RW model makes the opposite prediction. It supposes that training with the individual stimuli will result in each gaining an asymptotic associative strength of λ. When the stimuli are trained in compound, however, Equation 2 predicts that the asymptotic associative strength of each stimulus will be .5λ. Because the net associative strength of the compound is the simple arithmetic sum of the associative strengths of A, B, and C, RW predicts that ABC will have an associative strength of 3λ following individual training, but of only 1.5λ following compound training. Myers et al. (2001) replicated Pearce et al.'s (1997) … Figure 3. During AX+ BX+ CX+ Xø training, there will be some generalization between the compounds (SAX,BX = .25), but considerably more generalization between each compound and the context (SAX,X = .5). This means that the configural units for AX, BX, and CX will have asymptotic associative strengths of 1.33λ, whereas the unit for X will have an associative strength of −2λ. Half of the associative strength of AX, BX, and CX will generalize to ABCX, but only one-quarter of that of X will. Hence, the net associative strength of the test compound is 1.5λ. For ABX+ ACX+ BCX+ Xø training, there will be considerably more generalization between the compounds (SABX,ACX = .44) and less generalization between the compounds and the context (SABX,X = .33). At test, three-quarters of the associative strength of each compound and one-quarter of the associative strength of the context will generalize to ABCX, meaning that its net associative strength will be 1.29λ. Configural units ABX, ACX, and BCX will end up with associative strengths of .64λ, and X will have an associative strength of −.64λ.
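The contrasting predictions for Pearce et al.'s design can be checked by solving for asymptotic strengths. In the Python sketch below, Configural Theory uses the assumed similarity form S = (Nc/Ni)(Nc/Nj) (that is, d = 2), and the RW asymptote is found by requiring the summed elemental strengths of each reinforced pattern to equal λ; that shortcut works here only because each training set supplies one independent equation per stimulus. Names are hypothetical; lowercase letters stand for stimuli and x for the context.

```python
def similarity(p, q):
    """Assumed similarity rule with d = 2: S = (Nc/Np) * (Nc/Nq)."""
    nc = len(set(p) & set(q))
    return (nc / len(p)) * (nc / len(q))


def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [vi - f * vc for vi, vc in zip(M[i], M[c])]
    return [row[n] for row in M]


def configural(training, test):
    """Configural Theory at asymptote; the context alone ("x") is nonreinforced."""
    outcomes = [0.0 if p == "x" else 1.0 for p in training]
    V = solve([[similarity(i, j) for j in training] for i in training], outcomes)
    return sum(similarity(test, p) * v for p, v in zip(training, V))


def rescorla_wagner(training, test):
    """RW at asymptote: each reinforced pattern's summed elemental strength is 1."""
    stimuli = sorted(set("".join(training)))
    A = [[1.0 if s in p else 0.0 for s in stimuli] for p in training]
    V = solve(A, [1.0] * len(training))
    return sum(v for s, v in zip(stimuli, V) if s in test)


print("Configural:", round(configural(["a", "b", "c"], "abc"), 2),
      round(configural(["ab", "ac", "bc"], "abc"), 2))
print("Configural + context:", round(configural(["ax", "bx", "cx", "x"], "abcx"), 2),
      round(configural(["abx", "acx", "bcx", "x"], "abcx"), 2))
print("RW:", round(rescorla_wagner(["a", "b", "c"], "abc"), 2),
      round(rescorla_wagner(["ab", "ac", "bc"], "abc"), 2))
```

Individual versus paired training gives 1.0 versus 1.33 without a context and 1.5 versus 1.29 with a salient context for Configural Theory, and 3.0 versus 1.5 for the RW model, as stated in the text.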
Configural Theory can also predict that compound ABC will have greater net associative strength following A+ B+ C+ training than following AB+ AC+ BC+ training, but only if the d parameter is reduced significantly below 2. Doing so increases generalization between patterns, and it is difficult to justify increasing generalization when it prevents the model from solving some discrimination problems. Increasing the value of d above 2 does not affect the ordinal predictions of Configural Theory.
Predictions from REM may be derived quite straightforwardly. Since replacement of A elements by different stimuli (i.e., B and C) is statistically independent, there are nine distinct populations of A elements to be considered; each population is either context independent, context-dependently inhibited, or context-dependently activated with respect to B, and likewise with respect to C. These nine populations and their relative sizes are enumerated in Table 1.
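The role of d can be explored with a sketch that assumes the generalized similarity S = (n_c / sqrt(n_i n_j))^d. This form is an assumption chosen because it reduces to Pearce's product rule at d = 2 and raises similarity when d falls below 2; the exact form used in the original simulations is not shown in this section. Symmetry reduces each design to a one-line closed form.

```python
import math


def similarity(n_common, n_i, n_j, d):
    """Assumed generalized similarity (n_common / sqrt(n_i * n_j)) ** d; d = 2 is Pearce's rule."""
    return (n_common / math.sqrt(n_i * n_j)) ** d


def abc_after_singles(d, lam=1.0):
    """A+ B+ C+: single stimuli share no elements, so each unit reaches lam and
    generalizes to ABC in proportion to S(A, ABC)."""
    return 3 * similarity(1, 1, 3, d) * lam


def abc_after_pairs(d, lam=1.0):
    """AB+ AC+ BC+: by symmetry every unit has the same V, with
    V * (1 + 2 * S(AB, AC)) = lam at asymptote."""
    v = lam / (1 + 2 * similarity(1, 2, 2, d))
    return 3 * similarity(2, 2, 3, d) * v


for d in (1.0, 1.5, 1.6, 2.0, 3.0):
    print(f"d = {d}: A+ B+ C+ -> {abc_after_singles(d):.3f}, "
          f"AB+ AC+ BC+ -> {abc_after_pairs(d):.3f}")
```

With this assumed form, the individual-training group overtakes the compound group somewhere between d = 1.5 and d = 1.6, whereas at d = 2 and above the compound group stays ahead.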
For animals trained with the individual stimuli A, B, and C, none of the added elements (i.e., Ab, Ac, Ab~c, Ac~b, Abc) will be activated during training, and generalization from A to the test compound ABC will rely solely on the elements that are context independent with respect to both B and C (Ai).
Thus, generalization from A to ABC will equal $(1 - r)^2$. Associative strength will also generalize from B and C to ABC in the same proportions. When r = 0, $V_{ABC} = 3\lambda$. As the amount of replacement increases, the net associative strength of ABC, $3(1 - r)^2\lambda$, will decline until it equals zero when r = 1. For animals trained with compounds AB, AC, and BC, generalization from stimulus A to ABC will rely not only on the context independent Ai elements, but also on the context dependent elements that are commonly activated by compounds ABC and either AB or AC (i.e., Ab and Ac). The combined size of these latter populations of elements is given by $2r(1 - r)$. The relationship between the value of this term and r is not monotonic: it increases with r in the range 0 ≤ r ≤ .5, reaching a maximum value of .5 at r = .5, and decreases as r increases in the range .5 ≤ r ≤ 1. This means that as r increases in value, a decline in the size of the population of context-independent Ai elements will, to some extent, be offset by an increase in the number of context-dependent Ab and Ac elements. The interplay between these different populations of elements means that for some intermediate values of r, REM predicts that the test compound ABC will have greater net associative strength following AB+ AC+ BC+ training than following A+ B+ C+ training (see the middle panel of Figure 3).
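The REM argument can be made concrete with a sketch that, following Glautier's (2007) observation that REM can be simulated with one binary unit per population, treats each population of elements as a single unit whose learning rate is scaled by its size. Assumptions: with respect to each other stimulus, a population is context-independent with relative size 1 - r, or added or inhibited with relative size r (so the populations active on any trial sum to 1, Ai has size (1 - r)^2, and Ab and Ac together have size 2r(1 - r)); every active population is updated by the RW delta rule; β = .1 and 2,000 training cycles are arbitrary choices.

```python
from itertools import product

STIMULI = "ABC"


def populations(r):
    """REM populations as (stimulus, ((other, status), ...), relative size).

    Assumed sizes for each other stimulus: 'indep' 1 - r, 'added' r, 'inhib' r,
    multiplied together because replacement by different stimuli is independent.
    """
    size_of = {"indep": 1 - r, "added": r, "inhib": r}
    pops = []
    for s in STIMULI:
        others = [o for o in STIMULI if o != s]
        for statuses in product(size_of, repeat=len(others)):
            size = 1.0
            for st in statuses:
                size *= size_of[st]
            pops.append((s, tuple(zip(others, statuses)), size))
    return pops


def is_active(pop, pattern):
    """Active if its stimulus is present, every 'added' partner is present
    and every 'inhib' partner is absent."""
    stim, statuses, _ = pop
    return stim in pattern and all(
        (st != "added" or other in pattern) and (st != "inhib" or other not in pattern)
        for other, st in statuses
    )


def train(trials, r, beta=0.1, epochs=2000):
    """One binary unit per population; its RW update is scaled by population size."""
    pops = populations(r)
    active = {p: [i for i, pop in enumerate(pops) if is_active(pop, p)] for p, _ in trials}
    v = [0.0] * len(pops)
    for _ in range(epochs):
        for pattern, lam in trials:
            error = lam - sum(v[i] for i in active[pattern])
            for i in active[pattern]:
                v[i] += beta * pops[i][2] * error
    return pops, v


def net(pattern, pops, v):
    return sum(v[i] for i, pop in enumerate(pops) if is_active(pop, pattern))


def summation_test(r):
    """Net strength of ABC after A+ B+ C+ and after AB+ AC+ BC+ training."""
    single = train([("A", 1.0), ("B", 1.0), ("C", 1.0)], r)
    pairs = train([("AB", 1.0), ("AC", 1.0), ("BC", 1.0)], r)
    return net("ABC", *single), net("ABC", *pairs)


for r in (0.0, 0.25, 0.5, 0.75):
    single, pairs = summation_test(r)
    print(f"r = {r}: A+ B+ C+ -> {single:.3f} (3(1-r)^2 = {3 * (1 - r) ** 2:.3f}), "
          f"AB+ AC+ BC+ -> {pairs:.3f}")
```

Under these assumptions the individual-training values follow 3(1 - r)^2, r = 0 recovers the RW values of 3 and 1.5, and at r = .5 the compound group ends at 1.2 against .75 for the individual group, one intermediate value of r at which the ordering reverses.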
In C-AEM, the activation level of a unit affects both its contribution to the net associative strength of a pattern, and also the change in the associative strength of that unit following a ABC is presented at test, each of these units will have an activation of $(1 - r)^2$, and hence the net associative strength of the compound will be $3(1 - r)^2$. This is the same prediction as REM. During compound conditioning, however, units ab, ac, and bc will also acquire associative strength. Because the activation level of each unit acts as a rate parameter in Equation 12, the distribution of associative strength between the representational units activated by a compound is influenced by r. Within each compound, the proportion of associative strength that will accrue to the configurational cue will be equal to r: $V_{ab} / (V_a + V_b + V_{ab}) = r$. At test, the activation of the configurational cue ab will be $2r(1 - r)$,
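C-AEM's summation predictions can be sketched in the same spirit. The activation rule below assumes equal intensities and a single r, and follows the construction described in Appendix B: within a compound of n stimuli, a unit made up of k of them has activation k r^(k-1) (1 - r)^(n-k), which gives (1 - r)^2, 2r(1 - r) and 3r^2 for a, ab and abc in ABC and always sums to n. The learning rule assumes that Equation 12 has the form ΔV_K = β γ_K (λ - Σ γ_J V_J), with activation acting as the rate parameter; β and the number of cycles are arbitrary. On a single AB trial the configurational unit receives a fraction 2r/2 = r of the total change, which is the sense in which its share equals r.

```python
from itertools import combinations


def activations(pattern, r):
    """Equal-intensity C-AEM activations (assumed): within a compound of n stimuli,
    a unit made up of k of them has activation k * r**(k - 1) * (1 - r)**(n - k)."""
    n = len(pattern)
    return {"".join(unit): k * r ** (k - 1) * (1 - r) ** (n - k)
            for k in range(1, n + 1)
            for unit in combinations(sorted(pattern), k)}


def train(trials, r, beta=0.05, epochs=3000):
    """Assumed learning rule: dV_K = beta * gamma_K * (lambda - sum_J gamma_J * V_J)."""
    v = {}
    for _ in range(epochs):
        for pattern, lam in trials:
            acts = activations(pattern, r)
            error = lam - sum(g * v.get(u, 0.0) for u, g in acts.items())
            for u, g in acts.items():
                v[u] = v.get(u, 0.0) + beta * g * error
    return v


def net(pattern, v, r):
    """Net strength: activation-weighted sum over every unit the pattern activates."""
    return sum(g * v.get(u, 0.0) for u, g in activations(pattern, r).items())


def summation_test(r):
    single = train([("A", 1.0), ("B", 1.0), ("C", 1.0)], r)
    pairs = train([("AB", 1.0), ("AC", 1.0), ("BC", 1.0)], r)
    return net("ABC", single, r), net("ABC", pairs, r)


print(activations("ABC", 0.3))
for r in (0.0, 0.25, 0.5, 0.75):
    single, pairs = summation_test(r)
    print(f"r = {r}: A+ B+ C+ -> {single:.3f}, AB+ AC+ BC+ -> {pairs:.3f}")
```

Under these assumptions the individual-training group again gives 3(1 - r)^2, and at r = .5 the compound group gives 1.125 against .75, so C-AEM can also reverse the RW ordering at intermediate values of r.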

Differential External Inhibition
Pearce, Adam, Wilson and Darby (1992) also compared responding to the compound ABC in two groups of pigeons given slightly different training involving the three stimuli. The first group were trained on a discrimination involving the individual stimuli in which A and C were each individually paired with food whereas B was not (A+ Bø C+ the net associative strength of ABC will be $2(1 - r)^2$ following A+ Bø C+ training and $2(1 - r)$ following AB+ Bø BC+ training. Because $(1 - r) \le 1$, the former value is smaller than the latter except when r = 0 or r = 1, when they are equal.
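The two closed-form values can be confirmed by simulation. The sketch assumes an equal-intensity C-AEM activation rule (a unit of k stimuli within a compound of n has activation k r^(k-1) (1 - r)^(n-k)) and an activation-weighted delta rule, ΔV_K = β γ_K (λ - Σ γ_J V_J); it sets λ = 1 for reinforced and λ = 0 for non-reinforced patterns and trains the A+ Bø C+ and AB+ Bø BC+ designs named in the text. β and the number of cycles are arbitrary.

```python
from itertools import combinations


def activations(pattern, r):
    """Equal-intensity C-AEM activations: a unit of k stimuli within a compound of n
    stimuli has activation k * r**(k - 1) * (1 - r)**(n - k)."""
    n = len(pattern)
    return {"".join(unit): k * r ** (k - 1) * (1 - r) ** (n - k)
            for k in range(1, n + 1)
            for unit in combinations(sorted(pattern), k)}


def train(trials, r, beta=0.05, epochs=3000):
    """Assumed learning rule: dV_K = beta * gamma_K * (lambda - sum_J gamma_J * V_J)."""
    v = {}
    for _ in range(epochs):
        for pattern, lam in trials:
            acts = activations(pattern, r)
            error = lam - sum(g * v.get(u, 0.0) for u, g in acts.items())
            for u, g in acts.items():
                v[u] = v.get(u, 0.0) + beta * g * error
    return v


def net(pattern, v, r):
    return sum(g * v.get(u, 0.0) for u, g in activations(pattern, r).items())


def external_inhibition_test(r):
    """Net strength of ABC after A+ Bø C+ (group 1) and after AB+ Bø BC+ (group 2)."""
    group_1 = train([("A", 1.0), ("B", 0.0), ("C", 1.0)], r)
    group_2 = train([("AB", 1.0), ("B", 0.0), ("BC", 1.0)], r)
    return net("ABC", group_1, r), net("ABC", group_2, r)


for r in (0.2, 0.5, 0.8):
    g1, g2 = external_inhibition_test(r)
    print(f"r = {r}: group 1 {g1:.3f} vs 2(1-r)^2 = {2 * (1 - r) ** 2:.3f}; "
          f"group 2 {g2:.3f} vs 2(1-r) = {2 * (1 - r):.3f}")
```

Because non-reinforced B trials drive V_b to zero, the group 2 value does not depend on how associative strength is shared between a and ab (or between c and bc), which is why it matches 2(1 - r) for every r.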

Complex Negative Patterning
For a feature negative discrimination of the form A+ ABø, the addition of a common element, C, to each pattern (AC+ ABCø) retards acquisition of the discrimination (Pearce & Redhead, 1993;. Configural Theory predicts this effect because the similarity of (and therefore, generalization between) the reinforced and non-reinforced patterns is increased by the addition of a common element. Redhead and Pearce (1995)  According to the RW model, compounds AB, AC, and BC will provoke larger CRs than the individual stimuli A, B, and C due to summation of associative strength. This model, therefore, predicts the opposite pattern of results to that observed by Pearce and Redhead (1993) and by .
In order to solve the discrimination and reduce responding to the non-reinforced compound ABC, the model must assume that at least some stimulus conjunctions generate a unique configurational cue which will gain inhibitory associative strength. Slightly different predictions may be derived depending upon whether configurational cues are generated by each conjunction or by ABC alone, but these differences do not affect the ordinal prediction that the net associative strength of AB, AC, and BC will be higher than that of A, B, and C alone (before learning reaches asymptote).
To the best of my knowledge, this type of parity discrimination has been employed only in the experiments reported by Pearce et al (2008) and in one other study (George, 2018). I trained human participants on simultaneous negative (A+ B+ C+ ABø ACø BCø ABC+) and positive (Dø Eø Fø DE+ DF+ EF+ DEFø) parity discriminations in a predictive learning task. For some participants, each of the stimuli was a circle that consistently appeared in a specific location on a computer monitor (the six stimuli were arranged at the vertices of an imaginary hexagram so that ABC and DEF were the corners of two equilateral triangles), and all of the circles were of the same colour. For other participants, the six stimuli were quite different from each other, based on those used by  in a pigeon auto-shaping experiment. On each trial, participants were asked to rate the likelihood that the visual pattern would be followed by the presentation of a tone (+) on a scale that ranged from 1 = very unlikely to 9 = very likely. Both groups of participants solved the discrimination problems, but stimulus
The results of simulations of Configural Theory, REM, and C-AEM are shown in Figure 6 for the human parity discrimination task.
For simplicity, only the data from the negative parity discrimination are shown; data from the positive and negative parity discriminations are rotations of each other around the zero point of the y-axis. For these simulations, β was .05 both when the US was present and when it was absent, and λ was -1 when the US was absent. These values were chosen because in human causal learning experiments the presence and absence of the outcome are usually events of equal salience and likelihood (Livesey, Thorwart & Harris, 2011 with high-similarity stimuli. Only C-AEM can predict both patterns of results. The low-similarity stimuli differed across three perceptual dimensions (colour, shape, and orientation), although they were all visual.  found that pigeons trained on a complex negative patterning discrimination with these stimuli behaved in a similar manner to Redhead and Pearce's (1995) pigeons trained with stimuli that differed along a single dimension (colour). Hence, it is reasonable to set the r parameter to a value in the range .4 to .5, for which C-AEM correctly predicts the results of other pigeon auto-shaping experiments. As can be seen in the bottom-centre panel of Figure 6, the predictions of C-AEM with r = .4 match the pattern of ratings for the low-similarity stimuli. For high values of r (.8), C-AEM predicts the pattern of results obtained from participants trained with high-similarity stimuli (bottom-right panel). It may be fair to suppose that perceptual interaction was abnormally high for the high-similarity stimuli. Because the individual elements of the patterns were identical, defined only by their location, their geometric arrangement may have been very salient. A, B, and C alone were simply points in space; AB, AC, and BC were pairs of points lying along lines at 120°, 60°, or 0° from horizontal, respectively; ABC was three points at the corners of an implied equilateral triangle.
When the individual stimuli were less similar their identities may have been rather more salient than their spatial arrangement, resulting in less perceptual interaction (but still more than for multi-modal stimuli). Simulations of C-AEM reproduced another aspect of the experimental data; C-AEM predicts faster learning when r = .8 than when r = .4. Although there was no statistically significant difference in the overall rates of learning of the two groups, there was a trend for the group trained with high-similarity stimuli to learn more rapidly than those trained with low-similarity stimuli.
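The parity simulations can be sketched with the parameters stated above: β = .05 on every trial, λ = 1 when the outcome occurs and λ = -1 when it does not, and r = .4 or .8. The activation and learning rules are assumed equal-intensity C-AEM rules (unit activation k r^(k-1) (1 - r)^(n-k); update ΔV_K = β γ_K (λ - Σ γ_J V_J)); the number of training cycles and the checkpoint spacing are hypothetical, only the negative parity discrimination is simulated, and this is not the code behind Figure 6.

```python
from itertools import combinations


def activations(pattern, r):
    """Equal-intensity C-AEM activations (assumed): k * r**(k - 1) * (1 - r)**(n - k)."""
    n = len(pattern)
    return {"".join(unit): k * r ** (k - 1) * (1 - r) ** (n - k)
            for k in range(1, n + 1)
            for unit in combinations(sorted(pattern), k)}


# Negative parity: single stimuli and the triple are followed by the outcome (lambda = 1),
# pairs are not (lambda = -1, as in the human simulations described in the text).
NEGATIVE_PARITY = [("A", 1.0), ("B", 1.0), ("C", 1.0),
                   ("AB", -1.0), ("AC", -1.0), ("BC", -1.0), ("ABC", 1.0)]


def simulate(r, beta=0.05, epochs=5000, every=100):
    """Return [(cycle, {pattern: net strength}), ...] recorded every `every` cycles."""
    acts = {p: activations(p, r) for p, _ in NEGATIVE_PARITY}
    v = {}
    history = []
    for epoch in range(1, epochs + 1):
        for pattern, lam in NEGATIVE_PARITY:
            error = lam - sum(g * v.get(u, 0.0) for u, g in acts[pattern].items())
            for u, g in acts[pattern].items():
                v[u] = v.get(u, 0.0) + beta * g * error
        if epoch % every == 0:
            nets = {p: sum(g * v.get(u, 0.0) for u, g in acts[p].items())
                    for p, _ in NEGATIVE_PARITY}
            history.append((epoch, nets))
    return history


for r in (0.4, 0.8):
    for epoch, nets in simulate(r)[:3]:
        summary = ", ".join(f"{p} {nets[p]:+.2f}" for p in ("A", "AB", "ABC"))
        print(f"r = {r}, cycle {epoch}: {summary}")
```

With seven patterns and seven units, and every pattern activating its own highest-order unit, the activation matrix is square and triangular with a non-zero diagonal for 0 < r ≤ 1, so the discrimination is eventually solved exactly for both values of r; the informative comparison between r values lies in the early checkpoints.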
In order for C-AEM to explain both Pearce et al's (2008) and my own (George, 2018) results, it is necessary to assume that stimuli that differ only in their location interact with each other to a much lesser extent for pigeons than for people. Points on a computer monitor might be seen as the vertices of shapes by people, but not by pigeons. There is some evidence that this is true, or at least that vertices and edges are processed differently by the two species. Biederman (1987), for example, found that people's recognition of simple line drawings of objects was impaired to a much greater extent by the deletion of vertices than by the deletion of edges. The opposite effect was found in pigeons by Rilling, De Marse and La Claire (1993; see Qadri & Cook, 2015, for a discussion of divergences between avian and mammalian visual cognition).

Summary
The context-dependent added-elements model presented here is computationally less complex than Wagner's (2003) replaced elements model while retaining many of the properties of REM. C-AEM is able to accommodate conflicting findings concerning summation and differential summation from experiments involving pigeon autoshaping (e.g., Pearce et al, 1997) and rabbit eyeblink conditioning (e.g., Myers et al, 2001), and also the consistent effect of differential external inhibition in these two preparations (Pearce et al, 1992; Kundy et al, 2002) in a similar manner to REM.
In addition to its relative simplicity, C-AEM has the strength that it provides a better overall fit to data from experiments involving complex patterning discriminations than does either REM or Pearce's Configural Theory. For both complex negative patterning and parity discriminations, REM predicts a consistent (ordinal) pattern of results for all values of r greater than zero. Configural Theory provides a very close match to data from Pearce et al's (2008) parity experiments, but cannot accommodate the differential results of complex negative patterning experiments (Redhead & Pearce, 1995; Myers et al, 2001) or the effects of similarity on human parity discrimination learning (George, 2018).

Other models of learning
The primary purpose of this paper was to explore whether the configurational cue, or added elements, approach of Spence (1952) and Wagner and Rescorla (1972) was able to account for the variety of data concerning the effects of similarity on generalization and discrimination learning as well as Wagner's (2003)  patterning discriminations where all of the stimuli were coloured dots. The first discrimination was of the standard form: A+ B+ ABø. In the second, a common feature was present on all trials: CD+ CE+ CDEø. Addition of the common element increased similarity between reinforced and non-reinforced patterns and retarded acquisition of the discrimination, as predicted by Configural Theory. Bahçekapılı  described in Myers et al, 2001) found the opposite pattern of results when he trained two groups of rabbits on feature negative discriminations with (AC+ ABCø) or without (A+ ABø) a common feature using multi-modal stimuli. Redhead and Curtis (2013) replicated this moderating effect of stimulus modality on the ability of a common feature to retard or enhance discrimination learning in a human contingency learning experiment. They also conducted simulations of AMAN, which showed that by manipulating the similarity of the stimuli, the model could predict their results. However, in a direct test of AMAN's predictions, I manipulated the similarity of stimuli across four patterning tasks, and in no case could AMAN accommodate the results (George, 2018). Mackintosh (2000, 2002; McLaren, Kaye & Mackintosh, 1989) presented a real-time elemental model of learning. The predictions that this model makes concerning the addition of a common element to a patterning or feature-negative discrimination depend upon how much learning takes place on each trial. When little is learned, the model's predictions match those of the RW model, but when the amount learned is high, they match the predictions of Configural Theory.
Conflicting results of experiments using multimodal and unimodal stimuli (or rabbit eyeblink conditioning vs. pigeon autoshaping) may, therefore, be the result of differences in learning rates. It is, however, difficult to compare learning rates between experiments, and few experiments have made a direct comparison. Redhead and Curtis (2013) found no difference in the rate of learning between participants trained with multimodal or unimodal stimuli, even though the relative rates at which they learned a simple negative patterning discrimination (A+ B+ ABø) and one with the addition of a common feature (CE+ DE+ CDEø) were affected by stimulus modality. I also found no difference in the overall rate of learning between groups of participants that showed the patterns of results predicted by the RW model and by Configural theory for a variety of patterning discrimination tasks (George, of the latter two models are not simply the results of differences between elemental and configural stimulus processing. Indeed, Ghirlanda (2015) has demonstrated that, under conditions met by existing configural models, it is always possible to construct a configural model equivalent to a given elemental model. Thorwart, Uengoer, Livesey and Harris (2017) have argued that the critical differences between the RW model and configural theory concern normalization of activity within the models and context-dependency of stimulus processing. In Configural Theory, the activity of units within the input network is normalized so that only one configural unit is ever fully activated. The activation of configural units is also context-dependent; configural unit A is fully activated by stimulus A, but only partially activated by compound AB. In the RW model, conjunctions of stimuli will result in the generation of a unique configurational cue, but this is in addition to the elemental representations of the component stimuli, which are otherwise unaffected by the presence or absence of others.
Hence, stimulus representations in RW are both context-independent and non-normalized. In both REM and C-AEM, the activation of individual elements, or the activation level of a unit, is dependent upon the context in which a stimulus is presented, but each stimulus always contributes the same amount of activity to the system; these models are context-dependent but non-normalized. The Inhibited-Elements Model (IEM) described by Wagner and Brandon (2001) as an elemental equivalent to Configural Theory is a normalized, context-dependent elemental model. By varying the degree of normalization within the IEM and comparing its predictions with those of REM and Configural Theory, Thorwart et al (see also Thorwart & Lachnit, 2020) were able to independently evaluate the importance of normalization and context-dependency to the models' ability to predict acquisition of positive and negative patterning discriminations. They concluded that a low level of context dependency was the critical factor in replicating their results, rather than either normalization or a difference in elemental vs. configural processing. Although not presented here, the predictions of C-AEM match those of REM for Thorwart et al's task. This should not be surprising since C-AEM shares those same properties of non-normalization and context-dependence.
Given that REM and C-AEM are both non-normalized and context-dependent, we must consider why they make different predictions in some instances. There are three key differences between the models. First, in C-AEM (and the RW model and Configural Theory), stimuli are represented by units, whereas in REM representations consist of collections of elements or microfeatures. It is unlikely that this difference is responsible for variations in the models' behaviour.
Glautier (2007)  ). In REM an element is either active or inactive; if it is active then its associative strength may be modified according to Equation 1. Populations of elements will, therefore, accrue different total amounts of associative strength depending on their size, but each element will contribute its full associative strength to the prediction of the US if it is active. Given that Glautier (2007) has shown that REM may also be simulated by representing each population as a single binary unit, and scaling changes in association strength, the differences in the predictions of REM and C-AEM are not likely to be due to either the effects of unit activation on net associative strength or on changes in associative strength, but rather the combination of these two factors.
The third difference between the models is, however, especially important for the complex negative patterning and parity discrimination tasks; in C-AEM a stimulus does not activate units which explicitly signal the absence of other stimuli, whereas the inhibited elements in REM serve this purpose. When a stimulus is presented alone, in C-AEM it will activate its own representational unit fully, and no others. That means that in a complex negative patterning or parity discrimination where A, B, and C are individually paired with the US, units a, b, and c will each have an asymptotic associative strength of λ. This is not affected by the value of r. In REM, however, this associative strength is distributed between four populations of elements (context-independent Ai, and context-dependent elements A~b, A~c, and A~b~c), and the distribution of associative strength between these populations is affected by r. The result is a complex interplay between the expression of this associative strength on compound conditioning trials and changes in the associative strength of configurational cues (or context-dependent elements) dependent on r. The effect is significant variation in the distribution of associative strength between units or populations of elements between the two models.
In conclusion, I have presented a context-dependent added-elements model, a version of the RW model with configurational cues in which the activation of units representing stimuli and their configurations depends upon the context in which a stimulus is presented. C-AEM can accommodate a wide range of conflicting data from Pearce's and Wagner's laboratories concerning the effects of stimulus similarity on generalization and discrimination learning if we assume that the context-dependency of representational units is affected by the perceptual properties of stimuli. The success of C-AEM relative to both REM and Configural Theory suggests that the configurational cue approach put forward by Spence (1952) and adopted by Wagner and Rescorla (1972) still has something to offer contemporary models of associative learning.
2. It is common to assume that the value of β is greater when the US is present than when it is absent. Some effects are only predicted by the RW model when this is the case. One example is the relative validity effect (Wagner, Logan, Haberlandt & Price, 1968), in which less conditioned responding to stimulus X was observed following AX+ BXø training than following training in which compounds AX and BX were each reinforced on 50% of trials. When the two β values are equal, the RW model predicts no difference between conditions. All simulations reported here were repeated with β equal to .05 both when the US was present and when it was absent.
While this change had a trivial effect on the rate at which learning progressed, it did not alter the overall pattern of results for any simulation.
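The footnote's claim about the relative validity effect can be checked with a small mean-field sketch: instead of sampling 50% reinforcement, each trial type applies its expected RW update, with β+ on reinforced and β- on non-reinforced trials. The mean-field simplification, the β values and the number of cycles are assumptions made for illustration.

```python
def train_expected(trials, beta_plus, beta_minus, epochs=5000):
    """Mean-field Rescorla-Wagner.

    trials: list of (cues, p) where p is the probability that the trial type is
    reinforced. Each cycle applies the expected change on that trial type:
    p * beta_plus * (1 - sum V) + (1 - p) * beta_minus * (0 - sum V).
    """
    v = {}
    for _ in range(epochs):
        for cues, p in trials:
            total = sum(v.get(c, 0.0) for c in cues)
            delta = p * beta_plus * (1.0 - total) - (1 - p) * beta_minus * total
            for c in cues:
                v[c] = v.get(c, 0.0) + delta
    return v


CORRELATED = [("AX", 1.0), ("BX", 0.0)]    # AX+ BXø
UNCORRELATED = [("AX", 0.5), ("BX", 0.5)]  # AX and BX each reinforced on 50% of trials

for beta_plus, beta_minus in ((0.05, 0.05), (0.1, 0.05)):
    x_corr = train_expected(CORRELATED, beta_plus, beta_minus)["X"]
    x_unc = train_expected(UNCORRELATED, beta_plus, beta_minus)["X"]
    print(f"beta+ = {beta_plus}, beta- = {beta_minus}: "
          f"V_X {x_corr:.3f} (AX+ BXø) vs {x_unc:.3f} (50%)")
```

With β+ = β- = .05 both groups leave X at 1/3; with β+ = .1 and β- = .05 the partially reinforced group's X rises to 4/9 while the AX+ BXø group's X stays at 1/3, reproducing the effect only when β+ exceeds β-.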

Appendices
Appendix A: Derivation of population expansion in REM using the trinomial theorem
We may represent the effect of adding a stimulus (B) on individual elements within the representation of another stimulus (A) using balanced ternary notation, where the element is either context-dependently activated (+1; i.e., Ab), context-independent (0; i.e., Ai), or context-dependently inhibited (-1; i.e., A~b). Since replacement by different stimuli is statistically independent, we can then characterize the effect of any number of different stimuli upon a particular population of elements within the representation of A using a vector of ternary digits. In any system where n stimuli may be  (Andrews 1990; Andrews & Baxter, 1987) and k has the range (-m, m). The total number of populations of elements within the representation of A is then given by the term
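The balanced ternary scheme can be enumerated directly. The sketch assumes relative sizes of r for added (+1) and inhibited (-1) digits and 1 - r for context-independent (0) digits, multiplied across the m other stimuli. It counts the 3^m populations, checks that the populations active on any trial sum to 1, and groups populations by the sum of their digits k, which yields the trinomial coefficients for -m ≤ k ≤ m; that this grouping matches what the truncated passage intends is an assumption.

```python
from collections import Counter
from itertools import product


def populations(m, r):
    """Map each ternary vector (one digit per other stimulus) to its relative size.

    Digits: +1 context-dependently activated, 0 context-independent,
    -1 context-dependently inhibited. Assumed sizes: r, 1 - r and r respectively.
    """
    size_of = {1: r, 0: 1 - r, -1: r}
    pops = {}
    for digits in product((-1, 0, 1), repeat=m):
        size = 1.0
        for d in digits:
            size *= size_of[d]
        pops[digits] = size
    return pops


def active_size(pops, present):
    """Total size of the populations active when A occurs; present[i] says whether
    the i-th other stimulus is also presented."""
    return sum(
        size
        for digits, size in pops.items()
        if all(d == 0 or (d == 1) == p for d, p in zip(digits, present))
    )


m, r = 3, 0.3
pops = populations(m, r)
print(len(pops), sum(pops.values()))  # 3**m populations; sizes total (1 + r)**m
print([active_size(pops, present) for present in product((False, True), repeat=m)])
by_k = Counter(sum(digits) for digits in pops)
print([by_k[k] for k in range(-m, m + 1)])  # populations per digit sum k
```

For m = 3 the 27 populations have digit-sum counts 1, 3, 6, 7, 6, 3, 1, and their sizes total (1 + r)^3 even though a combined size of only 1 is active on any trial.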

Appendix B: General form of the C-AEM activation function
Equation 9 gives the activation function for units in C-AEM when all r values and stimulus intensities are equal. In many situations this will, however, not be the case; for example, a stimulus may be presented in compound with other stimuli from either the same or a different modality. A general form of the activation function is given in Equation B1, which allows stimuli to differ both in their intensity and in the extent to which they interact with each other. In this equation, $I_H$ is the intensity of stimulus H, $r_{j,h}$ is the extent to which stimulus J interacts with the perception of stimulus H, n is the total number of stimuli present, and k is the number of stimuli that contribute to the activation of the unit.
In a stimulus compound, each stimulus will contribute to the activation of more than one unit.
When there are three or more stimuli present, each stimulus will contribute to the activation of multiple configurational units. Because of this, the activation of a unit may be affected by the presence of stimuli which do not directly contribute to its activation. Consider compound ABC. Stimulus A will contribute to the activation of units a, ab, ac, and abc. Because stimulus A always contributes the same total activity to the system, equal to its intensity $I_A$, the presence of stimulus C will reduce the amount of activity available for units a and ab. In Equation B1, the contribution of each stimulus (h = 1 to k) to the activation of a unit is calculated based on its intensity ($I_H$), and these contributions are summed (stimuli A and B both contribute directly to the activation of unit ab; all three stimuli contribute directly to the activation of unit abc). The intensity of each stimulus is multiplied by the product of the r values for each additional stimulus that contributes to the activation of the unit, and by the product of the (1 - r) values for each additional stimulus that does not. Hence, the contribution of stimulus A to the activation of the ab unit is $I_A r_{b,a}(1 - r_{c,a})$, and the total activation of the unit is $\gamma_{ab} = I_A r_{b,a}(1 - r_{c,a}) + I_B r_{a,b}(1 - r_{c,b})$. The activation of unit a is $\gamma_a = I_A(1 - r_{b,a})(1 - r_{c,a})$, and that of unit abc is $\gamma_{abc} = I_A r_{b,a} r_{c,a} + I_B r_{a,b} r_{c,b} + I_C r_{a,c} r_{b,c}$.
The total activity across all units can no longer be calculated using Equations 10 and 11. It will, however, always be equal to ΣI. It seems somewhat implausible to suggest that, when two stimuli of greatly differing intensities are presented in compound, the less intense stimulus will interfere with the more intense stimulus to the same extent as the more intense stimulus interferes with the less intense. This may be illustrated with an example in which stimulus A has an intensity of 4 and stimulus B an intensity of 1. If the replacement parameter $r_{b,a}$ (and, symmetrically, $r_{a,b}$) has a value of 0.4, then from Equation B1 we can see that the activation levels of a, b, and ab will be 2.4, 0.6, and 2, respectively. Hence, the configurational cue ab has a greater activation (2) than one of the stimuli that contribute to it, B, has when presented alone (1). For this reason, I propose that two further modifications might be appropriate. First, perceptual interaction between two cues may not always be symmetrical: the degree to which B interferes with A, $r_{b,a}$, is not necessarily equal to the degree to which A interferes with B, $r_{a,b}$. Second, the value of the parameter $r_{b,a}$ may be proportional to the ratio between the intensities of the two stimuli A and B: the greater the ratio A:B, the greater the value of $r_{a,b}$ and the smaller the value of $r_{b,a}$.
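Equation B1 itself is not reproduced in this section, but its verbal description is enough for a sketch: each stimulus h in unit K contributes I_H multiplied by r_{j,h} for every other member j of K and by (1 - r_{j,h}) for every presented stimulus j outside K. The function and helper names are illustrative, and storing r_{j,h} under the dictionary key (j, h) is this sketch's own convention.

```python
from itertools import combinations
from math import prod


def activations(intensity, r):
    """Sketch of the general activation rule described for Equation B1.

    intensity: {stimulus: I_H} for the stimuli presented.
    r: {(j, h): r_jh}, the extent to which stimulus j interferes with stimulus h.
    Returns {unit: activation} for every non-empty subset of the presented stimuli.
    """
    present = sorted(intensity)
    acts = {}
    for k in range(1, len(present) + 1):
        for unit in combinations(present, k):
            total = 0.0
            for h in unit:
                # r for each other member of the unit, (1 - r) for each present non-member
                inside = prod(r[(j, h)] for j in unit if j != h)
                outside = prod(1 - r[(j, h)] for j in present if j not in unit)
                total += intensity[h] * inside * outside
            acts["".join(unit)] = total
    return acts


def uniform_r(stimuli, value):
    """Hypothetical helper: the same r for every ordered pair of distinct stimuli."""
    return {(j, h): value for j in stimuli for h in stimuli if j != h}


abc = activations({"A": 1.0, "B": 1.0, "C": 1.0}, uniform_r("ABC", 0.4))
print(abc["A"], abc["AB"], abc["ABC"], sum(abc.values()))

pair = activations({"A": 4.0, "B": 1.0}, uniform_r("AB", 0.4))
print(pair)

asym = activations({"A": 4.0, "B": 1.0}, {("B", "A"): 0.1, ("A", "B"): 0.7})
print(asym, sum(asym.values()))
```

With uniform r = .4 and unit intensities, ABC yields .36, .48 and .48 for a, ab and abc; with intensities 4 and 1 the sketch reproduces the 2.4, 0.6 and 2 of the example above; and for n stimuli it returns 2^n - 1 units. Asymmetric values (for example r_{b,a} = .1 and r_{a,b} = .7 for the 4:1 pair) change how activity is shared among a, b and ab but leave the total at ΣI.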
These modifications do not affect the total contribution that each stimulus makes to activity across all units; $\gamma_{total}$ will always be equal to ΣI. Exploration of the effects of these modifications is, however,
If we substitute 1 for x in Equation C1, we can see that it reduces to $2^n$. This total includes the case where k = 0, but we do not require a unit to which no stimuli contribute activation. Since $\binom{n}{0} = 1$, the total number of units required to represent a system in which n stimuli may be presented in compound is, therefore, $2^n - 1$.

Tables
Table 1. The populations of elements within the representation of a stimulus, A, according to Wagner's REM. There are nine distinct populations to consider when stimulus A may be presented in combination with either or both of two other stimuli, B and C. Each population of elements may be context-independent, context-dependently activated (added), or context-dependently inhibited with respect to B and with respect to C. The relative size of the populations is determined by parameters $r_b$ and $r_c$. Whenever A is presented, whether alone or in compound with B and/or C, four of these populations of elements will be active. Activity within those populations will always sum to 1.