High frequency trading from an evolutionary perspective: financial markets as adaptive systems

The recent rapid growth of algorithmic High Frequency Trading (HFT) strategies makes it a very interesting time to revisit the long-standing debates about the efficiency of stock prices and the best way to model the actions of market participants. To evaluate the evolution of stock price predictability at the millisecond timeframe and to examine whether it is consistent with the newly formed Adaptive Market Hypothesis (AMH), we develop three artificial stock markets using a Strongly Typed Genetic Programming (STGP) trading algorithm. We simulate real-life trading by applying STGP to millisecond data of the three highest capitalised stocks (Apple, Exxon Mobil and Google) and observe that profit opportunities at the millisecond timeframe are better modeled through an evolutionary process involving natural selection, adaptation, learning and dynamic evolution than by using conventional analytical techniques. We use combinations of forecasting techniques as benchmarks to demonstrate that different heuristics enable artificial traders to be ecologically rational, making adaptive decisions that combine forecasting accuracy with speed.


1. Introduction
Due to the advances in technology and the rapid growth of high frequency trading (HFT), advanced financial markets have substantially eliminated human intermediation in the trading process and replaced it with automated electronic limit order books which have allowed the growth of trading algorithms as one of the main investment tools. Some of the trading algorithms generated imitate the behaviour of humans in the trading process and over the last few years, these trading algorithms have substantially improved their speed to match the incidence of bid and ask orders (McGowan, 2010).
Concerns over financial market stability and the equitable treatment of all market participants have prompted renewed interest in market quality, and market regulators are still debating whether HFT is beneficial or harmful to market efficiency.¹ Since the seminal work of Fama (1965, 1970), who introduced the efficient market hypothesis (EMH), there has been a plethora of studies analysing market efficiency and adaptability. Recently this topic has been brought into new focus by the rapid growth of profitable HFT (Baron et al. 2012). However, many studies of market efficiency have significant deficiencies such as data-snooping (Sullivan et al. 1999), ex-post selection of profitable trading rules (Zakamulin 2014), false discoveries (Harvey and Liu 2014) and inattention to transaction costs (Park and Irwin 2007). Other studies base their conclusions on econometric tests or theoretical hypotheses which treat market properties as essentially static, failing to consider the evolutionary processes of adaptation, learning and survival of market participants (Campbell et al. 1997). Lo (2004) argues that market outcomes are obtained not in an analytical way, but through an evolutionary process of trial-and-error and natural selection. The process of natural selection enables survival-of-the-fittest and determines the composition of market participants and their trading strategies. In contrast to the studies above, this study implements a special adaptive form of the Strongly Typed Genetic Programming (STGP) utilising historical millisecond data of the most capitalised companies: Apple, Exxon Mobil and Google. The STGP (described in the Appendices) represents a sophisticated trading algorithm that successfully replicates real-life trading strategies performed at the millisecond timeframe. Using STGP, we compare the forecasting performance of our artificial traders with several combined forecasting methods.
In other words, we simulate real-life trading sessions, which allows us to avoid the obstacles in the studies discussed above. Given the arguments of Lo (2004, 2005) that financial markets are governed by evolutionary processes, STGP is an extremely suitable approach to investigate market efficiency. This is because all 100,000 artificial traders in each of the three stock markets in our experiment compete, learn, adapt, evolve and try to survive. The environment of heterogeneous traders where stock prices and traders' beliefs co-evolve over time provides an appropriate laboratory platform to investigate market efficiency. Hommes (2011) suggests that adaptation and learning in heterogeneous structures are important tools for analysis of financial market behaviour. Moreover, the random nature of the initial trading rules of all traders and their subsequent evolution allows us to observe directly the processes described by Lo (2004, 2005). Therefore, the aim of this study is to evaluate the evolution of stock price predictability at the millisecond timeframe and to examine whether this evolution is consistent with the notion of adaptive markets.
The contribution of this study is threefold. First, this is the first study to use an innovative trading algorithm and real-life millisecond data to provide concrete empirical evidence of market adaptability at the millisecond timeframe. We estimate in precise quantitative terms the daily profits of high frequency traders (HFTs) after taking into account realistic transaction costs. This provides an advantage over existing studies, such as those of Brogaard (2013) and Carrion (2013), which observed the activities of HFT using only aggregate data and therefore could not calculate exact profitability. In order to measure the statistical accuracy and trading efficiency, we compare the predictive ability of our artificial traders to benchmarks of forecast combination approaches such as Support Vector Regression (SVR), the Least Absolute Shrinkage and Selection Operator (LASSO) and the Kalman Filter (KF).
Second, we take into account the issues identified in previous studies as potentially affecting the reliability of trading results. The presence of 100,000 artificial traders in our experiment ensures forecasting model stability and lower sensitivity to random factors. All artificial traders learn from their experience, evaluating the profitability of trading rules based on their predictive power rather than in-sample fit.
We also avoid data-snooping biases by ensuring all trading rules are evaluated and executed by artificial traders.
Third, we observe that various heuristics enable artificial traders to be ecologically rational, making adaptive decisions that combine forecasting accuracy with speed. We have found that market participants learn, compete, adapt, survive and evolve toward a higher degree of sophistication. These findings suggest that our artificial stock markets can be modelled as evolving ecological systems which consist of large numbers of heterogeneous traders competing for profits on each market. Such ecological systems not only experience varying degrees of efficiency but also cycles of efficiency and adaptability as changes in profit opportunities lead to shifts in the composition of market participants. Therefore, we find strong evidence to show that market participants can be modeled in a way supportive of the Adaptive Market Hypothesis.
The remainder of this paper is organised as follows: Section 2 comprises the literature review, while Section 3 presents the experimental design of the three artificial stock markets, the forecasting models and the data utilised in this study. Section 4 reports the artificial agents' trading activity and profitability, while Section 5 presents the conclusions. Additional clarifying and technical material can be found in the Appendices.

2.1. The Adaptive Market Hypothesis (AMH)
The AMH, formulated by Andrew Lo (2004, 2005), argues that many of the behavioural biases in finance are in fact consistent with an evolutionary model of investors learning and adapting to a changing environment. It is the impact of these evolutionary forces on financial institutions and market participants that determines the efficiency of markets, and the performance of investments, businesses and industries. The principles of the AMH are that: (1) individuals act in their own self-interest, (2) individuals make mistakes, (3) individuals learn and adapt, (4) competition drives adaptation and innovation, (5) natural selection shapes market ecology and (6) evolution determines market dynamics. These principles offer a number of practical implications within finance. Firstly, the risk premium varies over time due to the stock market environment and the demographics of investors in that environment. For example, until recently US markets were populated with investors who had never experienced a genuine bear market, which no doubt has shaped their aggregate risk preference. Thus, irrespective of whether prices fully reflect all available information, the particular path that market prices have taken over the past few years influences investors' current aggregate risk preferences. The second implication, which is contrary to the EMH, is that arbitrage opportunities do exist from time to time in the market. Lo (2004) cites Grossman and Stiglitz (1980), who observe that without such profit opportunities there would be no incentive to gather information, and the price discovery aspect of financial markets would collapse. Thus, from an evolutionary viewpoint, active liquid financial markets imply that profit opportunities must exist. However, as they are exploited, they disappear.
But new opportunities are continually being created as certain species of trader die out. Rather than a move towards higher efficiency, the AMH implies that complex market dynamics such as trends, panics, bubbles and crashes are continually witnessed in natural market ecologies. The third implication is that investment strategies are successful or unsuccessful depending on the particular market environment. Contrary to the EMH, the AMH implies that strategies may decline for a time, and then return to profitability when environmental conditions become more conducive to such trades. The final implication of the AMH is that characteristics such as value and growth may behave like 'risk factors' from time to time, that is, stocks with these characteristics may yield higher expected returns during periods when those attributes are in favour. Lo (2005) argues that convergence to equilibrium is neither guaranteed nor likely to occur, and that it is incorrect to assume that the market must move towards some ideal state of efficiency. Instead, the AMH relies on more complex market dynamics, such as cycles, trends, crashes and bubbles.
The AMH has gained substantial attention in the recent literature, with a number of papers supporting the hypothesis in developed stock markets (for example, Ito and Sugiyama 2009; Kim et al. 2011; Urquhart and Hudson 2013), developing stock markets (Smith 2012; Smith 2013a, 2013b; Hull and McGroarty 2014; Smith 2016), foreign exchange markets (Charles et al. 2012; Levich and Poti 2015) and even precious metal markets (Charles et al. 2015). Relatively little research has applied Genetic Programming to questions related to market efficiency and the AMH. In terms of testing market efficiency using genetic algorithms, Neely et al. (1997) provide evidence of the effectiveness of genetic algorithms in the foreign exchange markets, although a subsequent paper by Neely (2003) is less supportive of their value in the stock market. Allen and Karjalainen (1999) use genetic algorithms to find technical trading rules using daily data for the S&P 500 index from 1928 to 1995. They find little evidence of economically significant rules, although they do consider that more advanced genetic algorithms are worthy of investigation, as are more liquid markets. The rapid advances in genetic algorithm design and the reductions in market trading costs seen in recent years make it appropriate to revisit this area. Manahov and Hudson (2014a) develop artificial stock markets using a special adaptive form of the Strongly Typed Genetic Programming based learning algorithm and apply it to data from the FTSE 100, S&P 500 and Russell 3000. They show that the stock market dynamics are consistent with the evolutionary process of the AMH since trader populations behave as an efficient adaptive system over time.

Forecasting with Genetic Programming (GP)
A number of studies have used GP to investigate trading rules and forecast stock markets with varying degrees of success. Allen and Karjalainen (1999) find that out-of-sample trading rules do not earn excess returns compared to a buy-and-hold strategy after taking into account transaction costs, although they suggest their results were based on a very basic algorithm. Iba and Sasaki (1999) demonstrate that GP possesses significant forecasting accuracy compared to neural networks in an experiment using Nikkei 225 data. Kaboudan (2000) implements GP to generate one-day-ahead forecasts and designs a successful single-day trading strategy in which decisions are based on GP forecasts of the lowest and highest daily prices. Becker and Seshadri (2003) use monthly data to design specific trading rules that are able to outperform a buy-and-hold strategy when dividends are not included in stock returns. Potvin et al. (2004) adopt GP to investigate the Canadian TSE 300 index and show that GP trading rules are accurate when the market falls or when it is stable.
Other studies apply GP to foreign exchange and derivative markets. Neely et al. (1997) examine the foreign exchange market with GP to find strong evidence of economically significant ex ante excess returns for six different exchange rates between 1981 and 1995.
In another study, Neely and Weller (2003) investigate the performance of GP using half-hourly high-frequency data of four currency pairs and find no profitability. Chen et al. (1998) and Chidambaran et al. (1998) use GP to develop option pricing mechanisms which perform much better than the Black-Scholes formula. Bhattacharyya et al. (2002) apply a GP mechanism to one-hour data for the USD/DEM currency pair and find reliable predictive performance of the model. Banzhaf et al. (1997), Chen and Yeh (1997) and Farnsworth et al. (2004) provide additional evidence that forecasting and trading can be performed successfully using GP.
In summary, recent empirical studies in a wide variety of markets find strong evidence consistent with the AMH, which supports the theoretical discussion of Soufian et al. (2014), who argue that the AMH gives a theoretical basis for a new financial paradigm which better describes the behaviour of financial markets. In contrast, GP has, as yet, been little used to explore these issues.

Experimental design
We use a special adaptive form of the Strongly Typed Genetic Programming (STGP), similar to Manahov et al. (2014b) and Manahov (2016), which enables us to choose and adjust different parameters to suit our specification, such as the minimum price increment, the number of participants (agents) and their wealth, the level of transaction costs, and differing trading preferences. The evolutionary parameters that we can specify are listed in Table 1.² Each market participant represents an artificial trader who is equipped with their own trading rule. Each trader has initial wealth of $100,000, making it meaningful to consider subsequent profits in monetary terms. The selection of the best performing traders and the production of new genomes is conducted through the recombination of the parent genomes by crossover and mutation operations, which are further elaborated in the Appendices. The basic idea is that each trader's trading rule will improve by a natural selection process based on the survival of the fittest. The evolutionary nature of the trading process and price dynamics therefore enables the artificial traders to recognize, learn and exploit profit opportunities while continually adapting to changing market conditions. Consequently, the STGP trading algorithm evolves the model step-by-step by feeding it with real-time millisecond quotes of Apple, Exxon Mobil and Google stocks, so that the forecasting models evolve by reference to the characteristics of the actual markets studied.

The process of developing the initial trading rules
Each individual trader has only one trading rule (examples of trading rules for both trading groups can be found in the Appendices), which is initially created randomly, enabling the whole range of possible trading rules to be studied. To create later generations, we apply the crossover recombination technique and the mutation operation: crossover randomly chooses parts of two trading rules to exchange in order to create two new trading rules, and mutation randomly changes a small part of a trading rule (Dempster and Jones, 2001). This process is repeated until at least one trading rule in the population achieves the desired level of fitness, measured by a trader's investment return over a specified period. Each trading rule in our artificial stock market setting takes real-time millisecond prices of Apple, Exxon Mobil and Google and generates advice consisting of the desired position, defined as a percentage of the trader's wealth, and an order limit price for buying and selling the financial instrument.³ The logic of the trading rules uses information on price and volume; minimum, maximum and average functions related to millisecond price and trading volume data; and different logical and comparison operators. In the conventional Genetic Programming (GP) procedure, trading rules are evaluated by the same fitness function in each generation. In contrast, the STGP evaluates the fitness of traders through a dynamic fitness function, which enables the return estimation period to move forward and include the most recent quotes in the markets. Sermpinis et al. (2015) note that having a novel fitness function is crucial in financial modelling, where statistical accuracy does not always correspond to the financial profitability of the resulting forecasts.
Also, while conventional GP replaces the entire genetic population at once through crossover and mutation, STGP replaces only a small proportion of the population at a time, which ensures a gradual change in the population and thus greater model stability⁴ (Manahov, 2016).
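The crossover, mutation and gradual-replacement steps described above can be illustrated with a minimal sketch. It is not the STGP implementation used in this study: the tree encoding, the leaf names, the 5% replacement fraction and the function names are all assumptions made for illustration, and the sketch omits the type constraints that make the algorithm strongly typed.

```python
import random

# Hypothetical encoding: a rule is a leaf (str) or a tuple (operator, left, right).

def paths(tree, prefix=()):
    """Return the index path of every node in the tree."""
    found = [prefix]
    if isinstance(tree, tuple):
        for i in (1, 2):
            found += paths(tree[i], prefix + (i,))
    return found

def get(tree, path):
    """Return the subtree located at the given path."""
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = put(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(a, b, rng):
    """Swap one randomly chosen subtree between two rules."""
    pa, pb = rng.choice(paths(a)), rng.choice(paths(b))
    return put(a, pa, get(b, pb)), put(b, pb, get(a, pa))

def mutate(tree, rng, leaves):
    """Replace one randomly chosen leaf with a random leaf."""
    leaf_paths = [p for p in paths(tree) if not isinstance(get(tree, p), tuple)]
    return put(tree, rng.choice(leaf_paths), rng.choice(leaves))

def steady_state_step(population, fitness, rng, leaves, fraction=0.05):
    """Replace only the worst `fraction` of rules with offspring of the best."""
    ranked = sorted(population, key=fitness, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    survivors = ranked[:-k]
    children = []
    while len(children) < k:
        mum, dad = rng.sample(survivors[:max(2, k)], 2)
        for child in crossover(mum, dad, rng):
            children.append(mutate(child, rng, leaves))
    return survivors + children[:k]
```

Because only the worst-ranked rules are replaced in each step, the population size is unchanged and most rules carry over, which is the gradual change that the text associates with greater model stability.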
Another important feature of the STGP is that each trader discovers the intrinsic value of each stock individually without any communication between traders, ensuring individuality and that the level of intelligence of each artificial trader is not affected by other traders. To more realistically model the features of HFT we consider two groups of traders with different properties: scalpers and strategic informed traders as discussed in sub-section 3.2.

Structure of the artificial stock market and the differences between HFT scalpers and strategic informed traders
We examine the profitability of HFT strategies and their relationship with the Adaptive Market Hypothesis (AMH) within the context of a number of markets which are populated by up to 100,000 boundedly rational traders. We develop a separate artificial stock market for each of the three financial instruments, and all artificial traders are created using the STGP programming technique documented in the Appendices. None of the traders are orientated towards any particular strategy and are therefore free to develop and continually evolve better trading rules over time.
All of the traders constantly learn, adapt and evolve, and the survival-of-the-fittest principle in place means the elimination of the worst performing traders, where performance is measured by their Breeding Fitness Return.
The Breeding Fitness Return is a trailing return calculated over the last n quotes of an exponential moving average of a trader's wealth, where n is set to the minimum breeding age, with a maximum of 250.⁵ This return is the fitness criterion for the selection of traders to breed, where breeding is the process of creating new artificial traders to replace poorly performing ones.
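As a rough illustration of this definition, the sketch below smooths a trader's wealth series with an exponential moving average and then takes the return over the last n quotes. The smoothing constant `alpha` and the function names are assumptions made for the example; the study does not report the smoothing parameter here.

```python
def ema(values, alpha):
    """Exponential moving average of a series, seeded with the first value."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

def breeding_fitness_return(wealth, min_breeding_age, alpha=0.1, cap=250):
    """Trailing return over n quotes of the smoothed wealth series.

    n is the minimum breeding age, capped at 250. An n-quote return needs
    n + 1 observations, so younger traders get no value (None).
    alpha is a hypothetical smoothing constant.
    """
    n = min(min_breeding_age, cap)
    if len(wealth) < n + 1:
        return None
    smoothed = ema(wealth, alpha)
    return smoothed[-1] / smoothed[-1 - n] - 1.0
```

For example, a trader whose wealth stays at $100,000 has a Breeding Fitness Return of zero, while a trader who has not yet reached the minimum breeding age has no value and cannot be ranked for breeding.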
All artificial traders accumulate wealth by investing in the financial instruments available, namely the three risky stocks and the risk-free instrument represented by cash. Since the markets continuously evolve, artificial traders who perform well will become wealthier, which will positively influence the forecasting accuracy of the model. We follow Manahov (2016) to estimate the artificial trader wealth:
$$W_{i,t} = M_{i,t} + P_t \, h_{i,t} \qquad (1)$$
where $W_{i,t}$ is the wealth accumulated by trader $i$ in period $t$; $M_{i,t}$ and $h_{i,t}$ represent the money and the amount of the security held by artificial trader $i$, respectively, in period $t$; and $P_t$ is the price of the asset in period $t$. We incorporate two different types of trader in our analysis: scalpers and strategic informed traders. The main difference between the two trading groups is that the HFT scalpers group consists of traders that momentarily perform best in terms of the continuous Breeding Fitness Return, and therefore possess lower latency. Although the strategic informed traders and HFT scalpers both observe the same millisecond data of the three stocks and generate trading orders, HFT scalpers are able to access and process the data first due to their low latency. In other words, HFT scalpers are able to foresee the quotes of the three financial instruments and submit trading orders before strategic informed traders.
Another difference between the two trading groups is that HFT scalpers are designed to capture and avoid sweep risk, which is the risk related to trading against large informed toxic orders (for instance, large institutional orders) positioned at multiple levels of the order book (Manahov, 2016). This means that HFT scalpers are highly risk averse and will not hold what they perceive as potentially risky positions, and aim for high frequency turnover to neutralise unanticipated adverse flows. Large losses due to sweeps can wipe out any profitability, so management of sweep risk is of paramount importance for HFT scalpers.

The clearing mechanism and order generation for the three artificial stock markets
The three artificial stock markets are simulated double auction markets, where all the buy and sell orders are collected. The artificial traders receive real-time quotes of Apple, Exxon Mobil and Google, evaluate their trading rule and subsequently calculate the number of shares they need to buy or sell. If shares need to be bought or sold, an order is generated to buy or sell the required amount at the specified limit price. For example, if a trader holds 2,000 shares of Google priced at $22 and has $70,000 in cash, their wealth is $114,000 and their position in Google is 38.6%. If the trading rule generates a signal of a position of 60% and a limit price of $22, a limit order will be produced to purchase 1,109 additional Google shares⁶ at a price of $22. The three artificial markets then calculate the clearing price, and all trading orders are executed at the clearing price, which is the price at which the highest trading volume from limit orders can be matched. When the highest volume can be matched at multiple price levels, the clearing price is the average of the lowest and highest of those prices. The number of shares purchased by traders is always equal to the number sold by other traders, and if the number of shares offered and the number of shares asked are not equal, the remaining orders will be partially executed. The orders at the clearing price are therefore selected for execution with priority for market orders over limit orders, and then on a first-in-first-out (FIFO) basis. In the unlikely event of no matching limit orders, no market orders are executed and the artificial stock market price will be the price of the previous quote (Manahov, 2016).
⁵ In the case where the age is less than n, no value is calculated.
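The order-sizing arithmetic and the clearing rule described in this sub-section can be sketched as follows. This is a simplified illustration rather than the market engine used in the study: the function names are hypothetical, rounding down to whole shares is an assumption, and market orders, partial execution and FIFO priority are left out.

```python
def order_size(cash, shares, price, target_percent):
    """Return (wealth, current position fraction, shares to buy or sell)."""
    wealth = cash + shares * price
    position = shares * price / wealth
    # Assumption: the target holding is rounded down to whole shares.
    target_shares = int(wealth * target_percent / 100 / price)
    return wealth, position, target_shares - shares

def clearing_price(buys, sells):
    """Uniform clearing price for limit orders given as (limit_price, quantity).

    The price that maximises matched volume is chosen; if several prices tie,
    the average of the lowest and highest is returned. Returns (None, 0) when
    nothing matches, in which case the previous quote would be kept.
    """
    candidates = sorted({p for p, _ in buys} | {p for p, _ in sells})
    best_volume, best_prices = 0, []
    for p in candidates:
        demand = sum(q for limit, q in buys if limit >= p)
        supply = sum(q for limit, q in sells if limit <= p)
        volume = min(demand, supply)
        if volume > best_volume:
            best_volume, best_prices = volume, [p]
        elif volume == best_volume and volume > 0:
            best_prices.append(p)
    if not best_prices:
        return None, 0
    return (best_prices[0] + best_prices[-1]) / 2, best_volume

wealth, position, delta = order_size(70_000, 2_000, 22, 60)
print(wealth, round(position, 3), delta)
```

Run on the figures from the text, the sizing function gives wealth of $114,000, a current position of about 38.6% and an order for 1,109 additional shares.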

Benchmark forecasting methods.
We follow Sermpinis et al. (2014) in our selection of benchmark forecasting methods. We implement hybrid forecasting models including Support Vector Regression (SVR), the Least Absolute Shrinkage and Selection Operator (LASSO) and the Kalman Filter as benchmarks because they offer linear as well as nonlinear modelling features. The idea of combining different forecasting models to improve prediction accuracy originates from Bates and Granger (1969), who suggest that forecast combinations can be viewed as a simple and effective way to improve on the forecasting ability of individual models.
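To make the idea of forecast combination concrete, the sketch below weights each individual forecast by the inverse of its past mean squared error, one of the simple schemes in the spirit of Bates and Granger (1969). It is an illustration only and is not one of the benchmark models used in this study; the function names are hypothetical.

```python
def inverse_mse_weights(errors_by_model):
    """Weights proportional to the inverse mean squared error of each model."""
    mses = [sum(e * e for e in errs) / len(errs) for errs in errors_by_model]
    inverse = [1.0 / m for m in mses]
    total = sum(inverse)
    return [w / total for w in inverse]

def combine(forecasts, weights):
    """Weighted average of the individual forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))

# Two models: the first has historically been twice as accurate in absolute terms.
weights = inverse_mse_weights([[1.0, -1.0], [2.0, -2.0]])
combined = combine([10.0, 20.0], weights)
```

Here the more accurate model receives a weight of 0.8 and the less accurate one 0.2, so the combined forecast sits closer to the first model's forecast.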

Support vector regression (SVR).
SVR is based on statistical learning theory, which concerns the properties of learning machines that enable them to generalise effectively from data. The associated learning algorithms are able to analyse data for classification and regression analysis. In simple terms, when used for regression, the method aims to find a function that differs from the dependent variable by less than a certain amount whilst being as simple ('flat') as possible. SVR functionality is based on the computation of a linear regression function in a high-dimensional feature space into which the input data are mapped via a nonlinear function (Basak et al. 2007). The main advantage of SVR is that instead of minimising the observed training error, it attempts to minimise the generalised error bound so as to achieve enhanced forecasting performance. Suykens et al. (2002) argue that SVR forecasting models do not experience multiple local minima and are capable of dealing with the high-dimensional, noisy and complex problems that are found with millisecond data. Vapnik (1995) introduced the ε-insensitive loss function, making SVR a robust technique for constructing data-driven and nonlinear regression models. According to Sermpinis et al. (2014), SVR forecasts offer a balance between model accuracy and model complexity.
SVR can be formulated through either the ε-SVR or the ν-SVR algorithm. To describe the ε-SVR algorithm we consider the training data $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, where $x_i$ is an input vector, $y_i$ is the corresponding target value and $n$ is the total number of training samples. The SVR function can be expressed as:
$$f(x) = w^{T}\varphi(x) + b \qquad (2)$$
where $w$ and $b$ represent the regression parameters of the SVR function and $\varphi(x)$ represents the nonlinear function that maps the input data vector $x$ to a feature space in which the training data are linear. Equation (2) can be represented as a dual problem, and its solution is based on the introduction of two Lagrange multipliers $\alpha_i$, $\alpha_i^{*}$ and mapping with the kernel function $K(x_i, x)$:
$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) K(x_i, x) + b \qquad (3)$$
where $0 \le \alpha_i, \alpha_i^{*} \le C$. Factor $b$ in Equation (3) has been estimated following the Karush-Kuhn-Tucker conditions (Vapnik, 1995).
The ν-SVR algorithm involves the ε parameter and the new parameter $\nu$. In terms of ν-SVR we can transform the optimization problem to:
$$\min_{w,\, b,\, \varepsilon,\, \xi,\, \xi^{*}} \; \frac{1}{2}\|w\|^{2} + C\left(\nu\varepsilon + \frac{1}{n}\sum_{i=1}^{n}(\xi_i + \xi_i^{*})\right) \qquad (4)$$
$$\text{subject to} \quad y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \quad w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \quad \xi_i, \xi_i^{*} \ge 0, \quad \varepsilon \ge 0$$
where $\xi_i$ and $\xi_i^{*}$ are two slack variables and $\|w\|^{2}$ represents the complexity (flatness) of the model. The solution of the quadratic optimization problem described in Equation (4) can be represented in the following way:
$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) K(x_i, x) + b$$
where $0 \le \alpha_i, \alpha_i^{*} \le C/n$.
Parameter selection is an important issue in SVR, with the lack of information about the noise in the trading datasets making a priori setting of the ε-margin a difficult task (Sermpinis et al., 2014). Prior studies have highlighted four different approaches to deal with this issue. Trafalis and Ince (2000) suggest setting ε as a non-negative constant (ε = 0 or a very small number), whereas Smola et al. (1998) recommend choosing ε by maximizing the statistical efficiency of a location parameter estimate. Cao et al. (2003) advise determining ε by cross-validation, while a fourth approach controls ε via ν-SVR. In our study we use the RBF kernel function, and therefore we need to estimate two kernel-independent parameters ($\nu$ and $C$) and the RBF kernel parameter $\gamma$:
$$K(x_i, x) = \exp\left(-\gamma \|x_i - x\|^{2}\right), \quad \gamma > 0$$
We have implemented RBF kernels in our experiment because they effectively overcome the issue of overfitting and are very precise in directional accuracy. In order to determine the value of $C$, we follow the standard parameterization of the SVR solution in Cherkassky and Ma (2004). Because the RBF kernel is bounded ($K(x_i, x) \le 1$), the kernel expansion of the SVR function is bounded above in proportion to $C$. Hence, we calculate $C$ as:
$$C = \max\left(|\bar{y} + 3\sigma_y|, \; |\bar{y} - 3\sigma_y|\right)$$
where $\bar{y}$ and $\sigma_y$ represent the mean and the standard deviation of the millisecond data. Similarly to Sermpinis et al. (2014), we use 5-fold cross-validation to calculate the values of $\nu$ and $\gamma$.
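The building blocks named in this sub-section can be written down in a few lines. The sketch below shows the ε-insensitive loss, the RBF kernel and the Cherkassky and Ma (2004) rule that sets C from the mean and standard deviation of the target values. The function names are hypothetical and the use of the sample standard deviation is an assumption; the full SVR optimisation and the cross-validation of the remaining parameters are not shown.

```python
import math
import statistics

def eps_insensitive_loss(y, prediction, eps):
    """Errors smaller than eps cost nothing; larger ones cost the excess."""
    return max(0.0, abs(y - prediction) - eps)

def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2); always in (0, 1]."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def cherkassky_ma_c(y):
    """C = max(|mean + 3 sd|, |mean - 3 sd|) of the training targets.

    Assumption: the sample standard deviation is used.
    """
    mean = statistics.fmean(y)
    sd = statistics.stdev(y)
    return max(abs(mean + 3 * sd), abs(mean - 3 * sd))

prices = [1.0, 2.0, 3.0, 4.0, 5.0]
print(cherkassky_ma_c(prices))
```

Setting C from the spread of the targets avoids a separate grid search over C and makes the choice robust to outliers in the data.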

Least absolute shrinkage and selection operator (LASSO).
LASSO is a regression analysis technique that implements variable selection and regularisation in order to improve the prediction accuracy and interpretability of the prediction model it produces. In broad terms, it modifies the model fitting process so that, unlike traditional regression methods, only a subset of the possible independent variables is selected. LASSO reduces prediction error by shrinking large regression coefficients, which reduces overfitting, a major pitfall in forecasting. Essentially, LASSO works by forcing the sum of the absolute values of the regression coefficients to be less than a fixed value.
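Consequently, certain coefficients are set to zero and LASSO chooses a simpler predictive model that does not include those coefficients. We estimate the LASSO coefficients by using Breiman's nonnegative garrotte minimization process (Yuan and Lin, 2007). In its standard penalized form, the LASSO estimate solves
$$\hat{\beta} = \arg\min_{\beta} \left\{ \sum_{t} \Big( y_t - \beta_0 - \sum_{j} x_{t,j}\beta_j \Big)^{2} + \lambda \sum_{j} |\beta_j| \right\}$$
where $\lambda \ge 0$ is the tuning parameter which controls the amount of shrinkage applied to the coefficients.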
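The way the penalty sets small coefficients exactly to zero can be seen in a short coordinate-descent sketch. It minimises the sum of squared errors plus λ times the sum of absolute coefficients, with no intercept (the data are assumed to be centred). This is a generic LASSO solver written for illustration, not the nonnegative garrotte procedure used in the study, and the function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimise sum((y - X b)^2) + lam * sum(|b|) by cyclic coordinate descent.

    Assumes centred data (no intercept) and no all-zero column in X.
    """
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    for _ in range(sweeps):
        for j in range(m):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # squared norm of column j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(m) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam / 2) / z
    return beta

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
print(lasso_coordinate_descent(X, y, lam=1.0))
```

In this toy example the weak second regressor is dropped entirely, while the coefficient on the first is shrunk from its least-squares value of 2 to 1.75.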

Kalman Filter (KF).
The KF is a very powerful tool for filtering data and predicting the future state of a system based on its previous states. The KF uses measurements of a group of variables observed over time, which contain statistical noise and other inaccuracies, and generates estimates of the unknown values of the variables that are more precise than those based on a single measurement alone, by computing a joint probability distribution over the variables for each timeframe. The KF is essentially a set of mathematical equations forming a predictor-corrector type estimator that is optimal when certain presumed conditions are met. The KF infers parameters of interest from indirect and uncertain observations and generates good results in practice due to its optimality and structure. The KF in our experiment consists of the following measurement and state equations:
$$y_t = \sum_{i} \beta_{i,t}\, x_{i,t} + \varepsilon_t$$
$$\beta_{i,t} = \beta_{i,t-1} + \eta_{i,t}$$
where $y_t$ represents the dependent variable, or the combination forecast, at time $t$; $x_{i,t}$ denote the independent variables (individual forecasts) at time $t$; $\beta_{i,t}$ describe the time-varying coefficients at time $t$; and $\varepsilon_t$ and $\eta_{i,t}$ are the two noise terms.
We calculate all $\beta_{i,t}$ recursively in time, with the log-likelihood of the model taking into account observations up to time $t$. We then apply a numerical optimization algorithm to maximize the likelihood function.
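The predictor-corrector recursion can be illustrated for the simplest case of a single regressor whose coefficient follows a random walk. The sketch below is a minimal scalar version under that assumption: the noise variances `q` and `r`, the starting values and the function name are hypothetical, and the likelihood-based estimation of the variances is not shown.

```python
def kalman_tvp(y, x, q, r, beta0=0.0, p0=1.0):
    """Filter y_t = x_t * beta_t + noise, with beta_t = beta_{t-1} + noise.

    q: variance of the state noise, r: variance of the measurement noise.
    Returns the filtered coefficient estimate after each observation.
    """
    beta, p = beta0, p0
    estimates = []
    for y_t, x_t in zip(y, x):
        # Predict: a random-walk state keeps its mean, uncertainty grows by q.
        p = p + q
        # Correct: weigh the forecast error by the Kalman gain.
        innovation = y_t - x_t * beta
        innovation_variance = x_t * p * x_t + r
        gain = p * x_t / innovation_variance
        beta = beta + gain * innovation
        p = (1.0 - gain * x_t) * p
        estimates.append(beta)
    return estimates

estimates = kalman_tvp(y=[2.0] * 50, x=[1.0] * 50, q=0.0, r=0.01)
print(estimates[-1])
```

With a constant true coefficient of 2 and no state noise, the filtered estimate moves from its starting value of zero towards 2 as observations accumulate. A positive `q` would let the coefficient drift and make the filter weight recent observations more heavily.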

Description of data and transaction costs.
When presenting out-of-sample results, the sample split that identifies the beginning of the out-of-sample period is a choice variable. Hansen and Timmermann (2012) argue that there are no broadly accepted recommendations on how to define the sample split. Forecasters have adopted different practical approaches. Hyndman and Athanasopoulos (2013) suggest that the in-sample dataset should be approximately 80% of the entire data with the remaining 20% allocated to out-of-sample evaluation, although the values depend on how long the entire sample is and how far ahead one would like to forecast. Another popular approach is to select the initial estimation sample to have a minimum length, with the remaining data used for out-of-sample evaluation. For instance, Marcellino, Stock and Watson (2006) and Pesaran, Pick and Timmermann (2011) use the first 20 years of the dataset, when available, to estimate the forecasting models for different macroeconomic variables. Another approach is to do the opposite and select a specific sample length for the out-of-sample period (Inoue and Kilian, 2008). Alternatively, Welch and Goyal (2008) and Rapach, Strauss and Zhou (2010) suggest that forecasters may examine multiple out-of-sample periods and observe the forecasting performance in each.
However, all of the above approaches depend on ad-hoc choices of the individual split points. In addition, Hansen and Timmermann (2012) argue that multiple split points could lead to a data-mining issue because the reported values could be those that favor a given model. Therefore, we adopted the approach of Hyndman and Athanasopoulos (2013), in which the split depends on how long the sample is and what the forecasting horizon is. Since our sample consists of 1,742,958 observations in total for the three financial instruments, we allocated slightly more than 20% of the millisecond dataset to the out-of-sample evaluation period. Apple Inc, the most traded of the three stocks in our study, has 26.8% or 198,818 out of 742,008 observations for out-of-sample evaluation, followed by Exxon Mobil with 21% and Google with 20%. Narang (2013) suggests that the typical round trip transaction costs in HFT are $0.003 per share. We employ transaction costs of $0.004 for our profit calculations. Although slightly higher than the current standards, the level of transaction costs used is conservative and takes into account the operational costs of HFT companies such as investments in technology, data and collection fees, and salaries. These include software platforms, labour and risk management systems but do not include co-location services due to price differences in various trading venues.
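As an illustration of the sample split and the cost adjustment, the following minimal sketch splits an ordered series chronologically and nets the round trip cost from a one-share trade. The function names are hypothetical, and the assumption that the split is taken as the final fraction of the ordered observations is ours.

```python
def chronological_split(series, out_fraction=0.2):
    """Split an ordered series into in-sample and out-of-sample parts.

    The most recent `out_fraction` of observations is held out; the
    data are never shuffled, so no future quote leaks into estimation.
    """
    if not 0.0 < out_fraction < 1.0:
        raise ValueError("out_fraction must lie strictly between 0 and 1")
    cut = len(series) - int(round(len(series) * out_fraction))
    return series[:cut], series[cut:]


def net_profit_per_share(entry_price, exit_price, round_trip_cost=0.004):
    """Profit on a one-share long round trip after the $0.004 cost."""
    return (exit_price - entry_price) - round_trip_cost


# Share of Apple observations held out, from the counts reported above.
apple_out_share = 198818 / 742008
```

With the reported counts, `apple_out_share` rounds to 0.268, which agrees with the 26.8% stated in the text.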

HFT predictability and profitability.
In this section we measure the forecasting accuracy of our models 7. Table 2 reports the in-sample and out-of-sample descriptive statistics for the three financial instruments. It is evident that stock prices are slightly skewed for nearly all in-sample and out-of-sample observations. Because both sample sizes for the three stocks are large, an important issue that arises is Lindley's paradox (Lindley, 1957). This paradox can lead to overstatement of statistical significance and a tendency to reject the null hypothesis even when the posterior odds favour the null. A number of studies since Lindley have recognised that as sample size increases, there is a tendency to reject the null hypothesis unless the significance level is adjusted downwards. Connolly (1989) proposes the following solution to Lindley's paradox, based on sample-size-adjusted critical values for t statistics 8:
$$t^{*} = \left[(T-k)\left(T^{1/T}-1\right)\right]^{1/2} \tag{12}$$
where T is the sample size and k measures the number of estimated parameters. The corresponding Bayesian sample-size-adjusted critical F-values are:
$$F^{*} = \frac{T-k}{p}\left(T^{p/T}-1\right) \tag{13}$$
where T is the sample size, k is the number of coefficients estimated and p is the number of coefficients restricted under the null hypothesis. If the estimated test statistic exceeds the appropriate critical value from (12) or (13), the sample evidence is in favour of the alternative hypothesis. According to Connolly, if the prior odds are, for example, 3 to 1, the right-hand side of (12) or (13) must be multiplied by 3, because for the posterior odds to favour the alternative hypothesis the F-statistic must be at least three times as large.
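To give a sense of the magnitudes involved, the sketch below evaluates the sample-size-adjusted critical values in the form commonly attributed to Connolly (1989), namely t* = [(T-k)(T^(1/T)-1)]^(1/2) and F* = [(T-k)/p](T^(p/T)-1). The function names are hypothetical, the choice k = 2 is purely illustrative, and the prior-odds multiplier is applied to the F value only.

```python
import math


def connolly_t_critical(T, k):
    """Sample-size-adjusted critical t value: sqrt((T-k)(T^(1/T)-1))."""
    # expm1 keeps precision because T**(1/T) is very close to 1 for large T.
    return math.sqrt((T - k) * math.expm1(math.log(T) / T))


def connolly_f_critical(T, k, p, prior_odds=1.0):
    """Sample-size-adjusted critical F value, scaled by the prior odds."""
    return prior_odds * (T - k) / p * math.expm1(p * math.log(T) / T)


# Illustration with the Apple sample size and an assumed k = 2.
t_star = connolly_t_critical(742008, 2)
f_star = connolly_f_critical(742008, 2, 1)
```

For samples of this size the adjusted critical t value is well above the conventional 1.96, which is the practical content of the adjustment.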
One of the properties of high frequency millisecond data is the high incidence of zero price changes, even in very frequently traded stocks like Apple, Exxon Mobil and Google. We account for this market inactivity by modifying the Student's t distribution associated with the standardized residuals:

If (14)
Or
where measure the Student's density function; measure market inactivity as follows: If , the forecast for (Meade, 2002).
7 We extracted the data from the STGP algorithm. To avoid the statistics being affected by the initialisation process, the first 2500 quotes of all three financial instruments were omitted from empirical testing. We consider the first 2500 quotes of historical data as a training period during which the model may show initially chaotic behavior.
8 Connolly (1989) used US data to examine the robustness of the day-of-the-week (DOW) effect and weekend effect to alternative estimation and testing procedures. The results suggested that sample size can distort the interpretation of classical test statistics unless the significance level is adjusted downward.
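In order to evaluate statistically the forecasting abilities of our models, we compute the root mean squared error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE). For all three error statistics, the lower the output, the better the forecasting accuracy of the model:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t-\hat{y}_t\right)^2},\qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t-\hat{y}_t\right|,\qquad \mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t-\hat{y}_t}{y_t}\right|$$
where $y_t$ represents the actual values, $\hat{y}_t$ the forecasted values and $n$ the number of forecasts.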
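The three error statistics can be computed as in the following minimal sketch, which uses their standard textbook definitions with the MAPE expressed in percent. The function names and the toy numbers are illustrative only, and the MAPE is undefined when an actual value equals zero.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mae(actual, forecast):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / n


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))


# Toy example: three quotes and three forecasts.
actual = [100.0, 102.0, 101.0]
forecast = [101.0, 101.0, 101.0]
errors = (rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
```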
A direct in-sample and out-of-sample comparison between traditional forecast combination models and models designed using the STGP indicates the superiority of the latter. The RMSE, MAE and MAPE for STGP are significantly smaller than the errors produced by the SVR, LASSO and Kalman Filter (Table 3, Panels A and B). From Table 3, we note that STGP presents the best statistical results in all in-sample and out-of-sample periods for the three stocks. Artificial agents trading Apple's shares are the best performers in both the in-sample and out-of-sample periods, as measured by the lowest forecasting errors.
Moreover, we employ the Giacomini and White (2006) unconditional predictive ability approach for out-of-sample predictive ability testing and forecast selection. This approach investigates which forecasting method, rather than which model, is more accurate on average. The main advantage of Giacomini and White's approach is the unified treatment of both nested and non-nested models. The comparison between nested models is important because it is often of interest to test whether forecasts generated by a given model can outperform those generated by a nested benchmark model. The unconditional predictive ability approach extends the Diebold and Mariano (1995) test. In Table 4, the numbers within parentheses are the ratios of mean squared forecast errors (MSEs) for the method in the column relative to the method in the row, and a plus (minus) sign indicates that the method in the column outperforms (underperforms) the method in the row at the 10% significance level, as evidenced by a relative MSE less (greater) than 1. The rows labelled 'Bound' report Hochberg's (1988) modified Bonferroni p-value bounds for testing the multiple hypothesis that all pairwise comparisons are zero for a specific column reference method 9.
The square brackets in Table 4 represent Hochberg's (1988) p-value bound for the hypothesis that all pairwise comparisons are zero for that panel. Table 4 shows the statistical superiority of the STGP method in all comparisons with the SVR, LASSO and Kalman Filter methods for the three financial instruments. For instance, equal unconditional forecasting ability of the STGP method and the LASSO method is rejected with a p-value of 0.003 and the STGP method outperforms the LASSO method with an MSE ratio of 0.6 for Apple. In contrast, the Kalman Filter is characterised by the worst forecasting performance for Apple, Exxon Mobil and Google. The Hochberg-Bonferroni modified p-values do not change the conclusions that emerge from the pairwise testing.
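The two quantities reported in Table 4 can be sketched as follows. The bound follows the formula given in the footnote on Hochberg's procedure, Bound = min over j of (r-j+1)p(j), under the assumption that the p-values are ordered from smallest to largest. The function names and the p-values in the example are hypothetical and are not taken from Table 4.

```python
def hochberg_bound(p_values):
    """Bound = min_j (r - j + 1) * p_(j), with p_(1) <= ... <= p_(r)."""
    ordered = sorted(p_values)  # ascending order is an assumption
    r = len(ordered)
    return min((r - j + 1) * p for j, p in enumerate(ordered, start=1))


def mse_ratio(actual, forecast_column, forecast_row):
    """MSE of the column method relative to the row method.

    A ratio below 1 means the column method has the smaller MSE.
    """
    def mse(forecast):
        return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

    return mse(forecast_column) / mse(forecast_row)


# Hypothetical p-values from three pairwise comparisons.
bound = hochberg_bound([0.02, 0.003, 0.04])
```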
The RMSE, MAE and MAPE are all important error measures, yet they may not correspond to attainable profits from trading rules. We therefore calculate the daily profits generated by the best statistical performers, the artificial traders designed on the STGP basis, for the five most traded days in each month from February to June, 2014 10. Daily profits for each market participant are estimated for each trading day based on marked-to-market accounting, taking into consideration the fact that every artificial trader begins each trading day with a zero inventory position. All 100,000 artificial traders end the trading day with a zero inventory and therefore a marking-to-market technique at the end of the day is an appropriate method for estimation of daily profits (Baron et al., 2012). More specifically, we calculate the end-of-day profits for all traders as the cumulative cash received from selling short positions minus the cash paid for buying long positions, plus the value of any outstanding positions at the end of the trading day, marked to the market price of Apple, Exxon Mobil and Google at close of trading. Round trip transaction costs of $0.004 per share are taken into account. Table 5 indicates that STGP forecasting models generate out-of-sample daily profits ranging from $57 on 2nd of February, 2014 to $224 recorded on 25th of June, 2014. We observe substantial out-of-sample profits produced by our artificial traders, indicating high external validity of our STGP evolving forecasting models. This finding suggests that the out-of-sample tests are free from data over-fitting, which represents the biggest forecasting pitfall, providing a more powerful framework to evaluate the performance of the competing forecasting models.
9 Hochberg (1988) suggested ordering the p-values from testing r hypotheses as p(1),...,p(r) and estimating the bound as Bound = min over j=1,...,r of (r-j+1)p(j).
10 Due to space constraints the full trading volume report is available upon request from the authors.
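The marked-to-market calculation described above can be illustrated with a short sketch. The function name and the trade representation are hypothetical, and charging half of the round trip cost on each traded share (one buy plus one sell makes a round trip) is our simplifying assumption.

```python
def daily_mtm_profit(trades, close_price, round_trip_cost=0.004):
    """End-of-day marked-to-market profit for one trader.

    trades      : list of (signed_quantity, price); positive = buy, negative = sell
    close_price : closing price used to value any outstanding position
    """
    cash = 0.0
    inventory = 0
    shares_traded = 0
    for quantity, price in trades:
        cash -= quantity * price  # buying spends cash, selling receives it
        inventory += quantity
        shares_traded += abs(quantity)
    # Assumed cost allocation: half of the round trip cost per traded share.
    costs = 0.5 * round_trip_cost * shares_traded
    return cash + inventory * close_price - costs


# Buy 100 shares at 10.00, sell them at 10.05, end the day flat.
flat_day = daily_mtm_profit([(100, 10.00), (-100, 10.05)], close_price=10.05)
```

In this example the trader earns 5.00 on the price move and pays 0.40 in costs, so `flat_day` is approximately 4.60.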

HFT and the AMH.
The AMH postulates that learning, competition, and evolutionary selection pressures govern financial markets. Market participants are no longer the rational beings of the standard paradigm, but rather boundedly rational 'satisficers' (in the terminology of Simon (1995)). The main advantage of our artificial stock market is that we are able to directly observe traders' behavior at the individual level and demonstrate how market participants compete, learn, adapt, survive and evolve through time. The AMH has several relevant predictions for the present study. First, we provide evidence suggesting that profit opportunities exist in financial markets. Our findings indicate that profit opportunities are better modelled through an evolutionary process involving natural selection than through analytical econometric models. On the one hand, the process of natural selection ensures the survival-of-the-fittest, that is, the need to better cope with changing market conditions, market dynamics and trading opportunities. On the other hand, natural selection in our experiment determines the number of market participants (the worst performing traders are eliminated from the experiment and replaced by new ones) and trading strategies. In other words, natural selection is in place to select the fittest market participants within an evolutionary framework in which artificial traders interact and evolve dynamically to use the best trading strategies.
Under these circumstances our artificial traders compete, learn, adapt and evolve. Second, market participants learn from trial-and-error, compete and adapt to a constantly changing market environment, survive and evolve. Artificial traders evolve from their initial random trading strategies towards trading strategies with a higher degree of sophistication. To examine this process, we need to provide a definition of complexity. Since the behavior of all artificial traders in our experiment is characterized by their forecasting models, which are in the format of STGP, we can easily formulate two definitions of traders' complexity. While the first definition is based on the number of nodes within the GP tree, the second definition is based on the depth of the tree. On each trading day from 3rd of February to 30th of June, 2014, we have a profile of the evolved STGP trees for 100,000 traders. Since all forecasting models are in the format of STGP parse trees, the sophistication of traders' strategies can be measured by the associated depth, in line with Chen and Yeh (2002). Our profitability comparisons show that a higher degree of traders' sophistication is associated with higher profitability of trading rules. For instance, artificial traders generated profits of $57 at the initial stage of the experiment, while more sophisticated trading rules returned $224 on the last day of the experiment, highlighting the existence of learning and that evolutionary processes are in place. We take a naïve strategy, the martingale, as a benchmark for comparison with the observed behaviour of our artificial traders. For both measures of sophistication, the complexity degree of the martingale model is 1. Consistent with Chen and Yeh (2002), we find no evidence that traders' behaviour will converge to the simple martingale model.
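The two complexity measures can be made concrete with a small sketch. It assumes, purely for illustration, that a parse tree is stored as nested tuples of the form (function, child, ...) with plain strings or numbers as terminals, and that a single terminal has depth 1, so that the martingale benchmark scores 1 on both measures. The example rule is hypothetical.

```python
def node_count(tree):
    """Total number of nodes (functions and terminals) in the tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(node_count(child) for child in tree[1:])


def depth(tree):
    """Number of hierarchical levels; a lone terminal has depth 1."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + max((depth(child) for child in tree[1:]), default=0)


# Martingale benchmark: the forecast is simply the last observed price.
martingale = "last_price"

# A hypothetical evolved rule with two conditions joined by "and".
evolved_rule = ("and",
                (">", ("avg", "price", 1), "price"),
                ("<", "volume", 500))
```

Here the martingale has one node and depth 1, while the hypothetical rule has nine nodes and depth 4.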
The AMH suggests that trading rules become less profitable as markets become aware of their existence. We empirically test this assumption within the millisecond timeframe by applying spectral analysis to the out-of-sample profits for the three stocks, following Hasbrouck and Sofianos (1993). Spectral analysis treats marked-to-market profits as a function of two different time series, prices and the level of inventory, which can vary at different frequencies. Similar to Baron et al. (2012), we implement Fourier analysis to decompose prices and inventories into groups of different frequencies. When the two time series, prices and inventories, are in phase (artificial traders buy before the price of the three financial instruments increases), they generate profits. If the two time series are out of phase (artificial traders buy before the price of Apple, Exxon Mobil and Google decreases), they experience losses. Marked-to-market profits for HFT scalpers can be expressed in terms of the inventory holdings of HFT scalpers at time t and the price of Apple, Exxon Mobil and Google at time t (Equation (26)).
One of the requirements of the spectral analysis is the stationarity of the inventory and price-change series. This requirement has been satisfied because HFT scalpers' inventories follow a mean-reverting process and the first difference of the price process is a martingale difference sequence. We follow Baron et al. (2012) and develop two functions of the frequency of different groups, namely the spectral densities of the inventories and of the price changes. We apply Fourier analysis to Equation (26) and obtain Equation (27), in which Real denotes the function that takes the real part of a complex number and which gives the component of the marked-to-market profits generated by the artificial traders at each frequency.
The second equality in Equation (27) is a result based on the fact that the imaginary part of the marked-to-market profits is equal to zero. Table 6 shows that in the out-of-sample period under investigation, the artificial traders make the largest profits of $301 at the very short interval between 0 and 500 milliseconds and the smallest profits of $10 at the longest time scale between 2,801 and 3,000 milliseconds. Hence, we can conclude that market efficiency measured at millisecond timeframes varies over time and trading rules become less profitable as markets become aware of their existence, consistent with the AMH. Moreover, the timeframe over which artificial traders generate their profits provides more specific details about their trading strategies. It seems that the artificial traders do not try to infer the long-term fundamental value of the three financial instruments but focus entirely on capturing short-term price dynamics.
The artificial traders' profits are not determined by the difference between their entry price and the fundamental value of the three assets, but by the difference between their entry and exit prices.
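The frequency decomposition of profits can be illustrated with a minimal sketch based on the discrete Fourier transform. It assumes that total profit is the sum over time of inventory multiplied by the price change accruing to that inventory, and uses Parseval's identity so that the per-frequency components add up to the total. The notation, the function names and the toy series are ours and are not those of Equations (26) and (27).

```python
import cmath


def dft(series):
    """Plain discrete Fourier transform (O(n^2), adequate for a short example)."""
    n = len(series)
    return [sum(series[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def profit_by_frequency(inventory, price_changes):
    """Per-frequency profit components whose sum equals sum_t x_t * dp_t.

    inventory[t] is assumed to be the position held while price_changes[t] accrues.
    """
    n = len(inventory)
    X, D = dft(inventory), dft(price_changes)
    # Only the real part is kept; the imaginary parts cancel in the sum.
    return [(X[k] * D[k].conjugate()).real / n for k in range(n)]


# Toy series: inventory cycles around zero, price changes in dollars.
inventory = [0, 1, 2, 1, 0, -1, -2, -1]
price_changes = [0.01, 0.02, -0.01, 0.00, -0.02, 0.01, 0.03, -0.01]
components = profit_by_frequency(inventory, price_changes)
total_profit = sum(x * dp for x, dp in zip(inventory, price_changes))
```

Grouping the components into frequency bands would then give a table of profits by time scale in the spirit of Table 6.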
Consistent with Carrion (2013), we observe that the fastest HFT scalpers generate a larger fraction of profits, indicating that they possess better models than strategic informed traders for predicting intraday price evolution. Our empirical findings suggest several broad conclusions. We observe that profitable opportunities exist and they persist for considerable periods and are not the result of data mining.
Empirical findings based on spectral analysis indicate that trading rules become less profitable as markets become aware of their existence. The concept of 'ecological rationality' developed by Todd and Gigerenzer (2003) offers a reasonable explanation as to why artificial traders equipped with relatively simple decision-making mechanisms perform well in forecasting exercises. It seems that this forecasting success is due to the fact that the structure of the mechanism is adapted to the structure of the information in the stock market environment. Different heuristics enable artificial traders to be ecologically rational, making adaptive decisions that combine forecasting accuracy with speed. The 'ecological rationality' phenomenon describes the structure of the market environment as well as the structure of the heuristics and the match between them. All 100,000 artificial traders in our experiment use relatively simple heuristics for forecasting purposes in an ecologically rational way, using rather limited information based on the historical quotes of the three financial instruments. Moreover, we have found that market participants learn, compete, adapt, survive and evolve toward a higher degree of sophistication. These findings suggest that our artificial stock markets can be viewed as coevolving ecological systems which consist of 100,000 heterogeneous traders competing for profits in each market. Such ecological systems not only experience varying degrees of efficiency but also cycles of efficiency and adaptability, as changes in profit opportunities lead to shifts in the composition of market participants. All these findings are consistent with Blume and Easley (1992), who argue that the 'market selection hypothesis', which is governed by natural selection and survival of economic agents, provides a better representation of market efficiency. Furthermore, the concept of 'ecological rationality' indicates that the relationship between market efficiency and the rational 'economic agent' postulated by the traditional economic theory is rather inappropriate.

Robustness checks.
Robustness checks are necessary for our main results. We obtain historical millisecond data for Russell 1000 and Russell 2000. A direct in-sample and out-of-sample comparison between traditional forecast combination models and models designed using the STGP indicates the superiority of the latter (Tables 7 and 8, Panels A and B). All forecasting errors for STGP are significantly smaller than the errors produced by the SVR, LASSO and Kalman Filter. Similarly to our experiment with the three individual stocks, we note that STGP presents the best statistical results in all in-sample and out-of-sample periods for Russell 1000 and Russell 2000, even when we increase the out-of-sample period. This demonstrates the stability of our empirical findings.

5.Conclusion
Despite the large number of studies on the EMH both from developing and developed markets, a consensus on whether financial markets are efficient continues to be elusive. Although there is striking evidence that stock prices do not follow random walks and possess some degree of predictability, there is a lack of strong alternative theoretical explanations to the EMH. Lo (2004) proposes the AMH, under which market participants adapt to changing market conditions. We aim to fill this gap by evaluating the evolution of stock price predictability at the millisecond timeframe and examining whether its evolution is consistent with the notion of the AMH.
Our forecasting exercise shows that a direct in-sample and out-of-sample comparison between traditional forecast combination models and models designed using STGP indicates the superiority of the latter. The RMSE, MAE and MAPE for STGP are significantly smaller than the errors produced by the benchmark forecasting models such as SVR, LASSO and the Kalman Filter. STGP gives the best statistical results in all in-sample and out-of-sample periods for Apple, Exxon Mobil and Google, as measured by the lowest forecasting errors. Moreover, we have found that STGP forecasting models generate in-sample daily profits ranging from $57 on 2nd of February, 2014 to $177 recorded on 29th of May, 2014. We also observe substantial out-of-sample profits generated by our artificial traders. Our findings indicate that profit opportunities are not best identified analytically, but through an evolutionary process involving natural selection. The process of natural selection ensures the survival-of-the-fittest, that is, the ability to cope better with changing market conditions, market dynamics and trading opportunities. Natural selection in our experiment also determines the nature of market participants (the worst performing traders are eliminated from the experiment and replaced by new ones) and trading strategies. In other words, natural selection is in place to select the fittest market participants within an evolutionary framework in which artificial traders interact and evolve dynamically. We observe that our artificial traders evolve their initial random trading strategies toward trading strategies with a higher degree of sophistication. Different heuristics enable artificial traders to be ecologically rational, making adaptive decisions that combine forecasting accuracy with speed.
These findings suggest that our artificial stock markets can be viewed as evolving ecological systems which consist of 100,000 heterogeneous traders competing for profits in each market. Therefore, our empirical findings indicate that the relationship between market efficiency and the rational 'economic agent' postulated by the traditional economic theory is rather inappropriate and that financial markets should be viewed from a Darwinian survival-of-the-fittest perspective and, more specifically, within an evolutionary framework in which markets, financial instruments, and traders interact, compete, learn, adapt and evolve dynamically, as suggested by the AMH.
However, it should be noted that there is a difference between human intelligence and artificial trader intelligence. Artificial traders are programmed to obey orders and perform certain tasks as per the commands given to them. Moreover, artificial traders lack feelings and emotions, while human beings can feel various emotions and also express these emotions to others. The main limitation of this particular experiment comes from the fact that the computational memory of artificial traders is rather short-term in nature. Short-term memory is the type of memory that is used to manipulate information. Engle et al. (1999) demonstrate a weak relationship between short-term memory and human intelligence, indicating the limitations of the cognitive abilities of artificial traders.

Tables

Table 1: Artificial stock market parameter settings.
a denotes the maximum genome size, measuring the total number of nodes in a trader's trading rule. A node is a gene in the genome such as a function or a value.
b denotes the maximum genome depth, measuring the highest number of hierarchical levels that occurs in a trader's genome (trading rule). The depth of a trading rule can be an indicator of its complexity.
c is the minimum age required for agents to qualify for potential participation in the initial selection. The age of a trader is represented by the number of quotes that have been processed since the trader was created. This measure also specifies the period over which agent performance will be compared. Our minimum breeding age is set to 80, which means that the trader's performance after the last 80 quotes will be compared.
d is the 5% of the best performing traders of the initial selection that will act as parents in crossover operations for creating new traders.

Genetic Programming (GP) and Strongly Typed Genetic Programming (STGP)
GP is a machine-learning method that automates the development of computer programs by mimicking the process of natural evolution (Banzhaf et al., 1998). One population of programs is replaced by a next generation of programs that are better suited to performing the designated task. This is done by using genetic algorithms that are based on evolutionary principles. In the GP process, individuals in a population compete and mate with each other in order to generate increasingly stronger individuals (Lien et al., 2003). GP uses parse trees (also known as tree genomes), rather than lines of program code, as the basis of the algorithms. For instance, Figure 1 illustrates a tree that uses the input variables x = (a,b,c) to calculate the mathematical expression a - 2b + 9/c. The parse trees represent the trading rules of HFT scalpers and strategic informed traders in our experiment. The typical genetic structure of a trading rule consists of hundreds of nodes and is rather complex to write out, but it can be simplified to an equivalent algorithmic trading rule as shown below. Figure 3 indicates that the trading rule of HFT scalpers sends a buy signal if the average stock price over the past 1 millisecond is greater than the current price and the current volume is less than 500, while a sell signal is sent otherwise. The current volume function protects HFT scalpers from sweep risk exposure. Large losses caused by sweeps (adverse price movements against HFT scalpers' transient positions) can substantially reduce or even eliminate their profitability, so the management of sweep risk is of paramount importance for HFT scalpers. HFT scalpers use the market microstructure to capture and avoid sweep risk, which is the risk related to trading against large informed toxic orders (for instance, large institutional orders) positioned at multiple levels of the order book (Manahov, 2016).
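A parse tree of this kind can be evaluated recursively, as in the following minimal sketch. The nested-tuple representation, the function names and the argument names are illustrative assumptions and do not reproduce the software used in our experiment; the tree encodes the expression a - 2b + 9/c and the rule encodes the buy and sell conditions described for Figure 3.

```python
import operator

# Binary arithmetic functions available as internal nodes.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(tree, env):
    """Evaluate a nested-tuple parse tree.

    Internal nodes are (operator, left, right); leaves are either
    variable names looked up in `env` or numeric constants.
    """
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    return env[tree] if isinstance(tree, str) else tree


# Parse tree for the expression a - 2b + 9/c.
EXPRESSION = ("+", ("-", "a", ("*", 2, "b")), ("/", 9, "c"))


def scalper_signal(avg_price_1ms, current_price, current_volume):
    """Buy if the 1 ms average price exceeds the current price and volume < 500."""
    if avg_price_1ms > current_price and current_volume < 500:
        return "buy"
    return "sell"


value = evaluate(EXPRESSION, {"a": 10, "b": 2, "c": 3})
```

With a = 10, b = 2 and c = 3 the tree evaluates to 10 - 4 + 3 = 9.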
STGP is an advanced version of GP with more sophisticated application of generic functions and data types. In conventional GP there are no intrinsic constraints on the data types that can be used as nodes. This means the user must specify all the programs and variables that can be used as nodes in a parse tree and deal with a search space of the order of $10^{30}$ to $10^{40}$ (Montana, 1995). STGP is set up so that there is an allotted type for each constant and each variable. Thus, there is a requirement in STGP that the type of data for each element is specified in advance. As a result, the initialization process and the different genetic operators will only support the construction of syntactically correct (i.e. legitimate) trees, and this is beneficial to the process because the search space is reduced (Haynes et al., 1996). STGP effectively reduces the size of the state space which must be searched. The STGP search space is represented by the set of all legitimate parse trees, which means that all functions have the correct number of parameters of the correct type. The STGP parse tree is also usually limited to some maximum depth (in our experiments this is set to 20), which keeps the search space finite and manageable by preventing trees from growing to an excessive size.
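The notion of a legitimate tree can be illustrated with a small type checker. The function signatures, terminal types and names below are hypothetical examples chosen for the sketch; only the maximum depth of 20 is taken from the text.

```python
# Hypothetical typed function set: name -> (argument types, return type).
SIGNATURES = {
    "and": (("bool", "bool"), "bool"),
    ">": (("float", "float"), "bool"),
    "<": (("float", "float"), "bool"),
    "avg": (("float", "int"), "float"),
}
# Hypothetical typed terminals.
TERMINAL_TYPES = {"price": "float", "volume": "float"}


def infer_type(tree, level=1, max_depth=20):
    """Return the type of a tree, raising TypeError if it is not legitimate."""
    if level > max_depth:
        raise TypeError("tree exceeds the maximum depth")
    if isinstance(tree, tuple):
        name, *children = tree
        arg_types, return_type = SIGNATURES[name]
        if len(children) != len(arg_types):
            raise TypeError(f"{name} expects {len(arg_types)} arguments")
        for child, expected in zip(children, arg_types):
            actual = infer_type(child, level + 1, max_depth)
            if actual != expected:
                raise TypeError(f"{name} expects {expected}, got {actual}")
        return return_type
    if isinstance(tree, str):
        return TERMINAL_TYPES[tree]
    if isinstance(tree, bool):  # checked before int because bool subclasses int
        return "bool"
    return "int" if isinstance(tree, int) else "float"


def is_legitimate(tree, root_type="bool", max_depth=20):
    """True if every node is well typed and the root returns `root_type`."""
    try:
        return infer_type(tree, 1, max_depth) == root_type
    except (TypeError, KeyError):
        return False
```

Restricting initialization and the genetic operators to trees for which such a check succeeds is what shrinks the search space relative to untyped GP.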
Both GP and STGP processes follow a series of steps:
1. An initial population of programs is created randomly, each one represented by a different parse tree. The random nature of the initial rules ensures that a large variety of possible trading rules is investigated in the experiment.
2. These programs are then run and the programs producing the best solutions in the first generation are permitted to 'breed' and produce the next generation of programs. The best solutions are those that most closely match a specified fitness criterion. Because every pair of parents creates two offspring trading rules, the number of parents and the number of offspring are equal.
3. The newly created trading rules replace poorly performing trading rules in the initial selection based on the replacement fitness return. The breeding of next generations of programs continues until the solution reached is deemed sufficiently adequate.
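The selection and replacement steps above can be sketched as follows. The sketch is a simplification: it assumes that each trader carries an age (number of quotes processed) and a single fitness number, uses the minimum breeding age of 80 and the 5% parent share from Table 1, and replaces the lowest-fitness traders one for one. The function names and the dictionary layout are hypothetical.

```python
def select_parents(traders, min_age=80, parent_share=0.05):
    """Pick the best-performing eligible traders as parents.

    traders: list of dicts with keys 'age' and 'fitness'.
    An even number of parents is returned so that they can be paired.
    """
    eligible = [t for t in traders if t["age"] >= min_age]
    eligible.sort(key=lambda t: t["fitness"], reverse=True)
    n_parents = max(2, int(len(eligible) * parent_share))
    n_parents -= n_parents % 2  # parents breed in pairs
    return eligible[:n_parents]


def replace_worst(traders, offspring):
    """Drop the lowest-fitness traders and add the offspring in their place."""
    ranked = sorted(traders, key=lambda t: t["fitness"], reverse=True)
    survivors = ranked[:len(ranked) - len(offspring)]
    return survivors + offspring
```

Because each pair of parents produces two offspring, the population size is unchanged after replacement.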
The STGP process produces syntactically correct programs or trading rules as the generations progress through crossover and mutation. Crossover is the outcome of two parent programs having been selected as suitable to procreate. After the selection of two parent programs, a crossover point is determined by randomly selecting a functional or terminal node within each parent and producing a child program which consists of random elements of the parent programs (How et al., 2009). Mutation occurs when an element of the parent program is randomly modified during breeding, with a resultant effect upon the child program. In our experiment, the crossover process can be described in the following way (Figure 4). The trading strategies Si and Sj are the two parents and a one-point crossover has been used to create children (new trading rules) SK and SI. Therefore, a breaking point has been randomly selected and parts of the two parent trading rules are exchanged to create two new trading rules.
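The exchange of subtrees between two parents can be sketched as below. This is a plain, untyped one-point subtree crossover on nested tuples; an STGP implementation would additionally require the two exchanged subtrees to have the same type and would respect the maximum depth, which the sketch omits. All names are hypothetical.

```python
import random


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for index, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (index,))


def get_subtree(tree, path):
    """Return the subtree found by following `path` from the root."""
    for index in path:
        tree = tree[index]
    return tree


def put_subtree(tree, path, subtree):
    """Return a copy of `tree` with the node at `path` replaced by `subtree`."""
    if not path:
        return subtree
    index = path[0]
    replaced = put_subtree(tree[index], path[1:], subtree)
    return tree[:index] + (replaced,) + tree[index + 1:]


def crossover(parent_i, parent_j, rng):
    """Swap one randomly chosen subtree between two parents, giving two children."""
    path_i = rng.choice(list(paths(parent_i)))
    path_j = rng.choice(list(paths(parent_j)))
    child_k = put_subtree(parent_i, path_i, get_subtree(parent_j, path_j))
    child_l = put_subtree(parent_j, path_j, get_subtree(parent_i, path_i))
    return child_k, child_l


# Two small parent trees and a seeded generator for reproducibility.
parent_i = ("+", "a", ("*", 2, "b"))
parent_j = ("-", ("/", 9, "c"), "a")
child_k, child_l = crossover(parent_i, parent_j, random.Random(0))
```

Since the two subtrees are exchanged rather than copied, the total number of nodes across the two children equals that across the two parents.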