Different distance measures for fuzzy linear regression with Monte Carlo methods

The aim of this study was to determine the best distance measure for estimating the fuzzy linear regression model parameters with Monte Carlo (MC) methods. Only one distance measure has been used for fuzzy linear regression with MC methods in the literature. Therefore, three different definitions of distance measure between two fuzzy numbers are introduced. Estimation accuracies of the existing and proposed distance measures are explored with a simulation study. The distance measures are compared to each other in terms of estimation accuracy; this study demonstrates that the best distance measures to estimate fuzzy linear regression model parameters with MC methods are those defined by Kaufmann and Gupta (Introduction to fuzzy arithmetic theory and applications. Van Nostrand Reinhold, New York, 1991), Heilpern-2 (Fuzzy Sets Syst 91(2):259–268, 1997) and Chen and Hsieh (Aust J Intell Inf Process Syst 6(4):217–229, 2000). On the other hand, the worst distance measure is the one used by Abdalla and Buckley (Soft Comput 11:991–996, 2007; Soft Comput 12:463–468, 2008). These results would be useful to enrich the studies that have already focused on fuzzy linear regression models.


Introduction
In many real-life cases, most of the data are only approximately known. Fuzzy set theory, introduced by Zadeh (1965), has found important application areas in different fields of science as well as in regression analysis, because it helps to define the vague relationship between variables or the observations that are reported as imprecise quantities.
Fuzzy regression analysis was introduced by Tanaka et al. (1982). After this first attempt at using fuzzy sets in regression analysis, several different approaches to parameter estimation in fuzzy regression analysis have been proposed. Bardossy (1990) showed how the problem of fuzzy regression can be formulated as a mathematical problem. Tanaka and Lee (1999) considered interval regression analysis. Another approach to fuzzy regression is the fuzzy least squares approach proposed by Diamond (1987). Peters (1994), Luczynski and Matloka (1995), Tanaka et al. (1995), and Yen et al. (1999) are some of the authors who focused on crisp input and fuzzy output regression models. D'Urso (2003) carried out fuzzy linear regression analysis for fuzzy/crisp input and fuzzy/crisp output data. Moreover, Roh et al. (2012) presented a new estimation approach based on polynomial neural networks for fuzzy linear regression. Recently, a generalized maximum entropy estimation approach to the fuzzy regression model was introduced by Ciavolino and Calcagni (2016). All of the above approaches to fuzzy regression require complex mathematical operations and long calculations.
Another approach to fuzzy regression was introduced by Abdalla and Buckley (2007, 2008) using MC methods. In this method, several random crisp or fuzzy vectors are generated as candidate regression coefficient vectors. Then, using these random vectors, the dependent variable is calculated. Two error measures are obtained from the difference between the observed and estimated values of the dependent variable in order to decide on the best random vector for parameter estimation. One of these error measures depends on the error measure defined by Kim and Bishu (1998). In this error measure, the distance between two fuzzy numbers has to be calculated. Therefore, the distance measure between two fuzzy numbers plays a key role in estimating fuzzy linear regression model parameters with MC methods. However, current studies about MC methods in fuzzy linear regression do not account for different definitions of distance measure between fuzzy numbers.
The main contribution of this study to the literature is to identify the appropriate distance measure between two fuzzy numbers for the estimation of fuzzy linear regression model parameters with MC methods. To this end, the definitions of distance measure between two fuzzy numbers introduced by Kaufmann and Gupta (1991), Heilpern (1997), and Chen and Hsieh (2000) are used in the error measure used by Abdalla and Buckley (2007, 2008). A simulation study is conducted to evaluate the estimation accuracy of the new and existing distance measures. The best distance measure and the one that should not be used for the estimation of fuzzy linear regression parameters with MC methods are identified without using any mathematical programming or heavy fuzzy arithmetic operations.
The rest of the paper is organized as follows: Sect. 2 contains some basic definitions and notations about fuzzy sets and fuzzy numbers. Section 3 includes the properties of fuzzy linear regression and the parameter estimation procedure in fuzzy linear regression with MC methods. Section 4 defines the distance measures for fuzzy numbers that are taken into account for parameter estimation in this study. The simulation study that compares the performances of the distance measures is conducted in Sect. 5. After the best and the worst distance measures in MC methods for fuzzy linear regression models are determined, these distance measures are applied to real data sets in Sect. 6. The paper concludes with some discussion of applications and possible future research in Sect. 7.

Preliminaries
In this section, some important definitions of fuzzy concepts that are used throughout the paper are recalled.
Definition 2.1 $\mu_A(x)$ is the membership function of an element $x$ belonging to a fuzzy set $A$, where $0 \le \mu_A(x) \le 1$. If $\mu_A(x) = 1$, then $x$ belongs to $A$; on the other hand, if $\mu_A(x) = 0$, then $x$ does not belong to $A$. Crisp sets are considered special cases of fuzzy sets whose membership values are always 0 or 1 (Dubois and Prade 1978).
Definition 2.2 A general fuzzy number $A$ is a normal convex fuzzy set with a piecewise continuous membership function (Dubois and Prade 1978). The trapezoidal fuzzy number is the simplest form of fuzzy number. It is defined by four parameters, $A = [a_1, a_2, a_3, a_4]$: the inner borders $a_2$, $a_3$ and the spreads $a_2 - a_1$, $a_4 - a_3$. The left and right sides of the fuzzy number are $L(x) = \frac{a_2 - x}{a_2 - a_1}$ and $R(x) = \frac{x - a_3}{a_4 - a_3}$, respectively. When $a_2 = a_3$, a triangular fuzzy number is obtained. The conditions $a_1 = a_2$ and $a_3 = a_4$ imply a closed interval. In the case $a_1 = a_2 = a_3 = a_4$, a crisp number is obtained (Heilpern 1997).
Definition 2.4 $v_k = (v_{0k}, \ldots, v_{mk})$ is called a random crisp vector, where the $v_{ik}$ are all real numbers in the intervals $I_i$, $i = 0, 1, \ldots, m$. First, random crisp vectors $v_k = (x_{0k}, \ldots, x_{mk})$ with all $x_{ik}$ in $[0, 1]$, $k = 1, 2, \ldots, N$, are generated. Then all $x_{ik}$ are mapped into the intervals $I_i$ (Abdalla and Buckley 2008).
Definition 2.5 $V_k = (V_{0k}, \ldots, V_{mk})$ is called a random fuzzy vector, where the $V_{ik}$ are all triangular fuzzy numbers. First, crisp vectors $v_k = (x_{1k}, \ldots, x_{3m+3,k})$ with all $x_{ik}$ in $[0, 1]$, $k = 1, \ldots, N$, are generated. Then the first three numbers in $v_k$ are chosen and ordered from smallest to largest. If, for example, $x_{3k} < x_{1k} < x_{2k}$, then the first triangular fuzzy number is $V_{0k} = (x_{3k}/x_{1k}/x_{2k})$. The other $V_{ik}$ are generated with the next three numbers in $v_k$. In order to obtain the $V_{ik}$ in certain intervals, it is possible to map all $x_{ik}$ into the intervals $I_i$ (Abdalla and Buckley 2007).
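The generation steps of Definitions 2.4 and 2.5 can be sketched as follows. This is a minimal illustration rather than the implementation of Abdalla and Buckley: the affine rescaling `lo + (hi - lo) * x` is an assumed way of placing a uniform draw into an interval, and the function names and the example intervals are hypothetical.

```python
import random


def random_crisp_vector(intervals, rng):
    """Definition 2.4 sketch: one uniform draw per coefficient, rescaled to its interval."""
    vec = []
    for lo, hi in intervals:
        x = rng.random()                 # x in [0, 1)
        vec.append(lo + (hi - lo) * x)   # assumed affine map into [lo, hi]
    return vec


def random_fuzzy_vector(intervals, rng):
    """Definition 2.5 sketch: three uniform draws per coefficient, sorted into (left/peak/right)."""
    vec = []
    for lo, hi in intervals:
        draws = sorted(rng.random() for _ in range(3))
        vec.append(tuple(lo + (hi - lo) * x for x in draws))
    return vec


rng = random.Random(0)
intervals = [(0.0, 5.0), (-2.0, 2.0), (1.0, 10.0)]  # hypothetical I_0, I_1, I_2
v = random_crisp_vector(intervals, rng)   # candidate (v_0k, v_1k, v_2k)
V = random_fuzzy_vector(intervals, rng)   # candidate (V_0k, V_1k, V_2k)
```

Sorting the draws before rescaling keeps each triple ordered, so every generated coefficient is a valid triangular fuzzy number inside its interval.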

Fuzzy linear regression with Monte Carlo methods
One of the most important objectives of a regression model is to estimate the value of the dependent variable associated with the independent variable(s) as close to the observed data as possible. Choi and Buckley (2007) classified fuzzy regression models into three categories:
- Input and output data are both crisp.
- Input data are crisp and output data are fuzzy.
- Input and output data are both fuzzy.
In this paper only the second and the third categories are examined because the first category is an ordinary regression model.
The fuzzy linear regression model where input data are crisp and output data are fuzzy (Case-II) is expressed as follows:
$$Y_l = A_0 + A_1 x_{1l} + \cdots + A_m x_{ml} \tag{1}$$
Here, $A_i$ is a triangular fuzzy number with membership function $\mu_{A_i}$ and $x_{il}$ is a real number, $i = 0, 1, \ldots, m$ and $l = 1, 2, \ldots, n$. The membership function of $Y_l$ is $\mu_{Y_l}$.
The third category (Case-III) consists of fuzzy input variables and fuzzy outputs. It is given as follows:
$$Y_l = a_0 + a_1 X_{1l} + \cdots + a_m X_{ml} \tag{2}$$
In this model, $X_{il}$ and $Y_l$, for $i = 1, 2, \ldots, m$ and $l = 1, 2, \ldots, n$, are triangular fuzzy numbers and $a_i$ is a crisp number.
In the parameter estimation process for fuzzy linear regression with MC methods, possible solutions are generated randomly and inferior solutions are discarded. This is repeated $N$ times, where $N$ is usually 10,000 or 100,000. Predicted values are determined by using the randomly generated $V_k = (V_{0k}, V_{1k}, \ldots, V_{mk})$ for Case-II with Eq. 3 and $v_k = (v_{0k}, v_{1k}, \ldots, v_{mk})$ for Case-III with Eq. 4, where $k = 1, 2, \ldots, N$ (Abdalla and Buckley 2007, 2008).
In Case-II and Case-III, the given values ($Y_l$) and the predicted values ($Y^*_{lk}$) are triangular fuzzy numbers and have membership functions. Hence, the difference between these membership values should be used to measure how well the estimated fuzzy linear regression model fits the given data.
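As an illustration of how predicted values can be obtained from a candidate vector, the sketch below assumes that Eqs. 3 and 4 have the same linear form as the models of Case-II and Case-III, with the coefficients replaced by the random vector. Triangular fuzzy numbers are stored as `(left, peak, right)` tuples; the helper names `scale`, `add`, `predict_case2` and `predict_case3` are hypothetical.

```python
def scale(tri, x):
    """Multiply a triangular fuzzy number (left, peak, right) by a crisp number x."""
    left, peak, right = tri
    if x >= 0:
        return (left * x, peak * x, right * x)
    return (right * x, peak * x, left * x)  # a negative factor swaps the end points


def add(p, q):
    """Add two triangular fuzzy numbers vertex by vertex."""
    return tuple(a + b for a, b in zip(p, q))


def predict_case2(V, x_row):
    """Case-II: V = fuzzy coefficients (V_0, ..., V_m), x_row = crisp inputs (x_1, ..., x_m)."""
    y = V[0]
    for coef, x in zip(V[1:], x_row):
        y = add(y, scale(coef, x))
    return y


def predict_case3(v, X_row):
    """Case-III: v = crisp coefficients (v_0, ..., v_m), X_row = triangular fuzzy inputs."""
    y = (v[0], v[0], v[0])
    for coef, X in zip(v[1:], X_row):
        y = add(y, scale(X, coef))
    return y


y2 = predict_case2([(0, 1, 2), (1, 2, 3), (-1, 0, 1)], (2, -1))
y3 = predict_case3([1, 2, -1], [(0, 1, 2), (1, 2, 3)])
```

Because a triangular fuzzy number multiplied by a crisp number remains triangular, and sums of triangular numbers are triangular, both cases yield triangular predictions, in line with the statement that $Y_l$ and $Y^*_{lk}$ are triangular fuzzy numbers.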
The sum of the differences between the observed and predicted fuzzy numbers is calculated with Eq. 5, where $D$ represents the difference of membership values between two membership functions (Kim and Bishu 1998).
Since the predicted fuzzy number is expected to have a membership function close to the observed one, the error of fit of the membership functions can be defined by the ratio of the difference of membership values to the observed membership values. This is given with Eq. 6 (Kim and Bishu 1998).
If the difference of membership values between two membership functions, $D$, becomes zero, the error of fit, $E$, also becomes zero. This error measure was introduced by Kim and Bishu (1998).
In the light of the error measure defined by Kim and Bishu (1998), two different error measures were defined by Abdalla and Buckley (2007, 2008) for fuzzy linear regression with MC methods. These measures are based on the given values $Y_l$ and the predicted values $Y^*_{lk}$, and they are used to assess the accuracy of the candidate vectors $V_k$ and $v_k$. The first one, $E_{1k}$, is given with Eq. 7, where the integrals are taken only over intervals containing the support of the fuzzy numbers. The second error measure, $E_{2k}$, is given with Eq. 8, where $Y_l = (y_{l1}/y_{l2}/y_{l3})$ and $Y^*_{lk} = (y_{lk1}/y_{lk2}/y_{lk3})$ are all triangular fuzzy numbers.
So $V_k$ and $v_k$ are obtained for the regression models given with Eqs. 1 and 2, respectively. Then $E_{1k}$ and $E_{2k}$ are calculated for $k = 1, 2, \ldots, N$. The best solutions are the $V_k \in \{V_1, \ldots, V_N\}$ for Case-II and the $v_k \in \{v_1, \ldots, v_N\}$ for Case-III that minimize $E_{1k}$ and $E_{2k}$. Hence, there are two best solutions, one with respect to $E_{1k}$ and another with respect to $E_{2k}$.
In this study, we consider the first error measure ($E_1$), because it is computed from the difference between the two membership functions ($Y_l$ and $Y^*_{lk}$).
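The two error measures can be sketched numerically as follows. The forms used here are assumptions inferred from the description above, not formulas quoted from Abdalla and Buckley: $E_1$ is taken as the sum over observations of $\int |Y_l(x) - Y^*_{lk}(x)|\,dx \,/ \int Y_l(x)\,dx$, and $E_2$ as the sum of absolute differences between the three vertices. The function names are hypothetical, and the sketch requires non-degenerate observed numbers (left end point strictly below right end point).

```python
def membership(tri, x):
    """Membership of x in the triangular fuzzy number tri = (left, peak, right)."""
    left, peak, right = tri
    if left < x <= peak:
        return (x - left) / (peak - left)
    if peak < x < right:
        return (right - x) / (right - peak)
    return 1.0 if x == peak else 0.0


def integrate(f, lo, hi, steps=2000):
    """Composite trapezoidal rule; adequate for piecewise linear integrands."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


def e1(observed, predicted):
    """Assumed form of E1: sum over l of int|Y_l - Y*_l| dx / int Y_l dx."""
    total = 0.0
    for y, y_hat in zip(observed, predicted):
        # Integrate only over an interval containing both supports.
        lo, hi = min(y[0], y_hat[0]), max(y[2], y_hat[2])
        diff = integrate(lambda x: abs(membership(y, x) - membership(y_hat, x)), lo, hi)
        area = 0.5 * (y[2] - y[0])  # exact area under a triangular membership function
        total += diff / area
    return total


def e2(observed, predicted):
    """Assumed form of E2: summed absolute differences of the three vertices."""
    return sum(abs(a - b) for y, y_hat in zip(observed, predicted) for a, b in zip(y, y_hat))


Y = [(0.0, 1.0, 2.0)]
perfect_fit = e1(Y, Y)                      # identical fuzzy numbers
disjoint_fit = e1(Y, [(10.0, 11.0, 12.0)])  # disjoint supports of equal width
vertex_error = e2(Y, [(1.0, 1.0, 4.0)])
```

Under these assumptions a perfect fit gives an error of zero, while a prediction whose support does not overlap the observation contributes the sum of both areas divided by the observed area, which is about 2 when the two numbers have the same shape.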

Distance measures for fuzzy numbers
Methods of measuring the distance between fuzzy numbers have become important due to their significant applications in diverse fields such as data mining (Sadi-Nezhad and Khalili Damghani 2010), pattern recognition (Wang and Xin 2005), and regression analysis (Abdalla and Buckley 2007, 2008). However, only one definition of the distance measure between two fuzzy numbers has been used in the literature for estimating fuzzy linear regression model parameters with MC methods. This section explains the new distance measures that are proposed for fuzzy linear regression with MC methods.
Kaufmann and Gupta (1991) considered a distance measure between two fuzzy numbers. It is built from the intervals of the α-cuts of the fuzzy numbers and is given with Eq. 9. In this equation, $[A_L(\alpha), A_U(\alpha)]$ and $[B_L(\alpha), B_U(\alpha)]$ are the closed intervals of the α-cuts of the fuzzy numbers $A$ and $B$.
Heilpern (1997) proposed three definitions of the distance between two fuzzy numbers.
1. The mean distance method is generated by the expected values of fuzzy numbers. The lower and the upper expected values of a fuzzy number are given with Eqs. 10 and 11, respectively. By using these values, the expected value of a fuzzy number is calculated with Eq. 12. Thus, Heilpern (1997) defined the distance between two fuzzy numbers $A$ and $B$ with respect to their expected values with Eq. 13.
2. The second distance method is generated by combining the Minkowski distance and the closed intervals of the α-cuts of fuzzy numbers. Let $A$ and $B$ be two fuzzy numbers; the distance between them is given with Eq. 14. Here, $A(\alpha)$ and $B(\alpha)$ are the closed intervals of the α-cuts of the fuzzy numbers $A$ and $B$, and $d_p(A(\alpha), B(\alpha))$, which is generated by the Minkowski distance, is given with Eq. 15. In many situations, the distance is calculated with $p = 1$ (Heilpern 1997).
3. The third distance method is based on the geometrical operation of fuzzy numbers. Let $A$ and $B$ be two fuzzy numbers; then the geometric distance between them is given with Eq. 16. In many situations, the distance is calculated with $p = 1$. This distance method is only used in the case of trapezoidal fuzzy numbers (Heilpern 1997). Hence, this distance is excluded from this study.
Chen and Hsieh (2000) defined the distance between two generalized fuzzy numbers by the graded mean integration representation (GMIR). The GMIR of a generalized fuzzy number $A$ is based on the integral value of the graded mean α-level. It is given with Eq. 17, where $0 < \alpha < w$ and $0 < w \le 1$.
Let $A = (a_1, a_2, a_3, a_4)$ be a fuzzy number. Chen and Hsieh (2000) formulated the GMIR of this fuzzy number with Eq. 18. The generalized triangular fuzzy number is the special case of the trapezoidal fuzzy number with $a_2 = a_3$; hence, the GMIR of a triangular fuzzy number is given with Eq. 19. The distance between two fuzzy numbers based on the GMIR is then defined with Eq. 20. All of the above methods use a crisp real number to express the distance between two fuzzy numbers (Hajjari 2010).
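For triangular fuzzy numbers, the four distance measures retained in this study can be sketched as below. The formulas are commonly cited forms and should be read as assumptions about Eqs. 9 to 20: the Kaufmann and Gupta distance is used in its unnormalized form $\int_0^1 (|A_L - B_L| + |A_U - B_U|)\,d\alpha$; the expected value of a triangular number $(a_1/a_2/a_3)$ is $(a_1 + 2a_2 + a_3)/4$; the interval distance in Heilpern-2 is $\left(\tfrac{1}{2}(|A_L - B_L|^p + |A_U - B_U|^p)\right)^{1/p}$; and the GMIR of a triangular number is $(a_1 + 4a_2 + a_3)/6$. All function names are hypothetical.

```python
def alpha_cut(tri, alpha):
    """Closed interval [A_L(alpha), A_U(alpha)] of a triangular fuzzy number."""
    left, peak, right = tri
    return left + alpha * (peak - left), right - alpha * (right - peak)


def _integrate_over_alpha(g, steps=1000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / steps
    return h * sum(g((i + 0.5) * h) for i in range(steps))


def kaufmann_gupta(A, B):
    """Assumed unnormalized form: integral of |A_L - B_L| + |A_U - B_U| over alpha."""
    def g(alpha):
        (al, au), (bl, bu) = alpha_cut(A, alpha), alpha_cut(B, alpha)
        return abs(al - bl) + abs(au - bu)
    return _integrate_over_alpha(g)


def heilpern_1(A, B):
    """Absolute difference of expected values (mean distance method)."""
    ev = lambda t: (t[0] + 2 * t[1] + t[2]) / 4
    return abs(ev(A) - ev(B))


def heilpern_2(A, B, p=1):
    """Minkowski-type distance between alpha-cuts, integrated over alpha (assumed form)."""
    def g(alpha):
        (al, au), (bl, bu) = alpha_cut(A, alpha), alpha_cut(B, alpha)
        return (0.5 * (abs(al - bl) ** p + abs(au - bu) ** p)) ** (1.0 / p)
    return _integrate_over_alpha(g)


def chen_hsieh(A, B):
    """Absolute difference of graded mean integration representations."""
    gmir = lambda t: (t[0] + 4 * t[1] + t[2]) / 6
    return abs(gmir(A) - gmir(B))


A, B, C = (0.0, 1.0, 2.0), (1.0, 2.0, 3.0), (0.0, 1.0, 4.0)
shifted = [d(A, B) for d in (kaufmann_gupta, heilpern_1, heilpern_2, chen_hsieh)]
stretched = [d(A, C) for d in (kaufmann_gupta, heilpern_1, heilpern_2, chen_hsieh)]
```

Under these assumed forms and with $p = 1$, the Heilpern-2 distance is exactly half of the unnormalized Kaufmann and Gupta distance, so the two measures would rank candidate vectors identically.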

Simulation
We conduct a simulation study to compare the estimation performances of the distance measures mentioned in the previous section. The intervals used in the simulation study are given in Table 1. Both sides of the real line and interval widths are considered in the determination of these intervals.
In Table 1, the interval $I_0$ is short and excludes negative numbers. The interval $I_1$ is short and includes negative numbers. The interval $I_2$ is long and excludes negative numbers. The interval $I_3$ is long and includes both negative and positive numbers. The interval $I_4$ is short and excludes positive numbers. The interval $I_5$ is long and includes only negative numbers.
We use the intervals $I_0$ to $I_5$ to estimate the regression parameters $A_0$, $A_1$ and $A_2$ for Case-II. Then we use the same intervals to estimate the regression parameters $a_0$, $a_1$ and $a_2$ for Case-III.
We obtain parameter estimates of the fuzzy linear regression models by using the MC method for both Case-II and Case-III, repeating the procedure $10^3$ times for each interval given in Table 1.
In both Case-II and Case-III, the dependent variable is a triangular fuzzy number. Hence, it is possible to measure the deviation between the observed and estimated values by using the left, center and right values of the triangular fuzzy numbers. For this purpose, we use the mean absolute error (MAE) given with Eq. 21.
We apply each scenario given in Table 1 in the simulation study, generate $10^4$ candidate solutions and apply the described MC method in order to obtain estimates of the parameters of the fuzzy linear regression model by using the distance measures given in Sect. 4. Afterward, for the solution with the minimum error given with Eq. 7, the differences between the estimated and observed values are calculated using the MAE with Eq. 21.
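A minimal sketch of an MAE over the left, center and right values of triangular fuzzy numbers is given below. The normalization is an assumption: the absolute differences are averaged over all $3n$ vertex values, which may differ from the exact form of Eq. 21. The function name `mae` is hypothetical.

```python
def mae(observed, estimated):
    """Mean absolute error over the (left, center, right) values of n triangular numbers."""
    n = len(observed)
    total = sum(
        abs(o - e)
        for y, y_hat in zip(observed, estimated)
        for o, e in zip(y, y_hat)
    )
    return total / (3 * n)  # assumed normalization: average over all 3n vertex values


observed = [(0.0, 1.0, 2.0), (1.0, 2.0, 3.0)]
estimated = [(0.0, 1.0, 3.0), (2.0, 2.0, 3.0)]
error = mae(observed, estimated)
```

In this toy example only two of the six vertex values differ, each by 1, so the averaged error is one third.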

Simulation study for Case-II
In each of the $10^3$ replications, we randomly generate data sets of size 10 from the Normal(15, 9) distribution for the first independent variable ($x_1$) and from the Normal(−3, 2) distribution for the second independent variable ($x_2$). Fuzzy numbers for the value of each parameter are randomly generated from a normal distribution with mean 1 and standard deviation 0.04 for $A_0$, with mean 4 and standard deviation 0.9 for $A_1$, and with mean −5 and standard deviation 0.1 for $A_2$. The corresponding values of the dependent variable are obtained from the model given in Eq. 1.
We apply each scenario given in Table 1 in the simulation study. We generate $10^4$ vectors and apply the MC method in order to obtain estimates of the parameters of the fuzzy linear regression model. Afterward, we consider the different definitions of distance measures for fuzzy numbers given in Sect. 4 in the error measure given with Eq. 7. Then, for the solution with the minimum error, the differences between the estimated and observed values are calculated using the MAE.
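One replication of this procedure for Case-II can be sketched as follows. Several points are assumptions made only for illustration: the way a distance measure enters the error (here, the distance between observed and predicted numbers divided by the area under the observed membership function), the toy data and intervals (which are not those of the simulation study), and the use of the Chen and Hsieh distance as the plugged-in measure. All names are hypothetical.

```python
import random


def scale(tri, x):
    """Triangular fuzzy number times a crisp number (end points swap if x < 0)."""
    left, peak, right = tri
    return (left * x, peak * x, right * x) if x >= 0 else (right * x, peak * x, left * x)


def predict(V, x_row):
    """Case-II prediction V_0 + V_1 x_1 + ... + V_m x_m for one observation."""
    y = V[0]
    for coef, x in zip(V[1:], x_row):
        y = tuple(a + b for a, b in zip(y, scale(coef, x)))
    return y


def chen_hsieh(A, B):
    """GMIR-based distance for triangular fuzzy numbers."""
    gmir = lambda t: (t[0] + 4 * t[1] + t[2]) / 6
    return abs(gmir(A) - gmir(B))


def random_fuzzy_vector(intervals, rng):
    """Three sorted uniform draws per coefficient, rescaled into its interval."""
    return [tuple(lo + (hi - lo) * u for u in sorted(rng.random() for _ in range(3)))
            for lo, hi in intervals]


def mc_search(X, Y, intervals, distance, n_candidates, rng):
    """Keep the candidate vector with the smallest error; discard the rest."""
    best_V, best_err = None, float("inf")
    for _ in range(n_candidates):
        V = random_fuzzy_vector(intervals, rng)
        # Assumed error: distance to the prediction, relative to the area under Y_l.
        err = sum(distance(y, predict(V, x)) / (0.5 * (y[2] - y[0])) for x, y in zip(X, Y))
        if err < best_err:
            best_V, best_err = V, err
    return best_V, best_err


# Hypothetical toy data (not the values of the simulation study).
data_rng = random.Random(0)
true_A = [(0.9, 1.0, 1.1), (3.5, 4.0, 4.5), (-5.1, -5.0, -4.9)]
X = [(data_rng.uniform(10, 20), data_rng.uniform(-5, -1)) for _ in range(10)]
Y = [predict(true_A, x) for x in X]
intervals = [(0.0, 2.0), (3.0, 5.0), (-6.0, -4.0)]

best_V, best_err = mc_search(X, Y, intervals, chen_hsieh, 2000, random.Random(1))
```

Passing the distance as a function argument makes it straightforward to rerun the same search with a different measure and compare the resulting estimates, which is the kind of comparison the simulation study carries out.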
Table 2 gives the simulation results of Case-II for the error measure MAE (minimum values of MAE are written in bold in Table 2). The results are as follows:
- For intervals $I_0$ and $I_5$, the minimum MAE values are reached with the distance measures described by Kaufmann and Gupta (1991), Heilpern-1 (1997), Heilpern-2 (1997) and Chen and Hsieh (2000). The maximum MAE value is obtained with the distance measure used by Abdalla and Buckley (2007).
- For intervals $I_1$, $I_2$ and $I_4$, the minimum MAE values are reached with the distance measures described by Kaufmann and Gupta (1991), Heilpern-1 (1997) and Heilpern-2 (1997). The maximum MAE value is again obtained with the distance measure used by Abdalla and Buckley (2007).
- For interval $I_3$, the minimum MAE values are reached with the distance measures described by Kaufmann and Gupta (1991) and Heilpern-2 (1997). The maximum MAE value is again obtained with the distance measure used by Abdalla and Buckley (2007).
As a result, the simulation shows that the best distance measures for estimating regression model parameters with the MC method for Case-II are those described by Kaufmann and Gupta (1991) and Heilpern-2 (1997), which attain the minimum MAE for every interval. On the other hand, the distance measure used by Abdalla and Buckley (2007) is not appropriate for estimating fuzzy linear regression model parameters with the MC method for Case-II.

Simulation study for Case-III
We randomly generate 10 triangular fuzzy numbers from a normal distribution with mean 0 and standard deviation 2 for the first independent variable ($X_1$), and with mean −3 and standard deviation 0.01 for the second independent variable ($X_2$). In addition, three crisp numbers for the values of the parameters are randomly generated from a normal distribution with mean −1 and standard deviation 0.02 for $a_0$, with mean 2 and standard deviation 0.01 for $a_1$, and with mean 3 and standard deviation 0.04 for $a_2$. The corresponding values of the dependent variable are obtained from the model given in Eq. 4.
We apply each scenario given in Table 1 in the simulation study, generate $10^4$ vectors and apply the described MC method in order to obtain estimates of the parameters of the fuzzy linear regression model. Afterward, for the solution with the minimum error, the differences between the estimated and actual values are calculated using the MAE.
Table 3 gives the simulation results of Case-III for the error measure MAE. The results are as follows:
- For intervals $I_0$, $I_2$, $I_3$ and $I_5$, the minimum MAE value is reached with the distance measures described by Kaufmann and Gupta (1991), Heilpern-1 (1997), Heilpern-2 (1997) and Chen and Hsieh (2000). The maximum MAE value is obtained with the distance measure used by Abdalla and Buckley (2008).
- For intervals $I_1$ and $I_4$, the minimum MAE value is reached with the distance measure described by Chen and Hsieh (2000). The maximum MAE value is again obtained with the distance measure used by Abdalla and Buckley (2008).
As a result, the simulation shows that the best distance measure for estimating regression model parameters with the MC method for Case-III is the one described by Chen and Hsieh (2000), which attains the minimum MAE for every interval. On the other hand, the distance measure used by Abdalla and Buckley (2008) is not appropriate for estimating fuzzy linear regression model parameters with the MC method for Case-III.

Application
This section presents two applications. The first application is for the second fuzzy regression model category (Case-II) and the other one is for the third fuzzy regression model category (Case-III).
We consider different distance measures for fuzzy numbers given in Sect. 4 in the error measure E 1 with Eq. 7 for fuzzy linear regression models with MC approach.

Application for Case-II
The data for this application are taken from Kim and Bishu (1998) and are given in Table 4. There are eight items and three independent variables in the data set. This data set was also studied by Tanaka (1987), Abdalla and Buckley (2007), Savic and Pedrycz (1991), and Choi and Buckley (2007) to compare their proposed methods with earlier work. We use these data to apply different distance measures in the error measure $E_1$ and to compare our new results with those of Abdalla and Buckley (2007).
Before the application, we have to decide on the intervals $I_i$, $i = 0, 1, 2, 3$, to obtain the model coefficients as explained in Definition 2.5. We use the same intervals as Abdalla and Buckley (2007) in order to compare our results with theirs. The four separate intervals (MCI, MCII, MCIII, MCIV) that they studied are given in Table 5. For the rationale behind these intervals, see Abdalla and Buckley (2007).
We apply different definitions of distance measure between two fuzzy numbers for estimating fuzzy linear regression model parameters in Case-II by using the data set given in Table 4. For this purpose, $N = 10^5$ random vectors ($V_k = (V_{0k}, V_{1k}, V_{2k}, V_{3k})$), which define the model parameters ($A_0$, $A_1$, $A_2$, $A_3$), are generated. The values of these parameters that give the minimum $E_1$ value are recorded for each definition of distance measure used in this error measure. Results for the $A_i$, $i = 0, 1, 2, 3$, are shown in Table 6 for each interval.
The value of the error measure $E_1$ is computed by using the different distance measure definitions given in Sect. 4. The results are shown in Table 7.
Table 7 shows that the smallest error value for interval MCI is obtained with the Heilpern-1 (1997) distance measure. In addition, the Chen and Hsieh (2000) distance measure gives the minimum error value for interval MCII, and the Abdalla and Buckley (2007) distance measure gives the minimum error value for MCIII and MCIV. The largest error values are obtained when the Kaufmann and Gupta (1991) and Heilpern-2 (1997) distance measures are used in the error measure $E_1$.

Application for Case-III
The data for this application are taken from Choi and Buckley (2007) and are shown in Table 8. There are ten items and two independent variables in the data set. This data set was also studied by Choi and Buckley (2007), Diamond and Korner (1997), and Abdalla and Buckley (2008) to compare their proposed methods with earlier work. We use these data to apply different distance measures in the error measure $E_1$ and to compare our new results with those of Abdalla and Buckley (2008).
Before the application, we have to decide on the intervals $I_i$, $i = 0, 1, 2$, to obtain the model coefficients as explained in Definition 2.4. We use the same intervals as Abdalla and Buckley (2008) in order to compare our results with theirs. The four separate intervals (MCI, MCII, MCIII, MCIV) that they studied are given in Table 9. For the rationale behind these intervals, see Abdalla and Buckley (2008).
We apply different definitions of distance measure between two fuzzy numbers for estimating fuzzy linear regression model parameters for Case-III by using the data set given in Table 8. For this purpose, $N = 10^5$ random vectors ($v_k = (v_{0k}, v_{1k}, v_{2k})$), which define the model parameters ($a_0$, $a_1$, $a_2$), are generated. The values of these parameters that give the minimum $E_1$ value are recorded for each definition of distance measure used in this error measure. Results for the $a_i$, $i = 0, 1, 2$, are shown in Table 10 for each interval.
Optimal solutions for $a_i$ ($i = 0, 1, 2$) are stated in Abdalla and Buckley (2008) as $a_0 = 4.19$, $a_1 = 4.97$ and $a_2 = 3.11$. Compared with these values, Heilpern-1 (1997) gives the closest parameter estimates for interval MCI, Kaufmann and Gupta (1991) and Heilpern-2 (1997) give the closest parameter estimates for intervals MCII and MCIII, and Abdalla and Buckley (2008) gives the closest parameter estimates for interval MCIV.
The error value $E_1$ for each distance measure is given for the defined intervals (MCI, MCII, MCIII, MCIV) in Table 11 for Case-III.
Table 11 shows that the smallest $E_1$ value for intervals MCI, MCII and MCIV is obtained with the Abdalla and Buckley (2008) distance measure. In addition, the distance measure defined by Heilpern (1997) gives the minimum $E_1$ value for interval MCIII.

Conclusion
In this study, we use different definitions of distance measure between two fuzzy numbers for estimating the parameters of fuzzy linear regression models with the Monte Carlo method. The motivation is that the distance between two fuzzy numbers plays a key role in the error measure, yet only one definition of distance measure has been used for fuzzy linear regression with MC methods in the literature. A simulation study is conducted to compare the estimation performances of the considered distance measures. Considering the simulation results overall, the minimum MAE values are reached with the distance measures described by Kaufmann and Gupta (1991) and Heilpern-2 (1997) for Case-II, and with the distance measure described by Chen and Hsieh (2000) for Case-III. The distance measure used by Abdalla and Buckley (2007, 2008) is shown to be unsuitable for estimating fuzzy linear regression model parameters with MC methods, since all maximum values of MAE are obtained with this distance measure for both Case-II and Case-III.
The results obtained can be used to enrich the studies that have already focused on fuzzy linear regression models. For example, the extreme learning machine (Liu et al. 2016) can be enhanced with the help of fuzzy linear regression with MC methods according to the best distance measures determined in this study. The fuzzy regression model based on least absolute deviation studied by Li et al. (2016) can be improved using MC methods with the help of the distance measures described by Kaufmann and Gupta (1991) or Heilpern-2 (1997).
Using fuzzy distance measures in fuzzy linear regression models with Monte Carlo methods is a potential area for future work, since all the distance measures discussed in this paper use a real number to express the distance between two fuzzy numbers. Moreover, other future research will be concerned with investigating different definitions of distance measure between fuzzy numbers in different types of regression models, such as nonparametric or exponential regression, or with considering different types of fuzzy numbers, such as trapezoidal or Gaussian, in these regression models.

Table 1
Intervals for Case-II and Case-III for the simulation study

Table 4
Data for the application (Case-II)

Table 6
Estimates of coefficients according to different distance measures under MC I

Table 9
Intervals for I i , i = 0, 1, 2 for Case-III