Regression assessment of the model based on the experimental planning matrix in composite materials' analysis problems

An automated calculation of regression model coefficients from experimental data given in the form of a planning matrix is considered. The calculations are based on polynomial regression, with optional inclusion of the interaction effects between its factors. The application of the Moore-Penrose pseudoinverse matrix for determining the regression equation coefficients is shown. The calculated planning matrix is chosen by taking into account the value of the determination coefficient and the rank of the calculated factor matrix. The calculation is verified using the rank correlation between the experimental and calculated response functions. The experimental part of the work determines the dependence of the fungal resistance and fungicidal properties of filled cement composites on the type and grain size composition of the filler.


Introduction
Composite materials, and cement composites in particular, are widely studied both to determine the most preferable composition and to analyze their properties under various operating conditions [1][2][3]. Much attention is paid to cement composites containing fillers of various types [4][5][6]. In many cases, the problem of determining a regression model that reflects the investigated relationships between the properties of composite materials is posed and solved [7][8][9]. For this purpose, the mathematical apparatus of experiment planning theory and statistical methods of processing experimental data are used [10][11][12]. To check the adequacy of the results obtained, the determination coefficient $R^2$ can be used [12]; it measures how close the values of the calculated response function are to the experimental ones. In addition, rank correlation coefficients can be used [13].
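As a minimal sketch of the adequacy indicator mentioned above, the determination coefficient can be computed as follows. The sketch assumes the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares; the function name `r_squared` and the sample numbers are illustrative and are not taken from the experiments.

```python
def r_squared(y, y_pred):
    """Determination coefficient, assuming R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    # Residual sum of squares: experimental minus calculated response.
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_pred))
    # Total sum of squares: spread of the experimental response around its mean.
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (hypothetical, not experimental data).
y_exact = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # calculated == experimental
y_mean = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])   # model predicts the mean only
print(y_exact, y_mean)
```

A value equal to one means the calculated response reproduces the experimental one exactly, while a model that only returns the mean of the response gives zero.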
Experimental data analysis can be performed using regression analysis methods and experiment planning. To obtain an adequate calculated regression model, interaction effects can be introduced into the planning matrix, and various existing regression methods can be applied. This paper presents the results of forming a computational planning matrix using polynomial regression, which rests on the principle of expanding differentiable functions in a Taylor series. The number of terms in polynomial regression can be increased by introducing the interaction effects of the real factors and by increasing the polynomial degree of the regression model. Modern programming environments such as Python provide modules and classes that implement polynomial regression.
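The growth in the number of terms can be illustrated with a short sketch that enumerates the monomials of a polynomial model, including interaction effects. It uses only the Python standard library rather than a ready-made polynomial regression class; the helper names (`polynomial_terms`, `term_label`, `expand_row`) and the sample factor values are assumptions made for illustration, and the term ordering is simply the one produced by `itertools`.

```python
from itertools import combinations_with_replacement


def polynomial_terms(n_factors, degree):
    """All monomials up to `degree`; each term is a tuple of factor indices."""
    terms = [()]  # the empty tuple stands for the intercept
    for d in range(1, degree + 1):
        terms.extend(combinations_with_replacement(range(n_factors), d))
    return terms


def term_label(term):
    """Readable label, e.g. (0, 0, 1) -> 'X1^2*X2'."""
    if not term:
        return "1"
    parts = []
    for i in sorted(set(term)):
        power = term.count(i)
        parts.append(f"X{i + 1}" + (f"^{power}" if power > 1 else ""))
    return "*".join(parts)


def expand_row(x, terms):
    """Evaluate every monomial for one experiment (one row of factor values)."""
    row = []
    for term in terms:
        value = 1.0
        for i in term:
            value *= x[i]
        row.append(value)
    return row


terms = polynomial_terms(3, 3)          # three factors, third-degree polynomial
labels = [term_label(t) for t in terms]
print(len(terms), labels)
row = expand_row([2.0, 3.0, 5.0], polynomial_terms(3, 2))  # hypothetical factor values
print(row)
```

For three factors, the term count grows from 4 at degree one to 10 at degree two and 20 at degree three, so raising the degree quickly widens the calculated planning matrix.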

Materials and methods
An example of an experimental planning matrix is presented in Table 1.
As Table 1 shows, the working factor matrix $X$ is generally rectangular, i.e., the number of its rows is not equal to the number of its columns ($n \neq k$). In this case, it is assumed that the rank of the matrix $X$ is equal to the dimension of the response function $Y$, i.e., $\operatorname{rank}(X) = n$. It then seems possible to solve the regression equation for the vector of coefficients:
$$Y = XC, \tag{1}$$
where $C$ is the vector of coefficients. To solve equation (1) for the vector of coefficients, we multiply it on the left by the Moore-Penrose pseudoinverse matrix $X^{+}$ [14]:
$$X^{+}Y = X^{+}XC. \tag{2}$$
If the rank of the matrix $X$ is $n$, then the matrix $X^{+}X$ will be the identity matrix, which yields the desired solution for the vector of coefficients:
$$C = X^{+}Y. \tag{3}$$
When the rank of the matrix $X$ is less than $n$, it is necessary to search for a conditionally extended matrix of factors $X_p$ by introducing interaction effects and powers of the factors in accordance with polynomial regression. Then equation (3) takes the form:
$$C = X_p^{+}Y. \tag{4}$$
Polynomial regression is well supported in the Python toolbox through the PolynomialFeatures class. The calculated response function $Y_p$ is then defined as:
$$Y_p = X_pC. \tag{5}$$
It is advisable to check the calculation results for adequacy in terms of the rank correlation between $Y$ and $Y_p$, using Pearson's linear correlation coefficient, Kendall's tau coefficient, and Spearman's rho. These methods are implemented, for example, in the MATLAB environment.
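The procedure described above can be sketched in Python with NumPy: the factor matrix is extended with powers and interaction effects until its rank reaches the number of experiments, after which the coefficients are obtained with the pseudoinverse and the calculated response is formed. This is an illustrative sketch, not the authors' program. The function names `expand` and `fit_by_pseudoinverse`, the `max_degree` limit and the five-row data set are assumptions, and the expansion is written by hand instead of calling PolynomialFeatures. In this example the extended matrix has more columns than rows, and the quantity checked numerically is that the product of the extended matrix and its pseudoinverse is the identity of order n, which is why the calculated response reproduces the experimental one.

```python
from itertools import combinations_with_replacement

import numpy as np


def expand(X, degree):
    """Extended matrix Xp: intercept, factors, powers and interaction effects."""
    n, m = X.shape
    cols = [np.ones(n)]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(m), d):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)


def fit_by_pseudoinverse(X, Y, max_degree=5):
    """Raise the degree until rank(Xp) == n, then return C = pinv(Xp) @ Y."""
    n = X.shape[0]
    for degree in range(1, max_degree + 1):
        Xp = expand(X, degree)
        if np.linalg.matrix_rank(Xp) == n:
            C = np.linalg.pinv(Xp) @ Y
            return degree, Xp, C
    raise ValueError("rank n was not reached within max_degree")


# Hypothetical planning matrix (5 experiments, 2 factors) and response.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
Y = np.array([1.0, 2.5, 0.7, 3.1, 4.2])

degree, Xp, C = fit_by_pseudoinverse(X, Y)
Yp = Xp @ C  # calculated response function

print(degree, Xp.shape, np.linalg.matrix_rank(Xp))
print(np.allclose(Xp @ np.linalg.pinv(Xp), np.eye(len(Y))))
print(np.allclose(Yp, Y))
```

With these illustrative data the first-degree matrix has rank 3, so the search moves on to the second degree, where the rank equals the number of experiments.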

Research results
As an object of research, we used the results from [15], some of which are shown in Table 2. The values $X_1$, $X_2$, $X_3$ are the columns of the working planning matrix in Table 2. Obviously, the rank of such a matrix cannot exceed three, which is less than the number of experiments $n = 10$. Therefore, a search was carried out for a working planning matrix with interaction effects and a corresponding polynomial degree for which the coefficient of determination is practically equal to one and the rank of the matrix is 10. In accordance with formulas (1)-(5), the calculated values of the regression model coefficients were obtained, and the agreement of the calculated values of the response function $Y_p$ with the experimental values $Y_1$, $Y_2$, $Y_3$, $Y_4$, $Y_5$, $Y_6$ was estimated. In all cases, the calculated value of the coefficient of determination was $R^2 = 1$. Table 3 shows the results of rank correlation. In Table 3 the following notation is used: rho is the correlation coefficient, which can vary from -1 to +1; a value of +1 corresponds to an almost direct functional relationship; pval is the p-value for testing the hypothesis that there is no correlation between $Y$ and $Y_p$, so that small values indicate rejection of this hypothesis; the symbol  means the operation of calculating the rank correlation between two arrays. Table 4 shows the calculated values of the regression models' coefficients. In accordance with the coefficient designations in Table 4, the general form of the calculated regression equation is as follows:
$$Y_p = a_0 + a_1X_1 + a_2X_2 + a_3X_3 + a_4X_1^2 + a_5X_2^2 + a_6X_3^2 + a_7X_1X_2 + \dots + a_{19}X_1X_2X_3.$$
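The correlation check between the experimental and calculated responses can be sketched in plain Python as follows. The sketch computes Pearson's coefficient, Spearman's rho (as Pearson's coefficient applied to average ranks) and a simple Kendall tau without tie correction; p-values are not computed here. The function names and the two short arrays are hypothetical and do not reproduce the data of Table 3. In practice, library routines such as `scipy.stats.pearsonr`, `scipy.stats.spearmanr` and `scipy.stats.kendalltau` return both the coefficient and the p-value.

```python
import math


def pearson(a, b):
    """Pearson's linear correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho: Pearson's coefficient of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def kendall_tau_a(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs, no tie correction."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


# Hypothetical experimental and calculated responses (illustration only).
Y = [3.0, 1.0, 4.0, 1.5, 5.0]
Yp = [2.9, 1.1, 4.2, 1.4, 5.1]
print(pearson(Y, Yp), spearman(Y, Yp), kendall_tau_a(Y, Yp))
```

When the calculated response preserves the ordering of the experimental one, both rank coefficients equal one even if the values themselves differ slightly, whereas Pearson's coefficient reacts to the deviations in the values.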

Conclusion
An engineering technique is considered and proposed that makes it possible, to a certain extent, to optimize and automate the determination of the coefficients of the regression equation that describes the calculated response function. To solve the problem, the Moore-Penrose pseudoinverse matrix and a sequential study of the dimension of the computational planning matrix were used, in which the interaction effects of the real factors can be taken into account. To check the adequacy of the obtained regression models, the correlation methods of Pearson, Kendall and Spearman were used, together with the calculated values of the determination coefficient $R^2$. The check was carried out on a study of the dependence of the fungal resistance and fungicidal properties of filled cement composites on the type and grain size composition of the filler.