Assessing fertility complexity of Agro-gray soil on the East European Plain using correlation-regression analysis

According to classical concepts, soil fertility is an integrating indicator of soil properties. Correlation-regression analysis makes it possible to evaluate the complexity of the fertility of Agro-gray soil on the East European Plain of Russia. The criteria for an optimal assessment are the participation of all recorded soil properties and their equal contribution. The initial data array on soil properties was divided into two clusters using cluster analysis. The absence of significant differences between the clusters, and the insignificance of some parameters in the model, made a correction of the data array necessary. An acceptable level of fertility of Agro-gray soil was established. The minimum requirements for the soil are 3.2% humus, 181 mg/kg of mobile phosphorus, 144 mg/kg of exchangeable potassium, salt-extract and hydrolytic acidity of at least 5.7 pH units and 1.5 mg-eq/100 g, respectively, and a degree of base saturation of not lower than 92%. With such a numerical combination of soil properties, the complexity of fertility is ensured.


Introduction
Soil fertility is an important indicator of how well plants are provided with nutrients, moisture, air, and optimal conditions for life. Researchers consider fertility a systemic, integrating, multidimensional indicator of soil processes and properties [1]. Soil consists of sand, clay, humus, and water, and its fertility depends on the content of nitrogen, phosphorus, potassium salts, and other substances. The structure and mechanical composition of the tundra coarse-humus, gley, gley-podzolic, podzolic, gray forest, and chernozem soils of the East European Plain depend on geographic location and climate. The processes occurring in the soil are interrelated: a change in any component of the soil leads to a chain reaction of changes in its entire composition and reduces the productivity and number of plants. Soil properties reflect the combined effect of various factors.
The anthropogenic factor has an important, often negative, impact on soil fertility. The use of soil for agricultural purposes reduces its fertility and changes its composition. In agriculture, the anthropogenic factor forms definite, close relationships between soil properties, even between those that might otherwise be unrelated [2]. For example, soil acidity (an increase in pH values) is not always positively correlated with humus content. In agricultural soils, relationships between properties may disappear altogether. Such phenomena are observed when the doses of applied mineral fertilizers increase and the content of mobile forms of phosphorus and potassium in agricultural soils rises. The complexity of fertility is disturbed if this occurs against a background of high acidity and low humus content [3]. The development of actinomycetes, ammonifiers, and nitrifiers can improve under these conditions, but excess mineral fertilizers (N180P60K200) can worsen it [4]. This indicates a decrease in the activity of beneficial bacterial microflora and leads to a slowdown in the decomposition of fiber, nitrification, and other biochemical processes. Indicators of soil biological activity correlate with each other, with other indicators of soil fertility, and with crop rotation productivity. They can therefore serve as an additional diagnostic feature in assessing the anthropogenic stability of gray soil under intensive agriculture [5].
Soil fertility is increased through soil improvement and advances in crop production. According to researchers, statistical methods can successfully classify structural soils with respect to treatment [7][8].
The composition of soils must be monitored to ensure fertility is renewed.
In recent years, agricultural science has successfully applied statistical methods. Correlation and regression analysis, including the calculation of multiple, partial, and pair correlations, makes it possible to structure soil indicators so that they represent a single whole [12][13][14][15]. It precedes more advanced multivariate methods of statistical analysis and allows a more objective assessment of the complexity of fertility. In this regard, soil fertility needs to be monitored and comprehensively assessed using statistical analysis. Software and statistical systems need to be developed for more accurate and faster analysis.
The purpose of this study is to use the developed correlation-regression analysis to assess the complexity of the fertility of the Agro-gray soil of the East European Plain.

Methods
In accordance with the agrochemical soil survey, separately cultivated arable land was divided into plots of 5-8 ha. From each plot, one mixed soil sample was taken from a depth of up to 20 cm and divided into 20 to 45 individual samples for reanalysis. The following were determined in the mixed soil samples: mobile phosphorus (P2O5) and exchangeable potassium (K2O) according to Kirsanov; acidity in salt extract (pHKCl); hydrolytic acidity (Hg) according to Kappen; and organic matter (humus) according to Tyurin. The degree of base saturation was calculated by the formula V = (S / CEC) * 100%, where S is the sum of exchangeable bases and CEC is the cation exchange capacity. The analyzed dataset comprised 224 samples.
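The base saturation formula above can be sketched in a few lines of Python. The function name and the input values are hypothetical and serve only as an illustration; they are not measurements from the study.

```python
def base_saturation(sum_bases: float, cec: float) -> float:
    """Degree of base saturation V (%) = S / CEC * 100."""
    if cec <= 0:
        raise ValueError("CEC must be positive")
    return sum_bases / cec * 100.0


# Hypothetical values in mg-eq/100 g, chosen only for illustration.
v_demo = base_saturation(sum_bases=23.0, cec=25.0)
print(round(v_demo, 1))
```

Both inputs must be expressed in the same units for the percentage to be meaningful.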
The stages of statistical analysis were designed on the assumption that the Agro-gray soil samples followed a normal (Gaussian) distribution. Therefore, to construct a histogram, the residuals were first standardized (M±m; ±σ). The soil composition indicators were ranked from smallest to largest: mobile phosphorus, exchangeable potassium, humus. Correlation analysis was performed using STATISTICA 10 software. The significance of differences was determined by Student's t-test and the φ-test (for percentages).
The work is based on materials from the agrochemical survey and laboratory analyses of Agro-gray soils of the East European Plain in Russia.

Discussion
First, the data on soil indicators were divided into two groups using cluster analysis. Analysis of variance showed that hydrolytic acidity and humus did not have a significant relationship (P>0.05). There were no differences between the clusters in terms of soil parameters, except for phosphorus and potassium. Multiple correlation did not reveal statistical significance for the totality of soil properties. The analyzed indicators fell within the correlation range from -0.4 to +0.4. Significant differences were found in the range of outliers (P<0.01) (Fig. 1).

Fig. 1. Normal probability plot of residuals for the original dataset
The analysis of the initial array of data on soil properties shows a violation of soil fertility as an integrating indicator. At the next stage of the work, the initial structure of the data array was corrected. After the correction, all the soil properties considered participated significantly in the multiple regression and contributed to the clustering and to the analysis of outlying residuals.
The criterion for the effectiveness of the correction is a test for normal distribution and homoscedasticity of the residuals. Violation of the homoscedasticity premise is called heteroscedasticity. If it is present, conclusions drawn from the constructed model can be incorrect. Therefore, outliers are taken into account when compiling the optimal regression model. Linear regression is very sensitive to outliers: they affect the regression line and the final predicted value. Observations were treated as outliers if their standardized residuals fell outside the interval of the mean ± 3 standard deviations (±3σ), as were those with the largest values of the Mahalanobis distance. All observations with extreme values were excluded from the array. As a result, the points on the plot moved closer to the line. Their S-shaped arrangement became less noticeable, which shows that the residuals approached a normal distribution (Fig. 2).
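The screening rule described above can be sketched as follows. This is a minimal illustration using NumPy on synthetic data, not the procedure as run in STATISTICA; the function name, the parameter `n_far` (how many of the most distant observations to drop), and the generated data are assumptions introduced for the example.

```python
import numpy as np


def screen_outliers(X, y, k_sigma=3.0, n_far=0):
    """Flag rows by standardized residual and by Mahalanobis distance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # Standardized residuals: (e - mean) / sample standard deviation.
    z = (resid - resid.mean()) / resid.std(ddof=1)
    # Squared Mahalanobis distance of each row from the predictor centroid.
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    flagged = np.abs(z) > k_sigma
    if n_far > 0:
        # Optionally flag the n_far rows farthest from the centroid as well.
        flagged[np.argsort(d2)[-n_far:]] = True
    return flagged, z, d2


# Synthetic illustration only; these are not the soil data from the study.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(scale=0.1, size=100)
y_demo[0] += 50.0  # plant one gross outlier
flagged, z, d2 = screen_outliers(X_demo, y_demo)
X_clean, y_clean = X_demo[~flagged], y_demo[~flagged]
```

The text does not state how many observations with large Mahalanobis distances were removed, so that count is left as an explicit parameter rather than guessed.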

Fig. 2. Normal probability plot of residuals after adjusting the original dataset
The best regression model was chosen based on the minimum value of the residual variation. In our case, it was smallest at a value of 11.7, with the maximum variation of the dependent variable ("cluster") being 38.3 (Table 1). The Kolmogorov-Smirnov test with the Lilliefors correction gave λ = 0.0643 (P>0.05), which meets the requirement of a normal distribution (Fig. 3).

Fig. 3. Histogram of a normal distribution about the minimum amount of residual variation
There was also no dependence of the residuals on the variables. Therefore, the conclusions on multiple regression, which are shown below, are valid. Typically, skewness and kurtosis values within ±1 of those of a normal distribution show sufficient normality. In our case, the values were -0.2 and -0.7, respectively, which confirms the normal distribution of the residuals.
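The ±1 rule of thumb for skewness and kurtosis can be sketched as below. The function names are hypothetical, and the simple moment estimators used here are an assumption; statistical packages often apply bias corrections, so their values may differ slightly on small samples.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal law)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def within_normal_tolerance(skew, kurt, tol=1.0):
    """Rule of thumb from the text: both statistics within +/-1 of normal."""
    return abs(skew) <= tol and abs(kurt) <= tol


# Values reported in the text for the residuals of the corrected model.
reported_ok = within_normal_tolerance(-0.2, -0.7)
# A small symmetric toy sample, used only to exercise the estimator.
toy_skew, toy_kurt = skewness_and_excess_kurtosis([-2, -1, 0, 1, 2])
```

With the reported values of -0.2 and -0.7 the check passes, which matches the conclusion drawn in the text.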
The presence of systematic relationships between the residuals (autocorrelation) was checked using the Durbin-Watson test (d). The coefficient d was 2.02, which shows the absence of correlation in the residuals (they are random). No evidence of heteroscedasticity of the disturbances in the regression was found.
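For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch follows; the function name and the toy residual series are illustrative assumptions.

```python
def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 2 suggest no
    first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Toy series: a strictly alternating pattern gives d well above 2,
# a constant pattern gives d = 0.
d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
d_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])

# Common approximation: first-order autocorrelation rho ~ 1 - d / 2.
rho_reported = 1 - 2.02 / 2
```

Under the usual approximation, the reported d = 2.02 corresponds to a first-order autocorrelation of about -0.01, that is, practically none.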
The assignment of a set of soil properties (predictors) to a cluster (1 or 2) can be determined by the multiple regression equation Y = -7.5 + 0.002·P2O5 + 0.3·pH + 0.2·Humus - 0.2·Hg + 0.004·K2O + 0.06·V (Table 2). The variance inflation factor (VIF) estimates the degree of redundancy of each explanatory variable. If VIF<10 (according to Marquardt), there is no collinearity. For all soil properties VIF<10 (Table 3), so there is no problematic linear dependence among the soil properties. This gives grounds for using the regression model. The standardized coefficient (beta) can be used to compare the contributions of the independent variables to clustering. Its largest value (0.43) was for V, and its smallest (0.12) for phosphorus.
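As an illustration, the regression equation can be evaluated for the minimum soil requirements quoted in the Conclusion. The coefficients are the rounded values printed in the equation, so the result is approximate; the function and variable names are hypothetical.

```python
# Rounded coefficients as printed in the regression equation.
INTERCEPT = -7.5
COEFFS = {
    "P2O5": 0.002,   # mobile phosphorus, mg/kg
    "pH": 0.3,       # salt-extract acidity, pH units
    "Humus": 0.2,    # humus, %
    "Hg": -0.2,      # hydrolytic acidity, mg-eq/100 g
    "K2O": 0.004,    # exchangeable potassium, mg/kg
    "V": 0.06,       # degree of base saturation, %
}


def cluster_score(sample):
    """Evaluate Y for one soil sample given as a dict of the six properties."""
    return INTERCEPT + sum(COEFFS[name] * sample[name] for name in COEFFS)


# Minimum requirements stated in the Conclusion of the paper.
minimum_soil = {"P2O5": 181, "pH": 5.7, "Humus": 3.2,
                "Hg": 1.5, "K2O": 144, "V": 92}
score = cluster_score(minimum_soil)
print(round(score, 3))
```

With these inputs Y comes out close to 1. How a fractional value of Y is mapped to cluster 1 or 2 is not specified in the text, so no rounding rule is applied here.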
The coefficient of determination (R2) is an indicator of the degree of fit of the model to the data. In our case, the multiple correlation coefficient R with the participation of all soil indicators was 0.77, so R2 = 0.59. This means that the proportion of the variance of the dependent variable (clusters) explained by the considered model (soil properties) was 59%. The model is of high quality. The value of R tends toward unity, and the adjusted R2 is only slightly lower than R2. The standard error of estimate (SEE) is low (0.24), and the significance level is less than 0.05.
A model is of good quality if its residuals do not correlate with each other. Otherwise, factors not considered in the model exert a constant unidirectional effect on the explained variable. This affects the quality of the model estimates, making them inefficient. In this work, the Durbin-Watson test was used to detect autocorrelation obeying a first-order autoregressive process. The Durbin-Watson statistic was approximately 2, indicating that there is no autocorrelation and that the model is valid.
There are cases when two variables are interconnected not through internal links but through their relationship with a third and subsequent variables, or through the influence of other factors on them. Detecting and excluding factors that affect correlations is important. This is done by calculating partial correlations with a fixed value of some random variable. If the correlation between the variables decreases, their relationship is partially due to this fixed variable. If the partial correlation coefficient between the variables under study is equal or close to zero, it is concluded that the relationship between them is due entirely to the influence of the fixed variable. Comparison of the values of the pair and partial correlation coefficients shows the direction of the influence of the fixed factor. The pair correlation coefficients turned out to be higher than the partial ones (Table 4). This means that the links between the cluster and each of the soil indicators are partly determined by the influence of the other, fixed variables, which strengthened rather than weakened these links. The difference between the two correlation coefficients apparently indicates the magnitude of the influence of the other soil properties. The largest difference (0.35) was found for pH. Given the close relationship in the pair correlation (0.73), this may show that it is due to the influence of other soil properties. If they are not considered, the proportion of cluster variability determined by pH is 53% (r2 = 0.53). A more realistic figure is not 53% but 41%, that is, 53% minus 12%, where (ryx1 - ryx1/x2,x3,x4,x5)2 = 0.12. Mobile phosphorus and exchangeable potassium turned out to be less dependent on the combined effect of acidity, humus, and the degree of base saturation: the difference between the pair and partial correlations was minimal, 0.21 and 0.19 respectively. Apparently, this is due to the greatest dependence of the nutrient content on the anthropogenic factor.
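The comparison of pair and partial correlations can be sketched with NumPy. Partial correlations are obtained here from the inverse of the correlation matrix, which is one standard way to compute them and not necessarily the routine used in the study; the synthetic variables are assumptions that only demonstrate the effect of a shared third variable. The last lines reproduce the arithmetic for pH from the figures quoted in the text.

```python
import numpy as np


def pair_and_partial_correlations(data):
    """Return the pair correlation matrix and the matrix of partial
    correlations, each controlling for all remaining columns."""
    data = np.asarray(data, dtype=float)
    pair = np.corrcoef(data, rowvar=False)
    precision = np.linalg.inv(pair)
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return pair, partial


# Synthetic illustration: x1 and x2 are related only through z_common.
rng = np.random.default_rng(1)
z_common = rng.normal(size=5000)
x1 = z_common + 0.5 * rng.normal(size=5000)
x2 = z_common + 0.5 * rng.normal(size=5000)
pair, partial = pair_and_partial_correlations(np.column_stack([x1, x2, z_common]))
# pair[0, 1] is high, while partial[0, 1] is close to zero.

# Arithmetic for pH using the values quoted in the text.
r_pair, difference = 0.73, 0.35
share_pair = r_pair ** 2                        # about 0.53
share_adjusted = share_pair - difference ** 2   # about 0.41
```

In the synthetic case the pair correlation between x1 and x2 is strong, yet it nearly vanishes once the common variable is fixed, which is the situation the paragraph describes.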
The regression model, acceptable by all criteria, served as the basis for developing a fertility model for Agro-gray soil that takes into account the ranking of the two groups (clusters); it is presented in Table 5. Two variants of the Agro-gray soil fertility model, with significant differences in soil properties, have been established for a particular farm. For example, with a humus content of 3.2-3.4%, the amount of mobile phosphorus should be 181-192 mg/kg and that of exchangeable potassium 144-153 mg/kg, and the salt-extract acidity should not be lower than 5.7 pH units. An increase in nutrients in the soil against the background of a decrease in humus, an increase in acidity, etc. violates the optimal ratios between the indicators. The integral effect of soil fertility is then reduced to zero.

Conclusion
Thus, the assessment of fertility includes its complexity (internal relationships). A more objective picture is obtained by considering the resulting effect and the amount of information extracted. This picture also emerges when interpreting the multiple correlation coefficient and comparing the pair and partial correlation coefficients. The values of the soil properties were calculated for the condition under which the complexity of fertility is manifested, i.e. significant participation of all selected soil indicators in the regression model. A change in any of the numerical values of the indicators breaks the structural unity of the soil properties, since some of them drop out of the regression, and this leads to a violation of the complexity of fertility.
An acceptable level of fertility of Agro-gray soil was established. It is provided by a humus content of at least 3.2%, mobile phosphorus and exchangeable potassium of at least 181 and 144 mg/kg of soil, salt-extract and hydrolytic acidity of at least 5.7 pH units and 1.5 mg-eq/100 g, respectively, and a degree of base saturation of not lower than 92%. With such a numerical combination of soil properties, the complexity of fertility is ensured.

Table 1. Variance table of the regression model.

Table 4. Correlation between the cluster variable and soil indicators (P<0.01).

Table 5. Statistical description of the fertility model of Agro-gray soil.