Neural network method as a means of processing experimental data on grain crop yields

Based on agroecological and technological testing of grain crop varieties of domestic and foreign breeding, winter triticale in particular, conducted on the experimental field of the Smolensk State Agricultural Academy between 2015 and 2019, we present the methodology and results of processing the experimental data used to construct a neural network model. Neural networks are applicable to tasks that are difficult for conventional computers and humans alike, such as processing large volumes of experimental data, automated image recognition, function approximation and forecasting. Preparing data for a neural network includes analyzing the subject area and the weight coefficients of neurons, detecting conflict samples and outliers, normalizing data, determining the number of samples required for training a neural network and improving the training quality when their number is insufficient, as well as selecting the neural network type and decomposing the network by the number of output neurons. We consider the technology of initial data processing and of selecting the optimal neural network structure, which significantly reduces modeling errors compared with neural networks built on unprepared source data. Our accumulated experience of working with neural networks has demonstrated encouraging results, which indicates the promise of this area, especially when describing processes with a large number of variables. To verify the resulting neural network model, we carried out a computational experiment, which showed that the scientific results can be applied in practice.


Introduction
Grain production addresses the food and strategic needs of the country and helps ensure its environmental security [1,2].
Winter crops, including triticale, are the most suitable for the central non-black soil region of the Russian Federation. The advantage of this crop over others manifests itself both on heavy and light sod-podzolic soils [2,3,4].
Yield is one of the main indicators in crop evaluation. Its level depends on many factors, such as fertilizers, variety (hybrid), sowing time, seeding depth, seed quality, soil fertility, the meteorological conditions of each year, etc. [5,6,7,8].
The multifactorial nature and non-linearity of the dependencies, together with the randomness of the processes affecting cereal yields, make statistical generalization with previously developed methods very difficult. Meanwhile, the data obtained in scientific experiments are random and statistical in nature. Moreover, various kinds of errors may appear both during the experiment and in the processing of the experimental data obtained.
These errors arise from the use of approximate methods of processing the physical parameters studied, the limited accuracy of experimental equipment, the influence of anthropogenic factors, etc. Such random errors are analyzed statistically. The main tools here are the methods of mathematical statistics, which include regression analysis, analysis of variance, correlation analysis, etc. Applying them produces a statistical model that describes the influence of random factors on the functioning of the system.
To conduct experiments competently, and then to analyze and process the experimental data obtained, a specialist must have fundamental knowledge of mathematical statistics, skills in organizing and planning experiments, and the ability to apply particular statistical processing methods in practice. Currently, an effective method of processing experimental data is the neural network method, that is, the use of neural networks [9].
Active development of neural networks began in the 1950s and 1960s. Their distinctive feature is that data are processed using a mathematical description of phenomena similar to those occurring in the neurons (nervous system) of living organisms.
Neural networks are applicable to tasks that are difficult for traditional computers and humans alike. These include processing large amounts of experimental data, automating image recognition, approximating functions, forecasting, etc. Like the human nervous system, an artificial neural network is made up of many simple elements operating in parallel. The functions of a neural network are largely determined by the connections between these elements.
Unlike the classic forecast-based analytics of dependencies, neural networks are capable of:
• incrementally accumulating raw data;
• rapidly providing results;
• refining results as new data become available.
Neural networks act as a universal approximator, that is, they make it possible to build generalized models by processing large amounts of data. In other words, the neural network approximates the dependence
$Y = \varphi(X)$,
where $X$ is an input vector, $Y$ is the simulated function (output), and $\varphi$ is the transformation performed by the neural network.
In view of this, using the neural network method to generalize experimental data on the yield of winter triticale varieties of domestic and foreign breeding is a relevant task.

Methods
To this end, we used data from field and laboratory studies conducted between 2015 and 2019 in the six-field crop rotation of the Smolensk State Agricultural Academy. The experimental data were processed using neural networks, an approach that has shown high effectiveness in engineering studies [9,10,11].
During the first stage, we used the results of a study of the effect of mineral fertilizers on the grain yield of winter triticale. The table shows the range of nitrogen, phosphorus and potassium fertilizer doses studied, which was broken down in the experiments into 10 values, ultimately giving a data sample of 97 vectors. The distribution density of fertilizer dose combinations is uniform, which makes the sample representative. The experiments were conducted over four years, which increased the number of samples available for building the neural model to 388. An important stage in building a model is determining the minimum sample size required for modeling. The four-year duration of the study also made it possible to assess the impact of weather conditions and moisture on the yield of the crop under study.

Results and Discussion
The yield level is influenced by a large number of different factors. However, since this work is exploratory and uses all the available experimental data, at this stage of the study we decided to limit the factors to those specified in the table.
To account for weather conditions, arguments describing temperature and precipitation during different periods of plant growth and development were introduced into the model.
Thus, the initial data for training the neural models of crop yield comprised 388 training samples. The input vector included the following parameters: the amounts of fertilizer applied (N, nitrogen; P, phosphorus; K, potassium), and $t_{avg}$ and $\varphi_{avg}$, the average temperature and average precipitation in each of four periods of plant growth and development: 1, sowing to seedling; 2, seedling to tillering; 3, tillering to earing; 4, earing to full ripeness (harvesting). The output value of the modeled function $Y$ is the yield of the crop under study.
In general, to process a sample of raw data, a researcher can use the following sequence of steps, which the authors formed empirically from previous experience with neural networks.
1. Choosing input parameters: a) logic and analysis of the subject area; b) analysis of the weights of input neurons; c) perturbation of input parameters and analysis of the network response to these perturbations; d) alternate exclusion of input neurons and observation of the network generalization error.
2. Identification of conflict samples.
3. Determination of the number of necessary samples.
4. Improving the quality of neural network training with insufficient sample size (multiple cross-checking, multiple sampling repetition and changing the way training samples are applied).
Adding noise to learning samples.
8. Choosing the types of neural networks and activation functions.
9. Decomposition of the network by the number of output neurons.
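The step of adding noise to learning samples can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `augment_with_noise`, the Gaussian noise model, the number of copies and the 1% relative noise level are all assumptions made for illustration, and the data values are made up.

```python
import random

def augment_with_noise(samples, copies=2, rel_sigma=0.01, seed=0):
    """Return the original samples plus noisy copies of each input vector.

    samples   -- list of (inputs, output) pairs, inputs being a list of floats
    copies    -- number of noisy copies per sample (assumed value)
    rel_sigma -- noise standard deviation relative to each input (assumed value)
    """
    rng = random.Random(seed)
    augmented = list(samples)
    for inputs, output in samples:
        for _ in range(copies):
            noisy = [x + rng.gauss(0.0, rel_sigma * abs(x)) for x in inputs]
            augmented.append((noisy, output))  # the target value is left unchanged
    return augmented

data = [([60.0, 30.0, 30.0], 4.1), ([90.0, 60.0, 60.0], 5.3)]  # made-up values
bigger = augment_with_noise(data)
print(len(bigger))  # 2 originals + 2 * 2 noisy copies
```

Keeping the target unchanged while jittering the inputs is one common design choice; the noise level should stay well below the measurement accuracy of each parameter.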
In the first step, a frequency analysis of each parameter was carried out to analyze the sample.
For parameters with repeating values, the frequency of occurrence of each value in the sample was calculated.
For parameters whose values do not repeat, the frequency of values was calculated in ranges. Frequency analysis made it possible to determine which ranges of parameters were covered by the most values and which were less represented in the sample.
In the current task, the ranges of fertilizer doses are covered densely and evenly, whereas temperatures and precipitation have a low distribution density but cover a fairly wide range of values.
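The two kinds of frequency analysis described above can be sketched as follows. The helper names `value_frequencies` and `binned_frequencies`, the choice of equal-width ranges and the sample values are assumptions for illustration only.

```python
from collections import Counter

def value_frequencies(values):
    """Count how often each distinct value occurs (for repeating values)."""
    return Counter(values)

def binned_frequencies(values, n_bins):
    """Count values in n_bins equal-width ranges between the minimum and maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so they all fall into the first range.
        return [len(values)] + [0] * (n_bins - 1)
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last range rather than a new one.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

nitrogen_doses = [0, 30, 30, 60, 60, 60, 90]      # made-up values
temperatures = [4.2, 5.1, 9.8, 10.4, 15.0, 16.3]  # made-up values

print(value_frequencies(nitrogen_doses))
print(binned_frequencies(temperatures, 3))
```

Ranges with very low counts point to regions of the parameter space where the trained model has little data to rely on.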
Based on this analysis, the generalized expression of the sought model looks like:
$Y = f(N, P, K, t_{avg,1}, \dots, t_{avg,4}, \varphi_{avg,1}, \dots, \varphi_{avg,4})$.
After insignificant parameters are identified and excluded from the training sample, the quality and accuracy of the neural network model tend to improve because its dimension and complexity decrease. However, excessive reduction in the number of inputs and simplification of the neural network can prevent the identification of patterns in a particular task. It can also produce conflicting (contradictory) samples, i.e. ones that have identical input vectors but different output vectors.
When conflict samples are present in the training data, the training error will not fall below a certain level, whatever methods of training the neural network are applied.
Conflict examples may occur in the learning set due to a measurement error.
Search for conflict examples can be done with the aid of software, using special algorithms that search for matching learning samples, or by carefully reviewing the sample in search of repetitive input vectors.
In our case, no conflict examples were found in the processing of the original sample.
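A software search for conflict examples can be as simple as grouping samples by input vector. The sketch below is one possible approach, not the algorithm the authors used; the function name `find_conflicts`, the rounding tolerance `ndigits` and the data are assumptions.

```python
from collections import defaultdict

def find_conflicts(samples, ndigits=6):
    """Return input vectors that occur with more than one distinct output."""
    outputs_by_input = defaultdict(set)
    for inputs, output in samples:
        # Rounding makes vectors that differ only by float noise compare equal.
        key = tuple(round(x, ndigits) for x in inputs)
        outputs_by_input[key].add(round(output, ndigits))
    return {k: sorted(v) for k, v in outputs_by_input.items() if len(v) > 1}

samples = [
    ((60.0, 30.0, 30.0), 4.1),
    ((60.0, 30.0, 30.0), 4.6),  # same inputs, different output: a conflict
    ((90.0, 60.0, 60.0), 5.3),
]
print(find_conflicts(samples))
```

An empty result corresponds to the situation reported here, where no conflict examples were found.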
As mentioned above, successful neural network modeling requires the necessary number of training samples.
The study [12] provides an estimate of the minimum required sample size:
$Q = 7 N_x + 15$,
where $N_x$ is the number of arguments in the neural network model and $Q$ is the number of samples in the training sample. In the case under study, the number of arguments $N_x$ is 11, meaning that the minimum required training sample should contain 92 samples, which is several times fewer than the 388 samples used.
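The figure of 92 samples is consistent with a linear rule of the form $Q = 7 N_x + 15$. That form is an assumption in the sketch below, adopted because it reproduces the quoted value for $N_x = 11$; the function name `min_sample_size` is likewise hypothetical.

```python
def min_sample_size(n_inputs):
    """Minimum number of training samples, assuming Q = 7 * N_x + 15."""
    return 7 * n_inputs + 15

n_x = 11         # N, P, K plus four temperature and four precipitation inputs
available = 388  # number of training samples reported in the text
required = min_sample_size(n_x)
print(required, round(available / required, 1))
```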
In the next phase of data preparation, outliers were excluded from the training sample. Outliers in the original sample are parameter values that differ significantly from the other values of the same parameter.
In the simplest cases, outliers can be detected by carefully reviewing the sample. For more complex multiparametric dependencies and large amounts of information, the simplest neural networks with a minimal number of neurons in the hidden layer can be used to search for outliers.
As our sample size was large, we needed to use various algorithms to search for outliers [13].
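The text does not name the specific algorithms used. As a simple stand-in for them, the sketch below flags outliers in a single parameter with a modified z-score based on the median absolute deviation (MAD). The function name `mad_outliers`, the threshold of 3.5 and the data are assumptions, and this univariate check is much simpler than a neural-network-based search.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure deviations against
    # The factor 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

yields = [4.1, 4.3, 4.0, 4.2, 9.7, 4.4, 4.1]  # made-up values
print(mad_outliers(yields))
```

A flagged value is only a candidate: whether it is a measurement error or a genuine extreme has to be decided from the experiment records.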
Several outliers were found in the sample. In order not to reduce the sample size, the detected outliers were corrected by approximating the type of dependency, which preserved the size of the training sample. A significant step in preparing data for modeling is bringing it to a dimensionless form and normalizing it. Dimensionless values were obtained by dividing the source data by the maximum value of each parameter within its range.
To normalize the data, the range from 0 to 1 was adopted, where 0 corresponds to the minimum value of the dimensionless parameter and 1 to the maximum.
After modeling, neural network responses are converted back from normalized to natural range of values. The process of normalization has been detailed in the works of a number of scientists [12,14,15,16].
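Normalization to the 0 to 1 range and the reverse conversion of network responses can be sketched with min-max scaling. The function names and the sample values are assumptions for illustration; the sketch also assumes that each parameter takes at least two distinct values.

```python
def fit_min_max(column):
    """Remember the minimum and maximum of one parameter."""
    return min(column), max(column)

def normalize(value, lo, hi):
    """Map a value from [lo, hi] to [0, 1]; assumes hi > lo."""
    return (value - lo) / (hi - lo)

def denormalize(scaled, lo, hi):
    """Map a network response from [0, 1] back to the natural range."""
    return lo + scaled * (hi - lo)

yields = [3.2, 4.0, 4.8, 5.6]  # made-up values
lo, hi = fit_min_max(yields)
scaled = [normalize(y, lo, hi) for y in yields]
restored = [denormalize(s, lo, hi) for s in scaled]
print(scaled)
```

The minimum and maximum must be stored from the training data and reused unchanged when new inputs are scaled or predictions are converted back.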
In this case, normalization was particularly relevant because the input parameters differ in physical nature (fertilizer doses, temperatures and precipitation).
After the initial sample was prepared for modeling, it contained 388 samples. From these, a test set of 26 samples that did not take part in training was selected. This set was used to determine the error of the constructed model.
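Setting aside a test set can be sketched as below. The text does not say how the 26 samples were chosen, so random selection with a fixed seed is an assumption, as is the function name `hold_out_split`.

```python
import random

def hold_out_split(samples, n_test, seed=0):
    """Randomly set aside n_test samples that take no part in training."""
    rng = random.Random(seed)
    test_idx = set(rng.sample(range(len(samples)), n_test))
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test

samples = list(range(388))  # stand-ins for the 388 prepared samples
train, test = hold_out_split(samples, n_test=26)
print(len(train), len(test))  # 362 26
```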
To train the neural network, drawing on accumulated international experience, we chose perceptron-type networks, namely two-layer perceptrons [17,19,20].
As the activation function, we adopted the logistic sigmoid for both layers [20,21].
The selection of a suitable network architecture is empirical. Although a number of guidelines exist for choosing a network architecture for dependency approximation, the architecture still has to be chosen to suit each specific task and data set.
To select the training function, two-layer neural networks with 25 neurons in the first (hidden) layer were created. The accuracy of the simulation was tested for different training functions: the BFGS quasi-Newton method, Levenberg-Marquardt optimization, and Levenberg-Marquardt optimization with Bayesian regularization.
The first layer used the logistic sigmoid as the activation function. For the second layer, networks with the logistic sigmoid (logsig) and with the linear activation function (purelin) were built for each training function.
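The two output-layer variants can be compared in a minimal forward-pass sketch of a two-layer perceptron with one output neuron. The names `logsig` and `purelin` follow the text; the function `forward`, the toy network size and the untrained weights are assumptions for illustration.

```python
import math

def logsig(x):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def purelin(x):
    """Linear (identity) activation."""
    return x

def forward(x, w_hidden, b_hidden, w_out, b_out, out_activation=logsig):
    """Response of a two-layer perceptron with one output neuron."""
    hidden = [logsig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    net = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return out_activation(net)

# Toy network: 2 inputs, 2 hidden neurons, arbitrary (untrained) weights.
x = [0.5, 0.2]
w_hidden = [[0.1, -0.4], [0.3, 0.8]]
b_hidden = [0.0, -0.1]
w_out = [0.7, -0.2]
b_out = 0.05

y_sig = forward(x, w_hidden, b_hidden, w_out, b_out, logsig)
y_lin = forward(x, w_hidden, b_hidden, w_out, b_out, purelin)
print(y_sig, y_lin)
```

A sigmoid output confines responses to the interval (0, 1), which matches targets normalized to that range, while a linear output leaves them unbounded.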
After training, errors were calculated for each type of neural network. Based on the minimum error, Levenberg-Marquardt optimization with Bayesian regularization was chosen as the training function.
Then a computational experiment was conducted to optimize the network architecture for the chosen training function. We built models with different numbers of neurons in the hidden layer. To build a neural network model of a complex function, it is enough to use a perceptron with one hidden layer of sigmoid neurons, the number of which is determined by the formulas below.
The required number of synaptic weights of the neural network is estimated as
$\frac{N_y Q}{1 + \log_2 Q} \le N_w \le N_y \left( \frac{Q}{N_x} + 1 \right) (N_x + N_y + 1) + N_y$,
where $N_x$ is the number of neurons in the input layer (number of parameters); $N_y$ is the number of neurons in the output layer (the number of simulated values); $Q$ is the number of elements in the set of training samples, that is, the number of pairs of input and output vectors $X_q$ and $D_q$; $N_w$ is the required number of synaptic connections. According to the modeling data, the number of synaptic connections for the model ranges from 58 to 797. Accordingly, $58 \le N_w \le 797$. Then the number of neurons in the hidden layer of the two-layer perceptron is
$N = \frac{N_w}{N_x + N_y}$.
These estimates show that the optimal number of neurons in the hidden layer of the two-layer perceptron model ranges from 8 to 114 ($8 \le N \le 114$). There is no strict theory for choosing the optimal number of hidden layers and of neurons in them. In practice, perceptrons with one or two hidden layers are most commonly used, and the number of neurons in hidden layers usually ranges from $N_x/2$ to $3N_x$.
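The estimates of the number of synaptic weights and hidden neurons can be computed as in the sketch below. It assumes the bounds take the form commonly quoted as a corollary of the Arnold-Kolmogorov-Hecht-Nielsen theorem, and that the hidden-layer size is the number of weights divided by $N_x + N_y$. The function names are hypothetical and the input values are illustrative only, not those of the study.

```python
import math

def weight_bounds(n_x, n_y, q):
    """Lower and upper estimates of the number of synaptic weights N_w."""
    lower = n_y * q / (1 + math.log2(q))
    upper = n_y * (q / n_x + 1) * (n_x + n_y + 1) + n_y
    return lower, upper

def hidden_neurons(n_w, n_x, n_y):
    """Hidden-layer size of a two-layer perceptron with n_w weights."""
    return n_w / (n_x + n_y)

# Illustrative values only, not the ones used in the study.
n_x, n_y, q = 4, 1, 100
lower, upper = weight_bounds(n_x, n_y, q)
n_min = hidden_neurons(lower, n_x, n_y)
n_max = hidden_neurons(upper, n_x, n_y)
print(round(lower), round(upper), round(n_min), round(n_max))
```

The resulting interval is wide, so it only narrows the search; the final hidden-layer size still has to be chosen by a computational experiment.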
The computational experiment on neural networks showed that the minimum error is observed in a network with a number of neurons in the hidden layer equaling 30. This value fits in the range obtained in theoretical calculations using the above formulas.
The smallest error was obtained for a neural network with the following architecture: two layers, 30 neurons in the hidden layer, the logistic sigmoid activation function in all layers, and Levenberg-Marquardt optimization with Bayesian regularization as the training function. The average margin of error for all crops studied did not exceed 3.3%. As an illustration of the results obtained, Figure 1 shows the dependence of the yield of winter triticale on the doses of nitrogen and potassium fertilizers, plotted using a spline function. Two-layer networks were chosen because the difference in errors between two-layer and three-layer neural networks is minimal, while two-layer perceptron models require less computing power.
The effect of normalization was examined in our previous work [22] by comparing a neural network trained on non-normalized data with one trained on normalized data.
Once the original data are normalized, the margin of error decreases by 3% on average compared to the same model trained on non-normalized data.

Conclusions and confirmations
Pre-processing the sample for neural network training is practical, as it significantly reduces modeling errors compared to neural networks created on unprepared raw data.
During the work with neural networks, various recommendations for the preparation and processing of raw data have been collected and studied. They include analysis of the subject area and weights of neurons, identification of conflicting examples and outliers, data normalization, determination of the required number of samples for training a neural network and improving the quality of training of neural networks with insufficient sample size, as well as the choice of types of neural networks and decomposition based on the number of output neurons.
The accumulated experience with neural networks has shown encouraging results, which indicates that this approach is promising, especially when describing processes with a large number of arguments.
The high accuracy of the developed neural network models of winter triticale yield allows them to be used in practice in research on other cereals, as well as for calculating yield forecasts that take into account different combinations of a large number of biotic and abiotic factors.