Fundamentals of optimization of training algorithms for artificial neural networks

In the modern IT industry, artificial intelligence technologies, and artificial neural systems in particular, form the basis for near-term progress. Neural networks are continually being improved through their many training algorithms for a wide range of tasks. In this paper, the class of approximation problems is singled out as one of the most common classes of problems in artificial intelligence systems. The aim of the paper is to study the most widely recommended training algorithms, select the best-performing one, and find ways to improve it with respect to various characteristics. Several of the most commonly used training algorithms for approximation are considered. Computational experiments reveal the strongest aspects of each of the presented algorithms, and a method is proposed for improving their computational characteristics.


Introduction
Modern digital life is unthinkable without artificial intelligence technologies. Inanimate helpers can already speak to us in different voices and recognize our speech and appearance. They can suggest likely purchases or tailor an exercise schedule based on our digitized test results. Thanks to robotics, they are becoming fully humanoid entities with their own complex behavior and digital intelligence. An artificial brain, as a complex system, consists of tiny interconnected objects, often called artificial neurons. Researchers learned to organize such neurons into artificial neural networks as early as the middle of the 20th century. Any neural network needs training; otherwise, the numerical parameters with which it was originally created will not correspond to the desired solution of the problem.
To date, several dozen algorithms for training artificial neural systems have been developed. Each of these algorithms was either created independently or arose as a logical development of an existing one. Scientists are still looking for ways to improve the algorithmic base for training artificial neural networks [1][2][3][4][5]. This paper considers one such method in more detail.

Theory
The entire set of existing training algorithms for artificial neural networks (ANNs) can be conventionally partitioned into groups of algorithms used mainly for the i-th type of problem. To improve the computational capabilities of an algorithm, it is necessary to single out a typical problem and consider its solution model in detail.
As a test model, consider the approximation problem. Given matrices X and T of dimension m × n, it is necessary to compute the dependence Y = f(W, X) as accurately and quickly as possible. The calculation consists in building an artificial neural network that performs the basic transformation Y = f(W, X), where X is the input data vector, T is the target data vector, Y is the output data vector, and W is the vector of ANN parameters (weights and biases).
To solve this problem, three standard training algorithms are recommended: 1) the Levenberg-Marquardt algorithm [1]; 2) the Bayesian regularization algorithm [1]; 3) the conjugate gradient algorithm [1]. Let us consider each of them in more detail. Let W_k denote the matrix of parameters describing the state of the ANN at step k of the calculations. Then the basic variants of the algorithms take the following form.
Step k. Calculate the updated parameter matrix W_{k+1}.
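As a reference point, the standard textbook form of the Levenberg-Marquardt update step can be written as follows (J is the Jacobian of the errors, e the error vector, μ the damping parameter; this is the classical form and these symbols are ours, not necessarily the paper's exact variant):

```latex
W_{k+1} = W_k - \left( J_k^{\top} J_k + \mu_k I \right)^{-1} J_k^{\top} e_k,
\qquad e_k = T - Y_k .
```

The damping parameter μ_k is decreased when a step reduces the mean square error and increased otherwise, so the method interpolates between Gauss-Newton and gradient-descent behavior.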

Experiment
Consider a way to optimize the operation of artificial neural networks using a standard example of approximating an analytical dependence in the MATLAB package [1][2][3][4]. The input and output data are matrices X and T with various configurations. The dependence to be identified is Y = f_2(W, X).
To solve the problem, in the standard case, an artificial neural network with the following layers will be built: 1) one input layer; 2) one hidden layer containing 10 neurons; 3) one output layer.
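The layer structure above can be sketched as follows. This is an illustrative NumPy version of the 1-10-1 network (the paper builds it in MATLAB; the layer sizes match the text, but the tanh activation and random initialization are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden layer: 10 neurons taking one input; output layer: one linear neuron.
# (Initialization and activation are illustrative assumptions.)
W1 = rng.normal(size=(10, 1)); b1 = np.zeros((10, 1))  # hidden layer weights/biases
W2 = rng.normal(size=(1, 10)); b2 = np.zeros((1, 1))   # linear output layer

def forward(X):
    """Map a 1-by-N input matrix X to a 1-by-N output matrix Y."""
    H = np.tanh(W1 @ X + b1)   # hidden-layer activations
    return W2 @ H + b2         # linear output

X = np.linspace(-1.0, 1.0, 21).reshape(1, -1)
Y = forward(X)
print(Y.shape)  # (1, 21)
```

Training then amounts to adjusting W1, b1, W2, b2 so that Y approximates the targets T.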
To obtain the correct parameters, the newly created ANN should be trained for each of the three above algorithms. Here are typical results of solving this problem for each of the algorithms for three different data sets.
After training with the first algorithm with standard parameters, the following results were obtained (Table 1). On the first dataset, training completed in 16 iterations. The total training time on the bench computer was 0.726 seconds. The error functional (EF) was taken as the mean square error (performance = mse) and amounted to 2.0935×10⁻⁴. The gradient value was 0.0039. The EF value did not improve on the validation set during the last 6 training epochs; that is, the most favorable values of the coefficients W, b were obtained after the 10th training epoch. This element of training can be examined in detail on a dedicated plot (Figure 1).

To study the distribution of the error over characteristic values, the error histogram can be used (Figure 2). When constructing the histogram, the 20 most common values of the difference e = t − y are selected. For an approximate determination of the dependence Y = f(T), linear regression plot analysis is applied (Figure 3). In the case under consideration, the dependence takes the form Y = a·T + b ⇒ Y = 1·T + 0.0034. To correctly assess the contribution of all points to the approximation curve, the resulting fit plot is used (Figure 4).

Let us next consider the main computational results of the Bayesian regularization algorithm; its main indicators are presented in Table 2. We conclude the review of standard ANN tuning capabilities by analyzing the results of the conjugate gradient algorithm, whose main performance indicators are summarized in Table 3. Examining all the above tables and graphs, one can draw a conclusion: Algorithm №2 is slower than Algorithm №1 but more accurate; Algorithm №3 is faster than Algorithm №1 but less accurate.
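The quality indicators above (mse performance and the regression fit Y ≈ a·T + b) can be computed as in this sketch, using hypothetical target and output vectors T and Y in place of the paper's data:

```python
import numpy as np

# Hypothetical stand-ins for the paper's target and output vectors.
T = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
Y = T + np.array([0.01, -0.02, 0.015, -0.005, 0.0])

e = T - Y                    # error values, as binned in the error histogram
mse = np.mean(e ** 2)        # performance = mse

# Linear regression Y ≈ a*T + b, as on the regression plot.
a, b = np.polyfit(T, Y, 1)
print(round(mse, 6), round(a, 4))
```

A slope a close to 1 and an intercept b close to 0 indicate a good fit, mirroring the Y = 1·T + 0.0034 result reported for the first algorithm.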
Thus, the Levenberg-Marquardt algorithm can currently be considered the best choice for solving such problems, and we can proceed to modify this winning algorithm. Let us apply the decomposition-superposition method to optimize the Levenberg-Marquardt algorithm model with the possible use of the following modifications: 1) changing the base coefficients; 2) introducing scaling factors. Consider the formula for the main computational step as a decomposition into individual terms and factors (2). Certain analytical transformations must be performed in order to carry out the desired optimization and possibly obtain a superposition of one or several functions f_1–f_3(·). In the course of the experiments, various variants of superpositions were obtained. Variations of the function f_2(·) were chosen as one of the promising directions. In particular, a modification of this function was obtained in which the control parameter p satisfies 2 ≤ p ≤ 5.
Step k. Calculate W_{k+1}. If e_{k+1} ≈ 0, then W_{k+1} is the desired solution. This modification (Algorithm 4) gives a certain gain in the accuracy of ANN calculations while maintaining the minimum training time. These facts are confirmed by the data in Table 4.
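The iterative scheme above can be sketched generically. This is a textbook Levenberg-Marquardt step on a small least-squares problem; the paper's modified function f_2(·) and the parameter p are not reproduced here, and residual() and jacobian() are hypothetical helper names:

```python
import numpy as np

def lm_step(w, residual, jacobian, mu):
    """One damped Gauss-Newton (Levenberg-Marquardt) parameter update."""
    e = residual(w)                       # error vector e_k
    J = jacobian(w)                       # Jacobian of e with respect to w
    H = J.T @ J + mu * np.eye(w.size)     # damped normal-equations matrix
    return w - np.linalg.solve(H, J.T @ e)

# Tiny demo: fit w minimizing ||A w - t||^2; the exact minimizer is [1, 1].
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
t = np.array([1.0, 2.0, 2.0])
w = np.zeros(2)
for _ in range(20):
    w = lm_step(w, lambda v: A @ v - t, lambda v: A, mu=1e-3)
print(np.round(w, 3))  # close to [1. 1.]
```

Iteration stops once the residual norm is near zero, matching the stopping rule of Step k above.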
Let us consider the main graphical tools for analyzing the performance of the new algorithm (Figures 5-8).
All main characteristics of the new algorithm are either equivalent to the characteristics of standard algorithms or exceed them by 5-10%.

Conclusion
The article discusses the most commonly used algorithms for training artificial neural networks to approximate analytical dependencies of varying degrees of complexity. Much attention is paid to the consideration of the features of calculations at all stages of training. The parameters of the quality of training are analyzed to identify the most probable lines of optimization of algorithms. To improve such an analysis, a specific method is proposed, focused both on the decomposition of large blocks of the algorithm and on improving the computational characteristics of each step of the algorithm. The above method can be considered promising for the computational optimization of learning algorithms for modern ANNs used to solve problems of approximation of any degree of complexity.