Predicting anomaly conditions of energy equipment using neural networks

For modern complex thermal power facilities, developing methods for predicting equipment failures is especially relevant. Promising methods are based on the intellectualization of diagnostic systems and produce predictive models from both current data received in real time from measuring equipment and retrospective information. A system that is able to learn can quickly adjust the parameters of forecasting models under changing operating conditions, set new deadlines for scheduled repairs, and minimize equipment downtime. A limitation of these methods is the incompleteness of failure statistics, i.e. when equipment failures are rare or absent. Such diagnostics of energy equipment, especially at thermal power facilities, contributes to more environmentally friendly production.


Introduction
Modern technical facilities, including thermal power facilities, are complex systems built of different components (mechanical, electrical, electronic, etc.) combined to control a particular process. A distinctive feature of thermal power facilities is operation under irregular dynamic loads applied to practically all elements of their design [1][2][3]. These loads can arise both from internal factors and from the influence of other equipment operating nearby [4].
At present, a large number of complex thermal power plants (nuclear and chemical reactors, turbines, powerful boilers, various metal-smelting furnaces, etc.) are equipped with automated control systems for technological parameters [5][6][7]. The controlled parameters, in addition to information about the flow of technological processes, also contain information about the current state of the equipment and the appearance and development of various faults [8][9][10]. Analysis of changes in these parameters, performed after accidents or unscheduled equipment stops, usually shows that signs of the malfunctions that caused the accident or stop were present long before the incident [11][12][13]. Thus, there is a fundamental possibility to identify abnormal conditions (faults) before they cause an accident and thereby prevent the accident or significantly reduce its consequences. Such cases are known to any operator who controls the technological processes of complex thermal power facilities (CTPF), but they are rare and exceptional because of limited human capabilities in the rapid processing of large amounts of information and the lack of prior analysis of failure processes. Automated systems for collecting and processing information significantly expand the operator's ability to identify faults and make it possible to automate the process of their detection, i.e. diagnosis [14,15]. For this purpose it is necessary to have a technique and corresponding diagnostic algorithms that use the information from technological control sensors [16]. Such techniques and algorithms can significantly increase the effectiveness of CTPF diagnosis.
One of the important tasks of diagnostics of power equipment is to minimize its impact on the ecological state of the environment. In this case, the following tasks must be solved:
- research and minimization of the negative impact on the environment at all stages of production, transmission and consumption of different types of energy [17][18][19][20][21][22];
- implementation of environmental studies, including assessment of the impact on the environment [23][24][25];
- utilization/neutralization of energy equipment waste after completion of operation [26][27][28][29][30];
- participation in programs for the development of renewable energy sources, taking into account world practice in using market mechanisms for their support on the basis of "green" tariffs [31-34];
- formation of environmental requirements for the development of modern power equipment [35];
- analysis and implementation of techniques, standards, criteria and requirements in the field of environmental protection, environmental safety and rational use of natural resources [36-41];
- development of educational programs in the field of environmental safety in the production, transmission and consumption of electricity, etc. [42][43][44][45][46].

Thus, the diagnostics of power equipment should become one of the mandatory steps in improving the environmental friendliness of enterprises in the fuel and energy complex.

Anomaly detection based on neural networks
The method of detecting anomalies based on neural networks includes two stages: 1) the neural network learns to recognize classes of normal behavior from a training sample; 2) each data instance is fed as an input signal to the neural network. A neural network-based system can recognize one or more classes of normal behavior.
Replicative neural networks are used to find anomalies by recognizing only one class [47]. Deep Learning neural network technology has also been used successfully to solve this problem [48].
Various methods can be used to search for anomalies [49][50][51]; their advantages and disadvantages are summarized in Table 1.
Machine learning is a class of artificial intelligence methods characterized by indirect problem solving: learning in the process of applying solutions to many similar problems [52]. The construction of such methods can draw on mathematical statistics, numerical methods, optimization, data mining, probability theory and graph theory.
Machine learning makes it possible to study patterns in the data, which are then used to detect abnormal behavior. Machine learning tasks are usually divided into the following categories, depending on the availability of a training "signal" or "feedback" to the learning system:
- supervised learning, using "stimulus-response" examples:
  o semi-supervised learning;
  o active learning;
  o reinforcement learning;
- unsupervised learning, suitable for tasks in which objects are described in detail and internal relationships between objects must be established.
Next, deep neural networks will be considered as methods for detecting anomalies.

Table 1. Advantages and disadvantages of anomaly detection methods.

Method: Statistical analysis [52][53][54][55]
Advantages: no a priori information on the signs of anomalies is required, which makes it possible to detect zero-day vulnerabilities against which protective mechanisms have not yet been developed.
Disadvantages: difficulty of choosing the threshold for optimal detection; inability to identify anomalies resulting from malicious actions that resemble normal activity; statistical distributions are required even when not all elements of the process are observed.

Method: Machine learning [56]
Advantages: the system improves on the basis of prior knowledge.
Disadvantages: high computational cost and the complexity of adapting to the subject area.

Method: Artificial neural networks [57]
Advantages: robustness to inaccurate input and independence from information about dependencies in the input examples.
Disadvantages: complex and long training of neural networks, and demanding requirements on the size of the training sample.

Method: Genetic algorithms [58]
Advantages: work well on large-scale optimization problems; use both deterministic and probabilistic decision mechanisms; search from multiple points of the search space.
Disadvantages: complexity of the selection rules for choosing the best solutions.

Method: Hybrid methods [59][60][61][62]
Advantages: the most flexible, since the disadvantages of one method can be reduced by the advantages of another.

The advantage of deep neural networks is the automatic selection of important features. The backpropagation algorithm, based on the gradient descent method, is used to train neural networks [63]. In a deep neural network with several hidden layers, the error is calculated and transmitted from one layer to another. First, the error at the output of the neural network, for which the correct values are known, is calculated. Then the error at the input to the output layer is calculated, which is used as the error at the output of the hidden layer.
The calculation continues in this way until the error on the input layer is known. However, this algorithm is often inefficient when the training sample is large, because processing all its elements takes a long time. In practice, stochastic gradient descent or its modifications are most often used to train neural networks [64,65].
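A minimal sketch of stochastic gradient descent on a toy least-squares problem; the loss, learning rate and data below are illustrative and are not the training setup of this study:

```python
import random

def sgd(grad_fn, w0, data, lr=0.1, epochs=50, seed=0):
    """Minimal stochastic gradient descent: one update per training sample."""
    rng = random.Random(seed)
    w = w0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)          # stochastic: visit samples in random order
        for x, y in samples:
            w -= lr * grad_fn(w, x, y)
    return w

# Fit y = w * x by least squares on a tiny synthetic set (true w = 2).
train = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
grad = lambda w, x, y: 2.0 * (w * x - y) * x   # d/dw of (w*x - y)**2
w = sgd(grad, w0=0.0, data=train, lr=0.01, epochs=200)
```

Unlike full-batch gradient descent, each update uses a single sample, which is what makes the method tractable on large training sets.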
However, the backpropagation algorithm alone is not sufficient for effective deep learning because of the vanishing gradient problem [66]. This problem is addressed by the long short-term memory (LSTM) neural network architecture [67,68]. Such networks contain special types of nodes that can remember values for a long time. The LSTM network unit contains a special neuron that is used as a memory element (Fig. 1).
The output of this neuron is connected to its own input with a unit weight, so the value in the neuron is rewritten at each step and thus preserved. The neuron is controlled by three gates: the input, output and forget gates. When the input gate is open, the value at the input is written to the memory cell; when it is closed, the input signals do not affect the contents of the cell. An open output gate allows the value to be read from the cell. When the value is no longer needed, it can be erased by the forget gate. The gates are connected to other nodes of the neural network, which during training determine when each gate should be opened or closed. Through these memory cells, LSTM networks can determine the importance of events that occurred thousands of discrete time steps back and remember them, whereas earlier recurrent networks could remember an event for no longer than about ten time steps [69].
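The gate mechanism described above can be sketched for a single scalar memory cell. The weight layout and the saturated gate biases below are illustrative; real LSTM layers use learned weight matrices over vectors:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single scalar LSTM cell.
    p maps each gate name to (input weight, recurrent weight, bias)."""
    def gate(name, act):
        wx, wh, b = p[name]
        return act(wx * x + wh * h_prev + b)
    f = gate('forget', sigmoid)   # forget gate: keep or erase the stored value
    i = gate('input', sigmoid)    # input gate: admit a new value into the cell
    o = gate('output', sigmoid)   # output gate: expose the stored value
    g = gate('cand', math.tanh)   # candidate value to be written
    c = f * c_prev + i * g        # memory cell: old value kept via the unit loop
    h = o * math.tanh(c)          # cell output
    return h, c

# With the forget gate saturated open and the input gate shut, the stored
# value survives unchanged across a thousand time steps.
p = {'forget': (0, 0, 100), 'input': (0, 0, -100),
     'output': (0, 0, 100), 'cand': (1, 0, 0)}
h, c = 0.0, 0.7
for _ in range(1000):
    h, c = lstm_step(0.0, h, c, p)
```

This is exactly the long-range memory property the text attributes to LSTM: the cell value 0.7 is still available after a thousand steps, which an ordinary recurrent unit would have lost to the vanishing gradient.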
The choice of neural network architecture is based on several factors. First, the sensors generate highly correlated multidimensional time series. In addition, these time series are aperiodic and synchronous (aligned in time), with fast (short-term) and slow (long-term) subprocesses. Under these conditions, conventional feedforward neural networks usually show poor results. An accurate data-driven predictive model can be developed using a neural network with LSTM cells [70].
Since not all time series can be predicted [71], an additional anomaly detection method can be used: an autoencoder, which allows unsupervised training with the backpropagation algorithm. A synchronous architecture was chosen for the autoencoder; its advantages include support for streaming data processing and a relatively smaller number of network parameters compared to other architectures.

The process of detecting anomalies
The process of detecting anomalies consists of the following stages:
1. The anomaly score of individual observations or subsequences of a given time series is calculated using the detection method.
2. The obtained scores are used to calculate anomaly scores of the test time series. This can be done in different ways, for example: (1) the mean of all anomaly scores, (2) the mean of the top anomaly scores, (3) the mean of the logarithms of the anomaly scores, (4) the number of anomaly scores exceeding a threshold, etc.

Test time series whose anomaly score exceeds a threshold are labeled as abnormal.
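The four aggregation variants listed in stage 2 can be sketched as follows; the function name, parameters and sample scores are illustrative:

```python
import math

def aggregate(scores, method="mean", top_k=3, threshold=1.0):
    """Aggregate per-observation anomaly scores into one series-level score.
    The four variants mirror the options listed in the text."""
    if method == "mean":                    # (1) mean of all scores
        return sum(scores) / len(scores)
    if method == "top_mean":                # (2) mean of the k largest scores
        top = sorted(scores, reverse=True)[:top_k]
        return sum(top) / len(top)
    if method == "log_mean":                # (3) mean of the log-scores
        return sum(math.log(s) for s in scores) / len(scores)
    if method == "count_over":              # (4) number of scores over the threshold
        return sum(1 for s in scores if s > threshold)
    raise ValueError(method)

scores = [0.2, 0.5, 1.4, 3.0, 0.1]
mean_score = aggregate(scores)                           # 1.04
peak_score = aggregate(scores, "top_mean", top_k=2)      # (3.0 + 1.4) / 2 = 2.2
n_over = aggregate(scores, "count_over", threshold=1.0)  # 2 scores exceed 1.0
```

A series would then be labeled abnormal when its aggregated score exceeds the chosen threshold.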
A block diagram of failure forecasting was developed (Fig. 2) and an anomaly detection algorithm consisting of two parts, forecasting and detection, is proposed. The detection stage searches for time points where the root mean square error between the measured values x(t + 1) and the predicted values x̂(t + 1) exceeds a pre-calculated threshold.
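A minimal sketch of this detection stage, assuming a short sliding window for the RMS error; the window size, threshold and data below are illustrative, not values from the study:

```python
import math

def detect_anomalies(measured, predicted, threshold, window=3):
    """Flag time points where the windowed RMS error between measured
    and predicted values exceeds a pre-computed threshold."""
    flags = []
    for t in range(len(measured)):
        lo = max(0, t - window + 1)
        errs = [(m - p) ** 2
                for m, p in zip(measured[lo:t + 1], predicted[lo:t + 1])]
        rmse = math.sqrt(sum(errs) / len(errs))
        flags.append(rmse > threshold)
    return flags

# The model predicts a flat signal; the last two measurements deviate sharply.
measured  = [1.0, 1.1, 0.9, 1.0, 4.0, 4.2]
predicted = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
flags = detect_anomalies(measured, predicted, threshold=0.5)
# flags -> [False, False, False, False, True, True]
```

The threshold itself would be pre-calculated from the error distribution observed during normal operation.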
To detect anomalies, a recurrent neural network with a long short-term memory architecture and an autoencoder were chosen; the autoencoder allows unsupervised training with the backpropagation algorithm. A synchronous architecture was chosen for the autoencoder; its advantages include support for streaming data processing and a relatively smaller number of network parameters compared to other architectures.
Finding neural network parameters is an iterative task. Most often, developers select the optimal values of the structural parameters of the network and the size of the training sample based on personal experience and repeated trial and error. Therefore, the structural parameters of the network and the size of the training sample may not be optimal in terms of some approximation error function.
Several rules of thumb for choosing the number of hidden neurons have been proposed, for example, that the number of hidden neurons should be less than twice the size of the input layer [72,73]. Several statistical methods have also been developed, some of which are presented in Table 2. Although such methods exist, they are difficult to implement in commercial software packages, and the recommendations typically apply only to specific cases of a particular network topology.

Table 2. Review of methods for selecting the number of hidden neurons [74].

Shibata and Ikeda method [81]: Nh = √(Ni · No), where Ni and No are the numbers of input and output neurons.
Hunter et al. method [82]: Nh = 2^n − 1, where n is the number of input-layer neurons.
Sheela and Deepa method [83]: Nh = (4n² + 3)/(n² − 8).

The size of the training set is also a parameter that determines the quality of the neural network model. The larger the training set, the more memory is required to store it and the more time is spent training the network and gathering information about the object. A training sample with too little data is not informative enough to characterize the behavior of the object with acceptable quality. As a result, the network is often unable to predict the behavior of the object outside the examples of the training sample.
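Assuming the formulas above are reconstructed correctly (the original symbols were lost in typesetting), the three rules can be applied directly; the 107-sensor input size is taken from this study, the single output is an assumption:

```python
import math

def shibata_ikeda(n_in, n_out):
    """Nh = sqrt(Ni * No), rounded to the nearest integer."""
    return round(math.sqrt(n_in * n_out))

def hunter(n):
    """Nh = 2**n - 1 for n input-layer neurons."""
    return 2 ** n - 1

def sheela_deepa(n):
    """Nh = (4*n**2 + 3) / (n**2 - 8), defined for n**2 > 8."""
    return round((4 * n ** 2 + 3) / (n ** 2 - 8))

# For a network with 107 inputs and one output:
h_shibata = shibata_ikeda(107, 1)   # about 10 hidden neurons
h_sheela = sheela_deepa(107)        # about 4 hidden neurons
```

The wide spread between the rules (and the explosive growth of the Hunter formula with n) illustrates why the text recommends treating them only as starting points for experiment.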
An insufficient number of neurons in the hidden layer does not allow the behavior of the object to be fully approximated, and the prediction error will be large. But the more complex the neural network, the more time it takes to train and to run. The predictive power of the network may also decrease due to overfitting: the neural network fits insignificant details of the studied dependence, such as noise, the output vector changes significantly under small deviations of the input vector, and the network loses the ability to generalize, i.e. to predict the output vector for input data not included in the training sample.
Based on the above, the neural network training parameters were varied as follows:
- training period: 1, 4 or 12 months;
- number of hidden layers: 2 or 4;
- number of hidden neurons in the first layer: 8, 15, 20, 24, 30, 45, 60 or 90;
- the number of hidden neurons halved in each subsequent layer.
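The variation above spans 3 × 2 × 8 = 48 combinations of training period, depth and first-layer width; a sketch that enumerates them (the study's 96 models evidently varied additional settings not listed here):

```python
from itertools import product

def layer_sizes(first, depth):
    """Hidden-layer widths: a given first-layer size, halved in each subsequent layer."""
    sizes = [first]
    for _ in range(depth - 1):
        sizes.append(max(1, sizes[-1] // 2))
    return sizes

training_months = [1, 4, 12]
hidden_layers = [2, 4]
first_layer = [8, 15, 20, 24, 30, 45, 60, 90]

grid = [(months, layer_sizes(neurons, depth))
        for months, depth, neurons in product(training_months, hidden_layers, first_layer)]
# e.g. a 4-layer model starting at 90 neurons has layers [90, 45, 22, 11]
```

Each entry of `grid` pairs a training period with the full list of hidden-layer widths for one candidate model.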

Neural network training
The total dataset (retrospective data) of 15,120,000 measurements over 35 months from more than 107 sensors [83] has the following fields:
- date in the format day/month/year;
- time in the format hour:minute:second;
- sensor ID;
- sensor_measure: the temperature value.

The sensors generate highly correlated multidimensional time series, which are aperiodic and synchronous (aligned in time) and contain multiscale processes with fast (short-term) and slow (long-term) subprocesses.
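A sketch of reading one such record, assuming a semicolon-separated layout; the separator, sensor-ID format and sample values are assumptions for illustration, since the source does not specify the file format:

```python
from datetime import datetime

def parse_record(line):
    """Parse one 'date;time;sensor_id;value' record into (timestamp, id, value).
    The semicolon separator and field order are assumed, not taken from the study."""
    date_s, time_s, sensor_id, value = line.strip().split(";")
    ts = datetime.strptime(f"{date_s} {time_s}", "%d/%m/%Y %H:%M:%S")
    return ts, sensor_id, float(value)

ts, sid, temp = parse_record("07/11/2021;14:30:00;S017;412.5")
```

Parsing into proper timestamps is what makes it possible to align the 107 sensor streams into the synchronous multidimensional time series the model consumes.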
Each created model is evaluated on the basis of a graphical overview of the model trend, using the following hierarchical criteria:
- fit of the model on the training data;
- fit of the model on the normal-state monitoring data;
- detection of anomalies.
To be considered reliable, the model must meet all of the above criteria. The evaluation criteria are hierarchical: each model must meet the previous criterion before the next is evaluated. Validation and verification of the neural network models was performed. The criteria for evaluating forecasting models during network training, based on the coefficient of determination, are given in Table 3.
In Table 3, the coefficient of determination of the model of the dependence of the predicted values on the measured values is calculated as

R² = 1 − Σᵢ(yᵢ − ŷᵢ)² / Σᵢ(yᵢ − ȳ)²,

where ŷᵢ are the predicted values, yᵢ the measured values, and ȳ the average of the measured values.
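The formula can be checked with a few lines of code; the sample values below are illustrative:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean) ** 2 for y in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot

r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9])
# r2 -> 0.986; a perfect prediction would give exactly 1.0
```

A value close to 1 means the predictions explain almost all of the variance in the measured values, which is how Table 3 grades the models.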
Determination coefficients together with scatter plots provide a fairly simple tool for assessing model predictions. However, it should be emphasized that R² is a relative indicator of the adequacy of the model (provided the test and training samples are representative and the network structure is not redundant) and should not be used as the only indicator of efficiency.
Since in most cases the cost of errors depends on their type, performance measures based on the rates of true and false predictions are more commonly used. Assume that for each set of test data the predicted probability of success p is compared with a fixed threshold C: if the probability is greater than C, the model is considered to predict successfully; otherwise the prediction is considered unsuccessful. The result of this comparison of the true and predicted values of the dependent variable for n observations in the test data set can be presented in the form of a contingency table (Table 4).
A simple indicator of the effectiveness of the model is accuracy, the proportion of correct predictions over the entire test data set:

accuracy = (TP + TN) / n,

where TP and TN are the numbers of true positive and true negative predictions. Metrics such as precision and recall are also used to evaluate the model. Precision within a class is the proportion of objects actually belonging to this class among all objects the system has assigned to this class:

precision = TP / (TP + FP).

Recall is the proportion of objects of the class found by the classifier among all objects of this class in the test sample:

recall = TP / (TP + FN).

The resulting precision of the classifier is calculated as the arithmetic mean of its precision over all classes, and likewise for recall. Technically, this approach is called macro-averaging.
The harmonic mean of precision and recall determines the F-score:

F = 2 · precision · recall / (precision + recall).

This formula gives equal weight to precision and recall, so the F-score falls equally with decreasing precision and recall. As a result, it was found that the performance of models with a training period of 12 months differs the most (Fig. 3). In the figure, the dotted lines divide three periods: training, normal operation and detection of anomalies. Table 5 and Figure 4 present the calculated precision, recall, F-score and efficiency of the 2C8 model. It follows that, to diagnose the condition of the facility, it is necessary to substantiate and build a model that detects anomalies with a minimum number of false alarms during operation in the normal state.
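The accuracy, precision, recall and F-score definitions above, applied to illustrative contingency counts (the counts are invented for the example, not taken from Table 4 or Table 5):

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of actual positives that the classifier found."""
    return tp / (tp + fn)

def f_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Illustrative contingency counts: 40 true positives, 10 false positives,
# 5 false negatives, 45 true negatives (n = 100).
tp, fp, fn, tn = 40, 10, 5, 45
acc = (tp + tn) / (tp + fp + fn + tn)   # 0.85
p = precision(tp, fp)                    # 0.8
r = recall(tp, fn)                       # 40/45
f = f_score(p, r)                        # 16/19, about 0.842
```

Because F is a harmonic mean, it sits below the arithmetic mean of p and r and is dragged down sharply by whichever of the two is worse.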

Model testing results
For use in the diagnostic process, models must be rated "good" or "very good" to be considered accurate enough (Table 3).
Thirteen of the 96 created models meet the established criteria. Model 2C8, with a training period of 12 months, 30 automatically selected independent variables, 4 hidden layers and 90 hidden neurons, showed very good results and was chosen as the reference model (Fig. 5). It predicts the anomalous state of the facility with a coefficient of determination of 0.965 based on measurements of the surface temperature of the thermal power equipment, which can serve as an indicator of technical condition. Testing of the diagnostic system showed that models with a training period of one month behave similarly to each other regardless of the other training parameters and are characterized by only satisfactory fit to the training data: a one-month training period is too short for the network to learn the full features of the process.
Models with a training period of 4 months have better forecasting characteristics. Models 1B5-1B7 and 2B5-2B8, with 30 hidden neurons in the first hidden layer, have relatively better overall values at all stages of training, monitoring and anomaly detection. These models form the best-performing group, accounting for 7 of the 13 reliable models.
The accuracy of the models increases with the addition of two hidden layers. The set of variations among these models and the influence of the number of hidden neurons on the efficiency of the model are analyzed.

Conclusions
A neural network structure was developed by stacking a recurrent neural network with long short-term memory architecture and an autoencoder as part of the technical diagnostic system of the facility, which made it possible to predict equipment failures from a small number of anomaly precedents. The use of the neural network in diagnostic systems increases the reliability of predicting anomalous conditions by 9%.
Neural network models were implemented and tested. Model 2C8, with a training period of 12 months, 30 automatically selected independent variables, 4 hidden layers and 90 hidden neurons, showed very good results and was selected as the reference model. It predicts the anomalous condition of the facility with a coefficient of determination of 0.965 based on measurements of the surface temperature of complex thermal power facilities, which can be used as an indicator of technical condition.
According to the results of the research, neural network models with a training period of 4 months have relatively better characteristics, while the number of hidden layers and neurons is recommended to be determined experimentally.
Such an approach to the diagnostics of energy equipment can make energy industries more ecological, because the concentrations of different pollutants in the exhaust gases of thermal power facilities can be used as an information parameter of the process. Thus, continuous diagnostics based on exhaust gas analysis can reduce the concentrations of pollutants emitted into the atmosphere.