The use of predictive analysis algorithms and methods for the overhead contact line operation mode selection

The article addresses the problem of reliable, safe, and effective operation of railway infrastructure objects, using the rational selection of the overhead contact line operation mode as an example. The results of modelling the overhead contact line operation mode are described. The model takes operating and climatic conditions into account and is based on clustering, classification, and neural network modelling methods for the predictive analysis of large amounts of data.


Introduction
A large number of railway infrastructure objects operate under high load and have no redundancy. Their operational reliability affects the transportation process and train traffic safety. Under the influence of external factors, the parameters of infrastructure elements and objects may go beyond their operating range and thus disrupt the normal operating mode. Therefore, monitoring the technical condition, parameters, indicators, and characteristics of the traction power network is a pressing problem. At present, the share of manual labour reaches 100 % for certain operations that detect pre-failure and alarm conditions, which indicates that the technology used for overhead contact line operation is underdeveloped. Growth in traffic requires more highly qualified service personnel, which is difficult to achieve in the sparsely populated areas typical of long railway lines, and inevitably leads to higher operating costs.
Ensuring railway traffic efficiency, a pressing problem when resources are scarce, is possible without a total reconstruction of railway infrastructure objects by using a hybrid decision support system based on modern data science technologies. Timely response to changes in an object's operating condition, based on climatic conditions and technical readiness, allows the transportation process to be optimized and increases its safety and reliability. An important requirement for the decision support system is the security of information resources and of the transport system's digital services, which are critically important to the operability of railway transport equipment. The proposed system reduces the risk of damage caused by emergencies and makes railway transport more attractive to the consumer.
Diagnosis and remote monitoring systems control the technical condition of particular railway infrastructure components (transformers, overhead contact line equipment, overhead power lines, etc.) [1][2][3][4][5][6][7][8]. A serious shortcoming of the existing diagnosis systems is the lack of a methodological foundation for the comprehensive analysis of the diagnostic information flow and of methods for accumulating and storing retrospective data, which are particularly important as raw training data for machine learning, clustering, and classification networks.
The system proposed in this article for selecting the overhead contact line operation mode is based on modern algorithms for analysing streamed and accumulated data from measuring devices. The predictive analysis method described in this work is based on artificial intelligence; it predicts the operability of transport infrastructure objects in the short and long term and automatically sends service personnel detailed information about the current and predicted condition of the diagnosed objects.
When the diagnosis data are insufficient, the proposed complex operates by clustering data and by objective observation. The collected database of patterns of the technical condition of overhead contact line equipment allows events to be classified using reliability engineering theory and mathematical statistics, and neural networks allow the technical condition of the equipment to be predicted from its entire operating history. Owing to the use of smart technologies, the complex takes into account operating and climatic factors, as well as predicted consequences, when developing recommendations for operational decisions.

Materials and Methods
The decision support complex for service personnel proposed by the authors operates on the basis of an expert system, which draws on the knowledge of experts in the field of overhead contact lines and rates the current condition of the controlled object. The predictive analysis method is based on artificial intelligence, and competing hypotheses methods are used to predict the operability of the analysed technical system in the short and long term.
The diagnosis sensors of the overhead contact line control system collect information and transmit it to the application server in the form of "digital fingerprints" of the equipment condition and operational mode. The principle of forming the "digital fingerprints" is shown in fig. 1. The processed "digital fingerprints" are accumulated in a MySQL database, where they are processed further [9]. If there is no information about which group a "digital fingerprint" belongs to, the smart program clusters the data; as a result, the clusters of operable object conditions and their conditional coordinates in the clustering space are defined.
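The grouping of unlabelled "digital fingerprints" can be illustrated with a minimal sketch. The article does not name the clustering algorithm, so plain k-means is used here as an assumption; the two-feature fingerprints, the function name `kmeans`, and the choice of the first k points as starting centroids are all hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points (illustrative choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each fingerprint joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical two-feature fingerprints forming two visibly separate groups.
fingerprints = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (1.2, 0.8), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(fingerprints, k=2)
```

The returned centroids play the role of the conditional cluster coordinates; the cluster indices remain anonymous until an expert explains each group and a name is assigned.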
After the expert (a representative of the railway technical service) identifies why the "digital fingerprints" were grouped into a cluster, the data engineer (programmer) assigns the cluster a name containing the failure type, operation mode, etc. In this way, through clustering and classification algorithms, the system adapts to newly appearing operating conditions. The expert uses digital models to validate the collected diagnosis data. The technical condition of the object is predicted using dynamic neural network models with short-term and long-term "memory", which are required when the neural network output depends not only on the input parameters that characterize the current condition but also on previous events and conditions [10]. The software and hardware decision support complex determines the contact wire tension in the following steps. First, a computational experiment based on a finite element model is carried out, and an array of contact wire vibrations is modelled for various contact wire tension options. The "digital fingerprints" of electric rolling stock pantograph passages are then converted to frequency response spectra (fig. 2).
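The conversion of a vibration record into a frequency spectrum can be sketched as follows. This is a direct discrete Fourier transform written with the standard library only; a fast Fourier transform yields the same values more quickly. The signal, the sampling rate, and the function name `amplitude_spectrum` are hypothetical and are not taken from the article.

```python
import cmath
import math

def amplitude_spectrum(samples):
    """Return single-sided DFT amplitudes for bins 0 .. N/2 of a real signal."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        # Interior bins are doubled to account for the mirrored negative frequency;
        # the DC bin and (for even N) the Nyquist bin have no mirror.
        unpaired = k == 0 or (n % 2 == 0 and k == n // 2)
        amplitudes.append((1.0 if unpaired else 2.0) * abs(coeff) / n)
    return amplitudes

# Hypothetical vibration record: 64 samples at 64 Hz with 5 Hz and 12 Hz components.
sample_rate = 64
signal = [
    1.0 * math.sin(2 * math.pi * 5 * t / sample_rate)
    + 0.4 * math.sin(2 * math.pi * 12 * t / sample_rate)
    for t in range(sample_rate)
]
spectrum = amplitude_spectrum(signal)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
dominant_hz = peak_bin * sample_rate / len(signal)
```

Each element of `spectrum` corresponds to one frequency, so a vector of this kind is what would be passed to a classifier as one input per frequency.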
The goal of machine learning is to relate the contact wire tension to its vibration "digital fingerprints", which are stored with a specified sampling step. Algorithms based on "supervised learning" are preferred for this goal. The "digital fingerprints" for various tensions are used to train the mathematical model and obtain the output data set.
Dozens of classification algorithms trained on the selected labelled data can be used to allocate the "digital fingerprints" to categories. The choice of the optimal algorithm in terms of complexity, accuracy, and learning speed rests not only on theoretical considerations but also on trial and error. The appropriate algorithm for the assigned problem is determined from the results of the computational experiment using the classification algorithms of the MATLAB Classification Learner app. The main classification model types available in the app are decision trees, discriminant analysis, support vector machines, logistic regression, nearest neighbours, naive Bayes classifiers, and classification ensembles. The training data must contain at least two contact wire tension options. Each frequency of the obtained digital signal spectra is passed to the model as an input. Therefore, the number of inputs is defined by the number of signal frequencies.
The best classification model is selected automatically in the software environment by the criterion of minimal model error. The most effective option in terms of modelling speed and accuracy is a decision tree-based architecture (MAPE 0.3 %). The structure of the mathematical model for studying the influence of contact wire tension on the vibration process is shown in fig. 3.
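Selection by minimal error can be illustrated with a short sketch that computes the mean absolute percentage error (MAPE) for several candidate models and keeps the smallest. The article does not state how its MAPE was computed, so the usual definition is assumed; the tension values and the candidate names `model_a` and `model_b` are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical reference tensions (kN) and outputs of two hypothetical models.
true_tension = [10.0, 12.0, 15.0, 20.0]
candidates = {
    "model_a": [10.0, 12.0, 15.0, 19.0],
    "model_b": [11.0, 12.0, 15.0, 18.0],
}
errors = {name: mape(true_tension, pred) for name, pred in candidates.items()}
best = min(errors, key=errors.get)  # candidate with the smallest error
```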
The analysis of the available algorithms shows that the decision tree-based model is the most efficient for the assigned problem.
The decision tree model obtained with Classification Learner is saved to the MATLAB workspace for further parameter tuning and for analysis of classification quality on the test dataset. The parameters are tuned by the goal seek method, based on known research and recommendations [11][12]. The structure of the created decision tree model is shown in fig. 4. The function that assesses partition quality is based on the idea of reducing "impurity" (a good partition is one in which the node has as many examples of one class as possible and as few as possible of all others) and is expressed through the Gini index [11].
The Gini index for a dataset $T$ that contains $n$ classes is given by the formula:

$$\mathrm{Gini}(T) = 1 - \sum_{i=1}^{n} p_i^2,$$

where $p_i$ is the probability (relative frequency) of class $i$ in $T$. At each step of tree building according to the CART algorithm, the rule generated in the node divides the training sample into two parts: the part in which the rule holds and the part in which it does not. The dataset $T$ is divided into two parts $T_1$ and $T_2$ containing $N_1$ and $N_2$ examples, respectively. Therefore, the partition quality index is given by the formula:

$$\mathrm{Gini}_{split}(T) = \frac{N_1}{N}\,\mathrm{Gini}(T_1) + \frac{N_2}{N}\,\mathrm{Gini}(T_2) = \frac{N_1}{N}\left(1 - \sum_{i=1}^{n}\left(\frac{l_i}{N_1}\right)^2\right) + \frac{N_2}{N}\left(1 - \sum_{i=1}^{n}\left(\frac{r_i}{N_2}\right)^2\right),$$

where $N$ is the number of examples in the parent node; $N_1$ and $N_2$ are the numbers of examples in the left and right child, respectively; $l_i$ and $r_i$ are the numbers of examples of class $i$ in the left and right child. The partition that has the lowest $\mathrm{Gini}_{split}(T)$ is the best. The partition in each node of the tree is carried out on one variable. The input vector of influencing factors is numeric, so the rule formed in the node takes the form of the inequality $x \le c$, where $x$ is the value of the selected variable and $c$ is a certain limit.
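The node rule search can be sketched in a few lines, assuming the standard CART definitions: the Gini index of a node is one minus the sum of squared class frequencies, and a split is scored by the size-weighted Gini index of its two children. The amplitude values, the class labels, and the function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini index 1 - sum(p_i^2) of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_split(left, right):
    """Size-weighted Gini index of a binary partition."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(values, labels):
    """Scan candidate limits c for the rule x <= c; return (c, score) or None."""
    best = None
    # The largest value is skipped because it would leave the right child empty.
    for c in sorted(set(values))[:-1]:
        left = [lab for x, lab in zip(values, labels) if x <= c]
        right = [lab for x, lab in zip(values, labels) if x > c]
        score = gini_split(left, right)
        if best is None or score < best[1]:
            best = (c, score)
    return best

# Hypothetical amplitudes at one spectral frequency, with tension class labels.
amplitude = [0.10, 0.15, 0.20, 0.60, 0.70, 0.80]
tension_class = ["low", "low", "low", "high", "high", "high"]
threshold, score = best_threshold(amplitude, tension_class)
```

A full tree builder would repeat this scan over every input variable at every node and recurse into the two children; only the single-variable step is shown here.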
According to the pruning method of the CART algorithm, not all possible subtrees are considered; the search is limited to the "best representatives" according to an evaluation of tree complexity. The tree complexity for a tree $T$ is given by the formula:

$$C_{\alpha}(T) = R(T) + \alpha\,|\tilde{T}|,$$

where $R(T)$ is the tree classification error, $|\tilde{T}|$ is the number of leaves of the tree, and $\alpha$ is a parameter that varies from 0 to $+\infty$. The full cost of the tree thus includes two components: the tree classification error and the penalty for its complexity. If the classification error is constant, the full cost of the tree increases as $\alpha$ increases. So, depending on $\alpha$, a tree with fewer branches and a larger classification error can cost less than a tree with more branches and a smaller error.
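The trade-off between classification error and tree size can be made concrete with a small numerical sketch. It assumes the usual CART cost-complexity form, in which the cost is the classification error plus the parameter multiplied by the number of leaves; the two subtrees and their error values are hypothetical.

```python
def cost_complexity(error_rate, n_leaves, alpha):
    """Cost of a tree: classification error plus alpha times the number of leaves."""
    return error_rate + alpha * n_leaves

# Hypothetical subtrees described as (training error, number of leaves).
large_tree = (0.02, 12)
small_tree = (0.05, 4)

def cheaper(alpha):
    """Name of the subtree with the lower cost for the given alpha."""
    costs = {
        "large": cost_complexity(*large_tree, alpha),
        "small": cost_complexity(*small_tree, alpha),
    }
    return min(costs, key=costs.get)

# The costs are equal where 0.02 + 12 * alpha = 0.05 + 4 * alpha, i.e. alpha = 0.00375.
choice_low_alpha = cheaper(0.001)   # weak penalty on leaves
choice_high_alpha = cheaper(0.01)   # strong penalty on leaves
```

Below the break-even value the larger, more accurate tree costs less; above it the smaller tree wins despite its higher error.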

Results
The data sample simulated with the finite element model was used to assess the reliability of the simulation results. The test sample contains 5,500 readings for verifying the quality of the model. The obtained decision tree model has a mean absolute error of less than 5 % on the test data. The effectiveness of the classification algorithm is evaluated using a "confusion matrix", a contingency table of correct and incorrect classifications. The results of the model quality evaluation for the decision tree model at various wire tensions are shown in table 1.
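How a confusion matrix is assembled can be shown with a minimal sketch. The class values and predictions below are hypothetical and are not the article's results; rows are taken as true classes and columns as predicted classes, which is one common convention.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def accuracy(matrix):
    """Share of examples on the main diagonal (correct classifications)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical tension classes (kN) with hypothetical true and predicted labels.
classes = [10, 12, 15]
actual = [10, 10, 10, 12, 12, 12, 15, 15, 15, 15]
predicted = [10, 10, 12, 12, 12, 12, 15, 15, 12, 15]
matrix = confusion_matrix(actual, predicted, classes)
overall_accuracy = accuracy(matrix)
```

Off-diagonal cells show which tension classes are confused with each other, which a single error figure does not reveal.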

Conclusions
As a result of the operation of the smart diagnosis support program, the solver uses the database and, when diagnostic parameters change, transmits recommendations and detailed information about the current state of the overhead contact line nodes to the service personnel through the graphical interface.
The developed software complex allows simulation using the prepared decision tree structure. From the set of "digital fingerprints" fed into the model, which reflect the factors that affect the wire tension, information about the contact wire tension can be obtained for a given combination of influencing factors. The model based on contact wire vibration makes it possible to calculate the uptime probability and to decide whether to increase the readiness of service personnel, reduce the rolling stock speed, etc. Owing to its simplicity, the developed decision tree-based model can be implemented on low-power computers, including microcontrollers.
It is important to note that the algorithms for obtaining, processing, and interpreting information, as well as the software that effectively solves the problems of predicting contact network parameters and preparing operational decisions and control actions for the railway power supply subsystem, are of key importance for the effective operation of the proposed system. Diagnostic information must be uploaded to a single database on the application server to account for all influencing factors.

Fig. 2. The surface formed by the set of frequency response spectra corresponding to electric rolling stock pantograph passages.

Fig. 3. The structure of the mathematical model for studying the influence of contact wire tension on the vibration process (blocks: vibration, fast Fourier transform, decision tree model).

Fig. 4. The structure of the created decision tree model for determining the contact wire tension.
In the tree complexity formula, $R(T)$ is the tree classification error, that is, the ratio of the number of incorrectly classified examples to the number of examples in the training sample; $|\tilde{T}|$ is the number of leaves (terminal nodes) of the tree.

Table 1. Evaluation of the model quality based on the decision tree.