Evaluation of data quality based on Bayesian networks in railway rolling stock monitoring systems

The purpose of the research is to evaluate data quality with a Bayesian network in order to justify the effectiveness of Internet of Things technology in monitoring railway rolling stock. To achieve this, the following tasks were performed: technological schemes for monitoring railway rolling stock were presented; a Bayesian network was built; probability tables were created; and conditional probabilities of control events were determined. The existing and proposed railway rolling stock monitoring systems in the Republic of Uzbekistan are investigated. To justify the gain in data quality, an evaluation based on the Bayesian network was made. One section with a train that serves intermediate stations was selected as a conditional example. Two systems monitor this train's railcars: the Automated operational transport management system (AOTMS) and the Automatic control system for rolling stock and containers (ACSRSC). Technological operations were transformed into probabilistic relations. The model defines three control events for comparison. The results showed that the data quality of the existing monitoring system is lower than that of the proposed one. This is because the messages sent to the AOTMS depend on one another. In the ACSRSC system, each item of information is received individually, regardless of previous information. In addition, the level of subjective intervention in the transfer of information is significantly reduced. The capabilities of the ACSRSC are significantly superior in terms of data quality. Subsequent applications of Internet of Things technology will help improve the quality of management decision-making in the organization of train traffic.


Introduction
High-quality data on the state of managed facilities are critical in making management decisions. In the operational management of rail transportation, the primary information is the location and condition of rolling stock. Proper distribution of rolling stock in the organization of cargo transportation increases income by reducing the cost of transportation.
The Automated operational transport management system (AOTMS) is based on manual preparation of primary data on operations with rolling stock. Railcars are tracked indirectly; that is, the recorded location of railcars depends on the operations carried out with the trains. Depending on the operations performed at the stations, operators send the appropriate messages. The reliability and timeliness of the reports depend on the station operators. The human factor, including subjective interference, casts a certain degree of doubt on the system's output information. After the information is received, it has to be verified additionally.
A more advanced monitoring technology with better output data is proposed. The authors developed the MVP of a rolling stock monitoring system called the Automatic control system for rolling stock and containers (ACSRSC). The system operates automatically, without the direct participation of a person, on the principle of the Internet of Things. The energy-efficient long-range LoRaWAN network has been chosen. This is the most suitable technology for railcars and containers, since it gives the sensors the longest operating time in autonomous power supply mode.
In this research, we recommend using data quality as the criterion to demonstrate the advantages of the suggested Internet of Things technology in monitoring railway rolling stock. Data quality measures how well a data set fits an organization's requirements. The main criteria for data quality include completeness, reliability, accuracy, consistency, availability and timeliness.
Modern methods of assessing the quality of documents and the classification of evaluation methods are considered in detail in [1]. The authors of [2,3] propose methods for evaluating data in organizations and web portals. When choosing a methodology, the approach to the problem and the specifics of the system's work are decisive.
There are also tools for assessing the quality of data. These are software products [10,11] that make it possible to analyze and compare data parameters.
The considered methods are more suitable for specific areas of activity and solve a relatively narrow range of problems in evaluating data quality. Their capabilities do not meet the requirements for evaluating the quality of data in the AOTMS and ACSRSC systems.
The Bayesian network method is proposed to evaluate data quality in these systems [12,13]. A Bayesian network provides a graphical probability model consisting of conditional probabilities.
Since the main result of the AOTMS and ACSRSC is output data, it is advisable to evaluate their effectiveness according to the data quality criterion. The purpose of the research is to evaluate data quality with a Bayesian network in order to justify the effectiveness of Internet of Things technology in monitoring railway rolling stock.
Achieving this purpose includes the following tasks: 1. Description of the technological schemes of operation of monitoring systems for railway rolling stock; 2. Building of the Bayesian network; 3. Creating probability tables; 4. Determination of conditional probabilities of control events.

Evaluation of the data quality with the Bayesian network

Description of technological schemes of operation of monitoring systems of railway rolling stock
AOTMS works on the client-server principle (Fig. 1). Primary data on operations with trains are sent from subscriber points, most of which are located at stations. Employees must issue a special message promptly and reliably and send it to the server. The server first carries out format and logical control, then processes the information and stores it in the database. The messages are interrelated: each message must logically correspond to the previous one. For example, if the message about the departure of a train has not been sent, the neighboring station cannot send a message about the arrival of this train.
ACSRSC works on Internet of Things technology. The authors developed the MVP of the ACSRSC system. The purpose of the system is to monitor the location of rolling stock and containers, as well as the condition of railcars (empty or loaded). Sensors installed on the tracked objects send information to gateways, which are to be installed at stations and on train locomotives. The gateways forward the information to a network server (LoRaWAN network). After processing the information, the network server sends it to the ACSRSC server (Fig. 2).
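The logical control of interrelated messages described above can be illustrated with a minimal Python sketch. The `Message` class, the `accept` function, the train identifier `T1` and the lookup table that orders the stations as L, C, P are all assumptions made for illustration; the actual AOTMS message formats and checks are not specified in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    train_id: str
    station: str
    event: str  # "departure" or "arrival"


# Hypothetical order of stations on the section: L -> C -> P.
PREVIOUS_STATION = {"C": "L", "P": "C"}


def accept(log, msg):
    """Append msg to the log only if it passes a simple logical control.

    An arrival is accepted only when the departure of the same train
    from the previous station is already in the log.
    """
    if msg.event == "arrival":
        prev = PREVIOUS_STATION.get(msg.station)
        departed = any(
            m.train_id == msg.train_id
            and m.station == prev
            and m.event == "departure"
            for m in log
        )
        if not departed:
            return False
    log.append(msg)
    return True


log = []
# Arrival at C is rejected while the departure from L is missing.
first_try = accept(log, Message("T1", "C", "arrival"))
sent_departure = accept(log, Message("T1", "L", "departure"))
second_try = accept(log, Message("T1", "C", "arrival"))
print(first_try, sent_departure, second_try)  # False True True
```

In this sketch, one missing message blocks every later message about the same train, which is the kind of dependence that the evaluation later compares against independently arriving sensor data.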

Building of the Bayesian network
The data quality of the two systems is compared under the same conditions. The object of the research is a railway section with a train. The train passes through the section, performing certain operations at the stations along the way. The stages of operations with the train are as follows.
Stage 1 includes the departure of a consolidated train from the formation station L. The total number of railcars is 52.
Stage 3 includes receiving the train at station P for crossing with the passing train (Fig. 3).
We use the Bayesian network method to compare the data quality in these systems. A Bayesian network is a probabilistic graphical model representing variables and their conditional dependencies via a directed acyclic graph (DAG). Vertices can represent variables of any type: weighted parameters, hidden variables or hypotheses.
The joint probability distribution for a Bayesian network is as follows:
$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr),$$
where $\mathrm{pa}(X_i)$ denotes the states of all variables that are parents (direct ancestors) of the variable $X_i$. This expression is called the chain rule for total probability.
The Bayesian network for the AOTMS system consists of three sets of variables: the first set includes operations with trains, the second includes sending messages about completed operations with trains, and the third includes control events to evaluate the data quality.
The Bayesian network for the ACSRSC system also consists of three sets of variables, since the same operations are investigated: the first set includes operations with trains, the second involves sending information from sensors, and the third includes control events to assess the data quality.
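As an illustration of the chain rule, the following Python sketch evaluates the joint probability of a three-node chain: an operation, the message about it, and a control event. The node names `op`, `msg` and `ctrl` and all numbers in `CPT` are hypothetical; they are not the values from the probability tables of this paper.

```python
from itertools import product

# Hypothetical network: op -> msg -> ctrl (all variables are binary).
PARENTS = {"op": (), "msg": ("op",), "ctrl": ("msg",)}

# CPT[var][parent_values] = P(var = 1 | parents); illustrative numbers only.
CPT = {
    "op": {(): 0.9},
    "msg": {(0,): 0.0, (1,): 0.8},
    "ctrl": {(0,): 0.1, (1,): 0.95},
}


def joint_probability(assignment):
    """Chain rule: multiply P(X_i | pa(X_i)) over all variables."""
    p = 1.0
    for var, value in assignment.items():
        parent_values = tuple(assignment[q] for q in PARENTS[var])
        p_one = CPT[var][parent_values]
        p *= p_one if value == 1 else 1.0 - p_one
    return p


# Probability that the operation, the message and the control event all occur.
all_ok = joint_probability({"op": 1, "msg": 1, "ctrl": 1})

# Sanity check: the joint distribution sums to one over all assignments.
total = sum(
    joint_probability(dict(zip(PARENTS, values)))
    for values in product((0, 1), repeat=len(PARENTS))
)
print(round(all_ok, 6), round(total, 6))
```

With these assumed numbers the product is 0.9 × 0.8 × 0.95, so each factor of the chain rule corresponds to one row of a probability table.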
The Bayesian networks of the monitoring systems were built based on the operations with the train on the section. In Fig. 6, we can see that the messages sent to the AOTMS server are logically interrelated. In Fig. 7, there are no links between the items of information sent from the sensors; the information is related only to the operations.

Creating a probability table
The tables are created based on a priori probabilities. Fig. 8 shows the conditional probabilities of events related to monitoring using AOTMS.

Fig. 8. Table of probabilities of the AOTMS Bayesian network
Fig. 9 shows the conditional probabilities of events related to monitoring using ACSRSC.
Let us compose the total probabilities of events for each system separately. The total probability of events for monitoring railcars based on AOTMS is composed by the chain rule over the AOTMS network. At ACSRSC, information is collected directly from sensors attached to the wagons. Therefore, the items of information arriving at the server have no logical connection with one another. Because the events are not all related, they are divided into four groups, and the total probability is composed for the four groups.
The control events of the two networks are used to compare data quality. Let us set the conditions for calculating the probabilities: all operations with trains are performed (Fig. 5), and we need to determine the probabilities for the location and condition of the train's railcars. The probabilities of the control events in each network are then determined as conditional probabilities under these conditions.
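The effect of the two dependency structures can be illustrated with a simplified Python sketch. It assumes, purely for illustration, that all six operations are performed, that in the chained case each message can be sent only if the previous one was sent, and that in the independent case each sensor reading arrives on its own. The function names and the value 0.95 are hypothetical and are not taken from the probability tables in Fig. 8 and Fig. 9.

```python
def chained_message_probabilities(p_send, n_operations):
    """P(message k is sent) when message k requires message k-1."""
    probs = []
    p_prev = 1.0  # nothing precedes the first message
    for _ in range(n_operations):
        p_prev *= p_send
        probs.append(p_prev)
    return probs


def independent_reading_probabilities(p_receive, n_operations):
    """P(reading k is received) when readings do not depend on each other."""
    return [p_receive] * n_operations


# Hypothetical per-step probability, identical for both cases.
chained = chained_message_probabilities(0.95, 6)
independent = independent_reading_probabilities(0.95, 6)

for k, (a, b) in enumerate(zip(chained, independent), start=1):
    print(f"operation {k}: chained={a:.4f} independent={b:.4f}")
```

Under these assumptions the probability of the last chained message falls to 0.95 raised to the sixth power, while every independent reading keeps the per-step value, which mirrors the qualitative argument about interrelated messages.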

Results
Using the GeNIe tool, we determine the probabilities of all events in the AOTMS network (Fig. 10) under the specified conditions, with each of the six operation variables equal to 1, that is, all operations are performed. In the same way, we determine the probabilities of events in the ACSRSC network (Fig. 11) under the corresponding conditions, with each of its six operation variables equal to 1.
With the total probability of the Bayesian network, the results of the sensitivity analysis of control events are presented in Fig. 12-17. GeNIe implements an algorithm proposed by Kjaerulff and van der Gaag (2000) that performs simple sensitivity analysis in Bayesian networks. Roughly speaking, given a set of target nodes, the algorithm efficiently calculates a complete set of derivatives of the posterior probability distributions over the target nodes with respect to each numerical parameter of the Bayesian network. These derivatives indicate how important the precision of the network's numerical parameters is for calculating the posterior probabilities of the targets.
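The idea of a sensitivity derivative can be shown with a small numerical sketch in Python. This is not the Kjaerulff and van der Gaag algorithm and not GeNIe's implementation; it is a plain central finite difference applied to a hypothetical target probability that is the product of three assumed parameters.

```python
def target_probability(params):
    """Hypothetical target: probability that every step in a chain succeeds."""
    p = 1.0
    for value in params:
        p *= value
    return p


def sensitivity(f, params, index, h=1e-6):
    """Central finite-difference estimate of d f / d params[index]."""
    up = list(params)
    down = list(params)
    up[index] += h
    down[index] -= h
    return (f(up) - f(down)) / (2.0 * h)


# Illustrative parameter values only.
params = [0.9, 0.8, 0.95]
derivatives = [sensitivity(target_probability, params, i) for i in range(len(params))]
print([round(d, 4) for d in derivatives])
```

For a product, the derivative with respect to one parameter equals the product of the others, so in this sketch the target reacts most strongly to the parameter with the smallest value.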

Stage 2 is performed at station C. Four operations are performed sequentially:
• the arrival of the train at the station;
• uncoupling of two wagons for loading;
• the departure of the train after uncoupling, with 50 railcars;
• loading of the two uncoupled railcars.

Fig. 5. Operations with the train on the section

Fig. 9. Table of probabilities of the ACSRSC Bayesian network

Fig. 10. Probabilities of events in the AOTMS Bayesian network under given conditions

Fig. 11. Probabilities of events in the ACSRSC Bayesian network under given conditions