Detection of internal security incidents in cyber-physical systems

This paper addresses the problem of internal security breaches in cyber-physical systems, framing it as an anomaly detection task within the framework of machine learning. The powerful mathematical apparatus embedded in the structure of machine learning models, including models based on artificial neural networks, makes it possible to build an autonomous system for detecting internal security breaches with minimal reliance on expert assessments. A user is judged abnormal on the basis of aggregated data on action log entries identified as anomalous, as well as on statistics on the number of such entries for each user. The results presented here demonstrate the successful application of these models to the task of identifying insider threats posed by system access subjects.


Introduction
In today's world, cyber-physical systems, which combine the physical world and computer technology, play an increasingly important role in fields ranging from industrial enterprises to government management systems. However, as cyber-physical systems continue to develop, they face a growing number of information security threats [1][2][3][4][5] that can cause serious damage not only to the systems themselves, but also to the environment and human lives.
One of the major threats is a cyberattack, in which malefactors attempt to breach a system by exploiting various software and hardware vulnerabilities. Such attacks can paralyze the system or even take control of it, with potentially catastrophic consequences. In the case of industrial control systems, for example, cyberattacks can trigger industrial accidents, cause the loss of material assets, and even threaten the lives of employees. Attackers can also manipulate data within a cyber-physical system through control actions, leading to unpredictable outcomes. In a transportation management system, for example, manipulated data can lead to accidents or traffic congestion. Such attacks are particularly dangerous for systems that control critical facilities, such as power plants or nuclear reactors.
Given these challenges, one of the primary principles in designing the segmentation of cyber-physical systems is to isolate the primary control interfaces from external information networks as much as possible and to build a multi-layered defense against external attackers. Equally important, however, is improving the effectiveness of defense systems capable of detecting internal threats, i.e., threats originating from legitimate access subjects. This paper discusses the results of an experiment in expanding the feature space of an internal threat detection model through the use of artificial neural networks (ANNs) in the context of analyzing the work of a cyber-physical system operator.

Machine learning model synthesis
Currently, no real-world dataset from cyber-physical systems is publicly available for building an automated insider threat detection subsystem, so researchers use synthetically generated data in their work. Most sets provide data in a given format, such as CSV. These datasets typically consist of rows of records, which often contain date, time, user, or device information, as well as details about actions within the system. The dataset used in this study is CERT [6], created by the Software Engineering Institute at Carnegie Mellon University. This "privacy constraint free" dataset is designed to allow insider threat researchers to experiment with and evaluate their proposed approaches and techniques.
After analyzing this dataset and reviewing published studies on similar problems [7][8][9][10], it was decided to investigate the most popular algorithms that have shown the best results: the one-class SVM (OCSVM), the autoencoder (AE), and the long short-term memory (LSTM) network.
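As an illustration of how the first of these detectors operates, the following minimal sketch fits scikit-learn's OneClassSVM on synthetic feature vectors; the data and the parameter values (nu, gamma) here are illustrative, not those used in the experiments:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 4))   # stand-in vectors of normal user activity
outliers = rng.normal(6.0, 1.0, size=(5, 4))   # clearly shifted, anomalous activity

# nu bounds the fraction of training points the model may treat as anomalies
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal)

# predict() returns +1 for points inside the learned region, -1 for anomalies
pred = model.predict(np.vstack([normal, outliers]))
```

The OCSVM learns a boundary around the training data alone, which is what makes it applicable here: no labeled insider incidents are needed at training time.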
The synthesis and training of artificial neural network (ANN) models differ markedly from those of classical machine learning models. An ANN is a set of sequential layers, each of which allows the activation function and the corresponding dimensions to be selected. The training parameters are the initial learning rate, the loss function, and the early stopping criteria. Thus, ANNs have a large number of hyperparameters determined during the initialization phase [11,12]. The parameters and network architectures described below yielded the best performance evaluations. In this paper, a sequential training mode is used, in which the weights are adjusted for each training sample. For large datasets, adjusting on each sample is computationally expensive, so the training set is divided into batches and the weights are adjusted on each of these batches.
The dimensionality of the input layer of a model must coincide with the dimensionality of one training example, and the output dimensionality is determined by the problem conditions. In the proposed models, anomaly detection is based on the reconstruction error, so the dimensionality of the output layer must match that of the input layer. Additionally, to reduce overfitting caused by the co-adaptation of individual neurons, dropout layers with a fixed coefficient are used. In the proposed approach, a dropout coefficient of 0.2 is used for model synthesis.
The architecture of the obtained models is presented in Figure 1.
The first synthesized architecture is the autoencoder network. The model consists of an encoder with three feed-forward layers of 128, 64, and 32 neurons, respectively. All feed-forward layers in the network, except the transmission layer between the encoder and decoder, alternate with dropout layers. The activation function at each layer is the rectified linear unit (ReLU). The choice of the optimization function determines the algorithm used to calculate the error at each step. A comparison of the optimizers shows that the final training error is smaller with ADAM than with the SGD optimizer, owing to ADAM's use of momentum. Here and hereafter, ADAM is selected as the optimizer.
The parameters of the model are: error calculation using the mean squared error formula, an initial learning rate of 0.01, early stopping with a minimum delta of 0.001, and a batch size of 5000.
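A sketch of the described autoencoder in Keras is given below. The encoder layer sizes, dropout coefficient, activation, optimizer, loss, learning rate, early-stopping delta, and batch size follow the text above; the decoder is assumed to mirror the encoder, and the feature dimensionality n_features is a placeholder:

```python
import tensorflow as tf

n_features = 40  # placeholder for the dataset's feature dimensionality

inputs = tf.keras.Input(shape=(n_features,))
x = tf.keras.layers.Dense(128, activation="relu")(inputs)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.Dense(64, activation="relu")(x)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.Dense(32, activation="relu")(x)   # bottleneck; no dropout before the decoder
x = tf.keras.layers.Dense(64, activation="relu")(x)   # decoder assumed to mirror the encoder
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.Dense(128, activation="relu")(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(n_features)(x)        # reconstruction of the input

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                    loss="mse")
stop = tf.keras.callbacks.EarlyStopping(min_delta=0.001, patience=3,
                                        restore_best_weights=True)
# autoencoder.fit(x_train, x_train, batch_size=5000, callbacks=[stop], ...)
```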
After training the network, it is necessary to correctly estimate the anomaly value for each record, which in this case is the network reconstruction error for each test sample, calculated by formula (1):

$e = \frac{1}{n}\sum_{i=1}^{n}\bigl(X_{test}^{(i)} - Y_{test}^{(i)}\bigr)^2$,  (1)

where $X_{test}$ is the network input, $Y_{test}$ is the network output for a particular sample, and $n$ is the feature dimensionality. An example of the resulting prediction of the autoencoder network when trained on all features is shown as a histogram in Figure 2(a). A 90% quantile of the anomaly values was used to calculate the automatic threshold for deciding whether each sample is anomalous. In deciding whether each user is anomalous at the automatic threshold, the average number of anomalous records per user was calculated; the resulting histogram of the number of anomalous records for each user when trained on all features is shown in Figure 2(b). This histogram shows a noticeable increase in the number of anomalous records for certain users, which explains the estimates for this model in deciding whether each user is anomalous.

The next synthesized ANN architecture is the LSTM model. Since the dataset is a sequence of temporal data, the LSTM network can be applied to the anomaly detection problem to account for the timescale. To represent the data for the LSTM network, the dataset must be transformed into a sequence of arrays of equal length corresponding to a discrete time window. Each sample submitted to the LSTM network is a sequence of log records from $i \cdot w$ to $(i+1) \cdot w$, where $w$ is the size of a discrete time step and $i$ is an iterator running from 0 to $L/w$, where $L$ is the length of the dataset. This approach to data representation allows the LSTM to detect anomalies in a given discrete time interval. However, it also multiplies the amount of memory allocated for storing training samples, and the network trains noticeably longer than the autoencoder network.
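The per-record anomaly scoring, the automatic quantile threshold, and the per-user counting of anomalous records described above can be sketched as follows (a minimal NumPy sketch; the helper names are illustrative):

```python
import numpy as np

def reconstruction_errors(x, y):
    """Per-sample mean squared reconstruction error (as in formula (1))."""
    return np.mean((x - y) ** 2, axis=1)

def anomalous_counts(errors, users, quantile=0.9):
    """Flag records whose error exceeds the automatic quantile threshold
    and count anomalous records per user."""
    threshold = np.quantile(errors, quantile)
    flags = errors > threshold
    counts = {}
    for user, flagged in zip(users, flags):
        counts[user] = counts.get(user, 0) + int(flagged)
    return threshold, counts
```

A user whose count is far above the average over all users is then judged anomalous, which requires no expert-set threshold.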
After training the network, it is necessary to estimate the anomaly value for each record, which in this case is the error of reconstruction of the record in a given time window for each test sample, calculated by formula (2):

$e = \frac{1}{w \cdot n}\sum_{t=1}^{w}\sum_{i=1}^{n}\bigl(X_{test}^{(t,i)} - Y_{test}^{(t,i)}\bigr)^2$,  (2)

where $X_{test}$ is the network input, $Y_{test}$ is the network output, $t$ is the time step, $w$ is the window size, and $n$ is the feature dimensionality.
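The windowed data representation and the per-window reconstruction error of formula (2) can be sketched as follows (assumed helper names, NumPy only):

```python
import numpy as np

def make_windows(records, w):
    """Split a (records x features) log matrix into consecutive windows of
    length w, i.e. samples i*w .. (i+1)*w, as fed to the LSTM network."""
    n = (len(records) // w) * w        # drop the incomplete trailing window
    return records[:n].reshape(-1, w, records.shape[1])

def windowed_errors(x, y):
    """Reconstruction error per time window (as in formula (2)):
    averaged over time steps and features, one score per window."""
    return np.mean((x - y) ** 2, axis=(1, 2))
```

Note that windowing multiplies memory use: each log record appears inside a full window array rather than as a single row, which matches the training-cost observation above.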
The results of this work are presented in Figure 3. After obtaining the predictions of each model, i.e., the anomaly values for each record, it is necessary to obtain estimates of classification performance. In the case of anomaly detection, the measures of model performance are overall model accuracy and the classification precision and recall used in evaluating most classification algorithms. Since it is important to detect the maximum number of insider threats, the key evaluation metric is recall, which is calculated as the ratio of anomalous samples (users) detected correctly to the total number of anomalous samples (users). Another significant metric is the overall accuracy of the model, calculated as the ratio of correct predictions to the total number of predictions. In this study, the anomaly detection task was reduced to distinguishing two classes, normal and anomalous.
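For the binary normal/anomalous setting, the two metrics named above reduce to simple counts (a minimal sketch with an illustrative helper name, labels 1 = anomalous, 0 = normal):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Overall accuracy and recall for the binary normal(0)/anomalous(1) task."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    true_positives = np.sum((y_true == 1) & (y_pred == 1))
    accuracy = np.mean(y_true == y_pred)                    # correct / total
    recall = true_positives / max(np.sum(y_true == 1), 1)   # found anomalies / all anomalies
    return accuracy, recall
```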
Table 1 shows the results of the evaluation on the above metrics. Based on the obtained results, it can be argued that with a larger number of attributes, i.e., with more information about users' actions, the one-class SVM models may lose generalization ability, which is due to the peculiarities of calculating the anomaly value. At the same time, the automatically calculated threshold and the threshold with the best estimates are close enough that the automatic and best estimates are also close, which can be an advantage for an automatic insider threat detection system requiring minimal participation of experts.
In contrast, for ANN-based models a larger number of features provides more information to the networks, which increases their generalization ability. Moreover, the estimates obtained when deciding whether each record is anomalous are noticeably smaller than those obtained when deciding whether each user is anomalous, further confirming the high generalization ability. Classical machine learning models make a decision about each sample without the context of nearby samples, whereas ANN-based models, owing to their large number of adjustable parameters, can average the output values, partly as a result of overfitting.
In general, the presented results indicate that these models can be successfully applied to the task of detecting insider threats. The choice of a machine learning model, meanwhile, depends on the task at hand and the type of analysis to be performed. Classical machine learning models are able to detect internal threats in each record with high accuracy, while ANN-based models score highly in detecting internal threats on a per-user basis.

Conclusion
Information security threats to cyber-physical systems pose a serious danger to modern society. However, with appropriate measures and technologies, it is possible to ensure reliable protection of these systems and minimize the possible consequences of threats. It is crucial to understand that the security of information technology and cyber-physical systems is a global problem that requires active coordination of efforts on the part of governments, the scientific community, and organizations. To protect cyber-physical systems from such threats, a wide range of measures must be taken. First, it is essential to develop and enact suitable information security policies, encompassing employee training and regular updates of protective measures. Second, modern technologies and protection methods such as layered defenses, data encryption, and security monitoring systems should be used. The presented results indicate the effectiveness of applying ANN models for automated anomaly detection in the action logs of cyber-physical system operators. In addition, robust security standards for cyber-physical systems should be developed and implemented, with regular reviews and audits conducted to ensure ongoing compliance.
The article is based on the results of research carried out at the expense of budgetary funds on the state assignment of the Financial University.

Figure 2. Prediction result of the autoencoder model (a) on the dataset with all features and (b) for each user when deciding on the anomaly of each user.

Fig. 3. LSTM model prediction result (a) on the dataset with all features and (b) for each user when deciding on the anomaly of each user.

Table 1. Evaluation of the classification performance of the synthesized models.