Machine Learning based Intrusion Detection for Cyber-Security in IoT Networks

The IoT is a promising technology whose deployment is growing rapidly, yet cybersecurity remains a weak point, and detecting attacks on IoT infrastructures is a growing concern. With the increased use of the Internet of Things in different areas, cyber-attacks are increasing proportionately and can cause system failures. Intrusion Detection Systems (IDS) have become the leading security solution, and anomaly-based network intrusion detection in particular plays a major role in protecting networks against various malicious activities. Improving the security of IoT networks has become one of the most critical issues, owing to the large-scale development and deployment of IoT devices and the inadequacy of existing IDS for special-purpose networks. In this article, the performance of several machine learning models is compared for accurately predicting attacks on IoT systems; the class-imbalance problem is then addressed using the SMOTE technique. A Nystrom-based kernel SVM is used for the first time to detect attacks in an IoT network, and the results are promising. The evaluation metrics used in the performance comparison are accuracy, precision, recall, F1 score, and the AUC-ROC curve.


Introduction
The Internet of Things (IoT) is a promising emerging technology that connects everything in the world through the Internet, and its network systems are growing rapidly due to increasing demand. A large number of IoT devices and networks have been deployed in recent years across different areas of use [5]: health [3], smart cities [4], supply chains [1] and agriculture [2]. With this extended use of IoT, new protocols are being deployed [6]. IoT technology thus promises to improve our personal, professional and social lives [1], and IoT models are becoming more complex every day [7,8]. The IoT is a worldwide network of smart objects connected through the Internet without human interference, which is powerful but, like any other network, susceptible to cyber-attacks. Research is therefore increasingly geared towards machine-learning-based applications alongside IoT. An Intrusion Detection System (IDS) is an effective technique for detecting cyber-attacks in any network; most recent IDS rely on machine learning algorithms for attack detection, and they are currently used in all areas of human life [9].
The IoT network consists of connections between different types of smart objects, ranging from supercomputers to small devices with very low computing power, so securing this type of network is difficult and cybersecurity remains a major weakness [10]. Statista estimated an impressive number of connected IoT devices in 2020 and projects that this number will double by 2025 [11]. Columbus reported that the world currently has a 20% adoption rate for IoT devices and expects it to jump to 80% in less than a decade, meaning that 20% of all services offered in the world already involve IoT-related technology. With the massive increase in the use of IoT devices, many researchers have expressed concern about the security of IoT user information [12]. In 2016, a series of distributed denial-of-service attacks targeting IoT networks and websites such as Twitter, Netflix and PayPal showed the world how urgent it is to develop strong security solutions. Developing an IDS for IoT environments to mitigate these malicious attacks is therefore crucial. An IDS designed for IoT networks must be able to analyse data, generate instant responses in real time, and adapt to evolving IoT infrastructure. Many researchers have shown promising results in detecting network intrusions; however, only a limited number target IoT scenarios [12,13].
We present machine-learning-based solutions that can detect attacks and protect the system when it is in an abnormal state. We also study the impact of data oversampling on the performance of the various machine learning models used; several machine learning classifiers were evaluated. We treat both the binary and the multi-class case and present a comparative study of the different techniques after applying the SMOTE technique to resample and balance the dataset. Further analysis and comparison with other work are described in the following sections. Section 2 reviews other research on anomaly detection in IoT network traffic. The dataset, the different types of attacks, and the models and metrics used are detailed in Section 3. Section 4 presents the results of our analysis and a comparative study. Finally, conclusions and perspectives are presented in Section 5.

Review
Recently, machine learning algorithms have shown promising performance in detecting abnormal and malicious activity in IoT networks. Several similar works have been carried out in the IoT field.
Pahl et al. [7] developed a detector and firewall for anomalies of IoT microservices in an IoT site. Liu et al. [9] proposed a detector for the On-and-Off attack by a malicious network node in an industrial IoT site. By On-and-Off attack, they meant that the IoT network can be attacked by a malicious node when that node is in its active (On) state, while the network behaves normally when the malicious node is in the idle (Off) state. In [15], the authors presented an intrusion detection system for the IoT. To this end, several ML classifiers were used to successfully identify network scanning probes and simple forms of Denial of Service (DoS) attacks. To generate the dataset, network traffic was captured over four consecutive days using the Wireshark software; the ML classifiers were then applied with the Weka software.
Ukil et al. [16] discussed the detection of anomalies in IoT-based healthcare analytics. A model for detecting heart abnormalities via a smartphone was also presented in this article. For the detection of anomalies in healthcare; IoT sensors, medical image analysis, biomedical signal analysis, big data mining, and predictive analytics have been used.
Pajouh et al. [17] presented an intrusion detection model based on a two-layer dimension reduction and two-level classification module. This model was also designed to identify malicious activities such as User to Root (U2R) and Remote to Local (R2L) attacks. For dimension reduction, component analysis and linear discriminant analysis were used. The NSL-KDD dataset was used to perform the entire experiment. To detect suspicious behavior with the two-level classification module, Naive Bayes and the Certainty Factor version of K-Nearest Neighbor were applied.
Alrashdi et al. [19] conducted their research on the design of an anomaly-detection IDS for smart cities. For binary classification, they obtained an accuracy rate of 99%, applying just one classifier, Random Forest (RF). In [14], Bakhtiar et al. applied the J48 algorithm to build an intrusion detection system; their experiment was able to detect only denial-of-service attacks.
Several research works based on CNN and LSTM have been applied to anomaly detection in IoT networks [23,24]. Monika et al. used a hybrid method combining these two algorithms to detect cyber-attacks on an IoT infrastructure [25]. Diro et al. [21] presented a comparative study of deep and shallow neural networks, with the main objective of detecting four classes of attacks and anomalies. For the four-class problem, the system achieved an accuracy of 98.27% with the deep neural network model and 96.75% with the shallow neural network model. A detailed description of a smart home system whose security vulnerabilities were detected by the Dense Random Neural Network (DRNN) deep learning method was introduced in [22]; it mainly covered the denial-of-service attack and the denial-of-sleep attack in a simple IoT site.

Dataset description
UNSW-NB15 is a network traffic dataset with different categories for normal activities and synthetic attack behaviours. The raw network packets of the UNSW-NB15 dataset were created with the IXIA Perfect Storm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate a hybrid of real modern normal activities and synthetic contemporary attack behaviours.

Data pre-processing:
All machine learning research requires exploratory data analysis and data observation. The first task of this research was to make the dataset consumable by any classifier, so the first step was to deal with missing data: we need to check whether there are any missing values in the dataset. A basic strategy for working with incomplete datasets is to remove entire rows and/or columns containing missing values, although strategies for imputing missing values also exist [26]. For simplicity, we discard the four rows with missing values. For feature selection, no machine learning approach was taken here, as this would not have a significant impact on the data analysis.
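As a sketch of this step, assuming the data has been loaded into a pandas DataFrame (the column names below are hypothetical toy values, not the actual UNSW-NB15 schema):

```python
import pandas as pd

# Toy frame standing in for the loaded dataset (hypothetical column names)
df = pd.DataFrame({
    "dur":   [0.12, 0.05, None, 0.33],
    "proto": ["tcp", "udp", "tcp", None],
    "label": [0, 1, 0, 1],
})

# Drop every row that contains at least one missing value
df_clean = df.dropna()
print(len(df_clean))  # 2 complete rows remain
```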
In the feature engineering step, the first task is to determine the types of the features in the dataset. The dataset contains categorical and numeric data: categorical data can be classified into ordinal and nominal values, while numeric data can be divided into discrete and continuous values. The next vital task is to convert the categorical data into vectors. This can be done in several ways; Label Encoder and One Hot Encoder are popular among them.
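A minimal sketch of both encoders with scikit-learn, using a hypothetical protocol column:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

proto = np.array(["tcp", "udp", "tcp", "icmp"])  # hypothetical categorical feature

# Label encoding: one integer per category (categories sorted alphabetically)
labels = LabelEncoder().fit_transform(proto)
print(labels)  # [1 2 1 0]

# One-hot encoding: one binary column per category
onehot = OneHotEncoder().fit_transform(proto.reshape(-1, 1)).toarray()
print(onehot.shape)  # (4, 3)
```

Label encoding imposes an artificial ordering on the categories, so one-hot encoding is usually the safer choice for nominal features.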

Evaluation criteria
Evaluating a model is an essential part of building an effective machine learning model. There are several evaluation metrics; the following were calculated to evaluate the performance of the developed system. Using these metrics, one can decide which technique is best suited for this task.

Recall
Recall, also known as the true positive rate, measures the number of positives the model identifies relative to the actual number of positives present in the data. The recall value for a single class is given in the following equation:
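The equation itself is missing from the text; it can be reconstructed from the standard definition in terms of true positives (TP) and false negatives (FN):

```latex
\mathrm{Recall} = \frac{TP}{TP + FN}
```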

F1 score
The F1 score also measures a model's performance. It is the weighted harmonic mean of the precision and recall of a model. The F1 score value for a single class is given in Eq. (4).
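Eq. (4) is not reproduced in the text; the standard definition, together with the precision it depends on (TP = true positives, FP = false positives), is:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP},
\qquad
F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```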

Linear Discriminant Analysis
The goal of the LDA technique is to project the original data matrix onto a lower-dimensional space. To achieve this, three steps are carried out. The first step is to compute the separability between the different classes (i.e., the distance between the means of the different classes), called the between-class variance or between-class matrix. The second step is to compute the distance between the mean and the samples of each class, called the within-class variance or within-class matrix. The third step is to construct the lower-dimensional space that maximizes the between-class variance and minimizes the within-class variance.
Given the original data matrix X = [x_1, x_2, ..., x_N], where x_i represents the i-th sample and N is the total number of samples, each sample is described by M features (x_i ∈ R^M); in other words, each sample is a point in M-dimensional space.

Assume the data matrix is partitioned into c classes, X = [X_1, X_2, ..., X_c], with n_i samples in class i. Let μ_i (1 × M) denote the mean of the i-th class, computed as in Eq. (8), and μ (1 × M) the total mean of all the classes. After projection by the LDA transformation matrix Π, the projected mean of the i-th class is m_i = Π^T μ_i and the projected total mean is m = Π^T μ.

The term (μ_i − μ)(μ_i − μ)^T in Eq. (7) represents the separation between the mean of the i-th class and the total mean, i.e. the between-class variance of the i-th class. Summing over the classes gives the total between-class variance:

  S_B = Σ_{i=1..c} n_i (μ_i − μ)(μ_i − μ)^T

The within-class variance is obtained by summing, for each class, the scatter of its samples around the class mean:

  S_W = Σ_{i=1..c} Σ_{x ∈ X_i} (x − μ_i)(x − μ_i)^T

Having computed the between-class variance S_B and the within-class variance S_W, the transformation matrix Π of the LDA technique is obtained from the Fisher criterion of Eq. (11), which is optimized by maximizing the between-class term and minimizing the within-class term:

  argmax_Π (Π^T S_B Π) / (Π^T S_W Π)   (11)

This formulation can be rewritten as a generalized eigenvalue problem, as in Eq. (12).
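The three steps above are essentially what scikit-learn's LinearDiscriminantAnalysis performs internally; a minimal sketch on synthetic two-class data (not the paper's dataset):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two well-separated synthetic classes in a 5-dimensional feature space
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(3, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)

# LDA projects onto at most c - 1 directions maximizing the Fisher criterion
lda = LinearDiscriminantAnalysis(n_components=1)
X_proj = lda.fit_transform(X, y)
print(X_proj.shape)  # (100, 1)
```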

Support Vector Machine (SVM)
Support Vector Machine is another discriminative model, like LR. It is a supervised learning model for analysing data for classification, regression and outlier detection, and is most applicable to non-linear data. Given inputs x_i, class labels y_i and Lagrange multipliers a_i, the weight vector can be calculated as

  w = Σ_i a_i y_i x_i

The target of the SVM is to optimize the following dual objective:

  max_a Σ_i a_i − (1/2) Σ_{i,j} a_i a_j y_i y_j ⟨x_i, x_j⟩   (14)

In Eq. (14), ⟨x_i, x_j⟩ is an inner product that can be replaced by different kernels, such as the polynomial kernel, the RBF kernel and the sigmoid kernel.
We also implement SVM with the Nystrom approximation, a low-rank technique for approximating the kernel matrix. To our knowledge, this approach is used here for the first time to detect attacks in an IoT network system.
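A minimal sketch of this combination with scikit-learn, on synthetic data standing in for the pre-processed traffic features: Nystroem maps the inputs through a low-rank approximation of the RBF kernel built from a subset of landmark samples, after which a linear SVM is trained in the approximate feature space.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Synthetic two-class data standing in for the traffic features
X = np.vstack([rng.normal(0, 1, (100, 10)), rng.normal(2, 1, (100, 10))])
y = np.array([0] * 100 + [1] * 100)

# Low-rank RBF kernel approximation from 50 landmark samples,
# followed by a linear SVM in the approximate feature space
model = make_pipeline(
    Nystroem(kernel="rbf", n_components=50, random_state=0),
    LinearSVC(),
)
model.fit(X, y)
print(model.score(X, y))
```

The appeal of the Nystrom approach is that training cost scales with the number of landmark components rather than with the full kernel matrix, which matters for large traffic datasets.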

Decision Tree (DT)
A decision tree is a decision-support tool representing a set of choices in the graphical form of a tree. The various possible decisions are located at the ends of the branches (the "leaves" of the tree) and are reached according to decisions taken at each stage. A DT generally starts with a single node and then branches into possible outcomes; each outcome leads to additional nodes, which branch off into further instances, producing a tree-like, flowchart-shaped structure. Consider a binary tree (Fig. 3) in which a parent node is split into two children, a left child and a right child, containing data D_p, D_left and D_right respectively. Given features X, an impurity measure I(D), the number of samples in the parent node N_p, in the left child N_left and in the right child N_right, the DT's target is to maximize the following Information Gain in Eq. (17):

  IG(D_p, X) = I(D_p) − (N_left / N_p) I(D_left) − (N_right / N_p) I(D_right)   (17)
The impurity measure I(D) can be calculated with three techniques: the Gini index, entropy, and classification error.
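As a sketch, the three measures for a node whose class-probability vector is p (using the base-2 logarithm for entropy):

```python
import numpy as np

def gini(p):
    """Gini index: 1 - sum over classes of p_k squared."""
    return 1.0 - float(np.sum(np.square(p)))

def entropy(p):
    """Entropy: -sum of p_k * log2(p_k), skipping zero probabilities."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def classification_error(p):
    """Classification error: 1 - max(p_k)."""
    return 1.0 - float(np.max(p))

p = np.array([0.5, 0.5])  # perfectly mixed binary node: maximal impurity
print(gini(p), entropy(p), classification_error(p))  # 0.5 1.0 0.5
```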

Random Forest (RF)
As the name suggests, the random forest algorithm creates the forest with many decision trees. It is a supervised classification algorithm. Many decision trees come together to form a random forest, and it predicts by averaging the predictions from each component tree. It generally has better predictive accuracy than a single decision tree. In general, the more trees there are in the forest, the more robust the forest appears.
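A short sketch illustrating this averaging effect on a synthetic dataset (not the paper's data), comparing a single tree against a forest of 100 trees:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Cross-validated accuracy of a single tree versus an averaged forest;
# the forest typically scores higher thanks to variance reduction
tree_acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest_acc = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5).mean()
print(tree_acc, forest_acc)
```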

Logistic Regression (LR)
LR is a discriminative model whose performance depends on the quality of the dataset. Given the features x with weights w and bias b, the predicted value P is obtained from the sigmoid function:

  P = 1 / (1 + e^{−(w·x + b)})

Machine Learning algorithms
We implement the different machine learning algorithms described above on our dataset for the two cases: binary (see Table 1.b) and multi-class (see Table 1.a).

Resampling data using SMOTE technique
Machine learning algorithms trained on imbalanced data cannot effectively recognize attacks belonging to minority classes. One way to address this problem is resampling, which adjusts the ratio between the different classes with the goal of making the data more balanced. We investigate the influence of oversampling on the performance of the different classifiers that we use.
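In practice one would typically use the SMOTE implementation from the imbalanced-learn library; the following pure-NumPy sketch only illustrates the core idea of interpolating between a minority sample and one of its nearest minority-class neighbours.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating
    between a random minority sample and one of its k nearest
    minority-class neighbours (the core idea behind SMOTE)."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Distances from sample i to all minority samples
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]  # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Four minority samples at the corners of the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_sketch(X_min, n_new=6)
print(X_new.shape)  # (6, 2)
```

Because each synthetic point lies on a segment between two existing minority points, the new samples stay inside the minority region rather than being naive duplicates.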

Results:
Metrics

Conclusion
Based on this study, we found that in the multi-class case the LDA, RF and DT techniques outperformed the others on this type of data for detecting cyber-attacks in IoT network traffic, since they predicted attacks with higher accuracy than the other approaches. In the binary case, the DT and RF techniques and the Nystrom-SVM, applied here for the first time to attack detection in IoT network traffic, presented the best performance. When we trained our algorithms with balanced data, we observed better efficiency in the detection of attacks. This in-depth study is a necessary step towards developing a robust detection algorithm. It was assumed that the network traffic represents IoT network traffic, because very few IoT network traffic datasets are publicly available. In addition, in the case of streaming data, different issues may arise; more empirical study of this problem, focusing on streaming learning, is needed.