Security Framework Connection Assistance for IoT Device Secure Data Communication

Internet of Things (IoT) services are expanding rapidly, driven by compact devices and a mature network infrastructure of Internet-connected devices embedded with sensors, actuators, communication and storage components that provide connectivity and data exchange. Many industries now deploy vast numbers of IoT devices, which raises challenges such as reducing exposure to risks and threats, accommodating a huge number of devices in the network, and protecting against vulnerabilities. Supervised learning has recently gained popularity for device classification, but it becomes unrealistic when millions of new IoT devices are produced each year and training data is insufficient. In this paper, a security framework with connection assistance for secure IoT device data communication is proposed. It is a multi-level security support architecture that combines a clustering technique with deep neural networks to secure resource-constrained IoT devices and to classify both seen and unseen device types. The dimensionality of the datasets is reduced with an autoencoder, which establishes a good balance between classification accuracy and overhead. Comparative results indicate that the proposed security system performs better than the existing systems considered.


Introduction:
The Internet of Things (IoT) is now widespread, and its services are expected to provide a high level of security and privacy [1]. The facilities offered by IoT devices are used ever more heavily. As the number of devices connected to the Internet grows, estimates put the number of IoT users in the billions by 2020 [2]. This growth in wireless IoT devices raises security issues. Because so many devices are connected to the Internet, unauthorized users have the opportunity to mount large-scale attacks that manipulate data [3]. Data confidentiality, privacy, authorization and authentication are therefore the main IoT security issues [16]. Attackers can enter the communication at the cloud layer, the network layer or the hardware layer [6]. An attacker who enters at the hardware layer can retrieve the security parameters stored in the IoT device [4]. With these stolen parameters the attacker can create a virtual or duplicate IoT device, which uploads false data to the server and retrieves users' secure information from the network to which the original device is connected [5]. Security parameters can also be retrieved without any physical connection to the device, which raises further security issues. Researchers have demonstrated electromagnetic side-channel attacks that steal ECC (Elliptic Curve Cryptography) and RSA (Rivest-Shamir-Adleman) encryption keys. AES encryption keys have likewise been stolen from IoT devices through side-channel attacks; because these devices are connected to the Internet, such weaknesses expose them to attack [17].
One example of such attacks is the Mirai malware, which attacks IoT devices outside the network and turns them into network zombies that are then used to attack other Internet services and websites. In the proposed architecture, the performance of a device is determined first, and security operations are then carried out for the IoT device as required. This paper proposes combining a clustering technique with supervised learning to classify both seen and unseen device types, so that secured IoT networks can be distinguished from networks accessed by unauthorized devices [9]. The dimensionality of the datasets is reduced with the proposed autoencoder technique, which yields good accuracy and load balancing.

IoT security threats
The lack of standardization in IoT technology intensifies its weaknesses and leaves IoT systems with significant security problems [7]. Some generic threats are briefly discussed below.

2.1.1: Hardware Vulnerabilities:
Some commercially developed IoT products treat security as a primary design parameter, whereas functionality-centric devices do not; in the latter, security features are added only later. As a result, hardware vulnerabilities such as open physical interfaces and weaknesses in the boot process remain in these devices and can be exploited remotely [8].

2.1.2: Vulnerabilities of Social Engineering:
The interaction of IoT devices with people, and their role in social life, has a great impact on users' lives. IoT users attract social engineering attacks because of the large amount of data collected about them. Smart devices such as smart TVs, Google Glass, Fitbits and refrigerators can also be taken over by hackers [20].

2.1.3: Legislation Challenges:
Legislation cannot by itself guarantee IoT data security; at best it can compensate for the damage caused once data has been misused. So far there are no drafts of a secure data policy or of standardized legislation. The Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR) are protective measures adopted in different jurisdictions.

2.1.4: User Unawareness:
Users are the conventional attack vector into a network. A lack of security awareness and training leaves both end users and employees susceptible to phishing, spear-phishing and social engineering. Transmitting sensitive data over public networks from mobile devices also degrades security [10].

IoT security challenges
IoT faces several types of security challenges. They are divided into three categories: security of end applications, security of IoT data, and communication-related security [11]. These issues are explained in detail below, following the discussion of layer-wise and generic IoT threats [18]. Confidentiality, integrity and availability are abbreviated as CIA. Information security in any organization follows these basic CIA guidelines, so the security of a system is largely defined by these three properties.

2.2.1: Confidentiality:
Confidentiality is the set of rules that limits access to information. These measures keep sensitive data out of the hands of unwanted parties and ensure that only the rightful owner of the data can take further action on it. The trustworthiness of IoT services, whether societal, manufacturer-related or personal, depends heavily on the genuineness of the data, which directly affects their output [13]. IoT end nodes must be confidential and authentic so that data can be transmitted securely among IoT applications and services.

2.2.2: Integrity:
Integrity refers to the trustworthiness and correctness of data. It covers the consistency, accuracy and trustworthiness of information over its complete life cycle [19]. The data must remain unchanged during transmission, and measures must be in place to ensure that it cannot be altered or corrupted by unauthorized parties.

2.2.3: Availability:
Availability is the accessibility of data to authorized users. It is best ensured by rigorously maintaining the hardware and by providing a properly functioning operating-system environment that is free of software conflicts. Upgrading the system from time to time also supports the availability of data [15].

Security Framework Connection Assistance for IoT
The proposed architecture, shown in Fig. 1, enables security operations for IoT devices without increasing the processing load. It consists of IoT devices, gateways, a platform, applications (APP), and clustering and classification processing. End users receive services from service providers through the IoT devices and the APP, while service providers connect to the carrier's network infrastructure through the gateways and the platform.
An additional advantage of the network infrastructure is that it enables security operations from the carrier's standpoint, which is called "device assistance". When a device connects to the network, the data processing module captures the network traffic and extracts the desired features. In the Train module, a one-vs-rest binary classifier is created for each known device type, following a whitelist approach. Once the classifiers are ready, the processed data is passed as input to the Predict module for labeling [12]. The Predict module uses the classifier models from the Train module directly to label feature vectors and predict device types. The discriminator checks whether a feature vector has been labeled [22]. If it has, the feature vector is passed to the action module, which is the module that applies the mitigation strategy; it provides several mitigation strategies for different categories and phases. If it has not, the feature vector is passed to the clustering module, whose output is then fed continuously to the action module [14]. The clustering module is explained in detail below.
https://doi.org/10.1051/e3sconf/202130 E3S Web of Conferences 309, 01061 (2021) ICMED 2021 901061

Fig. 1: Security Framework for IoT Devices
The key technology of the proposed architecture is this assistance technique, which not only provides "device assistance" but also takes the processing load on the gateway into account. Device management functions are placed on both the gateway side and the platform side to achieve a high response speed. The platform-side management function can manage all devices, while the gateway-side management function manages the devices connected to that gateway. An assistance determination function is also placed on the platform side so that the service provider can set the assistance policy. A security assistance function is maintained on the gateway side to carry out the session resumption mechanism. The device performance estimation function is kept on the platform side because it requires interaction with devices, such as sending packets to them. As a result, less traffic is generated on the wide area network.
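The routing of a feature vector between the Predict module, the discriminator, the action module and the clustering module can be illustrated with a minimal sketch. The function names, the score threshold of 0.5 and the toy classifiers are assumptions made for illustration only; the paper does not specify how the one-vs-rest scores are combined.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

FeatureVector = Sequence[float]
# Hypothetical classifier type: returns a score in [0, 1] for "is this device type".
Classifier = Callable[[FeatureVector], float]


def predict_label(x: FeatureVector,
                  classifiers: Dict[str, Classifier],
                  threshold: float = 0.5) -> Optional[str]:
    """Predict step: ask every one-vs-rest classifier, keep the best accepted label."""
    best_label, best_score = None, threshold
    for label, clf in classifiers.items():
        score = clf(x)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label  # None means no known (whitelisted) type accepted the vector


def route(x: FeatureVector,
          classifiers: Dict[str, Classifier],
          threshold: float = 0.5) -> Tuple[str, Optional[str]]:
    """Discriminator step: labeled vectors go to action, unlabeled ones to clustering."""
    label = predict_label(x, classifiers, threshold)
    if label is not None:
        return ("action", label)
    return ("clustering", None)


# Toy whitelist of two known device types (purely illustrative).
known_types = {
    "camera": lambda x: 1.0 if x[0] > 0.5 else 0.0,
    "sensor": lambda x: 1.0 if x[1] > 0.5 else 0.0,
}
```

With this toy whitelist, a vector accepted by the "camera" classifier is routed to the action module, while a vector rejected by every classifier is handed to the clustering module as an unseen device.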

Gateway
As a first step, the gateway determines whether a device requires assistance and provides that assistance if required. For a high-performance device, the gateway's processing load for one request is $l_d t_d$, where $l_d$ is the processing load involved in determining device performance and $t_d$ is the corresponding processing time. For a constrained device, the gateway's processing load for one request is $l_d t_d + l_a t_a$, where $l_a$ is the processing load involved in assisting the device and $t_a$ is the corresponding processing time. The processing load of one request that involves only assistance is $l_c = l_a t_a$. Let $N$ be the total number of devices that access the gateway, of which $n$ ($0 \le n \le N$) are high-performance devices. The gateway's processing load for one request, $l_p$, is therefore $l_d t_d$ for a high-performance device and $l_d t_d + l_a t_a$ for a constrained one; averaged over all $N$ devices, with each device issuing requests at the same rate, this gives $l_p = l_d t_d + \frac{N - n}{N} l_a t_a$.

Following the idea of OPTICS, an ordering of the data points by density is created from the distances between them. These distances represent the density-based clustering structure. Two distances are stored with the cluster ordering: the core distance and the reachability distance.
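The gateway load model can be checked numerically with a short sketch. The function names and the sample values are hypothetical, and the averaging step assumes that every device issues requests at the same rate, which the text does not state explicitly.

```python
def load_high_performance(l_d: float, t_d: float) -> float:
    """Load of one request from a high-performance device: l_d * t_d."""
    return l_d * t_d


def load_constrained(l_d: float, t_d: float, l_a: float, t_a: float) -> float:
    """Load of one request from a constrained device: l_d * t_d + l_a * t_a."""
    return l_d * t_d + l_a * t_a


def mean_load_per_request(N: int, n: int,
                          l_d: float, t_d: float,
                          l_a: float, t_a: float) -> float:
    """Average gateway load per request for N devices, n of them high-performance.

    Assumes each device sends requests at the same rate (illustrative assumption).
    """
    if N <= 0 or not 0 <= n <= N:
        raise ValueError("require N > 0 and 0 <= n <= N")
    total = (n * load_high_performance(l_d, t_d)
             + (N - n) * load_constrained(l_d, t_d, l_a, t_a))
    return total / N
```

Under this assumption the average equals $l_d t_d + \frac{N-n}{N} l_a t_a$, so the load approaches $l_d t_d$ as the share of high-performance devices grows.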

OPTICS
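The core distance of a point $o$ is $CD(o)$, defined as:

$$CD(o) = \begin{cases} \text{undefined}, & |N_{\varepsilon}(o)| < MinPts \\ MinPts\text{-distance}(o), & \text{otherwise} \end{cases}$$

where $MinPts\text{-distance}(o)$ denotes the distance from $o$ to its $MinPts$-th nearest neighbor. The core distance is undefined when the point $o$ is so isolated that the number of other points within radius $\varepsilon$ is less than $MinPts$; otherwise $MinPts\text{-distance}(o)$ is taken as the core distance. The reachability distance of a point $p$ with respect to $o$ is $RD(p, o)$, defined as:

$$RD(p, o) = \begin{cases} \text{undefined}, & |N_{\varepsilon}(o)| < MinPts \\ \max\big(CD(o),\, d(o, p)\big), & \text{otherwise} \end{cases}$$

where $N_{\varepsilon}(o)$ is the $\varepsilon$-neighborhood of $o$, and all distances in the equations above are Minkowski distances. The points processed first have the smallest reachability distances, that is, the highest density. The data points in the OPTICS output are sorted according to their processing order and reachability distance.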
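The two OPTICS distances can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the neighborhood is assumed to exclude the point itself (matching "number of other points"), and the Minkowski order defaults to 2 (Euclidean).

```python
def minkowski(a, b, p=2):
    """Minkowski distance of order p between two points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def core_distance(points, o, eps, min_pts):
    """Distance from points[o] to its min_pts-th nearest neighbor within eps.

    Returns None (undefined) when fewer than min_pts other points lie within eps.
    """
    dists = sorted(minkowski(points[o], q)
                   for i, q in enumerate(points) if i != o)
    within = [d for d in dists if d <= eps]
    if len(within) < min_pts:
        return None
    return within[min_pts - 1]


def reachability_distance(points, p, o, eps, min_pts):
    """Reachability distance of points[p] with respect to points[o]."""
    cd = core_distance(points, o, eps, min_pts)
    if cd is None:
        return None  # o is not a core point, so the distance is undefined
    return max(cd, minkowski(points[p], points[o]))


# Three points close together and one isolated point.
sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
```

In this sample the first three points are core points for `eps=2` and `min_pts=2`, while the isolated fourth point has an undefined core distance and would surface late in the ordering with a large reachability distance.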

Auto Encoder (AE)
An autoencoder (AE) is a symmetrical artificial neural network that is trained to reconstruct its input at its output. It has two parts: an encoder, which maps the input to features (the bottleneck), and a decoder, which reconstructs the input from those features. The network parameters are learned so that the reconstruction $\hat{x}$ has the same features as the input $x$. Given a set of $p$ input data vectors $\{x_1, \dots, x_p\}$, an input vector $x$ is fed forward to a bottleneck vector $h = \sigma(Wx + b)$, where $\sigma$ is the activation function, $b$ is the bias vector and $W$ is the weight matrix. The weights and biases form the parameter set $\theta = \{W, b\}$. Using the decoder parameters $\theta' = \{W', b'\}$, the vector $\hat{x} = \sigma(W'h + b')$ is reconstructed from the bottleneck with the same dimensions as the input vector. Because of the symmetrical structure, the decoder weights are the transpose of the encoder weights, i.e., $W' = W^{T}$. The AE is then trained by back-propagation to optimize the parameters and minimize the loss function $L(x, \hat{x})$. The input space should have a higher dimensionality than the bottleneck space; when the input space is smaller than the hidden layer, the input is simply copied to the output. A sparse bottleneck space is an alternative to reducing the number of bottleneck neurons. A sparse AE has more hidden units than inputs, but only a small number of them are active at any one time. The sparsity constraint is implemented by a regularizer $\Omega(h)$, so that with a sparse space the loss function becomes $L(x, \hat{x}) + \Omega(h)$.
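A forward pass through a tied-weight autoencoder can be sketched as follows. The sigmoid activation, the mean-squared-error reconstruction loss, the L1 sparsity penalty and all function names are assumptions chosen for illustration; the paper does not state which loss or regularizer was used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, W, b):
    """Bottleneck h = sigma(W x + b); W has one row per bottleneck unit."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def decode(h, W, b_dec):
    """Reconstruction x_hat = sigma(W^T h + b'), reusing the encoder weights."""
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + b_dec[i])
            for i in range(len(b_dec))]


def loss(x, W, b, b_dec, sparsity_weight=0.0):
    """Assumed loss: mean squared reconstruction error plus an L1 penalty on h."""
    h = encode(x, W, b)
    x_hat = decode(h, W, b_dec)
    mse = sum((a - c) ** 2 for a, c in zip(x, x_hat)) / len(x)
    penalty = sparsity_weight * sum(abs(v) for v in h)
    return mse + penalty


# Three inputs compressed to a two-unit bottleneck, all parameters zero.
W0 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b0 = [0.0, 0.0]
b_dec0 = [0.0, 0.0, 0.0]
```

Training would repeatedly adjust `W`, `b` and `b_dec` by back-propagation to lower this loss; after training, `encode` alone serves as the dimensionality reduction step.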

Random Forest
The random forest algorithm uses a large collection of decorrelated decision trees for classification. A decision tree has the structure of a flowchart: each internal node represents a decision on an attribute, each node is split into two branches that represent the outcomes of that decision, and each leaf node denotes the class label that results. In general, a branch can be split at many positions. The Gini impurity is a measure of split quality and is defined as:

$$G = 1 - p_{+}^{2} - p_{-}^{2}$$

where $p_{+}$ is the probability of a positive outcome and $p_{-}$ is the probability of a negative outcome of the test. The smaller the Gini impurity, the better the separation. The random forest takes as input a training data matrix $S$, where $n$ denotes the number of features, $p$ the number of data samples, and $c_i$ the class label of data point $i$. The matrix $S$ is defined as:

$$S = \begin{bmatrix} f_{11} & \cdots & f_{1n} & c_1 \\ \vdots & \ddots & \vdots & \vdots \\ f_{p1} & \cdots & f_{pn} & c_p \end{bmatrix}$$

The rows of $S$ are randomly sampled to create $M$ subset matrices of the same size as $S$; these subsets are called bootstrapped datasets. A decision tree is then created for each subset. The accuracy of the random forest is measured using the difference between the original set and the subset used for each decision tree, and this accuracy is used to fine-tune the parameters.
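The Gini impurity of a candidate split and the creation of bootstrapped datasets can be illustrated with a small sketch. The function names are hypothetical, labels are assumed to be binary (1 for positive, 0 for negative), and rows are drawn with replacement, which is the usual meaning of bootstrapping.

```python
import random


def gini_impurity(labels):
    """Binary Gini impurity 1 - p_pos^2 - p_neg^2 for a list of 0/1 labels."""
    if not labels:
        return 0.0
    p_pos = sum(1 for y in labels if y == 1) / len(labels)
    p_neg = 1.0 - p_pos
    return 1.0 - p_pos ** 2 - p_neg ** 2


def split_gini(rows, feature, threshold):
    """Size-weighted Gini impurity after splitting rows on rows[i][feature] <= threshold.

    Each row is (f_1, ..., f_n, c): feature values followed by the class label.
    """
    left = [r[-1] for r in rows if r[feature] <= threshold]
    right = [r[-1] for r in rows if r[feature] > threshold]
    total = len(rows)
    return (len(left) / total) * gini_impurity(left) \
        + (len(right) / total) * gini_impurity(right)


def bootstrap_datasets(rows, m, seed=0):
    """Create m bootstrapped datasets, each the same size as rows."""
    rng = random.Random(seed)
    return [[rng.choice(rows) for _ in rows] for _ in range(m)]


# One feature and a binary label per row.
S = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
```

A tree builder would evaluate `split_gini` for each candidate feature and threshold and keep the split with the smallest value; one tree is then grown per bootstrapped dataset.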

RESULTS
In the experiments, the performance evaluation of the proposed security framework for IoT is divided into two parts: unsupervised device type identification with OPTICS, and supervised random forest classification with dimension reduction for anomaly detection.

Performance of Device Type Identification
Suricata, an open-source IDS and IPS tool, is used to capture the network traffic at the embedded sensor. Network packets are collected from 12 production lines belonging to the target factory. The data is labeled manually into three classes (controller events, robotic arm events and computer events), giving 21,447 records in total, and is used as the first dataset for device identification. Events in Suricata represent several types of traffic and are described by different fields depending on the protocol. 110-dimensional features were used. According to our experimental results with OPTICS (ordering points to identify the clustering structure), the test accuracy is 98.6%, so OPTICS provides good efficiency for device identification. The feature dimensions are then reduced with feature selection methods, which improves the performance of the device identification model to 98.6%. The top 10 most important features are obtained after applying the AE for feature selection, and the confusion matrix for device identification is given in Table 1. As shown in Figure 2, the feature selection method achieves an accuracy of 97.8%, an improvement of 4.8% over the case without feature selection, and the proposed OPTICS feature selection method achieves a further enhanced device type identification accuracy of 98.6%. Feature selection is therefore effective in improving classification performance.

Performance of Anomaly Detection method
Because no real IoT attack data is available, malicious network patterns whose behavior differs from normal device behavior are simulated first. Normal packets and attack packets are collected from devices at two sites. The statistics of the two datasets used for anomaly detection are shown in Table 2.

Fig. 3: Performance Comparison of Anomaly Detection
The comparative performance of anomaly detection is shown in Fig. 3 and Fig. 4 in terms of accuracy and F1-score, and precision and recall, respectively. The best performance on the reference dataset is observed when autoencoders are used, with an F1-score of 91.28% and an accuracy of 95.02%. This clearly shows the effectiveness of autoencoders in learning normal behavior. Autoencoders are therefore used for anomaly detection in all experiments.
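For reference, the four reported metrics can be computed from the counts of a binary confusion matrix as in the sketch below. The function name and the sample counts are hypothetical and are not taken from the paper's datasets.

```python
def detection_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts.

    tp/fp: attack packets correctly/incorrectly flagged,
    fn/tn: attack packets missed / normal packets correctly passed.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only.
example = detection_metrics(tp=8, fp=2, fn=2, tn=8)
```

Because the F1-score is the harmonic mean of precision and recall, it can sit well below accuracy when attack packets are rare compared with normal packets.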

CONCLUSION
A security framework with connection assistance for secure IoT device data communication was analyzed in this paper. The proposed architecture first determines the performance of a device and then assists in controlling the IoT device when required. The proposed hybrid learning framework provides a secure IoT network that prevents access by unauthorized devices and detects irregularities from the network traffic. It is a multi-level security support architecture that combines a clustering technique with deep neural networks to secure resource-constrained IoT devices and to classify both seen and unseen device types. The dimensionality of the datasets is reduced with an autoencoder, and a good balance between classification accuracy and overhead is achieved.