An Air Pollution Prediction Scheme Using Long Short Term Memory Neural Network Model

To establish countermeasures against air pollution, it is first necessary to accurately assess the state of air pollution and to predict its causes and how it will change. As regulations on emissions of environmental pollutants continue to tighten, forecasting and managing nitrogen oxide (NOx) emissions is receiving considerable attention at industrial sites. In this study, an artificial intelligence-based model for predicting nitrogen oxide emissions is proposed. The proposed model covers the entire workflow, from data preprocessing to model training and evaluation, and uses a Long Short-Term Memory (LSTM) neural network, a type of recurrent neural network, to predict NOx emissions, which have time-series characteristics. The optimized LSTM model achieved a NOx emission prediction accuracy of more than 93% on both the training data and the evaluation data. The proposed model is expected to be applicable to the development of models for predicting emissions of various air pollutants with time-series characteristics.


Introduction
With the development of modern industry, which relies on fossil fuels as its basic energy source, air pollution caused by pollutants emitted during fossil fuel combustion has worsened. To determine the degree of air pollution and establish improvement measures, it is important to measure the amount of air pollution and to predict future pollution levels. Currently, air pollutant emissions are monitored in many countries, and standard pollutants such as SO2, NO2, O3, PM10, PM2.5, and CO are measured at air monitoring sites. In particular, nitrogen oxides (NOx) is a collective term for oxides of nitrogen such as nitrogen monoxide (NO), nitrogen dioxide (NO2), and dinitrogen pentoxide (N2O5), which are mainly generated when fuel is burned in automobiles or at industrial sites. When NOx is released into the air, it forms fine dust and ozone through photochemical reactions driven by sunlight, and it has therefore been designated as a subject of environmental regulation internationally [1,2]. Because air pollution is an important problem, various research methods have been applied to forecasting the degree of pollution.
Traditional time series forecasting models include Auto-Regressive (AR), Moving Average (MA), and Auto-Regressive Integrated Moving Average (ARIMA) models. In particular, the ARIMA model outperformed other time series models such as AR and MA in prediction accuracy [3,4]. Kohzadi et al. compared the ARIMA model with an artificial neural network (ANN), and Ho and Aladag et al. compared the performance of the ARIMA model and a Recurrent Neural Network (RNN) [5,6]. However, because these traditional models assume a linear relationship within the time series data, they perform poorly when applied to real data with nonlinear behavior [7,8]. McKendry and Zhao compared the multilayer perceptron with regression algorithms and reported that a single regression algorithm performed better, although it still did not show good predictive performance [9,10]. As deep learning has recently attracted attention, various networks and algorithms are being developed and applied. Among them, the LSTM neural network proposed by Hochreiter et al. was developed as a kind of RNN for the analysis of text or speech [11]. The main characteristic of the LSTM neural network is that its input, forget, and output gates control how much past information is preserved in the network, which helps it predict future values. Because of this structural advantage, LSTM is a very widely used deep learning method for predicting time series data. RNN models achieve high predictive performance on nonlinear data and provide competitive results compared with traditional time series prediction models [12][13][14]. In this work, to account for the time dependence of NOx emissions, a model for predicting NOx emissions was developed using a recurrent neural network. The proposed model covers everything from data preprocessing to RNN model training, selection of optimal parameters, and evaluation. The paper is organized as follows.
In Section 2, the recurrent neural network and the LSTM model used to handle time series data are described. In Section 3, the data used for verification are introduced, the neural network parameters are selected, and prediction performance is compared using empirical data. Finally, the conclusions of this paper are summarized and further studies are discussed.

The Recurrent Neural Network
A recurrent neural network is a kind of artificial neural network designed to process time series data. Figure 1 shows the structure of the recurrent neural network. The structure on the left, when unfolded along the time axis, takes the form shown to the right of the arrow. In Figure 1, x represents the input layer, O the output layer, and h the hidden layer, and W, U, and V denote the connection weights. In a recurrent neural network, the inputs enter the hidden layer through the input layer one time step at a time. The output of the hidden layer at the previous time step is fed into the hidden layer at the current time step together with the current input, and the output is computed from these two values. This process is repeated until all inputs in the sequence have been processed.
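As an illustration of the recurrence described above, the following NumPy sketch unfolds a simple RNN over a short sequence. Because Figure 1 is not reproduced here, the roles of the matrices are an assumption that follows a common convention: U maps the input to the hidden layer, W connects the previous hidden state to the current one, and V maps the hidden state to the output. The tanh activation and the bias vectors b and c are likewise hypothetical choices for the example, not details taken from this study.

```python
import numpy as np

def rnn_forward(xs, U, W, V, b, c):
    """Unfold a simple RNN over the sequence xs (shape: time x features).

    Assumed roles (Figure 1 is not reproduced here):
      U: input -> hidden, W: previous hidden -> current hidden, V: hidden -> output.
    b and c are hypothetical hidden and output biases.
    """
    h = np.zeros(W.shape[0])                # hidden state before the first input
    outputs = []
    for x_t in xs:                          # inputs enter one time step at a time
        h = np.tanh(U @ x_t + W @ h + b)    # combine current input and previous hidden state
        outputs.append(V @ h + c)           # output for this time step
    return np.array(outputs), h

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))                # 5 time steps, 3 input features
U = rng.normal(size=(4, 3))                 # 4 hidden units
W = rng.normal(size=(4, 4))
V = rng.normal(size=(1, 4))                 # 1 output value per step
ys, h_last = rnn_forward(xs, U, W, V, b=np.zeros(4), c=np.zeros(1))
print(ys.shape, h_last.shape)  # (5, 1) (4,)
```

The same weights U, W, and V are reused at every time step, which is what the unfolded diagram expresses.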

LSTM Neural Network
The LSTM network is a special kind of recurrent neural network that uses LSTM blocks as its hidden layer. As in a recurrent neural network, the state value of the previous block enters the current block and takes part in its computation. A typical LSTM block consists of three gates: the forget gate, the input gate, and the output gate. Figure 2 shows the structure of an LSTM block, and the role of each gate is described below in order [15]. The first step is the forget gate. The current input and the hidden-layer output from the previous time step pass through a layer denoted by σ, which determines how much of the stored information is kept. Here, σ is the sigmoid activation function, defined as follows [16][17][18]:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (1)$$
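The value obtained through this function lies between 0 and 1. An output of 1 means that the corresponding input value is passed through completely, and an output of 0 means that it is deleted entirely. With h_{t-1} the previous hidden output, x_t the current input, and [h_{t-1}, x_t] their concatenation, the output of the forget gate is as follows:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad (2)$$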
Here, W denotes the weight of the layer and b denotes its bias. The second step is the input gate, which selects which values to add to the cell state that has passed through the forget gate. The input gate value i_t is computed by a sigmoid layer, and the candidates for the newly added values, C̃_t, are produced by a tanh layer; both layers take the previous hidden output h_{t-1} and the current input x_t. The output of the tanh function lies between -1 and 1. The input gate value and the candidate values are as follows:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \qquad (3)$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \qquad (4)$$

Based on these two values and the forget gate output f_t, the state value of the current block is updated, where ⊙ denotes element-wise multiplication:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \qquad (5)$$
Finally, the output gate determines the output of the block. First, a sigmoid layer computes the output gate value o_t, which decides how much of the cell state is output. Then, the cell state C_t computed in the previous step is passed through tanh to form the new output candidate. The element-wise product of these two values leaves the block as the output h_t of the current hidden layer and is also passed to the hidden layer at the next time step. This can be expressed as follows:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \qquad (6)$$

$$h_t = o_t \odot \tanh(C_t) \qquad (7)$$
Through this process, the LSTM block receives input, updates its cell state, and sends out an output.
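The forget-gate, input-gate, cell-update, and output-gate computations described above can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used in this study (the paper does not state its software framework). The parameter names (Wf, bf, Wi, and so on), the helper functions, the small random initialisation, and the hidden size of 8 (the study used 64 cells) are hypothetical choices for a compact example. The loop at the end stacks three layers, as in the three-layer structure selected in the experiments, by feeding each layer's hidden-state sequence into the next layer.

```python
import numpy as np

def sigmoid(z):
    # squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM block update. Each W in p (hypothetical names) has shape
    (hidden, hidden + input) and acts on the concatenation [h_prev, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(p["Wf"] @ z + p["bf"])         # forget gate
    i = sigmoid(p["Wi"] @ z + p["bi"])         # input gate
    c_tilde = np.tanh(p["Wc"] @ z + p["bc"])   # candidate values in (-1, 1)
    c = f * c_prev + i * c_tilde               # updated cell state
    o = sigmoid(p["Wo"] @ z + p["bo"])         # output gate
    h = o * np.tanh(c)                         # block output / new hidden state
    return h, c

def lstm_layer(xs, p, hidden):
    """Run one LSTM layer over a sequence and return every hidden state."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    hs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
        hs.append(h)
    return np.array(hs)

def init_params(rng, hidden, n_in):
    """Small random weights and zero biases (hypothetical initialisation)."""
    shape = (hidden, hidden + n_in)
    return {name: rng.normal(scale=0.1, size=shape) if name.startswith("W")
            else np.zeros(hidden)
            for name in ["Wf", "bf", "Wi", "bi", "Wc", "bc", "Wo", "bo"]}

rng = np.random.default_rng(0)
xs = rng.normal(size=(10, 3))       # 10 time steps, 3 input features
hidden = 8
seq = xs
for n_in in [3, hidden, hidden]:    # three stacked LSTM layers
    seq = lstm_layer(seq, init_params(rng, hidden, n_in), hidden)
print(seq.shape)  # (10, 8)
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved at each step; this makes a convenient sanity check of the update rule.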

Supervised Learning
Machine learning methods include supervised learning, unsupervised learning, and reinforcement learning. Since supervised learning is used in this paper, it is briefly described here. Supervised learning is a method in which the model is given both the input data and the correct answers, which determine the direction of learning during training. The model receives only the inputs, and its outputs are compared with the corresponding correct answers. Based on this error, the weights and biases of the model are adjusted. A schematic of this process is shown in Figure 3. How the error is expressed depends on the loss function used, and, based on this loss function, the optimization algorithm adjusts the coefficients of the model so that they quickly approach their optimal values. Machine learning offers various loss functions and optimization algorithms, and the learning speed and performance of the model depend on which are selected. The loss function and optimization algorithm used in this paper are the Mean Squared Error (MSE) function and the Adam optimizer, respectively [19].
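To make the roles of the loss function and the optimizer concrete, the sketch below minimises the MSE of a simple linear model with the Adam update rule: running estimates of the first and second moments of the gradient, bias correction, and a per-parameter step. The constants β1 = 0.9, β2 = 0.999, and ε = 1e-8 are the defaults proposed with Adam; the learning rate of 0.01 matches the value used in this study. The linear model and the synthetic data are stand-ins chosen for brevity; in the study the same loss and optimizer are applied to the LSTM weights.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error loss."""
    return float(np.mean((y_true - y_pred) ** 2))

def predict(X, theta):
    """Linear model: weights theta[:-1] and bias theta[-1]."""
    return X @ theta[:-1] + theta[-1]

def fit_adam(X, y, lr=0.01, steps=5000, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise MSE with full-batch Adam updates."""
    n, d = X.shape
    theta = np.zeros(d + 1)
    m = np.zeros_like(theta)   # running mean of gradients (first moment)
    v = np.zeros_like(theta)   # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        err = predict(X, theta) - y
        # gradient of the MSE with respect to the weights and the bias
        grad = np.append(2.0 / n * X.T @ err, 2.0 / n * err.sum())
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)   # bias-corrected moment estimates
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 0.5   # synthetic, noise-free target
theta = fit_adam(X, y)                # lr = 0.01, as set in this study
print(mse(y, predict(X, theta)))
```

Because the step is divided by the square root of the second-moment estimate, each parameter effectively gets its own step size, which is one reason Adam is a common default for training recurrent networks.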

Experimental Setup
In this work, a deep learning-based prediction method was tested on empirical data. As empirical data, the 2005 to 2019 yearbook data from "Air Pollution: Real-time Air Quality Index (AQI)", provided at http://aqicn.org/city/london/, were used. The data used are shown in Figure 4. Because the value ranges of the variables differ greatly, a MinMax scaler was applied to all variables except the prediction target, so that variables with large magnitudes would not dominate training. Each variable was mapped to the range between 0 and 1. To obtain a good predictive model, it is important to set the parameters of the LSTM network appropriately. An LSTM network has many hyperparameters, such as the learning rate, optimization method, dropout rate, number of epochs, and number of hidden units. If the learning rate is too small, learning is slow; if it is too large, training may fail to converge and the loss may oscillate. The learning rate is usually set between 0.0001 and 0.1; in this paper, it was set to 0.01. The Adam method was used for optimization, and the dropout ratio was set to 0.5.
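Min-max scaling maps each variable x to (x - min) / (max - min), so that the fitted values fall in [0, 1]. The sketch below is one way to do this, assuming (the paper does not say) that the minimum and maximum are computed from the training portion only and then reused for the test portion, which keeps information from the test period out of training. As in the paper, only the input variables are scaled; the prediction target is left in its original units. scikit-learn's MinMaxScaler implements the same transformation; the helper names here are hypothetical.

```python
import numpy as np

def fit_minmax(X_train):
    """Learn per-column minimum and maximum from the training data only."""
    return X_train.min(axis=0), X_train.max(axis=0)

def apply_minmax(X, col_min, col_max):
    """Map each column to [0, 1] using the training range."""
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard constant columns
    return (X - col_min) / span

X_train = np.array([[10.0, 200.0],
                    [20.0, 400.0],
                    [30.0, 300.0]])     # two input variables with very different ranges
col_min, col_max = fit_minmax(X_train)
print(apply_minmax(X_train, col_min, col_max))
# A later observation outside the training range is mapped outside [0, 1].
print(apply_minmax(np.array([[40.0, 100.0]]), col_min, col_max))  # 1.5 and -0.5
```

After scaling, both columns span the same range, so neither dominates the weighted sums inside the network simply because of its units.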
f) The model is trained using the selected number of hidden units and epochs, and predictions are made with the trained model.
In the above process, the test data are used neither to train nor to validate the model; they are used only for comparison with the predicted values.
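The requirement that the test data never be used for training can be met with a chronological split. The following sketch converts a multivariate series into fixed-length input windows, each paired with the target value at the next time step, and holds out the most recent windows for testing. The next-step target, the window length, the 20% test fraction, and the helper names are hypothetical choices for illustration; the paper does not report them.

```python
import numpy as np

def make_windows(features, target, window):
    """Slice a series into overlapping input windows and next-step targets.

    features: (time, n_features); target: (time,)."""
    X, y = [], []
    for end in range(window, len(features)):
        X.append(features[end - window:end])   # the `window` steps before time `end`
        y.append(target[end])                  # value to predict at time `end`
    return np.array(X), np.array(y)

def chronological_split(X, y, test_fraction=0.2):
    """Keep the most recent samples for testing and earlier ones for training."""
    cut = int(len(X) * (1 - test_fraction))
    return X[:cut], y[:cut], X[cut:], y[cut:]

features = np.arange(20, dtype=float).reshape(10, 2)   # 10 time steps, 2 features
target = np.arange(10, dtype=float) * 10
X, y = make_windows(features, target, window=3)
X_train, y_train, X_test, y_test = chronological_split(X, y)
print(X.shape, len(X_train), len(X_test))  # (7, 3, 2) 5 2
```

Splitting by time rather than at random ensures that every test target lies after every training target, so the evaluation reflects genuine forecasting.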
The training of the LSTM was implemented on CPU: Intel Core i5-8400H 2.5GHz, and GPU: GTX1660 with

Experimental Result
Various learning parameters and model structures were tested on the training data. The model was evaluated with the root mean square error (RMSE) of Eq. (8):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (8)$$

where y_i is the actual value of the i-th sample, ŷ_i is the predicted value of the i-th sample, and n is the number of samples. As mentioned above, the learning parameters adjusted through the experiments are the number of LSTM layers, the number of LSTM cells, the number of epochs, and the batch size. The first is the number of LSTM layers. In this work, one-, two-, and three-layer structures were tested. The one- and two-layer structures could not follow the fine-grained variations in the data, whereas the three-layer structure gave satisfactory results. The next parameter is the number of LSTM cells, which determines how many LSTM blocks each LSTM hidden layer contains.
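Eq. (8) translates directly into code. The following minimal NumPy version (the function name is a hypothetical choice) includes a small example that can be checked by hand.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Eq. (8): square root of the mean squared prediction error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# errors are 1, 0 and 2, so RMSE = sqrt((1 + 0 + 4) / 3) ≈ 1.291
print(rmse([2.0, 4.0, 6.0], [3.0, 4.0, 4.0]))
```

Because the errors are squared before averaging, RMSE is expressed in the same units as the NOx values and penalises large individual errors more heavily than small ones.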
In this work, the optimal value was found experimentally to be 64. The next parameter is the number of epochs. Repeated training over the same data tends to improve learning performance, but too many epochs can lead to overfitting; 100 epochs were therefore used in this study. The batch size determines how many training samples are processed before each weight update. It was judged that too large a batch size would make it difficult for the model to follow fine-grained variations in the data, and the batch size was set to 32 through experiments. The model was trained with these data and parameters, and the output of the trained model was checked by feeding it the test data. The results are shown in Figure 5. The accuracy of the NOx prediction model was examined with the plot in Figure 6, which shows that training on the training data yielded a model with an accuracy of 0.93. From these results, it was confirmed that a NOx prediction model with high accuracy was obtained for both the training data and the evaluation data.
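The batch size and epoch settings interact as follows: in each epoch the training windows are split into batches of 32, so every sample is used once per epoch and the weights are updated once per batch. The generator below is a hypothetical helper illustrating this, assuming shuffled batches (a common default, for example in Keras `fit`; the paper does not say whether shuffling was used). Only the order of the windows is shuffled; the time steps inside each window keep their order. With 100 training windows it yields three full batches and one batch of 4, that is, 4 updates per epoch and 400 updates over 100 epochs.

```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    """One epoch of shuffled mini-batches (hypothetical helper).

    Whole windows are shuffled; the time order inside each window is untouched."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X = np.zeros((100, 24, 5))   # 100 hypothetical windows of 24 steps x 5 features
y = np.zeros(100)
sizes = [len(xb) for xb, _ in minibatches(X, y, batch_size=32, rng=rng)]
print(sizes)  # [32, 32, 32, 4]
```

A smaller batch gives more, noisier updates per epoch, which is consistent with the observation above that a large batch made it harder to follow fine-grained variations.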

Conclusion
In this work, the concepts of deep learning, recurrent neural networks, and LSTM within artificial neural networks, as well as supervised learning as a learning method, were investigated. In addition, a model for predicting NOx emissions was presented using LSTM, an artificial intelligence technique, and its validity was verified by applying it to empirical data. The LSTM-based NOx prediction model was trained using the training data and evaluated using the evaluation data. As a result, a prediction accuracy of more than 93% was achieved on both the training data and the evaluation data, confirming the validity of the proposed model. Based on these results, the proposed approach is expected to be applicable not only to NOx but also to artificial intelligence-based emission prediction models for other air pollutants with time-series characteristics.